K-means clustering case study

2021-09-21 21:29:51  Source: Internet

Tags: ... 00 01 kmeans case-study pd clustering csv data


Note: this is a classroom case study from Heima (黑马); it is uploaded here only for convenient reference.

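# 1. Load the data
# 2. Basic data processing
# 2.1 Merge the tables
# 2.2 Build the cross-tabulation
# 2.3 Subset the data
# 3. Feature engineering: PCA
# 4. Machine learning (k-means)
# 5. Model evaluation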
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
# 1. Load the data
order_product = pd.read_csv("./data/instacart/order_products__prior.csv")
products = pd.read_csv("./data/instacart/products.csv")
orders = pd.read_csv("./data/instacart/orders.csv")
aisles = pd.read_csv("./data/instacart/aisles.csv")
# 2. Basic data processing
# 2.1 Merge the tables (inner joins on the shared key column)
table1 = pd.merge(order_product, products, on="product_id")
table2 = pd.merge(table1, orders, on="order_id")
table = pd.merge(table2, aisles, on="aisle_id")
table.shape
(32434489, 14)
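The three merges above chain inner joins on a shared key. As a minimal sketch of the same pattern, the snippet below joins small toy frames; the toy rows and the `validate="many_to_one"` argument are assumptions added for illustration and are not part of the original case. `validate` asks pandas to raise an error if the lookup table has duplicate keys, which would otherwise silently multiply rows.

```python
import pandas as pd

# Toy stand-ins for the Instacart tables (values are made up for illustration).
order_product = pd.DataFrame({"order_id": [1, 1, 2], "product_id": [10, 11, 10]})
products = pd.DataFrame({"product_id": [10, 11], "aisle_id": [100, 101]})
aisles = pd.DataFrame({"aisle_id": [100, 101], "aisle": ["eggs", "tea"]})

# validate="many_to_one" raises MergeError if the right-hand key is not unique.
t1 = pd.merge(order_product, products, on="product_id", validate="many_to_one")
toy_table = pd.merge(t1, aisles, on="aisle_id", validate="many_to_one")

print(toy_table.shape)
print(toy_table)
```

Because every key in the lookup tables is unique, the row count after both joins equals the row count of the left-most table.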
table.head()
   order_id  product_id  add_to_cart_order  reordered        product_name  aisle_id  department_id  \
0         2       33120                  1          1  Organic Egg Whites        86             16
1        26       33120                  5          0  Organic Egg Whites        86             16
2       120       33120                 13          0  Organic Egg Whites        86             16
3       327       33120                  5          1  Organic Egg Whites        86             16
4       390       33120                 28          1  Organic Egg Whites        86             16

   user_id  eval_set  order_number  order_dow  order_hour_of_day  days_since_prior_order  aisle
0   202279     prior             3          5                  9                     8.0   eggs
1   153404     prior             2          0                 16                     7.0   eggs
2    23750     prior            11          6                  8                    10.0   eggs
3    58707     prior            21          6                  9                     8.0   eggs
4   166654     prior            48          0                 12                     9.0   eggs
# 2.2 Build the cross-tabulation (rows: user_id, columns: aisle, values: purchase counts)
data = pd.crosstab(table["user_id"], table["aisle"])
data.head()
aisle    air fresheners candles  asian foods  baby accessories  baby bath body care  baby food formula  \
user_id
1                             0            0                 0                    0                  0
2                             0            3                 0                    0                  0
3                             0            0                 0                    0                  0
4                             0            0                 0                    0                  0
5                             0            2                 0                    0                  0

aisle    bakery desserts  baking ingredients  baking supplies decor  beauty  beers coolers  ...  \
user_id
1                      0                   0                      0       0              0  ...
2                      0                   2                      0       0              0  ...
3                      0                   0                      0       0              0  ...
4                      0                   0                      0       0              0  ...
5                      0                   0                      0       0              0  ...

aisle    spreads  tea  tofu meat alternatives  tortillas flat bread  trail mix snack mix  \
user_id
1              1    0                       0                     0                    0
2              3    1                       1                     0                    0
3              4    1                       0                     0                    0
4              0    0                       0                     1                    0
5              0    0                       0                     0                    0

aisle    trash bags liners  vitamins supplements  water seltzer sparkling water  white wines  yogurt
user_id
1                        0                     0                              0            0       1
2                        0                     0                              2            0      42
3                        0                     0                              2            0       0
4                        0                     0                              1            0       0
5                        0                     0                              0            0       3

5 rows × 134 columns

data.shape
(206209, 134)
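`pd.crosstab` counts how often each (user_id, aisle) pair occurs, so each row becomes a per-user purchase-count vector. A minimal sketch on a toy frame (the rows are made up for illustration):

```python
import pandas as pd

# Toy purchase log: one row per purchased item (made-up values).
toy = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "aisle": ["eggs", "eggs", "tea", "tea", "yogurt"],
})

# Rows are users, columns are aisles, cells are occurrence counts.
counts = pd.crosstab(toy["user_id"], toy["aisle"])
print(counts)
```

Pairs that never occur are filled with 0, which is why the full table is mostly zeros.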
# 2.3 Subset the data (keep the first 1,000 users to keep the run fast)
new_data = data[:1000]
new_data.shape
(1000, 134)
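`data[:1000]` keeps the first 1,000 users in index order. If a random subset is preferred instead (an assumption; the original case does not do this), `DataFrame.sample` with a fixed `random_state` is one option. The sketch below uses a synthetic stand-in for `data`:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the user-by-aisle count table (made-up values).
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.integers(0, 5, size=(5000, 6)),
    index=pd.RangeIndex(1, 5001, name="user_id"),
)

head_subset = data[:1000]                              # first 1,000 users, as in the original
random_subset = data.sample(n=1000, random_state=0)    # 1,000 users drawn without replacement

print(head_subset.shape, random_subset.shape)
```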
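# 3. Feature engineering: PCA (a float n_components keeps enough components to explain that fraction of the variance)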
transfer = PCA(n_components=0.9)
trans_data = transfer.fit_transform(new_data)
trans_data.shape
(1000, 22)
trans_data
array([[-2.27452872e+01, -7.32942365e-01, -2.48945893e+00, ...,
        -4.78491473e+00, -3.10742945e+00, -2.45192316e+00],
       [ 5.28638801e+00, -3.00176267e+01, -1.11226906e+00, ...,
         9.24145693e+00, -3.11309382e+00,  2.20144174e+00],
       [-6.52593099e+00, -3.87333123e+00, -9.23859508e+00, ...,
        -1.33929081e+00,  1.25062993e+00,  6.12717485e-01],
       ...,
       [ 1.31226615e+01, -2.77296885e+01, -4.62403246e+00, ...,
         7.40793534e+00,  1.03829352e+00, -1.39058393e+01],
       [ 1.64905900e+02, -8.54916188e+01,  1.90577481e-02, ...,
        -5.62014943e+00, -1.38488891e+01, -7.11424774e+00],
       [-1.60244724e+00,  1.82037661e+00,  8.55756408e+00, ...,
         3.69860152e+00,  2.82248188e+00, -3.79491023e+00]])
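With a float `n_components`, scikit-learn's `PCA` keeps the smallest number of components whose cumulative explained variance ratio exceeds that fraction; this is how 134 columns shrink to 22 above. A minimal sketch that checks this on synthetic count data (the data and variable names are assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic count-like matrix standing in for new_data (made-up values).
rng = np.random.default_rng(0)
rates = rng.uniform(0.1, 5.0, size=30)
X = rng.poisson(lam=rates, size=(200, 30)).astype(float)

pca = PCA(n_components=0.9)
X_reduced = pca.fit_transform(X)

# Cumulative share of variance explained by the retained components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(pca.n_components_, cumulative[-1])
```

Inspecting `cumulative` shows how much each extra component contributes, which helps judge whether 0.9 is a reasonable threshold for a given dataset.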
# 4. Machine learning (k-means)
estimator = KMeans(n_clusters=5)
y_pre = estimator.fit_predict(trans_data)
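K-means starts from randomly chosen centroids, so cluster labels (and the score reported afterwards) can vary between runs. The sketch below fixes `random_state` and `n_init` and inspects cluster sizes; both arguments and the synthetic blobs are assumptions for illustration, since the original passes only `n_clusters=5`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data standing in for trans_data (made-up values).
X, _ = make_blobs(n_samples=300, centers=5, n_features=4, random_state=0)

# Fixing random_state makes the run repeatable; n_init=10 keeps the best of 10 restarts.
estimator = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = estimator.fit_predict(X)

sizes = np.bincount(labels, minlength=5)   # number of samples per cluster
print(sizes)
print(estimator.cluster_centers_.shape)
```

Very unbalanced cluster sizes are often a hint that the chosen number of clusters does not match the data well.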
# 5. Model evaluation (silhouette coefficient: ranges from -1 to 1, higher means better-separated clusters)
silhouette_score(trans_data, y_pre)

0.4472179873751538
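A single silhouette score is easier to interpret next to the scores for other cluster counts. As a hedged sketch (the range of k and the synthetic data are assumptions, not part of the original case), one can sweep `n_clusters` and compare:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data standing in for trans_data (made-up values).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # mean silhouette over all samples

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(k, round(s, 3))
print("best k:", best_k)
```

The same loop could be run on `trans_data` to check whether 5 clusters is a sensible choice for the Instacart subset.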

Source: https://blog.csdn.net/qq_44268986/article/details/120405706
