What can we do with unlabeled data?
- Data clustering
- Partition examples into groups when no pre-defined categories/classes are available
- Dimensionality reduction
- Reduce the number of variables under consideration
- Outlier detection
- Identification of new or unknown data or signal that a machine learning system is not aware of during training
- Modeling the data density
What is clustering?
- "Birds of a feather flock together."
- small intra-cluster distance
- large inter-cluster distance
- Soft clustering vs. hard clustering
- Soft: the same object can belong to several clusters
- Hard: each object belongs to exactly one cluster
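The two distance criteria above can be checked numerically. The following is only a minimal sketch, assuming Euclidean distance and average pairwise distance as the measure; the function names and the toy points are illustrative and not part of the original notes.

```python
import math
from itertools import combinations, product


def intra_cluster_distance(cluster):
    """Average Euclidean distance over all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def inter_cluster_distance(cluster_a, cluster_b):
    """Average Euclidean distance over all pairs with one point from each cluster."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


# Toy data (assumed for illustration): two tight, well-separated groups.
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

intra_a = intra_cluster_distance(cluster_a)
inter_ab = inter_cluster_distance(cluster_a, cluster_b)
print(intra_a, inter_ab)
```

A good clustering of this toy data keeps `intra_a` small relative to `inter_ab`.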
Hierarchical clustering
Agglomerative (bottom-up hierarchical clustering)
Agglomerative hierarchical clustering algorithm:
Cluster similarity:
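As a minimal sketch of the agglomerative idea: start with every point in its own cluster and repeatedly merge the two most similar clusters. Single linkage (similarity measured by the closest pair of points), Euclidean distance, the stopping rule `num_clusters`, and all function names are assumptions made for illustration; other linkage choices would plug into the same loop.

```python
import math


def single_link(cluster_a, cluster_b):
    """Single linkage: distance between the closest pair of points across two clusters."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def agglomerative(points, num_clusters, linkage=single_link):
    """Merge the two closest clusters until only num_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > num_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(agglomerative(points, num_clusters=2))
```

Recording the merge order instead of stopping at `num_clusters` would give the full hierarchy.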
Divisive (top-down hierarchical clustering)
Discussion on hierarchical clustering
K-means
Steps:
(step 1)
(step 2)
K-means is guaranteed to converge, but not necessarily to the global optimum; the result may be only a local optimum.
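The two steps can be sketched as follows, assuming they are the usual assignment step (each point goes to its nearest center) and update step (each center moves to the mean of its points). Euclidean distance, the `max_iter` cap, the handling of empty clusters, and the function name are illustrative assumptions.

```python
import math


def kmeans(points, initial_centers, max_iter=100):
    """Alternate assignment and update steps until the centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Step 1 (assignment): put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            clusters[nearest].append(p)
        # Step 2 (update): move each center to the mean of its assigned points.
        new_centers = []
        for k, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])  # assumption: an empty cluster keeps its center
        if new_centers == centers:  # nothing moved, so the algorithm has converged
            break
        centers = new_centers
    return centers, clusters


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers, clusters = kmeans(points, initial_centers=[(0.0, 0.0), (10.0, 0.0)])
print(centers)
```

Because the result depends on `initial_centers`, a common practice is to run the procedure from several different starts and keep the clustering with the lowest total distance.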
How can we decide K?
Discussion on K-means:
K-medoid clustering
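Unlike K-means, the "center" of a cluster in K-medoid clustering must be an actual data point rather than a virtual center. That point, the medoid, is the member of the cluster whose sum of distances to all other members is smallest.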
The basic strategy:
- First, arbitrarily pick a representative object (medoid) for each cluster
- Iteration:
- Assign each remaining object to the medoid to which it is most similar
- Replace one of the medoids with a non-medoid whenever the swap improves the quality of the resulting clustering (quality is estimated by a cost function: the average dissimilarity between each object and its medoid)
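The strategy above can be sketched as a simple swap search. This is a minimal illustration, assuming Euclidean distance as the dissimilarity and a greedy "accept any improving swap" rule; the function names and the toy data are hypothetical.

```python
import math


def average_cost(points, medoids):
    """Average dissimilarity between each object and its closest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points) / len(points)


def k_medoids(points, initial_medoids):
    """Swap medoids with non-medoids for as long as the average cost decreases."""
    medoids = list(initial_medoids)
    best = average_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(len(medoids)):
            for candidate in points:
                if candidate in medoids:
                    continue  # only non-medoids are swap candidates
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = average_cost(points, trial)
                if cost < best:  # keep the swap only if quality improves
                    medoids, best, improved = trial, cost, True
    # Final assignment: each object joins its most similar (closest) medoid.
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: math.dist(p, m))].append(p)
    return medoids, clusters


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
medoids, clusters = k_medoids(points, initial_medoids=[(0.0, 0.0), (2.0, 0.0)])
print(medoids)
```

Every medoid returned is one of the input points, which is the property that distinguishes this method from K-means.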
Source: https://blog.csdn.net/weixin_41332009/article/details/112480389