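Cite this article: LI Wu, ZHAO Jiao-yan, YAN Tai-shan. Improved K-means clustering algorithm optimizing initial clustering centers based on average difference degree[J]. Control and Decision, 2017, 32(4): 759-762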
Improved K-means clustering algorithm optimizing initial clustering centers based on average difference degree
LI Wu, ZHAO Jiao-yan, YAN Tai-shan
(College of Information and Communication Engineering, Hu'nan Institute of Science and Technology, Yueyang 414006, China)
Abstract:
To address the dependence of the K-means clustering algorithm on its initial clustering centers, an improved algorithm is proposed that selects the initial centers according to the spatial distribution of the data. The algorithm first defines the distance between two samples, the average difference degree of each sample, and the total average difference degree of the sample set. The samples are then sorted by average difference degree, and a sample with a relatively large average difference degree is selected as an initial clustering center if its difference degree from every center already selected is greater than the total average difference degree of the sample set. Experimental results show that the improved algorithm not only increases the stability and accuracy of the clustering results, but also clearly reduces the number of iterations and converges quickly.
Key words:  K-means clustering  initial clustering center  sample difference degree
DOI:10.13195/j.kzyjc.2016.0274
Classification number: N945
Funding: National Natural Science Foundation of China (61473118); Hunan Provincial Natural Science Foundation (2015JJ2074); Open Fund of the Innovation Platform of Hunan Provincial Universities (13K102); Hunan Provincial Science and Technology Plan Project (2016TP1021).
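
The abstract names the quantities used for center selection but does not give their formulas, so the procedure can only be sketched under assumptions. The following Python sketch assumes that the sample distance is Euclidean, that the average difference degree of a sample is its mean distance to all other samples, and that the total average difference degree is the mean of those per-sample values. The function name `select_initial_centers` and the fallback used when fewer than k samples pass the threshold are hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_initial_centers(samples, k):
    """Pick k initial centers by average difference degree (illustrative sketch)."""
    n = len(samples)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of samples")

    # Pairwise distance matrix between all samples.
    dist = [[euclidean(a, b) for b in samples] for a in samples]

    # Assumption: average difference degree of sample i is its mean
    # distance to the other n - 1 samples.
    denom = max(n - 1, 1)
    avg_diff = [sum(row) / denom for row in dist]

    # Assumption: total average difference degree is the mean of the
    # per-sample values, i.e. the mean over all pairs of samples.
    total_avg = sum(avg_diff) / n

    # Visit samples from the largest average difference degree downwards.
    order = sorted(range(n), key=lambda i: avg_diff[i], reverse=True)

    chosen = []
    for i in order:
        # Accept a sample only if it is farther than total_avg from
        # every center that has already been selected.
        if all(dist[i][j] > total_avg for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break

    # Hypothetical fallback (not from the paper): if too few samples pass
    # the threshold, fill up with the remaining samples in sorted order.
    for i in order:
        if len(chosen) == k:
            break
        if i not in chosen:
            chosen.append(i)

    return [samples[i] for i in chosen]
```

The returned points could then be passed as the starting centers of an ordinary K-means iteration. Because the selection is deterministic, repeated runs on the same data start from the same centers, which is consistent with the stability claim in the abstract.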
