Density Peaks Clustering Based on Relative Density and Multi-Cluster Merging for Uneven-Density Datasets

Affiliation:

Nanchang Institute of Technology

CLC number:

TP301.6

Fund Project:

National Natural Science Foundation of China (62069014, 62066030); Jiangxi Provincial Outstanding Youth Fund (2018ACB21029)

    Abstract:

    Density peaks clustering (DPC) is a novel density-based clustering algorithm that is simple in principle and efficient to run. However, its local density is defined only from the distances between samples and ignores the environment each sample sits in, so DPC clusters data with uneven density distribution poorly. In addition, its sample allocation process is prone to error propagation, in which one misassigned sample causes later assignments to go wrong as well. To address these problems, this paper proposes DPC-RD-MCM, a density peaks clustering algorithm based on relative density and multi-cluster merging for datasets with uneven density. Combining the ideas of K-nearest neighbors and relative density, DPC-RD-MCM defines a relative K-nearest-neighbor local density that reduces the influence of cluster sparseness on the choice of cluster centers and avoids leaving sparse regions without a center. It also redefines the similarity measure between micro-clusters and obtains the final clustering through a multi-cluster merging strategy, which avoids the propagation of allocation errors. DPC-RD-MCM is compared with DPC and several of its improved variants on uneven-density datasets, datasets with complex shapes, and real-world UCI datasets. The experimental results show that DPC-RD-MCM achieves excellent clustering results on uneven-density data and outperforms the comparison algorithms on the complex-shape and UCI datasets.
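The idea of a relative K-nearest-neighbor density can be illustrated with a small sketch. The abstract does not give the paper's formulas, so the definitions below are assumptions chosen for illustration only: the absolute density is taken as the inverse mean distance to the k nearest neighbors, and the relative density divides that value by the mean absolute density of those same neighbors. The function names are hypothetical, and `dpc_delta` follows the standard DPC definition (distance to the nearest sample of higher density). The code uses only the Python standard library so that it stays self-contained.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def knn_indices(dist, k):
    """Indices of the k nearest neighbours of every sample (self excluded)."""
    n = len(dist)
    return [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]


def knn_density(dist, neighbours):
    """Assumed absolute density: inverse mean distance to the k neighbours."""
    eps = 1e-12  # guards against division by zero for duplicate points
    return [
        len(nb) / (sum(dist[i][j] for j in nb) + eps)
        for i, nb in enumerate(neighbours)
    ]


def relative_knn_density(dist, k):
    """Assumed relative density: own density over the neighbours' mean density.

    A sample in a sparse cluster is compared only with its sparse
    surroundings, so its score is not dwarfed by a dense cluster elsewhere.
    """
    neighbours = knn_indices(dist, k)
    rho = knn_density(dist, neighbours)
    return [
        rho[i] / (sum(rho[j] for j in nb) / len(nb))
        for i, nb in enumerate(neighbours)
    ]


def dpc_delta(dist, rho):
    """Standard DPC relative distance for a given density vector.

    For each sample, the distance to the nearest sample with higher density;
    the sample with the highest density gets its largest distance instead.
    """
    delta = []
    for i, row in enumerate(dist):
        higher = [row[j] for j in range(len(rho)) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(row))
    return delta
```

Under these assumptions, candidate cluster centers would be the samples that combine a high relative density with a large `dpc_delta` value, as in the original DPC decision graph. The micro-cluster similarity measure and the merging step are not sketched here because the abstract does not describe them.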

History
  • Received: 2021-07-22
  • Last revised: 2022-02-13
  • Accepted: 2022-02-25
  • Published online: 2022-03-09