An enhancing probability algorithm for imbalanced datasets based on GMM-EM

Affiliation: School of Science, Dalian Maritime University, Dalian 116026, Liaoning, China

Corresponding author e-mail: chengang@dlmu.edu.cn

CLC number: TP273

Fund project: National Natural Science Foundation of China (11571056)

    Abstract:

    Classification of imbalanced data is an important research topic in machine learning. In an imbalanced dataset, the minority class has markedly fewer training samples than the majority class, so classification results tend to be biased towards the majority class and overall classification performance tends to be poor. To address this problem, an enhanced probability algorithm based on the Gaussian mixture model-expectation maximization (GMM-EM) method is proposed. First, the probability density function (PDF) of the minority class is estimated with a Gaussian mixture model (GMM) fitted by the expectation maximization (EM) algorithm. Second, based on the property that samples with high probability density have a stronger ability to generate new samples than samples with low probability density, an oversampling algorithm is built on the PDF of the minority class. The algorithm keeps the probability distribution of the minority class consistent before and after balancing, so that the minority class is balanced in a statistical sense. Finally, a decision tree classifier is applied to the balanced datasets, and the classification results are assessed with evaluation metrics. Classification experiments on eight imbalanced datasets selected from the UCI and KEEL repositories show that the proposed algorithm is more effective than existing algorithms.
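    The first two steps of the abstract (fit a GMM to the minority class with EM, then oversample according to the fitted density) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how new samples are generated, so the density-proportional seed selection with Gaussian jitter used here is an assumption, as are the function names (`fit_gmm_em`, `oversample`), the parameters (`k`, `jitter`), and the restriction to one-dimensional data, which keeps the example short and free of external dependencies.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(data, k=2, iters=100, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns (weights, means, variances). Initial means are taken at
    evenly spaced positions in the sorted data (a simple heuristic).
    """
    n = len(data)
    ordered = sorted(data)
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    mu = sum(data) / n
    var0 = max(sum((x - mu) ** 2 for x in data) / n, var_floor)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, var_floor)
    return weights, means, variances


def mixture_pdf(x, weights, means, variances):
    """Density of the fitted mixture at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def oversample(minority, n_new, rng, k=2, jitter=0.1):
    """Generate n_new synthetic minority values.

    Assumption (not taken from the paper): seed samples are drawn with
    probability proportional to their fitted density, then perturbed
    with Gaussian noise of standard deviation `jitter`.
    """
    params = fit_gmm_em(minority, k=k)
    density = [mixture_pdf(x, *params) for x in minority]
    seeds = rng.choices(minority, weights=density, k=n_new)
    return [s + rng.gauss(0.0, jitter) for s in seeds]


# Toy data: a bimodal minority class of 40 points, majority class of 100.
rng = random.Random(0)
minority = [rng.gauss(0.0, 0.5) for _ in range(20)] + [rng.gauss(5.0, 0.5) for _ in range(20)]
majority_size = 100
synthetic = oversample(minority, majority_size - len(minority), rng)
balanced_minority = minority + synthetic
print(len(balanced_minority))
```

    Because high-density samples are chosen as seeds more often, the synthetic points concentrate where the original minority class is dense, which is one way to keep the distribution roughly consistent before and after balancing. A decision tree or any other classifier would then be trained on the balanced data.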

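Cite this article

Chen Gang, Wu Zhenjia. An enhancing probability algorithm for imbalanced datasets based on GMM-EM[J]. Control and Decision (控制与决策), 2020, 35(3): 763-768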

History
  • Published online: 2020-02-22