Review of imbalanced data classification methods

Affiliation:

( College of Automation, Chongqing University, Chongqing 400044, China)

About the authors:

Li Yanxia (李艳霞, b. 1991), female, PhD candidate, researches sparse representation and data classification; Yin Hongpeng (尹宏鹏, b. 1981), male, professor, PhD, researches pattern recognition and intelligent systems.

Corresponding author:

E-mail: yinhongpeng@gmail.com.

CLC number:

TP13

Funding:

National Natural Science Foundation of China (61633005, 61773080); Chongqing University Research Reserve Top Talent Program (cqu2018CDHB1B04).

    Abstract:

    With the rapid development of information technology, data in every domain are being generated, collected, and stored at an unprecedented rate. How to process these data intelligently and exploit the valuable information they contain has become a research focus in both theory and application. Classification, as a basic data processing method, is widely used in intelligent data processing. Traditional classification algorithms generally assume that the class distribution is balanced and that misclassification costs are equal. Real-world data, however, are usually class-imbalanced: one or more classes have fewer examples than the others, and misclassifying the minority class carries a higher cost. When a traditional algorithm is applied to such data, the skew in class sizes means that maximizing overall classification accuracy biases the model towards the majority class and neglects the minority class, so accuracy on the minority class is low. Designing classification algorithms for imbalanced data that maintain accuracy on both the majority and minority classes has therefore become a research focus in machine learning, and a series of methods have been proposed. This paper gives a fairly comprehensive review of existing imbalanced data classification methods, summarizing and comparing them at the data preprocessing level, the feature level, and the classification algorithm level. In light of current research focuses in machine learning, it then discusses the challenges facing imbalanced data classification, and closes with prospects for future research.
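
    The bias described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the 95/5 class split, the always-predict-majority classifier, and the helper names `accuracy` and `recall` are all assumptions chosen for illustration.

```python
# Illustrative sketch only: the data and the trivial classifier are hypothetical.

def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, label):
    """Fraction of examples of class `label` that were predicted as `label`."""
    preds_for_class = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in preds_for_class) / len(preds_for_class)

# Assumed imbalanced data set: 95 majority examples (0), 5 minority examples (1).
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

overall = accuracy(y_true, y_pred)
majority_recall = recall(y_true, y_pred, 0)
minority_recall = recall(y_true, y_pred, 1)
# Mean of per-class recalls, which weights both classes equally.
balanced = (majority_recall + minority_recall) / 2

print(f"overall accuracy: {overall:.2f}")
print(f"majority recall:  {majority_recall:.2f}")
print(f"minority recall:  {minority_recall:.2f}")
print(f"balanced accuracy: {balanced:.2f}")
```

    Under these assumptions the overall accuracy is high even though no minority example is detected, while the mean of the per-class recalls exposes the failure. This is one reason evaluation on imbalanced data typically looks at per-class measures rather than overall accuracy alone.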

Cite this article

Li Yanxia, Chai Yi, Hu Youqiang, et al. Review of imbalanced data classification methods[J]. Control and Decision, 2019, 34(4): 673-688

History
  • Published online: 2019-03-21