An imbalanced data processing method based on hybrid CGAN and SMOTEENN

Affiliation:

Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, China

Corresponding author:

E-mail: zhubo20110720@163.com

CLC number:

TP181

Fund project:

National Natural Science Foundation of China (52065033).

    Abstract:

    A conditional generative adversarial network (CGAN) can learn the distribution characteristics of data. When used to oversample the minority class in imbalanced data processing, it generates new samples that conform to the original data distribution and therefore performs better than traditional resampling methods. However, how well a CGAN learns the distribution is limited by the sample size: when the minority class is small, its distribution characteristics cannot be fully learned, and the quality of the generated samples is difficult to guarantee. To address this problem, this paper proposes a method for balancing imbalanced data that combines the CGAN with the synthetic minority over-sampling technique and edited nearest neighbor (SMOTEENN). First, starting from the existing minority class samples, SMOTEENN is used to generate a minority class sample set of a certain scale. Then, the CGAN model is trained on this basis so that it can generate new samples consistent with the distribution characteristics of the original minority class samples. Finally, the CGAN is used to regenerate new samples that conform to the original minority class distribution, and these are used to construct a balanced dataset. To verify the effectiveness of the proposed method, comparative experiments are carried out on public imbalanced datasets. The experimental results show that, compared with several classical imbalanced data processing methods and methods reported in recent literature, the proposed method has clear advantages on several evaluation metrics for imbalanced data classification.
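The first step of the method (expanding the minority class with SMOTEENN before any CGAN training) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names (`smote`, `enn`, `smoteenn_expand`), the neighbour count `k=3`, the choice to oversample up to the majority class size, and the toy two-dimensional data are all hypothetical. The CGAN training and regeneration steps are not shown.

```python
import math
import random


def k_nearest(point, pool, k):
    """Return the k (features, label) entries of `pool` closest to `point`."""
    return sorted(pool, key=lambda item: math.dist(point, item[0]))[:k]


def smote(minority, n_new, k, rng):
    """SMOTE: interpolate between a minority sample and one of its
    k nearest minority neighbours to create each synthetic sample."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [(m, 1) for m in minority if m is not base]
        neighbour = rng.choice(k_nearest(base, others, k))[0]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def enn(samples, k):
    """ENN: keep a sample only if most of its k nearest neighbours share its label."""
    kept = []
    for i, (x, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        votes = [lab for _, lab in k_nearest(x, rest, k)]
        if votes.count(label) * 2 > len(votes):
            kept.append((x, label))
    return kept


def smoteenn_expand(majority, minority, k=3, seed=0):
    """Return an enlarged minority set: SMOTE oversampling, then ENN cleaning.
    Assumption: oversample until the class sizes match before cleaning."""
    rng = random.Random(seed)
    synthetic = smote(minority, len(majority) - len(minority), k, rng)
    labelled = [(x, 0) for x in majority] + [(x, 1) for x in minority + synthetic]
    return [x for x, lab in enn(labelled, k) if lab == 1]


# Toy data (hypothetical): 40 majority points near (0, 0), 6 minority points near (3, 3).
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)) for _ in range(40)]
minority = [(data_rng.gauss(3, 0.3), data_rng.gauss(3, 0.3)) for _ in range(6)]

expanded = smoteenn_expand(majority, minority)
print(len(minority), "minority samples expanded to", len(expanded))
```

In the method described above, a set like `expanded` would serve as the training data for the CGAN, which then generates the samples used in the final balanced dataset. In practice a maintained implementation such as `SMOTEENN` from the imbalanced-learn package would normally replace the hand-written functions here.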

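Cite this article

Liu Ning, Zhu Bo, Yin Yanchao, et al. An imbalanced data processing method based on hybrid CGAN and SMOTEENN[J]. Control and Decision, 2023, 38(9): 2614-2621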

History
  • Published online: 2023-09-04
  • Publication date: 2023-09-20