An Imbalanced Data Processing Method Based on Hybrid CGAN and SMOTEENN
Affiliation:

Kunming University of Science and Technology

Chinese Library Classification (CLC) number:

TP181

Fund Project:

National Natural Science Foundation of China (General Program, Key Program, Major Program)


Abstract:

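A CGAN (conditional generative adversarial network) can learn the distribution characteristics of data. When it is introduced into imbalanced data processing to oversample the minority class, it can generate new samples that conform to the original data distribution, and it therefore performs better than traditional resampling methods. However, how well a CGAN learns a data distribution is easily limited by sample size: when the minority class is small, the CGAN cannot fully learn its distribution, and the quality of the generated samples is hard to guarantee. To address this problem, this paper proposes a method for balancing imbalanced data that combines CGAN with SMOTEENN (SMOTE oversampling followed by edited-nearest-neighbour cleaning). First, starting from the existing minority class samples, SMOTEENN is used to generate a minority class sample set of a certain size. Next, a CGAN model is trained on this enlarged set so that it can generate new samples consistent with the distribution of the original minority class. Finally, the CGAN is used to generate new samples that follow the original minority class distribution, and these are used to construct a balanced dataset. To verify the effectiveness of the proposed method, comparative experiments are conducted on public imbalanced datasets. The results show that, compared with several classical imbalanced data processing methods and methods reported in recent literature, the proposed method has clear advantages on several evaluation metrics for imbalanced data classification.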
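The three stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a binary problem with numeric features, implements SMOTE (k=5) and Wilson's majority-vote edited nearest neighbours (k=3) by hand in plain Python, and assumes the final dataset is the original data plus enough generated minority samples to reach parity. The names `smoteenn_minority`, `fit_gaussian_stand_in`, and `balance` are hypothetical. Crucially, the stage-2 generator here is NOT a CGAN: a per-feature Gaussian stands in for the trained CGAN generator only so the pipeline runs end to end without a deep-learning library.

```python
from __future__ import annotations

import math
import random
import statistics
from collections import Counter
from typing import Callable

Point = tuple[float, ...]


def k_nearest(points: list[Point], query: Point, k: int, skip: int | None = None) -> list[int]:
    """Indices of the k points closest to `query` (Euclidean), ignoring index `skip`."""
    candidates = [i for i in range(len(points)) if i != skip]
    candidates.sort(key=lambda i: math.dist(points[i], query))
    return candidates[:k]


def smote(minority: list[Point], n_new: int, k: int, rng: random.Random) -> list[Point]:
    """SMOTE: place each synthetic point at a random spot on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority, minority[i], k, skip=i))
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j])))
    return synthetic


def enn(X: list[Point], y: list[int], k: int = 3) -> tuple[list[Point], list[int]]:
    """Wilson's edited nearest neighbours: drop every sample (of any class) whose
    k nearest neighbours vote, by majority, for a different label."""
    keep = [
        i for i in range(len(X))
        if Counter(y[j] for j in k_nearest(X, X[i], k, skip=i)).most_common(1)[0][0] == y[i]
    ]
    return [X[i] for i in keep], [y[i] for i in keep]


def smoteenn_minority(X: list[Point], y: list[int], minority_label: int,
                      rng: random.Random, k: int = 5) -> list[Point]:
    """Stage 1 (binary labels assumed): SMOTE the minority class up to the majority
    count, clean the combined set with ENN, and return the surviving minority
    samples as the enlarged training set for the generator."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_major = len(y) - len(minority)
    new = smote(minority, n_major - len(minority), k, rng)
    X_res, y_res = enn(X + new, y + [minority_label] * len(new))
    return [x for x, label in zip(X_res, y_res) if label == minority_label]


def fit_gaussian_stand_in(samples: list[Point], rng: random.Random) -> Callable[[], Point]:
    """HYPOTHETICAL stand-in for stage 2. This is NOT a CGAN: it fits an independent
    Gaussian per feature to the enlarged minority set and samples from it."""
    means = [statistics.fmean(col) for col in zip(*samples)]
    stdevs = [statistics.stdev(col) for col in zip(*samples)]
    return lambda: tuple(rng.gauss(m, s) for m, s in zip(means, stdevs))


def balance(X: list[Point], y: list[int], minority_label: int,
            rng: random.Random) -> tuple[list[Point], list[int]]:
    """Stages 1-3: enlarged minority set -> fitted generator -> new minority samples
    appended to the ORIGINAL data until both classes have the same count."""
    augmented = smoteenn_minority(X, y, minority_label, rng)
    sample = fit_gaussian_stand_in(augmented, rng)  # a trained CGAN generator would go here
    n_min = y.count(minority_label)
    n_gen = (len(y) - n_min) - n_min
    return X + [sample() for _ in range(n_gen)], y + [minority_label] * n_gen


if __name__ == "__main__":
    rng = random.Random(0)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]      # majority, label 0
    X += [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]  # minority, label 1
    y = [0] * 100 + [1] * 10
    X_bal, y_bal = balance(X, y, minority_label=1, rng=rng)
    print(Counter(y_bal))  # both classes now have 100 samples
```

In the method as described, a CGAN conditioned on the class label would be trained on the output of `smoteenn_minority` in place of `fit_gaussian_stand_in`. If the imbalanced-learn package is available, stage 1 could instead use `imblearn.combine.SMOTEENN(...).fit_resample(X, y)`, although its ENN defaults may differ from the majority-vote rule sketched here.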

History
  • Received: 2021-10-15
  • Last revised: 2022-04-11
  • Accepted: 2022-04-15