Adaptive hybrid sampling for imbalanced data processing based on a variational Bayesian-optimized Gaussian mixture model

Affiliation:

Hunan Normal University

CLC number:

TP391.4

Fund project:

National Natural Science Foundation of China (General Program, Key Program, Major Program)

    Abstract:

    Real-world classification data are often imbalanced. Traditional classifiers tend to favor the majority class and neglect the minority class, which degrades classification performance. This paper proposes an adaptive hybrid sampling method for imbalanced data based on a Gaussian mixture model optimized by variational Bayesian inference (Variational Bayesian-optimized Gaussian Mixture Model, VBoGMM). VBoGMM automatically decays to the true number of Gaussian components, which yields an optimal distribution estimate for arbitrary data. Based on the estimated distribution characteristics, adaptive synthetic oversampling is applied to the minority-class samples, and the Tomek-link criterion is then used to clean the sampled data, producing a relatively balanced data set for subsequent classifier training. Extensive validation and comparison experiments on multiple public imbalanced data sets show that the proposed method balances the samples while preserving the spatial distribution characteristics of the majority and minority classes, and therefore effectively improves the performance of traditional classifiers on imbalanced data sets.
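The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' VBoGMM implementation: it substitutes scikit-learn's `BayesianGaussianMixture` for the variational Bayesian GMM, draws synthetic minority samples directly from the fitted mixture until the classes have equal size, and removes both members of every Tomek link. The function names (`oversample_minority`, `tomek_link_mask`, `hybrid_resample`), the component upper bound, and the choice to drop both link members are hypothetical choices made for this sketch.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def oversample_minority(X_min, n_new, max_components=10, seed=0):
    """Fit a variational Bayesian GMM to the minority class and sample from it."""
    vbgmm = BayesianGaussianMixture(
        # Upper bound only: the variational fit shrinks the weights of
        # components it does not need toward zero.
        n_components=min(max_components, len(X_min)),
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="full",
        max_iter=500,
        random_state=seed,
    )
    vbgmm.fit(X_min)
    X_new, _ = vbgmm.sample(n_new)  # synthetic minority samples
    return X_new, vbgmm


def tomek_link_mask(X, y):
    """Boolean mask that is False for every sample belonging to a Tomek link.

    A Tomek link is a pair of samples from different classes that are each
    other's nearest neighbor.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # a sample is not its own neighbor
    nn = d.argmin(axis=1)                 # nearest neighbor of each sample
    idx = np.arange(len(X))
    in_link = (nn[nn] == idx) & (y[nn] != y)
    return ~in_link


def hybrid_resample(X, y, minority_label=1, seed=0):
    """Oversample the minority class, then clean the result with Tomek links."""
    X_min = X[y == minority_label]
    n_new = int(np.sum(y != minority_label)) - len(X_min)
    if n_new > 0:
        X_new, _ = oversample_minority(X_min, n_new, seed=seed)
        X = np.vstack([X, X_new])
        y = np.concatenate([y, np.full(n_new, minority_label)])
    keep = tomek_link_mask(X, y)          # assumption: drop both link members
    return X[keep], y[keep]


# Toy demonstration on synthetic 2-D data (not from the paper).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
                    rng.normal(3.0, 0.5, size=(20, 2))])
y_demo = np.array([0] * 200 + [1] * 20)
X_bal, y_bal = hybrid_resample(X_demo, y_demo, minority_label=1)
print(np.bincount(y_bal))
```

The pairwise distance matrix keeps the sketch short but scales quadratically with the number of samples; a nearest-neighbor index would be a more practical choice for large data sets.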

History
  • Received: 2021-08-02
  • Last revised: 2022-02-14
  • Accepted: 2022-02-25