Adaptive synthetic sampling of imbalanced data based on a variational Bayesian-optimized Gaussian mixture model

Affiliation:

Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410081, China

Corresponding author:

E-mail: xupf@hunnu.edu.cn

CLC number:

TP273

Fund project:

National Natural Science Foundation of China (61971188)

Abstract:

Real-world classification data are often imbalanced. Most traditional classifiers are biased towards the majority class and neglect the minority class, which degrades classification performance. To address this problem, this paper proposes an adaptive synthetic sampling method for imbalanced data based on a variational Bayesian-optimized Gaussian mixture model (VBoGMM). The VBoGMM automatically attenuates to the true number of Gaussian components, yielding an optimal estimate of the distribution of arbitrary data. Based on the estimated distribution characteristics, minority-class samples are adaptively oversampled, and the Tomek-link pair criterion is then applied to clean the sampled data, producing a relatively balanced data set for subsequent classifier training. Extensive validation and comparative experiments on multiple public imbalanced data sets show that the proposed method balances the samples while preserving the spatial distribution characteristics of the majority and minority classes, and thus effectively improves the performance of traditional classifiers on imbalanced data sets.
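The Tomek-link cleaning step named in the abstract can be illustrated with a minimal sketch. A Tomek link is a pair of samples from different classes that are each other's nearest neighbor. The code below is not the authors' implementation: the function names, the toy data, and the policy of removing both members of each link are assumptions made for illustration (the abstract does not say which members are removed), and the VBoGMM fitting and oversampling steps are not shown.

```python
from math import dist


def nearest_neighbor(i, points):
    """Return the index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Return index pairs (i, j) with i < j that form Tomek links.

    A pair is a Tomek link when the two samples carry different labels
    and each one is the other's nearest neighbor.
    """
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def remove_tomek_links(points, labels):
    """Drop both members of every Tomek link (an assumed cleaning policy)."""
    drop = {k for pair in tomek_links(points, labels) for k in pair}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical toy data: label 1 plays the role of the minority class.
X = [(0.0, 0.0), (0.2, 0.0), (1.0, 1.0), (1.1, 1.0), (3.0, 3.0), (3.5, 3.0)]
y = [0, 0, 0, 1, 1, 1]

links = tomek_links(X, y)
X_clean, y_clean = remove_tomek_links(X, y)
print(links)    # [(2, 3)]
print(y_clean)  # [0, 0, 1, 1]
```

In this toy set, samples 2 and 3 sit on the class boundary and are mutual nearest neighbors with different labels, so they form the only Tomek link. The brute-force neighbor search is quadratic in the number of samples, which is acceptable for a sketch but would normally be replaced by a spatial index on larger data sets.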

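Cite this article

刘金平, 杨本芳, 周嘉铭, et al. Adaptive synthetic sampling of imbalanced data based on a variational Bayesian-optimized Gaussian mixture model [J]. 控制与决策 (Control and Decision), 2023, 38(6): 1653-1660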

History
  • Published online: 2023-05-13
  • Publication date: 2023-06-20