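New over-sampling SVM classification algorithm based on unbalanced data sample characteristics
Authors:

Huang Haisong, Wei Jian'an, Kang Peidong

Affiliation:

(Key Laboratory of Advanced Manufacturing Technology of Ministry of Education, Guizhou University, Guiyang 550025, China)

About the authors:

Huang Haisong (1977-), female, professor, works on intelligent manufacturing and manufacturing informatization; Wei Jian'an (1992-), male, master's student, works on intelligent manufacturing and machine learning.

Corresponding author:

E-mail: 1046534381@qq.com

CLC number:

TP273

Funding:

Guizhou Province Key Industrial Research Project (黔科合GZ字[2015]3009); Natural Science Foundation of Guizhou Province (黔科合J字[2015]2043); Guizhou Province Major Special Project (黔科合JZ字[2014]2001); Guizhou Provincial Department of Education Project (黔教合协同创新字[2015]02); Guizhou University Graduate Innovation Fund (研理工2017037).
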
    Abstract:

    Traditional sampling methods offer limited accuracy and robustness: under-sampling tends to discard important sample information, while over-sampling tends to introduce redundant information. To address these problems, this paper takes the unbalanced Pima-Indians dataset from the UCI public repository as an example, jointly considers the relationship among the between-class distance, the within-class distance and the degree of imbalance of the positive and negative samples, and proposes a new over-sampling method based on sample characteristics. First, the original dataset is divided into distance bands. Then, an improved adaptive variable-neighborhood SMOTE algorithm based on sample characteristics is proposed to synthesize new samples among the minority-class samples of each distance band, and the method is extended to five other unbalanced UCI datasets. Finally, experiments with an SVM classifier show that, on the six unbalanced datasets, the new over-sampling SVM algorithm clearly improves the classification accuracy of both minority-class and majority-class samples compared with existing sampling methods, and is more robust.
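
    The band-wise over-sampling idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact band definition or the neighborhood rule, so everything specific here is an assumption made for illustration: bands are formed by ranking minority samples by their distance to the majority-class centroid, the neighborhood size `k` shrinks with band size, and all function names (`split_into_bands`, `smote_in_band`, `banded_oversample`) are hypothetical rather than the authors' implementation.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def split_into_bands(minority, majority, n_bands):
    """Group minority samples into distance bands.

    Assumption: a sample's band is set by its distance to the
    majority-class centroid, and bands hold roughly equal counts.
    """
    center = centroid(majority)
    ranked = sorted(minority, key=lambda p: math.dist(p, center))
    size = math.ceil(len(ranked) / n_bands)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def smote_in_band(band, n_new, k, rng):
    """Create n_new points by SMOTE-style interpolation inside one band."""
    if len(band) < 2:
        return []  # no neighbour to interpolate towards
    k = min(k, len(band) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(band))
        base = band[i]
        # k nearest neighbours of the base sample within the same band
        others = [band[j] for j in range(len(band)) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        mate = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + t * (m - b) for b, m in zip(base, mate)])
    return synthetic


def banded_oversample(minority, majority, n_bands=3, base_k=5, seed=0):
    """Synthesize minority samples band by band, aiming at class balance."""
    deficit = max(len(majority) - len(minority), 0)
    if not minority or deficit == 0:
        return []
    rng = random.Random(seed)
    bands = split_into_bands(minority, majority, n_bands)
    new_points, remaining = [], deficit
    for idx, band in enumerate(bands):
        if idx == len(bands) - 1:
            quota = remaining  # last band absorbs rounding differences
        else:
            quota = min(round(deficit * len(band) / len(minority)), remaining)
        # Assumed adaptive rule: smaller bands use a smaller neighbourhood.
        k = max(1, min(base_k, len(band) // 2))
        created = smote_in_band(band, quota, k, rng)
        new_points.extend(created)
        remaining -= len(created)
    return new_points
```

    In use, the returned points would be appended to the minority class before training a classifier such as scikit-learn's `SVC`. Because each synthetic point is a convex combination of two samples from the same band, it stays within that band's range of feature values, which is one plausible way to limit the redundant information that plain over-sampling can introduce.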

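Cite this article

Huang Haisong, Wei Jian'an, Kang Peidong. New over-sampling SVM classification algorithm based on unbalanced data sample characteristics[J]. Control and Decision, 2018, 33(9): 1549-1558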

History
  • Published online: 2018-09-06