Clinical prediction of C4.5 decision tree classification algorithm with embedded resampling technique
Affiliation:

(College of Computer Science and Engineering, Northeastern University, Shenyang 110169, China)

Corresponding author:

E-mail: shenderong@cse.neu.edu.cn

CLC number:

TP273

Funding:

National Natural Science Foundation of China (61672142, 61472070, 61602103, 62072084, 62072086); National Key Research and Development Program of China (2018YFB1003404).

    Abstract:

    Decision trees are classical classification algorithms and are widely used in medical data analysis because their classification rules are easy to interpret. However, class imbalance in medical data degrades their classification performance. Data resampling is a common remedy: it improves classification of the minority class by changing the sample distribution. Existing resampling methods, however, usually operate independently of the subsequent learning algorithm, so the resampled data are not necessarily useful for building weak classifiers. To address this, we propose a hybrid sampling algorithm based on C4.5. The algorithm uses C4.5 as the evaluation criterion that controls the iterative oversampling and undersampling process, and it dynamically updates the oversampling rate according to the imbalance ratio of the data. Finally, the predictions of multiple weak classifiers are combined by a voting mechanism. Comparative experiments on 9 UCI datasets demonstrate the effectiveness of the proposed algorithm, which also achieves accurate predictions on missed abortion data.
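
The abstract names three ingredients: an imbalance ratio, an oversampling rate that is updated from it, and a vote over several weak classifiers. The following is a minimal sketch of those ingredients only, not the authors' algorithm. The function names, the rule `rate = IR ** 0.25`, and the use of plain random duplication and random removal are all assumptions made for illustration; the abstract does not give the actual update rule or sampling operators. Training the C4.5 trees and using them to decide whether to continue iterating is omitted.

```python
import random
from collections import Counter


def imbalance_ratio(y):
    """Majority-class count divided by minority-class count."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


def hybrid_resample(X, y, rng):
    """One hybrid sampling step on binary data that contains both classes.

    Assumed rule (not taken from the paper): rate = IR ** 0.25, so the
    oversampling rate is recomputed from the current imbalance ratio and
    shrinks toward 1 as the classes become balanced.
    """
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    rate = imbalance_ratio(y) ** 0.25
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    # Oversampling: duplicate randomly chosen minority rows.
    extra = rng.choices(min_idx, k=round(len(min_idx) * (rate - 1)))
    # Undersampling: keep a random subset of majority rows,
    # never fewer than the original minority count.
    kept = rng.sample(maj_idx, k=max(len(min_idx), round(len(maj_idx) / rate)))
    idx = min_idx + extra + kept
    return [X[i] for i in idx], [y[i] for i in idx]


def majority_vote(predictions):
    """Combine per-model label lists (one list per model) by simple majority."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

Calling `hybrid_resample` repeatedly, each time on the output of the previous call, lowers the imbalance ratio gradually rather than in one jump. In a full pipeline, each resampled set would train one decision tree, and `majority_vote` would merge the trees' predictions.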

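Cite this article

XU Zhaozhao, SHEN Derong, KOU Yue, et al. Clinical prediction of C4.5 decision tree classification algorithm with embedded resampling technique[J]. Control and Decision, 2021, 36(6): 1342-1350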

History
  • Published online: 2021-05-10
  • Publication date: 2021-06-20