RareBoost: A rare value self-boosting imbalanced regression method via rare region identification
DOI:
CSTR:
Author:
Affiliation:

Kunming University of Science and Technology

Author biography:

Corresponding author:

CLC number:

68T05, 68T10

Fund Project:

National Natural Science Foundation of China (grant nos. 12261052, 11761041), Natural Science Foundation of Yunnan Province of China (grant no. 202501AS070103), Yunnan Province "Xingdian Talent Support Plan"

    Abstract:

    Imbalanced regression addresses prediction tasks in which the continuous target variable is unevenly distributed and the rare values are the main prediction target. Compared with imbalanced classification, it poses an additional challenge: rare values must first be defined and distinguished from common values, which is the basis for accurately predicting rare values in low-density regions. To address this challenge, this paper first proposes a new rare region identification strategy, KK-means, which combines kernel density estimation with K-means clustering to systematically identify sparse sample points in the target variable space and merge them into continuous rare intervals. Building on KK-means, the paper then proposes RareBoost, a rare value self-boosting imbalanced regression method via rare region identification. RareBoost first extracts information from the identified rare intervals through label density ratio weighting and dynamically adjusts sample weights during bootstrap sampling, which increases the model's focus on rare regions so that it better learns the characteristics of rare-value samples during training. It then integrates these "rare value aware" base learners with a Stacking meta-learner, yielding a rare value prediction model that balances global efficiency and local accuracy. Experimental results show that RareBoost outperforms traditional methods on key metrics such as ANLL, RMSE, and $R^{2}$. RareBoost therefore provides an effective tool for imbalanced regression tasks and is strongly competitive in this field.
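
    The abstract names the ingredients of the rare region step (kernel density estimation, K-means clustering, merging sparse labels into intervals, density-ratio weights for bootstrap sampling) but gives no formulas. The sketch below is one possible reading, not the authors' implementation. Everything specific in it is an assumption made for illustration: the Gaussian kernel and fixed bandwidth, clustering the log-density into two groups, the weight `max(density) / density` inside rare intervals, and all function names. The dynamic weight updates and the Stacking meta-learner described above are not shown.

```python
import math
import random


def kde_density(y, bandwidth):
    """Gaussian kernel density estimate of the label distribution at each label."""
    norm = 1.0 / (len(y) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((a - b) / bandwidth) ** 2) for b in y)
        for a in y
    ]


def two_means(values, max_iter=100):
    """1-D K-means with k = 2. Returns True for members of the lower-centre cluster."""
    lo, hi = min(values), max(values)
    if lo == hi:  # constant input: nothing is comparatively rare
        return [False] * len(values)
    for _ in range(max_iter):
        in_low = [abs(v - lo) <= abs(v - hi) for v in values]
        low = [v for v, flag in zip(values, in_low) if flag]
        high = [v for v, flag in zip(values, in_low) if not flag]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):  # centres stopped moving
            break
        lo, hi = new_lo, new_hi
    return in_low


def rare_intervals(y, is_rare):
    """Merge runs of rare labels that are adjacent in sorted order into intervals."""
    intervals, start, end = [], None, None
    for value, rare in sorted(zip(y, is_rare)):
        if rare:
            start = value if start is None else start
            end = value
        elif start is not None:
            intervals.append((start, end))
            start = None
    if start is not None:
        intervals.append((start, end))
    return intervals


def sampling_weights(y, density, intervals):
    """Assumed weighting: max(density) / density inside rare intervals, 1.0 elsewhere."""
    peak = max(density)
    return [
        peak / d if any(a <= v <= b for a, b in intervals) else 1.0
        for v, d in zip(y, density)
    ]


# Toy labels: a dense block on [0, 2) plus three isolated large values.
y = [0.05 * i for i in range(40)] + [9.0, 9.5, 10.0]
density = kde_density(y, bandwidth=0.5)
# Clustering the log-density (an assumption) keeps boundary points of the
# dense block from being grouped with the genuinely sparse labels.
is_rare = two_means([math.log(d) for d in density])
intervals = rare_intervals(y, is_rare)
weights = sampling_weights(y, density, intervals)

# Weighted bootstrap: rare-interval samples are drawn more often.
rng = random.Random(0)
bootstrap_idx = rng.choices(range(len(y)), weights=weights, k=len(y))
```

    On this toy data the three isolated labels are merged into a single rare interval and receive weights above 1, so they appear in the bootstrap sample far more often than under uniform resampling. A base learner would then be fitted on the rows indexed by `bootstrap_idx`.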

History
  • Received: 2025-12-17
  • Last revised: 2026-03-02
  • Accepted: 2026-03-03
  • Published online:
  • Publication date: