A Reinforcement Learning-Driven Method for TBM Main Bearing Fault Identification under Extreme Class Imbalance

doi:10.13195/j.kzyjc.2025.1142

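Affiliations: 1. School of Mechanical Engineering, Southwest Jiaotong University; 2. SWJTU-Leeds Joint School, Southwest Jiaotong University; 3. School of Mechanical Engineering, University of Leeds; 4. China Railway Engineering Services Co., Ltd.; 5. Tribology Research Institute, Southwest Jiaotong University
CLC number: TH17; TP277
Funding: National Natural Science Foundation of China (52405220, 52475218); Fundamental Research Funds for the Central Universities, Science and Technology Innovation Project (2682024CX066); China Postdoctoral Science Foundation General Program (2025M771394)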



Abstract:

The operating condition of the main bearing of a full-face tunnel boring machine (TBM) directly affects the safety and efficiency of tunneling, so reliable fault identification is essential. In practice, however, proactive maintenance leaves condition-monitoring data with extreme class imbalance: fault samples often account for less than 1% of the data, which makes sparse fault features hard to extract and the risk of missed detection very high. To address this, a deep reinforcement learning fault identification model (DRLimb) is proposed. The method first formulates fault identification as a Markov decision process and uses a dual-network architecture (an online Q-network and a target Q-network) with soft updates to keep training stable. An asymmetric reward function then assigns larger rewards and penalties to fault-class samples, forcing the agent to focus on sparse but critical fault patterns. Theoretical analysis proves that setting the majority-class reward coefficient equal to the class imbalance ratio equalizes gradient contributions across classes. Experiments on TBM main bearing datasets with several extreme imbalance ratios show that DRLimb consistently achieves G-mean and F1-score values above 93.2%, significantly outperforming mainstream imbalanced-learning diagnostic models and baseline models.
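
The reward design, soft update, and G-mean metric named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the imbalance ratio is assumed to be the fault-sample count divided by the normal-sample count (so it is below 1), the fault class is assumed to receive a reward magnitude of 1, and the function names and the value of `tau` are hypothetical. The G-mean follows the standard binary definition, the geometric mean of sensitivity and specificity.

```python
import math


def imbalance_ratio(n_fault, n_normal):
    # Assumed definition: minority (fault) count over majority (normal) count.
    return n_fault / n_normal


def asymmetric_reward(action, label, rho, fault_label=1):
    # Fault-class samples earn +/-1; normal-class samples earn +/-rho.
    # With rho < 1, mistakes on fault samples cost more than on normal ones.
    magnitude = 1.0 if label == fault_label else rho
    return magnitude if action == label else -magnitude


def soft_update(target_params, online_params, tau=0.01):
    # Polyak averaging: target <- tau * online + (1 - tau) * target.
    # Parameters are plain lists of floats here to keep the sketch
    # framework-free; tau = 0.01 is an illustrative value only.
    return [
        tau * w_online + (1.0 - tau) * w_target
        for w_online, w_target in zip(online_params, target_params)
    ]


def g_mean(tp, fn, tn, fp):
    # Geometric mean of sensitivity (fault recall) and specificity.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # Hypothetical dataset: 10 fault samples and 990 normal samples.
    rho = imbalance_ratio(10, 990)
    # Total reward magnitude available to each class is the same,
    # which is the intuition behind the balancing claim.
    print("fault-class total :", 10 * 1.0)
    print("normal-class total:", 990 * rho)
    print("target after update:", soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
    print("G-mean:", g_mean(tp=9, fn=1, tn=980, fp=10))
```

Under these assumptions, each class contributes the same total reward magnitude over the dataset, which gives an intuition for why tying the majority-class coefficient to the imbalance ratio balances the two classes. The gradient-level argument itself is in the paper and is not reproduced here.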

History

Received: 2025-11-03
Last revised: 2026-01-28
Accepted: 2026-01-29
Published online: 2026-02-26
