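Phase-Based Reward Optimization for Deep Reinforcement Learning–Based Platoon Cooperative Control on Variable Curvature Ramps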

DOI: 10.13195/j.kzyjc.2025.1145

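Affiliations: 1. Chongqing Jiaotong University; 2. Chongqing University of Posts and Telecommunications
CLC number: TP273
Funding: National Natural Science Foundation of China (Youth Program); China Postdoctoral Science Foundation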

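Abstract (Chinese version, translated):

To address the formation instability, control lag, and reduced safety that vehicle platoons are prone to in complex road scenarios such as variable-curvature ramps, this paper proposes a Stage Reward Shaping Twin Delayed DDPG (SR-TD3) algorithm. The algorithm combines connected-vehicle information with a multi-agent reinforcement learning framework and designs safety-, smoothness-, and efficiency-oriented reward functions for the three curvature stages of entry, holding, and exit, respectively, achieving stable cooperative platoon control on variable-curvature roads. Structurally, it adopts a centralized-training, decentralized-execution architecture, combines parameter sharing with prioritized experience replay, and adds a residual regularization term to the Critic network to suppress value fluctuations. Experiments on the CARLA simulation platform show that, compared with five existing algorithms, SR-TD3 significantly improves convergence speed, stability, and car-following accuracy. In the road-curvature transition stage, the root-mean-square error (RMSE) of vehicle speed is reduced by 41.21%, 86.49%, 67.53%, 80.92%, and 83.33% relative to the five algorithms, respectively; over the entire variable-curvature ramp, inter-vehicle distance control improves by 27.30%, 54.52%, 61.57%, 77.51%, and 76.37%, respectively. The reward curve also converges faster and fluctuates less, indicating higher control stability and learning efficiency. These simulation results indicate that SR-TD3 achieves better cooperative control performance in variable-curvature road scenarios.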
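The stage-dependent reward shaping summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rule that detects the stage from the change in |curvature|, the `FollowerState` fields, every weight, and the 0.5 safety threshold are hypothetical choices made only to show how one reward function can switch its emphasis (safety on entry, smoothness while curvature is held, efficiency on exit).

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    ENTER = "enter"  # |curvature| increasing (entering the curve)
    HOLD = "hold"    # |curvature| roughly constant
    EXIT = "exit"    # |curvature| decreasing (leaving the curve)


def classify_stage(kappa_prev: float, kappa_now: float, tol: float = 1e-4) -> Stage:
    """Hypothetical rule: label the stage by the change in |curvature| [1/m]."""
    delta = abs(kappa_now) - abs(kappa_prev)
    if delta > tol:
        return Stage.ENTER
    if delta < -tol:
        return Stage.EXIT
    return Stage.HOLD


@dataclass
class FollowerState:
    gap: float           # distance to the preceding vehicle [m]
    desired_gap: float   # target spacing [m]
    speed: float         # own speed [m/s]
    target_speed: float  # reference speed [m/s]
    accel: float         # current acceleration [m/s^2]
    prev_accel: float    # acceleration at the previous step [m/s^2]


def staged_reward(s: FollowerState, stage: Stage, dt: float = 0.1) -> float:
    """Stage-dependent reward; all weights are illustrative assumptions."""
    gap_err = abs(s.gap - s.desired_gap)
    speed_err = abs(s.speed - s.target_speed)
    jerk = abs(s.accel - s.prev_accel) / dt
    if stage is Stage.ENTER:
        # Safety-oriented: spacing error dominates, plus a penalty for a dangerously small gap.
        r = -1.0 * gap_err - 0.2 * speed_err
        if s.gap < 0.5 * s.desired_gap:
            r -= 10.0
    elif stage is Stage.HOLD:
        # Smoothness-oriented: additionally penalize jerk (rate of change of acceleration).
        r = -0.5 * gap_err - 0.2 * speed_err - 0.3 * jerk
    else:
        # Efficiency-oriented: speed tracking dominates.
        r = -0.3 * gap_err - 1.0 * speed_err
    return r
```

For example, `classify_stage(0.01, 0.02)` returns `Stage.ENTER`, and a follower whose gap has shrunk to 5 m against a desired 20 m receives `-25.0` in that stage (-15 for the spacing error and -10 for the safety penalty).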

Abstract:

To address formation instability, control lag, and degraded safety in vehicle platoons driving through complex road scenarios such as varying-curvature ramps, this paper proposes a Stage Reward Shaping Twin Delayed Deep Deterministic Policy Gradient (SR-TD3) algorithm. The algorithm combines connected-vehicle information with a multi-agent reinforcement learning framework and designs safety-, smoothness-, and efficiency-oriented reward functions for the three curvature stages (curve entry, curvature holding, and curve exit, respectively), achieving stable cooperative platoon control on varying-curvature roads. Structurally, it adopts a Centralized Training with Decentralized Execution (CTDE) architecture, integrates parameter sharing and prioritized experience replay, and introduces a residual regularization term into the Critic network to suppress value fluctuations. Experimental results on the CARLA simulation platform show that, compared with five existing algorithms, SR-TD3 achieves significant improvements in convergence speed, stability, and car-following precision. Specifically, during the road-curvature transition stage, the root mean square error (RMSE) of vehicle speed is reduced by 41.31%, 86.53%, 58.25%, 73.08%, and 81.60%, respectively, relative to the five baseline algorithms. Over the entire varying-curvature ramp, inter-vehicle distance control is improved by 27.30%, 54.52%, 36.66%, 77.51%, and 76.37%, respectively. Furthermore, the reward curve converges faster with smaller fluctuations, demonstrating superior control stability and learning efficiency.
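The TD3 critic update referred to above (clipped double-Q targets with target-policy smoothing), combined with prioritized-replay importance weights, can be sketched in plain Python on scalar states and actions. This is an illustrative sketch under stated assumptions, not the paper's code: the abstract does not give the form of the residual regularization term, so here it is assumed to penalize the squared drift of the current critic's estimates from a lagged critic copy, scaled by a hypothetical `reg_coef`. The helper names (`td3_target`, `per_sample`, `critic_loss`, `new_priorities`) are hypothetical, and networks, delayed actor updates, and parameter sharing are omitted.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def td3_target(reward: float, done: bool, next_state: float,
               target_actor: Callable[[float], float],
               target_q1: Callable[[float, float], float],
               target_q2: Callable[[float, float], float],
               gamma: float = 0.99, policy_noise: float = 0.2,
               noise_clip: float = 0.5, max_action: float = 1.0,
               rng: Optional[random.Random] = None) -> float:
    """TD3 target: target-policy smoothing plus clipped double-Q."""
    if rng is None:
        rng = random.Random()
    # Clipped Gaussian noise on the target action (target-policy smoothing).
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
    a_next = max(-max_action, min(max_action, target_actor(next_state) + noise))
    # Take the smaller of the two target critics to curb overestimation.
    q_next = min(target_q1(next_state, a_next), target_q2(next_state, a_next))
    return reward + gamma * (0.0 if done else 1.0) * q_next


def per_sample(priorities: Sequence[float], batch_size: int,
               alpha: float = 0.6, beta: float = 0.4,
               rng: Optional[random.Random] = None) -> Tuple[List[int], List[float]]:
    """Proportional prioritized replay: sample indices and importance weights."""
    if rng is None:
        rng = random.Random()
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [x / total for x in scaled]
    n = len(priorities)
    idx = rng.choices(range(n), weights=probs, k=batch_size)
    weights = [(n * probs[i]) ** (-beta) for i in idx]
    w_max = max(weights)
    # Normalize by the batch maximum so the weights only scale updates down.
    return idx, [w / w_max for w in weights]


def critic_loss(q_values: Sequence[float], targets: Sequence[float],
                weights: Sequence[float], q_lagged: Sequence[float],
                reg_coef: float = 0.1) -> float:
    """Importance-weighted TD loss plus a hypothetical residual regularizer."""
    m = len(q_values)
    td = sum(w * (q - y) ** 2 for q, y, w in zip(q_values, targets, weights)) / m
    # Assumed form: penalize drift of the Q estimates away from a lagged critic copy.
    reg = sum((q - ql) ** 2 for q, ql in zip(q_values, q_lagged)) / m
    return td + reg_coef * reg


def new_priorities(q_values: Sequence[float], targets: Sequence[float],
                   eps: float = 1e-6) -> List[float]:
    """Updated replay priorities: absolute TD error plus a small constant."""
    return [abs(q - y) + eps for q, y in zip(q_values, targets)]
```

In a training loop one would sample a batch with `per_sample`, compute `td3_target` for each transition, minimize `critic_loss`, and write `new_priorities` back to the buffer; standard TD3 updates the actor and the target networks only once every few critic steps.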

History

Received: 2025-11-04
Last revised: 2026-03-11
Accepted: 2026-03-12
Published online: 2026-03-26
