Abstract: To address the problems of unstable formation, control lag, and degraded safety that arise when vehicle platoons travel through complex road scenarios such as varying-curvature ramps, this paper proposes a Stage Reward Shaping Twin Delayed Deep Deterministic Policy Gradient (SR-TD3) algorithm. By combining connected-vehicle information with a multi-agent reinforcement learning framework, the algorithm achieves stable cooperative platoon control on varying-curvature roads. It does so by designing safety-, smoothness-, and efficiency-oriented reward functions tailored to the three curvature stages: entering, maintaining, and exiting the curved segment. Structurally, the algorithm adopts a Centralized Training with Decentralized Execution architecture, integrates parameter sharing and prioritized experience replay, and introduces a residual regularization term into the Critic network to suppress value fluctuations. Experiments on the CARLA simulation platform demonstrate that, compared with five existing algorithms, SR-TD3 achieves significant improvements in convergence speed, stability, and car-following precision. Specifically, during the road-curvature transition stage, the root mean square error of vehicle speed is reduced by 41.31%, 86.53%, 58.25%, 73.08%, and 81.60%, respectively, relative to the five baselines; over the entire varying-curvature ramp, inter-vehicle distance control is improved by 27.30%, 54.52%, 36.66%, 77.51%, and 76.37%, respectively. Furthermore, the reward curve converges faster with smaller fluctuations, indicating superior control stability and learning efficiency.
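To make the stage reward shaping idea concrete, the sketch below shows one way safety-, smoothness-, and efficiency-oriented penalty terms could be re-weighted across the three curvature stages. This is a minimal illustration under assumed names and weights (`STAGE_WEIGHTS`, `shaped_reward`, and all coefficients are hypothetical), not the paper's actual reward design.

```python
# Illustrative stage-dependent reward shaping for platoon control.
# Assumption: each curvature stage re-weights the same three penalty
# terms; the weights and term definitions here are placeholders.

STAGE_WEIGHTS = {
    # (w_safety, w_smooth, w_efficiency) per curvature stage
    "entering":    (0.5, 0.3, 0.2),   # emphasize safe spacing into the curve
    "maintaining": (0.3, 0.5, 0.2),   # emphasize smooth gap/speed keeping
    "exiting":     (0.3, 0.2, 0.5),   # emphasize recovering cruising speed
}

def shaped_reward(stage, gap_error, jerk, speed_error):
    """Combine penalty terms with stage-specific weights.

    gap_error   : deviation from the desired inter-vehicle distance (m)
    jerk        : longitudinal jerk magnitude (m/s^3)
    speed_error : deviation from the reference speed (m/s)
    """
    w_safe, w_smooth, w_eff = STAGE_WEIGHTS[stage]
    r_safety = -abs(gap_error)    # safety: penalize spacing deviation
    r_smooth = -abs(jerk)         # smoothness: penalize abrupt acceleration changes
    r_eff = -abs(speed_error)     # efficiency: penalize speed-tracking error
    return w_safe * r_safety + w_smooth * r_smooth + w_eff * r_eff
```

In this sketch, a perfect state (zero gap error, jerk, and speed error) yields zero reward in every stage, while the same spacing error is penalized more heavily in the entering stage than in the maintaining stage, mirroring the safety-first emphasis described for curve entry.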