Abstract: To support the green development goals of cold chain logistics and the trend toward vehicle electrification, this study optimizes cross-regional routes for electric refrigerated vehicles under multi-objective considerations, including traffic congestion, charging costs, and energy consumption. An improved Q-learning method is proposed that integrates a heuristic reward mechanism with two dynamic strategies, a cosine annealing learning rate and an exponentially decaying exploration rate, to enhance algorithm performance. The approach is validated through simulation experiments and comparative analyses. The experimental results show that the improved reinforcement learning algorithm effectively optimizes cross-regional cold chain delivery routes by accounting for traffic conditions, the initial battery level of the electric refrigerated vehicles, and their energy consumption rates. Compared with three other Q-learning algorithms, the proposed method significantly reduces both total travel distance and energy consumption ($p < 0.05$, Welch’s t-test) across six distinct testing scenarios. The results indicate that the proposed method is adaptable and robust across a range of environment models, including highways, urban roads, and charging station deployments.
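
The two dynamic strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give the schedules' exact form or parameters, so the standard cosine annealing and exponential decay formulas are assumed here, and every function name and default value (`lr_max`, `lr_min`, `eps_start`, `eps_min`, `decay`, `gamma`) is hypothetical. The heuristic reward mechanism is not specified in the abstract and is therefore left out; `reward` is simply passed in.

```python
import math


def cosine_annealing_lr(episode, total_episodes, lr_max=0.5, lr_min=0.01):
    """Learning rate that falls from lr_max to lr_min along a half cosine.

    Assumed form: lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T)).
    """
    progress = min(episode, total_episodes) / total_episodes
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def exponential_epsilon(episode, eps_start=1.0, eps_min=0.05, decay=0.01):
    """Exploration rate that decays exponentially from eps_start toward eps_min."""
    return eps_min + (eps_start - eps_min) * math.exp(-decay * episode)


def q_update(q, state, action, reward, next_state, lr, gamma=0.9):
    """One tabular Q-learning update; q maps state -> {action: value}."""
    next_values = q.get(next_state, {})
    best_next = max(next_values.values()) if next_values else 0.0
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += lr * td_error
    return q[state][action]


if __name__ == "__main__":
    # Print both schedules at a few points of a hypothetical 1000-episode run.
    for ep in (0, 250, 500, 750, 1000):
        lr = cosine_annealing_lr(ep, 1000)
        eps = exponential_epsilon(ep)
        print(f"episode {ep:4d}: lr={lr:.4f}, epsilon={eps:.4f}")
```

In a training loop under these assumptions, both schedules would be evaluated once per episode: the exploration rate sets the probability of taking a random action, and the learning rate is passed to `q_update` for every transition in that episode.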