Abstract: To address the low path-planning success rates and suboptimal performance that arise from the slow convergence, low exploration efficiency, and poor sample utilization of traditional deep reinforcement learning methods in multi-Automated Guided Vehicle (AGV) path planning, this paper proposes an improved Multi-Agent Deep Deterministic Policy Gradient algorithm based on Adaptive Exploration and Curriculum Learning (AECL-MADDPG). First, an adaptive exploration strategy based on dynamic congestion awareness is designed: the Q-value discrepancy between dual critic networks serves as the measure of decision uncertainty and, together with the environmental congestion level, drives the dynamic adjustment of exploration intensity, improving the strategy's adaptability to changing conditions. Second, a prioritized experience replay (PER) mechanism based on curriculum learning is constructed. It integrates the progressive task difficulty of curriculum learning with the sample-value ranking of PER, using curriculum weighting to achieve a smooth transition of samples across difficulty levels and thereby avoid policy oscillation. Third, a multi-dimensional curriculum advancement and rollback mechanism is developed. It replaces the crude advancement criterion of a single success-rate threshold with a comprehensive multi-metric evaluation, including collision rate and path efficiency, together with performance-rollback protection, significantly improving training stability. Finally, simulation experiments are conducted in a warehousing environment, comparing the proposed algorithm against mainstream baseline algorithms.
Comparative results on key indicators, including convergence speed, task success rate, and average path length, verify the feasibility and effectiveness of the proposed path-planning algorithm.
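The first two mechanisms in the abstract admit a compact sketch. The snippet below is a minimal illustration, not the paper's actual formulation: the function names, coefficients, and the specific combining forms (linear scaling of exploration intensity, exponential cross-stage decay for the curriculum weight) are all assumptions introduced here for clarity.

```python
def exploration_scale(q1, q2, congestion, base=0.1, k_u=0.5, k_c=0.5):
    """Illustrative adaptive exploration intensity (assumed form).

    The dual critics' Q-value discrepancy |q1 - q2| acts as the decision
    uncertainty; together with the environmental congestion level it
    scales the base action-noise magnitude.
    """
    uncertainty = abs(q1 - q2)
    return base * (1.0 + k_u * uncertainty + k_c * congestion)


def curriculum_priority(td_error, sample_level, current_level,
                        eps=1e-6, decay=0.5):
    """Illustrative curriculum-weighted PER priority (assumed form).

    Standard PER ranks samples by |TD error|; the curriculum weight
    down-weights samples from stages far from the current one, so the
    replay distribution shifts smoothly across difficulty levels
    instead of switching abruptly.
    """
    weight = decay ** abs(sample_level - current_level)
    return (abs(td_error) + eps) * weight
```

Under this sketch, exploration intensity grows when the critics disagree or the map is congested, and a stage-2 sample outranks an equally surprising stage-0 sample once training has advanced to stage 2.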