Multi-AGV path planning algorithm based on adaptive exploration and curriculum learning
CSTR:
Author:
Affiliation:

Kunming University of Science and Technology

Author biography:

Corresponding author:

CLC number:

TP18

Fund project:

The National Natural Science Foundation of China (General Program, Key Program, Major Research Plan)




    Abstract:

    To address the low success rate and poor performance of traditional deep reinforcement learning methods in multi-automated-guided-vehicle (multi-AGV) path planning, which stem from slow convergence, low exploration efficiency, and insufficient sample utilization, this paper proposes an improved multi-agent deep deterministic policy gradient algorithm based on adaptive exploration and curriculum learning (AECL-MADDPG). First, an adaptive exploration strategy based on dynamic congestion awareness is designed: the Q-value discrepancy between the dual Critic networks serves as a measure of decision uncertainty and, together with the environmental congestion level, drives dynamic adjustment of the exploration intensity, improving the strategy's adaptability. Second, a prioritized experience replay (PER) mechanism based on curriculum learning is constructed; it integrates the progressive task difficulty of curriculum learning with the sample-value ranking of PER, and uses curriculum weights to achieve a smooth transition of samples across difficulty levels, thereby avoiding policy oscillation. Third, a multi-dimensional curriculum advancement and rollback mechanism is developed, which replaces the crude advancement criterion of a single success-rate threshold with a comprehensive evaluation over multiple metrics, including collision rate and path efficiency, combined with performance-rollback protection, improving training stability. Finally, simulation experiments are conducted in a warehousing environment, comparing the proposed algorithm with mainstream baselines; the comparison results on key indicators, including convergence speed, task success rate, and average path length, verify the feasibility and effectiveness of the proposed path planning algorithm.
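
    The abstract names two concrete per-step computations: an exploration intensity driven jointly by dual-Critic Q-value discrepancy and environmental congestion, and a replay priority that weights samples by curriculum level. The sketch below illustrates plausible forms of both; the normalization of the Q-value discrepancy, the weighting coefficients, and the geometric decay across curriculum levels are illustrative assumptions, not the paper's actual formulas.

    ```python
    import numpy as np

    def exploration_noise_scale(q1, q2, congestion,
                                sigma_min=0.05, sigma_max=0.5,
                                w_uncertainty=0.5, w_congestion=0.5):
        """Map dual-Critic disagreement and local congestion to a noise scale.

        q1, q2: Q-value estimates of the two Critic networks for the current
        state-action pair; congestion: fraction of occupied cells in the
        AGV's neighborhood, assumed to lie in [0, 1].
        """
        # Decision uncertainty: normalized absolute Q-value discrepancy.
        uncertainty = abs(q1 - q2) / (abs(q1) + abs(q2) + 1e-8)
        # Blend the two driving signals, then map into [sigma_min, sigma_max].
        drive = w_uncertainty * uncertainty + w_congestion * congestion
        return sigma_min + (sigma_max - sigma_min) * float(np.clip(drive, 0.0, 1.0))

    def sample_priority(td_error, sample_level, current_level,
                        decay=0.5, eps=1e-6):
        """Curriculum-weighted PER priority for a single transition.

        Transitions collected at the current curriculum level keep full
        weight; samples from easier, earlier levels are down-weighted
        geometrically, so replay shifts smoothly across difficulty levels.
        """
        curriculum_weight = decay ** max(current_level - sample_level, 0)
        return (abs(td_error) + eps) * curriculum_weight
    ```

    Under this sketch, exploration noise grows when the two Critics disagree or the local map becomes crowded, and old easy-level transitions fade from replay rather than being dropped abruptly.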

History
  • Received: 2025-12-11
  • Revised: 2026-04-03
  • Accepted: 2026-04-05
  • Online: 2026-04-15
  • Published: