Collaborative pathfinding research based on deep reinforcement learning in constrained environments
CLC number: TP273

Fund projects: National Natural Science Foundation of China (61903022, 62173155); Open Fund of the Key Laboratory of Vehicle Advanced Manufacturing, Measuring and Control Technology (Beijing Jiaotong University), Ministry of Education.




    Abstract:

    In the domain of multi-agent pathfinding under partially observable Markov decision processes, existing research mainly focuses on grid or particle environments, which differ substantially from real-world physical environments. This paper investigates how to enhance the performance of collaborative multi-agent pathfinding in environments closer to actual physical constraints. Taking real physical limitations into account, a multi-constraint action space that accounts for actuator saturation and underactuation is constructed, a multi-source input state space based on distance and position is developed, and an anti-redundancy reward function is designed to reduce redundant actions during the navigation of unmanned vehicles. Moreover, to address the high training complexity, low efficiency, and convergence difficulties of training in the Gazebo simulation environment, we propose a pre-training and fine-tuning based multi-agent twin delayed deep deterministic policy gradient algorithm. Pre-training gives the model a better initial state and improves training efficiency, while fine-tuning refines the pre-trained prior model, further enhancing its resilience to environmental non-stationarity during training. In the Gazebo simulation environment, the effectiveness of the proposed algorithm is verified by comparison with algorithms such as PMATD3, MATD3, and MADDPG.
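The abstract does not give the paper's exact formulations, but the two environment-design ideas it names, an action space bounded by actuator saturation for an underactuated vehicle and an anti-redundancy reward that penalises unnecessary control effort, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the differential-drive action pair `(v, w)`, the limits `V_MAX`/`W_MAX`, the penalty weight `LAMBDA_ACT`, and the terminal bonuses are hypothetical, not the authors' values.

```python
import numpy as np

# Assumed actuator limits for a differential-drive (underactuated) vehicle:
# forward linear velocity v in [0, V_MAX], angular velocity w in [-W_MAX, W_MAX].
V_MAX, W_MAX = 0.5, 1.0
LAMBDA_ACT = 0.05  # assumed weight of the anti-redundancy (effort) penalty

def constrain_action(action):
    """Project a raw policy output onto the saturated action space."""
    v = np.clip(action[0], 0.0, V_MAX)    # no reverse drive: underactuation
    w = np.clip(action[1], -W_MAX, W_MAX)  # steering-rate saturation
    return np.array([v, w])

def reward(dist_prev, dist_now, action, reached, collided):
    """Goal-progress reward minus an anti-redundancy penalty on control effort."""
    r = dist_prev - dist_now  # positive when the vehicle moves toward its goal
    r -= LAMBDA_ACT * float(np.sum(np.square(action)))  # discourage redundant actions
    if reached:
        r += 10.0
    if collided:
        r -= 10.0
    return r
```

Under this kind of shaping, two trajectories that make the same progress are ranked by how little actuation they spend, which is one plausible reading of "reducing action redundancy" in the abstract.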

Cite this article

DONG Lijing, XIAO Sizhe, NIU Si, et al. Collaborative pathfinding research based on deep reinforcement learning in constrained environments[J]. Control and Decision, 2025, 40(6): 1838-1846.

History
  • Received: 2024-08-26
  • Online: 2025-04-30
  • Published: 2025-06-20