Multi-agent deep deterministic policy gradient algorithm via prioritized experience selection method

Authors: He Ming, Zhang Bin, Liu Qiang, et al.

Affiliations: (1. College of Command and Control Engineering, The Army Engineering University of PLA, Nanjing 210007, China; 2. Naval Command College, Nanjing 210000, China)

Corresponding author e-mail: qdjmzb@qq.com

CLC number: TP273

Funding: National Key R&D Program of China (2018YFC0806900, 2016YFC0800606, 2016YFC0800310); Natural Science Foundation of Jiangsu Province (BK20161469); Key R&D Program of Jiangsu Province (BE2016904, BE2017616, BE2018754); China Postdoctoral Science Foundation (2018M633757).


Abstract:

To address the low training efficiency and slow convergence of the multi-agent deep deterministic policy gradient (MADDPG) algorithm, a prioritized experience selection mechanism for MADDPG is studied and the PES-MADDPG algorithm is proposed. First, the model and training method of the MADDPG algorithm are analyzed. Then, the multi-agent experience replay buffer is improved: a priority evaluation function is designed on the basis of the critic (policy evaluation) error and the training frequency of each experience, and the resulting priority is used as the sampling probability for drawing learning samples to train the neural networks. Finally, six groups of comparative experiments are conducted in two classes of environments, cooperative navigation and competitive confrontation. The experimental results show that the prioritized experience selection mechanism speeds up the training of the MADDPG algorithm and that the trained agents perform better; the mechanism also has a degree of applicability to the training of multiple agents controlled by the deep deterministic policy gradient (DDPG) algorithm.
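The abstract describes the selection rule only at a high level: a priority built from the critic error and each experience's training frequency, used directly as the sampling probability. The paper's exact priority evaluation function is not reproduced on this page, so the following is a minimal illustrative sketch under assumed names and weighting constants (PrioritizedExperienceBuffer, alpha, beta, eps); it is not the authors' implementation.

import random


class PrioritizedExperienceBuffer:
    """Minimal sketch of a prioritized experience buffer (illustrative only).

    Priority grows with the stored critic error and shrinks with the number
    of times a transition has already been drawn for training; transitions
    are sampled with probability proportional to their priority.
    """

    def __init__(self, capacity=10000, alpha=0.6, beta=0.4, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha    # weight of the critic-error term (assumed value)
        self.beta = beta      # weight of the reuse-frequency penalty (assumed value)
        self.eps = eps        # keeps every priority strictly positive
        self.buffer = []      # stored transitions (obs, actions, rewards, next_obs)
        self.errors = []      # latest critic error recorded for each transition
        self.counts = []      # how many times each transition has been sampled

    def add(self, transition, critic_error):
        if len(self.buffer) >= self.capacity:   # discard the oldest entry when full
            self.buffer.pop(0)
            self.errors.pop(0)
            self.counts.pop(0)
        self.buffer.append(transition)
        self.errors.append(abs(critic_error))
        self.counts.append(0)

    def _priority(self, i):
        # Illustrative priority: a large critic error raises it,
        # frequent reuse for training lowers it.
        return self.eps + self.alpha * self.errors[i] / (1.0 + self.beta * self.counts[i])

    def sample(self, batch_size):
        weights = [self._priority(i) for i in range(len(self.buffer))]
        indices = random.choices(range(len(self.buffer)), weights=weights, k=batch_size)
        for i in indices:
            self.counts[i] += 1                  # record the training frequency
        return [self.buffer[i] for i in indices], indices

    def update_errors(self, indices, new_errors):
        # Refresh the stored critic errors after a training step.
        for i, err in zip(indices, new_errors):
            self.errors[i] = abs(err)

In use, the critic error stored with each transition would be refreshed after every training step (update_errors), so that experiences the critic still evaluates poorly keep a high sampling probability while heavily reused ones gradually fade.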

Cite this article:

He Ming, Zhang Bin, Liu Qiang, et al. Multi-agent deep deterministic policy gradient algorithm via prioritized experience selection method [J]. Control and Decision, 2021, 36(1): 68-74.
History:
  • Online publication date: 2021-01-06
  • Publication date: 2021-01-20