Multi-agent collaboration based on RGMAAC algorithm under partial observability

Affiliation:

1. School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100091, China; 2. Department of Information Science, Beijing University of Technology, Beijing 100124, China

Corresponding author:

E-mail: yxzhang@bjtu.edu.cn

CLC number:

TP181

    Abstract:

    Multi-agent deep reinforcement learning (MADRL) applies the ideas and algorithms of deep reinforcement learning to the learning and control of multi-agent systems, and is an important method for developing multi-agent systems with swarm intelligence. Existing MADRL studies mainly design algorithms under the assumption that the environment is fully observable or that communication resources are unlimited. However, partial observability is an inherent problem in practical applications of multi-agent systems. For example, the observation range of an agent is usually limited, and information about the environment outside that range is unavailable to it, which makes collaboration among agents difficult. To address partial observability in real scenarios, this paper follows the paradigm of centralized training and distributed execution, extends the deep reinforcement learning algorithm Actor-Critic to multi-agent systems, adds communication channels and gating mechanisms between agents, and proposes the recurrent gated multi-agent Actor-Critic (RGMAAC) algorithm. Agents can communicate efficiently based on the memory sequence of historical actions and observations, and make behavior decisions using their local observations, the memory sequence of historical observations, and the observations explicitly shared by other agents through the communication channels. In addition, a task in which multiple agents must reach target points synchronously and quickly is designed on the multi-agent particle environment, and two reward functions and task scenarios are designed for it. The experimental results show that, when partial observability clearly arises in the task scenario, agents trained with the RGMAAC algorithm perform well and outperform the baseline algorithm in terms of stability.
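    The abstract names three inputs to each agent's decision: a range-limited local observation, a memory of past observations, and observations that other agents share through a gated communication channel. The toy sketch below illustrates only that data flow. It is not the RGMAAC architecture: the abstract does not specify the memory cell, the form of the gate, or the message content, so the `tanh` memory update, the scalar sigmoid gate with a 0.5 threshold, the zero-filling of unobserved or unsent data, and every name here (`local_observation`, `GatedRecurrentAgent`, `outgoing_message`, `policy_input`) are assumptions made for illustration. The weights are random and untrained.

```python
import math
import random


def local_observation(own_pos, targets, obs_radius):
    """Relative offsets of targets inside the observation range, zeros otherwise."""
    obs = []
    for tx, ty in targets:
        if math.dist(own_pos, (tx, ty)) <= obs_radius:
            obs.extend([tx - own_pos[0], ty - own_pos[1]])
        else:
            obs.extend([0.0, 0.0])  # assumption: unseen targets are zero-filled
    return obs


def dot(weights, values):
    return sum(w * v for w, v in zip(weights, values))


class GatedRecurrentAgent:
    """Hypothetical agent with a recurrent memory and a scalar message gate."""

    def __init__(self, obs_dim, mem_dim, seed=0):
        rng = random.Random(seed)
        self.obs_dim = obs_dim
        self.memory = [0.0] * mem_dim
        # Random, untrained weights standing in for learned parameters.
        self.w_mem = [[rng.uniform(-0.5, 0.5) for _ in range(obs_dim + mem_dim)]
                      for _ in range(mem_dim)]
        self.w_gate = [rng.uniform(-0.5, 0.5) for _ in range(mem_dim)]

    def update_memory(self, obs):
        """Fold the new observation into the memory (simple tanh recurrence)."""
        x = list(obs) + self.memory
        self.memory = [math.tanh(dot(row, x)) for row in self.w_mem]

    def outgoing_message(self, obs, threshold=0.5):
        """Share the observation only if the memory-driven gate is open."""
        gate = 1.0 / (1.0 + math.exp(-dot(self.w_gate, self.memory)))
        return (list(obs) if gate >= threshold else None), gate

    def policy_input(self, obs, received):
        """Concatenate local observation, memory, and received messages."""
        x = list(obs) + self.memory
        for msg in received:
            # assumption: a withheld message is replaced by zeros
            x.extend(msg if msg is not None else [0.0] * self.obs_dim)
        return x


# Two agents, two targets; each agent sees only the target within radius 1.0.
targets = [(0.0, 0.0), (3.0, 3.0)]
positions = [(0.5, 0.0), (2.5, 3.0)]
agents = [GatedRecurrentAgent(obs_dim=4, mem_dim=3, seed=i) for i in range(2)]
observations = [local_observation(p, targets, obs_radius=1.0) for p in positions]
for agent, obs in zip(agents, observations):
    agent.update_memory(obs)
messages = [a.outgoing_message(o)[0] for a, o in zip(agents, observations)]
inputs = [
    agents[i].policy_input(observations[i],
                           [m for j, m in enumerate(messages) if j != i])
    for i in range(2)
]
print([len(x) for x in inputs])  # each input has 4 + 3 + 4 = 11 entries
```

    In a trained system the vector returned by `policy_input` would feed an actor network, and under centralized training a critic would see the joint information of all agents. Both are omitted here because the abstract gives no detail about them.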

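Cite this article

王子豪, 张严心, 黄志清, et al. Multi-agent collaboration based on RGMAAC algorithm under partial observability[J]. Control and Decision (控制与决策), 2023, 38(5): 1267-1277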

History
  • Published online: 2023-04-18
  • Publication date: 2023-05-20