A multi-agent reinforcement learning algorithm based on improved DDPG in Actor-Critic framework

doi:10.13195/j.kzyjc.2019.0787

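Affiliation: (College of Automation and Electrical Engineering, Shenyang Ligong University, Shenyang 110159, China)
Corresponding author: E-mail: 71019976@qq.com
CLC number: TP181
Funding: National Key Research and Development Program of China (2017YFC0821004, 2017YFC0821001); Natural Science Foundation of Liaoning Province (20170540788); Basic Scientific Research Project of the Education Department of Liaoning Province (LG201707).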



Abstract:

Real-world artificial intelligence (AI) applications often require multiple agents to work together, and effective communication and coordination among artificial agents is an indispensable step toward artificial general intelligence. This paper uses a self-developed virtual environment for police training as the test scenario, in which the tasks require several squads of agents of different unit types to cooperate with or compete against one another. To keep the communication method effective and scalable, the paper proposes a mixed deep deterministic policy gradient (Mi-DDPG) algorithm. First, a bidirectional recurrent neural network (BRNN) is added to the Actor network as the communication layer among agents of the same unit type. Then, information from agents of the other unit types is added to the Critic network to learn a multi-agent cooperative policy. In addition, to ease the training burden, a framework of centralized training with decentralized execution is adopted, and the Q function in the Critic network is modularized. In experiments across different scenarios, Mi-DDPG is compared with other algorithms and shows a clear improvement in convergence speed and task completion, which indicates potential value for real-world applications.
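
The abstract does not give layer sizes, activations, or update rules, so the following is only a minimal sketch of the idea of a bidirectional recurrent layer used as a communication channel among agents of the same type. It assumes a plain tanh RNN, one scalar action per agent, and random untrained weights. All names (`rnn_pass`, `brnn_actor`, `init_params`) are hypothetical and not taken from the paper. The Critic side and the training loop are not covered.

```python
import math
import random


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rnn_pass(obs_seq, w_in, w_rec):
    """Run a plain tanh RNN along the agent axis; return every hidden state."""
    h = [0.0] * len(w_rec)
    states = []
    for obs in obs_seq:
        pre = [a + b for a, b in zip(matvec(w_in, obs), matvec(w_rec, h))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


def brnn_actor(obs_seq, params):
    """Hypothetical actor: one deterministic scalar action per agent.

    The recurrence runs over agents (not over time), so each agent's
    action can depend on the observations of all same-type teammates.
    """
    fwd = rnn_pass(obs_seq, params["w_in_f"], params["w_rec_f"])
    # Backward direction: process agents in reverse, then restore the order.
    bwd = rnn_pass(obs_seq[::-1], params["w_in_b"], params["w_rec_b"])[::-1]
    actions = []
    for hf, hb in zip(fwd, bwd):
        joint = hf + hb  # concatenate forward and backward hidden states
        score = sum(w * v for w, v in zip(params["w_out"], joint))
        actions.append(math.tanh(score))  # bounded action in [-1, 1]
    return actions


def init_params(obs_dim, hidden, seed=0):
    """Random, untrained weights (an assumption made for illustration)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_in_f": mat(hidden, obs_dim), "w_rec_f": mat(hidden, hidden),
        "w_in_b": mat(hidden, obs_dim), "w_rec_b": mat(hidden, hidden),
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden)],
    }


params = init_params(obs_dim=3, hidden=4)
team_obs = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]
actions = brnn_actor(team_obs, params)
```

In this sketch the forward pass alone would let an agent see only the teammates placed before it in the sequence. The backward pass is what lets the first agent's action react to the last agent's observation, which is the reason for choosing a bidirectional layer as the exchange channel.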


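Cite this article

Chen Liang (陈亮), Liang Chen (梁宸), Zhang Jingyi (张景异), et al. A multi-agent reinforcement learning algorithm based on improved DDPG in Actor-Critic framework[J]. Control and Decision (控制与决策), 2021, 36(1): 75-82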

History

Published online: 2021-01-06
Publication date: 2021-01-20
