Cooperative hunting method for unmanned surface vehicles based on multi-agent reinforcement learning

doi:10.13195/j.kzyjc.2022.0564

2023, Vol. 38, No. 5, pp. 1438-1447

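Author affiliations: 1. College of Weaponry Engineering, Naval University of Engineering, Wuhan 430033, China; 2. College of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China
Corresponding author: E-mail: 1580284687@qq.com
CLC number: TP249
Funding: China Postdoctoral Science Foundation (2016T45686); Natural Science Foundation of Hubei Province (2018CFC865); All-Army Military Research Project (YJ2020B117).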

Abstract:

To address the problem of multiple unmanned surface vehicles (USVs) hunting an escaping target at sea, a hunting algorithm based on multi-agent reinforcement learning is proposed. First, with cooperative USV attack as the background, the environment and kinematic models of the boundary-free hunting problem are established, and the criteria for a successful hunt are given according to the requirements of rapidity and encirclement. Then, a Markov decision process framework is established based on the multi-agent proximal policy optimization (MAPPO) algorithm. In line with the task requirements, three components are designed: a state space that is both scalable and permutation invariant, an action space in which hunting distance and azimuth are decoupled, and a reward function that combines a capture reward with a step reward. Finally, the hunting policy is trained under a centralized-training, distributed-execution architecture. Curriculum learning is used during training to make the network converge quickly, and the USVs share the same policy and execute their actions independently. Simulation experiments show that, under test conditions with different initial numbers of USVs, the proposed method outperforms other algorithms in hunting success rate and timeliness. In addition, when some USV nodes are destroyed, the remaining USVs can still continue the hunting task, which indicates that the method is robust and has potential for deployment in real environments.
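As a rough illustration of how a success criterion, a capture-plus-step reward, and a permutation-invariant observation might fit together, the following Python sketch may help. The abstract does not give the paper's actual formulas, so everything here is an assumption: the function names, the `capture_radius` and `max_gap` thresholds, the reward magnitudes, and the mean pooling of teammate positions are hypothetical choices for illustration, not the authors' design.

```python
import math


def angular_gaps(target, pursuers):
    """Angular gaps between neighbouring pursuers, as seen from the target."""
    bearings = sorted(
        math.atan2(py - target[1], px - target[0]) for px, py in pursuers
    )
    gaps = [b - a for a, b in zip(bearings, bearings[1:])]
    # Wrap-around gap between the last and the first bearing.
    gaps.append(2 * math.pi - (bearings[-1] - bearings[0]))
    return gaps


def is_captured(target, pursuers, capture_radius=2.0, max_gap=math.pi):
    """Assumed success test: all pursuers are close AND no wide escape sector remains."""
    if not pursuers:
        return False
    close = all(math.dist(target, p) <= capture_radius for p in pursuers)
    enclosed = max(angular_gaps(target, pursuers)) < max_gap
    return close and enclosed


def step_reward(captured, capture_bonus=10.0, step_penalty=-0.1):
    """Assumed reward: a one-off capture bonus, otherwise a small per-step penalty."""
    return capture_bonus if captured else step_penalty


def pooled_observation(own, target, teammates):
    """Fixed-size observation: relative target position plus mean teammate offset.

    Mean pooling does not depend on teammate order (up to floating-point
    rounding) and keeps the same length for any number of teammates.
    """
    rel_target = (target[0] - own[0], target[1] - own[1])
    n = len(teammates)
    mean_dx = sum(t[0] - own[0] for t in teammates) / n if n else 0.0
    mean_dy = sum(t[1] - own[1] for t in teammates) / n if n else 0.0
    return rel_target + (mean_dx, mean_dy)
```

In this sketch the per-step penalty stands in for the rapidity requirement (shorter episodes lose less reward), while the angular-gap test stands in for the encirclement requirement. Because the observation keeps the same length when a teammate is removed, a single shared policy could in principle still be evaluated after a USV is lost.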

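Cite this article

XIA Jiawei, ZHU Xufang, ZHANG Jianqiang, et al. Cooperative hunting method for unmanned surface vehicles based on multi-agent reinforcement learning[J]. Control and Decision, 2023, 38(5): 1438-1447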

History

Published online: 2023-04-18
Publication date: 2023-05-20
