Multi-robot reinforcement learning cooperative confrontation strategy based on an active risk defense mechanism
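Authors: Sun Huihui, Hu Chunhe, Zhang Junguo
Affiliations:

1. School of Technology, Beijing Forestry University, Beijing 100083, China; 2. School of Mechanical and Electrical Engineering, North China Institute of Science and Technology, Langfang 065201, Hebei, China; 3. Key Lab of State Forestry and Grassland Administration for Forestry Equipment and Automation, Beijing 100083, China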

Corresponding author:

E-mail: zhangjunguo@bjfu.edu.cn

CLC number:

TP24

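Funding:

National Natural Science Foundation of China (61703047); Science and Technology Research Project of Higher Education Institutions of Hebei Province (QN2021312).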


Cooperative countermeasure strategy based on active risk defense multi-agent reinforcement learning
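    Abstract (translated from the Chinese):

    Deep reinforcement learning has become a research hotspot in the multi-robot field owing to its efficient performance in multi-robot systems. However, in unstructured scenarios that vary continuously over time and carry unknown risks, traditional methods show poor risk defense capability and fragile system security: unknown risks intrude nonlinearly into the state space of the multi-robot system in the form of adversarial attacks. To address this problem, a multi-robot reinforcement learning method based on an active risk defense mechanism (APMARL) is proposed. First, based on a partially observable Markov game model, a risk discrimination mechanism with a memory pool shared among the robots is established; it constructs a risk state index to predict the safety of the current behavior in advance, and adaptively executes the risk handling mode that matches the prediction. In particular, for unsafe states in which risk has intruded, an Actor-Critic active defense network architecture based on an enhanced attention mechanism is proposed, which achieves graded enhancement of key information and effective defense against dangerous information. Finally, extensive experiments on multi-robot cooperative confrontation tasks show that the reinforcement learning strategy with the active risk defense mechanism effectively reduces the intrusion risk of hostile information, improves the execution efficiency of multi-robot cooperative confrontation tasks, and enhances the stability and safety of the strategy.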

    Abstract:

    Deep reinforcement learning (DRL) has become a research hotspot in the field of multi-robot systems owing to its efficient performance. However, in unstructured environments with time-varying and unknown risks, traditional DRL methods exhibit poor risk defense ability and fragile system security. Unknown risks intrude nonlinearly into the state space of multi-robot systems in the form of adversarial attacks, posing a serious threat to the estimation of robot motion strategies. To solve this problem, this paper proposes a multi-agent reinforcement learning method based on an active risk defense mechanism (ARD-MARL). First, based on the partially observable Markov game model, a risk discrimination mechanism with global communication information is established to predict the state of the current behavior. Second, in the strategy deployment stage, an event-triggered multi-risk processing scheme is built to apply the matching security strategy to each level of predicted risk. Then, for dangerous states with risk intrusion, an active defense Actor-Critic network architecture based on an enhanced attention mechanism is designed. By magnifying important information and suppressing threat information, it generates a safer and more efficient motion strategy. Finally, extensive experiments are carried out on multi-agent cooperative and confrontation tasks. The results show that the multi-robot reinforcement learning method with the active security defense mechanism effectively enhances stability and risk resistance, and improves the security of information transmission.
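
    The abstract does not give the formulas behind the risk index, the event-triggered processing modes, or the enhanced attention mechanism. The following is only a minimal sketch of how these three ideas could fit together, under stated assumptions: the function names (`risk_index`, `select_mode`, `defended_attention`), the deviation-based risk score, the thresholds `low` and `high`, and the `penalty` factor are all hypothetical and are not taken from the paper.

```python
import math


def risk_index(observation, memory_pool):
    """Hypothetical risk score: mean absolute deviation of an observation
    from the average of the observations stored in a shared memory pool."""
    n = len(memory_pool)
    dim = len(observation)
    pooled_mean = [sum(m[k] for m in memory_pool) / n for k in range(dim)]
    return sum(abs(o - mu) for o, mu in zip(observation, pooled_mean)) / dim


def select_mode(risk, low=0.5, high=2.0):
    """Event-triggered dispatch: map a risk score to a handling mode.
    The thresholds are illustrative assumptions."""
    if risk < low:
        return "normal"
    if risk < high:
        return "cautious"
    return "defense"


def defended_attention(scores, risks, penalty=5.0):
    """Softmax attention over relevance scores, where each score is
    reduced by penalty * risk so that suspicious inputs get less weight."""
    logits = [s - penalty * r for s, r in zip(scores, risks)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

    In this sketch, an agent would first call `risk_index` on an incoming observation, use `select_mode` to decide whether the defense path is needed, and only then reweight its inputs with `defended_attention`. A learned attention network would replace the fixed penalty in any realistic implementation.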

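Cite this article

Sun Huihui, Hu Chunhe, Zhang Junguo. Multi-robot reinforcement learning cooperative confrontation strategy based on an active risk defense mechanism [J]. Control and Decision (控制与决策), 2023, 38(5): 1420-1429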

History
  • Published online: 2023-04-18
  • Publication date: 2023-05-20