Event-triggered multi-agent hierarchical safe reinforcement learning for motion planning

doi:10.13195/j.kzyjc.2023.1288


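Affiliations: 1. School of Technology, Beijing Forestry University, Beijing 100083, China; 2. School of Mechanical and Electrical Engineering, Huainan Normal University, Huainan, Anhui 232038, China; 3. State Key Laboratory of Efficient Production of Forest Resources, Beijing 100083, China
Corresponding author: E-mail: huchunhe@bjfu.edu.cn
CLC number: TP24
Funding: National Natural Science Foundation of China (61703047); Science and Technology Research Project of Higher Education Institutions of Hebei Province (QN2021312).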

Multi-agent event triggered hierarchical security reinforcement learning



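Abstract:

To address the action-safety problem that arises in the sequential decision-making process of deep reinforcement learning, an event-triggered multi-agent hierarchical safe reinforcement learning method for motion planning is studied. First, based on the constrained Markov decision model, a multi-agent deep deterministic policy gradient framework with safety constraints is constructed; for different state spaces, the framework learns motion policies hierarchically in an event-triggered manner. Then, a Lyapunov critic network is introduced to establish a target-action selection mechanism with conditional constraints, and the Lagrange multiplier method is used to overcome the difficulty of solving under multi-objective constraints, ensuring the safety of each robot's internal decisions. Finally, the proposed method is tested in multi-robot reinforcement learning scenarios. The experimental results show that the event-triggered multi-agent hierarchical safe reinforcement learning method enables the robots' state trajectories to recover quickly from dangerous states to the safe space, improving both the safety of the policy and the capability for multi-robot cooperative motion planning.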
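
For orientation, the Lagrange multiplier step named in the abstract is usually written in the following generic form for a constrained Markov decision process. This is a textbook-style sketch, not the paper's own equations: the symbols $J_R$ (expected return), $J_C$ (expected safety cost), $d$ (cost limit), $\lambda$ (multiplier) and $\eta$ (dual step size) are assumptions introduced here.

```latex
% Constrained objective: maximize return subject to a bound on safety cost.
\max_{\pi} \; J_R(\pi) \quad \text{s.t.} \quad J_C(\pi) \le d

% Lagrangian relaxation with multiplier \lambda \ge 0.
\mathcal{L}(\pi, \lambda) = J_R(\pi) - \lambda \bigl( J_C(\pi) - d \bigr),
\qquad
\min_{\lambda \ge 0} \; \max_{\pi} \; \mathcal{L}(\pi, \lambda)

% Projected dual ascent: \lambda grows while the constraint is violated.
\lambda \leftarrow \max\bigl( 0, \; \lambda + \eta \, ( J_C(\pi) - d ) \bigr)
```

Under this form, the multiplier rises while the cost estimate exceeds the limit and shrinks toward zero once the constraint is satisfied, which turns the constrained problem into a sequence of unconstrained policy updates.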

Abstract:

To address the safety issues that may arise in the sequential decision-making process of deep reinforcement learning, this paper studies a motion planning method based on multi-agent event-triggered hierarchical safe reinforcement learning (MEHSRL). First, a multi-agent twin delayed deep deterministic policy gradient algorithm is constructed on the basis of the constrained Markov decision model. The framework uses state-safety events as trigger conditions to implement hierarchical reinforcement learning in different state spaces. Then, a Lyapunov evaluation network is introduced to construct additional safety constraint rules for the reinforcement learning network, and the safety of robot decisions is ensured through multi-constraint objective optimization learning. Finally, the proposed method is tested in a safe reinforcement learning scenario. The results show that the proposed method restores the state trajectory from the dangerous state to the safe space within a limited time, improves the safety of the policy, and outperforms the comparison method in motion planning.


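Cite this article

Sun Huihui, Hu Chunhe, Zhang Junguo. Event-triggered multi-agent hierarchical safe reinforcement learning for motion planning[J]. Control and Decision (控制与决策), 2024, 39(11): 3755-3762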


History

Published online: 2024-09-20
Publication date: 2024-11-20
