Linear temporal logic guided safe reinforcement learning

doi:10.13195/j.kzyjc.2021.1808

Control and Decision, 2023, Vol. 38, No. 7, pp. 1835-1844

Affiliations: 1. Department of Automation, University of Science and Technology of China, Hefei 230026, China; 2. Department of Mechanical Engineering, Lehigh University, Bethlehem 18015, USA
Corresponding author: E-mail: zkan@ustc.edu.cn
CLC number: TP242
Funding: General Program of the National Natural Science Foundation of China (62173314); Joint Fund of the National Natural Science Foundation of China (U2013601).


Abstract:

This paper presents a linear temporal logic (LTL) guided model-free safe reinforcement learning algorithm for robots that execute complex tasks in dynamic, uncertain environments. The algorithm synthesizes a control policy that maximizes the probability of satisfying the task while keeping the agent safe throughout the learning process. First, to account for environmental uncertainties, the probabilistic motion of the robot is modeled as a Markov decision process (MDP) with unknown transition probabilities. The complex task is specified in LTL and converted to a transition-based limit-deterministic generalized Büchi automaton (tLDGBA) with several accepting sets. An accepting frontier function then records the accepting sets that remain to be visited, which gives rise to a constrained tLDGBA (ctLDGBA). Second, a product MDP is constructed, over which reinforcement learning searches for the optimal policy. Finally, based on the safety fragment of the LTL formula and the observation function of the MDP, a safety game is constructed and used to synthesize a shield that ensures system safety during learning. Rigorous analysis shows that the proposed method is guaranteed to obtain the optimal policy that maximizes the probability of satisfying the LTL task while ensuring system safety. Simulation results demonstrate the effectiveness of the LTL guided safe reinforcement learning algorithm.
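The two bookkeeping ideas in the abstract, the accepting frontier and the shield, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `update_frontier` and `shield`, the representation of automaton transitions as tuples, the reset-when-empty rule, and the precomputed `safe_actions` table (standing in for the solution of the safety game) are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

# Hypothetical encoding: (automaton state, label, next automaton state).
Transition = Tuple[str, str, str]


def update_frontier(
    frontier: FrozenSet[int],
    transition: Transition,
    accepting_sets: List[Set[Transition]],
) -> Tuple[FrozenSet[int], bool]:
    """Update the indices of accepting sets that still have to be visited.

    Returns the new frontier and a flag that is True when this transition
    completed a full round over all accepting sets (assumed reset rule).
    """
    # Accepting sets in the frontier that contain the transition just taken.
    hit = {j for j in frontier if transition in accepting_sets[j]}
    remaining = frontier - hit
    if hit and not remaining:
        # Every accepting set was visited: start a new round.
        return frozenset(range(len(accepting_sets))), True
    return frozenset(remaining), False


def shield(
    proposed: str,
    state: Hashable,
    safe_actions: Dict[Hashable, List[str]],
) -> str:
    """Pass a safe action through; otherwise substitute a safe one.

    `safe_actions` is assumed to be precomputed and to list at least one
    safe action for every state the agent can reach.
    """
    allowed = safe_actions[state]
    return proposed if proposed in allowed else allowed[0]


# Small usage example with two accepting sets of one transition each.
accepting_sets = [{("q0", "a", "q1")}, {("q1", "b", "q0")}]
frontier = frozenset({0, 1})
frontier, round_done = update_frontier(frontier, ("q0", "a", "q1"), accepting_sets)
frontier, round_done = update_frontier(frontier, ("q1", "b", "q0"), accepting_sets)
action = shield("left", "s0", {"s0": ["up", "right"]})
```

In a learning loop under these assumptions, the frontier would be stored alongside the product-MDP state, and the shield would be applied to each action proposed by the learner before it is executed.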


Cite this article

Li Baoluo, Cai Mingyu, Kan Zhen. Linear temporal logic guided safe reinforcement learning[J]. Control and Decision, 2023, 38(7): 1835-1844


History

Published online: 2023-06-27
Publication date: 2023-07-20
