Linear Temporal Logic Guided Safe Reinforcement Learning
Affiliation:

1. Department of Automation, University of Science and Technology of China; 2. Department of Mechanical Engineering, Lehigh University

CLC Number:

TP242

Fund Project:

The National Natural Science Foundation of China (General Program, Key Program, Major Program)

    Abstract:

    To meet the needs of robots performing complex tasks in dynamic, uncertain environments, this paper presents a linear temporal logic (LTL) guided model-free safe reinforcement learning method that synthesizes a control policy maximizing the probability of completing a complex task while keeping the agent safe throughout learning. First, to account for environmental uncertainties, the robot's probabilistic motion is modeled as a Markov decision process (MDP) with unknown transition probabilities. The complex task is specified in LTL and converted to a transition-based limit-deterministic generalized Büchi automaton (tLDGBA) with several accepting sets; an accepting frontier function is then designed to track the accepting sets that remain to be visited, which gives rise to a constrained tLDGBA (ctLDGBA). Second, a product MDP is constructed, on which reinforcement learning searches for the optimal policy. Finally, based on the safety fragment of the LTL formula and the observation function of the MDP, a safety game is constructed and used to synthesize a shield that keeps the system safe during learning. Rigorous analysis shows that the proposed method obtains the optimal policy that maximizes the probability of satisfying the LTL task while ensuring system safety. Simulation results demonstrate the effectiveness of the LTL-guided safe reinforcement learning algorithm.
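
    The shielding idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a hypothetical 3x3 deterministic grid with a single unsafe cell, and it uses a one-step look-ahead filter in place of a shield synthesized from a safety game. The LTL task, the ctLDGBA, and the product MDP are omitted, and all names (`step`, `shield`, `train`, `UNSAFE`, `GOAL`) are illustrative assumptions.

```python
import random

# Hypothetical 3x3 grid world; none of these values come from the paper.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SIZE = 3
UNSAFE = {(1, 1)}   # assumed unsafe cell
GOAL = (2, 2)       # assumed goal cell


def step(state, action):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < SIZE and 0 <= c < SIZE:
        return (r, c)
    return state


def shield(state):
    """One-step shield: keep only actions whose successor is not unsafe.

    Assumption: in this grid every safe cell has a safe action, so a
    one-step check suffices. A shield built from a safety game would
    restrict actions to those that stay inside the game's winning region.
    """
    return [a for a in ACTIONS if step(state, a) not in UNSAFE]


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning in which every action is filtered by the shield."""
    rng = random.Random(seed)
    q = {}
    visited = set()
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(50):
            allowed = shield(state)
            if rng.random() < epsilon:
                action = rng.choice(allowed)          # explore among safe actions
            else:
                action = max(allowed, key=lambda a: q.get((state, a), 0.0))
            nxt = step(state, action)
            reward = 1.0 if nxt == GOAL else 0.0
            best_next = max(q.get((nxt, a), 0.0) for a in shield(nxt))
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            visited.add(nxt)
            state = nxt
            if state == GOAL:
                break
    return q, visited
```

    Because both exploration and greedy selection draw only from `shield(state)`, the agent in this sketch cannot enter the unsafe cell at any point during learning, which is the property the shield is meant to provide. Whether the learned values are useful depends on the hypothetical reward and grid chosen here.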

History
  • Received: 2021-10-21
  • Revised: 2022-03-16
  • Accepted: 2022-03-28