A Safe Navigation Method Based on Forward Dynamics Prediction and Asymmetric Reinforcement Learning
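Affiliations:

1. College of Intelligent Systems Science and Engineering, Harbin Engineering University; 2. Xiaomi EV; 3. Shenyang Institute of Automation, Chinese Academy of Sciences; 4. Harbin Engineering University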


CLC number:

TP273



    Abstract:

    Mobile robots operating in complex dynamic environments are susceptible to incomplete perception and sensor noise, which make the future states of dynamic obstacles difficult to predict and increase decision uncertainty. To address these problems, this paper proposes Forward-dynamics-assisted Asymmetric Navigation (FANav), an autonomous navigation framework that integrates environment forward-dynamics prediction with asymmetric reinforcement learning. First, an Environment Forward Dynamics Model (E-FDM) learns the short-term state evolution of robot–environment interactions during training, and the predicted environmental changes and collision risks are incorporated into policy optimization to guide the policy toward anticipatory decisions. During deployment, the E-FDM provides online short-term environment predictions from local observations and the current action to assist real-time obstacle avoidance. Second, to mitigate the safety risks caused by model prediction errors and perception noise, a Control Barrier Function (CBF)-based safety filter that incorporates short-term risk prediction is introduced at the execution layer. The filter uses predictive risk information to modulate the strength of the safety constraints online and applies minimally invasive corrections to the policy outputs via quadratic programming, thereby enforcing formal safety constraints. Finally, an asymmetric Proximal Policy Optimization (PPO) framework is adopted, in which privileged information such as forward predictions is used to optimize the value network, improving learning efficiency and training stability. Experimental results show that the proposed method significantly outperforms existing methods in obstacle-avoidance safety, decision robustness, and motion smoothness.
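The abstract does not state the safety filter's equations. As a minimal sketch under stated assumptions, the Python code below shows how a CBF-QP filter can apply a minimal correction to a policy action. It assumes single-integrator robot dynamics ṗ = u, circular obstacles with barrier h_i(p) = ‖p − o_i‖² − r_i², and obstacle velocities v_i standing in for the E-FDM's short-term predictions. It also assumes a hypothetical linear rule that maps a predicted risk score in [0, 1] to the CBF gain α. The QP min ‖u − u_ref‖² subject to ∇h_i·(u − v_i) ≥ −α h_i is solved with Hildreth's dual coordinate-ascent method, so no external solver is needed. The names `risk_to_alpha` and `cbf_safety_filter` are illustrative and are not taken from FANav.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1]


def risk_to_alpha(risk: float, alpha_max: float = 2.0, alpha_min: float = 0.2) -> float:
    """Hypothetical modulation law: higher predicted risk -> smaller CBF gain -> tighter constraint."""
    risk = min(max(risk, 0.0), 1.0)
    return alpha_max + (alpha_min - alpha_max) * risk


def cbf_safety_filter(
    u_ref: Vec,
    pos: Vec,
    obstacles: Sequence[Tuple[Vec, Vec, float]],  # (center, predicted velocity, safe radius)
    risk: float,
    iters: int = 500,
    tol: float = 1e-9,
) -> Vec:
    """Solve min ||u - u_ref||^2 s.t. dh_i/dt + alpha * h_i >= 0 for every obstacle (Hildreth's method)."""
    alpha = risk_to_alpha(risk)
    rows: List[Tuple[Vec, float]] = []
    for center, vel, radius in obstacles:
        d = (pos[0] - center[0], pos[1] - center[1])
        h = dot(d, d) - radius ** 2     # barrier value: h >= 0 means outside the safe radius
        a = (2.0 * d[0], 2.0 * d[1])    # gradient of h with respect to the robot position
        b = -alpha * h + dot(a, vel)    # a.u >= b  <=>  a.(u - vel) + alpha * h >= 0
        if dot(a, a) > 0.0:
            rows.append((a, b))

    u = [u_ref[0], u_ref[1]]            # primal action, starts at the policy output
    lam = [0.0] * len(rows)             # one non-negative dual variable per constraint
    for _ in range(iters):
        max_change = 0.0
        for i, (a, b) in enumerate(rows):
            # Step that makes constraint i active, clipped so that lam[i] stays >= 0.
            step = (b - (a[0] * u[0] + a[1] * u[1])) / dot(a, a)
            step = max(step, -lam[i])
            lam[i] += step
            u[0] += step * a[0]
            u[1] += step * a[1]
            max_change = max(max_change, abs(step))
        if max_change < tol:
            break
    return (u[0], u[1])


if __name__ == "__main__":
    obstacle = [((2.0, 0.0), (0.0, 0.0), 1.0)]  # static obstacle of radius 1 at (2, 0)
    print(cbf_safety_filter((1.0, 0.0), (0.0, 0.0), obstacle, risk=0.0))
    print(cbf_safety_filter((1.0, 0.0), (0.0, 0.0), obstacle, risk=1.0))
```

In this example, the robot sits at the origin with reference action (1, 0). A risk score of 0 leaves the action unchanged, while a risk score of 1 limits the forward speed and yields approximately (0.15, 0). Lowering α as risk rises tightens ḣ ≥ −αh, which is one plausible reading of "modulating the safety constraint strength"; the paper may use a different law and a different robot model.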

History
  • Received: 2026-03-13
  • Last revised: 2026-06-02
  • Accepted: 2026-06-03