Abstract: To address the challenges of perceptual uncertainty and limited policy generalization in dynamic, unstructured environments, this paper proposes a robust autonomous navigation policy optimization framework based on asymmetric reinforcement learning, termed Robust Asymmetric Navigation (RANav). The framework integrates implicit environment estimation, domain randomization, and asymmetric reinforcement learning to enhance the robot’s modeling and decision-making capabilities in dynamic settings. Specifically, a multimodal implicit environment estimation network is designed to accurately extract dynamic obstacle features and improve scene representation. A behavior-driven domain randomization mechanism is introduced to facilitate Sim-to-Real policy transfer. Finally, an asymmetric proximal policy optimization (PPO) algorithm is employed, in which privileged information is provided to the Critic network during training to improve policy learning efficiency. Extensive simulations and real-world experiments demonstrate that RANav significantly outperforms existing methods in navigation success rate, obstacle avoidance robustness, and path efficiency, confirming its strong generalization capability and deployment potential in complex, unstructured environments.
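To illustrate the asymmetric actor-critic structure referred to in the abstract, the following is a minimal sketch in PyTorch, assuming a setup where the Critic receives additional privileged simulator state during training while the Actor sees only onboard observations. All names (AsymmetricActorCritic, obs_dim, priv_dim) are illustrative and not taken from the paper.

```python
# Minimal sketch of an asymmetric actor-critic (illustrative, not the paper's implementation).
import torch
import torch.nn as nn

class AsymmetricActorCritic(nn.Module):
    def __init__(self, obs_dim: int, priv_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        # Actor sees only onboard (deployable) observations.
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        # Critic additionally receives privileged information available only in simulation.
        self.critic = nn.Sequential(
            nn.Linear(obs_dim + priv_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, priv: torch.Tensor):
        action_logits = self.actor(obs)                      # policy uses observations only
        value = self.critic(torch.cat([obs, priv], dim=-1))  # value uses observations + privileged state
        return action_logits, value
```

The key design point is that only the Actor is deployed on the real robot, so the privileged inputs used to train the Critic never need to be sensed at deployment time.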