Navigation Method for Mobile Robot Based on Hierarchical Deep Reinforcement Learning
Author:
Affiliation:

School of Information Science and Technology, University of Science and Technology of China

Author biography:

Corresponding author:

CLC number:

TP242

Fund Project:

Scientific Research Foundation for the Talents of USTC and the National Natural Science Foundation of China (61971393, 61871361)

    Abstract:

    Existing hierarchical navigation methods based on deep reinforcement learning (DRL) perform poorly in complex environments that contain structures such as long corridors and dead corners. To address this problem, we propose a mobile robot navigation method based on option-based hierarchical deep reinforcement learning (HDRL). The framework has two levels. At the low level, an obstacle avoidance control model and a goal-driven control model implement the obstacle avoidance and goal-approaching behaviors, respectively. At the high level, a behavior selection model automatically learns a stable and reliable behavior selection policy, which removes the reliance on manually designed control rules. In addition, the obstacle avoidance control model is trained with an optimized procedure that makes the learned avoidance policy better suited to navigation in complex environments. In comparison experiments with existing DRL-based navigation methods, the proposed method achieves the highest navigation success rate in all simulated test environments used in this paper and shows an overall advantage on the other metrics. These results indicate that the method effectively addresses poor navigation performance in complex environments and generalizes well. Tests in a real-world environment further verify its potential application value.
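The two-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: in the paper all three models are learned, whereas here `avoid_obstacle`, `approach_goal`, and `toy_option_values` are hypothetical hand-written stand-ins used only to make the control flow runnable. The `Observation` fields, the velocity values, and the 1 m threshold are likewise assumptions. The sketch also re-selects a behavior at every step, which simplifies the option framework by omitting termination conditions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Action = Tuple[float, float]  # (linear velocity, angular velocity); assumed action format


@dataclass
class Observation:
    """Hypothetical observation; the paper's actual state definition is not given here."""
    laser: Sequence[float]  # range readings in metres, ordered from right to left
    goal_distance: float    # metres
    goal_angle: float       # radians, positive when the goal is to the left


def avoid_obstacle(obs: Observation) -> Action:
    """Stand-in for the learned obstacle avoidance model: steer away from the nearer side."""
    mid = len(obs.laser) // 2
    nearest_right = min(obs.laser[:mid])
    nearest_left = min(obs.laser[mid:])
    turn = -0.8 if nearest_left < nearest_right else 0.8
    return (0.2, turn)


def approach_goal(obs: Observation) -> Action:
    """Stand-in for the learned goal-driven model: steer toward the goal bearing."""
    return (0.5, max(-1.0, min(1.0, obs.goal_angle)))


def toy_option_values(obs: Observation) -> List[float]:
    """Stand-in for the learned high-level model: one score per low-level behavior.
    The 1 m threshold is arbitrary and exists only so the example runs."""
    nearest = min(obs.laser)
    return [1.0 - nearest, nearest - 1.0]


class HierarchicalController:
    """High level scores the behaviors; the best-scoring low-level policy produces the action."""

    def __init__(
        self,
        option_values: Callable[[Observation], Sequence[float]],
        options: Sequence[Tuple[str, Callable[[Observation], Action]]],
    ) -> None:
        self.option_values = option_values
        self.options = options

    def act(self, obs: Observation) -> Tuple[str, Action]:
        values = self.option_values(obs)
        best = max(range(len(values)), key=lambda i: values[i])  # greedy selection
        name, policy = self.options[best]
        return name, policy(obs)


controller = HierarchicalController(
    toy_option_values,
    [("avoid_obstacle", avoid_obstacle), ("approach_goal", approach_goal)],
)
open_space = Observation(laser=[3.0, 3.0, 3.0, 3.0], goal_distance=2.0, goal_angle=0.3)
near_wall = Observation(laser=[2.0, 2.0, 0.4, 0.5], goal_distance=2.0, goal_angle=0.3)
print(controller.act(open_space))
print(controller.act(near_wall))
```

In open space the toy scorer favors `approach_goal`, and near the wall it favors `avoid_obstacle`. Replacing `toy_option_values` with a trained value network over the two behaviors is what would remove the hand-written rule, which is the point the abstract makes about the high-level model.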

History
  • Received: 2021-06-09
  • Last revised: 2021-11-24
  • Accepted: 2021-11-26
  • Published online: 2022-01-02
  • Publication date: