Path Planning Algorithm for Flapping-Wing Aerial Vehicles Based on Reinforcement Learning

doi:10.13195/j.kzyjc.2020.1574


Authors: Wang Sipeng, Du Changping, Zheng Yao
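Affiliation: School of Aeronautics and Astronautics, Zhejiang University, Hangzhou 310027, China
Corresponding author: E-mail: duchangping@zju.edu.cn
CLC number: TP242
Funding: Key Project of the Joint Fund of Equipment Pre-Research and the Ministry of Education (6141A02011803).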

Local planner for flapping wing micro aerial vehicle based on deep reinforcement learning


Abstract:

To address the poor maneuverability of flapping wing micro aerial vehicles (FWMAVs), a deep reinforcement learning (DRL) based local path planning algorithm assisted by demonstration learning (IL-PPO2) is proposed for unknown environments. First, building on the FWMAV's stereo vision system with its limited field of view, a "Heart" (heart-shaped) obstacle avoidance algorithm is proposed, which relaxes the control accuracy required during avoidance and improves avoidance robustness. Second, based on the characteristics of the Heart algorithm, an avoidance strategy for U-shaped obstacles (U traps) is developed. Finally, a demonstration-assisted DRL local path planning algorithm is proposed and combined with the Heart algorithm to realize local path planning for the FWMAV. Simulation results show that, compared with the TD3fD DRL algorithm, IL-PPO2 shortens model training time and achieves clearly higher path planning efficiency and success rate; compared with the dynamic window approach (DWA), IL-PPO2 improves the path planning success rate and, by effectively integrating the Heart algorithm, yields smoother paths.
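The abstract does not state the IL-PPO2 loss function, so the following is only a minimal sketch of one common way to combine demonstrations with PPO: the standard clipped surrogate objective of PPO plus a behavior-cloning term on demonstrated actions. It is not the authors' formulation. The function names, the `bc_weight` coefficient, and the use of plain Python lists of log-probabilities are all assumptions made for illustration.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of PPO (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def behavior_cloning_loss(demo_logp):
    """Negative mean log-likelihood of demonstrated actions (to minimize)."""
    return -sum(demo_logp) / len(demo_logp)


def demo_assisted_loss(new_logp, old_logp, advantages, demo_logp,
                       bc_weight=0.5, clip_eps=0.2):
    """Hypothetical combined loss: PPO term plus weighted imitation term.

    bc_weight is an assumed hyperparameter, not a value from the paper.
    """
    ppo_term = -ppo_clip_objective(new_logp, old_logp, advantages, clip_eps)
    return ppo_term + bc_weight * behavior_cloning_loss(demo_logp)
```

In a sketch like this, the imitation term pulls the policy toward demonstrated behavior early in training, which is one plausible reason demonstrations can shorten training time. How the paper actually weights or schedules its demonstration signal is not given in the abstract.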


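Cite this article

Wang Sipeng, Du Changping, Zheng Yao. Path planning algorithm for flapping-wing aerial vehicles based on reinforcement learning [J]. Control and Decision, 2022, 37(4): 851-860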


Published online: 2022-04-28
