Time-optimal trajectory planning of excavator based on deep reinforcement learning
Affiliation:

Department of Electronics and Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China

Corresponding author:

E-mail: zys8128@163.com

CLC number:

TP241

Fund Project:

Key Research and Development Program of Shanxi Province (201903D121130); Natural Science Foundation of Shanxi Province (201901D111265); Graduate Innovation Project of Shanxi Province (2021Y670); Scientific Research Start-up Fund of Taiyuan University of Science and Technology (20192014).

    Abstract:

    A time-optimal trajectory planning method based on reinforcement learning is proposed for autonomous excavator operation. First, a simulation environment is built to generate data: the angles and angular velocities of the boom, arm and bucket joints serve as the state observation variables, the angular acceleration of each joint serves as the action, and the simulation environment interacts with the learning algorithm through the state observations. Next, a reward function is designed from three terms, namely whether the motion of the boom, arm and bucket joints exceeds the allowable range, the total time to complete the task, and the relative distance to the target, and is used to train the policy network parameters. Finally, an improved proximal policy optimization (PPO) algorithm is used to realize time-optimal trajectory planning for the excavator. In a comparison with other reinforcement learning algorithms for continuous action spaces, the experimental results show that the proposed algorithm is more efficient, converges faster and produces smoother operating trajectories, which effectively avoids large impacts on the joints and helps the excavator operate efficiently and stably.
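
The state, action and reward structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation environment: the joint limits, time step, reward weights, tolerance and the use of a joint-space distance to the target are all assumptions chosen only to make the example concrete, and the function name `step` is hypothetical.

```python
import numpy as np

# All numeric values are illustrative assumptions, not taken from the paper.
ANGLE_MIN = np.array([-1.0, -1.0, -1.0])  # rad, assumed lower limits (boom, arm, bucket)
ANGLE_MAX = np.array([1.0, 1.0, 1.0])     # rad, assumed upper limits
DT = 0.05                                 # s, assumed simulation time step
TOL = 0.01                                # rad, assumed "target reached" tolerance
W_LIMIT, W_TIME, W_DIST = 10.0, 0.1, 1.0  # assumed reward weights


def step(state, action, target):
    """Advance the toy model by one time step.

    state  : array of 6 values, joint angles (3) then angular velocities (3)
    action : array of 3 values, angular acceleration of each joint
    target : array of 3 values, desired joint angles (assumed joint-space goal)
    """
    angles, velocities = state[:3], state[3:]
    # Integrate the commanded accelerations (semi-implicit Euler).
    velocities = velocities + action * DT
    angles = angles + velocities * DT
    next_state = np.concatenate([angles, velocities])

    # Reward term 1: penalty if any joint leaves its allowed range.
    out_of_range = bool(np.any((angles < ANGLE_MIN) | (angles > ANGLE_MAX)))
    # Reward term 2: every step costs time, so shorter episodes score higher.
    time_penalty = W_TIME * DT
    # Reward term 3: distance between the current and target joint angles.
    distance = float(np.linalg.norm(angles - target))

    reward = -time_penalty - W_DIST * distance - (W_LIMIT if out_of_range else 0.0)
    done = out_of_range or distance < TOL
    return next_state, reward, done
```

A policy trained with PPO (or any other continuous-action algorithm) would call such a function once per control step, feeding the six-dimensional observation to the policy network and applying the three accelerations it outputs.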

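Cite this article

Zhang Yunyue, Sun Zhiyi, Sun Qianlai, et al. Time-optimal trajectory planning of excavator based on deep reinforcement learning[J]. Control and Decision, 2024, 39(5): 1433-1440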

History
  • Published online: 2024-04-17
  • Publication date: 2024-05-20