Deep reinforcement learning for solving the joint scheduling problem of machines and AGVs in job shop

Affiliation:

State Key Laboratory of Mechanical Transmission, Chongqing University, Chongqing 400044, China

Corresponding author:

E-mail: leiqi@cqu.edu.cn

CLC number:

TP8

Fund project:

National Natural Science Foundation of China (51205429)

Abstract:

To address the joint scheduling of automated guided vehicles (AGVs) and machines in a job shop, an integrated algorithm framework based on a convolutional neural network and deep reinforcement learning is proposed, with the objective of minimizing the completion time (makespan). First, the disjunctive graph of the job shop scheduling problem with AGVs is analyzed, and the problem is transformed into a sequential decision problem and formulated as a Markov decision process. Then, according to the solution characteristics of the problem, a spatial state based on the disjunctive graph and five direct state features are designed. For the action space, a two-dimensional action space comprising operation selection and AGV assignment is designed. Because processing times and effective transportation times in the job shop are fixed values, a reward function is constructed on this basis to guide the agent's learning. Finally, a 2D-PPO algorithm tailored to the two-dimensional action space is designed for training and learning, so that joint scheduling decisions for AGVs and machines can be made quickly. Case studies show that the scheduling algorithm based on 2D-PPO achieves good learning performance and scalability.
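
The abstract does not spell out how the two-dimensional action space is implemented. As a rough illustration only, one common way to realize an action made of an operation choice and an AGV assignment is to score every (job, AGV) pair, mask out jobs with no schedulable operation, and sample from the resulting joint distribution. The function names, the score grid, and the feasibility mask below are all hypothetical and are not taken from the paper's 2D-PPO design.

```python
import math
import random


def masked_softmax(scores, mask):
    """Softmax over entries whose mask is True; masked entries get probability 0."""
    valid = [s for s, m in zip(scores, mask) if m]
    if not valid:
        raise ValueError("no feasible action")
    top = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_joint_action(scores, feasible_jobs, num_agvs, rng=random):
    """Sample a (job, agv) pair from a num_jobs x num_agvs score grid.

    scores[j][a] stands in for a policy-network logit for dispatching the
    next operation of job j with AGV a (hypothetical, for illustration).
    feasible_jobs[j] is False when job j has no operation left to schedule.
    """
    # Flatten the grid row by row so that index = job * num_agvs + agv.
    flat_scores = [s for row in scores for s in row]
    flat_mask = [feasible_jobs[j] for j in range(len(scores)) for _ in range(num_agvs)]
    probs = masked_softmax(flat_scores, flat_mask)
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return divmod(index, num_agvs), probs


# Three jobs, two AGVs; job 1 is already finished and therefore masked.
scores = [[0.2, 1.0], [0.5, -0.3], [2.0, 0.1]]
feasible_jobs = [True, False, True]
(job, agv), probs = sample_joint_action(
    scores, feasible_jobs, num_agvs=2, rng=random.Random(0)
)
```

Flattening the grid keeps the sampling step identical to that of a one-dimensional discrete policy, while `divmod` recovers the two action components. A policy with two separate output heads would be an equally plausible reading of the abstract.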

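Cite this article

Sun Aihong, Lei Qi, Song Yuchuan, et al. Deep reinforcement learning for solving the joint scheduling problem of machines and AGVs in job shop[J]. Control and Decision, 2024, 39(1): 253-262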

History
  • Published online: 2023-12-14
  • Publication date: 2024-01-20