Flexible job-shop scheduling based on improved proximal policy optimization algorithm
Affiliation:

Shenyang University of Technology

CLC number:

TP18

Fund projects:

Youth Fund of the National Natural Science Foundation of China (Grant No. 62003221); Key Research Program of the Scientific Research Fund of the Liaoning Provincial Department of Education (Grant No. LJKZZ20220021); Joint Fund Project of the Science and Technology Plan of Liaoning Province (Grant No. 2023-MSLH-255); General Project of the Scientific Research Fund of the Liaoning Provincial Department of Education (Grant No. LJKMZ20220509); Open Project of National Key Laboratory of Integrated Automation for Process Industries, Northeastern University (Grant No. 2023-kfkt-02).

    Abstract:

    Flexible job-shop scheduling is a classical and complex combinatorial optimization problem of considerable theoretical and practical significance for production optimization in discrete manufacturing systems. A deep reinforcement learning algorithm for the flexible job-shop scheduling problem is designed based on the multi-pointer graph network framework and the proximal policy optimization algorithm. First, the operation-machine assignment process is formulated as a Markov decision process with two kinds of actions: selecting an operation and assigning a machine. Second, a decoupling strategy removes the coupling between the two actions, and a new loss function and a greedy sampling strategy are designed to improve the algorithm's validation and inference performance. On this basis, the state space is expanded so that the critic network can perceive and evaluate the state more comprehensively, further improving the learning and decision-making capabilities of the algorithm. Simulations and comparisons on randomly generated instances and benchmark instances verify the good performance and generalization ability of the algorithm.
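The two-action decision process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: simple dispatching rules stand in for the learned policy networks, and the instance `JOBS`, the `State` class, and all function names are hypothetical. It only shows how each step splits into an operation choice followed by a machine choice, and how a greedy rollout yields a makespan.

```python
from dataclasses import dataclass, field

# Hypothetical toy instance: JOBS[j][k] maps machine id -> processing time
# for the k-th operation of job j (only eligible machines are listed).
JOBS = [
    [{0: 3, 1: 5}, {1: 2}],
    [{0: 4}, {0: 6, 1: 3}],
]


@dataclass
class State:
    next_op: list      # index of the next unscheduled operation of each job
    job_ready: list    # time at which each job's previous operation finishes
    mach_ready: list   # time at which each machine becomes free
    schedule: list = field(default_factory=list)


def reset(jobs, n_machines):
    return State([0] * len(jobs), [0] * len(jobs), [0] * n_machines)


def done(state, jobs):
    return all(k == len(ops) for k, ops in zip(state.next_op, jobs))


def select_operation(state, jobs):
    """Action 1: choose a job whose next operation is still unscheduled.
    Rule-based stand-in for a policy: earliest-ready job, ties by job id."""
    candidates = [j for j, ops in enumerate(jobs) if state.next_op[j] < len(ops)]
    return min(candidates, key=lambda j: (state.job_ready[j], j))


def assign_machine(state, jobs, j):
    """Action 2: choose an eligible machine for the selected operation.
    Rule-based stand-in: earliest completion time, ties by machine id."""
    times = jobs[j][state.next_op[j]]
    return min(
        times,
        key=lambda m: (max(state.job_ready[j], state.mach_ready[m]) + times[m], m),
    )


def step(state, jobs, j, m):
    """Apply the (operation, machine) pair and advance the state."""
    k = state.next_op[j]
    start = max(state.job_ready[j], state.mach_ready[m])
    end = start + jobs[j][k][m]
    state.next_op[j] += 1
    state.job_ready[j] = end
    state.mach_ready[m] = end
    state.schedule.append((j, k, m, start, end))
    return state


def greedy_rollout(jobs, n_machines):
    """Run one episode, always taking the top-ranked action of each type."""
    state = reset(jobs, n_machines)
    while not done(state, jobs):
        j = select_operation(state, jobs)
        m = assign_machine(state, jobs, j)
        step(state, jobs, j, m)
    return max(state.job_ready), state.schedule


makespan, schedule = greedy_rollout(JOBS, n_machines=2)
print(makespan)
```

In a learned setting, the two rule functions would be replaced by two policy heads that output distributions over operations and machines, and the negative makespan (or its per-step increments) would typically serve as the reward signal.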

History
  • Received: 2024-07-27
  • Last revised: 2024-12-16
  • Accepted: 2024-12-17
  • Published online: 2024-12-31