Fuzzy job shop scheduling problem based on deep reinforcement learning
Affiliation:
College of Electrical Engineering, Xinjiang University, Urumqi 830047, China
Abstract:
For the job shop scheduling problem with fuzzy processing times and fuzzy due dates, this paper takes the proximal policy optimization (PPO) algorithm as the basic optimization framework, with the objective of minimizing the maximum completion time, and proposes an LSTM-PPO (proximal policy optimization with long short-term memory) algorithm to solve it. First, a new set of state features is designed to model the scheduling problem, and job operations are selected directly from these state features, which matches the decision process of scheduling in real environments more closely. Second, a long short-term memory (LSTM) network is incorporated into the actor-critic framework of the PPO algorithm. This addresses the difficulty conventional models have in scaling when the problem size changes, enabling the agent to obtain a final scheduling solution even when the numbers of jobs, operations, and machines vary. Experiments on selected fuzzy job shop scheduling benchmark sets verify that the proposed algorithm achieves better performance.
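Two ingredients the abstract relies on can be sketched concretely: the triangular-fuzzy-number (TFN) arithmetic commonly used to accumulate fuzzy completion times in fuzzy job shop scheduling, and the clipped surrogate objective at the core of PPO. The sketch below is illustrative only, not the paper's implementation; the function names and the (a1 + 2*a2 + a3)/4 ranking criterion are common conventions in the fuzzy-scheduling literature assumed here, not details taken from this paper.

```python
import numpy as np

def tfn_add(x, y):
    """Fuzzy addition: component-wise sum of two triangular fuzzy numbers."""
    return tuple(a + b for a, b in zip(x, y))

def tfn_rank(x):
    """A common defuzzification score for a TFN (a1, a2, a3)."""
    a1, a2, a3 = x
    return (a1 + 2 * a2 + a3) / 4

def tfn_max(x, y):
    """Approximate fuzzy max: keep the TFN with the larger ranking score,
    as used when a machine's release time meets a job's ready time."""
    return x if tfn_rank(x) >= tfn_rank(y) else y

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: minimum of the unclipped and clipped
    probability-ratio terms, negated for gradient descent."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return -np.minimum(unclipped, clipped).mean()
```

In a full LSTM-PPO agent, `ratio` would come from the LSTM actor's new and old policy probabilities for the selected operation, and `advantage` from the critic's value estimates over the fuzzy makespan reward.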