Abstract: Flexible job-shop scheduling is a classical and complex combinatorial optimization problem of great theoretical and practical significance for the production optimization of discrete manufacturing systems. A deep reinforcement learning algorithm for the flexible job-shop scheduling problem is designed based on a multi-pointer graph network framework and the proximal policy optimization algorithm. First, operation-machine assignment scheduling is represented as a Markov decision process composed of two kinds of actions, namely operation selection and machine assignment. Then, the coupling between the two actions is removed with a decoupling strategy, and a new loss function and a greedy sampling strategy are designed to improve inference performance at validation time. Moreover, the state space is expanded so that the critic network can perceive and evaluate the state more comprehensively, further improving the learning and decision-making capability of the algorithm. Simulations and comparisons on randomly generated instances and public benchmarks demonstrate the superior performance and generalization ability of the proposed algorithm.