Abstract: A two-stage multi-agent deep reinforcement learning method is proposed for the distributed heterogeneous flexible job-shop scheduling problem arising in rail vehicle assembly, where multi-model mixed-flow production, complex processes, large variations in process routes, and highly heterogeneous manufacturing resources make scheduling difficult. The scheduling process is modeled as a multi-stage Markov decision process whose decisions cover job allocation, operation sequencing, and machine selection, with the reward designed to guide the agents toward minimizing the global makespan. The upper-level agent uses a hierarchical heterogeneous graph attention network to extract the global state of the production line, achieving reasonable job allocation and load balancing across assembly lines and work zones. The lower-level agents employ a dual-agent collaboration strategy and a graph-neural-network-based encoder-decoder to capture dependencies such as precedence constraints between operations and resource occupancy, enabling local optimization. Experiments on data from actual operational scenarios show that the proposed method shortens the manufacturing cycle and exhibits good generalization.
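To make the two-stage decision structure described above concrete, the following is a minimal, illustrative sketch of the hierarchical scheduling loop: an upper-level policy allocates each job to a line, and a lower-level policy sequences its operations and selects machines, with the makespan serving as the (negated) reward signal. All names here are hypothetical, and the paper's learned policies (the hierarchical heterogeneous graph attention network and the GNN encoder-decoder) are replaced by trivial greedy rules purely to keep the example self-contained and runnable.

```python
def upper_agent_assign(job_id, line_loads):
    # Upper level (stand-in for the hierarchical heterogeneous GAT policy):
    # allocate the job to the least-loaded line, i.e., load balancing.
    return min(line_loads, key=line_loads.get)

def lower_agent_schedule(job_ops, machine_free):
    # Lower level (stand-in for the GNN encoder-decoder dual agents):
    # process operations in precedence order; for each, pick the machine
    # that becomes free earliest. Returns the job's completion time.
    finish = 0.0
    for duration in job_ops:               # operations are precedence-ordered
        m = min(machine_free, key=machine_free.get)
        start = max(machine_free[m], finish)
        finish = start + duration
        machine_free[m] = finish
    return finish

def schedule(jobs, lines):
    # jobs:  {job_id: [operation durations in precedence order]}
    # lines: {line_id: {machine_id: time the machine becomes free}}
    line_loads = {line: 0.0 for line in lines}
    makespan = 0.0
    for job_id, ops in jobs.items():
        line = upper_agent_assign(job_id, line_loads)          # stage 1
        finish = lower_agent_schedule(ops, lines[line])        # stage 2
        line_loads[line] = max(line_loads[line], finish)
        makespan = max(makespan, finish)
    return makespan  # an RL formulation would use -makespan as reward

jobs = {"J1": [3.0, 2.0], "J2": [4.0], "J3": [1.0, 1.0, 2.0]}
lines = {"A": {"M1": 0.0, "M2": 0.0}, "B": {"M3": 0.0}}
print(schedule(jobs, lines))  # → 8.0
```

The split mirrors the paper's hierarchy: the upper policy only sees line-level aggregate state, while each lower policy optimizes locally within its line, which is what allows the method to scale across heterogeneous lines and work zones.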