Abstract: High-speed rail has developed rapidly in recent years owing to its large transport capacity, high speed, and all-weather operability. However, unexpected events such as severe weather cause train delays, and a delay can propagate along the rail network; this domino effect can leave large numbers of trains unable to operate as planned. Current dynamic scheduling methods that rely on manual experience struggle to meet practical requirements. This paper therefore addresses the dynamic scheduling problem of high-speed trains, taking the minimization of the total delay of all trains at all stations as the optimization objective. A mixed-integer nonlinear programming (MINLP) model under traversable conditions is constructed, and a policy-gradient reinforcement learning method is proposed, covering the construction of the environment, the definition of the state and action sets, the policy network, the action-selection method, and the reward function; in view of the specific problem, the REINFORCE algorithm is further improved through error amplification and threshold setting. Finally, the convergence and performance of the algorithm are studied and compared with the Q-learning algorithm. The results show that the proposed method can effectively reschedule high-speed trains, minimize the impact of delays, and improve the efficiency of train operation.
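To make the policy-gradient idea behind REINFORCE concrete, the following is a minimal sketch on a toy dispatching bandit: each action stands for a rescheduling choice and its reward is the negative total delay it causes. The three-action environment, reward values, and all names here are illustrative assumptions for exposition only, not the paper's MINLP model or its improved algorithm.

```python
# Minimal REINFORCE sketch (softmax policy, running baseline).
# The toy environment below is an illustrative assumption, not the paper's model.
import math
import random

random.seed(0)

# Toy rewards: negative total delay for each of 3 dispatching actions;
# action 1 causes the least delay and should be preferred after training.
REWARDS = [-5.0, -1.0, -3.0]

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    s = sum(exps)
    return [e / s for e in exps]

def sample(probs):
    r, cum = random.random(), 0.0
    for a, p in enumerate(probs):
        cum += p
        if r < cum:
            return a
    return len(probs) - 1

prefs = [0.0, 0.0, 0.0]   # policy parameters (action preferences)
alpha = 0.1               # learning rate
baseline = 0.0            # running reward baseline to reduce gradient variance

for episode in range(2000):
    probs = softmax(prefs)
    a = sample(probs)
    reward = REWARDS[a]
    baseline += 0.01 * (reward - baseline)
    advantage = reward - baseline
    # REINFORCE update: grad of log pi(a) w.r.t. preference i
    # is (indicator(i == a) - probs[i]) for a softmax policy.
    for i in range(len(prefs)):
        grad = (1.0 if i == a else 0.0) - probs[i]
        prefs[i] += alpha * advantage * grad

final_probs = softmax(prefs)
print(final_probs)
```

After training, the probability mass concentrates on the lowest-delay action; the paper's improvements (error amplification, threshold setting) would modify how the advantage term is scaled, which this sketch does not attempt to reproduce.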