To address the reentrant hybrid flow shop scheduling problem with machine failure constraints (RHFSP-MFC), a proximal policy optimization algorithm built on the gated Transformer architecture, GTrXL-PPO (Gated Transformer-XL proximal policy optimization), is proposed, with minimization of the maximum completion time (makespan) as the optimization objective. First, a mathematical model that incorporates the probability distribution of machine failures is established, and multiple rescheduling strategies are designed to handle failure events. Then, workpiece status and machine operating status are taken as the input state, and the assignment of a suitable machine to a workpiece is taken as the action. A dual reward mechanism comprising immediate rewards and task completion rewards is designed to guide scheduling decisions. Simulation tests on single-machine and multi-machine failure scenarios verify the superiority of the proposed algorithm and demonstrate its effectiveness and adaptability in complex scheduling environments.
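
To make the dual reward idea concrete, the following is a minimal sketch of how an immediate reward and a task completion reward could be combined for a makespan objective. It is not the formulation used in this work: the function names, the use of the partial-schedule makespan increase as the immediate signal, the `lower_bound` term, and the `completion_bonus` scale are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    # Hypothetical scale of the terminal bonus; not taken from the paper.
    completion_bonus: float = 10.0


def immediate_reward(prev_makespan: float, new_makespan: float) -> float:
    """Per-step signal (assumed form): penalize any growth of the
    partial-schedule makespan caused by the latest machine assignment."""
    return -(new_makespan - prev_makespan)


def completion_reward(final_makespan: float, lower_bound: float,
                      cfg: RewardConfig) -> float:
    """Terminal signal (assumed form): bonus that grows as the final
    makespan approaches a known lower bound."""
    return cfg.completion_bonus * lower_bound / final_makespan


def dual_reward(prev_makespan: float, new_makespan: float, done: bool,
                lower_bound: float, cfg: RewardConfig = RewardConfig()) -> float:
    """Immediate reward at every step, plus the completion reward
    once all workpieces have been scheduled."""
    reward = immediate_reward(prev_makespan, new_makespan)
    if done:
        reward += completion_reward(new_makespan, lower_bound, cfg)
    return reward


if __name__ == "__main__":
    # Intermediate step: the makespan grows from 10 to 12.
    print(dual_reward(10.0, 12.0, done=False, lower_bound=8.0))
    # Final step: the makespan grows from 12 to 16 and the episode ends.
    print(dual_reward(12.0, 16.0, done=True, lower_bound=8.0))
```

Under these assumptions, the immediate rewards of an episode sum to the negative final makespan (when the initial makespan is zero), so the per-step signal stays aligned with the overall objective while the terminal bonus adds an extra incentive for completing the schedule close to the lower bound.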