Abstract: For the complex scheduling problem in integrated systems of complex aviation equipment, where goal preferences change dynamically across multiple stages and random disturbances occur, a multi-objective scheduling method is proposed based on a bidirectional collaborative optimization framework that integrates NSGA-II with Proximal Policy Optimization (PPO). By establishing an "offline global optimization – online dynamic decision-making" closed-loop mechanism, the approach enables continuous self-evolution of scheduling strategies. First, a PPO-based reinforcement learning agent is designed to perceive system states and disturbances in real time and to dynamically adjust the optimization weights for time, quality, and cost, thereby capturing evolving preference priorities and disturbance-response requirements. Second, an enhanced NSGA-II algorithm is developed with improved non-dominated sorting and crowding distance calculation: the real-time dynamic weights are embedded through a preference-based dominance relation and a weighted crowding distance, which guide the population to converge toward the Pareto-optimal region aligned with current operational preferences. The two components are tightly coupled through two interaction loops: "offline rule-based knowledge injection" and "online experience feedback from learning." Empirical results show that, compared with traditional fixed-weight approaches, the proposed method improves the hypervolume (HV) metric by 20.1% and reduces the average disturbance recovery time by 41.7%. It also significantly outperforms the benchmark algorithms on key performance indicators such as order delay rate, rework rate, and cost overrun rate.
Furthermore, the method generalizes well across various complex equipment integration systems, and its core algorithm can be extended to intelligent optimization problems in the aerospace and shipbuilding domains, indicating broad prospects for engineering application.
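The abstract does not specify the exact form of the preference-based dominance relation or the weighted crowding distance. What follows is a minimal sketch of one plausible form, not the authors' implementation. It rests on several assumptions. All objectives (time, a quality-loss measure such as rework, and cost) are minimised and normalised to comparable scales. The weight vector `w` is strictly positive. In the proposed framework, `w` would come from the PPO agent at each decision point, but here it is simply a fixed input. The names `pref_dominates`, `preference_fronts`, `weighted_crowding` and the margin `eps` are hypothetical. Under this sketch, solution a preference-dominates b if a Pareto-dominates b, or if neither Pareto-dominates the other and a's weighted score is lower by more than `eps`. With positive weights, Pareto dominance also implies a lower weighted score. Every dominance chain therefore strictly decreases that score, so the relation has no cycles and the front-peeling sort assigns every solution to some front.

```python
from typing import List, Sequence

Objectives = Sequence[float]  # one solution's objective values, all minimised (assumption)


def weighted_score(f: Objectives, w: Sequence[float]) -> float:
    """Weighted sum of objectives under the current preference weights."""
    return sum(wi * fi for wi, fi in zip(w, f))


def pareto_dominates(a: Objectives, b: Objectives) -> bool:
    """Standard Pareto dominance for minimisation."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pref_dominates(a: Objectives, b: Objectives, w: Sequence[float], eps: float) -> bool:
    """Hypothetical preference-based dominance: Pareto dominance, or, for a
    mutually non-dominated pair, a weighted score lower by more than eps."""
    if pareto_dominates(a, b):
        return True
    if pareto_dominates(b, a):
        return False
    return weighted_score(a, w) < weighted_score(b, w) - eps


def preference_fronts(objs: List[Objectives], w: Sequence[float],
                      eps: float = 0.05) -> List[List[int]]:
    """Front-peeling non-dominated sort using pref_dominates instead of plain Pareto dominance."""
    assert all(wi > 0 for wi in w), "acyclicity argument needs strictly positive weights"
    n = len(objs)
    dominates = [[] for _ in range(n)]  # dominates[i]: indices that solution i dominates
    dominated_count = [0] * n           # how many solutions dominate solution i
    for i in range(n):
        for j in range(n):
            if i != j and pref_dominates(objs[i], objs[j], w, eps):
                dominates[i].append(j)
                dominated_count[j] += 1
    fronts: List[List[int]] = []
    current = [i for i in range(n) if dominated_count[i] == 0]
    while current:
        fronts.append(current)
        nxt = []
        for i in current:
            for j in dominates[i]:
                dominated_count[j] -= 1
                if dominated_count[j] == 0:  # all dominators already placed
                    nxt.append(j)
        current = nxt
    return fronts


def weighted_crowding(front: List[Objectives], w: Sequence[float]) -> List[float]:
    """NSGA-II crowding distance with each objective's contribution scaled by its weight."""
    n = len(front)
    if n <= 2:
        return [float("inf")] * n
    dist = [0.0] * n
    for m, wm in enumerate(w):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")  # always keep boundary solutions
        if hi == lo:
            continue  # objective is constant on this front: no spread information
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += wm * gap / (hi - lo)
    return dist


if __name__ == "__main__":
    w = (0.5, 0.3, 0.2)  # hypothetical PPO output for (time, quality loss, cost)
    objs = [(0.2, 0.5, 0.4), (0.3, 0.4, 0.5), (0.6, 0.6, 0.6), (0.1, 0.9, 0.9)]
    print(preference_fronts(objs, w))            # [[0, 1], [3], [2]]
    print(preference_fronts(objs, w, eps=10.0))  # [[0, 1, 3], [2]] (plain Pareto)
```

With these inputs, the solution (0.1, 0.9, 0.9) is Pareto-non-dominated because of its short time. Under w = (0.5, 0.3, 0.2), however, its weighted score is clearly worse than those of the first two solutions, so it moves to the second front. A very large `eps` recovers plain Pareto sorting. Shifting weight between objectives also changes which interior solutions `weighted_crowding` treats as least crowded, and this is how the weights would steer selection within a front.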