Intelligent Optimization Method for Integrated System of Aviation Complex Equipment Oriented to Multi-stage Dynamic Objectives
CSTR:
Author:
Affiliation:

1. AVIC Guizhou Aircraft Co., Ltd. and College of Economics and Management, Nanjing University of Aeronautics and Astronautics; 2. Chengdu Aircraft Industrial (Group) Co., Ltd.; 3. Nanjing University of Aeronautics and Astronautics

Author biography:

Corresponding author:

CLC number:

TP301.6; TP391.9

Fund Project:

The National Natural Science Foundation of China (General Program, Key Program, Major Program)

    Abstract:

    To address the complex scheduling problem in integrated systems of aviation complex equipment, where objective preferences change dynamically across multiple stages and random disturbances require real-time response, a multi-objective scheduling method is proposed based on a bidirectional collaborative optimization mechanism that couples NSGA-II with reinforcement learning by Proximal Policy Optimization (PPO). A bidirectional closed loop of "offline global optimization, online dynamic decision-making" enables continuous self-evolution of the scheduling strategy. First, a PPO-based reinforcement learning agent is designed to perceive system states and disturbances in real time and to dynamically adjust the optimization weights for time, quality, and cost, thereby capturing dynamic preferences and disturbance-response requirements. Second, the non-dominated sorting and crowding distance calculation of NSGA-II are improved: the real-time dynamic weights are embedded into the evolutionary process as a preference-based dominance relation and a weighted crowding distance, guiding the population to converge toward the Pareto region that matches the current strategic preferences. The two components are tightly coupled through two loops, "offline rule knowledge injection" and "online learning experience feedback". Empirical results show that the proposed method improves the hypervolume (HV) indicator by 20.1% over traditional fixed-weight methods, shortens the average disturbance recovery time by 41.7%, and significantly outperforms the comparison algorithms on key performance indicators such as order delay rate, rework rate, and cost overrun rate. The method generalizes well to complex equipment integration systems, and its core algorithm can be extended to intelligent optimization problems in the integration of complex equipment in sectors such as space and shipbuilding.
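The abstract does not give the formulas for the preference-based dominance relation or the weighted crowding distance, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes all three objectives (time, quality loss, cost) are normalized to [0, 1] and minimized, and that the weight vector is supplied externally, as if it came from the PPO agent. The function names, the `margin` parameter, and the weighted-sum tie-breaking rule are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def pareto_dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Standard Pareto dominance for minimization objectives."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def preference_dominates(a: Sequence[float], b: Sequence[float],
                         weights: Sequence[float], margin: float = 0.1) -> bool:
    """Assumed rule: Pareto dominance first; for mutually non-dominated
    points, a wins only if its weighted sum is lower by more than `margin`."""
    if pareto_dominates(a, b):
        return True
    if pareto_dominates(b, a):
        return False
    score_a = sum(w * x for w, x in zip(weights, a))
    score_b = sum(w * x for w, x in zip(weights, b))
    return score_a + margin < score_b


def weighted_crowding_distance(front: List[Sequence[float]],
                               weights: Sequence[float]) -> List[float]:
    """NSGA-II crowding distance with each objective's normalized gap
    scaled by its current weight (an assumed form of 'weighted crowding')."""
    n = len(front)
    if n <= 2:
        return [float("inf")] * n
    dist = [0.0] * n
    for m, w in enumerate(weights):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        if hi == lo:
            continue  # objective is constant on this front; no spread to measure
        # Boundary solutions are always kept, as in standard NSGA-II.
        dist[order[0]] = dist[order[-1]] = float("inf")
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += w * gap / (hi - lo)
    return dist
```

Under these assumptions, shifting the weights from time-focused `(0.6, 0.3, 0.1)` to quality-focused `(0.3, 0.6, 0.1)` reverses which of two mutually non-dominated schedules is preferred, which is the kind of steering toward a preferred Pareto region that the abstract describes.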

History
  • Received: 2025-11-09
  • Last revised: 2026-01-05
  • Accepted: 2026-01-05
  • Published online: 2026-01-17
  • Publication date: