LLM-RL hierarchical decision-making algorithm for intelligent aerial combat

CLC number:

TP18

Funding:

National Natural Science Foundation of China (62293510/62293513); Natural Science Foundation of Tianjin (22JCZDJC00810).

    Abstract:

    In complex, highly adversarial scenarios such as multi-UAV intelligent aerial combat, simultaneously mastering precise micro-operation decision-making and efficient tactical reasoning is essential for achieving close coordination and gaining a winning advantage. To address the poor policy generalization and lack of high-level reasoning of existing reinforcement learning (RL) methods in such scenarios, this paper proposes a hierarchical decision-making algorithm that integrates large language models (LLMs) with deep RL (LRHDF). First, inspired by the decision-making mechanism of human pilots, an “LLM-RL” (brain-torso) hierarchical architecture is constructed, which improves both low-level micro-operation performance and high-level cognitive reasoning. Second, a prompt iteration mechanism based on LLM reflection uses environmental feedback as the optimization signal to drive the continuous, autonomous evolution of the prompt instructions. Finally, drawing on how human teams make decisions collaboratively, a sequential cooperative decision-making mechanism is designed to explicitly model multi-agent collaboration patterns, thereby improving coordination efficiency among agents. Simulation and ablation results on a high-fidelity aerial combat platform show that, compared with traditional RL algorithms, the proposed algorithm achieves stronger combat performance and better generalization across multiple types of combat scenarios, offering a feasible technical approach to multi-UAV aerial combat.
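The three mechanisms named in the abstract (a high-level planner over a low-level controller, sequential cooperative decisions, and reflection-driven prompt updates) can be illustrated with a toy sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the function names, the tactic set, the thresholds, and the rule-based stubs that stand in for the LLM and the trained RL policy are all hypothetical and are not the authors' LRHDF implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy sketch; none of these names come from the paper.

def stub_llm_planner(prompt: str, obs: Dict[str, float],
                     earlier_plans: List[str]) -> str:
    """Stand-in for an LLM call: choose a tactic from prompt, state, and teammates' plans."""
    # Assumed convention: a "cautious" prompt lowers the threat level that triggers evasion.
    evade_threshold = 0.5 if "cautious" in prompt else 0.7
    if obs["threat"] > evade_threshold:
        return "evade"
    # Sequential coordination: if an earlier agent already attacks, support it instead.
    if "attack" in earlier_plans:
        return "support"
    return "attack"

def stub_rl_policy(tactic: str, obs: Dict[str, float]) -> List[float]:
    """Stand-in for a trained low-level policy: returns [pitch, roll, throttle]."""
    controls = {
        "attack": [0.1, 0.0, 1.0],
        "evade": [-0.5, 0.8, 1.0],
        "support": [0.0, 0.2, 0.7],
    }
    return controls[tactic]

def sequential_decide(prompt: str, observations: List[Dict[str, float]]
                      ) -> Tuple[List[str], List[List[float]]]:
    """Agents decide one after another; each sees the plans chosen before it."""
    plans: List[str] = []
    actions: List[List[float]] = []
    for obs in observations:
        tactic = stub_llm_planner(prompt, obs, plans)   # high level ("brain")
        plans.append(tactic)
        actions.append(stub_rl_policy(tactic, obs))     # low level ("torso")
    return plans, actions

def reflect(prompt: str, episode_reward: float, threshold: float = 0.0) -> str:
    """Toy reflection step: revise the prompt when environment feedback is poor."""
    if episode_reward < threshold:
        return prompt + "\nLesson: be cautious and evade earlier under threat."
    return prompt

if __name__ == "__main__":
    obs = [{"threat": 0.2}, {"threat": 0.6}, {"threat": 0.9}]
    prompt = "Win the engagement."
    print(sequential_decide(prompt, obs)[0])
    prompt = reflect(prompt, episode_reward=-1.0)
    print(sequential_decide(prompt, obs)[0])
```

In a real system the two stubs would be replaced by an actual LLM query and a trained policy network, and the reflection step would have the LLM rewrite its own prompt from episode logs rather than append a fixed sentence.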

Cite this article

骞晨旭, 张雪波, 李论, et al. LLM-RL hierarchical decision-making algorithm for intelligent aerial combat[J]. Control and Decision (控制与决策), 2026, 41(3): 855-864

History
  • Received: 2025-11-24
  • Published online: 2026-03-04
  • Publication date: 2026-03-10