A time-limited reachable domain-guided learning method for spacecraft game decision-making

doi:10.13195/j.kzyjc.2025.0313

2025, Vol. 40, No. 12, pp. 3678-3688.

CLC number: V474.2+8
Funding: National Natural Science Foundation of China (62273277).

A time-limited reachable domain-guided learning method for spacecraft game decision-making

Abstract:

In complex space game scenarios, pursuit-evasion decision-making for impulse-thrust spacecraft is limited in real-time performance, and traditional reward functions adapt poorly to long-distance, highly dynamic adversarial learning environments. To address these problems, this paper investigates intelligent maneuver decision-making and fuel optimization for spacecraft game confrontation. First, the orbital game dynamics and maneuver constraint models are established. Second, a method for solving the time-constrained single-impulse reachable domain of a spacecraft is proposed, and a neural network is used to fit the orbital danger zone quantitatively. Third, a hierarchical reinforcement learning framework is designed on a distributed system architecture, and the proximal policy optimization (PPO) algorithm is used for red-blue adversarial training. Finally, the proposed maneuver strategies are validated. Simulation results show that, in an orbital game scenario under two-body dynamics, the danger-zone strategy reduces average fuel consumption by 33.81%, and the game strategy improves the hit rate by an average of 38.41% compared with traditional methods.
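
The abstract names proximal policy optimization (PPO) as the training algorithm without further detail. As a rough illustration only, the sketch below computes the standard PPO clipped surrogate objective in plain Python. The function name `ppo_clip_objective`, the clip range `eps=0.2`, and the idea of passing precomputed probability ratios and advantages as lists are assumptions made for this example; the paper may use a different variant, network setup, or hyperparameters.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to be maximized).

    ratios:     new-policy / old-policy action probability ratios, one per sample
    advantages: advantage estimates for the same samples
    eps:        clip range (0.2 is a commonly used value, assumed here)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("at least one sample is required")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - eps, min(1.0 + eps, r))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

Calling `ppo_clip_objective([1.5], [1.0])` illustrates the clipping: the ratio 1.5 is limited to 1.2, so a large policy change earns no extra credit beyond the clip range.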

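Cite this article

Qiao Beibei (乔贝贝), Liu Xueyi (刘薛怡), Qian Hanyu (钱寒雨), et al. A time-limited reachable domain-guided learning method for spacecraft game decision-making[J]. Control and Decision (控制与决策), 2025, 40(12): 3678-3688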

History

Received: 2025-03-26
Published online: 2025-11-10
Publication date: 2025-12-10
