SANER-PPO algorithm-based jamming resource allocation for UAV swarm
Authors: Liu Yifei, Li Xiaoshuai, Yang Jun'an, et al.

Affiliation: 1. College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China; 2. Anhui Province Key Laboratory of Electronic Restriction, Hefei 230037, China


Corresponding author: E-mail: yangjunan@ustc.edu.

CLC number: TN975

Fund project: National Natural Science Foundation of China (62201601).



Abstract:

This paper proposes a jamming resource allocation method based on an enhanced proximal policy optimization (PPO) algorithm, referred to as SANER-PPO (state normalization, advantage normalization and entropy regularization-based PPO), to handle the cooperative jamming resource allocation problem of UAV swarms in highly dynamic communication countermeasure scenarios. First, a jamming resource allocation optimization problem is formulated with the objectives of maximizing the number of target radios effectively jammed by the UAV swarm and minimizing the total jamming power consumed. Then, the UAV swarm is mapped to agents, and a Markov decision process is established from the jamming resource allocation model. Finally, the SANER-PPO algorithm is used to solve the optimization problem and generate optimized jamming-beam and jamming-power decisions for the UAV swarm. Compared with the original PPO algorithm, SANER-PPO incorporates a state normalization mechanism into the agents' decision stage to improve effectiveness, and introduces advantage normalization and entropy regularization mechanisms into the update stage to improve convergence speed and stability. Numerical results show that the proposed algorithm effectively solves the cooperative jamming resource allocation problem and clearly outperforms the original PPO and soft actor-critic algorithms in terms of jamming power consumption and the success rate of effective jamming. In addition, ablation experiments that progressively remove the three mechanisms validate the effectiveness of each.
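The page does not reproduce the paper's formal problem statement. As a rough sketch only, the bi-objective described above (maximize the number of effectively jammed radios, minimize total jamming power) could be scalarized as follows; every symbol here (the assignment variable $b$, power $p$, jamming-to-signal ratio $\mathrm{JSR}$, threshold $\gamma$ and weight $\lambda$) is an illustrative assumption, not the paper's notation:

$$\max_{\{b_{m,n},\,p_{m,n}\}} \; \sum_{n=1}^{N} \mathbb{1}\big(\mathrm{JSR}_n \ge \gamma\big) \;-\; \lambda \sum_{m=1}^{M}\sum_{n=1}^{N} b_{m,n}\, p_{m,n}$$

where $b_{m,n}\in\{0,1\}$ indicates that UAV $m$ points a jamming beam at target radio $n$, $p_{m,n}\ge 0$ is the jamming power it allocates, $\mathrm{JSR}_n$ is the resulting jamming-to-signal ratio at radio $n$, $\gamma$ is the effective-jamming threshold, and $\lambda>0$ trades power consumption against jamming coverage.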
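To make the three SANER-PPO mechanisms concrete, the following is a minimal PyTorch sketch of how they typically attach to a clipped-PPO update: a running state normalizer applied before the policy acts (decision stage), plus advantage normalization and an entropy bonus inside the loss (update stage). The discrete Categorical policy, network sizes, and the hyperparameters clip_eps and ent_coef are illustrative assumptions; the paper's actual actor-critic architecture and its joint jamming-beam/power action space are not shown on this page.

import torch
import torch.nn as nn

class RunningStateNorm:
    """State normalization (decision stage): keeps a running mean/variance
    and standardizes each observation before the policy sees it."""
    def __init__(self, dim, eps=1e-8):
        self.mean = torch.zeros(dim)
        self.var = torch.ones(dim)
        self.count = eps

    def update(self, x):
        # Parallel (Chan et al.) update of the running mean and variance.
        batch_mean = x.mean(0)
        batch_var = x.var(0, unbiased=False)
        n = x.shape[0]
        delta = batch_mean - self.mean
        total = self.count + n
        self.mean = self.mean + delta * n / total
        self.var = (self.var * self.count + batch_var * n
                    + delta.pow(2) * self.count * n / total) / total
        self.count = total

    def __call__(self, x):
        return (x - self.mean) / (self.var.sqrt() + 1e-8)

def saner_ppo_loss(policy, old_log_probs, states, actions, advantages,
                   clip_eps=0.2, ent_coef=0.01):
    # Advantage normalization (update stage): zero-mean, unit-std batch.
    adv = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    dist = torch.distributions.Categorical(logits=policy(states))
    log_probs = dist.log_prob(actions)
    ratio = (log_probs - old_log_probs).exp()
    # Standard PPO clipped surrogate objective.
    surrogate = torch.min(ratio * adv,
                          torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv)
    # Entropy regularization (update stage): rewards exploratory policies.
    return -(surrogate.mean() + ent_coef * dist.entropy().mean())

# Toy usage: 4-dim state, 3 hypothetical discrete jamming-beam choices.
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 3))
norm = RunningStateNorm(4)
states = torch.randn(32, 4)
norm.update(states)
actions = torch.randint(0, 3, (32,))
with torch.no_grad():
    old_lp = torch.distributions.Categorical(
        logits=policy(norm(states))).log_prob(actions)
advantages = torch.randn(32)           # stand-in for GAE estimates
loss = saner_ppo_loss(policy, old_lp, norm(states), actions, advantages)
loss.backward()                        # gradients flow to the policy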

Cite this article:

LIU Yifei, LI Xiaoshuai, YANG Jun'an, et al. SANER-PPO algorithm-based jamming resource allocation for UAV swarm[J]. Control and Decision, 2024, 39(12): 3937-3945.

History
  • Online publication date: 2024-11-20
  • Publication date: 2024-12-20