AUV cooperative hunting based on auction mechanism and MASAC for multiple dynamic targets

CLC number: TP249

Funding: Key R&D Program of the Science and Technology Department of Sichuan Province (2023YFG0285); National Natural Science Foundation of China (52075456).

    Abstract:

    To address the decision-making and control problem of cooperative hunting of multiple dynamic targets by autonomous underwater vehicle (AUV) swarms, this paper proposes a hunting algorithm that integrates an auction mechanism with multi-agent deep reinforcement learning. The method decomposes the hunting task into two stages: target allocation and motion control. First, using the collocation method from optimal control theory, training data and bid-value labels are generated under optimization objectives that include the hunting situation, minimum time, and minimum energy consumption; an auction neural network is then trained by supervised learning to allocate targets to the AUVs in real time. Next, the post-allocation individual state space is constructed, a multi-target hunting reward function is designed, and the multi-agent soft actor-critic (MASAC) algorithm is used to optimize the hunting strategy. The efficient, adaptive auction algorithm provides fast target allocation in dynamic, complex environments, while multi-agent reinforcement learning improves the swarm's responsiveness in cooperative control. Finally, hunting experiments are conducted in several scenarios. The results show that the proposed method significantly improves hunting performance: with 2, 3, and 4 dynamic targets, the average hunting success rates are 79.04%, 89.78%, and 90.43%, respectively, improvements of 48.41%, 54.00%, and 53.93% over the baseline method. The proposed algorithm therefore performs better across hunting tasks of different scales.
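The target-allocation stage can be illustrated with a small sketch. The abstract does not state the auction rule itself, so the code below assumes a simple greedy rule (accept the highest remaining bid first) and a fixed number of hunters per target; the function name `greedy_auction`, the parameter `hunters_per_target`, and the bid values are all hypothetical. In the proposed method the bids would come from the trained auction neural network rather than a hand-written table.

```python
from typing import List


def greedy_auction(bids: List[List[float]], hunters_per_target: int) -> List[int]:
    """Assign each AUV to one target, accepting the highest remaining bid first.

    bids[i][j] is an illustrative bid of AUV i for target j (assumed values,
    standing in for the output of an auction network).
    Returns a list where entry i is the target index assigned to AUV i.
    """
    n_auvs = len(bids)
    n_targets = len(bids[0])
    if n_auvs > n_targets * hunters_per_target:
        raise ValueError("not enough target slots for all AUVs")

    assignment = [-1] * n_auvs                   # -1 means "not yet assigned"
    slots = [hunters_per_target] * n_targets     # free hunter slots per target

    # Rank every (bid, auv, target) triple, highest bid first;
    # ties are broken by AUV index, then target index.
    ranked = sorted(
        ((bids[i][j], i, j) for i in range(n_auvs) for j in range(n_targets)),
        key=lambda t: (-t[0], t[1], t[2]),
    )

    for _, i, j in ranked:
        # Accept a bid only if the AUV is still free and the target has a slot.
        if assignment[i] == -1 and slots[j] > 0:
            assignment[i] = j
            slots[j] -= 1
    return assignment


# Four AUVs bidding on two targets, two hunters per target (made-up numbers).
bids = [
    [0.9, 0.2],
    [0.8, 0.7],
    [0.3, 0.6],
    [0.4, 0.5],
]
assignment = greedy_auction(bids, hunters_per_target=2)
print(assignment)  # [0, 0, 1, 1]
```

Because the allocation reduces to one sort over the bid matrix, it is cheap enough to rerun whenever the bids change, which is the kind of real-time reallocation the abstract describes. A greedy rule is not guaranteed to maximize the total bid; it is used here only to keep the sketch short.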

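Cite this article

XIE Dijie, LI Min, ZENG Xiangguang, et al. AUV cooperative hunting based on auction mechanism and MASAC for multiple dynamic targets[J]. Control and Decision, 2026, 41(5): 1229-1241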

History
  • Received: 2025-07-05
  • Published online: 2026-04-17
  • Publication date: 2026-05-10