Target Searching in Unknown Environments by an Unmanned Aerial Vehicle Based on Multimodal Large Model Inference

Affiliation: Nankai University

CLC number: T

Fund projects: National Key Research and Development Program of China, Young Scientists Project (2022YFB4701800); National Natural Science Foundation of China (62303249); Beijing–Tianjin–Hebei Basic Research Cooperation Special Project (24JCZXJC00390); China Postdoctoral Science Foundation (2024M751526).

Abstract:

This paper proposes a navigation method for unmanned aerial vehicles (UAVs) that integrates multimodal large model reasoning with autonomous exploration to tackle zero-shot target search in unknown environments. First, because multimodal large models have difficulty processing 3D data directly, a space–vision inverse mapping approach is introduced: scene images carrying explicit 3D coordinate constraints are constructed as the model input, enabling the multimodal large model both to understand the scene imagery and to localize key regions. Second, to overcome the poor generalization of existing target search methods, a prompt embedding “recognition–evaluation–transfer” logic is designed to guide the UAV through zero-shot target search across different scenarios. Finally, to mitigate the significant sim-to-real gap of existing target search methods, a geometric–semantic asynchronous gain fusion mechanism and a dynamic evaluation strategy are incorporated into a UAV autonomous exploration framework, achieving an adaptive balance between “spatial autonomous exploration” and “semantic regularity exploitation”. Simulation results in three types of Gazebo scenes show that the proposed method clearly outperforms the baseline methods in path length, search time, and success rate. In addition, experiments in unknown outdoor environments validate the effectiveness of the proposed method for zero-shot target search.
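The abstract names the space–vision inverse mapping but does not say how the 3D coordinate constraints are encoded in the image. Below is a minimal sketch of one plausible reading, assuming a calibrated pinhole camera, a known camera pose, and a list of candidate 3D points (for example exploration frontiers) supplied by the mapping module. Each visible point is projected into the image and given a numbered marker, and the number chosen by the multimodal model is mapped back to its stored 3D coordinate. The names `PinholeCamera`, `world_to_camera`, `build_labeled_markers`, and `goal_from_model_answer` are hypothetical; this is not the authors' implementation.

```python
# Hypothetical sketch of a space-to-image labeling and label-to-space inverse
# mapping under a pinhole-camera assumption; not the authors' implementation.
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point3 = Tuple[float, float, float]
Matrix3 = List[List[float]]


@dataclass
class PinholeCamera:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point (x)
    cy: float  # principal point (y)
    width: int
    height: int

    def project(self, p_cam: Point3) -> Optional[Tuple[float, float]]:
        """Project a camera-frame point (z forward) to pixel (u, v), or None if not visible."""
        x, y, z = p_cam
        if z <= 0.0:
            return None  # point is behind the camera
        u = self.fx * x / z + self.cx
        v = self.fy * y / z + self.cy
        if 0.0 <= u < self.width and 0.0 <= v < self.height:
            return (u, v)
        return None  # point falls outside the image


def world_to_camera(p_world: Point3, cam_pos: Point3, r_wc: Matrix3) -> Point3:
    """p_cam = R_wc^T (p_world - cam_pos); columns of r_wc are the camera axes in the world frame."""
    d = [p_world[i] - cam_pos[i] for i in range(3)]
    x, y, z = (sum(r_wc[i][j] * d[i] for i in range(3)) for j in range(3))
    return (x, y, z)


def build_labeled_markers(
    candidates: List[Point3], cam: PinholeCamera, cam_pos: Point3, r_wc: Matrix3
) -> Tuple[List[Tuple[int, float, float]], Dict[int, Point3]]:
    """Give each visible 3D candidate a numeric label.

    markers: (label, u, v) tuples to be drawn on the image sent to the model.
    lookup:  label -> 3D point, used to map the model's answer back to space.
    """
    markers: List[Tuple[int, float, float]] = []
    lookup: Dict[int, Point3] = {}
    for p in candidates:
        uv = cam.project(world_to_camera(p, cam_pos, r_wc))
        if uv is None:
            continue  # invisible candidates get no label
        label = len(lookup) + 1
        markers.append((label, uv[0], uv[1]))
        lookup[label] = p
    return markers, lookup


def goal_from_model_answer(label: int, lookup: Dict[int, Point3]) -> Optional[Point3]:
    """Inverse step: the label chosen by the model becomes a 3D goal (None if invalid)."""
    return lookup.get(label)
```

For example, with fx = fy = 500, cx = 320, cy = 240, an identity rotation, and the camera at the origin, the candidate (0, 0, 5) projects onto the principal point (320, 240) and receives label 1, while points behind the camera or outside the 640×480 frame receive no label. Keeping the 3D coordinates in a lookup table means the model only has to choose among labeled regions instead of regressing coordinates; whether the paper works this way is not stated in the abstract.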

History
  • Received: 2025-12-25
  • Last revised: 2026-05-10
  • Accepted: 2026-05-13