Efficient RGB-T object tracking network with dual-template cross-modality interaction and foreground selection

CLC number:

TP391.4

Fund Project:

Science and Technology Project of the Heilongjiang Provincial Department of Transportation (HJK2024B002); Key Project of the Heilongjiang Province "Outstanding Young Teachers Basic Research Support Program" (YQJH2024064).

    Abstract:

    Exploiting the complementarity between the visible (RGB) and thermal infrared (TIR) modalities can overcome the limitations of single-modal tracking in adverse environments. However, existing RGB-T tracking methods do not fully exploit inter-modal information, and introducing an additional modality increases the computational cost. To address this, we propose an efficient RGB-T tracking network with dual-template cross-modality interaction and foreground selection. First, the template images of the two modalities are fused to build a fused-template branch; the fused template features, together with the template features of each modality, serve as a bridge for cross-modal interaction, which mitigates the insufficient use of the two modalities caused by the offset between image centers in different modalities. Second, a Transformer encoder is built on polarity-aware linear attention (PolaFormer), which reduces the computational cost of the multi-head attention in the Vision Transformer (ViT) and improves efficiency. Third, a foreground selection module is built from the attention returned by the polarity-aware linear attention; it discards irrelevant background features, improving tracking accuracy while reducing the computation spent on background. Experiments show that the proposed network achieves a success rate of 57.1% and a precision of 71.2% on the LasHeR dataset, improvements of 1.1% and 1.5%, respectively, over the template-bridged search region interaction algorithm (TBSI), with a tracking speed 3.5% higher than that of TBSI, indicating good performance on the RGB-T tracking task.
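
    The two efficiency ideas named in the abstract, linear attention and attention-guided foreground selection, can be illustrated with a minimal sketch. This is not the authors' implementation: the feature map `elu(x) + 1` is a generic choice for kernelized linear attention and stands in for the polarity-aware map, and the function names, the `keep_ratio` parameter, and the scoring rule (mean template-to-search similarity) are all assumptions made for illustration.

```python
import math

def feature_map(x):
    # elu(x) + 1: a positive feature map often used for linear attention.
    # Assumption: this is a stand-in, not the paper's polarity-aware map.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def linear_attention(queries, keys, values):
    """Attention that never builds the N x N score matrix.

    Keys and values are summarized once into a d x d_v matrix, so the
    cost grows linearly with the number of tokens.
    """
    phi_k = [feature_map(k) for k in keys]
    d, d_v = len(phi_k[0]), len(values[0])
    # kv[a][b] = sum_j phi(k_j)[a] * v_j[b]
    kv = [[sum(pk[a] * v[b] for pk, v in zip(phi_k, values))
           for b in range(d_v)] for a in range(d)]
    # k_sum[a] = sum_j phi(k_j)[a], used for normalization
    k_sum = [sum(pk[a] for pk in phi_k) for a in range(d)]
    out = []
    for q in queries:
        pq = feature_map(q)
        norm = dot(pq, k_sum)  # strictly positive because phi > 0
        out.append([sum(pq[a] * kv[a][b] for a in range(d)) / norm
                    for b in range(d_v)])
    return out

def select_foreground(template_queries, search_keys, keep_ratio=0.5):
    """Return sorted indices of the search tokens that the template
    tokens attend to most (hypothetical scoring rule)."""
    phi_q = [feature_map(q) for q in template_queries]
    scores = []
    for k in search_keys:
        pk = feature_map(k)
        scores.append(sum(dot(pq, pk) for pq in phi_q) / len(phi_q))
    n_keep = max(1, math.ceil(keep_ratio * len(search_keys)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])
```

    In this sketch, only the search tokens returned by `select_foreground` would be passed to later encoder layers, so the tokens treated as background stop contributing to the computation.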

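Cite this article

LIU Changyuan, FAN Peidong, LAN Chaofeng. Efficient RGB-T object tracking network with dual-template cross-modality interaction and foreground selection [J]. Control and Decision, 2025, 40(12): 3725-3733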

History
  • Received: 2025-05-13
  • Published online: 2025-11-10
  • Publication date: 2025-12-10