Transformer visual object tracking algorithm based on mixed attention
CSTR:
Author:
Affiliation:

1. School of Computer, Xi'an University of Posts & Telecommunications, Xi'an 710121, China; 2. School of Communication and Information Engineering, Xi'an University of Posts & Telecommunications, Xi'an 710121, China

Author profile:

Corresponding author:

E-mail: hzq@xupt.edu.cn.

CLC number:

TP391.4

Fund project:

National Natural Science Foundation of China (62072370).




Abstract:

Transformer-based visual object tracking algorithms can capture the global information of the target well, but there is still room for improvement in how target features are represented. To strengthen the representation of target features, a Transformer visual object tracking algorithm based on mixed attention is proposed. First, a mixed attention module is introduced to capture target features in both the spatial and channel dimensions, modeling the contextual dependencies among target features. Second, the feature maps are sampled by multiple parallel dilated convolutions with different dilation rates to obtain multi-scale image features and enhance local feature representation. Finally, the constructed convolutional position encoding layer is added to the Transformer encoder to provide the tracker with accurate, length-adaptive position encoding, thereby improving localization accuracy. Extensive experiments on the OTB100, VOT2018 and LaSOT datasets show that learning the relationships among features with the mixed-attention-based Transformer network yields better representations of target features. Compared with other mainstream object tracking algorithms, the proposed algorithm achieves better tracking performance at a real-time speed of 26 frames per second.
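As a rough illustration of the mixed attention idea in the abstract, the sketch below applies a channel-wise gate followed by a spatial gate to a feature map. This is a simplified, parameter-free stand-in: the paper's actual module (learned projections, integration with the Transformer backbone) is not specified in this abstract, and all function names here are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    """Weight each channel by a gate computed from its global average.
    x: feature map of shape (C, H, W)."""
    pooled = x.mean(axis=(1, 2))      # (C,) global average pooling
    gate = sigmoid(pooled)            # per-channel gate in (0, 1)
    return x * gate[:, None, None]    # rescale each channel

def spatial_attention(x):
    """Weight each spatial position by a gate computed across channels."""
    pooled = x.mean(axis=0)           # (H, W) cross-channel average
    gate = sigmoid(pooled)            # per-position gate in (0, 1)
    return x * gate[None, :, :]       # rescale each position

def mixed_attention(x):
    """Apply channel attention, then spatial attention (illustrative order)."""
    return spatial_attention(channel_attention(x))

feat = np.random.randn(8, 16, 16)     # toy (C, H, W) feature map
out = mixed_attention(feat)
print(out.shape)                      # (8, 16, 16): shape is preserved
```

Because both gates lie in (0, 1), the module only reweights the input, leaving the feature-map shape unchanged, which is what lets it be dropped into a tracking backbone.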

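The multi-scale sampling step, parallel dilated convolutions with different dilation rates, can likewise be sketched as follows. The kernel, the rate set (1, 2, 4), and the single-channel setup are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """2-D dilated convolution (single channel, zero padding, stride 1).
    A k x k kernel is applied with its taps spaced `rate` pixels apart,
    enlarging the receptive field without adding parameters."""
    k = kernel.shape[0]
    pad = rate * (k // 2)                 # "same" padding for the dilated kernel
    xp = np.pad(x, pad)
    h, w = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * xp[i * rate : i * rate + h,
                                     j * rate : j * rate + w]
    return out

def multi_scale(x, rates=(1, 2, 4)):
    """Sample the feature map with parallel dilated convolutions and
    stack the responses, in the spirit of ASPP-style multi-scale fusion."""
    kernel = np.full((3, 3), 1.0 / 9.0)   # toy 3x3 averaging kernel
    return np.stack([dilated_conv2d(x, kernel, r) for r in rates])

feat = np.random.randn(16, 16)
out = multi_scale(feat)
print(out.shape)  # (3, 16, 16): one response map per dilation rate
```

Each branch sees the same input at a different effective receptive field, so concatenating (here, stacking) the branches combines fine local detail with wider context.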
Cite this article:

HOU Zhiqiang, GUO Fan, YANG Xiaolin, et al. Transformer visual object tracking algorithm based on mixed attention[J]. Control and Decision, 2024, 39(3): 739-748.

History
  • Received:
  • Revised:
  • Accepted:
  • Published online: 2024-02-25
  • Published in print: 2024-03-20