Abstract: To improve how tracking algorithms exploit historical frame information and represent target features, this paper proposes the Feature Enhancement and History Frame Selection based Transformer Visual Tracking (FEHST) algorithm. First, a Dynamic Prediction Module with a sparsification strategy is integrated into the backbone network, improving the computational efficiency of the self-attention mechanism and focusing it on the features of the target region. Second, a Feature Enhancement Module is introduced that merges local and global information to improve feature representation. Finally, an adaptive history frame selection strategy is adopted to strengthen the focus on target dynamics and improve the robustness of the algorithm. Experiments on the LaSOT, TrackingNet, GOT-10K, and OTB100 datasets validate the approach: it achieves success rates of 70.1%, 83.0%, and 71.6% on LaSOT, TrackingNet, and OTB100, respectively, and an average overlap of 71.4% on GOT-10K, while running at 27 FPS.