Real-time semantic segmentation of UAV urban street scenes with attention permutation and channel reconstruction

CLC number: TP391.4

Funding: National Natural Science Foundation of China (11804068); Science and Technology Project of the Heilongjiang Provincial Department of Transportation (HJK2024B002).

    Abstract:

    Lightweight algorithms for real-time semantic segmentation of UAV urban street scenes lack global information interaction, which leads to misclassified pixels. To address this, a real-time semantic segmentation network for UAV urban street scenes based on attention permutation and channel reconstruction is proposed; the network adopts an encoder-decoder structure. In the encoder, a lightweight permutation self-attention mechanism is used to build an attention branch that extracts global context information while maintaining high computational efficiency. A channel reconstruction module, designed with a split-transform-merge strategy, fuses and compresses the input of the attention branch, reducing both the computation spent on irrelevant features and their influence on the segmentation result. In the decoder, a spatial feature fusion block built on spatial weighting makes the fullest possible use of effective features, and a global information perception block built from the permutation self-attention mechanism and asymmetric convolution counters the interference of complex backgrounds in UAV aerial images. Experimental results show that the proposed model achieves a mean intersection over union (mIoU) of 72.3% on the UAVid validation set, an improvement of 2.3% over UNetFormer, at a segmentation speed of 105.8 frames per second. The model therefore achieves good segmentation accuracy while preserving segmentation speed.
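
The headline metric in the abstract, mean intersection over union, can be illustrated with a minimal sketch. This is the standard textbook definition computed from a confusion matrix, not the authors' evaluation code; the helper names `confusion_matrix` and `mean_iou`, the toy labels, and the choice to skip classes absent from both ground truth and prediction are assumptions made for illustration.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels per (ground-truth class, predicted class) pair."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average per-class IoU = TP / (TP + FP + FN) over the classes present.

    Rows index the ground-truth class, columns the predicted class.
    Classes with no ground-truth and no predicted pixels are skipped
    (an assumption; evaluation protocols differ on this point).
    """
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # class c pixels predicted as another class
        fp = sum(matrix[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six flattened pixel labels over three hypothetical classes.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(confusion_matrix(truth, pred, 3))
```

In this toy case the per-class IoU values are 1/3, 2/3 and 1/2. A reported figure such as 72.3% is this same average expressed as a percentage and accumulated over every pixel of the validation images.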

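Cite this article

Liu Changyuan (柳长源), Guo Penggang (郭鹏岗), Lan Chaofeng (兰朝凤). Real-time semantic segmentation of UAV urban street scenes with attention permutation and channel reconstruction[J]. Control and Decision (控制与决策), 2025, 40(4): 1198-1206.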

History
  • Received: 2024-04-27
  • Published online: 2025-03-21
  • Publication date: 2025-04-20