Abstract: Lightweight algorithms for real-time semantic segmentation of urban street scenes captured by unmanned aerial vehicles (UAVs) lack global information interaction, which causes pixel categories to be misclassified. To address this problem, a real-time semantic segmentation method for UAV urban street scenes based on attention permutation and channel reconstruction is proposed, adopting an encoder-decoder structure. In the encoder, a lightweight permutation self-attention mechanism is used to construct an attention branch that extracts global context information while maintaining high computational efficiency. Following the split-transform-merge strategy, a channel reconstruction module is designed to fuse and compress the input of the attention branch, reducing both the computational cost introduced by irrelevant features and their impact on the segmentation results. In the decoder, a spatial feature fusion block is constructed using spatially weighted fusion to maximize the utilization of effective features. Moreover, a permutation self-att