Abstract: Existing lightweight object detection algorithms suffer from inadequate detection accuracy, weak feature fusion capability, and suboptimal inference speed. To address these limitations, this paper proposes a lightweight object detection algorithm based on spatial group-wise involution, built upon the YOLOv8n framework. First, a novel spatial group-wise involution (SGWInvo) is introduced to enhance spatial information modeling and overcome the limitations of the standard involution operation. Second, based on SGWInvo, a lightweight backbone network named SGWInvo and Conv Net (SCNet) is designed to replace the original YOLOv8n backbone. Third, a dual path aggregation network (DPAN) is proposed to strengthen feature fusion for multi-scale objects. Finally, depth-wise separable convolutions are used to lighten the detection head, and a step-by-step training strategy, YOLO2YOLO, is applied to eliminate the inference latency caused by non-maximum suppression (NMS). Two detection methods are presented: SGWInvo-YOLO, which uses one-to-many matching, and SGWInvo-YOYO, which uses one-to-one matching. Experiments on the COCO dataset show that, compared to YOLOv8n, both proposed algorithms reduce the parameter count by 23.3%. SGWInvo-YOLO achieves comparable inference speed with a 3.0% improvement in mAP@0.5, while SGWInvo-YOYO reduces inference latency by 10.5% and improves mAP@0.5 by 2.3%.
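
To illustrate why depth-wise separable convolutions lighten a detection head, the following minimal sketch compares the weight count of a standard convolution with that of a depth-wise convolution followed by a 1 x 1 point-wise convolution. The function names and the layer sizes (64 input channels, 64 output channels, 3 x 3 kernel) are hypothetical values chosen for illustration; they are not taken from the proposed network, and bias terms are omitted.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def dw_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depth-wise k x k convolution followed by a
    1 x 1 point-wise convolution (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only (not from the paper).
c_in, c_out, k = 64, 64, 3

standard = conv_params(c_in, c_out, k)
separable = dw_separable_params(c_in, c_out, k)
reduction = 1 - separable / standard

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"reduction: {reduction:.1%}")
```

For this single hypothetical layer the separable form needs 4,672 weights instead of 36,864. The saving for a full network is smaller, because only some layers are replaced.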