Abstract: To address the low accuracy of existing object detectors on small objects, this paper proposes SODet, a one-stage small object detector that adaptively fuses global and local image features. First, a Transformer and a convolutional neural network (CNN) are combined to construct a backbone network that extracts the global and local information of the image, respectively, and the adaptive feature selection (AFS) module is used to fuse the outputs of the Transformer and the CNN. Second, extra-scale feature maps are used in the feature fusion network, where a large object restraint unit constrains the expression of large-object features and passes on small-object features; feature maps at four scales are then sent to the prediction network. Finally, EIoU and Focal loss are used in the loss function to optimize small object detection. Experimental results show that SODet achieves an APS (average precision on small objects) of 31.5% on the MS COCO validation set, which is competitive with other algorithms, while also offering a higher inference speed.
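For readers unfamiliar with the two loss terms named above, the following is a minimal sketch of the standard EIoU and Focal loss formulations for a single box and a single binary prediction. It is not SODet's implementation: the function names, the (x1, y1, x2, y2) box format, the `eps` guard, and the defaults `alpha=0.25` and `gamma=2.0` are assumptions chosen for illustration, and how the paper weights or combines the two terms is not stated in the abstract.

```python
import math


def eiou_loss(box, gt, eps=1e-9):
    """Standard EIoU loss for two boxes given as (x1, y1, x2, y2).

    Illustrative only; the box format and eps guard are assumptions.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)

    # Squared centre distance, normalised by the enclosing-box diagonal.
    dx = (x1 + x2) / 2 - (gx1 + gx2) / 2
    dy = (y1 + y2) / 2 - (gy1 + gy2) / 2
    centre_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height differences, normalised by the enclosing box sides.
    width_term = (w - gw) ** 2 / (cw * cw + eps)
    height_term = (h - gh) ** 2 / (ch * ch + eps)

    return 1.0 - iou + centre_term + width_term + height_term


def focal_loss(p, target, alpha=0.25, gamma=2.0, eps=1e-9):
    """Standard binary focal loss for one predicted probability p.

    alpha and gamma are commonly used defaults, not values from this paper.
    """
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

The EIoU terms penalise centre offset and width and height mismatch separately, which still gives a useful signal for tiny boxes with little overlap, while the `(1 - p_t) ** gamma` factor in Focal loss down-weights easy examples so that hard ones dominate the classification loss.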