Abstract: In YOLOv5 object detection, the pyramid structure does not make effective use of cross-scale information, and gradients vanish as the network deepens; both problems limit detection accuracy. To address the first problem, this paper introduces the FPT (Feature Pyramid Transformer) structure to improve the FPN (Feature Pyramid Networks) and PAN (Path Aggregation Network) structures of the original YOLOv5 network model. Its attention mechanism extracts the network's cross-scale features effectively and thereby improves detection accuracy. To counter gradient vanishing in the deepened network, skip connections are added at both ends of the FPT structure, which improves the network's object detection ability while also transferring salient features. The Mish activation function is further introduced to improve detection accuracy. Combining these structures, the fs-yolov5 network model is proposed. Experimental results on the Pascal VOC and MS COCO datasets show that this algorithm achieves higher detection accuracy than YOLOv5.
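Two of the building blocks named above can be illustrated in isolation. The sketch below shows the standard Mish activation, Mish(x) = x · tanh(softplus(x)), and a generic additive skip connection y = x + f(x) wrapped around an arbitrary block. It is a minimal, framework-free sketch on plain Python lists. The helper name `with_skip` is hypothetical, and element-wise addition is an assumption: fs-yolov5 may fuse features around the FPT module differently (for example, by concatenation).

```python
import math
from typing import Callable, List

Vector = List[float]


def softplus(x: float) -> float:
    """Numerically stable softplus, ln(1 + e^x), safe for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x)); smooth and non-monotonic."""
    return x * math.tanh(softplus(x))


def with_skip(block: Callable[[Vector], Vector]) -> Callable[[Vector], Vector]:
    """Wrap `block` with an additive skip (identity) connection: y = x + block(x).

    Hypothetical helper: element-wise addition is an assumption, not
    necessarily the fusion used around the FPT module in fs-yolov5.
    """
    def wrapped(x: Vector) -> Vector:
        out = block(x)
        if len(out) != len(x):
            # An identity shortcut can only be added to an output of the same shape.
            raise ValueError("skip connection requires matching shapes")
        return [a + b for a, b in zip(x, out)]

    return wrapped


if __name__ == "__main__":
    # Toy "block" that just applies Mish element-wise.
    layer = with_skip(lambda v: [mish(t) for t in v])
    print(layer([0.0, 1.0, -1.0]))
```

Because the identity path adds the input straight to the block's output, the gradient of y with respect to x always includes an identity term. This is the usual reason skip connections are used to mitigate vanishing gradients in deeper networks.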