Abstract: To address the detection challenges posed by small target sizes, blurred texture features, and dense target distributions in aerial imagery, this research proposes a detection model based on an improved YOLO architecture, named drone imagery YOLO (DI-YOLO). Current mainstream detection methods show significant deficiencies in preserving the structural information of small targets and in multi-scale feature extraction and fusion. We therefore construct a content-aware reassembly of features (CARAFE) module that achieves adaptive fusion of cross-level features through a dynamic feature selection mechanism; design a parallel heterogeneous feature modulator (PHFM) that effectively coordinates global context modeling with local detail features; and introduce a shape-aware intersection over union (Shape-IoU) loss function together with a tiny object detection head to further enhance bounding box regression accuracy and small target detection capability. In comparative experiments on the VisDrone2019 and DOTAv1.5 benchmark datasets, the proposed model achieves significant improvements over the baseline YOLOv10 model: on VisDrone2019, the mAP@0.5 and mAP@0.5:0.95 metrics improve by 12.7% and 13.7%, respectively, while on DOTAv1.5 the corresponding improvements are 12.1% and 10.2%, with computational efficiency maintained. Ablation experiments further verify the effectiveness of each module, providing a new solution for high-precision object detection in aerial scenes.