Abstract: Defect segmentation based on deep learning is crucial for ensuring production efficiency and improving product quality. However, in many industrial applications, large-scale defect samples cannot be collected, causing a sharp decline in the performance of traditional detection methods. In addition, defect regions are often small, weakly textured, and low in contrast with non-defect regions, which hinders the application of visual detection techniques. This paper proposes a multi-modal few-shot defect segmentation method based on RGB images and point clouds. Cross-modal attention aggregates RGB semantic information and point-cloud structural information to achieve efficient fusion of the two modalities. Basic foreground prototypes, adaptive background prototypes, and forgetting-compensation prototypes are then generated from the multi-modal features and support masks to improve representation ability; prototypes are dynamically matched to query features according to similarity, and, after feature enrichment, defects of unseen objects are segmented effectively. Experiments on two few-shot defect segmentation datasets, Defect-3i and MVTec 3D-2i, show that the mean Intersection-over-Union (mIoU) of the proposed method exceeds that of other advanced algorithms by 0.11% and 0.20% on Defect-3i and by 5.23% and 5.10% on MVTec 3D-2i in the 1-shot and 5-shot settings, respectively, verifying the soundness of the proposed few-shot architecture and the effectiveness of the multi-modal network.
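To make the described pipeline concrete, the sketch below illustrates, under assumed shapes and names, how the two mechanisms named in the abstract might look in PyTorch: cross-modal attention that lets RGB tokens attend to point-cloud tokens, and masked average pooling that turns fused support features into a foreground prototype matched to the query by cosine similarity. The class CrossModalAttention, the function foreground_prototype, and all dimensions are hypothetical illustrations, not the authors' actual implementation.

```python
# Minimal sketch (assumed shapes/names, not the paper's code) of cross-modal
# attention fusion plus prototype generation and similarity matching.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttention(nn.Module):
    """RGB tokens attend to point-cloud tokens to absorb structural cues."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_tokens: torch.Tensor, pc_tokens: torch.Tensor) -> torch.Tensor:
        # rgb_tokens: (B, N_rgb, C) flattened RGB features (queries)
        # pc_tokens:  (B, N_pc, C)  point-cloud features (keys/values)
        fused, _ = self.attn(rgb_tokens, pc_tokens, pc_tokens)
        return self.norm(rgb_tokens + fused)  # residual keeps semantic content

def foreground_prototype(feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Masked average pooling: one prototype vector per support image.
    # feat: (B, C, H, W) fused support features; mask: (B, 1, H, W) in {0, 1}.
    return (feat * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1e-6)

# Toy usage: fuse modalities, build a prototype, score a query map by similarity.
B, C, H, W = 2, 64, 16, 16
fusion = CrossModalAttention(dim=C)
rgb = torch.randn(B, H * W, C)                     # flattened RGB feature map
pts = torch.randn(B, 128, C)                       # 128 point-cloud tokens
support = fusion(rgb, pts).transpose(1, 2).reshape(B, C, H, W)
proto = foreground_prototype(support, torch.randint(0, 2, (B, 1, H, W)).float())
query = torch.randn(B, C, H, W)
score = F.cosine_similarity(query, proto[:, :, None, None], dim=1)  # (B, H, W)
```

In this reading, the adaptive background and forgetting-compensation prototypes would be built by the same masked-pooling pattern over different mask regions; the abstract does not specify those details, so they are omitted here.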