Abstract: In camouflaged object detection, most previous studies on feature fusion have simply integrated multi-level features while neglecting the differences between them. In this paper, a Global Context Interaction Fusion Network is proposed for camouflaged object detection, which employs an improved Pyramid Vision Transformer (PVTv2) as the backbone to extract global context information at multiple scales. First, a Boundary Enhancement Module is designed to focus on the structural details of camouflaged objects and extract their edge features. Second, inspired by the hunting mechanisms of animals, a Feature Fusion Decoder Module is proposed that provides position cues for potential objects and produces a coarse localization map. Finally, a Global Context Aggregation Module is constructed to fully integrate multi-level information and reduce information loss during feature aggregation. Extensive experiments on four publicly available datasets demonstrate that our method surpasses 17 other state-of-the-art models under four evaluation metrics.