Abstract: In open-set object detection, contrastive learning methods are commonly used to enforce intra-class compactness and inter-class separability. However, they often fail to account adequately for the distribution of unknown objects, which leads to misclassification when the features of known and unknown objects are similar. Fundamentally, this problem has two sources: feature confusion between known and unknown objects in both visual and semantic representations under open-set conditions, and the model's tendency to make overconfident predictions near decision boundaries. To address this, we propose a novel network framework that explores sample uncertainty under multi-modal guidance, treating semantic representation and discriminative confidence as complementary dimensions. Specifically, a region generation module first produces a large number of category-agnostic candidate regions. A region-text matching module then uses the text modality to construct a region-text alignment loss, which explicitly separates known and unknown classes in the feature space under multi-modal guidance. At the same time, a visual feature contrastive loss further compacts the semantic clusters, establishing tight boundaries for known classes. On this basis, to suppress overconfident predictions near boundaries and identify potential unknown objects, we develop a pseudo-unknown sample mining mechanism guided by region-text matching scores. This mechanism uses attribution gradients to estimate the feature uncertainty of each candidate region and calibrates this estimate with visual localization quality.
High-quality pseudo-unknown samples are then selected, enabling the construction of adaptive boundaries between known and unknown classes. Experimental results show that, compared with current state-of-the-art methods, the proposed method improves the average precision on unknown classes by 165.14% on the VOC-COCO-60 benchmark, validating its effectiveness and superiority.
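The abstract does not give the exact formulas for the pseudo-unknown mining step, so the following is only a minimal sketch under stated assumptions of how such a mechanism could be organized. All names (`region_text_scores`, `gradient_uncertainty`, `mine_pseudo_unknowns`), the temperature `tau=0.07`, and the threshold `match_thresh` are hypothetical. As a stand-in for the attribution gradient, it uses the norm of the gradient of the top-class log-probability with respect to the region feature. For softmax over cosine logits this gradient has the closed form (t_k − Σ_c p_c t_c)/τ, so no autograd library is needed. Regions that match a known-class text embedding well are skipped. The remaining regions are scored as gradient uncertainty × localization quality, and the top ones are kept.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """L2-normalize a vector (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def region_text_scores(feature, text_embeds):
    """Cosine similarity between one region feature and each known-class text embedding."""
    f = _normalize(feature)
    return [sum(a * b for a, b in zip(f, _normalize(t))) for t in text_embeds]


def gradient_uncertainty(feature, text_embeds, tau=0.07):
    """Norm of d log p_k / d f for the top class k, with p = softmax(cos / tau).

    Assumption: the gradient is taken w.r.t. the L2-normalized feature, where it
    equals (t_k - sum_c p_c t_c) / tau. Confident regions yield near-zero
    gradients; ambiguous regions near a decision boundary yield larger ones.
    """
    sims = region_text_scores(feature, text_embeds)
    probs = _softmax([s / tau for s in sims])
    k = max(range(len(probs)), key=probs.__getitem__)  # top (predicted) class
    texts = [_normalize(t) for t in text_embeds]
    dim = len(texts[0])
    expected = [sum(p * t[i] for p, t in zip(probs, texts)) for i in range(dim)]
    grad = [(texts[k][i] - expected[i]) / tau for i in range(dim)]
    return math.sqrt(sum(g * g for g in grad))


def mine_pseudo_unknowns(features, loc_quality, text_embeds,
                         match_thresh=0.5, top_k=2, tau=0.07):
    """Return indices of candidate regions selected as pseudo-unknowns (hypothetical rule).

    1. Skip regions whose best region-text match to a known class is >= match_thresh.
    2. Score each remaining region by gradient uncertainty x localization quality.
    3. Keep the top_k highest-scoring regions.
    """
    scored = []
    for idx, (feat, q) in enumerate(zip(features, loc_quality)):
        if max(region_text_scores(feat, text_embeds)) >= match_thresh:
            continue  # well explained by a known class, so not a pseudo-unknown
        score = gradient_uncertainty(feat, text_embeds, tau) * q
        scored.append((score, idx))
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:top_k]]


# Usage: two known classes; region 0 matches class 0, regions 1 and 2 match neither.
known_text = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
regions = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
loc = [0.95, 0.9, 0.1]  # e.g. predicted IoU of each candidate box (assumed available)
print(mine_pseudo_unknowns(regions, loc, known_text, top_k=1))  # → [1]
```

Multiplying by localization quality reflects the abstract's idea of calibrating uncertainty with visual localization. Among equally uncertain regions, this favors well-localized boxes, which are more likely to be real unknown objects than background fragments.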