Uncertainty mining-based open-set object detection with multimodal guidance

Affiliation: Jiangnan University

CLC number: T18

Fund Project: The National Natural Science Foundation of China (General Program, Key Program, Major Program)

    Abstract:

    In open-set object detection, contrastive learning is commonly used to enforce intra-class compactness and inter-class separability. However, these methods do not adequately account for the distribution of unknown objects, so misclassification occurs when known and unknown objects have similar features. Fundamentally, the problem stems from feature confusion between known and unknown objects at the visual and semantic representation levels in open-set scenarios, and from the model's tendency to make overconfident predictions near decision boundaries. To address this, we approach the problem from two complementary dimensions, semantic representation and discriminative confidence, and propose a novel network framework that mines sample uncertainty under multimodal guidance. Specifically, the network first uses a region generation module to produce a large number of class-agnostic candidate regions. Next, a region-text matching module introduces the text modality to build a region-text alignment loss, which explicitly separates known and unknown classes in the feature space under multimodal guidance. At the same time, a visual feature contrastive loss further tightens the semantic clusters, yielding compact boundaries for known classes. On this basis, to suppress overconfident predictions near boundaries and to identify potential unknown objects, a pseudo-unknown sample mining mechanism based on dual uncertainty is built under the guidance of region-text matching scores. The mechanism draws on the idea of attribution gradients to design a feature uncertainty estimator that computes an uncertainty value for each candidate box, and calibrates that value with visual localization quality. High-quality pseudo-unknown samples are then selected and used to construct an adaptive threshold boundary between known and unknown classes. Experimental results show that, compared with the current state of the art, the proposed method improves the average precision of unknown classes by 165.14% on the VOC-COCO-60 test set, validating its effectiveness and superiority.
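The selection step described in the abstract can be illustrated with a small sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the `Candidate` fields, the geometric-mean calibration, the `match_ceiling` cutoff, and the mean-plus-`k`-standard-deviations threshold are hypothetical stand-ins, not the authors' method. The attribution-gradient uncertainty estimator itself is not implemented here; its output is simply taken as an input value.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Candidate:
    """One class-agnostic region proposal. All fields are hypothetical."""
    name: str
    match_score: float   # best region-text matching score against known classes, in [0, 1]
    uncertainty: float   # feature uncertainty value (assumed precomputed), in [0, 1]
    loc_quality: float   # visual localization quality, in [0, 1]


def calibrated_uncertainty(c: Candidate) -> float:
    # Assumed calibration: a geometric mean, so that a poorly localized box
    # cannot obtain a high score from feature uncertainty alone.
    return math.sqrt(c.uncertainty * c.loc_quality)


def mine_pseudo_unknowns(cands, match_ceiling=0.5, k=0.5):
    """Return candidates that match no known class well and whose calibrated
    uncertainty exceeds a threshold adapted to the current batch."""
    scores = [calibrated_uncertainty(c) for c in cands]
    # Assumed adaptive threshold: batch mean plus k population standard deviations.
    threshold = mean(scores) + k * pstdev(scores)
    return [
        c for c, s in zip(cands, scores)
        if c.match_score < match_ceiling and s > threshold
    ]


# Toy batch: two confident known objects, one background box, one unfamiliar object.
candidates = [
    Candidate("known_dog", match_score=0.92, uncertainty=0.10, loc_quality=0.90),
    Candidate("unfamiliar_object", match_score=0.30, uncertainty=0.81, loc_quality=0.81),
    Candidate("background_patch", match_score=0.20, uncertainty=0.90, loc_quality=0.10),
    Candidate("known_cat", match_score=0.85, uncertainty=0.16, loc_quality=0.64),
]
selected = mine_pseudo_unknowns(candidates)
print([c.name for c in selected])
```

In this toy batch the background patch has high raw uncertainty but low localization quality, so calibration pushes it below the threshold. That is the role the abstract assigns to localization quality, although the actual calibration rule may differ.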

History
  • Received: 2025-09-26
  • Last revised: 2026-03-30
  • Accepted: 2026-03-31
  • Published online: 2026-04-10