Transductive zero-shot image classification based on self-supervised enhancement feature

Affiliation:

School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China

Corresponding author:

E-mail: chengyuhu@163.com

CLC number:

TP18

Fund project:

National Natural Science Foundation of China (62176259, 61976215); Natural Science Foundation of Jiangsu Province (BK20221116); Jiangsu Funding Program for Excellent Postdoctoral Talent (2022ZB530).

    Abstract:

    Visual features play a crucial role in zero-shot image classification. Although deep features extracted by networks such as VGG, GoogLeNet, and ResNet are widely used in image classification, their performance on zero-shot image classification is not ideal and leaves considerable room for improvement. In addition, because the training and test sets are disjoint in the zero-shot learning setting, the classification network inevitably suffers from domain shift. Therefore, a transductive zero-shot image classification framework based on self-supervised enhancement features is proposed. First, pseudo-labels are constructed through an auxiliary task, and self-supervised learning is used to obtain self-supervised image features, which are then fused with the unsupervised deep features. Next, the fused features are embedded into the semantic space for zero-shot image classification, yielding initial predicted labels for the unseen classes. Finally, the unseen-class features and their predicted labels are used to iteratively optimize the visual-semantic mapping. The components of the framework are interchangeable; here, CFN, VGG16, and PCA serve as the self-supervised network, the backbone network, and the dimensionality-reduction network, respectively. Experiments on the CUB, SUN, and AwA2 datasets show that the proposed network enhances the discriminative capability of the features and performs well on zero-shot image classification.
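
The three stages summarized in the abstract (feature fusion, embedding into the semantic space, and iterative refinement on unseen-class data) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: features are precomputed NumPy arrays (the CFN, VGG16, and PCA stages are omitted), fusion is L2-normalized concatenation, the visual-semantic mapping is a ridge regression, and classification picks the class prototype with the highest cosine similarity. All function names and the `reg` and `n_iter` parameters are hypothetical.

```python
import numpy as np


def _l2_normalize(x):
    """Scale each row to unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)


def fuse_features(deep_feats, ssl_feats):
    """Assumed fusion rule: normalize each feature set, then concatenate."""
    return np.hstack([_l2_normalize(deep_feats), _l2_normalize(ssl_feats)])


def fit_mapping(X, S, reg=1.0):
    """Ridge regression: W minimizing ||X W - S||^2 + reg * ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ S)


def predict(X, W, prototypes):
    """Label each sample with the most cosine-similar class prototype."""
    embedded = _l2_normalize(X @ W)
    protos = _l2_normalize(prototypes)
    return np.argmax(embedded @ protos.T, axis=1)


def transductive_zsl(X_seen, y_seen, seen_protos, X_unseen, unseen_protos,
                     n_iter=5, reg=1.0):
    """Fit on seen classes, then refine with pseudo-labeled unseen samples."""
    # Initial mapping uses only labeled seen-class data.
    W = fit_mapping(X_seen, seen_protos[y_seen], reg)
    y_pred = predict(X_unseen, W, unseen_protos)
    for _ in range(n_iter):
        # Refit on seen data plus unseen data with its current pseudo-labels.
        X_all = np.vstack([X_seen, X_unseen])
        S_all = np.vstack([seen_protos[y_seen], unseen_protos[y_pred]])
        W = fit_mapping(X_all, S_all, reg)
        new_pred = predict(X_unseen, W, unseen_protos)
        if np.array_equal(new_pred, y_pred):
            break  # pseudo-labels stopped changing
        y_pred = new_pred
    return W, y_pred
```

Here `seen_protos` and `unseen_protos` stand for per-class semantic vectors (for example, attribute vectors), one row per class, and `y_seen` holds integer class indices into `seen_protos`. The loop stops early once the pseudo-labels no longer change, which is one simple stopping rule among several possible ones.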

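Cite this article

Wang Haoyu, Zhang Xinran, Wang Xuesong, et al. Transductive zero-shot image classification based on self-supervised enhancement feature[J]. Control and Decision, 2024, 39(5): 1707-1717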

History
  • Online publication date: 2024-04-17
  • Publication date: 2024-05-20