Abstract: Visual scene understanding encompasses detecting and recognizing objects, reasoning about the visual relationships among the detected objects, and describing image regions with sentences. To achieve a more comprehensive and accurate understanding of a scene image, we treat object detection, visual relationship detection, and image captioning as three visual tasks at different semantic levels of scene understanding, and propose an image understanding model based on multi-level semantic features that leverages the mutual connections across the three semantic levels to solve these tasks jointly. The model simultaneously iterates and updates the semantic features of objects, relationship phrases, and image captions through a message passing graph. The updated semantic features are used to classify objects and visual relationships and to generate scene graphs and captions, and a fusion attention mechanism is introduced to improve caption accuracy. Experimental results on the Visual Genome and COCO datasets show that the proposed method outperforms existing methods on both scene graph generation and image captioning.
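The abstract does not give the model's equations, so the following is only a rough sketch of the kind of cross-level message passing described: object, relationship-phrase, and caption features iteratively refine one another. The feature dimensions, the dot-product attention, and the residual update rule are all assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical; the paper's actual dimensions are not stated in the abstract).
d = 8              # shared feature dimension for all three semantic levels
n_obj, n_rel = 4, 3

# Semantic features at three levels: objects, relationship phrases, caption context.
obj = rng.normal(size=(n_obj, d))
rel = rng.normal(size=(n_rel, d))
cap = rng.normal(size=(1, d))

def attend(queries, keys):
    """Simple dot-product attention: each query aggregates messages from keys."""
    scores = queries @ keys.T                              # (nq, nk)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ keys                                  # (nq, d)

# A few iterations of message passing: each semantic level is updated with
# attention-pooled messages from the other two levels plus a residual term.
for _ in range(3):
    obj_new = obj + attend(obj, rel) + attend(obj, cap)
    rel_new = rel + attend(rel, obj) + attend(rel, cap)
    cap_new = cap + attend(cap, obj) + attend(cap, rel)
    obj, rel, cap = obj_new, rel_new, cap_new

# The refined features would then feed object/relationship classifiers,
# the scene graph, and the caption decoder.
print(obj.shape, rel.shape, cap.shape)
```

In a real implementation the messages would be produced by learned projections rather than raw dot-product attention, but the pattern of mutual updates across the three semantic levels is the same.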