Cross-modal interaction fusion grasping detection based on Transformer-CNN hybrid architecture
Affiliation:

1. School of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China; 2. College of Electronic and Information Engineering, Tongji University, Shanghai 200092, China

Corresponding author:

E-mail: ywang@cqut.edu.cn

CLC number:

TP29

    Abstract:

    In robotic grasp detection, the processing efficiency of RGB and depth images still leaves considerable room for improvement. This paper proposes a novel RGB-D cross-modal interactive fusion method for robotic grasp detection based on a Transformer-CNN hybrid architecture. To make full use of the feature information in RGB and depth images, an efficient cross-modal feature interaction fusion module is developed, which calibrates the corresponding features of the RGB and depth images and interactively enhances the features of both modalities. In addition, a network module that runs a Transformer and a CNN in parallel is designed; it combines the local modeling ability of the CNN with the global modeling ability of the Transformer to obtain a better feature representation and thus improve grasp detection performance. Experimental results show that the method achieves accuracies of 99.1% and 96.2% on the Cornell and Jacquard grasping datasets, respectively. Grasp detection experiments in real scenes verify that the proposed method can effectively predict the grasp pose of objects in various scenarios.
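The abstract does not specify how the cross-modal interaction fusion module is built. As a rough illustration only, the sketch below shows one common way two modalities can calibrate and enhance each other: each modality's channels are rescaled by gates computed from the other modality, with a residual connection, and the two results are summed. The function names (`cross_modal_fuse`, `channel_means`, `scale_residual`), the sigmoid gating, and the additive fusion are all assumptions made for this example, not the authors' design.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_means(feat):
    # feat is a list of channels; each channel is an H x W list of lists.
    # Returns one global-average value per channel.
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]

def scale_residual(feat, gates):
    # Residual enhancement (assumed form): x + gate * x, one gate per channel.
    return [[[v + g * v for v in row] for row in ch]
            for ch, g in zip(feat, gates)]

def cross_modal_fuse(rgb, depth):
    # Hypothetical interaction: gates for RGB come from depth statistics,
    # and gates for depth come from RGB statistics.
    rgb_gates = [sigmoid(m) for m in channel_means(depth)]
    depth_gates = [sigmoid(m) for m in channel_means(rgb)]
    rgb_enh = scale_residual(rgb, rgb_gates)
    depth_enh = scale_residual(depth, depth_gates)
    # Fuse the two enhanced feature maps by element-wise addition.
    return [[[a + b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(rgb_enh, depth_enh)]

if __name__ == "__main__":
    rgb = [[[1.0, 1.0], [1.0, 1.0]]]    # 1 channel, 2 x 2
    depth = [[[0.0, 0.0], [0.0, 0.0]]]  # 1 channel, 2 x 2
    print(cross_modal_fuse(rgb, depth))
```

In a real network the gates would be learned (for example, through small convolutional or fully connected layers) rather than taken directly from channel means; the toy version only makes the direction of information flow between the two modalities explicit.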

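Cite this article

Wang Yong (王勇), Li Yiling (李邑灵), Miao Duoqian (苗夺谦), et al. Cross-modal interaction fusion grasping detection based on Transformer-CNN hybrid architecture[J]. Control and Decision (控制与决策), 2024, 39(11): 3607-3616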

History
  • Published online: 2024-09-20
  • Publication date: 2024-11-20