CMCL-DFR: High-Accuracy 6D Pose Estimation for Industrial Small Objects in Cluttered Scenes
DOI:
CSTR:
Author:
Affiliation: Shenyang Institute of Automation, Chinese Academy of Sciences
CLC number: TP391
Fund project: National Natural Science Foundation of China (52505582); Natural Science Foundation of Liaoning Province (2024-BSBA-55).

Abstract:

Accurate 6D pose estimation is crucial for flexible manufacturing, robotic grasping, and intelligent assembly, but it still faces three major limitations. First, small objects are difficult to detect against complex backgrounds. Second, traditional registration methods are sensitive to initialization and have narrow convergence basins, while deep-learning methods generalize poorly to small industrial objects. Third, existing neural rendering approaches are designed primarily around RGB-D images to optimize view-synthesis quality, so they struggle to meet the industrial requirement of lossless, precise pose computation from high-accuracy point-cloud geometry. To address these challenges, a collaborative estimation framework termed "Cross-Modal Coarse Localization and Differentiated Fine Registration" (CMCL-DFR) is proposed. In the first stage, a Virtual Rendering-based Neural Pose Estimation (VR-NPE) method uses differentiable rendering to bridge the point cloud to the image domain, and a Geometry-aware Multi-Scale Network (GMS-Net) fuses multimodal features to make small-object detection and coarse localization more robust. In the second stage, a Pose-Guided Multi-Scale Geometric-Aware Registration (PG-MSGAR) method adaptively segments the point cloud into regions through curvature analysis, assigns differentiated constraint weights to regions of differing geometric saliency, and uses TEASER++ to suppress outliers, enabling high-precision pose refinement. Experiments on a self-built Industrial Parts Dataset (IPD) show that the proposed method achieves an Average Distance (ADD) error of 0.95 mm and a success rate of 91.8%, reducing the ADD error by 48.6% compared with FoundationPose.
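The ADD metric reported above is conventionally the mean distance between the object's model points placed by the ground-truth pose and by the estimated pose, ADD = (1/|M|) Σ_{x∈M} ‖(R x + t) − (R̂ x + t̂)‖. The following is a minimal pure-Python sketch of that definition plus a success-rate helper. The function names are hypothetical, and the acceptance threshold is an assumption: the abstract does not state which threshold defines "success" (a common convention elsewhere is 10% of the object diameter).

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def transform(R: Mat3, t: Vec3, p: Vec3) -> List[float]:
    """Apply the rigid transform p -> R p + t to a single 3-D point."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def add_error(model_pts: Sequence[Vec3], R_gt: Mat3, t_gt: Vec3,
              R_est: Mat3, t_est: Vec3) -> float:
    """ADD: mean Euclidean distance between each model point placed by the
    ground-truth pose and by the estimated pose (in the model's units, e.g. mm)."""
    total = 0.0
    for p in model_pts:
        total += math.dist(transform(R_gt, t_gt, p), transform(R_est, t_est, p))
    return total / len(model_pts)


def success_rate(errors: Sequence[float], threshold: float) -> float:
    """Fraction of test instances whose ADD error is strictly below `threshold`.
    The threshold is an assumption; the abstract does not specify it."""
    return sum(e < threshold for e in errors) / len(errors)
```

For example, an estimate that differs from the ground truth only by a 1 mm translation yields an ADD of exactly 1 mm for any model, because every point is displaced by the same vector.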
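The abstract describes PG-MSGAR only at a high level: curvature analysis segments the point cloud into regions, and each region receives a different constraint weight. As a hedged illustration of that idea, not the authors' implementation, the sketch below estimates per-point curvature with the widely used PCA "surface variation" λ_min / (λ_0 + λ_1 + λ_2) over the k nearest neighbours, splits points into flat, transition, and high-curvature regions using quantile (data-adaptive) thresholds, and maps each region to a weight. The quantiles 0.5/0.9, the weights (0.5, 1.0, 2.0), k = 8, and all function names are hypothetical choices; the TEASER++ outlier-rejection step is not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sym3_eigenvalues(c: List[List[float]]) -> List[float]:
    """Eigenvalues of a symmetric 3x3 matrix in ascending order
    (closed-form trigonometric solution)."""
    p1 = c[0][1] ** 2 + c[0][2] ** 2 + c[1][2] ** 2
    if p1 == 0.0:  # matrix is already diagonal
        return sorted([c[0][0], c[1][1], c[2][2]])
    q = (c[0][0] + c[1][1] + c[2][2]) / 3.0
    p = math.sqrt((sum((c[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1) / 6.0)
    # B = (C - qI) / p; half its determinant is the cosine of 3*phi
    b = [[(c[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    phi = math.acos(max(-1.0, min(1.0, det_b / 2.0))) / 3.0
    e_max = q + 2.0 * p * math.cos(phi)
    e_min = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return sorted([e_min, 3.0 * q - e_max - e_min, e_max])


def surface_variation(points: Sequence[Point], k: int = 8) -> List[float]:
    """Per-point surface variation from the k nearest neighbours (point included).
    0 on a perfect plane; larger values indicate edges, corners or strong curvature."""
    result = []
    for p in points:
        # Brute-force neighbour search keeps the sketch dependency-free
        nbrs = sorted(points, key=lambda q: math.dist(p, q))[:k]
        n = len(nbrs)
        mean = [sum(q[i] for q in nbrs) / n for i in range(3)]
        cov = [[sum((q[i] - mean[i]) * (q[j] - mean[j]) for q in nbrs) / n
                for j in range(3)] for i in range(3)]
        ev = sym3_eigenvalues(cov)
        total = sum(ev)
        result.append(max(ev[0], 0.0) / total if total > 0.0 else 0.0)
    return result


def region_weights(curv: Sequence[float], low_q: float = 0.5, high_q: float = 0.9,
                   weights: Tuple[float, float, float] = (0.5, 1.0, 2.0)):
    """Label points flat (0), transition (1) or high-curvature (2) using quantiles
    of the curvature distribution, and map each label to a (hypothetical) weight."""
    s = sorted(curv)
    lo = s[int(low_q * (len(s) - 1))]
    hi = s[int(high_q * (len(s) - 1))]
    labels = [0 if c <= lo else (2 if c > hi else 1) for c in curv]
    return labels, [weights[r] for r in labels]
```

The resulting weights could then scale per-correspondence residuals in a weighted least-squares alignment, so that geometrically salient regions constrain the pose more strongly. For real scans, a k-d tree would normally replace the brute-force neighbour search.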

History
  • Received: 2026-01-29
  • Last revised: 2026-04-20
  • Accepted: 2026-04-21
  • Published online:
  • Published: