A 3D human pose estimation model based on multiple hypothesis interaction
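CLC number: TP391.4

Fund project: National Natural Science Foundation of China (72204267); Basic Scientific Research Project of the Department of Education, General Program (JYTMS20231576); General Basic Scientific Research Project of the Education Department of Liaoning Province (LJ212410144025).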


    Abstract:

    Transformer-based methods have recently performed well in 3D human pose estimation. Although their global self-attention effectively models long-range dependencies between joints, they model local motion features insufficiently and tend to mispredict local motion trajectories in scenarios such as rapid limb movement. To address this, this paper proposes a Transformer architecture that combines a convolutional neural network (CNN) with a hybrid attention mechanism; the added convolutional feature extraction significantly strengthens the representation of local joint motion. First, a hybrid multi-hypothesis generation (H-MHG) module generates richer hypothesis information while remaining efficient, compensating for the weakness of traditional global-view methods in capturing local dependencies. Next, a self-hypothetical granular (SHG) module, which refines each hypothesis individually, further mines the diverse information in the data so that the model captures more detail. Finally, a cross-hypothetical interaction (CHI) module fuses the feature information across hypotheses, improving robustness and accuracy. Experiments on the Human3.6M dataset show a 7.99% improvement over the baseline model MHFormer, supporting the effectiveness of the proposed components and of the overall architecture for 3D human pose estimation.
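The abstract names the three stages but does not specify their internals, so the following is only a toy sketch of the general idea, not the authors' H-MHG, SHG, or CHI modules. It assumes a single joint coordinate tracked over five frames, uses hand-picked fixed kernels in place of learned convolutions to produce three hypotheses, and lets the hypotheses exchange information through parameter-free scaled dot-product attention. All function names, kernels, and input values are hypothetical.

```python
import math

def temporal_conv(seq, kernel):
    """Slide a fixed kernel over time (cross-correlation, edge frames replicated).

    Stands in for a learned convolution that captures local motion.
    """
    half = len(kernel) // 2
    n = len(seq)
    out = []
    for t in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = min(max(t + k - half, 0), n - 1)  # clamp to valid frames
            acc += w * seq[idx]
        out.append(acc)
    return out

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_hypothesis_attention(hyps):
    """Each hypothesis attends to every hypothesis (no learned weights).

    The output for each hypothesis is a convex combination of all hypotheses.
    """
    d = len(hyps[0])
    fused = []
    for query in hyps:
        weights = softmax([dot(query, key) / math.sqrt(d) for key in hyps])
        fused.append([sum(w * h[i] for w, h in zip(weights, hyps))
                      for i in range(d)])
    return fused

# Hypothetical input: one coordinate of one joint over 5 frames.
trajectory = [0.0, 1.0, 4.0, 1.0, 0.0]

# Hypothetical fixed kernels: smoothing, identity, central difference.
kernels = [[0.25, 0.5, 0.25], [0.0, 1.0, 0.0], [-0.5, 0.0, 0.5]]

# Stage 1 (assumed): several hypotheses from local temporal filters.
hypotheses = [temporal_conv(trajectory, k) for k in kernels]

# Stage 2 (assumed): hypotheses interact through attention.
fused = cross_hypothesis_attention(hypotheses)

# Stage 3 (assumed): average the fused hypotheses into one estimate.
final = [sum(h[t] for h in fused) / len(fused) for t in range(len(trajectory))]
```

In a real model the kernels and attention projections would be learned, and the input would cover all joints and coordinates rather than a single scalar trajectory.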

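Cite this article

胡楠, 张家豪, 魏晓彤, et al. A 3D human pose estimation model based on multiple hypothesis interaction[J]. 控制与决策 (Control and Decision), 2025, 40(12): 3704-3712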

History
  • Received: 2024-09-30
  • Published online: 2025-11-10
  • Publication date: 2025-12-10