Multi-semantic dynamic graph convolutional networks for skeleton-based action recognition
Affiliation:

College of Artificial Intelligence and Automation, Hohai University

CLC number:

TP391

Fund project:

Aeronautical Science Foundation of China (2024M034108001)

    Abstract:

    In recent years, graph convolutional networks (GCNs) have performed strongly in skeleton-based action recognition. However, existing GCN-based methods are limited in modeling complex correlations among nodes and make insufficient use of the complementary information between modalities. To address these issues, this paper proposes a Multi-Semantic Dynamic Graph Convolutional Network (MSD-GCN). The network adopts a joint-bone fused dual-stream architecture that processes the joint and bone modalities in parallel. The dual-stream network consists of multiple Multi-Semantic Dynamic Graph Convolution (MSD-GC) operators, multiple Multi-Scale Temporal Convolution (MS-TC) operators, and a Joint-Bone Cross-Modal Contrastive Learning (JB-CMCL) module. Specifically, the MSD-GC operator reconstructs partitions of high semantic granularity through a Semantic-Aware Hierarchical Graph (SH-Graph) and runs two modules in parallel: a Cross-Semantic Space Modeling (CSSM) module that captures global joint correlations and a Local Geometry Modeling (LGM) module that captures subtle motion features, which together provide multi-scale dynamic feature extraction. The JB-CMCL module uses cross-modal feature alignment and a mechanism for discriminating confusing (hard) samples to guide the fusion and enhancement of joint and bone features within the dual-stream network, improving the model's fine-grained recognition capability. Extensive experiments were conducted on the NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets. The results show that the proposed components and the overall network perform strongly and recognize easily confused actions well, and that the model is highly competitive with state-of-the-art methods.
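The abstract names two ideas that a short sketch may help make concrete: the joint and bone modalities fed to the two streams, and contrastive alignment between their features. The code below is only an illustration under stated assumptions, not the authors' MSD-GCN or JB-CMCL implementation. The five-joint skeleton `PARENTS`, the child-minus-parent bone definition, and the symmetric InfoNCE loss in `info_nce` are all assumptions chosen because they are common in skeleton-based recognition; the abstract does not specify any of them.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton (an assumption, not a dataset layout):
# PARENTS[i] is the parent of joint i; the root is its own parent.
PARENTS = np.array([0, 0, 1, 1, 1])


def joints_to_bones(joints, parents=PARENTS):
    """Derive a bone modality as child-minus-parent vectors.

    joints: array of shape (T, V, C) = (frames, joints, coordinates).
    The root's bone is the zero vector because it is its own parent.
    """
    return joints - joints[:, parents, :]


def _diag_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


def info_nce(joint_feat, bone_feat, temperature=0.1):
    """Symmetric InfoNCE loss between two (N, D) feature batches.

    Assumption: row i of both batches comes from the same sample
    (the positive pair); every other row in the batch is a negative.
    """
    j = joint_feat / np.linalg.norm(joint_feat, axis=1, keepdims=True)
    b = bone_feat / np.linalg.norm(bone_feat, axis=1, keepdims=True)
    logits = j @ b.T / temperature  # (N, N) scaled cosine similarities
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(logits.T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    joints = rng.normal(size=(4, 5, 3))  # 4 frames, 5 joints, 3-D coordinates
    bones = joints_to_bones(joints)
    print(bones.shape)

    feats = np.eye(4)
    print(info_nce(feats, feats) < info_nce(feats, feats[::-1]))
```

In this sketch the loss is small when the joint and bone features of the same sample point in the same direction and large when they are mismatched, which is the alignment behavior a cross-modal contrastive objective is meant to encourage.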

History
  • Received: 2025-07-29
  • Last revised: 2025-11-18
  • Accepted: 2025-11-20
  • Published online: 2025-12-03