Multi-semantic dynamic graph convolutional networks for skeleton-based action recognition

CLC number: TP391

Fund project: Aeronautical Science Foundation of China (2024M034108001).

    Abstract:

    In recent years, graph convolutional networks (GCNs) have performed strongly in skeleton-based action recognition. However, existing GCN-based methods are limited in modeling complex correlations between nodes and make insufficient use of the complementary information between modalities. To address these issues, this paper proposes a multi-semantic dynamic graph convolutional network (MSD-GCN). The network adopts a joint-bone fused dual-stream architecture that processes joint and bone modality data in parallel. The dual-stream network consists of multiple multi-semantic dynamic graph convolution (MSD-GC) operators, multi-scale temporal convolution (MS-TC) operators, and a joint-bone cross-modal contrastive learning (JB-CMCL) module. Specifically, the MSD-GC operator reconstructs partitions of high semantic granularity through a semantic-aware hierarchical graph (SH-Graph) and runs two modules in parallel to achieve multi-scale dynamic feature extraction: a cross-semantic space modeling (CSSM) module that captures global joint correlations, and a local geometry modeling (LGM) module that captures subtle motion features. The JB-CMCL module uses cross-modal feature alignment and a confusing-sample discrimination mechanism to guide the fusion and enhancement of joint and bone features within the dual-stream network, thereby improving the model's fine-grained recognition capability. Extensive experiments are conducted on the NTU RGB + D, NTU RGB + D 120, and Northwestern-UCLA datasets. The results show that the proposed components and the overall network perform strongly and recognize easily confused actions well, and that the proposed model is highly competitive with state-of-the-art methods.
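
The abstract does not state how the bone modality is derived or how JB-CMCL is formulated. As a rough illustration of the two ideas it mentions (a bone stream alongside a joint stream, and cross-modal feature alignment), the sketch below assumes the common convention that a bone is the vector from a parent joint to its child, and swaps in a generic symmetric InfoNCE loss as a stand-in for cross-modal alignment. The 5-joint `PARENTS` table, the function names, and the temperature value are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical 5-joint toy skeleton: PARENTS[i] is the parent of joint i.
# Joint 0 is the root and points to itself. This is NOT the NTU RGB+D layout.
PARENTS = [0, 0, 1, 1, 3]


def joints_to_bones(frames, parents=PARENTS):
    """Bone modality as child-minus-parent vectors (assumed convention).

    frames: list of frames; each frame is a list of per-joint coordinate lists.
    Returns the same nested shape; the root bone is all zeros.
    """
    return [
        [[c - p for c, p in zip(frame[v], frame[parents[v]])]
         for v in range(len(frame))]
        for frame in frames
    ]


def _normalize(vec):
    # Assumes a non-zero vector; scales it to unit L2 norm.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diag_cross_entropy(logits):
    """Mean cross-entropy where the target of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum - row[i]
    return total / len(logits)


def info_nce(joint_emb, bone_emb, temperature=0.1):
    """Symmetric InfoNCE loss between two equally sized batches of embeddings.

    joint_emb[i] and bone_emb[i] form the positive pair; every other
    pairing in the batch is treated as a negative.
    """
    j = [_normalize(v) for v in joint_emb]
    b = [_normalize(v) for v in bone_emb]
    # Cosine similarities scaled by the temperature.
    logits = [[sum(x * y for x, y in zip(jv, bv)) / temperature for bv in b]
              for jv in j]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(transposed))
```

In a two-stream setup, `joint_emb` and `bone_emb` would be the per-sample feature vectors produced by the joint and bone streams. The loss is small when each sample's two embeddings are more similar to each other than to those of other samples in the batch, and grows when they are mismatched.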

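Cite this article

Song Chen (宋忱), Qian Huimin (钱惠敏), Wu Dawei (吴大伟). Multi-semantic dynamic graph convolutional networks for skeleton-based action recognition[J]. Control and Decision (控制与决策), 2026, 41(6): 1640-1650.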

History
  • Received: 2025-07-29
  • Published online: 2026-05-13