Multi-semantic dynamic graph convolutional networks for skeleton-based action recognition

doi:10.13195/j.kzyjc.2025.0793

Affiliation: College of Artificial Intelligence and Automation, Hohai University
CLC number: TP391
Fund project: Aeronautical Science Foundation of China (2024M034108001)

Abstract:

In recent years, graph convolutional networks (GCNs) have performed strongly in skeleton-based action recognition. However, existing GCN-based methods are limited in modeling complex correlations between nodes and make insufficient use of the complementary information between modalities. To address these issues, this paper proposes a Multi-Semantic Dynamic Graph Convolutional Network (MSD-GCN). The network adopts a joint-bone fused dual-stream architecture that processes the joint and bone modalities in parallel. The dual-stream network consists of multiple Multi-Semantic Dynamic Graph Convolution (MSD-GC) operators, Multi-Scale Temporal Convolution (MS-TC) operators, and a Joint-Bone Cross-Modal Contrastive Learning (JB-CMCL) module. Specifically, the MSD-GC operator reconstructs partitions of high semantic granularity through a Semantic-Aware Hierarchical Graph (SH-Graph) and runs two modules in parallel: a Cross-Semantic Space Modeling (CSSM) module that captures global joint correlations, and a Local Geometry Modeling (LGM) module that captures subtle motion features. Together these achieve multi-scale dynamic feature extraction. The JB-CMCL module uses cross-modal feature alignment and a confusing-sample (hard-sample) discrimination mechanism to guide the fusion and enhancement of joint and bone features within the dual-stream network, improving the model's fine-grained recognition capability. Extensive experiments were conducted on the NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets. The results show that the proposed components and the overall network perform strongly and recognize easily confused actions well. Compared with state-of-the-art methods, the model is highly competitive.
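
As a rough illustration of two ideas named in the abstract, the Python sketch below derives a bone modality from joint coordinates and computes a contrastive alignment loss between joint-stream and bone-stream embeddings. It is not the authors' implementation: the abstract does not define how bones are computed or which loss JB-CMCL uses, so the child-minus-parent bone definition (a common convention in skeleton-based recognition), the InfoNCE-style loss, the function names, the `parents` list, and the `temperature` value are all assumptions chosen for illustration.

```python
import math


def bone_vectors(joints, parents):
    """Assumed bone definition: each joint's position minus its parent's.

    joints  -- list of (x, y, z) tuples, one per joint
    parents -- parents[j] is the index of joint j's parent; a root joint
               lists itself, which yields a zero-length bone
    """
    return [
        tuple(c - p for c, p in zip(joints[j], joints[parents[j]]))
        for j in range(len(joints))
    ]


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_modal_info_nce(joint_emb, bone_emb, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's stated loss).

    For sample i, the bone embedding of the same sample is the positive
    and the bone embeddings of all other samples are negatives.
    """
    joint_unit = [_normalize(v) for v in joint_emb]
    bone_unit = [_normalize(v) for v in bone_emb]
    total = 0.0
    for i, anchor in enumerate(joint_unit):
        # Cosine similarity of this joint embedding to every bone embedding.
        logits = [
            sum(a * b for a, b in zip(anchor, cand)) / temperature
            for cand in bone_unit
        ]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(joint_unit)


# Toy 4-joint chain: joint 0 is the root, each later joint hangs off the previous one.
toy_joints = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (1.0, 2.0, 0.0)]
toy_parents = [0, 0, 1, 2]
toy_bones = bone_vectors(toy_joints, toy_parents)
```

With embeddings where each sample's joint and bone vectors point the same way, this loss is close to zero; pairing each joint embedding with another sample's bone embedding raises it, which is the kind of signal a cross-modal alignment objective would train on.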

History

Received: 2025-07-29
Last revised: 2025-11-18
Accepted: 2025-11-20
Published online: 2025-12-03