Abstract: Multi-instance human parsing aims to segment multiple human instances and their corresponding body parts in natural scene images. Existing methods typically use static convolution kernels to segment parts and instances in parallel, so part features and instance features remain uncorrelated, which limits adaptability to the diversity of human poses and clothing appearances. To address this issue, this paper proposes a multi-instance human parsing method based on dynamic convolution and hypergraph interaction. Segmentation targets are divided hierarchically into three levels (parts, half-bodies, and instances), and a learnable dynamic convolution kernel is assigned to each target. A multi-scale mask attention mechanism guides the dynamic convolution kernels in aggregating image features at each level, allowing the kernels to adapt to diverse poses and clothing appearances. To model structural priors of the human body, a hypergraph interaction module is proposed in which part kernels serve as nodes and the instance and half-body kernels serve as hyperedges. Feature interaction between parts and instances is achieved through message passing on the hypergraph. Experimental results show that the proposed method outperforms various baseline methods on the MHP-v2.0, CIHP, and DensePose datasets, with average improvements of 14.6%, 5.8%, and 10.7% in the $ {\rm AP}_{50}^p $, $ {\rm AP}_{\rm vol}^p $, and $ {\rm PCP}_{50} $ metrics, respectively. Ablation and visualization experiments further validate the effectiveness of the dynamic convolution kernels and the hypergraph interaction module.
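
To make the hypergraph interaction concrete, the following is a minimal sketch of one round of message passing in which part kernels are nodes and half-body and instance kernels are hyperedges. The abstract does not specify the update rule, so the mean aggregation with residual updates used here is an assumption, as are the function name `hypergraph_message_passing`, the four-part body layout, and the feature dimension; none of them should be read as the paper's actual formulation.

```python
import numpy as np


def hypergraph_message_passing(node_feats, edge_feats, incidence):
    """One round of mean-aggregation message passing on a hypergraph.

    node_feats: (N, d) array, one row per part kernel (node).
    edge_feats: (M, d) array, one row per half-body/instance kernel (hyperedge).
    incidence:  (N, M) binary array; incidence[i, j] = 1 if part i
                belongs to hyperedge j.
    The residual mean-aggregation rule is an illustrative assumption.
    """
    incidence = np.asarray(incidence, dtype=float)
    edge_degree = incidence.sum(axis=0)[:, None]  # (M, 1) parts per hyperedge
    node_degree = incidence.sum(axis=1)[:, None]  # (N, 1) hyperedges per part

    # Step 1: nodes -> hyperedges. Each hyperedge averages its member parts.
    edge_msg = (incidence.T @ node_feats) / np.maximum(edge_degree, 1.0)
    new_edges = edge_feats + edge_msg

    # Step 2: hyperedges -> nodes. Each part averages its updated hyperedges.
    node_msg = (incidence @ new_edges) / np.maximum(node_degree, 1.0)
    new_nodes = node_feats + node_msg
    return new_nodes, new_edges


# Hypothetical layout: 4 parts (head, torso, legs, feet) and 3 hyperedges
# (upper half-body, lower half-body, whole instance).
incidence = np.array([
    [1, 0, 1],  # head:  upper half-body, instance
    [1, 0, 1],  # torso: upper half-body, instance
    [0, 1, 1],  # legs:  lower half-body, instance
    [0, 1, 1],  # feet:  lower half-body, instance
])
part_kernels = np.eye(4)          # toy (4, 4) part kernel features
group_kernels = np.zeros((3, 4))  # toy (3, 4) half-body/instance kernel features

new_parts, new_groups = hypergraph_message_passing(
    part_kernels, group_kernels, incidence
)
```

In this toy setup, the head kernel receives information from the torso through the upper half-body hyperedge and from every part through the instance hyperedge, which is the kind of part-instance feature interaction the abstract describes.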