Abstract: To address the ineffective knowledge transfer and poor student performance that arise in offline knowledge distillation from the large capacity gap between teacher and student, a multi-scale knowledge distillation method based on self-supervised adversarial learning (SAMKD) is proposed. The method leverages self-supervision and adversarial learning to further exploit intermediate multi-scale features and network logits. First, the network is trained with supervision derived from multi-angle geometric transformations of the input images. Second, a multi-branch auxiliary network is designed to extract multi-scale features from the backbone network, enriching the supervisory information. Finally, a two-player adversarial training scheme is adopted for multi-stage adversarial training, enabling comprehensive knowledge transfer across multiple levels during distillation. Extensive evaluations on three challenging public datasets (CIFAR-10, CIFAR-100, and Tiny-ImageNet) demonstrate that the proposed method is highly competitive and outperforms other state-of-the-art knowledge distillation methods.
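To make the two training signals mentioned above concrete, the following is a minimal NumPy sketch of (a) temperature-scaled soft-label distillation on logits and (b) a rotation-prediction self-supervised loss, the standard pretext task for multi-angle geometric transformations. The function names, the temperature value, and the loss weighting are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; subtract the max for numerical stability.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Soft-label distillation: KL(teacher || student) at temperature T,
    scaled by T^2 as is conventional so gradients stay comparable."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t) - np.log(p_s))).sum(axis=-1).mean()
    return float(kl * T * T)

def rotation_ssl_loss(rot_logits, rot_labels):
    """Self-supervised auxiliary loss: cross-entropy for predicting which
    of 4 rotations (0/90/180/270 degrees) was applied to each image."""
    p = softmax(rot_logits)
    n = rot_labels.shape[0]
    return float(-np.log(p[np.arange(n), rot_labels]).mean())

def total_loss(student_logits, teacher_logits, rot_logits, rot_labels,
               alpha=0.5):
    # Hypothetical combined objective: KD term plus weighted SSL term.
    return kd_loss(student_logits, teacher_logits) \
        + alpha * rotation_ssl_loss(rot_logits, rot_labels)
```

In SAMKD this kind of objective would apply at several feature scales via the auxiliary branches, with the adversarial component training a discriminator to tell teacher and student representations apart.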