Abstract: Self-knowledge distillation removes the need to train a large teacher network, but its attention mechanism focuses only on the foreground of the image. It ignores background knowledge such as color and texture information, and it may also miss foreground information when spatial attention focuses on the wrong regions. To address these problems, a self-knowledge distillation method based on dynamic mixed attention is proposed, which exploits both foreground and background information in images and thereby improves classification accuracy. A mask segmentation module is designed to separate the feature map into background and foreground regions, from which the ignored background knowledge and the missing foreground information are extracted, respectively. Moreover, a knowledge extraction module based on a dynamic attention distribution strategy is proposed, which dynamically adjusts the loss ratio between background attention and foreground attention through a parameter derived from the predictive probability distribution. This strategy guides the cooperation between foreground and background, yielding more accurate attention maps and improving the performance of the classifier network. Experiments show that the proposed method improves accuracy on CIFAR100 by 2.15% with ResNet18 and by 1.54% with WRN-16-2. For fine-grained visual recognition tasks, accuracy on the CUB200 and MIT67 datasets is improved by 3.51% and 1.05%, respectively, making its performance superior to the state of the art.
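The dynamic attention distribution strategy mentioned above can be illustrated with a minimal sketch. The function name, and the choice of the maximum softmax probability as the weighting parameter `alpha`, are assumptions for illustration only, not the paper's exact formulation:

```python
import numpy as np

def dynamic_mixed_attention_loss(fg_loss, bg_loss, probs):
    """Blend foreground and background attention losses with a weight
    derived from the predictive probability distribution.

    Hypothetical scheme: alpha is the maximum softmax probability, so a
    confident prediction leans on the foreground attention loss, while an
    uncertain one gives more weight to background color/texture cues.
    """
    alpha = float(np.max(probs))  # prediction confidence in [0, 1]
    return alpha * fg_loss + (1.0 - alpha) * bg_loss

# Example: a confident prediction weights the foreground loss heavily.
probs = np.array([0.8, 0.1, 0.1])  # softmax output of the classifier
loss = dynamic_mixed_attention_loss(fg_loss=0.5, bg_loss=2.0, probs=probs)
```

In this sketch the two loss terms always sum to a convex combination, so neither foreground nor background knowledge is discarded outright; only their relative influence shifts with the network's confidence.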