Abstract: When labeled data are scarce, semi-supervised learning draws on large amounts of unlabeled data to relieve the labeling bottleneck. However, when the unlabeled and labeled data come from different domains, the unlabeled data may be of poor quality, which weakens the model's generalization ability and degrades classification accuracy. Building on the wordMixup method, this paper therefore proposes u-wordMixup, a data augmentation method for unlabeled data, and a semi-supervised deep learning model based on u-wordMixup (SD-uwM) that combines the consistency training framework with the Mean Teacher model. The model applies u-wordMixup to augment the unlabeled data, which improves their quality and reduces overfitting under the joint constraints of a supervised cross-entropy loss and an unsupervised consistency loss. Comparative experiments on the AGNews, THUCNews, and 20 Newsgroups datasets show that the proposed method improves the generalization ability of the model and also effectively improves time performance.
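
To make the components named in the abstract concrete, the following Python sketch shows one way a wordMixup-style interpolation of unlabeled samples could be paired with a Mean Teacher consistency loss. It is an illustration under stated assumptions, not the SD-uwM implementation. The abstract does not specify how u-wordMixup builds its targets, so the sketch assumes that two unlabeled sentences are mixed at the word-embedding level with a coefficient `lam` and that the student is trained to match the same mixture of the teacher's predictions. The function names (`mix_embeddings`, `ema_update`, `predict`, `consistency_loss`) and the toy mean-pooling classifier are hypothetical.

```python
import numpy as np


def mix_embeddings(emb_a, emb_b, lam):
    """Linearly interpolate two padded word-embedding matrices of equal shape."""
    return lam * emb_a + (1.0 - lam) * emb_b


def ema_update(teacher, student, decay=0.99):
    """Mean Teacher step: teacher weights follow an exponential moving average of the student."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def predict(params, emb):
    """Toy classifier (assumption): mean-pool the word embeddings, then one linear layer."""
    pooled = emb.mean(axis=0)
    return softmax(pooled @ params["W"] + params["b"])


def consistency_loss(student, teacher, emb_a, emb_b, lam):
    """Mean squared error between the student's prediction on the mixed input
    and the mixture of the teacher's predictions on the two original inputs.
    The choice of mixed teacher predictions as the target is an assumption."""
    mixed = mix_embeddings(emb_a, emb_b, lam)
    target = lam * predict(teacher, emb_a) + (1.0 - lam) * predict(teacher, emb_b)
    pred = predict(student, mixed)
    return float(np.mean((pred - target) ** 2))
```

In a full training loop, this unsupervised term would be added, with some weighting, to the supervised cross-entropy loss on the labeled batch, and `ema_update` would be called after each student optimization step. In mixup-style methods `lam` is commonly drawn from a Beta distribution, for example with `np.random.default_rng().beta(alpha, alpha)`; whether u-wordMixup does the same is not stated in the abstract.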