Abstract: The Transformer has achieved excellent results on large-scale datasets, but its reliance on Multi-Head Attention (MHA) makes it overly complex, and its performance degrades on small-scale datasets. Although replacing attention has seen considerable progress in image processing, little work has studied replacements for MHA in Natural Language Processing. Therefore, this paper proposes a new method called the Multi-layer Semantics Perceptron (MSP). Its major innovation is to replace MHA with a simple linear sequence transformation function, achieving better semantic feature representations at lower complexity. Additionally, a Dynamic Depth Control Framework (DDCF) is proposed, which automatically optimizes the depth of the neural network and thereby markedly reduces model complexity. Finally, based on MSP and DDCF, the Dynamic Multi-layer Semantics Perceptron (DMSP) model is put forward. Experimental results on multiple datasets show that DMSP significantly outperforms the Transformer while using only 35% of its parameters.