A Novel Dynamic Multi-layer Semantics Perceptron Without Attention Mechanism

Author:

Affiliation: 1. Shandong Technology & Business University; 2. Dalian Maritime University

Author biography:

Corresponding author:

CLC number: TP181

Fund project: The National Natural Science Foundation of China (No. 61976124, No. 61976125, No. 62176140)

    Abstract:

    Transformer achieves excellent results on large-scale datasets, but its reliance on multi-head attention makes the model overly complex, and its performance on small-scale datasets is unsatisfactory. Research on replacing multi-head attention has already produced results in image processing but remains scarce in natural language processing. This paper therefore first proposes an attention-free Multi-Layer Semantics Perceptron (MSP), whose core innovation is to replace the multi-head attention in the encoder with a linear sequence-transformation function, reducing model complexity while obtaining better semantic representations. Second, it proposes a Dynamic Depth Control Framework (DDCF), which optimizes model depth and further reduces complexity. Building on MSP and DDCF, the Dynamic Multi-layer Semantics Perceptron (DMSP) model is then proposed. Comparative experiments on multiple text datasets show that DMSP both improves classification accuracy and effectively reduces model complexity: compared with Transformer, DMSP improves classification accuracy substantially while using only about 35% of the Transformer model's parameters.
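The abstract's central idea, replacing the encoder's multi-head attention with a linear sequence-transformation function, can be sketched in a few lines. The sketch below is an illustrative assumption only (a Mixer-style learned linear map over the token axis); the function name, shapes, and weight layout are hypothetical and are not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_sequence_transform(x, w_seq):
    """Mix information across tokens with a single learned linear map.

    x     : (seq_len, d_model) token embeddings
    w_seq : (seq_len, seq_len) learned mixing weights
    Returns an array of the same shape as x.
    """
    # Each output token is a weighted sum of all input tokens,
    # playing the token-mixing role that attention normally plays.
    return w_seq @ x

seq_len, d_model = 8, 16
x = rng.standard_normal((seq_len, d_model))
w_seq = rng.standard_normal((seq_len, seq_len)) / np.sqrt(seq_len)

y = linear_sequence_transform(x, w_seq)
assert y.shape == (seq_len, d_model)
```

For sequence length n and model width d, such a mixing layer costs n² parameters, whereas a multi-head attention block's query/key/value/output projections cost roughly 4d²; a saving of this kind is consistent with the roughly 35% parameter figure the abstract reports, though the paper's exact construction may differ.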

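The abstract also mentions a Dynamic Depth Control Framework (DDCF) that optimizes network depth automatically. The abstract gives no mechanism, so the toy function below is purely a hedged illustration of one plausible stopping rule (grow the stack while validation loss still improves by more than a tolerance); the name `choose_depth` and the rule itself are assumptions, not the paper's algorithm.

```python
def choose_depth(layer_losses, tol=1e-3, max_depth=12):
    """Return the smallest depth at which adding a layer stops helping.

    layer_losses : validation losses observed after each added layer,
                   in order (depth 1 first).
    tol          : minimum improvement required to justify another layer.
    """
    depth = 1
    for prev, cur in zip(layer_losses, layer_losses[1:]):
        # Stop growing once the improvement falls below the tolerance
        # or a hard depth budget is reached.
        if prev - cur < tol or depth >= max_depth:
            break
        depth += 1
    return depth

# Toy example: improvements shrink as depth grows, so growth stops at 4.
losses = [1.0, 0.6, 0.45, 0.44, 0.439]
assert choose_depth(losses, tol=0.005) == 4
```

Any rule of this shape trades a small amount of accuracy for a fixed, data-driven depth, which is one way a framework like DDCF could reduce model complexity as the abstract claims.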

History
  • Received: 2022-03-28
  • Revised: 2023-07-10
  • Accepted: 2022-11-02
  • Published online: 2022-11-09
  • Publication date: