Recurrent neural networks based paraphrase identification model combined with attention mechanism

Affiliation: School of Information Science and Engineering, Dalian Polytechnic University, Dalian 116034, China

Corresponding author e-mail: lixu102@aliyun.com

CLC number: TP18

Funding: National Key Research and Development Program of China (2017YFC0821003-3); Basic Scientific Research Project of Higher Education Institutions of Liaoning Province (2017J049); Natural Science Foundation of Liaoning Province (20180550395); Young Science and Technology Talents "Seedling" Project of the Education Department of Liaoning Province (J2020113).

Abstract:

Traditional deep-learning models for paraphrase identification usually focus on text representation and neglect the mining and matching of multi-granular interaction features. To address this, we model the text interaction space with a recurrent neural network that uses a word-by-word attention mechanism. Word embeddings are fed into the network, and the two candidate paraphrase sentences are conditionally encoded by two bidirectional long short-term memory (LSTM) networks. Based on the iteratively produced hidden states, word-by-word soft alignment between the two sentences is used to reason at the word, phrase, and sentence levels and to obtain a semantic representation of the sentence pair. For classification, a softmax layer is applied to a non-linear projection of this representation into the two-class target space. Because the labeled training set for paraphrase identification is small relative to the complexity of the task, we first pre-train the model parameters without supervision on a language modeling task over a corpus of more than 580,000 sentence pairs, and then fine-tune the pre-trained parameters with supervision on the standard labeled data set. Compared with the previous best neural network model, the proposed model improves accuracy by 2.96% and the $F_1$ score by 2% on the MSRP data set. By combining global and local matching information and describing the interaction and matching patterns of the text at multiple granularities and from multiple perspectives, the model forms an end-to-end differentiable system that reduces the need for manual feature engineering and has good practical value.
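The soft-alignment and softmax classification steps named in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the abstract does not give the attention scoring function, so plain dot-product scoring is swapped in here, the hidden states are toy vectors standing in for bidirectional LSTM outputs, and all names (`soft_align`, `classify`, `pair_vector`, the weight values) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def soft_align(states_a, states_b):
    """For each state of sentence B, attend over all states of sentence A.

    Assumption: dot-product scoring (the paper's scoring function may differ).
    weights[t] is the attention distribution over A's words for B's t-th word;
    contexts[t] is the corresponding weighted sum of A's states.
    """
    weights, contexts = [], []
    for h_b in states_b:
        w = softmax([dot(h_a, h_b) for h_a in states_a])
        ctx = [sum(w_i * h_a[d] for w_i, h_a in zip(w, states_a))
               for d in range(len(h_b))]
        weights.append(w)
        contexts.append(ctx)
    return weights, contexts


def classify(pair_vector, weight_rows, biases):
    """Project the pair representation onto two classes, then apply softmax."""
    logits = [dot(row, pair_vector) + b for row, b in zip(weight_rows, biases)]
    return softmax(logits)


# Toy hidden states (hypothetical stand-ins for BiLSTM outputs).
states_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # sentence A, 3 words
states_b = [[2.0, 0.0], [0.0, 2.0]]              # sentence B, 2 words

weights, contexts = soft_align(states_a, states_b)

# One possible pair representation: mean attended context plus B's last state.
mean_ctx = [sum(c[d] for c in contexts) / len(contexts) for d in range(2)]
pair_vector = mean_ctx + states_b[-1]

# Hypothetical, untrained classifier parameters (2 classes x 4 features).
probs = classify(pair_vector,
                 [[0.5, -0.5, 0.1, 0.2], [-0.5, 0.5, -0.1, -0.2]],
                 [0.0, 0.0])
```

Each row of `weights` is a probability distribution over the words of sentence A, which is what makes the alignment "soft": every word of B draws on all words of A rather than on a single hard match. In a trained model the states and classifier parameters would be learned; here they are fixed values chosen only to make the sketch runnable.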

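Cite this article

Li Xu, Yao Chunlong, Fan Fenglong, et al. Recurrent neural networks based paraphrase identification model combined with attention mechanism[J]. Control and Decision, 2021, 36(1): 152-158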

History
  • Published online: 2021-01-06
  • Publication date: 2021-01-20