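This post introduces applications of the Attention Model in natural language processing. It is organized as follows: it first covers two classic papers, one on NMT and one on Image Caption, and then turns to applications of attention on different NLP tasks, some described in detail and others only briefly.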
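NMT usually relies on methods from the encoder-decoder family: the source sentence is encoded into a fixed-length vector, which is then decoded into the translation. The authors conjecture that this fixed-length vector is the bottleneck that keeps the encoder-decoder architecture from performing better. They therefore let the model automatically search for the parts of the source sentence that are relevant to predicting the next target word.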
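The soft search can be sketched as follows. Each source annotation $h_j$ is scored against the previous decoder state $s_{i-1}$, the scores are normalized by a softmax into weights $\alpha_{ij}$, and the context vector is $c_i = \sum_j \alpha_{ij} h_j$. This is a minimal pure-Python sketch that assumes the additive score $e_{ij} = v^\top \tanh(W s_{i-1} + U h_j)$. The matrices `W`, `U` and the vector `v` are hand-picked toy values here; in a trained model they are learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def additive_score(s_prev, h_j, W, U, v):
    """e_ij = v^T tanh(W s_{i-1} + U h_j)  (assumed additive/MLP score)."""
    hidden = [math.tanh(a + b) for a, b in zip(matvec(W, s_prev), matvec(U, h_j))]
    return sum(vk * hk for vk, hk in zip(v, hidden))


def soft_search_context(s_prev, annotations, W, U, v):
    """Return the attention weights alpha_i and the context vector c_i."""
    scores = [additive_score(s_prev, h, W, U, v) for h in annotations]
    alpha = softmax(scores)
    dim = len(annotations[0])
    c = [sum(a * h[d] for a, h in zip(alpha, annotations)) for d in range(dim)]
    return alpha, c


# Toy usage: 3 source annotations of size 2, a decoder state of size 2.
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s_prev = [1.0, 0.0]
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
alpha, c = soft_search_context(s_prev, annotations, W, U, v)
print([round(a, 3) for a in alpha], [round(x, 3) for x in c])
```

Because the weights are recomputed for every target position $i$, the decoder no longer depends on a single fixed-length summary of the source.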
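For the encoder, the authors use a bidirectional RNN for annotating sequences. Each source word's annotation concatenates the forward and backward hidden states, so it summarizes both the words before it and the words after it.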
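A minimal sketch of how the two passes are aligned and concatenated into $h_j = [\overrightarrow{h}_j; \overleftarrow{h}_j]$. It assumes a toy scalar tanh RNN with hand-picked weights (`w_x`, `w_h` are hypothetical). The actual encoder uses learned, vector-valued recurrent units, but the bookkeeping of the backward pass is the same.

```python
import math


def rnn_pass(xs, w_x, w_h):
    """Scalar tanh RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1}), with h_0 = 0."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def bidirectional_annotations(xs, fwd=(1.0, 0.5), bwd=(1.0, 0.5)):
    """Annotation h_j = [forward state at j ; backward state at j]."""
    forward = rnn_pass(xs, *fwd)
    # Run right-to-left, then reverse so that index j again refers to word j.
    backward = rnn_pass(xs[::-1], *bwd)[::-1]
    return [[f, b] for f, b in zip(forward, backward)]


annotations = bidirectional_annotations([1.0, -1.0, 0.5])
print(annotations)
```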
The general intuition of the model is that some words are only relevant for predicting local context (e.g. function words), while other words are more suited for determining global context, such as the topic of the document.
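In the local attention approach, the context vector $c_t$ is derived as a weighted average over the set of source hidden states within the window $[p_t - D, p_t + D]$; $D$ is empirically selected.
Unlike the global approach, the local alignment vector $a_t$ is now fixed-dimensional, i.e., $a_t \in \mathbb{R}^{2D+1}$. (This is the defining difference between the two.)
The monotonic variant (local-m) simply sets $p_t = t$, assuming that source and target sequences are roughly monotonically aligned. $a_t$ is computed with the same formula as in the global approach.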
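Local attention with $p_t = t$ can be sketched as follows. Restrict the source hidden states to the window $[p_t - D, p_t + D]$, score each one against the current target state $h_t$, normalize with a softmax to get $a_t$, and take the weighted average as $c_t$. The sketch assumes the simple dot-product score $h_t^\top \bar{h}_s$. As a hypothetical boundary choice, it clips the window at the sentence edges, so $a_t$ has fewer than $2D+1$ entries there.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def local_m_attention(h_t, source_states, t, D):
    """Local attention with monotonic alignment p_t = t.

    Assumes 0 <= t < len(source_states). Returns (a_t, c_t): the weights over
    the window and the context vector. The window is clipped at the sentence
    boundaries (assumption).
    """
    p_t = t  # monotonic alignment: target step t looks around source position t
    lo = max(0, p_t - D)
    hi = min(len(source_states) - 1, p_t + D)
    window = source_states[lo:hi + 1]
    # Same scoring + softmax as global attention, but only over the window.
    a_t = softmax([dot(h_t, h_s) for h_s in window])
    c_t = [sum(a * h_s[d] for a, h_s in zip(a_t, window)) for d in range(len(h_t))]
    return a_t, c_t


src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 0.0]]
a_t, c_t = local_m_attention([1.0, 0.0], src, t=2, D=1)
print(len(a_t), [round(x, 3) for x in c_t])
```

Away from the edges the window always holds $2D+1$ states, which is why $a_t$ has a fixed size. In global attention, by contrast, its length follows the source sentence.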
In our proposed global and local approaches, the attentional decisions are made independently, which is suboptimal.
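In standard MT, a coverage set keeps track of which source words have already been translated. Analogously, in this model the attentional vectors $\tilde{h}_t$ are concatenated with the inputs at the next time steps. The authors call this the input-feeding approach.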
The effects of having such connections are two-fold: (a) we hope to make the model fully aware of previous alignment choices and (b) we create a very deep network spanning both horizontally and vertically.
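Input feeding only changes what the decoder receives at each step. The input at step $t$ becomes the concatenation $[x_t; \tilde{h}_{t-1}]$ of the current target-word embedding and the previous attentional vector. In the sketch below, `step` is a hypothetical stand-in for the whole per-step computation (decoder RNN plus attention) that returns $\tilde{h}_t$. The all-zero $\tilde{h}_0$ and the `toy_step` function are assumptions made for illustration.

```python
import math


def input_feeding_inputs(embeddings, attn_dim, step):
    """Build the decoder inputs [x_t ; h~_{t-1}] used by input feeding."""
    h_tilde = [0.0] * attn_dim  # assumption: h~_0 is all zeros
    fed = []
    for x in embeddings:
        inp = x + h_tilde  # list concatenation: [x_t ; h~_{t-1}]
        fed.append(inp)
        h_tilde = step(inp)  # attentional vector h~_t, fed into the next step
    return fed


def toy_step(inp, attn_dim=2):
    """Hypothetical stand-in for decoder + attention; returns h~_t."""
    return [math.tanh(sum(inp))] * attn_dim


fed = input_feeding_inputs([[1.0], [0.5], [-1.0]], attn_dim=2, step=toy_step)
print(fed)
```

Because $\tilde{h}_{t-1}$ already reflects the previous alignment, feeding it back is what makes the model aware of its past alignment choices. The cost is a decoder input that is wider by the size of $\tilde{h}$.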
The basic idea of attention mechanism is that it assigns a weight/importance to each lower position when computing an upper level representation.
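One common way to realize this idea is attention pooling. Every lower-level vector is scored against a query vector, the scores are softmaxed, and the weighted sum becomes the upper-level representation. Applying it twice gives word → sentence → document. The sketch below assumes dot-product scores and fixed, hand-picked query vectors. `word_query` and `sent_query` are hypothetical names; in practice such vectors would be learned.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(lower, query):
    """Weight each lower-level vector by softmax(dot(h, query)); return weights and weighted sum."""
    weights = softmax([sum(a * b for a, b in zip(h, query)) for h in lower])
    upper = [sum(w * h[d] for w, h in zip(weights, lower)) for d in range(len(lower[0]))]
    return weights, upper


def document_vector(doc, word_query, sent_query):
    """doc: list of sentences, each a list of word vectors (lower -> upper, twice)."""
    sentence_vecs = [attention_pool(sentence, word_query)[1] for sentence in doc]
    return sentence_vecs, attention_pool(sentence_vecs, sent_query)[1]


doc = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
sentence_vecs, doc_vec = document_vector(doc, word_query=[1.0, 0.0], sent_query=[0.0, 1.0])
print(sentence_vecs, doc_vec)
```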
Next, let's look at how the Attention Model is applied to some other tasks.
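This paper converts natural language into logical forms and proposes two models. 1) The Sequence-to-Sequence Model treats semantic parsing as an ordinary sequence transduction task. 2) The Sequence-to-Tree Model uses a hierarchical tree decoder to capture the structure of the logical form: it decodes the first level of the tree, then the next level. Finally, an attention mechanism is added to the decoding step.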
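Level-by-level decoding can be illustrated by how a logical form splits into per-level token sequences. At each node the decoder emits that node's tokens, writing a placeholder wherever a subtree starts, and each placeholder is later expanded into its own sequence in breadth-first order. This sketches only the target representation. It assumes logical forms given as nested Python lists and a placeholder token named `<n>`; the example predicates are illustrative. The neural decoder and the attention mechanism are omitted.

```python
from collections import deque

NONTERMINAL = "<n>"  # placeholder token marking "a subtree starts here"


def tree_to_level_sequences(tree):
    """Return, in breadth-first order, the token sequence emitted for each node;
    every nested subtree is replaced by NONTERMINAL in its parent's sequence."""
    out, queue = [], deque([tree])
    while queue:
        node = queue.popleft()
        seq = []
        for child in node:
            if isinstance(child, list):
                seq.append(NONTERMINAL)
                queue.append(child)  # expanded later, at the next level
            else:
                seq.append(child)
        out.append(seq)
    return out


def level_sequences_to_tree(seqs):
    """Inverse: expand each NONTERMINAL with the next sequence, breadth-first."""
    it = iter(seqs)
    root = list(next(it))
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for i, tok in enumerate(node):
            if tok == NONTERMINAL:
                child = list(next(it))
                node[i] = child
                queue.append(child)
    return root


lf = ["answer", ["state", ["next_to", "texas"]]]
levels = tree_to_level_sequences(lf)
print(levels)
```

Decoding the top level first means the outer structure of the logical form is fixed before any inner subtree is generated.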
