Natural Language Processing with Attention Models

Contents

  • Week 1
  • 1 Neural Machine Translation
    • 1.1 Seq2seq
    • 1.2 Seq2seq Model with Attention
    • 1.3 Queries, Keys, Values, and Attention
    • 1.4 Teacher Forcing
    • 1.5 NMT Model with Attention
    • 1.6 BLEU Score
    • 1.7 ROUGE-N Score
    • 1.8 Sampling and Decoding
    • 1.9 Beam Search
    • 1.10 Minimum Bayes Risk
  • Week 2
  • 1 Text Summarization
    • 1.1 Transformer vs. RNNs
    • 1.2 Transformers overview
    • 1.3 Transformer Applications
    • 1.4 Scaled Dot-Product Attention
    • 1.5 Masked Self Attention
    • 1.6 Multi-head Attention
    • 1.7 Transformer Decoder
    • 1.8 Transformer Summarizer

Week 1

1 Neural Machine Translation

1.1 Seq2seq

[Figures 1-4]

Shortcomings:

The information bottleneck.

Since seq2seq uses a fixed-length memory for the hidden states, long sequences become problematic: in a traditional seq2seq model, only a fixed amount of information can be passed from the encoder to the decoder, no matter how much information the input sequence contains.
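
To make the bottleneck concrete, here is a minimal NumPy sketch (a hypothetical toy recurrent encoder, not the course's model) showing that the encoder hands the decoder the same fixed-size vector whether the input has 3 tokens or 50.

```python
import numpy as np

# Toy seq2seq encoder: an RNN-style loop compresses any input sequence into
# ONE fixed-size hidden state, which is all the decoder ever sees.

HIDDEN_SIZE = 16

def encode(embeddings, W_h, W_x):
    """Run a bare-bones recurrent encoder and return its final hidden state."""
    h = np.zeros(HIDDEN_SIZE)
    for x in embeddings:                 # one step per input token
        h = np.tanh(W_h @ h + W_x @ x)   # hidden state is overwritten each step
    return h                             # fixed size, no matter how long the input was

rng = np.random.default_rng(0)
W_h = rng.normal(size=(HIDDEN_SIZE, HIDDEN_SIZE)) * 0.1
W_x = rng.normal(size=(HIDDEN_SIZE, HIDDEN_SIZE)) * 0.1

short_input = rng.normal(size=(3, HIDDEN_SIZE))    # 3-token sentence
long_input = rng.normal(size=(50, HIDDEN_SIZE))    # 50-token sentence

# Both sentences are squeezed into the same amount of "memory".
print(encode(short_input, W_h, W_x).shape)   # (16,)
print(encode(long_input, W_h, W_x).shape)    # (16,)
```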

[Figures 5-6]

1.2 Seq2seq Model with Attention

[Figure 7]

  • Keep the hidden state of every encoder step, then add all of these vectors together to form the context vector.
  • (By itself, this step is not very different from the traditional seq2seq model.)

[Figure 8]

  • Weight some encoder hidden states more than others before the point-wise addition.
  • Words that are more important for the next decoder output get larger weights.
  • How are these weights calculated?
  • $S_{i-1}$ is the previous hidden state of the decoder, which contains information about the previous words in the output translation.
  • You can compare the decoder state with each encoder state to determine the most important inputs.

[Figure 9]

  • $e_{ij}$ is the alignment score: it measures how well the inputs around position $j$ match the expected output at position $i$. A sketch of the weight computation follows.
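
Below is a minimal NumPy sketch of this weight computation, assuming a Bahdanau-style additive alignment score $e_{ij} = v^\top \tanh(W_s S_{i-1} + W_h h_j)$ followed by a softmax; the dimensions and weight matrices are illustrative only, not the course's exact model.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def attention_context(s_prev, enc_states, W_s, W_h, v):
    """Return the context vector c_i and the attention weights alpha_ij."""
    # e_ij: how well the input around position j matches the expected output i
    scores = np.array([v @ np.tanh(W_s @ s_prev + W_h @ h_j) for h_j in enc_states])
    alpha = softmax(scores)                               # weights sum to 1 over the inputs
    context = (alpha[:, None] * enc_states).sum(axis=0)   # weighted point-wise addition
    return context, alpha

rng = np.random.default_rng(1)
d = 8                                 # hidden size (illustrative)
enc_states = rng.normal(size=(5, d))  # h_1 ... h_5 from the encoder
s_prev = rng.normal(size=d)           # S_{i-1}, previous decoder hidden state
W_s, W_h = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v = rng.normal(size=d)

context, alpha = attention_context(s_prev, enc_states, W_s, W_h, v)
print(alpha.round(3), context.shape)  # weights over 5 encoder steps, (8,) context vector
```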

1.3 Queries, Keys, Values, and Attention

[Figures 10-12]

  • Scaling the dot products by $\frac{1}{\sqrt{d_k}}$ improves model performance for larger model sizes and can be seen as a regularization constant (sketched at the end of this subsection).
    [Figure 13]

[Figures 14-15]
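
Here is a minimal NumPy sketch of scaled dot-product attention over queries, keys, and values, $\mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$; the shapes are illustrative, and the learned projections that produce Q, K, and V from the embeddings are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # one distribution over keys per query
    return weights @ V                   # weighted sum of the values

rng = np.random.default_rng(2)
Q = rng.normal(size=(4, 16))   # 4 queries of size d_k = 16
K = rng.normal(size=(6, 16))   # 6 keys
V = rng.normal(size=(6, 32))   # one value per key
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 32)
```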

1.4 Teacher Forcing

[Figure 16]

  • Use the ground-truth words as decoder inputs instead of the decoder's own previous outputs. Even if the model makes a wrong prediction, training proceeds as if the prediction had been correct, and this continues at every step. (This makes training much faster.)
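
A small sketch of how the decoder inputs are built under teacher forcing; the start/end tokens and the example sentence here are hypothetical, not the course's exact pipeline.

```python
# Teacher forcing: the decoder input at step t is the ground-truth token t-1,
# not whatever the model itself predicted at step t-1.

START = "<sos>"

target = ["how", "are", "you", "<eos>"]    # ground-truth translation

# Shift the target right and prepend a start token to get the decoder inputs.
decoder_inputs = [START] + target[:-1]     # ['<sos>', 'how', 'are', 'you']
decoder_labels = target                    # ['how', 'are', 'you', '<eos>']

# Without teacher forcing, step t would instead consume the model's own
# (possibly wrong) prediction from step t-1, so early mistakes compound
# and training converges more slowly.
for inp, label in zip(decoder_inputs, decoder_labels):
    print(f"decoder sees {inp!r:8} and is trained to predict {label!r}")
```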

1.5 NMT Model with Attention

[Figure 17]

1.6 BLEU Score

[Figures 18-19]

1.7 ROUGE-N Score

[Figures 20-21]

1.8 Sampling and Decoding

[Figures 22-24]

1.9 Beam Search

[Figures 25-28]

1.10 Minimum Bayes Risk

[Figures 29-31]

Week 2

1 Text Summarization

1.1 Transformer vs. RNNs

RNN

[Figure 32]

Seq2Seq Architectures

[Figure 33]

  • Information tends to get lost within the network, and vanishing-gradient problems arise as the input sequences get longer.
  • LSTMs and GRUs help a little with these problems, but even they struggle to process very long sequences because of the information bottleneck.

Transformer

[Figure 34]

  • Transformers don't use RNNs.
  • They only need attention.

1.2 Transformers overview

[Figure 35]

1.3 Transformer Applications

[Figures 36-40]

1.4 Scaled Dot-Product Attention

[Figures 41-44]

1.5 Masked Self Attention

[Figures 45-46]

  • This type of attention lets you get contextual representations of your words. In other words, self-attention gives you a representation of the meaning of each word within the sentence.

[Figures 47-48]

  • Queries cannot attend to keys at future positions (a masked-attention sketch follows the figure below).

[Figure 49]
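
A minimal NumPy sketch of masked (causal) self-attention, assuming the standard trick of adding $-\infty$ to the attention scores above the diagonal so each position can only attend to itself and earlier positions; the weights and sizes are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(X, W_q, W_k, W_v):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    mask = np.triu(np.full(scores.shape, -np.inf), k=1)  # -inf above the diagonal
    weights = softmax(scores + mask, axis=-1)            # future positions get weight 0
    return weights @ V

rng = np.random.default_rng(3)
d = 8
X = rng.normal(size=(5, d))                  # 5 token embeddings
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
out = masked_self_attention(X, W_q, W_k, W_v)
print(out.shape)                             # (5, 8): one contextual vector per token
```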

1.6 Multi-head Attention

[Figure 50]

  • In multi-head attention, you apply the attention mechanism in parallel to multiple sets of these matrices, which you get by transforming the original embeddings.
  • In multi-head attention, the number of times you apply the attention mechanism equals the number of heads in the model.
  • Using different sets of representations allows your model to learn multiple relationships between the words from the query and key matrices.

[Figure 51]

  • First, you transform each of these matrices into multiple vector spaces. As you saw previously, the number of transformations for each matrix is equal to the number of heads in the model.
  • Then, you apply the scaled dot-product attention mechanism to every set of value, key, and query transformations, where again the number of sets is equal to the number of heads in the model.
  • After that, you concatenate the results from each head in the model into a single matrix (see the sketch after this list).
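
The sketch below follows those steps in NumPy, assuming the usual split-into-heads, per-head attention, concatenate, and final linear projection layout; the dimensions and weight initialisation are illustrative, not the course's exact model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

def multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads):
    seq_len, d_model = X.shape
    d_head = d_model // n_heads

    def split(W):
        # Project, then split the last dimension into (n_heads, d_head).
        return (X @ W).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(W_q), split(W_k), split(W_v)   # (n_heads, seq_len, d_head)
    heads = attention(Q, K, V)                     # one attention result per head
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)  # concatenate heads
    return concat @ W_o                            # final linear layer

rng = np.random.default_rng(4)
d_model, n_heads = 16, 4
X = rng.normal(size=(6, d_model))                  # 6 token embeddings
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
print(multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads).shape)   # (6, 16)
```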

[Figures 52-53]

1.7 Transformer Decoder

[Figures 54-57]

1.8 Transformer Summarizer

[Figures 58-59]
