Lecture 8 Deep Learning for NLP: Recurrent Networks

Contents

      • Problem of N-gram Language Models
      • Recurrent Neural Network (RNN)
      • RNN Language Model
      • Long Short-Term Memory Model (LSTM)
      • Gating Vector
      • Forget Gate
      • Input Gate
      • Update Memory Cell
      • Output Gate
      • Disadvantages of LSTM
      • Example Applications
      • Variants of LSTM

Recurrent Networks

Problem of N-gram Language Models

  • Can be implemented using counts with smoothing (see the sketch after this list)

  • Can be implemented using feed-forward neural networks

  • Problem: limited context

  • E.g. generating sentences using a trigram model:

    [Figure: sample sentences generated by a trigram model]
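A minimal sketch of the counts-with-smoothing implementation referenced above (the toy corpus and add-k smoothing are illustrative assumptions, not from the lecture); note how the context is hard-capped at the two previous words, which is exactly the limitation the RNN removes:

```python
from collections import Counter

# Toy corpus (illustrative).
corpus = "a cow eats grass . a cow eats hay .".split()
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = set(corpus)

def p_trigram(u, v, w, k=1.0):
    """Add-k smoothed P(w | u, v): counts never hit zero,
    but the context can never exceed two words."""
    return (trigrams[(u, v, w)] + k) / (bigrams[(u, v)] + k * len(vocab))

print(p_trigram("a", "cow", "eats"))
```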

Recurrent Neural Network (RNN)

  • Allow representation of arbitrarily sized inputs

  • Core idea: process the input sequence one element at a time, by applying a recurrence formula (see the sketch at the end of this section)

  • Uses a state vector to represent contexts that have been previously processed

  • RNN Neuron:

    [Figure: RNN neuron]

  • RNN States:

    New state: $s_i = \tanh(W_s s_{i-1} + W_x x_i + b)$

    Output: $y_i = \text{softmax}(W_y s_i)$

    Activation: $\tanh$ for the state update, softmax for the output

  • RNN Unrolled:

    [Figure: RNN unrolled across time steps]

    • Same parameters are used across all time steps
  • Training RNN:

    • An unrolled RNN is a very deep neural network, but parameters are shared across all time steps
    • To train an RNN, just create the unrolled computation graph for a given input sequence and use the backpropagation algorithm to compute gradients as usual
    • This procedure is called backpropagation through time (BPTT)

      E.g. of the unrolled equation, expanding the state recurrence over three steps:

      $s_3 = \tanh(W_s \tanh(W_s \tanh(W_s s_0 + W_x x_1 + b) + W_x x_2 + b) + W_x x_3 + b)$
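A minimal NumPy sketch of this recurrence, as referenced above (the toy dimensions and random initialisation are illustrative assumptions, not values from the lecture):

```python
import numpy as np

def rnn_step(s_prev, x, W_s, W_x, b):
    """One RNN step: s_i = tanh(W_s s_{i-1} + W_x x_i + b)."""
    return np.tanh(W_s @ s_prev + W_x @ x + b)

# Toy dimensions (illustrative).
d_hidden, d_input = 4, 3
rng = np.random.default_rng(0)
W_s = rng.normal(size=(d_hidden, d_hidden))
W_x = rng.normal(size=(d_hidden, d_input))
b = rng.normal(size=d_hidden)

# Process the sequence one element at a time, carrying the state forward.
# The same parameters (W_s, W_x, b) are reused at every time step.
s = np.zeros(d_hidden)
for x in rng.normal(size=(5, d_input)):  # a sequence of 5 input vectors
    s = rnn_step(s, x, W_s, W_x, b)
print(s)  # the final state summarises the whole sequence
```

In a deep learning framework, backpropagation through time falls out automatically once this loop is expressed as a computation graph.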

RNN Language Model

[Figure: RNN language model predicting the next word]

  • $x_i$ is the current word (e.g. eats) mapped to an embedding

  • $s_{i-1}$ contains information about the previous words (e.g. a and cow)

  • $y_i$ is the next word (e.g. grass)

  • Training:

    • Vocabulary: [a, cow, eats, grass]

    • Training example: a cow eats grass

    • Training process:

      [Figure: unrolled RNN LM over the training example]




    • Losses: the cross-entropy of the predicted next word at each step, e.g. $L_1 = -\log P(\text{cow} \mid \text{a})$

      • Total loss: $L = L_1 + L_2 + L_3$ (see the sketch at the end of this section)

  • Generation:

    [Figure: generating a sentence word by word with the RNN LM]

  • Problems of RNN:

    • Error propagation: unable to recover from errors in intermediate steps
    • Low diversity in generated language
    • Tends to generate bland or generic language
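A minimal NumPy sketch of the forward pass and loss for the toy training example above, as referenced in the training subsection (dimensions and random initialisation are illustrative; in practice the gradients of this loss are computed by backpropagation through time):

```python
import numpy as np

vocab = ["a", "cow", "eats", "grass"]
V, d = len(vocab), 4                  # vocab size; toy hidden/embedding size
rng = np.random.default_rng(0)
E = rng.normal(size=(V, d))           # word embeddings
W_s = rng.normal(size=(d, d))
W_x = rng.normal(size=(d, d))
b = rng.normal(size=d)
W_y = rng.normal(size=(V, d))         # output projection

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Training example "a cow eats grass": at each step, predict the next word.
sentence = [vocab.index(w) for w in "a cow eats grass".split()]
s, total_loss = np.zeros(d), 0.0
for cur, nxt in zip(sentence, sentence[1:]):
    s = np.tanh(W_s @ s + W_x @ E[cur] + b)  # update state with current word
    y = softmax(W_y @ s)                     # distribution over the next word
    total_loss += -np.log(y[nxt])            # cross-entropy: L = L1 + L2 + L3
print(total_loss)
```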

Long Short-Term Memory Networks

Long Short-Term Memory Model (LSTM)

  • RNNs have the capability to model infinite context, but in practice they cannot capture long-range dependencies because of vanishing gradients

  • Vanishing gradient: gradients diminish quickly as they are backpropagated through many time steps, so earlier inputs do not get much update (see the illustration at the end of this section)

  • LSTMs were introduced to solve the vanishing gradient problem

  • Core idea: have memory cells that preserve gradients across time; access to the memory cells is controlled by gates

  • Gates: for each input, a gate decides:

    • How much of the new input should be written to the memory cell
    • How much content of the current memory cell should be forgotten
  • Comparison between a simple RNN and an LSTM:

    [Figure: simple RNN cell vs. LSTM cell with gates]
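A toy numerical illustration of the vanishing gradient point above (the fixed recurrent weight of 0.9 and the 50 steps are arbitrary demonstration choices, not from the lecture):

```python
import numpy as np

# Backpropagating through many tanh steps multiplies many factors of the
# form tanh'(z) * w; when these are typically < 1, the gradient vanishes.
rng = np.random.default_rng(0)
grad = 1.0
for _ in range(50):                       # 50 time steps back in time
    z = rng.normal()
    grad *= (1 - np.tanh(z) ** 2) * 0.9   # tanh'(z) times a recurrent weight
print(grad)  # effectively zero: the earliest inputs barely get updated
```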

Gating Vector

  • A gate $g$ is a vector; each element of the gate has a value between 0 and 1. A sigmoid function is used to produce $g$.

  • $g$ is multiplied component-wise with a vector $v$ to determine how much information of $v$ to keep (illustrated below)

    [Figure: element-wise gating of a vector]
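A minimal sketch of gating (the example values are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A gate is produced by a sigmoid, so each element lies between 0 and 1.
g = sigmoid(np.array([-2.0, 0.0, 4.0]))  # roughly [0.12, 0.50, 0.98]
v = np.array([1.0, 1.0, 1.0])

# Component-wise multiplication determines how much of each element to keep.
print(g * v)
```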

Forget Gate

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

  • Controls how much information in the memory cell $C_{t-1}$ to forget

  • E.g. given The cats that the boy, predict the next word likes:

    • The memory cell was storing the noun information cats
    • The cell should now forget cats and store boy to correctly predict the singular verb likes

Input Gate

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$

$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

  • The input gate controls how much new information to put into the memory cell

  • $\tilde{C}_t$ is the new distilled information to be added

Update Memory Cell

$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$

  • Use the forget and input gates to update the memory cell

Output Gate

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

$h_t = o_t \odot \tanh(C_t)$

  • The output gate controls how much of the memory cell content to distill into the next hidden state (a combined sketch of all four gate equations follows)
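Putting the four equations together, a minimal NumPy sketch of a single LSTM step, as referenced above (the weight shapes and the $[h_{t-1}, x_t]$ concatenation layout are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, C_prev, x, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    """One LSTM step following the gate equations above."""
    z = np.concatenate([h_prev, x])     # [h_{t-1}, x_t]
    f = sigmoid(W_f @ z + b_f)          # forget gate
    i = sigmoid(W_i @ z + b_i)          # input gate
    C_tilde = np.tanh(W_C @ z + b_C)    # new distilled (candidate) content
    C = f * C_prev + i * C_tilde        # update memory cell
    o = sigmoid(W_o @ z + b_o)          # output gate
    h = o * np.tanh(C)                  # next hidden state
    return h, C

# Toy dimensions (illustrative).
d_h, d_x = 4, 3
rng = np.random.default_rng(0)
params = [rng.normal(size=s) for s in
          [(d_h, d_h + d_x), d_h] * 4]  # W_f,b_f, W_i,b_i, W_C,b_C, W_o,b_o
h, C = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(5, d_x)):     # run over a short input sequence
    h, C = lstm_step(h, C, x, *params)
print(h, C)
```

Note how the additive cell update $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$ is what lets gradients flow across many steps without vanishing.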

Disadvantages of LSTM

  • Introduces some additional parameters, but not many
  • Still unable to capture very long-range dependencies
  • Slower than a simple RNN, but not by much

Applications of RNN

Example Applications

  • Shakespeare Generator:

    • Training data: all works of Shakespeare
    • Model: character-level RNN, hidden dimension = 512
  • Wikipedia Generator:

    • Training data: 100MB of Wikipedia raw data
  • Code Generator

  • Text Classification

    • RNNs can be used in a variety of NLP tasks; they are particularly suited to tasks where the order of words matters, e.g. sentiment analysis

    [Figure: RNN for text classification]

  • Sequence Labeling: e.g. POS tagging (see the sketch below)

    [Figure: RNN for POS tagging, one tag per word]
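A minimal sequence labelling sketch, as referenced above; PyTorch is an assumed framework choice, and all sizes and the dummy data are illustrative:

```python
import torch
import torch.nn as nn

class LSTMTagger(nn.Module):
    """Embed each word, run an LSTM, and predict one tag per time step."""
    def __init__(self, vocab_size=1000, num_tags=17, d_emb=64, d_hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_emb)
        self.lstm = nn.LSTM(d_emb, d_hidden, batch_first=True)
        self.out = nn.Linear(d_hidden, num_tags)

    def forward(self, word_ids):              # (batch, seq_len)
        h, _ = self.lstm(self.emb(word_ids))  # one hidden state per word
        return self.out(h)                    # (batch, seq_len, num_tags)

tagger = LSTMTagger()
words = torch.randint(0, 1000, (1, 6))        # a dummy 6-word sentence
logits = tagger(words)
loss = nn.functional.cross_entropy(           # per-token tag loss
    logits.view(-1, 17), torch.randint(0, 17, (6,)))
```

For text classification, the same architecture would instead feed only the final hidden state into the linear classifier.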

Variants of LSTM

  • Peephole connections: allow the gates to look at the cell state

  • Gated recurrent unit (GRU): a simplified variant with only 2 gates and no memory cell (standard equations after this list)

    [Figure: GRU cell]

  • Multi-layer LSTM (see the sketch after this list)

    [Figure: multi-layer (stacked) LSTM]

  • Bidirectional LSTM

    [Figure: bidirectional LSTM]
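For reference, the GRU's two gates, an update gate $z_t$ and a reset gate $r_t$, in their common formulation (bias terms omitted; this follows the standard presentation rather than the lecture slides):

$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$

$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$

$\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t])$

$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$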
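The multi-layer and bidirectional variants are available as one-line configuration changes in common frameworks; a sketch assuming PyTorch, with illustrative sizes:

```python
import torch.nn as nn

# Multi-layer and bidirectional LSTMs via constructor flags.
lstm = nn.LSTM(input_size=64, hidden_size=128,
               num_layers=2,        # stack two LSTM layers (multi-layer LSTM)
               bidirectional=True,  # left-to-right and right-to-left passes
               batch_first=True)
# Note: the output feature size becomes 2 * hidden_size in the bidirectional case.
```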
