【Paper】GRU_LSTM Introduction

Link: https://liudongdong1.github.io/2020/07/15/shi-jue-ai/model/gru-lstm/

0. RNN

[Figure 1]
$$h_t = f(x_t, h_{t-1}), \qquad h_t := \tanh(W_{xh} x_t + W_{hh} h_{t-1})$$
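A minimal NumPy sketch of this vanilla RNN update (the weight shapes and hidden size are illustrative assumptions, not from the post):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh):
    """One vanilla RNN step: h_t = tanh(W_xh @ x_t + W_hh @ h_prev)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev)

# Toy dimensions (assumed for illustration).
input_dim, hidden_dim = 4, 8
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.1
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1

h = np.zeros(hidden_dim)
for x in rng.standard_normal((5, input_dim)):  # a length-5 input sequence
    h = rnn_step(x, h, W_xh, W_hh)
print(h.shape)  # (8,)
```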

  • Objective: during backpropagation, compute the partial derivative of the loss function $l$ with respect to the hidden-state vector $h_t$ at time step $t$.

Singular value decomposition (used to analyze the recurrent Jacobians that appear in this gradient)
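A standard expansion of this gradient (my reconstruction of the usual argument; notation follows the RNN update above, with the loss evaluated at a later time step $T \ge t$):

$$\frac{\partial l}{\partial h_t} = \frac{\partial l}{\partial h_T}\,\frac{\partial h_T}{\partial h_{T-1}} \cdots \frac{\partial h_{t+1}}{\partial h_t}, \qquad \frac{\partial h_{k+1}}{\partial h_k} = \operatorname{diag}\!\big(\tanh'(W_{xh} x_{k+1} + W_{hh} h_k)\big)\, W_{hh}$$

Since $|\tanh'| \le 1$, each factor's spectral norm is bounded by the largest singular value of $W_{hh}$. If $\sigma_{\max}(W_{hh}) < 1$ the gradient norm shrinks exponentially in $T - t$ (vanishing gradient), while singular values well above 1 allow it to grow exponentially (exploding gradient); this is why the SVD of $W_{hh}$ shows up in the analysis.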


[Figure 2]

[Figure 3]

1. GRU

The GRU (Gated Recurrent Unit) is a type of recurrent neural network (RNN). Like the LSTM (Long Short-Term Memory), it was proposed to address long-term dependency and gradient problems in backpropagation. Compared with the LSTM, a GRU achieves comparable results while being easier to train, which can considerably improve training efficiency, so in many cases the GRU is preferred.

【Input/Output Structure】

[Figure 4]

【Internal Structure】

  • r: the reset gate;
  • z: the update gate; the closer the gate signal is to 1, the more information is "remembered", and the closer it is to 0, the more is "forgotten".
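The gates and the candidate state are typically computed as follows (a common formulation reconstructed from the figures below; $\sigma$ is the sigmoid, $[\cdot,\cdot]$ denotes concatenation, and $\odot$ is element-wise multiplication):

$$r = \sigma(W_r[h^{t-1}, x^t]), \qquad z = \sigma(W_z[h^{t-1}, x^t]), \qquad h' = \tanh(W[r \odot h^{t-1},\, x^t])$$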

[Figure 5]

[Figure 6]

[Figure 7]

【Update Expression】
$$h^t = (1-z) \odot h^{t-1} + z \odot h'$$

  • $(1-z) \odot h^{t-1}$: selectively "forgets" parts of the previous hidden state. Here $1-z$ can be viewed as a forget gate that drops unimportant information from the dimensions of $h^{t-1}$.
  • $z \odot h'$: selectively "remembers" the candidate state $h'$, which carries the current node's information. Symmetrically, $z$ selects which dimensions of $h'$ to keep; in other words, this step picks out information from $h'$.
  • $h^t = (1-z) \odot h^{t-1} + z \odot h'$: putting the two together, this step forgets some dimensions of the incoming $h^{t-1}$ and adds the selected dimensions of the current node's candidate state.
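A minimal NumPy sketch of one GRU step following the equations above (the concatenated-weight shapes are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_r, W_z, W_h):
    """One GRU step: reset gate r, update gate z, candidate h', new state h^t."""
    xh = np.concatenate([h_prev, x_t])
    r = sigmoid(W_r @ xh)                                       # reset gate
    z = sigmoid(W_z @ xh)                                       # update gate
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))   # candidate h'
    return (1 - z) * h_prev + z * h_cand                        # h^t

# Toy dimensions (assumed for illustration).
input_dim, hidden_dim = 4, 8
rng = np.random.default_rng(0)
W_r = rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.1
W_z = rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.1
W_h = rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.1

h = np.zeros(hidden_dim)
h = gru_step(rng.standard_normal(input_dim), h, W_r, W_z, W_h)
print(h.shape)  # (8,)
```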

Learned from: https://zhuanlan.zhihu.com/p/32481747

2. LSTM

[Figure 8]

3. LSTM Papers

Paper《Long Short-Term Memory RNN Architectures for Large Scale Acoustic Modeling》

Note:

  1. First distributed training of LSTM RNNs using asynchronous stochastic gradient descent (ASGD) optimization on a large cluster of machines.
  2. Evaluated on the TIMIT speech database, then tested on a large-vocabulary speech recognition task (the Google Voice Search task).
  3. [Figure 9]
  4. [Figure 10]
  5. [Figure 11]
  6. Shows how to calculate the total number of parameters and the computational complexity for a moderate number of inputs (see the sketch after this list).
  7. Eigen: a C++ matrix computation library.
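A rough sketch of such a parameter count for one standard LSTM layer (my own back-of-the-envelope helper, assuming no peephole connections or recurrent projection; the paper's architectures differ in those details):

```python
def lstm_param_count(input_dim, hidden_dim):
    """Parameters of one standard LSTM layer: four gates, each with
    input weights, recurrent weights, and a bias vector."""
    per_gate = hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim
    return 4 * per_gate

# Example: 39-dimensional acoustic features into 512 LSTM cells (toy numbers).
print(lstm_param_count(39, 512))  # 1130496
```

The recurrent term `hidden_dim * hidden_dim` dominates; the paper's LSTMP variant adds a recurrent projection layer precisely to shrink that term.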

Paper《Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks》

Note:

  1. CNNs are good at reducing frequency variations, LSTMs are good at temporal modeling, and DNNs are appropriate for mapping features to a more separable space.
  2. Takes advantage of the complementarity of CNNs, LSTMs, and DNNs by combining them into one unified architecture; the proposed CLDNN architecture is evaluated on a variety of large-vocabulary tasks (LVCSR). A minimal sketch of this stacking idea follows this list.
  3. Previous work trained the three models separately and then combined their outputs through a combination layer; here they are trained in one unified structure.
  4. [Figure 12]
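A minimal PyTorch sketch of the conv → LSTM → fully-connected stacking (layer sizes, kernel sizes, and the frequency-only pooling are illustrative assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class TinyCLDNN(nn.Module):
    """Conv layers reduce frequency variation, LSTM layers model time,
    fully connected layers map to a more separable space."""
    def __init__(self, n_freq=40, n_classes=10, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 5), padding=(0, 2)),  # conv over frequency
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),                      # pool in frequency only
        )
        self.lstm = nn.LSTM(16 * (n_freq // 2), hidden, num_layers=2, batch_first=True)
        self.dnn = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_classes))

    def forward(self, x):           # x: (batch, time, freq)
        x = x.unsqueeze(1)          # (batch, 1, time, freq)
        x = self.conv(x)            # (batch, 16, time, freq//2)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)   # (batch, time, features)
        x, _ = self.lstm(x)         # (batch, time, hidden)
        return self.dnn(x[:, -1])   # classify from the last frame

model = TinyCLDNN()
logits = model(torch.randn(2, 30, 40))   # 2 utterances, 30 frames, 40 filterbanks
print(logits.shape)                      # torch.Size([2, 10])
```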

Paper《Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks》

Note: author: Kai Sheng Tai (Stanford)

  1. Models where real-valued vectors are used to represent meaning fall into three classes: 1. bag-of-words models, 2. sequence models, 3. tree-structured models.

  2. Tested on two tasks: semantic relatedness prediction on sentence pairs, and sentiment classification of sentences drawn from movie reviews.

  3. Available code: https://github.com/stanfordnlp/treelsm ; project: https://nlp.stanford.edu/projects/glove/

  4. Previous work:

    1. A problem with RNNs with transition functions of this form is that, during training, components of the gradient vector can grow or decay exponentially over long sequences. Exploding or vanishing gradients make it difficult for the RNN to learn long-distance correlations in a sequence.
    2. Bidirectional LSTMs allow the hidden state to capture both past and future information; multi-layer (stacked, deep) LSTMs let the higher layers capture longer-term dependencies of the input sequence. However, both only allow strictly sequential information propagation.
  5. Data structure:

    [Figure 13]

    [Figure 14]

    [Figure 15]

    [Figure 16]

    [Figure 17]

    [Figure 18]

    1. Different from the standard LSTM, the gating vectors and memory-cell updates depend on the states of possibly many child units. A Tree-LSTM unit contains one forget gate $f_{jk}$ for each child $k$, so that it can selectively incorporate information from each child (see the equations after this list).
    2. Classification model: [Figure 19]
    3. Semantic relatedness of sentence pairs: given a sentence pair, predict a real-valued similarity score in some range.
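For reference, the Child-Sum Tree-LSTM transition equations from the paper (written out here; $C(j)$ is the set of children of node $j$ and $\odot$ is element-wise multiplication):

$$\tilde h_j = \sum_{k \in C(j)} h_k, \qquad i_j = \sigma(W^{(i)} x_j + U^{(i)} \tilde h_j + b^{(i)}), \qquad f_{jk} = \sigma(W^{(f)} x_j + U^{(f)} h_k + b^{(f)})$$

$$o_j = \sigma(W^{(o)} x_j + U^{(o)} \tilde h_j + b^{(o)}), \qquad u_j = \tanh(W^{(u)} x_j + U^{(u)} \tilde h_j + b^{(u)})$$

$$c_j = i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k, \qquad h_j = o_j \odot \tanh(c_j)$$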

Paper《Learning to Forget: Continual Prediction with LSTM》

cited: keyword:

Phenomenon&Challenge:

  1. backpropagated error quickly either vanishes or blows up

Chart&Analyse:

[Figure 20]
[Figure 21]
[Figure 22]




[Figure 23]

Code:

Shortcoming&Confusion:

  1. embedded Reber grammar
  2. Did not fully understand the derivation of the formulas.

Paper《Speech Recognition with Deep Recurrent Neural Networks》

cited: keyword:

Chart&Analyse:

[Figure 24]
[Figure 25]

$F_t$ denotes the forget gate, $I_t$ the input gate, $\tilde C_t$ the candidate cell state, $C_t$ the cell state (this is where the recurrence happens), $O_t$ the output gate, $H_t$ the output of the current unit, and $H_{t-1}$ the output of the previous time step.
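With that notation, the standard LSTM update reads as follows (a common textbook formulation written out for reference; the $W_\ast$ act on the concatenation $[H_{t-1}, x_t]$):

$$F_t = \sigma(W_F[H_{t-1}, x_t] + b_F), \quad I_t = \sigma(W_I[H_{t-1}, x_t] + b_I), \quad O_t = \sigma(W_O[H_{t-1}, x_t] + b_O)$$

$$\tilde C_t = \tanh(W_C[H_{t-1}, x_t] + b_C), \qquad C_t = F_t \odot C_{t-1} + I_t \odot \tilde C_t, \qquad H_t = O_t \odot \tanh(C_t)$$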
[Figure 26]
[Figure 27]
[Figure 28]
