NNDL Homework 10: Chapter 6 Exercises (LSTM | GRU)

  • Exercise 6-3: When formula (6.50) is used as the state update equation of a recurrent neural network, analyze why exploding gradients may occur and give a solution.
  • Exercise 6-4: Derive the gradients of the parameters in an LSTM network and analyze how it mitigates vanishing gradients.
  • Exercise 6-5: Derive the gradients of the parameters in a GRU network and analyze how it mitigates vanishing gradients.
  • Extra Exercise 6-1P: When should a GRU be used, and when an LSTM?
  • Summary:

Exercise 6-3: When formula (6.50) is used as the state update equation of a recurrent neural network, analyze why exploding gradients may occur and give a solution.

Following the derivation in the previous assignment, the error term $\delta_{t,k}$ is a product of per-step Jacobians; with the update in (6.50) each factor contains an identity term, so the product does not vanish, but it can still grow too large and cause exploding gradients.
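A sketch of where the explosion comes from, assuming (as in the NNDL text) that formula (6.50) is the residual-style update $h_t = h_{t-1} + g(x_t, h_{t-1}; \theta)$:

```latex
% Assuming (6.50) is the residual-style update
%   h_t = h_{t-1} + g(x_t, h_{t-1}; \theta)
% each step's Jacobian contains an identity term:
\[
\frac{\partial h_t}{\partial h_{t-1}}
  = I + \frac{\partial g(x_t, h_{t-1}; \theta)}{\partial h_{t-1}}
\]
% Backpropagating from step t to step k multiplies these Jacobians:
\[
\delta_{t,k} \;\propto\; \prod_{\tau=k+1}^{t}
  \left( I + \frac{\partial g_\tau}{\partial h_{\tau-1}} \right)
\]
% The identity term keeps the product from vanishing, but the accumulated
% \partial g / \partial h contributions can make it grow without bound,
% which is exactly the exploding-gradient risk.
```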

Solution: introduce a gating mechanism to control the speed at which information accumulates, selectively adding new information and selectively forgetting previously accumulated information. Gradient clipping is another common remedy aimed specifically at explosion.
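To make this concrete, here is a tiny PyTorch sketch (my own illustration, not from the textbook): `W` is a stand-in for the recurrent weights inside $g$, and $g$ is simplified to a linear map. It also demonstrates gradient clipping as the remedy for the explosion:

```python
import torch

# Toy demo: with the residual update h_t = h_{t-1} + g(h_{t-1}), each step's
# Jacobian is (I + dg/dh), so the gradient w.r.t. early states is a product
# of such factors and can blow up over long sequences.
torch.manual_seed(0)
T, d = 100, 8
W = torch.randn(d, d) * 0.1               # stand-in for the weights inside g
h0 = torch.full((d,), 0.1, requires_grad=True)

h = h0
for _ in range(T):
    h = h + h @ W                         # h_t = h_{t-1} + g(h_{t-1}), g linearized

h.sum().backward()
print("||dL/dh_0|| =", h0.grad.norm().item())   # grows roughly like ||I + W||^T

# Gradient clipping: rescale the gradient in place so its norm is at most 1.
torch.nn.utils.clip_grad_norm_([h0], max_norm=1.0)
print("after clipping:", h0.grad.norm().item())
```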

Exercise 6-4: Derive the gradients of the parameters in an LSTM network and analyze how it mitigates vanishing gradients.

[Figure 1: derivation of the LSTM parameter gradients]

After differentiation, $f_t$ enters as a standalone factor in an additive term rather than being multiplied together with the other terms, so $\frac{\partial L}{\partial W}$ does not easily go to zero. At the same time, the gating mechanism forgets part of the information rather than passing every gradient through in full, which helps avoid both vanishing and exploding gradients.
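A minimal sketch of the key step, using the standard LSTM cell update and keeping only the direct gradient path through the cell state (the indirect paths through $h_{t-1}$ are dropped for clarity):

```latex
% Standard LSTM cell-state update (additive, gated):
\[
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
\]
% Direct gradient path through the cell state:
\[
\frac{\partial c_t}{\partial c_{t-1}} \approx \operatorname{diag}(f_t),
\qquad
\frac{\partial c_t}{\partial c_k} \approx \prod_{\tau=k+1}^{t} \operatorname{diag}(f_\tau)
\]
% f_\tau is the forget gate's output; when the network chooses to remember,
% f_\tau is close to 1 and the product need not shrink to zero, unlike the
% repeated W^T diag(f'(z)) factors of a vanilla RNN.
```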

Exercise 6-5: Derive the gradients of the parameters in a GRU network and analyze how it mitigates vanishing gradients.

[Figure 2: derivation of the GRU parameter gradients]

A GRU has gate units that regulate information flow but no separate memory cell; it merges the input gate and the forget gate into a single update gate and controls the gradient through this gate, which helps avoid vanishing gradients.
My derivation still has problems and I don't fully understand it; the reference below may help, but I haven't worked through it yet and will come back to it.
Reference: https://www.cnblogs.com/whyaza/p/15503804.html
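For reference, a minimal sketch of the same style of argument for the GRU, using the update equations from Cho et al. (note that some texts, including NNDL, swap the roles of $z_t$ and $1 - z_t$) and again keeping only the direct gradient path:

```latex
% GRU update (Cho et al. convention):
\[
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,
\qquad
\tilde{h}_t = \tanh\!\left( W x_t + U (r_t \odot h_{t-1}) \right)
\]
% Direct gradient path (terms through \tilde{h}_t and z_t omitted):
\[
\frac{\partial h_t}{\partial h_{t-1}} \approx \operatorname{diag}(1 - z_t)
\]
% When the update gate z_t is near 0, the state is copied almost unchanged
% and the gradient flows through near-identity factors, so it need not
% vanish even over long spans.
```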

Extra Exercise 6-1P: When should a GRU be used, and when an LSTM?

The GRU authors' answer in Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling:

These two units however have a number of differences as well. One feature of the LSTM unit that is missing from the GRU is the controlled exposure of the memory content. In the LSTM unit, the amount of the memory content that is seen, or used by other units in the network is controlled by the output gate. On the other hand the GRU exposes its full content without any control.

Another difference is in the location of the input gate, or the corresponding reset gate. The LSTM unit computes the new memory content without any separate control of the amount of information flowing from the previous time step. Rather, the LSTM unit controls the amount of the new memory content being added to the memory cell independently from the forget gate. On the other hand, the GRU controls the information flow from the previous activation when computing the new, candidate activation, but does not independently control the amount of the candidate activation being added (the control is tied via the update gate).

From these similarities and differences alone, it is difficult to conclude which types of gating units would perform better in general. Although Bahdanau et al. [2014] reported that these two units performed comparably to each other according to their preliminary experiments on machine translation, it is unclear whether this applies as well to tasks other than machine translation. This motivates us to conduct more thorough empirical comparison between the LSTM unit and the GRU in this paper.
An interpretation of this passage on Zhihu: https://www.zhihu.com/question/345650042/answer/1268609513
The GRU's advantage is the simplicity of the model, which makes it better suited to building larger networks. With only two gates it is more efficient computationally (the parameter-count sketch below makes this concrete), and this scalability helps when constructing bigger models; the LSTM, with three gates, is more powerful and flexible.
In practice, though, it is still best to try both on the task at hand.
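One way to make the "GRU is lighter" point concrete: count the parameters of otherwise identical single-layer PyTorch models (the sizes below are arbitrary, chosen only for illustration):

```python
import torch.nn as nn

# Compare parameter counts: the LSTM has 4 gate/candidate blocks, the GRU 3,
# so with equal sizes the GRU carries roughly 3/4 of the LSTM's parameters.
hidden, inp = 256, 128                  # arbitrary illustrative sizes
lstm = nn.LSTM(input_size=inp, hidden_size=hidden)
gru = nn.GRU(input_size=inp, hidden_size=hidden)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print("LSTM params:", n_params(lstm))   # 395,264
print("GRU  params:", n_params(gru))    # 296,448
```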

Summary:

This assignment demanded a fair amount of mathematical derivation. I found some derivations online but have not fully understood them yet, so I need to strengthen my math skills.
Reference: https://blog.csdn.net/qq_38147421/article/details/107692418
