by:Z.H.Gao
The N-L-M network structure is extended to N-L-L-M; the number of samples is n.
Fig. 1 Sample-by-sample input mode of the Elman network
Fig. 2 Network structure of the Elman network for a single sample
$$h^i = f\left(temph^i\right) = f\left(v x^i + b_{in} + u h^{i-1}\right)$$
$$y^i = f\left(tempy^i\right) = f\left(w h^i\right) = f\left(w \cdot f\left(v x^i + b_{in} + u h^{i-1}\right)\right)$$
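As more input data arrives, the self-recurrent structure passes the previous state on to the current input, and the two together form the new input for the current round of training and learning. This continues until the input or the training ends, and the final output is the prediction.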
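Hidden layer: [$h_{1 \times L}^i$]
Context layer: [$h_{1 \times L}^{i-1}$]
Weights linking the context layer to the hidden layer: [$u_{L \times L}$]
Output weights: [$w_{L \times M}$]
Output layer: [$y_{1 \times M}^i$]
Here i indexes the sample, and one sample is fed in at a time. Because of the context layer, samples are usually fed in one by one, although they can also be fed in batch by batch.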
$$h^i = f\left(temph^i\right) = f\left(v x^i + b_{in} + u h^{i-1}\right)$$
$$y^i = f\left(tempy^i\right) = f\left(w h^i\right) = f\left(w \cdot f\left(v x^i + b_{in} + u h^{i-1}\right)\right)$$
where i is the sample index.
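The forward pass above can be sketched in NumPy as follows. This is only an illustrative sketch under several assumptions that the text does not state: the activation f is taken to be tanh, the initial context state is zero, the input weights v have shape N x L, and the bias b_in has length L. With the row-vector shapes listed earlier, the products written as v x and u h are computed as `x @ v` and `h @ u`. The function name `elman_forward` is hypothetical.

```python
import numpy as np

def elman_forward(X, v, b_in, u, w, f=np.tanh):
    """Feed n samples one by one through an N-L-L-M Elman network.

    X: (n, N) inputs, v: (N, L), b_in: (L,), u: (L, L), w: (L, M).
    Returns temp_H (n, L), H (n, L) and Y (n, M).
    """
    n, L = X.shape[0], u.shape[0]
    temp_H = np.zeros((n, L))
    H = np.zeros((n, L))
    h_prev = np.zeros(L)          # assumption: context layer starts at zero
    for i in range(n):
        # temph^i = x^i v + b_in + h^{i-1} u
        temp_H[i] = X[i] @ v + b_in + h_prev @ u
        H[i] = f(temp_H[i])       # h^i = f(temph^i)
        h_prev = H[i]             # the context layer stores h^i for the next sample
    Y = f(H @ w)                  # y^i = f(w h^i), computed for all samples at once
    return temp_H, H, Y
```

The output layer has no recurrence, so it can be evaluated for all samples in one matrix product once the hidden states are known.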
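Let the loss function be $J\left(Y, Target\right)$. Then, for a single sample i, the gradients are as follows.
Note: throughout, $*$ denotes element-wise multiplication and $\times$ denotes matrix multiplication.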
$$\frac{\partial J^i}{\partial tempy^i} = \frac{\partial J^i}{\partial y^i} * \frac{\partial y^i}{\partial tempy^i} = \frac{\partial J^i}{\partial y^i} * f'\left(tempy^i\right)$$
$$\frac{\partial J^i}{\partial w} = \frac{\partial J^i}{\partial y^i} * \frac{\partial y^i}{\partial tempy^i} \times \frac{\partial tempy^i}{\partial w} = \left(h^i\right)^T \times \left(\frac{\partial J^i}{\partial y^i} * f'\left(tempy^i\right)\right)$$
$$\frac{\partial J^i}{\partial h^i} = \frac{\partial J^i}{\partial y^i} * \frac{\partial y^i}{\partial tempy^i} \times \frac{\partial tempy^i}{\partial h^i} = \left(\frac{\partial J^i}{\partial y^i} * f'\left(tempy^i\right)\right) \times w^T$$
For all samples taken together, the gradient of w is computed exactly as in an ordinary feedforward network:
$$\frac{\partial J}{\partial w} = \sum_{i=1}^{n} \left(h^i\right)^T \times \left(\frac{\partial J^i}{\partial y^i} * f'\left(tempy^i\right)\right) = H^T \times \left(\frac{\partial J}{\partial Y} * f'\left(tempY\right)\right)$$
Similarly, $\frac{\partial J}{\partial H} = \left(\frac{\partial J}{\partial Y} * f'\left(tempY\right)\right) \times w^T$
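As an illustration of the two batch formulas for the output layer, here is a minimal NumPy sketch. It assumes f is tanh and that the loss is the squared error J = 0.5 * sum((Y - Target)^2); neither choice is stated in the text, and the function name `output_layer_grads` is hypothetical.

```python
import numpy as np

def output_layer_grads(H, w, T):
    """Gradients of J = 0.5 * sum((Y - T)**2) with Y = tanh(H @ w).

    H: (n, L) hidden states, w: (L, M) output weights, T: (n, M) targets.
    Returns dJ/dw with shape (L, M) and dJ/dH with shape (n, L).
    """
    Y = np.tanh(H @ w)
    dJ_dY = Y - T                      # derivative of the assumed squared-error loss
    delta_Y = dJ_dY * (1.0 - Y ** 2)   # dJ/dY * f'(tempY), element-wise; tanh' = 1 - tanh^2
    dJ_dw = H.T @ delta_Y              # H^T x (dJ/dY * f'(tempY))
    dJ_dH = delta_Y @ w.T              # (dJ/dY * f'(tempY)) x w^T
    return dJ_dw, dJ_dH
```

Row i of `dJ_dH` is the direct contribution of sample i's error to its own hidden state; the recurrent part is handled separately by the loop over u.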
The gradient of the parameter u involves the link between the current sample and the samples before it, so it has to be computed with a loop.
$$\frac{\partial J}{\partial tempH} = \frac{\partial J}{\partial H} * f'\left(tempH\right) = \left[\left(\frac{\partial J}{\partial Y} * f'\left(tempY\right)\right) \times w^T\right] * f'\left(tempH\right)$$
Then $\frac{\partial J^i}{\partial temph^i} = \frac{\partial J}{\partial tempH}\left(i,:\right)$. This is the key point of the loop, which handles a single sample at a time:
$$\begin{aligned} \frac{\partial J^i}{\partial h^{i-1}} &= \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} = \frac{\partial J^i}{\partial temph^i} \times u^T \\ \frac{\partial J^i}{\partial temph^{i-1}} &= \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temph^{i-1}} = \left(\frac{\partial J^i}{\partial temph^i} \times u^T\right) * f'\left(temph^{i-1}\right) \end{aligned}$$
$$\begin{aligned} \frac{\partial J^i}{\partial h^{i-2}} &= \frac{\partial J^i}{\partial temph^{i-1}} * \frac{\partial temph^{i-1}}{\partial h^{i-2}} = \frac{\partial J^i}{\partial temph^{i-1}} \times u^T \\ \frac{\partial J^i}{\partial temph^{i-2}} &= \frac{\partial J^i}{\partial temph^{i-1}} * \frac{\partial temph^{i-1}}{\partial h^{i-2}} * \frac{\partial h^{i-2}}{\partial temph^{i-2}} = \left(\frac{\partial J^i}{\partial temph^{i-1}} \times u^T\right) * f'\left(temph^{i-2}\right) \end{aligned}$$
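The loop exists to work out how the current sample's error $J^i$ is affected by the previous k samples. Computationally, the current sample's error $J^i$ is used to compute the gradient of the weights u that link the networks of the previous k steps to the current network.
For a single sample i, its effect on the current network gives the following gradients: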
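The backward recurrence can be sketched as a short loop. This is an illustrative sketch that assumes f is tanh and uses 0-based sample indices; `backprop_deltas` and `tanh_prime` are hypothetical names, and `delta_i` stands for row i of dJ/dtempH.

```python
import numpy as np

def tanh_prime(z):
    """Derivative of tanh, the activation assumed in this sketch."""
    return 1.0 - np.tanh(z) ** 2

def backprop_deltas(delta_i, temp_H, u, i, k):
    """Propagate dJ^i/dtemph^i back through at most k earlier samples.

    delta_i: (L,) gradient dJ^i/dtemph^i, temp_H: (n, L) pre-activations,
    u: (L, L) context weights, i: 0-based index of the current sample.
    Returns [dJ^i/dtemph^i, dJ^i/dtemph^{i-1}, ...], stopping at sample 0.
    """
    deltas = [delta_i]
    for step in range(1, k + 1):
        j = i - step
        if j < 0:                 # no samples before the first one
            break
        # dJ^i/dtemph^j = (dJ^i/dtemph^{j+1} x u^T) * f'(temph^j)
        deltas.append((deltas[-1] @ u.T) * tanh_prime(temp_H[j]))
    return deltas
```

Each pass through the loop applies one matrix product with u^T followed by one element-wise product with f', which is exactly one line of the derivation above.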
$$\begin{aligned} \frac{\partial J^i}{\partial u} &= \frac{\partial J^i}{\partial temph^i} \times \frac{\partial temph^i}{\partial u} = \left(h^{i-1}\right)^T \times \frac{\partial J^i}{\partial temph^i} \\ \frac{\partial J^i}{\partial v} &= \frac{\partial J^i}{\partial temph^i} \times \frac{\partial temph^i}{\partial v} = \left(x^i\right)^T \times \frac{\partial J^i}{\partial temph^i} \end{aligned}$$
The influence of the earlier samples on the single sample i always passes through the parameters u and v, so summing over all of them gives:
$$\begin{aligned} \frac{\partial J^i}{\partial u} &= \sum_{k=1}^{i} \left(h^{k-1}\right)^T \times \frac{\partial J^i}{\partial temph^k} \\ \frac{\partial J^i}{\partial v} &= \sum_{k=1}^{i} \left(x^k\right)^T \times \frac{\partial J^i}{\partial temph^k} \end{aligned}$$
Taking k = 3 as an example, we have:
$$\begin{aligned} \frac{\partial J^i}{\partial h^{i-1}} &= \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} = \frac{\partial J^i}{\partial temph^i} \times u^T \\ \frac{\partial J^i}{\partial temph^{i-1}} &= \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temph^{i-1}} = \left(\frac{\partial J^i}{\partial temph^i} \times u^T\right) * f'\left(temph^{i-1}\right) \end{aligned}$$
$$\begin{aligned} \frac{\partial J^i}{\partial h^{i-2}} &= \frac{\partial J^i}{\partial temph^{i-1}} * \frac{\partial temph^{i-1}}{\partial h^{i-2}} = \frac{\partial J^i}{\partial temph^{i-1}} \times u^T \\ \frac{\partial J^i}{\partial temph^{i-2}} &= \frac{\partial J^i}{\partial temph^{i-1}} * \frac{\partial temph^{i-1}}{\partial h^{i-2}} * \frac{\partial h^{i-2}}{\partial temph^{i-2}} = \left(\frac{\partial J^i}{\partial temph^{i-1}} \times u^T\right) * f'\left(temph^{i-2}\right) \end{aligned}$$
$$\begin{aligned} \frac{\partial J^i}{\partial h^{i-3}} &= \frac{\partial J^i}{\partial temph^{i-2}} * \frac{\partial temph^{i-2}}{\partial h^{i-3}} = \frac{\partial J^i}{\partial temph^{i-2}} \times u^T \\ \frac{\partial J^i}{\partial temph^{i-3}} &= \frac{\partial J^i}{\partial temph^{i-2}} * \frac{\partial temph^{i-2}}{\partial h^{i-3}} * \frac{\partial h^{i-3}}{\partial temph^{i-3}} = \left(\frac{\partial J^i}{\partial temph^{i-2}} \times u^T\right) * f'\left(temph^{i-3}\right) \end{aligned}$$
The general form follows by induction:
$$\begin{aligned} \frac{\partial J^i}{\partial h^{i-k}} &= \frac{\partial J^i}{\partial temph^{i-k+1}} \times u^T \\ \frac{\partial J^i}{\partial temph^{i-k}} &= \left(\frac{\partial J^i}{\partial temph^{i-k+1}} \times u^T\right) * f'\left(temph^{i-k}\right) \end{aligned}$$
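Applying the general form k times in a row expresses the gradient at step i-k directly in terms of the gradient at step i. The unrolled version below is a worked restatement rather than a new result. It introduces one piece of notation not used elsewhere in this post: $\mathrm{diag}(\cdot)$ is the diagonal matrix built from a row vector, so that multiplying a row vector by it on the right equals the element-wise product $*$. The factors are multiplied from left to right in order of increasing j.

$$\frac{\partial J^i}{\partial temph^{i-k}} = \frac{\partial J^i}{\partial temph^{i}} \times \prod_{j=1}^{k} \left[ u^T \times \mathrm{diag}\left( f'\left(temph^{i-j}\right) \right) \right]$$

For k = 3 this reproduces the three steps written out above: one factor $u^T \times \mathrm{diag}(f'(\cdot))$ for each of the samples i-1, i-2 and i-3.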
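Correspondingly, each step back contributes one term to the gradients of u and v. The contributions from steps i, i-1 and i-2 are, in turn: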
$$\begin{aligned} \frac{\partial J^i}{\partial u} &= \frac{\partial J^i}{\partial temph^i} \times \frac{\partial temph^i}{\partial u} = \left(h^{i-1}\right)^T \times \frac{\partial J^i}{\partial temph^i} \\ \frac{\partial J^i}{\partial v} &= \frac{\partial J^i}{\partial temph^i} \times \frac{\partial temph^i}{\partial v} = \left(x^i\right)^T \times \frac{\partial J^i}{\partial temph^i} \end{aligned}$$
$$\begin{aligned} \frac{\partial J^i}{\partial u} &= \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temph^{i-1}} \times \frac{\partial temph^{i-1}}{\partial u} = \left(h^{i-2}\right)^T \times \frac{\partial J^i}{\partial temph^{i-1}} \\ \frac{\partial J^i}{\partial v} &= \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temph^{i-1}} \times \frac{\partial temph^{i-1}}{\partial v} = \left(x^{i-1}\right)^T \times \frac{\partial J^i}{\partial temph^{i-1}} \end{aligned}$$
$$\begin{aligned} \frac{\partial J^i}{\partial u} &= \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temph^{i-1}} * \frac{\partial temph^{i-1}}{\partial h^{i-2}} * \frac{\partial h^{i-2}}{\partial temph^{i-2}} \times \frac{\partial temph^{i-2}}{\partial u} = \left(h^{i-3}\right)^T \times \frac{\partial J^i}{\partial temph^{i-2}} \\ \frac{\partial J^i}{\partial v} &= \frac{\partial J^i}{\partial temph^i} * \frac{\partial temph^i}{\partial h^{i-1}} * \frac{\partial h^{i-1}}{\partial temph^{i-1}} * \frac{\partial temph^{i-1}}{\partial h^{i-2}} * \frac{\partial h^{i-2}}{\partial temph^{i-2}} \times \frac{\partial temph^{i-2}}{\partial v} = \left(x^{i-2}\right)^T \times \frac{\partial J^i}{\partial temph^{i-2}} \end{aligned}$$
Summing the contributions to u and v from every step of the backward pass gives the final gradients:
$$\begin{aligned} \frac{\partial J^i}{\partial u} &= \sum_{k=1}^{i} \left(h^{k-1}\right)^T \times \frac{\partial J^i}{\partial temph^k} \\ \frac{\partial J^i}{\partial v} &= \sum_{k=1}^{i} \left(x^k\right)^T \times \frac{\partial J^i}{\partial temph^k} \end{aligned}$$
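To make the summed gradients concrete, the sketch below accumulates them for the error of one sample and can be compared against a finite-difference estimate. It rests on assumptions not stated in the text: f is tanh, the loss is J^i = 0.5 * sum((y^i - t^i)^2), the context layer starts at zero, and sample indices are 0-based. The function names are hypothetical.

```python
import numpy as np

def forward(X, v, b_in, u, w):
    """Run all samples through the network; returns H (n, L) and Y (n, M)."""
    n, L = X.shape[0], u.shape[0]
    H = np.zeros((n, L))
    h_prev = np.zeros(L)                      # assumed zero initial context
    for k in range(n):
        H[k] = np.tanh(X[k] @ v + b_in + h_prev @ u)
        h_prev = H[k]
    return H, np.tanh(H @ w)

def loss_i(X, T, v, b_in, u, w, i):
    """Assumed squared-error loss of sample i alone."""
    _, Y = forward(X, v, b_in, u, w)
    return 0.5 * np.sum((Y[i] - T[i]) ** 2)

def grads_u_v(X, T, v, b_in, u, w, i):
    """dJ^i/du and dJ^i/dv, summed over samples i, i-1, ..., 0."""
    H, Y = forward(X, v, b_in, u, w)
    # dJ^i/dtemph^i = [(dJ^i/dy^i * f'(tempy^i)) x w^T] * f'(temph^i)
    delta = ((Y[i] - T[i]) * (1.0 - Y[i] ** 2)) @ w.T * (1.0 - H[i] ** 2)
    dU, dV = np.zeros_like(u), np.zeros_like(v)
    for k in range(i, -1, -1):
        h_before = H[k - 1] if k > 0 else np.zeros(u.shape[0])
        dU += np.outer(h_before, delta)       # (h^{k-1})^T x dJ^i/dtemph^k
        dV += np.outer(X[k], delta)           # (x^k)^T x dJ^i/dtemph^k
        if k > 0:
            # step back: (delta x u^T) * f'(temph^{k-1}), with tanh' = 1 - h^2
            delta = (delta @ u.T) * (1.0 - H[k - 1] ** 2)
    return dU, dV
```

A quick sanity check is to perturb one entry of u or v by a small amount, recompute `loss_i`, and compare the central difference with the matching entry of `dU` or `dV`; the two should agree closely if the recurrence is implemented as derived.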
https://blog.csdn.net/vendetta_gg/article/details/106444683