State equation:

$$
s_t = (1-\alpha)\, s_{t-1} + \alpha \tanh\left(A s_{t-1} + B y_{t-1}\right)
$$

or, equivalently, after shifting the time index by one:

$$
\begin{aligned}
s_{t+1} &= (1-\alpha)\, s_t + \alpha \tanh\left(A s_t + B y_t\right) \\
&\triangleq f(s_t, y_t)
\end{aligned}
$$
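A minimal NumPy sketch of the update $s_{t+1} = f(s_t, y_t)$. The function name `leaky_tanh_step`, the sizes, the seed and the random weights are hypothetical illustration choices. The loop also shows a side effect of the form of $f$: starting from $s_0 = 0$, every state stays strictly inside $(-1, 1)$ elementwise. This follows because each step is a convex combination of the previous state and a $\tanh$ value.

```python
import numpy as np

def leaky_tanh_step(s, y, A, B, alpha):
    """One step of s_{t+1} = (1 - alpha) s_t + alpha * tanh(A s_t + B y_t)."""
    return (1.0 - alpha) * s + alpha * np.tanh(A @ s + B @ y)

# Hypothetical sizes: n = 3 state units, m = 2 input channels.
rng = np.random.default_rng(0)
n, m, alpha = 3, 2, 0.3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))

s = np.zeros(n)  # initial state s_0 = 0
for t in range(5):
    y = rng.standard_normal(m)            # random input y_t
    s = leaky_tanh_step(s, y, A, B, alpha)
```

The two limiting cases are easy to check. With $\alpha = 0$ the state is frozen. With $\alpha = 1$ the update reduces to $\tanh(A s_t + B y_t)$.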
Writing $f_t \triangleq f(s_t, y_t)$, row $k$ of the $\tanh$ term depends on $s_t$ only through $\sum_i A_{ki} s_{t,i}$. Using $\tanh'(x) = 1 - \tanh^2(x)$ (with $\tanh^2$ applied elementwise), the Jacobian with respect to $s_t$ is:

$$
\begin{aligned}
\frac{\partial f_t}{\partial s_t}
&= (1-\alpha) I + \alpha \frac{\partial}{\partial s_t}
\begin{bmatrix}
\tanh\left(\sum_i A_{1i} s_{t,i} + \sum_i B_{1i} y_{t,i}\right) \\
\tanh\left(\sum_i A_{2i} s_{t,i} + \sum_i B_{2i} y_{t,i}\right) \\
\vdots \\
\tanh\left(\sum_i A_{ni} s_{t,i} + \sum_i B_{ni} y_{t,i}\right)
\end{bmatrix} \\[1ex]
&= (1-\alpha) I + \alpha
\begin{bmatrix}
\left[1-\tanh^2\left(\sum_i A_{1i} s_{t,i} + \sum_i B_{1i} y_{t,i}\right)\right] A_{11} & \cdots & \left[1-\tanh^2\left(\sum_i A_{1i} s_{t,i} + \sum_i B_{1i} y_{t,i}\right)\right] A_{1n} \\
\vdots & \ddots & \vdots \\
\left[1-\tanh^2\left(\sum_i A_{ni} s_{t,i} + \sum_i B_{ni} y_{t,i}\right)\right] A_{n1} & \cdots & \left[1-\tanh^2\left(\sum_i A_{ni} s_{t,i} + \sum_i B_{ni} y_{t,i}\right)\right] A_{nn}
\end{bmatrix} \\[1ex]
&= (1-\alpha) I + \alpha \left[I - \operatorname{diag}\left(\tanh^2(A s_t + B y_t)\right)\right] A
\end{aligned}
$$
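One way to sanity-check the closed-form Jacobian is to compare it against central finite differences at a random point. This is a NumPy sketch. The names `f`, `jacobian_closed_form` and `jacobian_finite_diff`, along with the sizes, seed and step `eps`, are hypothetical choices made only for the check.

```python
import numpy as np

def f(s, y, A, B, alpha):
    # State update s_{t+1} = f(s_t, y_t)
    return (1.0 - alpha) * s + alpha * np.tanh(A @ s + B @ y)

def jacobian_closed_form(s, y, A, B, alpha):
    # (1 - alpha) I + alpha [I - diag(tanh^2(A s + B y))] A
    z = A @ s + B @ y
    n = s.shape[0]
    return (1.0 - alpha) * np.eye(n) + alpha * (np.eye(n) - np.diag(np.tanh(z) ** 2)) @ A

def jacobian_finite_diff(s, y, A, B, alpha, eps=1e-6):
    # Central differences: column j approximates d f / d s_j
    n = s.shape[0]
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(s + e, y, A, B, alpha) - f(s - e, y, A, B, alpha)) / (2.0 * eps)
    return J

rng = np.random.default_rng(1)
n, m, alpha = 4, 2, 0.3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
s = rng.standard_normal(n)
y = rng.standard_normal(m)

J_exact = jacobian_closed_form(s, y, A, B, alpha)
J_fd = jacobian_finite_diff(s, y, A, B, alpha)
print(np.max(np.abs(J_exact - J_fd)))  # small, limited by finite-difference error
```

Left-multiplying by a diagonal matrix scales the rows of $A$. This is why row $k$ of the middle matrix carries the single factor $1 - \tanh^2(\cdot)$ evaluated at the $k$-th pre-activation. In NumPy, `np.diag(d) @ A` equals `d[:, None] * A`.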
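Therefore, for the spectral norm (with $0 < \alpha \le 1$):

$$
\begin{aligned}
\left\| \frac{\partial f_t}{\partial s_t} \right\|_2
&\le (1-\alpha) + \alpha \left\| I - \operatorname{diag}\left(\tanh^2(A s_t + B y_t)\right) \right\|_2 \cdot \|A\|_2 \\
&\le 1 - \alpha + \alpha \|A\|_2 \\
&\le 1
\end{aligned}
$$

The first step uses the triangle inequality and the submultiplicativity of the spectral norm, together with $\|(1-\alpha) I\|_2 = 1-\alpha$ (since $1-\alpha \ge 0$). The second step holds because $I - \operatorname{diag}(\tanh^2(\cdot))$ is diagonal with entries $1 - \tanh^2(\cdot) \in (0, 1]$, so its spectral norm is at most 1. The last step requires $\|A\|_2 \le 1$; if $\|A\|_2 < 1$, the bound is strictly less than 1.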
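The chain of inequalities can be probed numerically. The sketch draws a random $A$ and rescales it to a chosen spectral norm. It then compares $\|\partial f_t / \partial s_t\|_2$ against $1 - \alpha + \alpha \|A\|_2$ over many random states, inputs and values of $\alpha$. It assumes NumPy. The target $\|A\|_2 = 0.9$, the sampling ranges, the seed and the function name `jacobian` are hypothetical choices.

```python
import numpy as np

def jacobian(s, y, A, B, alpha):
    # (1 - alpha) I + alpha [I - diag(tanh^2(A s + B y))] A
    z = A @ s + B @ y
    n = s.shape[0]
    return (1.0 - alpha) * np.eye(n) + alpha * (np.eye(n) - np.diag(np.tanh(z) ** 2)) @ A

rng = np.random.default_rng(2)
n, m = 5, 3
A = rng.standard_normal((n, n))
A *= 0.9 / np.linalg.norm(A, 2)   # hypothetical choice: rescale so that ||A||_2 = 0.9
B = rng.standard_normal((n, m))
norm_A = np.linalg.norm(A, 2)     # spectral norm = largest singular value

max_slack = -np.inf   # largest ||J||_2 - (1 - alpha + alpha ||A||_2) observed
max_norm_J = 0.0      # largest ||J||_2 observed
for _ in range(1000):
    alpha = rng.uniform(0.01, 1.0)
    s = rng.standard_normal(n)
    y = 3.0 * rng.standard_normal(m)
    norm_J = np.linalg.norm(jacobian(s, y, A, B, alpha), 2)
    max_slack = max(max_slack, norm_J - (1.0 - alpha + alpha * norm_A))
    max_norm_J = max(max_norm_J, norm_J)

print(max_slack, max_norm_J)
```

For a 2-D array, `np.linalg.norm(M, 2)` returns the largest singular value, which is the spectral norm used in the bound. With $\|A\|_2 = 0.9$ and $\alpha \ge 0.01$, the middle bound is at most $1 - 0.1\alpha \le 0.999$. The sampled Jacobian norms should therefore stay below 1.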