$$
\begin{aligned}
\boldsymbol{y}^{0}: &\ \text{input, } \boldsymbol{y}^{0} \in \mathbb{R}^{s_0 \times 1} \\
\boldsymbol{z}^{l}: &\ \text{weighted input (pre-activation) of layer } l,\ \boldsymbol{z}^{l} \in \mathbb{R}^{s_l \times 1} \\
\boldsymbol{y}^{l}: &\ \text{output of layer } l,\ \boldsymbol{y}^{l} \in \mathbb{R}^{s_l \times 1} \\
\sigma: &\ \text{activation function} \\
s_l: &\ \text{dimension of the vectors } \boldsymbol{y}^{l} \text{ and } \boldsymbol{z}^{l} \text{ in layer } l \\
\boldsymbol{t}: &\ \text{true (target) value} \\
L: &\ \text{total number of layers} \\
f^l_i: &\ \text{denotes } \frac{\partial y^l_i}{\partial z^l_i} \\
\boldsymbol{I}(i): &\ \text{column vector with 1 in row } i \text{ and 0 elsewhere} \\
\delta^l_i: &\ \text{denotes } \frac{\partial E}{\partial y^l_i} \\
\boldsymbol{\delta}^l: &\ \text{denotes } \frac{\partial E}{\partial (\boldsymbol{y}^l)^{\mathrm{T}}}, \text{ i.e. the row vector } \begin{pmatrix} \delta^l_1, \delta^l_2, \cdots, \delta^l_{s_l} \end{pmatrix}
\end{aligned}
$$
The relations between them, where $\boldsymbol{w}^{l} \in \mathbb{R}^{s_l \times s_{l-1}}$ is the weight matrix of layer $l$:
$$
\begin{aligned}
\boldsymbol{z}^{l} &= \boldsymbol{w}^{l}\boldsymbol{y}^{l-1} \\
z^{l}_i &= \sum_{j=1}^{s_{l-1}} w^{l}_{ij}\, y^{l-1}_j \\
\boldsymbol{y}^{l} &= \sigma(\boldsymbol{z}^{l}) \\
y^{l}_i &= \sigma(z^{l}_i)
\end{aligned}
$$
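The forward relations above can be sketched in a few lines of plain Python. This is only an illustration: the logistic sigmoid is an assumed choice for $\sigma$ (the text does not fix one), the weights are arbitrary numbers, and the names `sigmoid` and `forward` are hypothetical.

```python
import math

def sigmoid(v):
    """Logistic function, an assumed choice for the activation sigma."""
    return 1.0 / (1.0 + math.exp(-v))

def forward(weights, y0):
    """Return (zs, ys) with ys[0] = y0, zs[l] = w^l ys[l-1], ys[l] = sigma(zs[l])."""
    ys = [list(y0)]
    zs = [None]  # the input layer has no pre-activation
    for w in weights:  # w has s_l rows and s_{l-1} columns
        z = [sum(w_ij * y_j for w_ij, y_j in zip(row, ys[-1])) for row in w]
        zs.append(z)
        ys.append([sigmoid(v) for v in z])
    return zs, ys

# Arbitrary example network: s_0 = 2, s_1 = 3, s_2 = 1
weights = [
    [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]],  # w^1: 3 x 2
    [[0.3, -0.1, 0.2]],                      # w^2: 1 x 3
]
zs, ys = forward(weights, [1.0, 2.0])
print(ys[-1])  # network output y^L
```

Keeping both `zs` and `ys` mirrors the notation: every later derivative is expressed through $\boldsymbol{z}^l$, $\boldsymbol{y}^l$ and $\boldsymbol{y}^{l-1}$.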
Notation
$\boldsymbol{y}$: column vector, $\boldsymbol{y} \in \mathbb{R}^{n \times 1}$
$\boldsymbol{x}$: column vector, $\boldsymbol{x} \in \mathbb{R}^{m \times 1}$
$f(\boldsymbol{x})$: real-valued scalar function, written $f: \mathbb{R}^m \to \mathbb{R}$
Formulas
$$
\begin{aligned}
\frac{\partial \boldsymbol{y}^{\mathrm{T}}}{\partial \boldsymbol{x}} &=
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_1} \\
\vdots & & \vdots \\
\frac{\partial y_1}{\partial x_m} & \cdots & \frac{\partial y_n}{\partial x_m}
\end{pmatrix} \in \mathbb{R}^{m \times n} \\
\frac{\partial \boldsymbol{y}^{\mathrm{T}}}{\partial \boldsymbol{y}} &= \mathbf{E}_{n \times n} \\
\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}} &= \left[ \frac{\partial f(\boldsymbol{x})}{\partial x_1}, \cdots, \frac{\partial f(\boldsymbol{x})}{\partial x_m} \right]^{\mathrm{T}}
\end{aligned}
$$
Here $\mathbf{E}_{n \times n}$ denotes the $n \times n$ identity matrix, not the error $E$ defined in the next section.
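To make the layout of $\partial \boldsymbol{y}^{\mathrm{T}} / \partial \boldsymbol{x}$ concrete, the sketch below estimates it by central finite differences for a linear map $\boldsymbol{y} = A\boldsymbol{x}$. The matrix `A`, the helper `jacobian_yT_x` and the test point are all made up for illustration. Under this convention row $i$ holds the derivatives with respect to $x_i$, so the result should be the $m \times n$ matrix $A^{\mathrm{T}}$.

```python
def jacobian_yT_x(func, x, eps=1e-6):
    """Finite-difference estimate of d y^T / d x: row index = x_i, column index = y_j."""
    rows = []
    for i in range(len(x)):
        up = list(x)
        dn = list(x)
        up[i] += eps
        dn[i] -= eps
        y_up, y_dn = func(up), func(dn)
        rows.append([(a - b) / (2 * eps) for a, b in zip(y_up, y_dn)])
    return rows

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # n = 2 outputs, m = 3 inputs

def linear(x):
    """y = A x"""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

J = jacobian_yT_x(linear, [0.5, -1.0, 2.0])
for row in J:
    print(row)  # three rows of two entries each, numerically close to A transposed
```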
Error definition
$$
\begin{aligned}
E &= \frac{1}{m}\sum_{p=1}^{m} E_p \\
E_p &= \frac{1}{2}\left\lVert \boldsymbol{y}^L - \boldsymbol{t} \right\rVert^2 \\
&= \frac{1}{2}\sum_{i=1}^{s_L}(y^L_i - t_i)^2
\end{aligned}
$$
Here $m$ is the number of samples. To keep the derivation simple, take $m = 1$.
Computing $\frac{\partial E}{\partial w^L_{ij}}$
Preliminary remarks
Since $z^l_i = \sum_j w^l_{ij} y^{l-1}_j$, the weight $w^l_{ij}$ affects only the component $z^l_i$:
$$
\begin{aligned}
\frac{\partial z_k^l}{\partial w^l_{ij}} &=
\begin{cases}
y^{l-1}_j & k = i \\
0 & k \ne i
\end{cases} \\
\frac{\partial \boldsymbol{z}^l}{\partial w^l_{ij}} &= \left[ \frac{\partial z_1^l}{\partial w_{ij}^l}, \cdots, \frac{\partial z_{s_l}^l}{\partial w_{ij}^l} \right]^{\mathrm{T}} \in \mathbb{R}^{s_l \times 1} \\
&= \boldsymbol{I}(i)\, y_j^{l-1} \\
\\
\frac{\partial \boldsymbol{y}^l}{\partial (\boldsymbol{z}^{l})^{\mathrm{T}}} &=
\begin{pmatrix}
\frac{\partial y_1^l}{\partial z_1^l} & \cdots & \frac{\partial y_1^l}{\partial z_{s_l}^l} \\
\vdots & & \vdots \\
\frac{\partial y_{s_l}^l}{\partial z_1^l} & \cdots & \frac{\partial y_{s_l}^l}{\partial z_{s_l}^l}
\end{pmatrix} \in \mathbb{R}^{s_l \times s_l} \\
&=
\begin{pmatrix}
f^l_1 & & & \\
& f^l_2 & & \\
& & \ddots & \\
& & & f^l_{s_l}
\end{pmatrix} \\
\\
\frac{\partial \boldsymbol{z}^{l}}{\partial (\boldsymbol{y}^{l-1})^{\mathrm{T}}} &=
\begin{pmatrix}
\frac{\partial z_1^l}{\partial y_1^{l-1}} & \cdots & \frac{\partial z_1^l}{\partial y_{s_{l-1}}^{l-1}} \\
\vdots & & \vdots \\
\frac{\partial z_{s_l}^l}{\partial y_1^{l-1}} & \cdots & \frac{\partial z_{s_l}^l}{\partial y_{s_{l-1}}^{l-1}}
\end{pmatrix} \\
&=
\begin{pmatrix}
w_{11}^l & \cdots & w_{1 s_{l-1}}^l \\
\vdots & & \vdots \\
w_{s_l 1}^l & \cdots & w_{s_l s_{l-1}}^l
\end{pmatrix} \in \mathbb{R}^{s_l \times s_{l-1}} \\
\\
\frac{\partial \boldsymbol{y}^l}{\partial (\boldsymbol{y}^{l-1})^{\mathrm{T}}} &=
\frac{\partial \boldsymbol{y}^l}{\partial (\boldsymbol{z}^{l})^{\mathrm{T}}} \cdot
\frac{\partial (\boldsymbol{z}^{l})^{\mathrm{T}}}{\partial \boldsymbol{z}^{l}} \cdot
\frac{\partial \boldsymbol{z}^{l}}{\partial (\boldsymbol{y}^{l-1})^{\mathrm{T}}} \cdot
\frac{\partial (\boldsymbol{y}^{l-1})^{\mathrm{T}}}{\partial \boldsymbol{y}^{l-1}} \\
&= \frac{\partial \boldsymbol{y}^l}{\partial (\boldsymbol{z}^{l})^{\mathrm{T}}} \cdot
\frac{\partial \boldsymbol{z}^{l}}{\partial (\boldsymbol{y}^{l-1})^{\mathrm{T}}} \\
&=
\begin{pmatrix}
f^l_1 w_{11}^l & \cdots & f^l_1 w_{1 s_{l-1}}^l \\
\vdots & & \vdots \\
f^l_{s_l} w_{s_l 1}^l & \cdots & f^l_{s_l} w_{s_l s_{l-1}}^l
\end{pmatrix} \in \mathbb{R}^{s_l \times s_{l-1}} \\
\\
\frac{\partial E}{\partial (\boldsymbol{y}^L)^{\mathrm{T}}} &= \left[ y^L_1 - t_1, \cdots, y^L_{s_L} - t_{s_L} \right] \\
\\
\frac{\partial \boldsymbol{z}^l}{\partial y^{l-1}_i} &= \left[ w^l_{1i}, w^l_{2i}, \cdots, w^l_{s_l i} \right]^{\mathrm{T}} \\
\\
\frac{\partial \boldsymbol{y}^l}{\partial y^{l-1}_i} &=
\frac{\partial \boldsymbol{y}^l}{\partial (\boldsymbol{z}^l)^{\mathrm{T}}} \cdot
\frac{\partial \boldsymbol{z}^l}{\partial y^{l-1}_i} \\
&=
\begin{pmatrix}
f^l_1 & & & \\
& f^l_2 & & \\
& & \ddots & \\
& & & f^l_{s_l}
\end{pmatrix}
\cdot
\begin{pmatrix}
w^l_{1i} \\ w^l_{2i} \\ \vdots \\ w^l_{s_l i}
\end{pmatrix} \\
&=
\begin{pmatrix}
f^l_1 w^l_{1i} \\ f^l_2 w^l_{2i} \\ \vdots \\ f^l_{s_l} w^l_{s_l i}
\end{pmatrix}
\end{aligned}
$$
The two identity factors $\frac{\partial (\boldsymbol{z}^{l})^{\mathrm{T}}}{\partial \boldsymbol{z}^{l}}$ and $\frac{\partial (\boldsymbol{y}^{l-1})^{\mathrm{T}}}{\partial \boldsymbol{y}^{l-1}}$ drop out because $\frac{\partial \boldsymbol{y}^{\mathrm{T}}}{\partial \boldsymbol{y}} = \mathbf{E}$.
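The layer-to-layer Jacobian can be checked numerically. The sketch below compares the entries $f^l_k w^l_{kj}$ of $\partial \boldsymbol{y}^l / \partial (\boldsymbol{y}^{l-1})^{\mathrm{T}}$ with central finite differences for a single layer. It assumes the logistic sigmoid, for which $f^l_k = y^l_k (1 - y^l_k)$. The weights, the input and the helper `layer` are arbitrary illustrative choices.

```python
import math

def sigmoid(v):
    """Logistic function, an assumed choice for sigma."""
    return 1.0 / (1.0 + math.exp(-v))

def layer(w, y_prev):
    """y^l = sigma(w^l y^{l-1}) for one layer."""
    return [sigmoid(sum(a * b for a, b in zip(row, y_prev))) for row in w]

w = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]  # s_l = 3, s_{l-1} = 2
y_prev = [0.6, -1.2]
y = layer(w, y_prev)

# Analytic Jacobian: entry (k, j) is f_k * w_kj, with f_k = y_k (1 - y_k).
analytic = [[y[k] * (1 - y[k]) * w[k][j] for j in range(2)] for k in range(3)]

# Central finite differences, perturbing one input component at a time.
eps = 1e-6
numeric = [[0.0] * 2 for _ in range(3)]
for j in range(2):
    up = list(y_prev)
    dn = list(y_prev)
    up[j] += eps
    dn[j] -= eps
    y_up, y_dn = layer(w, up), layer(w, dn)
    for k in range(3):
        numeric[k][j] = (y_up[k] - y_dn[k]) / (2 * eps)

max_gap = max(abs(analytic[k][j] - numeric[k][j])
              for k in range(3) for j in range(2))
print(max_gap)  # largest disagreement between the two Jacobians
```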
Solution
$$
\begin{aligned}
\frac{\partial E}{\partial w^L_{ij}} &=
\frac{\partial E}{\partial (\boldsymbol{y}^L)^{\mathrm{T}}} \cdot
\frac{\partial \boldsymbol{y}^L}{\partial (\boldsymbol{z}^L)^{\mathrm{T}}} \cdot
\frac{\partial \boldsymbol{z}^L}{\partial w^L_{ij}} \\
&=
\begin{pmatrix} y^L_1 - t_1, \cdots, y^L_{s_L} - t_{s_L} \end{pmatrix}
\cdot
\operatorname{diag}\!\left(f^L_1, \cdots, f^L_{s_L}\right)
\cdot \boldsymbol{I}(i)\, y_j^{L-1} \\
&= (y^L_i - t_i)\, f^L_i\, y^{L-1}_j \\
\\
\frac{\partial E}{\partial w^{L-1}_{ij}} &=
\frac{\partial E}{\partial (\boldsymbol{y}^L)^{\mathrm{T}}} \cdot
\frac{\partial \boldsymbol{y}^L}{\partial (\boldsymbol{y}^{L-1})^{\mathrm{T}}} \cdot
\frac{\partial \boldsymbol{y}^{L-1}}{\partial w^{L-1}_{ij}} \\
&=
\begin{pmatrix} y^L_1 - t_1, \cdots, y^L_{s_L} - t_{s_L} \end{pmatrix}
\cdot
\begin{pmatrix}
f^L_1 w_{11}^L & \cdots & f^L_1 w_{1 s_{L-1}}^L \\
\vdots & & \vdots \\
f^L_{s_L} w_{s_L 1}^L & \cdots & f^L_{s_L} w_{s_L s_{L-1}}^L
\end{pmatrix}
\cdot \boldsymbol{I}(i)\, f^{L-1}_i\, y_j^{L-2} \\
&= \sum_{k=1}^{s_L} (y_k^L - t_k)\, f_k^L\, w^L_{ki}\, f^{L-1}_i\, y^{L-2}_j
\end{aligned}
$$
In both cases $\frac{\partial \boldsymbol{y}^{l}}{\partial w^{l}_{ij}} = \frac{\partial \boldsymbol{y}^{l}}{\partial (\boldsymbol{z}^{l})^{\mathrm{T}}} \cdot \frac{\partial \boldsymbol{z}^{l}}{\partial w^{l}_{ij}} = \boldsymbol{I}(i)\, f^l_i\, y^{l-1}_j$, which is where the factor $f^l_i$ comes from.
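As a sanity check, the closed forms for the last two layers, $(y^L_i - t_i) f^L_i y^{L-1}_j$ and $\sum_k (y^L_k - t_k) f^L_k w^L_{ki} f^{L-1}_i y^{L-2}_j$, can be compared with finite differences of $E$ on a small two-layer network. This is a sketch under assumptions not made in the text: the logistic sigmoid as $\sigma$ (so $f = y(1-y)$), arbitrary weights and targets, and hypothetical helper names.

```python
import math

def sigmoid(v):
    """Logistic function, an assumed choice for sigma."""
    return 1.0 / (1.0 + math.exp(-v))

def matvec(w, v):
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def run(w1, w2, y0):
    """Forward pass through two layers, returning y^{L-1} and y^L."""
    y1 = [sigmoid(v) for v in matvec(w1, y0)]
    y2 = [sigmoid(v) for v in matvec(w2, y1)]
    return y1, y2

def error(w1, w2, y0, t):
    """E = 1/2 * sum_i (y_i^L - t_i)^2 for a single sample."""
    _, y2 = run(w1, w2, y0)
    return 0.5 * sum((a - b) ** 2 for a, b in zip(y2, t))

w1 = [[0.1, -0.3], [0.5, 0.2], [-0.4, 0.6]]  # w^{L-1}: 3 x 2
w2 = [[0.3, -0.2, 0.1], [-0.5, 0.4, 0.2]]    # w^L:     2 x 3
y0, t = [1.0, -0.5], [0.0, 1.0]
y1, y2 = run(w1, w2, y0)
f1 = [v * (1 - v) for v in y1]  # f^{L-1}
f2 = [v * (1 - v) for v in y2]  # f^L

# Last layer: dE/dw^L_ij = (y_i^L - t_i) f_i^L y_j^{L-1}
g2 = [[(y2[i] - t[i]) * f2[i] * y1[j] for j in range(3)] for i in range(2)]
# Layer before: dE/dw^{L-1}_ij = sum_k (y_k^L - t_k) f_k^L w^L_ki * f_i^{L-1} y_j^{L-2}
g1 = [[sum((y2[k] - t[k]) * f2[k] * w2[k][i] for k in range(2)) * f1[i] * y0[j]
       for j in range(2)] for i in range(3)]

def numeric_grad(which, i, j, eps=1e-6):
    """Central difference of E with respect to one weight of layer `which`."""
    a = [row[:] for row in w1]
    b = [row[:] for row in w2]
    target = a if which == 1 else b
    target[i][j] += eps
    up = error(a, b, y0, t)
    target[i][j] -= 2 * eps
    dn = error(a, b, y0, t)
    return (up - dn) / (2 * eps)

gap2 = max(abs(g2[i][j] - numeric_grad(2, i, j)) for i in range(2) for j in range(3))
gap1 = max(abs(g1[i][j] - numeric_grad(1, i, j)) for i in range(3) for j in range(2))
print(gap1, gap2)  # both gaps should be tiny if the formulas hold
```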
An alternative formulation
$$
\begin{aligned}
\frac{\partial E}{\partial w^L_{ij}} &= \frac{\partial E}{\partial y^L_i} \cdot \frac{\partial y^L_i}{\partial w^L_{ij}} \\
&= (y^L_i - t_i)\, f^L_i\, y_j^{L-1} \\
\\
\frac{\partial E}{\partial w^{L-1}_{ij}} &=
\frac{\partial E}{\partial (\boldsymbol{y}^L)^{\mathrm{T}}} \cdot
\frac{\partial \boldsymbol{y}^L}{\partial y^{L-1}_i} \cdot
\frac{\partial y^{L-1}_i}{\partial w^{L-1}_{ij}} \\
&=
\begin{pmatrix} y^L_1 - t_1, \cdots, y^L_{s_L} - t_{s_L} \end{pmatrix}
\cdot
\begin{pmatrix}
f^L_1 w^L_{1i} \\ f^L_2 w^L_{2i} \\ \vdots \\ f^L_{s_L} w^L_{s_L i}
\end{pmatrix}
\cdot f^{L-1}_i\, y_j^{L-2} \\
&= \sum_{k=1}^{s_L} (y_k^L - t_k)\, f_k^L\, w^L_{ki}\, f^{L-1}_i\, y^{L-2}_j \\
\\
\delta^{l-1}_i &= \frac{\partial E}{\partial y^{l-1}_i} \\
&= \frac{\partial E}{\partial (\boldsymbol{y}^{l})^{\mathrm{T}}} \cdot \frac{\partial \boldsymbol{y}^{l}}{\partial y^{l-1}_i} \\
&=
\begin{pmatrix} \delta^l_1, \delta^l_2, \cdots, \delta^l_{s_l} \end{pmatrix}
\cdot
\begin{pmatrix}
f^l_1 w^l_{1i} \\ f^l_2 w^l_{2i} \\ \vdots \\ f^l_{s_l} w^l_{s_l i}
\end{pmatrix} \\
&= \sum_{k=1}^{s_l} \delta^l_k\, f^l_k\, w^l_{ki} \\
\\
\frac{\partial E}{\partial w^L_{ij}} &= \frac{\partial E}{\partial y^L_i} \cdot \frac{\partial y^L_i}{\partial w^L_{ij}} \\
&= \delta^L_i\, f^L_i\, y_j^{L-1} \\
&= (y^L_i - t_i)\, f^L_i\, y_j^{L-1} \\
\\
\frac{\partial E}{\partial w^{l}_{ij}} &= \frac{\partial E}{\partial y^l_i} \cdot \frac{\partial y^{l}_i}{\partial w^{l}_{ij}} \\
&= \delta^{l}_i\, f^l_i\, y_j^{l-1} \\
&= \left( \sum_{k=1}^{s_{l+1}} \delta^{l+1}_k\, f^{l+1}_k\, w^{l+1}_{ki} \right) f^l_i\, y_j^{l-1}, \qquad l < L
\end{aligned}
$$
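The recursion $\delta^{l-1}_i = \sum_k \delta^l_k f^l_k w^l_{ki}$ together with $\partial E / \partial w^l_{ij} = \delta^l_i f^l_i y^{l-1}_j$ gives a complete backward pass. The following is a minimal sketch of that pass, not a reference implementation: it assumes the logistic sigmoid (so $f^l_i = y^l_i(1 - y^l_i)$), a single sample, no bias terms as in the text, and arbitrary weights. The function names are hypothetical. A finite-difference check of every weight is included.

```python
import math

def sigmoid(v):
    """Logistic function, an assumed choice for sigma."""
    return 1.0 / (1.0 + math.exp(-v))

def forward(weights, y0):
    """Return [y^0, y^1, ..., y^L]."""
    ys = [list(y0)]
    for w in weights:
        ys.append([sigmoid(sum(a * b for a, b in zip(row, ys[-1]))) for row in w])
    return ys

def backprop(weights, y0, t):
    """Gradients dE/dw^l for every layer, using delta^l = dE/dy^l."""
    ys = forward(weights, y0)
    L = len(weights)
    delta = [a - b for a, b in zip(ys[L], t)]  # delta^L = y^L - t
    grads = [None] * L
    for l in range(L, 0, -1):
        w = weights[l - 1]                      # w^l
        f = [v * (1 - v) for v in ys[l]]        # f^l for the sigmoid
        # dE/dw^l_ij = delta^l_i f^l_i y^{l-1}_j
        grads[l - 1] = [[delta[i] * f[i] * y_j for y_j in ys[l - 1]]
                        for i in range(len(w))]
        # delta^{l-1}_i = sum_k delta^l_k f^l_k w^l_ki
        delta = [sum(delta[k] * f[k] * w[k][i] for k in range(len(w)))
                 for i in range(len(ys[l - 1]))]
    return grads

def error(weights, y0, t):
    """E = 1/2 * sum_i (y_i^L - t_i)^2 for a single sample."""
    yL = forward(weights, y0)[-1]
    return 0.5 * sum((a - b) ** 2 for a, b in zip(yL, t))

weights = [
    [[0.1, -0.3], [0.5, 0.2], [-0.4, 0.6]],  # w^1: 3 x 2
    [[0.3, -0.2, 0.1], [-0.5, 0.4, 0.2]],    # w^2: 2 x 3
    [[0.7, -0.6]],                           # w^3: 1 x 2
]
y0, t = [1.0, -0.5], [1.0]
grads = backprop(weights, y0, t)

# Compare every analytic gradient entry with a central finite difference.
eps, max_gap = 1e-6, 0.0
for l, w in enumerate(weights):
    for i in range(len(w)):
        for j in range(len(w[i])):
            old = w[i][j]
            w[i][j] = old + eps
            up = error(weights, y0, t)
            w[i][j] = old - eps
            dn = error(weights, y0, t)
            w[i][j] = old
            max_gap = max(max_gap, abs(grads[l][i][j] - (up - dn) / (2 * eps)))
print(max_gap)  # largest disagreement over all weights
```

Propagating $\boldsymbol{\delta}^l$ backward means each layer reuses the sum already computed for the layer above, rather than expanding the full product of Jacobians for every weight.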