BP Neural Network Derivation

Diagram

[Figure: schematic of the BP neural network]

Notation

$$
\begin{aligned}
\boldsymbol{y}^{0}: &\ \text{input},\ \boldsymbol{y}^{0}\in \mathbb{R}^{s_0\times 1} \\
\boldsymbol{z}^{l}: &\ \text{weighted input (pre-activation) of layer } l,\ \boldsymbol{z}^{l} \in \mathbb{R}^{s_l \times 1} \\
\boldsymbol{y}^{l}: &\ \text{output (activation) of layer } l,\ \boldsymbol{y}^{l} \in \mathbb{R}^{s_l \times 1} \\
\boldsymbol{w}^{l}: &\ \text{weight matrix of layer } l,\ \boldsymbol{w}^{l} \in \mathbb{R}^{s_l \times s_{l-1}} \text{ with entries } w^l_{ij} \\
\boldsymbol{\sigma}: &\ \text{activation function, applied element-wise} \\
s_l: &\ \text{dimension of the vectors } \boldsymbol{y}^{l} \text{ and } \boldsymbol{z}^{l} \\
\boldsymbol{t}: &\ \text{target (true) values},\ \boldsymbol{t} \in \mathbb{R}^{s_L \times 1} \\
L: &\ \text{total number of layers} \\
f^l_{i}: &\ \frac{\partial y^l_i}{\partial z^l_i} = \sigma'(z^l_i) \\
\boldsymbol{I}(i): &\ \text{column vector with 1 in row } i \text{ and 0 everywhere else} \\
\delta^l_i: &\ \frac{\partial E}{\partial y^l_i} \\
\boldsymbol{\delta}^l: &\ \frac{\partial E}{\partial (\boldsymbol{y}^l)^{\mathrm{T}}},\ \text{i.e. the row vector } \begin{pmatrix} \delta^l_1, \delta^l_2, \cdots, \delta^l_{s_l} \end{pmatrix}
\end{aligned}
$$
The relations between them:
$$
\begin{aligned}
\boldsymbol{z}^{l} &= \boldsymbol{w}^{l}\boldsymbol{y}^{l-1} \\
z^{l}_i &= \sum_{j=1}^{s_{l-1}} w^{l}_{ij}\, y^{l-1}_j \\
\boldsymbol{y}^{l} &= \boldsymbol{\sigma}(\boldsymbol{z}^{l}) \\
y^{l}_i &= \sigma(z^l_i)
\end{aligned}
$$
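These relations define the forward pass. Below is a minimal pure-Python sketch. It assumes σ is the logistic sigmoid, since the derivation leaves σ generic, and, like the text, it uses no bias term. `sigmoid` and `forward` are hypothetical helper names.

```python
import math


def sigmoid(x):
    # Assumed activation: logistic sigmoid (the derivation itself leaves sigma generic).
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, y0):
    """Forward pass z^l = w^l y^{l-1}, y^l = sigma(z^l), with no bias term.

    weights[l - 1] holds w^l as a list of s_l rows, each of length s_{l-1}.
    Returns (zs, ys): ys[0] is the input y^0 and zs[0] is None (layer 0 has no z).
    """
    zs, ys = [None], [list(y0)]
    for w in weights:
        y_prev = ys[-1]
        # z_i^l = sum_j w_ij^l * y_j^{l-1}
        z = [sum(w_ij * y_j for w_ij, y_j in zip(row, y_prev)) for row in w]
        zs.append(z)
        # y_i^l = sigma(z_i^l), applied element-wise
        ys.append([sigmoid(z_i) for z_i in z])
    return zs, ys


# Example: a 2-3-1 network (s_0 = 2, s_1 = 3, s_2 = 1, so L = 2).
weights = [
    [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]],  # w^1: 3 x 2
    [[0.3, -0.1, 0.2]],                      # w^2: 1 x 3
]
zs, ys = forward(weights, [1.0, 0.5])
print([len(y) for y in ys])  # [2, 3, 1]
```

Storing `weights[l - 1]` as $\boldsymbol{w}^l$ keeps the row index $i$ and column index $j$ identical to $w^l_{ij}$ in the formulas.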

Notes on matrix derivatives

Notation
$\boldsymbol{y}$: column vector, $\boldsymbol{y} \in \mathbb{R}^{n \times 1}$
$\boldsymbol{x}$: column vector, $\boldsymbol{x} \in \mathbb{R}^{m \times 1}$
$f(\boldsymbol{x})$: real-valued scalar function, written $f: \mathbb{R}^m \to \mathbb{R}$
Formulas
$$
\begin{aligned}
\frac{\partial \boldsymbol{y}^{\mathrm{T}}}{\partial \boldsymbol{x}} &= \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_1} \\ \vdots & & \vdots \\ \frac{\partial y_1}{\partial x_m} & \cdots & \frac{\partial y_n}{\partial x_m} \end{pmatrix} \in \mathbb{R}^{m \times n} \\
\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}^{\mathrm{T}}} &= \left(\frac{\partial \boldsymbol{y}^{\mathrm{T}}}{\partial \boldsymbol{x}}\right)^{\mathrm{T}} \in \mathbb{R}^{n \times m} \\
\frac{\partial \boldsymbol{y}^{\mathrm{T}}}{\partial \boldsymbol{y}} &= \mathbf{E}_{n \times n} \\
\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}} &= \left[\frac{\partial f(\boldsymbol{x})}{\partial x_1}, \cdots, \frac{\partial f(\boldsymbol{x})}{\partial x_m}\right]^{\mathrm{T}}
\end{aligned}
$$

Here $\mathbf{E}_{n \times n}$ is the $n \times n$ identity matrix, not the error $E$ defined below.
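The following sketch makes the layout of $\partial \boldsymbol{y}^{\mathrm{T}}/\partial \boldsymbol{x}$ concrete: row $i$ corresponds to $x_i$ and column $j$ to $y_j$. It is a hedged finite-difference estimate, and `dyT_dx` is a hypothetical helper. Central differences are only an approximation and are not part of the derivation.

```python
def dyT_dx(func, x, eps=1e-6):
    """Central-difference estimate of dy^T/dx for y = func(x).

    The result has m rows (one per x_i) and n columns (one per y_j),
    so entry [i][j] approximates dy_j/dx_i, matching the layout above.
    """
    rows = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += eps
        x_minus = list(x)
        x_minus[i] -= eps
        y_plus, y_minus = func(x_plus), func(x_minus)
        rows.append([(a - b) / (2 * eps) for a, b in zip(y_plus, y_minus)])
    return rows


# y = (x1*x2, x1 + 2*x2, x2^2) with x in R^2, so n = 3 and the result is 2 x 3.
def example(v):
    return [v[0] * v[1], v[0] + 2 * v[1], v[1] ** 2]


J = dyT_dx(example, [1.0, 3.0])
# Approximately [[3, 1, 0], [1, 2, 6]]

# A scalar f is the special case n = 1: the result is an m x 1 column, i.e. df/dx.
g = dyT_dx(lambda v: [v[0] ** 2 + v[1]], [2.0, 5.0])
# Approximately [[4], [1]]
```

Passing the identity map `lambda v: list(v)` returns (approximately) the identity matrix, which is the $\partial \boldsymbol{y}^{\mathrm{T}}/\partial \boldsymbol{y} = \mathbf{E}_{n \times n}$ rule.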

Derivation

Error definition
$$
\begin{aligned}
E &= \frac{1}{m}\sum_{p=1}^{m} E_p \\
E_p &= \frac{1}{2}\left\lVert \boldsymbol{y}^L - \boldsymbol{t} \right\rVert^2 \\
&= \frac{1}{2}\sum_{i=1}^{s_L}\left(y^L_i - t_i\right)^2
\end{aligned}
$$

Here $m$ is the number of samples. To keep the derivation simple, let $m = 1$, so that $E = E_p$.
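The error definition translates directly into code. Below is a minimal sketch with hypothetical function names. `output_gradient` returns $\partial E/\partial(\boldsymbol{y}^L)^{\mathrm{T}}$ for $m = 1$, and the derivation below starts from that vector.

```python
def sample_error(y_L, t):
    # E_p = 1/2 * sum_i (y_i^L - t_i)^2 for one sample
    return 0.5 * sum((y_i - t_i) ** 2 for y_i, t_i in zip(y_L, t))


def mean_error(outputs, targets):
    # E = (1/m) * sum_p E_p over m samples
    m = len(outputs)
    return sum(sample_error(y, t) for y, t in zip(outputs, targets)) / m


def output_gradient(y_L, t):
    # dE_p/d(y^L)^T = (y_1^L - t_1, ..., y_{s_L}^L - t_{s_L}); the 1/2 cancels the 2 from the square
    return [y_i - t_i for y_i, t_i in zip(y_L, t)]


print(sample_error([1.0, 2.0], [0.0, 0.0]))        # 2.5
print(mean_error([[1.0], [3.0]], [[0.0], [0.0]]))  # 2.5
print(output_gradient([0.75, 0.25], [1.0, 0.0]))    # [-0.25, 0.25]
```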
$\dfrac{\partial E}{\partial w^L_{ij}}$
Preliminary results
$$
\begin{aligned}
\frac{\partial z_k^l}{\partial w^l_{ij}} &= \begin{cases} y^{l-1}_j & k = i \\ 0 & k \ne i \end{cases} \\
\frac{\partial \boldsymbol{z}^l}{\partial w^l_{ij}} &= \left[\frac{\partial z_1^l}{\partial w_{ij}^l}, \cdots, \frac{\partial z_{s_l}^l}{\partial w_{ij}^l}\right]^{\mathrm{T}} \in \mathbb{R}^{s_l \times 1} \\
&= \boldsymbol{I}(i)\, y_j^{l-1} \\
\\
\frac{\partial \boldsymbol{y}^l}{\partial (\boldsymbol{z}^{l})^{\mathrm{T}}} &= \begin{pmatrix} \frac{\partial y_1^l}{\partial z_1^l} & \cdots & \frac{\partial y_1^l}{\partial z_{s_l}^l} \\ \vdots & & \vdots \\ \frac{\partial y_{s_l}^l}{\partial z_1^l} & \cdots & \frac{\partial y_{s_l}^l}{\partial z_{s_l}^l} \end{pmatrix} \in \mathbb{R}^{s_l \times s_l} \\
&= \begin{pmatrix} f^l_1 & & & \\ & f^l_2 & & \\ & & \ddots & \\ & & & f^l_{s_l} \end{pmatrix} \\
\\
\frac{\partial \boldsymbol{z}^{l}}{\partial (\boldsymbol{y}^{l-1})^{\mathrm{T}}} &= \begin{pmatrix} \frac{\partial z_1^l}{\partial y_1^{l-1}} & \cdots & \frac{\partial z_1^l}{\partial y_{s_{l-1}}^{l-1}} \\ \vdots & & \vdots \\ \frac{\partial z_{s_l}^l}{\partial y_1^{l-1}} & \cdots & \frac{\partial z_{s_l}^l}{\partial y_{s_{l-1}}^{l-1}} \end{pmatrix} \\
&= \begin{pmatrix} w_{11}^l & \cdots & w_{1 s_{l-1}}^l \\ \vdots & & \vdots \\ w_{s_l 1}^l & \cdots & w_{s_l s_{l-1}}^l \end{pmatrix} = \boldsymbol{w}^l \in \mathbb{R}^{s_l \times s_{l-1}} \\
\\
\frac{\partial \boldsymbol{y}^l}{\partial (\boldsymbol{y}^{l-1})^{\mathrm{T}}} &= \frac{\partial \boldsymbol{y}^l}{\partial (\boldsymbol{z}^{l})^{\mathrm{T}}}\, \frac{\partial \boldsymbol{z}^{l}}{\partial (\boldsymbol{y}^{l-1})^{\mathrm{T}}} \\
&= \begin{pmatrix} f^l_1 w_{11}^l & \cdots & f^l_1 w_{1 s_{l-1}}^l \\ \vdots & & \vdots \\ f^l_{s_l} w_{s_l 1}^l & \cdots & f^l_{s_l} w_{s_l s_{l-1}}^l \end{pmatrix} \in \mathbb{R}^{s_l \times s_{l-1}} \\
\\
\frac{\partial E}{\partial (\boldsymbol{y}^L)^{\mathrm{T}}} &= \left[y^L_1 - t_1, \cdots, y^L_{s_L} - t_{s_L}\right] \\
\\
\frac{\partial \boldsymbol{z}^l}{\partial y^{l-1}_i} &= \left[w^l_{1i}, w^l_{2i}, \cdots, w^l_{s_l i}\right]^{\mathrm{T}} \\
\\
\frac{\partial \boldsymbol{y}^l}{\partial y^{l-1}_i} &= \frac{\partial \boldsymbol{y}^l}{\partial (\boldsymbol{z}^l)^{\mathrm{T}}}\, \frac{\partial \boldsymbol{z}^l}{\partial y^{l-1}_i} \\
&= \begin{pmatrix} f^l_1 & & & \\ & f^l_2 & & \\ & & \ddots & \\ & & & f^l_{s_l} \end{pmatrix} \begin{pmatrix} w^l_{1i} \\ w^l_{2i} \\ \vdots \\ w^l_{s_l i} \end{pmatrix} \\
&= \begin{pmatrix} f^l_1 w^l_{1i} \\ f^l_2 w^l_{2i} \\ \vdots \\ f^l_{s_l} w^l_{s_l i} \end{pmatrix}
\end{aligned}
$$
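As a sanity check on the preliminary results, and in particular on $\partial \boldsymbol{y}^l/\partial(\boldsymbol{y}^{l-1})^{\mathrm{T}} = \operatorname{diag}(f^l_1, \dots, f^l_{s_l})\,\boldsymbol{w}^l$, the sketch below compares the analytic matrix with a finite-difference estimate. It assumes a sigmoid activation, for which $f^l_k = y^l_k(1 - y^l_k)$. All function names are hypothetical.

```python
import math


def sigmoid(x):
    # Assumed activation (the text keeps sigma generic)
    return 1.0 / (1.0 + math.exp(-x))


def layer(w, y_prev):
    # y^l = sigma(w^l y^{l-1}), no bias
    return [sigmoid(sum(w_kj * y_j for w_kj, y_j in zip(row, y_prev))) for row in w]


def analytic_jacobian(w, y_prev):
    # dy^l/d(y^{l-1})^T: entry [k][j] = f_k^l * w_kj^l,
    # where f_k^l = sigma'(z_k^l) = y_k^l * (1 - y_k^l) for the sigmoid
    y = layer(w, y_prev)
    return [[y_k * (1.0 - y_k) * w_kj for w_kj in row] for y_k, row in zip(y, w)]


def numeric_jacobian(w, y_prev, eps=1e-6):
    # Same matrix by central differences; column j holds dy^l/dy_j^{l-1}
    columns = []
    for j in range(len(y_prev)):
        plus = list(y_prev)
        plus[j] += eps
        minus = list(y_prev)
        minus[j] -= eps
        columns.append([(a - b) / (2 * eps) for a, b in zip(layer(w, plus), layer(w, minus))])
    return [list(row) for row in zip(*columns)]  # transpose columns into rows


w = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7]]  # w^l with s_l = 2, s_{l-1} = 3
y_prev = [0.2, -0.4, 0.9]
A = analytic_jacobian(w, y_prev)
N = numeric_jacobian(w, y_prev)
print(max(abs(a - n) for ra, rn in zip(A, N) for a, n in zip(ra, rn)))  # should be close to 0
```

Column $i$ of `A` is the vector $\partial \boldsymbol{y}^l/\partial y^{l-1}_i = (f^l_k w^l_{ki})_k$ from the last formula above.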

Solution
$$
\begin{aligned}
\frac{\partial E}{\partial w^L_{ij}} &= \frac{\partial E}{\partial (\boldsymbol{y}^L)^{\mathrm{T}}}\, \frac{\partial \boldsymbol{y}^L}{\partial (\boldsymbol{z}^L)^{\mathrm{T}}}\, \frac{\partial \boldsymbol{z}^L}{\partial w^L_{ij}} \\
&= \begin{pmatrix} y^L_1 - t_1, \cdots, y^L_{s_L} - t_{s_L} \end{pmatrix} \begin{pmatrix} f^L_1 & & \\ & \ddots & \\ & & f^L_{s_L} \end{pmatrix} \boldsymbol{I}(i)\, y_j^{L-1} \\
&= \left(y^L_i - t_i\right) f^L_i\, y_j^{L-1} \\
\\
\frac{\partial E}{\partial w^{L-1}_{ij}} &= \frac{\partial E}{\partial (\boldsymbol{y}^L)^{\mathrm{T}}}\, \frac{\partial \boldsymbol{y}^L}{\partial (\boldsymbol{y}^{L-1})^{\mathrm{T}}}\, \frac{\partial \boldsymbol{y}^{L-1}}{\partial w^{L-1}_{ij}} \\
&= \begin{pmatrix} y^L_1 - t_1, \cdots, y^L_{s_L} - t_{s_L} \end{pmatrix} \begin{pmatrix} f^L_1 w_{11}^L & \cdots & f^L_1 w_{1 s_{L-1}}^L \\ \vdots & & \vdots \\ f^L_{s_L} w_{s_L 1}^L & \cdots & f^L_{s_L} w_{s_L s_{L-1}}^L \end{pmatrix} \boldsymbol{I}(i)\, f^{L-1}_i\, y_j^{L-2} \\
&= \sum_{k=1}^{s_L} \left(y_k^L - t_k\right) f_k^L\, w^L_{ki}\, f^{L-1}_i\, y^{L-2}_j
\end{aligned}
$$
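The two closed-form results can be checked against a numerical derivative of $E$. The sketch below makes several assumptions: a sigmoid activation (so $f = y(1-y)$), $m = 1$, no bias, and a network with $L = 2$, so that $\boldsymbol{y}^{L-2}$ is the input $\boldsymbol{y}^0$. The helper names are hypothetical.

```python
import math


def sigmoid(x):
    # Assumed activation; f = sigma'(z) = y * (1 - y)
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, y0):
    # ys[l] = y^l; weights[l - 1] = w^l (s_l rows of length s_{l-1}); no bias
    ys = [list(y0)]
    for w in weights:
        ys.append([sigmoid(sum(a * b for a, b in zip(row, ys[-1]))) for row in w])
    return ys


def error(weights, y0, t):
    # E = E_p = 1/2 * sum_i (y_i^L - t_i)^2  (m = 1)
    y_L = forward(weights, y0)[-1]
    return 0.5 * sum((a - b) ** 2 for a, b in zip(y_L, t))


def grad_last(weights, y0, t, i, j):
    # dE/dw_ij^L = (y_i^L - t_i) * f_i^L * y_j^{L-1}
    ys = forward(weights, y0)
    y_L, y_prev = ys[-1], ys[-2]
    return (y_L[i] - t[i]) * y_L[i] * (1.0 - y_L[i]) * y_prev[j]


def grad_second_last(weights, y0, t, i, j):
    # dE/dw_ij^{L-1} = sum_k (y_k^L - t_k) f_k^L w_ki^L * f_i^{L-1} * y_j^{L-2}
    ys = forward(weights, y0)
    y_L, y_Lm1, y_Lm2 = ys[-1], ys[-2], ys[-3]
    w_L = weights[-1]
    back = sum((y_L[k] - t[k]) * y_L[k] * (1.0 - y_L[k]) * w_L[k][i] for k in range(len(y_L)))
    return back * y_Lm1[i] * (1.0 - y_Lm1[i]) * y_Lm2[j]


def numeric_grad(weights, y0, t, layer, i, j, eps=1e-6):
    # Central difference on w_ij^layer, where layer is the 1-based index l
    def shifted(delta):
        ws = [[list(row) for row in w] for w in weights]
        ws[layer - 1][i][j] += delta
        return error(ws, y0, t)
    return (shifted(eps) - shifted(-eps)) / (2 * eps)


weights = [
    [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]],  # w^1: 3 x 2
    [[0.3, -0.1, 0.2], [0.6, 0.5, -0.4]],    # w^2 = w^L: 2 x 3 (L = 2)
]
y0, t = [1.0, 0.5], [0.0, 1.0]
print(grad_last(weights, y0, t, 0, 1), numeric_grad(weights, y0, t, 2, 0, 1))
print(grad_second_last(weights, y0, t, 2, 0), numeric_grad(weights, y0, t, 1, 2, 0))
```

Each printed pair should agree to several decimal places. Omitting the $f^L_i$ or $f^{L-1}_i$ factor would break that agreement.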
An alternative formulation (using $\delta$)
Because $w^l_{ij}$ affects $E$ only through $y^l_i$, and $\partial y^l_i / \partial w^l_{ij} = f^l_i\, y^{l-1}_j$, the gradient can also be written with the scalar chain rule and the errors $\delta$:

$$
\begin{aligned}
\frac{\partial E}{\partial w^L_{ij}} &= \frac{\partial E}{\partial y^L_i}\, \frac{\partial y^L_i}{\partial w^L_{ij}} \\
&= \left(y^L_i - t_i\right) f^L_i\, y_j^{L-1} \\
\\
\frac{\partial E}{\partial w^{L-1}_{ij}} &= \frac{\partial E}{\partial (\boldsymbol{y}^L)^{\mathrm{T}}}\, \frac{\partial \boldsymbol{y}^L}{\partial y^{L-1}_i}\, \frac{\partial y^{L-1}_i}{\partial w^{L-1}_{ij}} \\
&= \begin{pmatrix} y^L_1 - t_1, \cdots, y^L_{s_L} - t_{s_L} \end{pmatrix} \begin{pmatrix} f^L_1 w^L_{1i} \\ f^L_2 w^L_{2i} \\ \vdots \\ f^L_{s_L} w^L_{s_L i} \end{pmatrix} f^{L-1}_i\, y_j^{L-2} \\
&= \sum_{k=1}^{s_L} \left(y_k^L - t_k\right) f_k^L\, w^L_{ki}\, f^{L-1}_i\, y^{L-2}_j \\
\\
\delta^{l-1}_i &= \frac{\partial E}{\partial y^{l-1}_i} \\
&= \frac{\partial E}{\partial (\boldsymbol{y}^{l})^{\mathrm{T}}}\, \frac{\partial \boldsymbol{y}^{l}}{\partial y^{l-1}_i} \\
&= \begin{pmatrix} \delta^l_1, \delta^l_2, \cdots, \delta^l_{s_l} \end{pmatrix} \begin{pmatrix} f^l_1 w^l_{1i} \\ f^l_2 w^l_{2i} \\ \vdots \\ f^l_{s_l} w^l_{s_l i} \end{pmatrix} \\
&= \sum_{k=1}^{s_l} \delta^l_k f^l_k\, w^l_{ki} \\
\\
\frac{\partial E}{\partial w^L_{ij}} &= \frac{\partial E}{\partial y^L_i}\, \frac{\partial y^L_i}{\partial w^L_{ij}} \\
&= \delta^L_i f^L_i\, y_j^{L-1} \\
&= \left(y^L_i - t_i\right) f^L_i\, y_j^{L-1} \\
\\
\frac{\partial E}{\partial w^{l}_{ij}} &= \frac{\partial E}{\partial y^l_i}\, \frac{\partial y^{l}_i}{\partial w^{l}_{ij}} \\
&= \delta^{l}_i f^l_i\, y_j^{l-1} \\
&= \left(\sum_{k=1}^{s_{l+1}} \delta^{l+1}_k f^{l+1}_k w^{l+1}_{ki}\right) f^l_i\, y_j^{l-1}
\end{aligned}
$$
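The δ recursion gives a complete backward pass. Start from $\delta^L_i = y^L_i - t_i$, walk from layer $L$ down to layer 1, and read off $\partial E/\partial w^l_{ij} = \delta^l_i f^l_i\, y^{l-1}_j$ at each layer. The sketch below assumes a sigmoid activation, $m = 1$, no bias, and hypothetical helper names. It checks the result against finite differences and takes one plain gradient-descent step with an arbitrarily chosen learning rate.

```python
import math


def sigmoid(x):
    # Assumed activation; f_i^l = sigma'(z_i^l) = y_i^l * (1 - y_i^l)
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, y0):
    # ys[l] = y^l; weights[l - 1] = w^l (s_l rows of length s_{l-1}); no bias
    ys = [list(y0)]
    for w in weights:
        ys.append([sigmoid(sum(a * b for a, b in zip(row, ys[-1]))) for row in w])
    return ys


def backprop(weights, y0, t):
    """All dE/dw_ij^l (m = 1) via the delta recursion:

    delta_i^L     = y_i^L - t_i
    delta_i^{l-1} = sum_k delta_k^l f_k^l w_ki^l
    dE/dw_ij^l    = delta_i^l f_i^l y_j^{l-1}
    Returns grads with grads[l - 1][i][j] = dE/dw_ij^l.
    """
    ys = forward(weights, y0)
    L = len(weights)
    grads = [None] * L
    delta = [y_i - t_i for y_i, t_i in zip(ys[L], t)]  # delta^L
    for l in range(L, 0, -1):
        w = weights[l - 1]
        f = [y_i * (1.0 - y_i) for y_i in ys[l]]  # f^l
        grads[l - 1] = [[delta[i] * f[i] * y_j for y_j in ys[l - 1]] for i in range(len(w))]
        # delta^{l-1}; the value computed at l = 1 (delta^0) is simply unused
        delta = [sum(delta[k] * f[k] * w[k][i] for k in range(len(w))) for i in range(len(ys[l - 1]))]
    return grads


def error(weights, y0, t):
    y_L = forward(weights, y0)[-1]
    return 0.5 * sum((a - b) ** 2 for a, b in zip(y_L, t))


def numeric_grad(weights, y0, t, layer, i, j, eps=1e-6):
    # Central-difference estimate of one weight's gradient, used only to validate backprop
    def shifted(d):
        ws = [[list(row) for row in w] for w in weights]
        ws[layer - 1][i][j] += d
        return error(ws, y0, t)
    return (shifted(eps) - shifted(-eps)) / (2 * eps)


weights = [
    [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]],  # w^1: 3 x 2
    [[0.3, -0.1, 0.2], [0.6, 0.5, -0.4]],    # w^2: 2 x 3
    [[0.7, -0.3], [-0.2, 0.9]],              # w^3: 2 x 2 (L = 3)
]
y0, t = [1.0, 0.5], [0.0, 1.0]
grads = backprop(weights, y0, t)

# One gradient-descent step; lr = 0.1 is an arbitrary illustrative choice.
lr = 0.1
updated = [[[w_ij - lr * g_ij for w_ij, g_ij in zip(wr, gr)] for wr, gr in zip(w, g)]
           for w, g in zip(weights, grads)]
print(error(weights, y0, t), error(updated, y0, t))  # the second value should be smaller
```

In matrix form, the same recursion reads $\boldsymbol{\delta}^{l-1} = \boldsymbol{\delta}^{l}\, \operatorname{diag}(f^l_1, \dots, f^l_{s_l})\, \boldsymbol{w}^l$, which is the row vector $\boldsymbol{\delta}^l$ multiplied by $\partial \boldsymbol{y}^l/\partial(\boldsymbol{y}^{l-1})^{\mathrm{T}}$ from the preliminary results. The list comprehensions above compute that product entry by entry.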
