LeNet-5 CNN Back-Propagation Derivation

by ztx

Table of Contents

  • LeNet-5 CNN Back-Propagation Derivation
    • Notes
    • Prerequisites
    • MLP-NN BP
      • Structure
      • Forward propagation
      • Back propagation
    • LeNet-5 CNN
      • Structure
      • Derivation of the propagation process
          • F6>>>OUTPUT
          • C5>>>F6
          • S4>>>C5
          • C3>>>S4
          • S2>>>C3
          • C1>>>S2
          • INPUT>>>C1

Notes

These are notes taken while studying, so read them with caution; corrections are welcome.
They were written in a hurry, and the exact LeNet-5 structure may be slightly off; the original paper is authoritative. This gap remains to be filled.
Also, CSDN's LaTeX rendering is currently broken for some reason, so the formulas are given in source form.
KaTeX only supports aligned for multi-line formulas, not align.

Prerequisites

Perceptrons, some mathematical tricks, and a willingness to fill in gaps mentally.

MLP-NN BP

MLP-NN: multilayer perceptron neural network.

Structure

[Figure 1: MLP-NN structure diagram]
The figure above is one I drew myself.

Forward propagation

The forward pass follows the arrows shown on the left-hand side of the figure above.
$$y_k = f\left( \sum_{j=0}^{H} f\left( \sum_{i=0}^{D} x_i W_{ji} \right) W_{kj} \right)$$

Back propagation

Loss function: L2 distance
$$E(\mathbf{W}) = \frac{1}{2}\sum_{k=0}^{c-1}(t_k - y_k)^2 = \frac{1}{2}\|\mathbf{t} - \mathbf{y}\|^2$$

Activation function: sigmoid

$$f(x) = \frac{1}{1+e^{-x}}$$

Derivative of the sigmoid:

$$f'(x) = f(x)\bigl(1 - f(x)\bigr)$$

The BP from the hidden layer to the output layer is as follows:

$$\begin{aligned} \frac{\partial E}{\partial W_{kj}} &= \frac{\partial E}{\partial y_k}\frac{\partial y_k}{\partial net_{k}} \frac{\partial net_{k}}{\partial W_{kj}} \\ &= -(t_k-y_k)f'(net_k)\,y_j \\ &= \delta_k y_j \end{aligned}$$

where:

$$\delta_k = -(t_k-y_k)f'(net_k) = \frac{\partial E}{\partial net_{k}}$$

Let the learning rate be $\eta$; moving against the gradient, the gradient-descent weight update is then:

$$\Delta W_{kj} = -\eta\, \delta_k y_j$$

The BP from the input layer to the hidden layer is as follows:

$$\begin{aligned} \frac{\partial E}{\partial W_{ji}} &= \frac{\partial E}{\partial y_j} \frac{\partial y_j}{\partial net_{j}} \frac{\partial net_{j}}{\partial W_{ji}} \\ &= \frac{\partial }{\partial y_j} \left( \frac{1}{2}\sum_{k=0}^{c-1}(t_k-y_k)^2 \right) f'(net_j)\, x_i \\ &= \left( -\sum_{k=0}^{c-1}(t_k-y_k)\frac{\partial y_k}{\partial y_j} \right) f'(net_j)\, x_i \\ &= \left( -\sum_{k=0}^{c-1}(t_k-y_k)\frac{\partial y_k}{\partial net_k}\frac{\partial net_k}{\partial y_j} \right) f'(net_j)\, x_i \\ &= \left( -\sum_{k=0}^{c-1}(t_k-y_k)f'(net_k)W_{kj} \right) f'(net_j)\, x_i \\ &= \left( \sum_{k=0}^{c-1}\delta_k W_{kj} \right) f'(net_j)\, x_i \\ &= \delta_j x_i \end{aligned}$$
where:
$$\delta_j = \left( \sum_{k=0}^{c-1}\delta_k W_{kj} \right) f'(net_j) = \frac{\partial E}{\partial net_{j}}$$
With the same learning rate $\eta$, the gradient-descent weight update is:

$$\Delta W_{ji} = -\eta\, \delta_j x_i$$
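
As a sanity check of the two update rules above, here is a minimal NumPy sketch of one forward/backward pass through this two-layer network. The layer sizes, the random data, and the variable names are all made up purely for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
D, H, c = 4, 5, 3                        # input, hidden, output sizes (arbitrary)
x = rng.normal(size=D)                   # input vector x_i
t = np.eye(c)[1]                         # one-hot target t_k
W_ji = rng.normal(size=(H, D)) * 0.1     # input -> hidden weights
W_kj = rng.normal(size=(c, H)) * 0.1     # hidden -> output weights
eta = 0.1

# forward pass: y_j = f(net_j), y_k = f(net_k)
net_j = W_ji @ x
y_j = sigmoid(net_j)
net_k = W_kj @ y_j
y_k = sigmoid(net_k)
E = 0.5 * np.sum((t - y_k) ** 2)         # L2-distance loss

# backward pass, using delta = dE/dnet as derived above
delta_k = -(t - y_k) * y_k * (1 - y_k)           # delta_k = -(t_k - y_k) f'(net_k)
delta_j = (W_kj.T @ delta_k) * y_j * (1 - y_j)   # delta_j = (sum_k delta_k W_kj) f'(net_j)

# gradient-descent updates: W <- W - eta * dE/dW
W_kj -= eta * np.outer(delta_k, y_j)
W_ji -= eta * np.outer(delta_j, x)
```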

LeNet-5 CNN

Structure

[Figure 2: LeNet-5 architecture diagram]

C1:

Six 5×5 convolution kernels.

S2:

Similar to average pooling, with a sigmoid activation.

C3:

Sixteen 5×5 convolution kernels.

S4:

Similar to average pooling, with a sigmoid activation.

C5:

120 convolution kernels of size 5×5.

F6:

A 120×84 fully connected layer plus 84 biases, with a tanh activation.

OUTPUT:

An 84×10 radial basis function (RBF) layer.
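
To make the shapes concrete, here is a small illustrative sketch that walks an input through these layers and prints the resulting feature-map sizes; it assumes the standard 32×32 input from the LeNet-5 paper, and the two helper functions are ad-hoc names, not any library API:

```python
# Illustrative shape walk-through of LeNet-5 (assumes the standard 32x32 input).
def conv_out(n, k):       # 'valid' convolution with a k x k kernel
    return n - k + 1

def pool_out(n):          # 2x2 subsampling
    return n // 2

n = 32
n = conv_out(n, 5); print("C1:", 6, "maps of", n, "x", n)    # 28x28
n = pool_out(n);    print("S2:", 6, "maps of", n, "x", n)    # 14x14
n = conv_out(n, 5); print("C3:", 16, "maps of", n, "x", n)   # 10x10
n = pool_out(n);    print("S4:", 16, "maps of", n, "x", n)   # 5x5
n = conv_out(n, 5); print("C5:", 120, "maps of", n, "x", n)  # 1x1
print("F6: 84 units, OUTPUT: 10 RBF units")
```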

Derivation of the propagation process

First, for a layer $l$, define:

its net (pre-activation) value as $x^l$,

its output as $y^l = f(x^l)$,

$\delta^{l} = \dfrac{\partial E}{\partial x^l}$,

its activation function as $f^l(\cdot)$,

its input, which is the previous layer's output $y^{l-1}$,

and the current layer's parameters as $W^l$ and $bias^l$.

F6>>>OUTPUT

Forward propagation (layer $l-1$ is F6, layer $l$ is OUTPUT, layer $l+1$ is the loss).

The OUTPUT layer of LeNet-5 uses radial basis (RBF) functions:
$$\begin{aligned} x^{l}_j &= \sum_{i=0}^{83}\bigl(y^{l-1}_i - W^{l}_{i,j}\bigr)^2 \\ y^l_j &= f^l(x^l_j) = x^l_j \end{aligned}$$

Back propagation

In the paper these $W$ parameters appear to be fixed rather than learned? I do not understand this yet. If they were to be trained, the derivation would be as follows:

$$\begin{aligned} \delta^{l}_i &= \frac{\partial E}{\partial x^{l}_i} \\ &= \frac{\partial E}{\partial y^{l}_i} \frac{\partial y^l_i}{\partial x^{l}_i} \\ &= \frac{\partial E}{\partial y^{l}_i} {f^{l}}'(x^l_i) \\ &= \frac{\partial E}{\partial y^{l}_i} \end{aligned}$$

$\dfrac{\partial E}{\partial y^{l}_i}$ must be supplied by the loss layer.

$$\begin{aligned} \frac{\partial E}{\partial W^{l}_{i,j}} &= \frac{\partial E}{\partial x^{l}_j} \frac{\partial x^{l}_j}{\partial W^{l}_{i,j}} \\ &= \delta^{l}_j \cdot 2\bigl(W^{l}_{i,j} - y^{l-1}_i\bigr) \end{aligned}$$

$$\begin{aligned} \frac{\partial E}{\partial y^{l-1}_i} &= \sum_{j=0}^{83} \frac{\partial E}{\partial x^{l}_j} \frac{\partial x^{l}_j}{\partial y^{l-1}_i} \\ &= \sum_{j=0}^{83} \frac{\partial E}{\partial x^{l}_j} \cdot 2\bigl(y^{l-1}_i - W^{l}_{i,j}\bigr) \\ &= \sum_{j=0}^{83}\delta^{l}_j \cdot 2\bigl(y^{l-1}_i - W^{l}_{i,j}\bigr) \end{aligned}$$
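
Here is a minimal NumPy sketch of this RBF output layer, forward and backward, assuming $\partial E/\partial y^l$ is handed down by the loss layer; the 84 → 10 shapes follow the layer description above, and the random data is only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
y_prev = rng.normal(size=84)          # F6 output y^{l-1}
W = rng.normal(size=(84, 10))         # RBF centres W^l_{i,j}

# forward: x^l_j = sum_i (y^{l-1}_i - W_{i,j})^2, identity activation
diff = y_prev[:, None] - W            # shape (84, 10)
x = np.sum(diff ** 2, axis=0)         # shape (10,)
y = x

dE_dy = rng.normal(size=10)           # supplied by the loss layer
delta = dE_dy                         # identity activation: delta^l = dE/dy^l

# backward, as derived above
dE_dW = delta[None, :] * 2 * (W - y_prev[:, None])       # dE/dW_{i,j}
dE_dy_prev = np.sum(delta[None, :] * 2 * diff, axis=1)   # dE/dy^{l-1}_i
```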

C5>>>F6

A 120×84 fully connected layer plus 84 biases; the process is the same as for the MLP-NN above, except that the activation function is $y = f(x) = A\,\tanh(Sx)$, where $A$ is taken to be $1.7159$.
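
For the backward pass, only the derivative of this activation changes relative to the sigmoid case. Using the chain rule (and treating $S$, the fixed slope constant from the paper, as given):

$$f'(x) = A\,S\,\bigl(1-\tanh^2(Sx)\bigr) = \frac{S}{A}\bigl(A^2 - f(x)^2\bigr)$$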

S4>>>C5

C5 is a convolutional layer.

Ignoring the exact shapes for now, let us derive back-propagation through a convolution in general.

Suppose the input image has size $(N,N)$, where each pixel may be a scalar, a vector, or higher-dimensional data.

We use $K$ convolution kernels of size $(m,m)$. The $k$-th kernel has parameters $W^{lk}_{a,b}$, with $0\le k < K$ and $0\le a,b < m$, and each kernel has a bias $bias^{lk}$, $0\le k < K$. Each entry of $W$ has the same dimensionality as a pixel of the $(N,N)$ input. Applying the $K$ kernels produces a new image whose pixels are $K$-dimensional.

Forward propagation:
$$\begin{aligned} x^{lk}_{i,j} &= \sum_{a=0}^{m-1}\sum_{b=0}^{m-1}W^{lk}_{a,b}\,y^{l-1}_{i+a,j+b}+bias^{lk}, \quad 0\le k<K \\ y^{lk}_{i,j} &= f^l(x^{lk}_{i,j}) = x^{lk}_{i,j} \end{aligned}$$
In LeNet-5, only the F6 layer and the pooling layers have activation functions; the convolutional layers have none.

Since every kernel goes through the same procedure, the index $k$ in $x^{lk}$, $W^{lk}$, etc. is omitted below.

Now, $\frac{\partial E}{\partial y^{l}_{i,j}}$ is known (passed back from the layer above), and we want $\frac{\partial E}{\partial W^{l}_{a,b}}$.

First, compute the current layer's $\delta^l$:

$$\begin{aligned} \delta^{l}_{i,j} &= \frac{\partial E}{\partial x^{l}_{i,j}} \\ &= \frac{\partial E}{\partial y^{l}_{i,j}} \frac{\partial y^l_{i,j}}{\partial x^{l}_{i,j}} \\ &= \frac{\partial E}{\partial y^{l}_{i,j}} {f^{l}}'(x^l_{i,j}) \\ &= \frac{\partial E}{\partial y^{l}_{i,j}} \end{aligned}$$

Gradient with respect to the weights:

$$\begin{aligned} \frac{\partial E}{\partial W^{l}_{a,b}} &= \sum_{i=0}^{N-m} \sum_{j=0}^{N-m} \frac{\partial E}{\partial x^{l}_{i,j}} \frac{\partial x^{l}_{i,j}}{\partial W^{l}_{a,b}} \\ &= \sum_{i=0}^{N-m} \sum_{j=0}^{N-m} \delta^{l}_{i,j}\, y^{l-1}_{i+a,j+b} \end{aligned}$$

$$\begin{aligned} \frac{\partial E}{\partial bias^{l}} &= \sum_{i=0}^{N-m} \sum_{j=0}^{N-m} \frac{\partial E}{\partial x^{l}_{i,j}} \frac{\partial x^{l}_{i,j}}{\partial bias^{l}} \\ &= \sum_{i=0}^{N-m} \sum_{j=0}^{N-m} \delta^{l}_{i,j} \end{aligned}$$

Then compute $\frac{\partial E}{\partial y^{l-1}_{i,j}}$ to pass down to layer $l-1$:

$$\begin{aligned} \frac{\partial E}{\partial y^{l-1}_{i,j}} &= \sum_{a=0}^{m-1}\sum_{b=0}^{m-1}\frac{\partial E}{\partial x^{l}_{i-a,j-b}} \frac{\partial x^{l}_{i-a,j-b}}{\partial y^{l-1}_{i,j}} \\ &= \sum_{a=0}^{m-1}\sum_{b=0}^{m-1} \delta^{l}_{i-a,j-b}\, W^{l}_{a,b} \end{aligned}$$

This completes the derivation for this layer.
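
Here is a small NumPy sketch of these three gradients for a single kernel, written with explicit loops so that each line maps directly onto one of the sums above; the shapes and data are arbitrary, and terms whose index falls outside $\delta^l$ in the last formula are simply treated as zero:

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 6, 3
y_prev = rng.normal(size=(N, N))           # y^{l-1}
W = rng.normal(size=(m, m))                # W^l
bias = 0.1
out = N - m + 1                            # output size of a 'valid' convolution

# forward: x_{i,j} = sum_{a,b} W_{a,b} y_{i+a,j+b} + bias, identity activation
x = np.array([[np.sum(W * y_prev[i:i+m, j:j+m]) + bias
               for j in range(out)] for i in range(out)])
delta = rng.normal(size=(out, out))        # delta^l = dE/dx^l, given from above

# dE/dW_{a,b} = sum_{i,j} delta_{i,j} y_{i+a,j+b}
dW = np.array([[np.sum(delta * y_prev[a:a+out, b:b+out])
                for b in range(m)] for a in range(m)])

# dE/dbias = sum_{i,j} delta_{i,j}
dbias = np.sum(delta)

# dE/dy^{l-1}_{i,j} = sum_{a,b} delta_{i-a,j-b} W_{a,b}  (skip indices outside delta)
dy_prev = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        for a in range(m):
            for b in range(m):
                if 0 <= i - a < out and 0 <= j - b < out:
                    dy_prev[i, j] += delta[i - a, j - b] * W[a, b]
```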

Next, consider how the $\delta$ of two adjacent convolutional layers are related.

Suppose layer $l$ is a convolutional layer with known $\delta^l$, and layer $l-1$ is also a convolutional layer; we want $\delta^{l-1}$:

$$\begin{aligned} \delta^{l-1}_{i,j} &= \frac{\partial E}{\partial x^{l-1}_{i,j}} \\ &= \frac{\partial E}{\partial y^{l-1}_{i,j}} \frac{\partial y^{l-1}_{i,j} }{\partial x^{l-1}_{i,j}} \\ &= \left( \sum_{a=0}^{m-1}\sum_{b=0}^{m-1} \delta^{l}_{i-a,j-b}\, W^{l}_{a,b} \right) {f^{l-1}}'(x^{l-1}_{i,j}) \\ &= {f^{l-1}}'(x^{l-1}_{i,j}) \cdot \sum_{a=0}^{m-1}\sum_{b=0}^{m-1} \delta^{l}_{i-a,j-b}\, W^{l}_{a,b} \end{aligned}$$

Here $\delta^{l-1}$ amounts to convolving $\delta^l$ with $W^{l}$ rotated by 180 degrees. Writing the 180-degree-rotated kernel as $rot180(W)$, we get:
$$\delta^{l-1} = {f^{l-1}}'(x^{l-1}) \cdot \bigl(\delta^l \otimes rot180(W^{l})\bigr)$$

Note that in LeNet-5 there is no place where one convolutional layer connects directly to another, so the formula above is not actually used here. It will probably come in handy later (probably).
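
Even so, here is a quick self-contained numeric check (with made-up shapes and an identity activation, so the $f'$ factor is 1) that the rotated-kernel convolution form gives the same result as summing the indices directly:

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 6, 3
out = N - m + 1
W = rng.normal(size=(m, m))                # W^l of the upper conv layer
delta = rng.normal(size=(out, out))        # delta^l of the upper conv layer

# reference: dE/dy^{l-1}_{i,j} = sum_{a,b} delta_{i-a,j-b} W_{a,b}
ref = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        for a in range(m):
            for b in range(m):
                if 0 <= i - a < out and 0 <= j - b < out:
                    ref[i, j] += delta[i - a, j - b] * W[a, b]

# same thing as a 'full' correlation of the zero-padded delta with rot180(W)
W_rot = W[::-1, ::-1]                      # rot180(W)
delta_pad = np.pad(delta, m - 1)
full = np.array([[np.sum(W_rot * delta_pad[i:i+m, j:j+m])
                  for j in range(N)] for i in range(N)])
print(np.allclose(ref, full))              # True
```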

C3>>>S4

S4 is a pooling layer. In LeNet-5, it sums over a $2\times 2$ region, multiplies by a weight $W^{l}$, and then adds a bias $bias^{l}$.

Let the pooled output have size $(N,N)$.

Forward propagation:
$$\begin{aligned} x^{l}_{i,j} &= \left( \sum_{a=0}^{1}\sum_{b=0}^{1} y^{l-1}_{2i+a,2j+b} \right)\cdot W^{l} + bias^{l} \\ y^{l}_{i,j} &= sigmoid(x^{l}_{i,j}) \end{aligned}$$

Back propagation:

$$\begin{aligned} \delta^{l}_{i,j} &= \frac{\partial E}{\partial x^{l}_{i,j}} \\ &= \frac{\partial E}{\partial y^{l}_{i,j}} \frac{\partial y^{l}_{i,j}}{\partial x^{l}_{i,j}} \\ &= \frac{\partial E}{\partial y^{l}_{i,j}}\, sigmoid'(x^{l}_{i,j}) \\ &= \frac{\partial E}{\partial y^{l}_{i,j}}\, sigmoid(x^{l}_{i,j}) \bigl(1-sigmoid(x^{l}_{i,j})\bigr) \end{aligned}$$

$$\begin{aligned} \frac{\partial E}{\partial W^{l}} &= \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \frac{\partial E}{\partial x^{l}_{i,j}} \frac{\partial x^{l}_{i,j}}{\partial W^{l}} \\ &= \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \delta^{l}_{i,j} \left( \sum_{a=0}^{1}\sum_{b=0}^{1} y^{l-1}_{2i+a,2j+b} \right) \end{aligned}$$

$$\begin{aligned} \frac{\partial E}{\partial bias^{l}} &= \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \frac{\partial E}{\partial x^{l}_{i,j}} \frac{\partial x^{l}_{i,j}}{\partial bias^{l}} \\ &= \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \delta^{l}_{i,j} \end{aligned}$$

Then compute $\frac{\partial E}{\partial y^{l-1}_{i,j}}$ to pass down to layer $l-1$:

$$\begin{aligned} \frac{\partial E}{\partial y^{l-1}_{i,j}} &= \frac{\partial E}{\partial x^{l}_{\lfloor i/2 \rfloor,\lfloor j/2 \rfloor} } \frac{\partial x^{l}_{\lfloor i/2 \rfloor,\lfloor j/2 \rfloor}}{\partial y^{l-1}_{i,j}} \\ &= \delta^{l}_{\lfloor i/2 \rfloor,\lfloor j/2 \rfloor}\, W^{l} \end{aligned}$$
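
Here is a minimal NumPy sketch of this subsampling layer, forward and backward. The map size, the single trainable weight and bias per map, and the random data are assumptions made only for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
N = 4                                       # pooled output size (N, N)
y_prev = rng.normal(size=(2 * N, 2 * N))    # y^{l-1}, one feature map
W, bias = 0.5, 0.1                          # one trainable weight and bias per map

# forward: sum each 2x2 block, scale by W, add bias, then sigmoid
block_sum = y_prev.reshape(N, 2, N, 2).sum(axis=(1, 3))
x = block_sum * W + bias
y = sigmoid(x)

dE_dy = rng.normal(size=(N, N))             # handed down by the layer above
delta = dE_dy * y * (1 - y)                 # delta^l = dE/dy * sigmoid'(x)

dW = np.sum(delta * block_sum)              # dE/dW
dbias = np.sum(delta)                       # dE/dbias
dE_dy_prev = np.kron(delta, np.ones((2, 2))) * W   # dE/dy^{l-1}_{i,j} = delta_{i//2,j//2} * W
```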

S2>>>C3

Same as S4>>>C5.

C1>>>S2

Same as C3>>>S4.

INPUT>>>C1

Same as S4>>>C5; the input image itself serves as $y^{l-1}$.
