by ztx
This post is a set of notes taken while studying; read with caution, and corrections are welcome.
It was written in a hurry, so the specific structure of LeNet-5 may be slightly off; the paper is the authoritative reference. A gap to be filled later.
Also, CSDN's LaTeX rendering is broken for some reason, so the formulas are given in source form.
KaTeX only supports aligned for multi-line formulas, not align.
Perceptrons, a few mathematical tricks, and some ability to fill in the gaps mentally.
MLP-NN (Multilayer Perceptron Neural Network)
The process is the one indicated by the arrow on the left in the figure above.
$$y_k = f\left( \sum_{j=0}^{H} f\left( \sum_{i=0}^{D} x_i W_{ji} \right) W_{kj} \right)$$
Loss function: L2 distance
$$E(\mathbf{W}) = \frac{1}{2}\sum_{k=0}^{c-1}(t_k - y_k)^2 = \frac{1}{2}\|\mathbf{t} - \mathbf{y}\|^2$$
Activation function: sigmoid
$$f(x) = \frac{1}{1+e^{-x}}$$
Derivative of the sigmoid:
$$f'(x) = f(x)\,(1-f(x))$$
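As a quick sanity check of this derivative identity, here is a minimal NumPy sketch (the function names are mine, not from the original):

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # f'(x) = f(x) * (1 - f(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare against a numerical central-difference derivative at a few points.
x = np.linspace(-3.0, 3.0, 7)
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps)
assert np.allclose(sigmoid_prime(x), numeric, atol=1e-6)
```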
Backpropagation from the hidden layer to the output layer:
$$\begin{aligned} \frac{\partial E}{\partial W_{kj}} &= \frac{\partial E}{\partial y_k}\frac{\partial y_k}{\partial net_{k}} \frac{\partial net_{k}}{\partial W_{kj}} \\ &= -(t_k-y_k)\,f'(net_k)\,y_j \\ &= \delta_k\, y_j \end{aligned}$$
where:
$$\delta_k = -(t_k-y_k)\,f'(net_k) = \frac{\partial E}{\partial net_{k}}$$
With learning rate $\eta$, the gradient-descent weight update is (note the minus sign, since $\delta_k = \partial E / \partial net_k$):
$$\Delta W_{kj} = -\eta\, \delta_k\, y_j$$
Backpropagation from the input layer to the hidden layer:
$$\begin{aligned} \frac{\partial E}{\partial W_{ji}} &= \frac{\partial E}{\partial y_j} \frac{\partial y_j}{\partial net_{j}} \frac{\partial net_{j}}{\partial W_{ji}} \\ &= \frac{\partial }{\partial y_j} \left( \frac{1}{2}\sum_{k=0}^{c-1}(t_k-y_k)^2 \right) f'(net_j)\, x_i \\ &= \left( -\sum_{k=0}^{c-1}(t_k-y_k)\frac{\partial y_k}{\partial y_j} \right) f'(net_j)\, x_i \\ &= \left( -\sum_{k=0}^{c-1}(t_k-y_k)\frac{\partial y_k}{\partial net_k}\frac{\partial net_k}{\partial y_j} \right) f'(net_j)\, x_i \\ &= \left( -\sum_{k=0}^{c-1}(t_k-y_k)\,f'(net_k)\,W_{kj} \right) f'(net_j)\, x_i \\ &= \left( \sum_{k=0}^{c-1}\delta_k W_{kj} \right) f'(net_j)\, x_i \\ &= \delta_j\, x_i \end{aligned}$$
where:
$$\delta_j = \left( \sum_{k=0}^{c-1}\delta_k W_{kj} \right) f'(net_j) = \frac{\partial E}{\partial net_{j}}$$
With learning rate $\eta$, the corresponding update is:
$$\Delta W_{ji} = -\eta\, \delta_j\, x_i$$
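To make the two stages concrete, here is a minimal NumPy sketch of one gradient step for a single sample, following the notation above ($W_{ji}$, $W_{kj}$, $\delta_k$, $\delta_j$; biases omitted, just as in the formulas). All names are illustrative, not from the original post.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_backprop_step(x, t, W_ji, W_kj, eta=0.1):
    """One gradient-descent step for a D -> H -> c sigmoid MLP, single sample.

    x    : (D,)   input
    t    : (c,)   target
    W_ji : (H, D) input-to-hidden weights
    W_kj : (c, H) hidden-to-output weights
    """
    # Forward pass
    net_j = W_ji @ x                      # (H,)
    y_j = sigmoid(net_j)                  # hidden outputs
    net_k = W_kj @ y_j                    # (c,)
    y_k = sigmoid(net_k)                  # network outputs

    # Backward pass: delta_k = dE/dnet_k = -(t_k - y_k) f'(net_k)
    delta_k = -(t - y_k) * y_k * (1.0 - y_k)
    # delta_j = (sum_k delta_k W_kj) f'(net_j)
    delta_j = (W_kj.T @ delta_k) * y_j * (1.0 - y_j)

    # Gradient descent: Delta W = -eta * dE/dW
    W_kj = W_kj - eta * np.outer(delta_k, y_j)
    W_ji = W_ji - eta * np.outer(delta_j, x)

    E = 0.5 * np.sum((t - y_k) ** 2)
    return W_ji, W_kj, E
```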
C1:
six 5×5 convolution kernels
S2:
similar to average pooling, with sigmoid activation
C3:
sixteen 5×5 convolution kernels
S4:
similar to average pooling, with sigmoid activation
C5:
120 5×5 convolution kernels
F6:
a 120×84 fully connected layer plus 84 biases, with tanh activation
OUTPUT:
84×10 radial basis functions
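Assuming a 32×32 input as in the paper (the exact structure may deviate, as noted at the top), the feature-map sizes of these layers can be sanity-checked with a short script; the size rules used here (valid convolution: $n-m+1$, 2×2 subsampling: $n/2$) are my own summary:

```python
# Rough shape walkthrough of LeNet-5, assuming a 32x32 input.
def conv(n, m=5):   # valid convolution with an m x m kernel
    return n - m + 1

def pool(n):        # 2x2 subsampling
    return n // 2

n = 32
layers = []
n = conv(n); layers.append(("C1", 6, n))     # 6 maps of 28x28
n = pool(n); layers.append(("S2", 6, n))     # 6 maps of 14x14
n = conv(n); layers.append(("C3", 16, n))    # 16 maps of 10x10
n = pool(n); layers.append(("S4", 16, n))    # 16 maps of 5x5
n = conv(n); layers.append(("C5", 120, n))   # 120 maps of 1x1
layers.append(("F6", 84, None))              # 84 fully connected units
layers.append(("OUTPUT", 10, None))          # 10 RBF units

for name, maps, size in layers:
    print(name, maps, f"{size}x{size}" if size else "")
```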
First, for layer $l$ define:
its $net$ value is $x^l$;
its output is $y^l = f(x^l)$;
$$\delta^{l} = \frac{\partial E}{\partial x^l}$$
its activation function is $f^l(\cdot)$;
its input is the previous layer's output $y^{l-1}$;
the layer's parameters are $W^l$, $bias^l$, and so on.
Forward pass ($l-1$ is F6, $l$ is OUTPUT, $l+1$ is the loss).
LeNet-5's OUTPUT layer uses radial basis functions (RBF):
$$x^{l}_j = \sum_{i=0}^{83}\left(y^{l-1}_i-W^{l}_{i,j}\right)^2, \qquad y^l_j = f^l(x^l_j) = x^l_j$$
Backward pass.
In the paper, the $W$ parameters here seem to be fixed? I don't fully understand this yet. If they were trained, the derivation would be:
$$\begin{aligned} \delta^{l}_i &= \frac{\partial E}{\partial x^{l}_i}\\ &= \frac{\partial E}{\partial y^{l}_i} \frac{\partial y^l_i}{\partial x^{l}_i} \\ &= \frac{\partial E}{\partial y^{l}_i}\, {f^{l}}' (x^l_i) \\ &= \frac{\partial E}{\partial y^{l}_i} \end{aligned}$$
Here $\frac{\partial E}{\partial y^{l}_i}$ must be supplied by the loss layer.
$$\begin{aligned} \frac{\partial E}{\partial W^{l}_{i,j}} &= \frac{\partial E}{\partial x^{l}_j} \frac{\partial x^{l}_j}{\partial W^{l}_{i,j}} \\ &= \delta^{l}_j \cdot 2\left(W^{l}_{i,j} - y^{l-1}_i\right) \end{aligned}$$
Summing over the 10 output units:
$$\begin{aligned} \frac{\partial E}{\partial y^{l-1}_i} &= \sum_{j=0}^{9} \frac{\partial E}{\partial x^{l}_j} \frac{\partial x^{l}_j}{\partial y^{l-1}_i}\\ &= \sum_{j=0}^{9} \frac{\partial E}{\partial x^{l}_j} \cdot 2\left(y^{l-1}_i-W^{l}_{i,j}\right) \\ &= \sum_{j=0}^{9}\delta^{l}_j \cdot 2\left(y^{l-1}_i-W^{l}_{i,j}\right) \end{aligned}$$
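A minimal NumPy sketch of the RBF output layer (84 inputs, 10 units) implementing the forward pass and the two gradients above; since the activation is the identity, $\delta_j = \partial E/\partial y^l_j$. All names are illustrative.

```python
import numpy as np

def rbf_forward(y_prev, W):
    # x_j = sum_i (y_prev_i - W_{i,j})^2 ; identity activation, so y_j = x_j
    diff = y_prev[:, None] - W          # (84, 10): diff[i, j] = y_prev_i - W_{i,j}
    x = np.sum(diff ** 2, axis=0)       # (10,)
    return x, diff

def rbf_backward(dE_dy, diff):
    # Identity activation => delta_j = dE/dy_j
    delta = dE_dy                                              # (10,)
    dE_dW = 2.0 * (-diff) * delta[None, :]                     # delta_j * 2 (W_{i,j} - y_prev_i)
    dE_dy_prev = np.sum(2.0 * diff * delta[None, :], axis=1)   # sum_j delta_j * 2 (y_prev_i - W_{i,j})
    return dE_dW, dE_dy_prev

# Example with the LeNet-5 sizes (84 inputs, 10 RBF units).
rng = np.random.default_rng(0)
y_prev, W = rng.standard_normal(84), rng.standard_normal((84, 10))
x, diff = rbf_forward(y_prev, W)
dE_dW, dE_dy_prev = rbf_backward(rng.standard_normal(10), diff)
```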
F6 is a 120×84 fully connected layer plus 84 biases; the procedure is the same as for the MLP-NN above, except that the activation function is $y = f(x) = A\tanh(Sx)$ with $A = 1.7159$.
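As a small illustration, the scaled tanh can be written as follows; $A = 1.7159$ comes from the text above, while $S = 2/3$ is the value used in the LeNet-5 paper and is only an assumption here:

```python
import numpy as np

def f6_activation(x, A=1.7159, S=2.0 / 3.0):
    # y = A * tanh(S * x); A from the text, S = 2/3 assumed from the paper
    return A * np.tanh(S * x)
```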
C5 is a convolutional layer.
Setting the exact shapes aside for now, let us work through backpropagation for a convolution.
Suppose the input image has size $(N,N)$, where each pixel may be a scalar, a vector, or higher-dimensional data.
We apply $K$ convolution kernels of size $(m,m)$. Each kernel has parameters $W^{lk}_{a,b}$, $0\le k < K$, $0\le a,b < m$, and a bias $bias^{lk}$, $0\le k < K$. Each entry of $W$ has the same dimensionality as a pixel of the $(N,N)$ input image, and after the $K$ kernels are applied a new image is formed whose pixels are $K$-dimensional.
Forward pass:
$$x^{lk}_{i,j} = \sum_{a=0}^{m-1}\sum_{b=0}^{m-1}W^{lk}_{a,b}\,y^{l-1}_{i+a,j+b}+bias^{lk},\quad 0\le k<K, \qquad y^{lk}_{i,j}=f^l(x^{lk}_{i,j}) = x^{lk}_{i,j}$$
In LeNet-5 only the F6 layer and the pooling layers have activation functions; the convolutional layers have none.
Since every kernel goes through the same procedure, the index $k$ in $x^{lk}$, $W^{lk}$, etc. is omitted from here on.
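A minimal NumPy sketch of this forward pass for a single kernel (identity activation, scalar pixels); names are illustrative:

```python
import numpy as np

def conv_forward(y_prev, W, bias):
    # x_{i,j} = sum_{a,b} W_{a,b} * y_prev_{i+a, j+b} + bias  (valid convolution)
    N, m = y_prev.shape[0], W.shape[0]
    out = N - m + 1
    x = np.empty((out, out))
    for i in range(out):
        for j in range(out):
            x[i, j] = np.sum(W * y_prev[i:i + m, j:j + m]) + bias
    return x  # y = x, since the convolutional layers here have no activation
```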
Now, given $\frac{\partial E}{\partial y^{l}_{i,j}}$ from the layer above, we want $\frac{\partial E}{\partial W^{l}_{a,b}}$.
First, compute the current layer's $\delta^l$:
$$\begin{aligned} \delta^{l}_{i,j} &= \frac{\partial E}{\partial x^{l}_{i,j}}\\ &= \frac{\partial E}{\partial y^{l}_{i,j}} \frac{\partial y^l_{i,j}}{\partial x^{l}_{i,j}} \\ &= \frac{\partial E}{\partial y^{l}_{i,j}}\, {f^{l}}' (x^l_{i,j}) \\ &= \frac{\partial E}{\partial y^{l}_{i,j}} \end{aligned}$$
Gradients of the weights:
$$\begin{aligned} \frac{\partial E}{\partial W^{l}_{a,b}} &= \sum_{i=0}^{N-m} \sum_{j=0}^{N-m} \frac{\partial E}{\partial x^{l}_{i,j}} \frac{\partial x^{l}_{i,j}}{\partial W^{l}_{a,b}} \\ &= \sum_{i=0}^{N-m} \sum_{j=0}^{N-m} \delta^{l}_{i,j}\, y^{l-1}_{i+a,j+b} \end{aligned}$$
$$\begin{aligned} \frac{\partial E}{\partial bias^{l}} &= \sum_{i=0}^{N-m} \sum_{j=0}^{N-m} \frac{\partial E}{\partial x^{l}_{i,j}} \frac{\partial x^{l}_{i,j}}{\partial bias^{l}} \\ &= \sum_{i=0}^{N-m} \sum_{j=0}^{N-m} \delta^{l}_{i,j} \end{aligned}$$
Then compute $\frac{\partial E}{\partial y^{l-1}_{i,j}}$ for the layer below:
$$\begin{aligned} \frac{\partial E}{\partial y^{l-1}_{i,j}} &= \sum_{a=0}^{m-1}\sum_{b=0}^{m-1}\frac{\partial E}{\partial x^{l}_{i-a,j-b}} \frac{\partial x^{l}_{i-a,j-b}}{\partial y^{l-1}_{i,j}}\\ &= \sum_{a=0}^{m-1}\sum_{b=0}^{m-1} \delta^{l}_{i-a,j-b}\, W^{l}_{a,b} \end{aligned}$$
This completes the derivation for this layer.
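Putting the three gradients together, here is a minimal NumPy sketch for a single kernel (identity activation, so $\delta^l = \partial E/\partial y^l$); names are illustrative:

```python
import numpy as np

def conv_backward(delta, y_prev, W):
    """Gradients for one kernel of a valid convolution with identity activation.

    delta  : (N-m+1, N-m+1)  dE/dx of this layer (= dE/dy here)
    y_prev : (N, N)          input to this layer
    W      : (m, m)          kernel
    """
    N, m = y_prev.shape[0], W.shape[0]
    out = N - m + 1

    # dE/dW_{a,b} = sum_{i,j} delta_{i,j} * y_prev_{i+a, j+b}
    dW = np.empty((m, m))
    for a in range(m):
        for b in range(m):
            dW[a, b] = np.sum(delta * y_prev[a:a + out, b:b + out])

    # dE/dbias = sum_{i,j} delta_{i,j}
    dbias = np.sum(delta)

    # dE/dy_prev_{i,j} = sum_{a,b} delta_{i-a, j-b} * W_{a,b} (zero outside delta's range)
    dy_prev = np.zeros((N, N))
    for i in range(out):
        for j in range(out):
            dy_prev[i:i + m, j:j + m] += delta[i, j] * W
    return dW, dbias, dy_prev
```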
Next, consider how $\delta$ relates between two consecutive convolutional layers.
Given $\delta^l$ of convolutional layer $l$, if layer $l-1$ is also a convolutional layer, then $\delta^{l-1}$ is:
$$\begin{aligned} \delta^{l-1}_{i,j} &= \frac{\partial E}{\partial x^{l-1}_{i,j}} \\ &= \frac{\partial E}{\partial y^{l-1}_{i,j}} \frac{\partial y^{l-1}_{i,j} }{\partial x^{l-1}_{i,j}} \\ &= \left( \sum_{a=0}^{m-1}\sum_{b=0}^{m-1} \delta^{l}_{i-a,j-b}\, W^{l}_{a,b} \right) {f^{l-1}}'(x^{l-1}_{i,j}) \\ &= {f^{l-1}}'(x^{l-1}_{i,j}) \cdot \sum_{a=0}^{m-1}\sum_{b=0}^{m-1} \delta^{l}_{i-a,j-b}\, W^{l}_{a,b} \end{aligned}$$
Here $\delta^{l-1}$ amounts to convolving $\delta^l$ with $W^{l}$ rotated by 180 degrees. Writing the rotated kernel as $rot180(W)$:
$$\delta^{l-1} = {f^{l-1}}'(x^{l-1}) \cdot \left(\delta^l \otimes rot180(W^{l})\right)$$
Note that in LeNet-5 no convolutional layer is connected directly to another convolutional layer, so the formula above is not actually needed here. It will probably come in handy later (probably).
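Still, the rot180 claim can be checked numerically. A small sketch, assuming entries of $\delta^l$ outside its valid range count as zero, comparing the double sum against a full cross-correlation with the rotated kernel (the $f'$ factor is left out since it is just an element-wise product):

```python
import numpy as np

N, m = 6, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((m, m))
delta = rng.standard_normal((N - m + 1, N - m + 1))  # delta^l from a valid convolution

# Direct evaluation of delta_prev[i, j] = sum_{a,b} delta[i-a, j-b] * W[a, b]
direct = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        s = 0.0
        for a in range(m):
            for b in range(m):
                if 0 <= i - a < delta.shape[0] and 0 <= j - b < delta.shape[1]:
                    s += delta[i - a, j - b] * W[a, b]
        direct[i, j] = s

# Same thing as a "full" cross-correlation with rot180(W): pad delta by m-1 zeros
# on every side and slide the rotated kernel over it.
rot_W = W[::-1, ::-1]
padded = np.pad(delta, m - 1)
via_rot = np.array([[np.sum(padded[i:i + m, j:j + m] * rot_W)
                     for j in range(N)] for i in range(N)])

assert np.allclose(direct, via_rot)
```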
S4 is a pooling layer. In LeNet-5 it sums each $2\times 2$ region, multiplies by a weight $W^{l}$, and then adds a bias $bias^{l}$.
Let the size after pooling be $(N,N)$.
Forward pass:
$$x^{l}_{i,j} = \left( \sum_{a=0}^{1}\sum_{b=0}^{1} y^{l-1}_{2i+a,2j+b} \right)\cdot W^{l} + bias^{l}, \qquad y^{l}_{i,j} = sigmoid\left(x^{l}_{i,j}\right)$$
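A minimal NumPy sketch of this subsampling forward pass for one feature map (non-overlapping 2×2 blocks, even input size assumed; names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pool_forward(y_prev, w, bias):
    # x_{i,j} = w * (sum of the 2x2 block at (2i, 2j)) + bias ; y = sigmoid(x)
    block_sum = (y_prev[0::2, 0::2] + y_prev[0::2, 1::2]
                 + y_prev[1::2, 0::2] + y_prev[1::2, 1::2])   # (N, N)
    x = block_sum * w + bias
    return x, sigmoid(x)
```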
Backward pass:
$$\begin{aligned} \delta^{l}_{i,j} &= \frac{\partial E}{\partial x^{l}_{i,j}} \\ &= \frac{\partial E}{\partial y^{l}_{i,j}} \frac{\partial y^{l}_{i,j}}{\partial x^{l}_{i,j}} \\ &= \frac{\partial E}{\partial y^{l}_{i,j}}\, sigmoid'(x^{l}_{i,j}) \\ &= \frac{\partial E}{\partial y^{l}_{i,j}}\, sigmoid(x^{l}_{i,j})\left(1-sigmoid(x^{l}_{i,j})\right) \end{aligned}$$
$$\begin{aligned} \frac{\partial E}{\partial W^{l}} &= \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \frac{\partial E}{\partial x^{l}_{i,j}} \frac{\partial x^{l}_{i,j}}{\partial W^{l}} \\ &= \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \delta^{l}_{i,j} \left( \sum_{a=0}^{1}\sum_{b=0}^{1} y^{l-1}_{2i+a,2j+b} \right) \end{aligned}$$
$$\begin{aligned} \frac{\partial E}{\partial bias^{l}} &= \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \frac{\partial E}{\partial x^{l}_{i,j}} \frac{\partial x^{l}_{i,j}}{\partial bias^{l}} \\ &= \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \delta^{l}_{i,j} \end{aligned}$$
Then compute $\frac{\partial E}{\partial y^{l-1}_{i,j}}$ for the layer below:
$$\begin{aligned} \frac{\partial E}{\partial y^{l-1}_{i,j}} &= \frac{\partial E}{\partial x^{l}_{\lfloor i/2 \rfloor,\lfloor j/2 \rfloor} } \frac{\partial x^{l}_{\lfloor i/2 \rfloor,\lfloor j/2 \rfloor}}{\partial y^{l-1}_{i,j}} \\ &= \delta^{l}_{\lfloor i/2 \rfloor,\lfloor j/2 \rfloor}\, W^{l} \end{aligned}$$
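And a matching sketch of the backward pass, computing $\delta^l$, $\partial E/\partial W^l$, $\partial E/\partial bias^l$ and $\partial E/\partial y^{l-1}$ as derived above (again, illustrative names only):

```python
import numpy as np

def pool_backward(dE_dy, x, y_prev, w):
    """Gradients for the subsampling layer above (one feature map).

    dE_dy  : (N, N)    dE/dy of this layer, from the layer above
    x      : (N, N)    pre-activation values of this layer
    y_prev : (2N, 2N)  input to this layer
    w      : scalar    shared weight
    """
    s = 1.0 / (1.0 + np.exp(-x))
    delta = dE_dy * s * (1.0 - s)                     # delta = dE/dx

    block_sum = (y_prev[0::2, 0::2] + y_prev[0::2, 1::2]
                 + y_prev[1::2, 0::2] + y_prev[1::2, 1::2])
    dw = np.sum(delta * block_sum)                    # dE/dW
    dbias = np.sum(delta)                             # dE/dbias

    # dE/dy_prev_{i,j} = delta_{floor(i/2), floor(j/2)} * w
    dy_prev = np.repeat(np.repeat(delta, 2, axis=0), 2, axis=1) * w
    return delta, dw, dbias, dy_prev
```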
C3: same as S4 >>> C5.
S2: same as C3 >>> S4.
C1: same as S4 >>> C5, with the input image itself serving as $y^{l-1}$.