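According to Baidu Baike, the BP (back propagation) neural network is a concept proposed in 1986 by a group of scientists led by Rumelhart and McClelland. It is a multilayer feedforward neural network trained with the error backpropagation algorithm, and is described there as the most widely used neural network today. Readers should already have some familiarity with the basic neural network model: a network is made up of many neurons, each with its own weights. A typical network consists of an input layer, a hidden layer, and an output layer, and the input layer and hidden layer usually each carry a bias term. A typical neural network model is shown in the figure:
To keep the calculation simple, we take the inputs to be $i_1$ and $i_2$, plus a bias term $b_1$. Likewise, the hidden layer consists of $h_1$, $h_2$ and a bias term $b_2$, and the output layer consists of $o_1$ and $o_2$. $w_i$ denotes the weights on the connections between layers, and every layer uses the common Sigmoid activation function. Assigning initial values to the network in the figure above gives the following figure: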
In this network diagram with weights, we can compute forward from the input layer to the hidden layer. The net input of h1 is the weighted sum of the inputs plus the bias term:

$$\text{net}_{h1} = w_1 \cdot i_1 + w_2 \cdot i_2 + b_1$$

$$\text{net}_{h1} = 0.15 \cdot 0.05 + 0.2 \cdot 0.1 + 0.35 = 0.3775$$

The activation function here is the Sigmoid function:

$$f(x) = \frac{1}{1 + e^{-x}}$$

so the output of h1 is $f(0.3775)$:

$$f(0.3775) = \frac{1}{1 + e^{-0.3775}} \approx 0.59327$$
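The hidden-node calculation above can be checked in a few lines of Python. This is only a minimal sketch: the function name `sigmoid` and the variable names are chosen here for illustration, and the numeric values are the ones used in the worked example.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Inputs, weights and bias from the worked example.
i1, i2 = 0.05, 0.10
w1, w2 = 0.15, 0.20
b1 = 0.35

net_h1 = w1 * i1 + w2 * i2 + b1   # weighted sum plus bias
out_h1 = sigmoid(net_h1)          # activation of h1

print(round(net_h1, 4), round(out_h1, 5))  # 0.3775 0.59327
```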
In the same way, we obtain the output of h2:

$$f(w_3 \cdot i_1 + w_4 \cdot i_2 + b_1) = f(0.3925) \approx 0.59688$$

With the hidden-layer outputs computed, the next step is the calculation from the hidden layer to the output layer. The outputs of the previous stage serve as the inputs of this stage, which gives:

$$\text{net}_{o1} = w_5 \cdot \text{out}_{h1} + w_6 \cdot \text{out}_{h2} + b_2$$

Substituting the hidden-layer outputs as the inputs to the output layer, we get:

$$\text{net}_{o1} = 0.4 \cdot 0.59327 + 0.45 \cdot 0.59688 + 0.6 \approx 1.1059$$

Applying the activation function then gives:

$$\text{out}_{o1} = f(\text{net}_{o1}) = f(1.1059) = \frac{1}{1 + e^{-1.1059}} \approx 0.751365$$

Similarly, the output of o2 is:

$$\text{out}_{o2} = f(\text{net}_{o2}) = 0.772928$$

This completes the forward pass of the network, with the two outputs 0.751365 and 0.772928. These are still far from the target outputs 0.01 and 0.99. To shrink the error between the outputs and the targets, we propagate the error backward through the network, use it to update the weights, and then recompute the outputs.
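The whole forward pass can be sketched as one small function. Note the assumptions: the values $w_3 = 0.25$, $w_4 = 0.30$, $w_7 = 0.50$ and $w_8 = 0.55$ are not stated in the text (they were in the figure), and are chosen here because they reproduce $\text{net}_{h2} = 0.3925$ and $\text{out}_{o2} \approx 0.772928$. It is also assumed that $w_7$ and $w_8$ connect h1 and h2 to o2. The function name `forward` and the list layout are illustrative only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, b1, w_output, b2):
    """Forward pass through a 2-2-2 network with one shared bias per layer."""
    out_h = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b1)
             for row in w_hidden]
    out_o = [sigmoid(sum(w * h for w, h in zip(row, out_h)) + b2)
             for row in w_output]
    return out_h, out_o

inputs = [0.05, 0.10]
w_hidden = [[0.15, 0.20],   # w1, w2 -> h1 (from the text)
            [0.25, 0.30]]   # w3, w4 -> h2 (assumed values)
w_output = [[0.40, 0.45],   # w5, w6 -> o1 (from the text)
            [0.50, 0.55]]   # w7, w8 -> o2 (assumed values)

out_h, out_o = forward(inputs, w_hidden, 0.35, w_output, 0.60)
print([round(v, 6) for v in out_h])
print([round(v, 6) for v in out_o])
```

With these assumed weights the printed outputs agree with the values 0.751365 and 0.772928 quoted above to six decimal places.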
As described above, we now begin the backward pass. First we need the total error of the network, which is computed with the following formula:

$$E_{\text{total}} = \frac{1}{2}\sum (\text{target} - \text{output})^2$$

Using this formula, we can compute the error of the whole network:

$$E_{o1} = \frac{1}{2}(\text{target}_{o1} - \text{out}_{o1})^2 = \frac{1}{2}(0.01 - 0.751365)^2 \approx 0.274811$$

Similarly, the error at o2 is:

$$E_{o2} = \frac{1}{2}(\text{target}_{o2} - \text{out}_{o2})^2 = \frac{1}{2}(0.99 - 0.772928)^2 \approx 0.02356$$

so the total error is:

$$E_{\text{total}} = E_{o1} + E_{o2} = 0.274811 + 0.02356 = 0.298371$$
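The error calculation is short enough to verify directly. The sketch below uses the rounded forward-pass outputs quoted in the text; the helper name `half_squared_error` is an illustrative choice.

```python
def half_squared_error(target, output):
    """Per-output error term: 1/2 * (target - output)^2."""
    return 0.5 * (target - output) ** 2

targets = [0.01, 0.99]
outputs = [0.751365, 0.772928]   # forward-pass results from the text

errors = [half_squared_error(t, o) for t, o in zip(targets, outputs)]
e_total = sum(errors)

print([round(e, 6) for e in errors], round(e_total, 6))
```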
With the total error computed, we can start the backward pass. The first step is the propagation from the output layer to the hidden layer. Taking the weight w5 as an example, let us see how the error is propagated backward:

As shown in the figure above, we use w5 to analyze how the error of the network propagates backward. The error first propagates from Error to out o1, then from out o1 to net o1, and finally from net o1 to the weight w5. So to find out how much w5 influences the error, we take the partial derivative of the total error with respect to w5. From the path just described, this derivative is a chain that runs from the whole to the part, as in the following formula:

$$\frac{\partial E_{\text{total}}}{\partial w_5} = \frac{\partial E_{\text{total}}}{\partial \text{out}_{o1}} \cdot \frac{\partial \text{out}_{o1}}{\partial \text{net}_{o1}} \cdot \frac{\partial \text{net}_{o1}}{\partial w_5}$$

This is the chain rule applied from the whole to the part: the partial derivative of the total error with respect to w5 is split into three factors, each of which can be evaluated separately. The calculation goes as follows:

$$\begin{aligned}
\frac{\partial E_{\text{total}}}{\partial \text{out}_{o1}} &= \frac{\partial \left[\frac{1}{2}(\text{target}_{o1} - \text{out}_{o1})^2 + \frac{1}{2}(\text{target}_{o2} - \text{out}_{o2})^2\right]}{\partial \text{out}_{o1}} \\
&= (-1)(\text{target}_{o1} - \text{out}_{o1}) = \text{out}_{o1} - \text{target}_{o1} = 0.751365 - 0.01 = 0.741365
\end{aligned}$$

Next comes the partial derivative of out o1 with respect to net o1, which amounts to the derivative of the Sigmoid activation function:

$$\begin{aligned}
\frac{\partial \text{out}_{o1}}{\partial \text{net}_{o1}} &= \frac{\partial}{\partial \text{net}_{o1}}\left[\frac{1}{1 + e^{-\text{net}_{o1}}}\right] = \frac{1}{1 + e^{-\text{net}_{o1}}}\left(1 - \frac{1}{1 + e^{-\text{net}_{o1}}}\right) \\
&= \text{out}_{o1}(1 - \text{out}_{o1}) = 0.751365\,(1 - 0.751365) = 0.186816
\end{aligned}$$
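The identity $f'(x) = f(x)(1 - f(x))$ used here can be sanity-checked numerically. The sketch below compares the closed form against a central finite difference at $\text{net}_{o1} = 1.1059$; the step size `eps` is an arbitrary choice made for this illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

net_o1 = 1.1059
out_o1 = sigmoid(net_o1)

# Closed form: derivative expressed through the output itself.
analytic = out_o1 * (1.0 - out_o1)

# Central finite difference as an independent check.
eps = 1e-6
numeric = (sigmoid(net_o1 + eps) - sigmoid(net_o1 - eps)) / (2 * eps)

print(round(analytic, 6), round(numeric, 6))
```

Both values agree with the 0.186816 obtained above to about six decimal places.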
With the first two factors done, we compute the last one, the partial derivative of net o1 with respect to w5. From the network diagram above we have:

$$\text{net}_{o1} = \text{out}_{h1} \cdot w_5 + \text{out}_{h2} \cdot w_6 + b_2$$

The partial derivative of this expression with respect to w5 is clearly:

$$\frac{\partial \text{net}_{o1}}{\partial w_5} = \frac{\partial (\text{out}_{h1} \cdot w_5 + \text{out}_{h2} \cdot w_6 + b_2)}{\partial w_5} = \text{out}_{h1} = 0.59327$$

Finally, multiplying the three partial derivatives together gives the partial derivative of the total error with respect to w5:

$$\frac{\partial E_{\text{total}}}{\partial w_5} = \frac{\partial E_{\text{total}}}{\partial \text{out}_{o1}} \cdot \frac{\partial \text{out}_{o1}}{\partial \text{net}_{o1}} \cdot \frac{\partial \text{net}_{o1}}{\partial w_5} = 0.741365 \cdot 0.186816 \cdot 0.59327 = 0.082167$$
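The three chain-rule factors map directly onto three lines of code. This is a sketch using the rounded values from the text; the variable names are illustrative.

```python
out_h1 = 0.59327      # hidden output feeding w5
out_o1 = 0.751365     # network output at o1
target_o1 = 0.01

dE_dout = out_o1 - target_o1          # dE_total / d out_o1
dout_dnet = out_o1 * (1.0 - out_o1)   # d out_o1 / d net_o1 (sigmoid derivative)
dnet_dw5 = out_h1                     # d net_o1 / d w5

grad_w5 = dE_dout * dout_dnet * dnet_dw5
print(round(grad_w5, 6))
```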
To summarize, we find that:

$$\frac{\partial E_{\text{total}}}{\partial w_5} = \frac{\partial E_{\text{total}}}{\partial \text{out}_{o1}} \cdot \frac{\partial \text{out}_{o1}}{\partial \text{net}_{o1}} \cdot \frac{\partial \text{net}_{o1}}{\partial w_5} = (\text{out}_{o1} - \text{target}_{o1}) \cdot \text{out}_{o1}(1 - \text{out}_{o1}) \cdot \text{out}_{h1}$$

For convenience, $\delta_{o1}$ is commonly used to denote the error of the output layer, so $\delta_{o1}$ can be written as:

$$\delta_{o1} = \frac{\partial E_{\text{total}}}{\partial \text{net}_{o1}} = \frac{\partial E_{\text{total}}}{\partial \text{out}_{o1}} \cdot \frac{\partial \text{out}_{o1}}{\partial \text{net}_{o1}} = (\text{out}_{o1} - \text{target}_{o1}) \cdot \text{out}_{o1}(1 - \text{out}_{o1})$$

The partial derivative of the total error with respect to w5 can therefore be expressed as:

$$\frac{\partial E_{\text{total}}}{\partial w_5} = \delta_{o1} \cdot \text{out}_{h1}$$

With this derivation in hand, we can update the weight w5:

$$w_{5\,\text{new}} = w_5 - \eta \frac{\partial E_{\text{total}}}{\partial w_5} = 0.4 - 0.5 \cdot 0.082167 = 0.358916$$

Here $\eta$ is the learning rate, which is set to 0.5 in this example. In the same way, we can compute the updates for w6, w7 and w8.
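The $\delta_{o1}$ form makes it easy to update both weights that feed o1 at once. The sketch below does this for w5 and w6, whose values appear in the text; w7 and w8 would follow the same pattern with $\delta_{o2}$, but their initial values are not given in the text, so they are left out here.

```python
eta = 0.5                             # learning rate used in the text
out_h1, out_h2 = 0.59327, 0.59688     # hidden outputs
out_o1, target_o1 = 0.751365, 0.01

# delta_o1 = dE_total / d net_o1
delta_o1 = (out_o1 - target_o1) * out_o1 * (1.0 - out_o1)

w5, w6 = 0.40, 0.45
w5_new = w5 - eta * delta_o1 * out_h1   # gradient is delta_o1 * out_h1
w6_new = w6 - eta * delta_o1 * out_h2   # gradient is delta_o1 * out_h2

print(round(w5_new, 6), round(w6_new, 6))
```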
At this point a new question arises: how are the weights w1 to w4 updated? The method is essentially the same, except that o1 is replaced by h1, so the partial derivatives change somewhat, as shown in the figure below:

In the network in the figure above, updating w1 differs from updating w5 in that we no longer differentiate through the output node o1 alone, but through the hidden node h1. During backpropagation, h1 receives error from both o1 and o2, so the derivative of the total error with respect to the output of h1 must account for both sources. This gives the following equation:

$$\frac{\partial E_{\text{total}}}{\partial w_1} = \frac{\partial E_{\text{total}}}{\partial \text{out}_{h1}} \cdot \frac{\partial \text{out}_{h1}}{\partial \text{net}_{h1}} \cdot \frac{\partial \text{net}_{h1}}{\partial w_1} = \left(\frac{\partial E_{o1}}{\partial \text{out}_{h1}} + \frac{\partial E_{o2}}{\partial \text{out}_{h1}}\right) \cdot \frac{\partial \text{out}_{h1}}{\partial \text{net}_{h1}} \cdot \frac{\partial \text{net}_{h1}}{\partial w_1}$$
With this formula, we can evaluate the pieces separately:

$$\frac{\partial E_{o1}}{\partial \text{out}_{h1}} = \frac{\partial E_{o1}}{\partial \text{net}_{o1}} \cdot \frac{\partial \text{net}_{o1}}{\partial \text{out}_{h1}} = \frac{\partial E_{o1}}{\partial \text{out}_{o1}} \cdot \frac{\partial \text{out}_{o1}}{\partial \text{net}_{o1}} \cdot \frac{\partial \text{net}_{o1}}{\partial \text{out}_{h1}}$$

We have split $\frac{\partial E_{o1}}{\partial \text{out}_{h1}}$ into a product of three partial derivatives, which we now compute in turn:

$$\frac{\partial E_{o1}}{\partial \text{net}_{o1}} = \frac{\partial E_{o1}}{\partial \text{out}_{o1}} \cdot \frac{\partial \text{out}_{o1}}{\partial \text{net}_{o1}} = 0.741365 \cdot 0.186816 \approx 0.138499$$

Since $\text{net}_{o1} = w_5 \cdot \text{out}_{h1} + w_6 \cdot \text{out}_{h2} + b_2$, we have $\frac{\partial \text{net}_{o1}}{\partial \text{out}_{h1}} = w_5 = 0.4$, and therefore:

$$\frac{\partial E_{o1}}{\partial \text{out}_{h1}} = \frac{\partial E_{o1}}{\partial \text{out}_{o1}} \cdot \frac{\partial \text{out}_{o1}}{\partial \text{net}_{o1}} \cdot \frac{\partial \text{net}_{o1}}{\partial \text{out}_{h1}} = 0.1384986 \cdot 0.4 \approx 0.055399$$
In the same way we can compute $\frac{\partial E_{o2}}{\partial \text{out}_{h1}} = -0.019049$, which gives:

$$\frac{\partial E_{\text{total}}}{\partial \text{out}_{h1}} = \frac{\partial E_{o1}}{\partial \text{out}_{h1}} + \frac{\partial E_{o2}}{\partial \text{out}_{h1}} = 0.055399 - 0.019049 = 0.03635$$

Since $\text{out}_{h1} = \frac{1}{1 + e^{-\text{net}_{h1}}}$, we obtain:

$$\frac{\partial \text{out}_{h1}}{\partial \text{net}_{h1}} = \text{out}_{h1}(1 - \text{out}_{h1}) = 0.593269\,(1 - 0.593269) = 0.241301$$

Since $\text{net}_{h1} = w_1 \cdot i_1 + w_2 \cdot i_2 + b_1$, we have $\frac{\partial \text{net}_{h1}}{\partial w_1} = i_1 = 0.05$. Multiplying these three partial derivatives together gives the final result:

$$\frac{\partial E_{\text{total}}}{\partial w_1} = \frac{\partial E_{\text{total}}}{\partial \text{out}_{h1}} \cdot \frac{\partial \text{out}_{h1}}{\partial \text{net}_{h1}} \cdot \frac{\partial \text{net}_{h1}}{\partial w_1} = 0.03635 \cdot 0.241301 \cdot 0.05 = 0.000438$$
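The hidden-layer gradient can be sketched the same way. The key difference from w5 is the sum over both output nodes. One assumption to flag: the weight from h1 to o2 is taken to be $w_7 = 0.50$, which is not stated in the text but reproduces the quoted value $-0.019049$.

```python
i1 = 0.05
out_h1 = 0.593269
out_o = [0.751365, 0.772928]      # outputs at o1, o2
targets = [0.01, 0.99]
w_from_h1 = [0.40, 0.50]          # w5 (from the text), w7 (assumed value)

# Output deltas: dE/dnet_o for each output node.
deltas = [(o - t) * o * (1.0 - o) for o, t in zip(out_o, targets)]

# h1 receives error from both outputs, weighted by its outgoing weights.
contributions = [d * w for d, w in zip(deltas, w_from_h1)]
dE_dout_h1 = sum(contributions)

dout_dnet_h1 = out_h1 * (1.0 - out_h1)   # sigmoid derivative at h1
grad_w1 = dE_dout_h1 * dout_dnet_h1 * i1

print([round(c, 6) for c in contributions], round(grad_w1, 6))
```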
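Finally, we can update the weight w1:

$$w_{1\,\text{new}} = w_1 - \eta \frac{\partial E_{\text{total}}}{\partial w_1} = 0.15 - 0.5 \cdot 0.000438 = 0.14978$$

In the same way, w2, w3 and w4 can be updated. Repeating this procedure, we iteratively update every weight in the network until it converges, at which point the network's outputs come very close to the target values. That is the whole BP neural network algorithm, and I hope this helps with understanding it. My ability is limited, so if there are mistakes in the article, corrections are welcome. If you repost this article, please credit the source. Thank you.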
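As a closing illustration of the iteration described above, here is a minimal sketch of the full update loop for the 2-2-2 network. Several things are assumptions rather than part of the original walkthrough: the values of w3, w4, w7 and w8 (0.25, 0.30, 0.50, 0.55), the choice to keep the biases fixed (the text only updates weights), the use of the old output weights when computing the hidden-layer deltas, the iteration count of 10000, and all function and variable names.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, t, wh, wo, b1, b2, eta):
    """One forward and backward pass; returns new weights and the pre-update error."""
    out_h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b1) for row in wh]
    out_o = [sigmoid(sum(w * h for w, h in zip(row, out_h)) + b2) for row in wo]
    error = sum(0.5 * (tk - ok) ** 2 for tk, ok in zip(t, out_o))

    # Output deltas: (out - target) * out * (1 - out)
    delta_o = [(o - tk) * o * (1.0 - o) for o, tk in zip(out_o, t)]
    # Hidden deltas: sum the error arriving from every output node,
    # using the output weights from before this step's update.
    delta_h = [sum(delta_o[k] * wo[k][j] for k in range(len(wo)))
               * out_h[j] * (1.0 - out_h[j]) for j in range(len(wh))]

    new_wo = [[wo[k][j] - eta * delta_o[k] * out_h[j] for j in range(len(out_h))]
              for k in range(len(wo))]
    new_wh = [[wh[j][i] - eta * delta_h[j] * x[i] for i in range(len(x))]
              for j in range(len(wh))]
    return new_wh, new_wo, error

x, t = [0.05, 0.10], [0.01, 0.99]
wh = [[0.15, 0.20], [0.25, 0.30]]   # w1..w4 (w3, w4 assumed)
wo = [[0.40, 0.45], [0.50, 0.55]]   # w5..w8 (w7, w8 assumed)
b1, b2, eta = 0.35, 0.60, 0.5

# First step, kept separately so it can be compared with the hand calculation.
wh_first, wo_first, e0 = train_step(x, t, wh, wo, b1, b2, eta)

for _ in range(10000):
    wh, wo, err = train_step(x, t, wh, wo, b1, b2, eta)

print(round(wo_first[0][0], 6), round(wh_first[0][0], 6))
print(e0, err)
```

The first step should reproduce the hand-computed values of about 0.358916 for w5 and 0.14978 for w1, and the error after many iterations should be far below the initial 0.298371.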