Backpropagation (BP) Algorithm Explained (Notes)

 

Contents

1. Forward Propagation

2. Backpropagation

3. Derivation of the Weight-Matrix Update

4. Algorithm Flow


1. Forward Propagation

[Figure 1]

Z^{(l+1)} = X^{(l)}\Theta^{(l)}  
a^{(l)}_{i}=g(z^{(l)}_{i})=\frac{1}{1+e^{-z^{(l)}_{i}}}

The +1 in the top row is the bias unit; the activations a of layer l serve as the input X to layer l+1.

Example: take the two middle hidden layers, i.e., layer 2 and layer 3.

[Figure 2]

\theta_{ji}^{(l)}: the weight from the i-th neuron in layer l to the j-th neuron in layer l+1 (when the subscript i is 0, it is the weight from the bias unit to that neuron).

z^{(3)}_{1}=1\cdot \theta_{10}^{(2)}+a_{1}^{(2)}\cdot \theta_{11}^{(2)}+a_{2}^{(2)}\cdot \theta_{12}^{(2)}

z^{(3)}_{2}=1\cdot \theta_{20}^{(2)}+a_{1}^{(2)}\cdot \theta_{21}^{(2)}+a_{2}^{(2)}\cdot \theta_{22}^{(2)}
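
To make this forward step concrete, here is a minimal NumPy sketch of the two equations above. The weight values and the names `a2`, `theta2` are invented for illustration:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})"""
    return 1.0 / (1.0 + np.exp(-z))

# Layer-2 activations (without the bias unit), picked arbitrarily.
a2 = np.array([0.5, 0.8])

# theta2[j, i]: weight from neuron i in layer 2 to neuron j in layer 3;
# column 0 holds the bias weights theta_{j0}.
theta2 = np.array([[0.1, 0.4, -0.2],   # theta_10, theta_11, theta_12
                   [0.3, -0.5, 0.7]])  # theta_20, theta_21, theta_22

x2 = np.concatenate(([1.0], a2))       # prepend the +1 bias unit
z3 = theta2 @ x2                       # z^{(3)}_j = sum_i theta_{ji} * x_i
a3 = sigmoid(z3)                       # a^{(3)} = g(z^{(3)})
print(z3, a3)
```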

2. Backpropagation
 

[Figure 3]

\delta ^{(l)}_{j}: the error of the activation of the j-th neuron in layer l, i.e., the discrepancy between the predicted and the actual value attributable to this neuron.
z^{(l)}_{j}: the weighted input sum of the j-th neuron in layer l.

The backpropagation algorithm is essentially the computation of \delta ^{(l)}_{j}, which is in fact the partial derivative of the cost function with respect to z^{(l)}_{j}: it measures how z^{(l)}_{j}, acting through the weights of the subsequent layers, influences the final output.

Example:

[Figure 4]

\delta ^{(4)}_{1}=y_{1}-a^{(4)}_{1}

\delta ^{(3)}_{2}=\theta _{12}^{(3)}\cdot \delta ^{(4)}_{1}

\delta ^{(2)}_{2}=\theta ^{(2)}_{12}\cdot \delta ^{(3)}_{1}+\theta ^{(2)}_{22}\cdot \delta ^{(3)}_{2}
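
As a quick check of these propagation equations, here is a minimal NumPy sketch. The weight values and the output error `delta4` are made up; like the equations above, it follows the simplified convention that omits the g'(z) factor derived in section 3:

```python
import numpy as np

# Hypothetical values; shapes follow the figure above
# (layers 2 and 3 each have 2 neurons plus a bias unit, layer 4 has 1 neuron).
delta4 = np.array([0.2])               # delta^{(4)}
theta3 = np.array([[0.1, -0.3, 0.6]])  # theta^{(3)}: 1 x 3, column 0 = bias weight
theta2 = np.array([[0.1, 0.4, -0.2],
                   [0.3, -0.5, 0.7]])  # theta^{(2)}: 2 x 3, column 0 = bias weights

# The bias column is dropped: the bias unit is a constant +1,
# so no error is propagated back to it.
delta3 = theta3[:, 1:].T @ delta4      # delta^{(3)}_i = sum_j theta_{ji} * delta^{(4)}_j
delta2 = theta2[:, 1:].T @ delta3      # delta^{(2)}_i = sum_j theta_{ji} * delta^{(3)}_j
print(delta3, delta2)
```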

3. Derivation of the Weight-Matrix Update

Basic formulas:

z_{i}^{(l)}= \sum _{k}w_{ik}^{(l-1)}\cdot a^{(l-1)}_{k} +b

g(z)=\frac{1}{1+e^{-z}}

a_{i}^{(l)} = g(z^{(l)}_{i})

J(\theta )=\frac{1}{2}\sum ^{S_{nl}}_{j=1}(y_{j} - a^{(nl)}_{j})^{2}

\delta ^{(l)}_{i} = \frac{\partial J(\theta )}{\partial z^{(l)}_{i}}

Derivation:

\delta _{i}^{(nl)} = \frac{\partial J(\theta )}{\partial z^{(nl)}_{i}} =\frac{1}{2}\cdot \frac{\partial \sum ^{S_{nl}}_{j=1}(y_{j} -a_{j}^{(nl)})^{2}}{\partial z^{(nl)}_{i}} = \frac{1}{2}\cdot \frac{\partial (y_{i}-g(z_{i}^{(nl)}))^{2}}{\partial z_{i}^{(nl)}}

(only the j = i term of the sum depends on z^{(nl)}_{i}, so all other terms drop out)

=-(y_{i}-g(z_{i}^{(nl)}))\cdot g'(z_{i}^{(nl)})

=-(y_{i}-a_{i}^{(nl)})\cdot g'(z_{i}^{(nl)})

Backpropagation recurrence formulas:

1. Gradient for the weight update:

\frac{\partial J(\theta)}{\partial w_{ji}^{(l-1)}} = \frac{\partial J(\theta)}{\partial z_{j}^{(l)}}\cdot \frac{\partial z^{(l)}_{j}}{\partial w^{(l-1)}_{ji}} =\delta ^{(l)}_{j}\cdot a^{(l-1)}_{i}

2. Recurrence for the error term:

\delta ^{(l)}_{i}=\frac{\partial J(\theta )}{\partial z^{(l)}_{i}} = \sum _{j=1}^{S_{l+1}} \frac{\partial J(\theta)}{\partial z^{(l+1)}_{j}}\cdot \frac{\partial z^{(l+1)}_{j}}{\partial a_{i}^{(l)}}\cdot \frac{\partial a^{(l)}_{i}}{\partial z^{(l)}_{i}}

=\sum _{j=1}^{S_{l+1}}\frac{\partial J(\theta)}{\partial z^{(l+1)}_{j}}\cdot \frac{\partial \left(\sum _{k}w^{(l)}_{jk}\cdot a^{(l)}_{k}+b\right)}{\partial a^{(l)}_{i}}\cdot \frac{\partial a_{i}^{(l)}}{\partial z^{(l)}_{i}}

=\sum _{j=1}^{S_{l+1}}\delta _{j}^{(l+1)}\cdot w_{ji}^{(l)}\cdot g'(z_{i}^{(l)})

=g'(z_{i}^{(l)})\cdot \sum _{j=1}^{S_{l+1}}\delta _{j}^{(l+1)}\cdot w_{ji}^{(l)}

Note that the derivation above did not treat the bias weights separately: since the bias unit is always +1, it does not change the form of the result. In fact: \frac{\partial J(\theta)}{\partial b^{(l-1)}_{j}} = \frac{\partial J(\theta)}{\partial z^{(l)}_{j}}\cdot \frac{\partial z^{(l)}_{j}}{\partial b^{(l-1)}_{j}} = \delta ^{(l)}_{j}\cdot \frac{\partial \left(\sum_{k} w^{(l-1)}_{jk}a^{(l-1)}_{k}+b^{(l-1)}_{j}\right)}{\partial b^{(l-1)}_{j}} = \delta ^{(l)}_{j}
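
The two results above, \partial J/\partial w^{(l-1)}_{ji} = \delta^{(l)}_{j}\cdot a^{(l-1)}_{i} and \partial J/\partial b^{(l-1)}_{j} = \delta^{(l)}_{j}, can be verified with a numerical finite-difference check. Below is a minimal sketch for a single sigmoid layer under the squared-error cost defined above; all names and values (`cost`, `a_prev`, the random weights) are invented for illustration, and it uses the standard identity g'(z) = g(z)(1 - g(z)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, a_prev, b, y):
    """J = 1/2 * sum_j (y_j - a_j)^2 for one sigmoid layer."""
    a = sigmoid(w @ a_prev + b)
    return 0.5 * np.sum((y - a) ** 2)

rng = np.random.default_rng(0)
w = rng.normal(size=(2, 3))   # weights into a 2-neuron output layer
b = rng.normal(size=2)
a_prev = rng.normal(size=3)   # activations of the previous layer
y = np.array([0.0, 1.0])

# Analytic gradient from the derivation:
# delta_j = -(y_j - a_j) * g'(z_j),  dJ/dw_ji = delta_j * a_prev_i
z = w @ a_prev + b
a = sigmoid(z)
delta = -(y - a) * a * (1 - a)        # g'(z) = g(z) * (1 - g(z))
grad_w = np.outer(delta, a_prev)
grad_b = delta

# Numerical gradient by central differences.
eps = 1e-6
num = np.zeros_like(w)
for j in range(w.shape[0]):
    for i in range(w.shape[1]):
        wp, wm = w.copy(), w.copy()
        wp[j, i] += eps
        wm[j, i] -= eps
        num[j, i] = (cost(wp, a_prev, b, y) - cost(wm, a_prev, b, y)) / (2 * eps)

print(np.max(np.abs(grad_w - num)))   # ~1e-10: the analytic formula agrees
```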

4. Algorithm Flow

1. Input the training set.

2. Training:

  • Forward propagation: z^{l} = w^{l-1}\cdot a^{l-1} + b^{l-1}; \; a^{l} = g(z^{l})
  • Error at the output layer: \delta^{L}_{i} = -(y_{i} - a^{L}_{i})\cdot g'(z^{L}_{i})      (L: the index of the last layer, i.e., the output layer)
  • Backpropagate the error (the neuron errors in each hidden layer): \delta ^{l}_{i} = g'(z^{l}_{i})\cdot \sum_{j} \delta ^{l+1}_{j}\cdot w^{l}_{ji}

3. Update the weights:

  • w^{l-1}_{ji} := w^{l-1}_{ji} - \alpha \sum \delta ^{l}_{j}\cdot a^{l-1}_{i}      (\alpha is the update step size; the sum runs over the training examples)
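
Putting the four steps together, the following is a complete, runnable NumPy sketch of the procedure. The network shape, hyperparameters, and the XOR sanity-check task are arbitrary choices for illustration, not part of the original notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, sizes=(2, 3, 1), alpha=1.0, epochs=10000, seed=0):
    """Minimal batch-gradient-descent BP for a fully connected sigmoid net.

    X: (n_samples, n_inputs), Y: (n_samples, n_outputs).
    """
    rng = np.random.default_rng(seed)
    W = [rng.normal(scale=0.5, size=(sizes[l + 1], sizes[l]))
         for l in range(len(sizes) - 1)]
    b = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]

    for _ in range(epochs):
        gW = [np.zeros_like(w) for w in W]
        gb = [np.zeros_like(v) for v in b]
        for x, y in zip(X, Y):
            # 1) forward pass: z^l = w^{l-1} a^{l-1} + b^{l-1}, a^l = g(z^l)
            a = [x]
            for w, v in zip(W, b):
                a.append(sigmoid(w @ a[-1] + v))
            # 2) output-layer error: delta^L = -(y - a^L) * g'(z^L)
            delta = -(y - a[-1]) * a[-1] * (1 - a[-1])
            # 3) backpropagate and accumulate gradients over the batch
            for l in range(len(W) - 1, -1, -1):
                gW[l] += np.outer(delta, a[l])   # dJ/dW[l] = delta * a^{l}
                gb[l] += delta
                if l > 0:
                    delta = (W[l].T @ delta) * a[l] * (1 - a[l])
        # 4) gradient-descent update: w := w - alpha * sum(delta * a)
        for l in range(len(W)):
            W[l] -= alpha * gW[l]
            b[l] -= alpha * gb[l]
    return W, b

# Usage: learn XOR, a classic sanity check. With these untuned settings
# the outputs are typically driven close to the 0/1 targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
W, b = train(X, Y)
for x in X:
    a = x
    for w, v in zip(W, b):
        a = sigmoid(w @ a + v)
    print(x, a)
```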
