

  • Back-propagation Neural Network (BP-NN)
    • Structure of BP-NN
    • Learning Algorithm of BP-NN
    • Algorithm of Back-propagation
      • A Stochastic Gradient Descent Algorithm
      • Back Propagation Algorithm (Error Back-Propagation)
      • Defects of Back Propagation Algorithm
        • (1) Overfitting Problem
        • (2) Local Minimum Problem
        • (3) Gradient Vanishing Problem
        • Problem of Overfitting in Classification
        • Problem of Local Minimum in BP Algorithm
        • Problems of Full Connection
      • Disadvantages of BP-NN

Back-propagation Neural Network (BP-NN)

  • Feedforward neural networks are usually trained with the back-propagation (BP) learning algorithm, so they are often called BP neural networks.
  • The BP algorithm is a common method of training artificial neural networks and is used in conjunction with an optimization method such as gradient descent.
  • The BP algorithm can be used not only in multilayer feedforward neural networks but also in other types of neural networks, such as recurrent neural networks.
  • In general, however, "Back-propagation NN" refers to a multilayer feedforward NN that is trained using the BP algorithm.

Structure of BP-NN


Learning Algorithm of BP-NN

  • Learning in a NN means adjusting its connection weights or structure so that the mapping from input to output has the required characteristics.

  • The learning process of the BP algorithm consists of forward propagation and backward propagation.

    (1) In the forward propagation phase, the input signal (that is, a training sample) is passed from the input layer through the hidden layer(s) to the output layer. The state of the neurons in each layer affects only the neurons of the next layer.

    (2) If the actual output of the output layer differs from the expected output, the backward propagation phase begins. The error signal is returned along the original connection path, and gradient descent is used to modify the connection weights of the neurons in each layer so that the error is reduced toward a minimum.

  • The weights are updated "backward", that is, from the output layer, through each hidden layer in turn, to the input layer.

  • Phases (1) and (2) are repeated until the performance of the network is satisfactory.

  • Suppose that the given training set contains N samples, as follows.

    • Input data: $x_i = [x_{i1}, x_{i2}, \ldots, x_{ip_1}]^T$
    • Expected output: $d_i = [d_{i1}, d_{i2}, \ldots, d_{ip_m}]^T$ ($i = 1, 2, \ldots, N$)
  • The BP learning algorithm minimizes the error through the backward learning process, so the objective function is defined as

    $\min J = \frac{1}{2} \sum_{j=1}^{p_m} (y^m_j - d_j)^2$

    Here $y^m_j$ is the actual output of the j-th neuron in the output layer (layer m), which has $p_m$ neurons.

  • That is, the weights of the neural network are chosen to minimize the sum of squared differences (the mean squared error) between the expected outputs of all training samples and the actual outputs of the network.
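
As a small illustration of the objective J, the sketch below evaluates it for a single sample. The function name `squared_error` and the numeric values are made up for this example; they do not come from the original notes.

```python
def squared_error(y, d):
    """J = 1/2 * sum_j (y_j - d_j)^2 for one sample.

    y: actual outputs of the output layer, d: expected outputs.
    """
    if len(y) != len(d):
        raise ValueError("output and target must have the same length")
    return 0.5 * sum((yj - dj) ** 2 for yj, dj in zip(y, d))


# Hypothetical network output and expected output for one sample
y = [0.8, 0.1]
d = [1.0, 0.0]
J = squared_error(y, d)  # 0.5 * (0.2**2 + 0.1**2)
```

Summing this quantity over the N training samples gives the total error that training tries to reduce.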

Algorithm of Back-propagation

  • Phase 1: Forward Propagation
    Propagate the training input forward through the neural network in order to generate the output.

  • Phase 2: Back-propagation

  • Propagate the output activations backward through the neural network, using the training target, in order to generate the deltas of all output and hidden neurons.
    Δ = actual output − expected output (labeled ground truth). This is the sign of $(y^m_j - d_j)$ in the objective J, so that subtracting the gradient from a weight reduces the error.

  • During back-propagation, update each weight as follows.

  • Multiply the weight's output delta by its input activation to get the gradient of the weight.

  • Subtract a ratio (percentage) of the gradient from the weight. This ratio is called the learning rate.

  • The greater the ratio, the faster the neuron trains, but the less accurate the training is;

  • the lower the ratio, the more accurate the training is, but the slower the neuron trains.
    Note: this is like approximating a circle with a polygonal line.
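
The update rule and the effect of the learning rate can be sketched on a single weight. This is a minimal illustration, not the notes' own code: it assumes a hypothetical linear neuron y = w·x with one input, takes the delta as (actual output − expected output), which is the sign for which subtracting the gradient lowers the error, and uses made-up values for the input, the target, and the three learning rates.

```python
def update_weight(w, delta, activation, learning_rate):
    """One gradient-descent step for a single weight."""
    gradient = delta * activation        # output delta times input activation
    return w - learning_rate * gradient  # subtract a fraction of the gradient


def train(learning_rate, steps=5, x=2.0, target=1.0):
    """Train the hypothetical neuron y = w * x and return the final |error|."""
    w = 0.0
    for _ in range(steps):
        delta = w * x - target           # actual output minus expected output
        w = update_weight(w, delta, x, learning_rate)
    return abs(w * x - target)


err_slow = train(0.05)     # small steps: the error shrinks slowly
err_fast = train(0.2)      # larger steps: the error shrinks faster
err_diverge = train(0.6)   # too large: every step overshoots and the error grows
```

With these values the initial error is 1. After five steps the small rate has only partly reduced it, the larger rate has almost removed it, and the too-large rate has made it worse, which is the extreme case of "fast but inaccurate".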

A Stochastic Gradient Descent Algorithm

For training a three-layer network (only one hidden layer):

Function STOCHASTIC-GRADIENT-DESCENT() returns the network
	initialize network weights (often small random values)
	do
		for each training example named ex // take one sample
			prediction = neural-net-output(network, ex) // forward pass
			actual = teacher-output(ex) // true output of this sample
			compute error (prediction - actual) at the output units // output error
			compute ∆wh for all weights from hidden layer to output layer // backward pass
			compute ∆wi for all weights from input layer to hidden layer // backward pass continued
			update network weights // input layer not modified by error estimate
	until all examples classified correctly or another stopping criterion satisfied
	return the network
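
The pseudocode above can be turned into a small runnable sketch. Several choices here are assumptions rather than part of the notes: sigmoid activations, a bias stored as the last entry of each weight row, a fixed number of epochs as the stopping criterion, and the logical OR function as toy training data. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(net, x):
    """Forward pass: return hidden activations and output activations."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1])
         for row in net["hidden"]]
    y = [sigmoid(sum(w * hi for w, hi in zip(row[:-1], h)) + row[-1])
         for row in net["output"]]
    return h, y


def total_error(net, data):
    """Sum over all samples of 1/2 * sum_j (y_j - d_j)^2."""
    return sum(0.5 * sum((yj - dj) ** 2 for yj, dj in zip(forward(net, x)[1], d))
               for x, d in data)


def sgd(data, n_in, n_hidden, n_out, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    # Small random initial weights; the last entry of each row is the bias.
    net = {
        "hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)],
        "output": [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)],
    }
    initial = total_error(net, data)
    for _ in range(epochs):          # stopping criterion: fixed epoch count
        for x, d in data:            # one training example at a time
            h, y = forward(net, x)   # forward pass
            # Output deltas: (prediction - actual) times the sigmoid derivative
            d_out = [(yj - dj) * yj * (1 - yj) for yj, dj in zip(y, d)]
            # Hidden deltas: output deltas sent back through the current weights
            d_hid = [hk * (1 - hk)
                     * sum(d_out[j] * net["output"][j][k] for j in range(n_out))
                     for k, hk in enumerate(h)]
            # Update hidden-to-output weights and biases
            for j, row in enumerate(net["output"]):
                for k in range(n_hidden):
                    row[k] -= lr * d_out[j] * h[k]
                row[-1] -= lr * d_out[j]
            # Update input-to-hidden weights and biases
            for k, row in enumerate(net["hidden"]):
                for i in range(n_in):
                    row[i] -= lr * d_hid[k] * x[i]
                row[-1] -= lr * d_hid[k]
    return net, initial, total_error(net, data)


# Toy data set (assumption): the logical OR of two inputs
or_data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]
net, err_before, err_after = sgd(or_data, n_in=2, n_hidden=3, n_out=1)
```

The hidden deltas are computed before any weight is changed, so the backward pass uses the same weights as the forward pass that produced the error.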

Back Propagation Algorithm (Error Back-Propagation)


Defects of Back Propagation Algorithm

(1) Overfitting Problem

  • In general, as the training ability improves, the prediction ability improves as well.

  • Beyond a certain point, however, further improvement of the training ability makes the prediction ability decrease. This is called the "over-fitting" phenomenon: the training error is small, whereas the test error decreases to a certain value and then starts to increase.

  • Over-fitting of the training data set often occurs when there is not enough training data or the model is too complex.

  • Prediction ability is also called generalization capability or extension capability, and training ability is also called approximation capability or learning capability.

(2) Local Minimum Problem

  • The BP algorithm is a local-search optimization method. When training multilayer neural networks, it may get stuck in a local minimum instead of reaching the global minimum.

(3) Gradient Vanishing Problem

  • The BP algorithm uses gradient descent to update the weights, and the objective function to be optimized is very complex. It therefore contains flat regions where the gradient almost vanishes, which happens when the output of a neuron is close to 0 or 1.

  • As a result, the weights of the layers reached later in the backward pass receive almost no update and training nearly stalls, which makes the BP-NN algorithm converge slowly.
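
A short numeric sketch shows why outputs near 0 or 1 flatten the gradient. It assumes sigmoid activations, which the notes imply but do not state. The derivative of the sigmoid is s(1 − s), which is at most 0.25 and shrinks toward 0 as the output s approaches 0 or 1.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid, written in terms of its output s."""
    s = sigmoid(z)
    return s * (1.0 - s)


grad_mid = sigmoid_grad(0.0)    # output 0.5: the derivative is at its maximum
grad_sat = sigmoid_grad(10.0)   # output close to 1: the derivative is close to 0

# The backward pass multiplies one such factor per layer. Even in the best
# case (0.25 each), ten layers shrink the signal by about a factor of a million.
# Weights are ignored here to isolate the effect of the activation function.
chained = 0.25 ** 10
```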

Problem of Overfitting in Classification


  • Solution strategies:
    • Dropout
    • Data augmentation
    • Weight regularization
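
Two of these strategies are easy to sketch in a few lines. The code below is an illustration under stated assumptions, not the notes' own method: it uses the "inverted" variant of dropout (surviving activations are rescaled during training) and an L2 penalty as the form of weight regularization. The function names and numeric values are hypothetical.

```python
import random


def dropout(activations, p_drop, rng):
    """Inverted dropout: zero each activation with probability p_drop
    and divide the survivors by the keep probability."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def l2_update(w, gradient, learning_rate, weight_decay):
    """Gradient step on J + (weight_decay / 2) * w^2.

    The extra term pulls every weight toward zero at each step.
    """
    return w - learning_rate * (gradient + weight_decay * w)


rng = random.Random(0)
hidden = [1.0] * 1000
hidden_train = dropout(hidden, 0.5, rng)   # about half of the entries become 0
w_new = l2_update(1.0, 0.0, 0.1, 0.5)      # shrinks even with a zero gradient
```

Dropout is applied only during training; at prediction time the activations are used unchanged, which is why the survivors are rescaled here.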

Problem of Local Minimum in BP Algorithm


Solution strategy: use a heuristic search algorithm such as simulated annealing, which accepts a "sub-optimal" (worse) solution with a certain probability that decreases over time.
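
The acceptance rule can be sketched as follows. The notes only say that the probability decreases with time; the Metropolis criterion exp(−ΔE / T) and the geometric cooling schedule used here are common choices taken as assumptions, and the names and numbers are hypothetical.

```python
import math
import random


def acceptance_probability(delta_error, temperature):
    """Probability of moving to a candidate whose error differs by delta_error.

    Improvements (delta_error <= 0) are always accepted; a worse candidate
    is accepted with probability exp(-delta_error / temperature).
    """
    if delta_error <= 0:
        return 1.0
    return math.exp(-delta_error / temperature)


def accept(delta_error, temperature, rng):
    return rng.random() < acceptance_probability(delta_error, temperature)


# Geometric cooling: as the temperature falls over time, so does the chance
# of accepting the same worse candidate (error increase of 0.5).
temperature = 1.0
probabilities = []
for step in range(5):
    probabilities.append(acceptance_probability(0.5, temperature))
    temperature *= 0.5
```

Early on, the search can climb out of a local minimum by accepting worse weights; later it behaves almost like plain descent.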

Problems of Full Connection

(1) Large number of parameters.
(2) Some properties of the data cannot be fully exploited.

Disadvantages of BP-NN

  • BP-NN is fully connected and has a large number of weights.

  • It requires a large number of training samples and a large amount of computation.

  • Ways to deal with it: reduce the number of weights by using local connections and weight sharing.
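
A quick count shows how much local connections and weight sharing reduce the number of weights. The layer sizes are made up for illustration (a 100×100 input, a hidden layer of the same size, a 5×5 local window), and biases are ignored.

```python
def fully_connected_weights(n_in, n_out):
    """Every output neuron is connected to every input."""
    return n_in * n_out


def locally_connected_weights(n_out, field_size):
    """Every output neuron sees only a small local window of the input."""
    return n_out * field_size


def shared_weights(field_size):
    """All output neurons reuse the same window of weights."""
    return field_size


n_in = 100 * 100       # hypothetical 100x100 input
n_hidden = 100 * 100   # hypothetical hidden layer of the same size
field = 5 * 5          # hypothetical 5x5 local window

full = fully_connected_weights(n_in, n_hidden)       # 100,000,000 weights
local = locally_connected_weights(n_hidden, field)   # 250,000 weights
shared = shared_weights(field)                       # 25 weights
```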
