The Most Successful Neural Network Learning Algorithm: Back-propagation

Table of Contents

  • Back-propagation Neural Network (BP-NN)
    • Structure of BP-NN
    • Learning Algorithm of BP-NN
    • The Back-propagation Algorithm
      • A Stochastic Gradient Descent Algorithm
      • Back-propagation Algorithm (Error Back-propagation)
      • Defects of the Back-propagation Algorithm
        • (1) Overfitting Problem
        • (2) Local Minimum Problem
        • (3) Gradient Vanishing Problem
        • Problem of Overfitting in Classification
        • Problem of Local Minimum in the BP Algorithm
        • Problems of Full Connection
      • Disadvantages of BP-NN

Back-propagation Neural Network (BP-NN)

  • Feedforward neural networks are usually trained with the back-propagation learning algorithm, which is why they are called BP neural networks.
  • The BP algorithm is a common method for training artificial neural networks, used in conjunction with an optimization method such as gradient descent.
  • It should be pointed out that the BP algorithm can be used not only in multilayer feedforward neural networks, but also in other types of neural networks, such as recurrent neural networks.
  • In general, however, "back-propagation NN" refers to a multilayer feedforward NN trained with the BP algorithm.

Structure of BP-NN

[Figure 1: structure of a BP neural network]

Learning Algorithm of BP-NN

  • The learning of an NN is the effective adjustment of its connection weights or structure, so that the network's input-output mapping has the required characteristics.

  • The learning process of the BP algorithm consists of two phases: forward propagation and backward propagation.

    (1) In the forward propagation phase, the input signal (i.e., a training sample) is processed from the input layer through the hidden layers and transmitted to the output layer. The state of the neurons in each layer affects only the state of the neurons in the next layer.

    (2) If the actual output of the output layer differs from the expected output, the algorithm enters the backward propagation phase. The error signal is propagated back along the original connection paths, and gradient descent is used to modify the connection weights of the neurons in each layer, so that the error decreases toward a minimum.

  • The weights are updated "backward": starting from the output layer, passing through each hidden layer in turn, and ending at the input layer.

  • Repeat phases (1) and (2) until the performance of the network is satisfactory.

  • Suppose that the given training set contains N samples, as follows.

    • Input data: $x_i = [x_{i1}, x_{i2}, \dots, x_{ip_1}]^T$
    • Expected output: $d_i = [d_{i1}, d_{i2}, \dots, d_{ip_m}]^T$ $(i = 1, 2, \dots, N)$
  • The BP learning algorithm minimizes the error through the backward learning process, so the objective function is defined as:

    $\min J = \frac{1}{2}\sum_{j=1}^{p_m}\left(y_j^m - d_j\right)^2$

  • That is, the weights of the neural network are chosen so as to minimize the sum of squared differences (the mean squared error) between the expected outputs of the training samples and the actual outputs of the network. A minimal sketch of the forward pass and this objective appears below.
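To make the forward pass and this objective concrete, here is a minimal NumPy sketch for a single sample. The layer sizes, the sigmoid activation, and all variable names are illustrative assumptions, not taken from the text above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed sizes: p1 inputs, one hidden layer, pm outputs.
p1, hidden, pm = 3, 4, 2
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (hidden, p1))  # input -> hidden weights
W2 = rng.normal(0.0, 0.1, (pm, hidden))  # hidden -> output weights

x = rng.normal(size=p1)       # one input sample x_i
d = np.array([0.0, 1.0])      # its expected output d_i

# Forward propagation: each layer's state affects only the next layer.
h = sigmoid(W1 @ x)           # hidden-layer activations
y = sigmoid(W2 @ h)           # output-layer activations y^m

# Objective: J = (1/2) * sum_j (y_j^m - d_j)^2
J = 0.5 * np.sum((y - d) ** 2)
print(J)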

The Back-propagation Algorithm

  • Phase 1: Forward propagation
    The training input is passed through the neural network (from the input layer through the hidden layers to the output layer) to generate the output activations.

  • Phase 2: Back-propagation
    Using the target values of the training data, the output activations are propagated backward through the neural network (from the output layer toward the input layer) to generate the deltas of all output and hidden neurons:
    Δ = expected output (labeled ground truth) − actual output

  • During back-propagation, each weight is updated as follows.

  • Multiply the weight's output delta by its input activation to obtain the gradient of the weight.

  • Subtract a fraction of the gradient from the weight. This fraction is called the learning rate.

  • The larger the learning rate, the faster the neuron trains, but the less accurate the training is;

  • the smaller the learning rate, the more accurate the training is, but the slower the neuron trains.
    Note: this is like approximating a circle with a polygonal line: smaller segments follow the curve more closely, but more of them are needed. A sketch of this per-weight update follows.
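As a minimal sketch of this per-weight update rule (the function and variable names are illustrative, not from the text above):

# gradient = (output delta) * (input activation); step by the learning rate.
def update_weight(w, delta, activation, lr=0.1):
    grad = delta * activation   # gradient of the error w.r.t. this weight
    return w - lr * grad        # larger lr: faster but coarser steps

w = 0.5
w = update_weight(w, delta=0.2, activation=0.8)
print(w)  # 0.484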

A Stochastic Gradient Descent Algorithm

For training a three-layer network (with only one hidden layer):

function STOCHASTIC-GRADIENT-DESCENT() returns the network
	initialize network weights (often small random values)
	do
		for each training example ex                     // take one sample
			prediction = neural-net-output(network, ex)  // forward pass
			actual = teacher-output(ex)                  // ground-truth output for ex
			compute error (prediction - actual) at the output units
			compute Δw_h for all weights from hidden layer to output layer  // backward pass
			compute Δw_i for all weights from input layer to hidden layer
			update network weights                       // input layer not modified by error estimate
	until all examples classified correctly or another stopping criterion is satisfied
	return the network
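A runnable NumPy version of this loop for a tiny three-layer network is sketched below. The XOR training set, sigmoid activation, hidden-layer size, learning rate, and epoch count are all illustrative assumptions; the stopping criterion here is simply a fixed number of epochs:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy training set (assumed): XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(42)
W1 = rng.normal(0.0, 0.5, (2, 4))   # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(0.0, 0.5, (4, 1))   # hidden -> output weights
b2 = np.zeros(1)
lr = 0.5

for epoch in range(10000):
    for x, d in zip(X, D):                 # take one sample (stochastic updates)
        # Forward pass.
        h = sigmoid(x @ W1 + b1)
        y = sigmoid(h @ W2 + b2)
        # Backward pass: deltas use the sigmoid derivative s * (1 - s).
        delta_out = (y - d) * y * (1 - y)             # output-layer delta
        delta_hid = (delta_out @ W2.T) * h * (1 - h)  # hidden-layer delta
        # Gradient = output delta * input activation; step by the learning rate.
        W2 -= lr * np.outer(h, delta_out)
        b2 -= lr * delta_out
        W1 -= lr * np.outer(x, delta_hid)
        b1 -= lr * delta_hid

print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))  # should approach [0, 1, 1, 0]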

Back-propagation Algorithm (Error Back-propagation)

[Figures 2–4: the error back-propagation algorithm]

Defects of the Back-propagation Algorithm

(1) Overfitting Problem

  • In general, as the training ability improves, the prediction ability improves as well.

  • Beyond a certain point, however, further improvement of the training ability causes the prediction ability to decrease instead. This phenomenon is called "overfitting": the training error remains small, while the test error, after decreasing to a certain value, starts to increase again.

  • When the training data are insufficient or the model is too complex, the model often overfits the training set.

  • Prediction ability is also called generalization (or extension) capability, and training ability is also called approximation (or learning) capability.

(2) Local Minimum Problem

  • The BP algorithm is a local-search optimization method. When training a multilayer neural network, it may fall into a local minimum instead of the global minimum.

(3) Gradient Vanishing Problem

  • Because the BP algorithm uses gradient descent to update the weights, and the objective function being optimized is very complex, the error surface contains flat regions: when the output of a neuron is close to 0 or 1, the gradient vanishes.

  • As a result, the weights of the layers reached later in the backward pass are barely updated and the training process almost stalls, which makes the BP-NN algorithm converge slowly. A small numerical illustration follows.
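To illustrate why saturated sigmoid units make the gradient vanish (the probe values of z below are arbitrary):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The sigmoid derivative is s * (1 - s): at most 0.25 (at z = 0), and it
# collapses toward 0 as the neuron's output saturates near 0 or 1.
for z in [0.0, 2.0, 5.0, 10.0]:
    s = sigmoid(z)
    print(f"z = {z:5.1f}   output = {s:.5f}   gradient factor = {s * (1 - s):.6f}")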

Problem of Overfitting in Classification

[Figure 5: overfitting in a classification task]

  • Solution strategies (an early-stopping sketch follows this list):
    Early stopping
    Dropout
    Data augmentation
    Weight regularization
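As one example, a minimal early-stopping sketch; train_epoch and validation_loss are hypothetical placeholders for one epoch of BP updates and for the loss on held-out data, and the patience value is an assumption:

# Early stopping: halt training once the validation loss stops improving,
# i.e., before the test error starts to rise again.
def train_with_early_stopping(model, patience=10, max_epochs=1000):
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_epoch(model)             # hypothetical: one epoch of BP updates
        loss = validation_loss(model)  # hypothetical: loss on held-out data
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                  # validation error keeps rising: stop
    return model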

Problem of Local Minimum in the BP Algorithm

[Figure 6: local minima on the error surface]

Solution strategy: use a heuristic search algorithm, such as simulated annealing, which accepts a "sub-optimal solution" with a certain probability that decreases over time. A sketch of the acceptance rule follows.
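A minimal sketch of the simulated-annealing acceptance rule (the toy energy function, proposal step, and cooling schedule are illustrative assumptions):

import math
import random

def anneal(energy, propose, x, T=1.0, cooling=0.99, steps=1000):
    # Accept a worse ("sub-optimal") state with probability exp(-dE / T);
    # T decays over time, so such acceptances become increasingly rare.
    for _ in range(steps):
        x_new = propose(x)
        dE = energy(x_new) - energy(x)
        if dE < 0 or random.random() < math.exp(-dE / T):
            x = x_new                 # accept the move, even if sub-optimal
        T *= cooling                  # lower the "temperature"
    return x

# Toy usage: a 1-D function with two minima (assumed for illustration).
f = lambda x: x**4 - 3 * x**2 + x
x_best = anneal(f, lambda x: x + random.uniform(-0.5, 0.5), x=2.0)
print(x_best, f(x_best))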

Problems of Full Connection

(1) A very large number of parameters.
(2) Some properties of the data cannot be fully exploited.
[Figure 7: a fully connected network]

Disadvantages of BP-NN

  • BP-NN is fully connected and therefore has a large number of weights.

  • It requires a large number of training samples and a large amount of computation.

  • Way to deal with this: reduce the number of weights by using local connections and weight sharing, as the parameter-count comparison below illustrates.
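A back-of-the-envelope comparison of parameter counts (the image size, number of hidden units, and filter shapes are illustrative assumptions):

# Fully connected layer on a 32x32 grayscale image with 100 hidden units:
inputs, hidden = 32 * 32, 100
fc_weights = inputs * hidden        # 1024 * 100 = 102,400 weights

# Locally connected layer with weight sharing (convolution-style):
# 100 filters, each sharing one 5x5 kernel, regardless of image size.
filters, kernel = 100, 5 * 5
shared_weights = filters * kernel   # 100 * 25 = 2,500 weights

print(fc_weights, shared_weights)   # 102400 vs 2500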
