Neural Network Initialization

References:
The assignment is Assignment 1 of Week 1 of the Coursera course Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization.
Since I don't have the money to enroll, I could only study the completed notebooks that other people have posted on GitHub.

  • Zero initialization
    During backpropagation, each layer's gradient is obtained by multiplying the downstream derivatives by the weight matrices. If every weight matrix is all zeros, the gradients propagated backward are all zeros as well, so the parameters never update.

In general, initializing all the weights to zero results in the network failing to break symmetry. This means that every neuron in each layer will learn the same thing, and you might as well be training a neural network with n^[l] = 1 for every layer, and the network is no more powerful than a linear classifier such as logistic regression.

The resulting classifier performs poorly. A minimal numpy sketch of this initialization follows below.
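This is a minimal sketch of zero initialization, assuming the layer_dims convention used in the assignment; the function name initialize_parameters_zeros follows the notebook's style but is written from memory:

```python
import numpy as np

def initialize_parameters_zeros(layer_dims):
    """Initialize all weights and biases to zero.

    layer_dims -- list of layer sizes, e.g. [3, 2, 1]
    """
    parameters = {}
    L = len(layer_dims)
    for l in range(1, L):
        # Every neuron in a layer starts with identical (zero) weights and
        # therefore receives identical gradients, so gradient descent keeps
        # them identical forever: symmetry is never broken.
        parameters["W" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1]))
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters
```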
  • Random initialization
    The next experiment initializes the weights randomly with very large values.
    Combined with the sigmoid nonlinearity, this initialization makes the loss on misclassified examples very large: because the weights are large, the pre-activations are large and the sigmoid saturates, so a confidently wrong prediction incurs a huge loss and training takes a long time to recover.
    Poor initialization can also cause vanishing or exploding gradients. When computing gradients, we multiply by a weight matrix at every layer going backward; if the weights are all very small the gradients shrink toward zero (vanishing), and if they are all very large the gradients blow up (exploding).


    Random initialization, but with a very large magnitude; a sketch follows below.
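This is a sketch of the large-scale random initialization, again assuming numpy and the assignment's conventions; the assignment multiplies standard-normal weights by 10, and the function name initialize_parameters_random is written from memory:

```python
import numpy as np

def initialize_parameters_random(layer_dims, scale=10.0):
    """Random initialization with a deliberately large scale.

    A scale of 10 reproduces the assignment's failure mode: large weights
    produce large pre-activations, the sigmoid saturates near 0 or 1, and
    confidently wrong predictions incur a huge loss.
    """
    np.random.seed(3)  # fixed seed for reproducibility
    parameters = {}
    L = len(layer_dims)
    for l in range(1, L):
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * scale
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters
```

Replacing the factor of 10 with a variance-preserving factor, such as the He scaling in the next section, removes both the saturation problem and the gradient-scale problem.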
  • He initialization
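    He initialization (He et al., 2015) scales each standard-normal weight matrix by sqrt(2 / n^[l-1]), where n^[l-1] is the size of the previous layer; it is designed for ReLU activations. A minimal numpy sketch, with the function name assumed from the notebook's style:

```python
import numpy as np

def initialize_parameters_he(layer_dims):
    """He initialization: scale weights by sqrt(2 / fan_in), suited to ReLU."""
    parameters = {}
    L = len(layer_dims)
    for l in range(1, L):
        fan_in = layer_dims[l - 1]
        # sqrt(2 / fan_in) keeps the variance of ReLU activations roughly
        # constant from layer to layer, avoiding vanishing/exploding signals.
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], fan_in) * np.sqrt(2.0 / fan_in)
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters
```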


  • Conclusion


Different initializations lead to different results.
Random initialization is used to break symmetry and make sure different hidden units can learn different things.
Don't initialize to values that are too large.
He initialization works well for networks with ReLU activations.
