This time we investigate how initialization affects a model's convergence speed and accuracy. The assignment experiments with three cases: (1) initialize everything to zero, which we will see gives very poor results; (2) initialize with overly large random values, which converges slowly and fits only moderately well; (3) use the He method to keep the initial weights at a moderate scale, which converges well.
The performance is indeed very poor: the loss never really decreases, and the algorithm does even worse than random guessing. Why? Let's look at the details of the predictions and the decision boundary:
Why does this happen? As discussed in an earlier post on this blog, zero initialization produces perfectly symmetric weight matrices, so the neurons collapse into a single unit. In this example it amounts to adding $x_1$ and $x_2$, multiplying by a factor, activating, multiplying by another factor, activating again... so the final output is just $m \cdot (x_1 + x_2)$, which is useless.
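The symmetry problem is easy to see in isolation. A minimal sketch (the toy 2-4-1 layer sizes are my own example, not from the assignment): with all-zero weights every hidden unit computes the same value, and by symmetry every row of the weight matrix receives the same gradient, so the rows can never diverge.

```python
import numpy as np

# Toy 2-4-1 network with all parameters zero (hypothetical layer sizes).
x = np.array([[0.5], [-1.2]])                # one sample with 2 features
W1, b1 = np.zeros((4, 2)), np.zeros((4, 1))
W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

a1 = np.maximum(0, W1 @ x + b1)              # ReLU hidden layer
a2 = W2 @ a1 + b2                            # linear output layer
print(a1.ravel(), a2.ravel())                # [0. 0. 0. 0.] [0.] -- all units identical

# Since every row of W1 is equal, backprop assigns each row the same gradient,
# and gradient descent keeps them equal: the hidden layer can never behave as
# more than a single neuron.
```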
A textbook case of exploding gradients: the cost shoots through the roof right from the start.
Here I increased the number of iterations to 150,000, hoping for a better result, though it hardly improves on 15,000 iterations. As you can see, the model still fails to fit the data fully: because the initial weight matrices are too large, the optimization keeps oscillating around the optimum instead of reaching it.
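The effect of the ×10 scaling is easy to reproduce in isolation. A minimal sketch (toy shapes of my own choosing) shows that most sigmoid units start out saturated, which inflates the initial cost and leaves almost no gradient to learn from:

```python
import numpy as np

np.random.seed(3)
x = np.random.randn(2, 100)              # 100 toy samples with 2 features
W = np.random.randn(10, 2) * 10          # the "large" initialization
z = W @ x                                # pre-activations with huge magnitudes
a = 1.0 / (1.0 + np.exp(-z))             # sigmoid

# Fraction of activations pushed into the flat tails of the sigmoid,
# where the local gradient is essentially zero:
print(np.mean((a < 0.01) | (a > 0.99)))  # a large majority of the units
```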
We have now studied three different initialization methods. For the same number of iterations and the same hyperparameters, the three results compare as follows:
| Model | Test accuracy | Comment |
| --- | --- | --- |
| 3-layer NN with zeros initialization | 50% | fails to break symmetry |
| 3-layer NN with large random initialization | 85% (with 10× the iterations) | weights too large |
| 3-layer NN with He initialization | 99% | recommended method |
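To reproduce the comparison, a driver loop along these lines can be used. Note this is only a sketch: `model()` and `predict()` stand for the assignment's training and evaluation helpers, and the exact keyword names are my assumption.

```python
# Hypothetical driver; model() and predict() are the assignment's helpers,
# and the `initialization` keyword name is an assumption on my part.
for init in ("zeros", "random", "he"):
    parameters = model(train_X, train_Y, num_iterations=15000, initialization=init)
    print("Initialization:", init)
    predictions_test = predict(test_X, test_Y, parameters)
```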
What you should remember from this assignment:
```python
import numpy as np

# GRADED FUNCTION: initialize_parameters_zeros

def initialize_parameters_zeros(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                  W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  b1 -- bias vector of shape (layers_dims[1], 1)
                  ...
                  WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  bL -- bias vector of shape (layers_dims[L], 1)
    """
    parameters = {}
    L = len(layers_dims)  # number of layers in the network

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        # All-zero weights and biases: symmetry is never broken.
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l - 1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters
```
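A quick sanity check of the function above (the `[3, 2, 1]` layer sizes are just an example): every parameter comes back as zeros of the right shape.

```python
parameters = initialize_parameters_zeros([3, 2, 1])
print(parameters["W1"])  # [[0. 0. 0.]
                         #  [0. 0. 0.]]
print(parameters["b1"])  # [[0.]
                         #  [0.]]
```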
```python
# GRADED FUNCTION: initialize_parameters_random

def initialize_parameters_random(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                  W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  b1 -- bias vector of shape (layers_dims[1], 1)
                  ...
                  WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)  # This seed makes sure your "random" numbers will be the same as ours
    parameters = {}
    L = len(layers_dims)  # integer representing the number of layers

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        # Deliberately large weights: standard normal scaled by 10.
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * 10
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters
```
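The same check for the random version (example layer sizes) makes the scale problem visible: with a standard deviation of 10, the entries are far larger than typical small-random initializations.

```python
parameters = initialize_parameters_random([3, 2, 1])
print(parameters["W1"])                 # entries roughly in the tens
print(np.abs(parameters["W1"]).mean())  # mean |entry| near 10*sqrt(2/pi) ≈ 8 in expectation
```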
```python
# GRADED FUNCTION: initialize_parameters_he

def initialize_parameters_he(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                  W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  b1 -- bias vector of shape (layers_dims[1], 1)
                  ...
                  WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims) - 1  # integer representing the number of layers

    for l in range(1, L + 1):
        ### START CODE HERE ### (≈ 2 lines of code)
        # He initialization: standard normal scaled by sqrt(2 / fan_in).
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * np.sqrt(2. / layers_dims[l - 1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters
```
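Finally, the He factor `sqrt(2 / layers_dims[l-1])` is chosen so that the variance of ReLU pre-activations stays roughly constant from layer to layer (He et al., 2015). A quick scale check with example layer sizes:

```python
parameters = initialize_parameters_he([2, 4, 1])
# fan_in of layer 1 is 2, so W1 is drawn with std sqrt(2/2) = 1.0
print(parameters["W1"].std())  # roughly 1.0 (noisy: only 8 entries)
```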