Contents of this article:
Coursera, Andrew Ng's Deep Learning specialization,
Course 2, Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization,
Week 1: Practical aspects of Deep Learning.
It covers three parts: Initialization, L2 regularization, and gradient checking.
Programming assignment notes -- my mistake log.
The first assignment of "Improving Deep Neural Networks".
By completing this assignment you will:
- Understand that different regularization methods could help your model.
- Implement dropout and see it work on data.
- Recognize that a model without regularization gives you better accuracy on the training set, but not necessarily on the test set.
- Understand that you could use both dropout and regularization on your model.
Training your neural network requires specifying an initial value of the weights. A well chosen initialization method will help learning. (It prevents the symmetric-weights problem, where the network fails to "break symmetry".)
My answer:
for l in range(1, L):
### START CODE HERE ### (≈ 2 lines of code)
parameters['W' + str(l)] = np.random.rand(np.shape(layers_dims[L], layers_dims[L-1]))
parameters['b' + str(l)] = np.random.rand(np.shape(layers_dims[L], 1))
### END CODE HERE ###
return parameters
Correct:
parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
Reflection:
1. Use the loop variable, lowercase l, not the uppercase L from outside the loop.
2. Usage of np.zeros: pass the shape directly as a single tuple, e.g. np.zeros((rows, cols)).
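For reference, here is a minimal sketch of the whole zero-initialization helper, assuming the assignment's layers_dims list and parameters dictionary (the body below is my own reconstruction, not the official solution):
import numpy as np

def initialize_parameters_zeros(layers_dims):
    # layers_dims: list of layer sizes, e.g. [n_x, n_h, n_y]
    parameters = {}
    L = len(layers_dims)              # number of layers, including the input layer
    for l in range(1, L):             # lowercase l indexes the current layer
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters

# With all-zero weights, every unit in a layer computes the same output and
# receives the same gradient, so the units never differentiate -- this is the
# "fails to break symmetry" problem mentioned above.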
To break symmetry, let's initialize the weights randomly.
My answer:
parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
parameters['b' + str(l)] = np.zeros(layers_dims[l], 1)
Correct:
parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
Reflection: np.zeros((shape)) -- the shape must be passed as a single tuple.
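For comparison, a minimal sketch of the random-initialization version (the * 10 factor is what this exercise asks for, deliberately making the weights too large; the seed value and function name are assumptions on my part):
def initialize_parameters_random(layers_dims):
    np.random.seed(3)                 # assumed seed, only for reproducibility
    parameters = {}
    L = len(layers_dims)
    for l in range(1, L):
        # randn breaks symmetry; * 10 intentionally makes the weights very large
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters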
If you see "inf" as the cost after iteration 0, this is because of numerical round-off.
Round-off error is the difference between the approximate value produced by a computation and the exact value.
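A tiny illustration of how the round-off produces an inf cost (my own example, not from the assignment): with weights scaled by 10, a pre-activation can be large enough that the sigmoid output rounds to exactly 1.0 in float64, so the cross-entropy term log(1 - a) becomes log(0) = -inf.
import numpy as np

z = 40.0                        # a large pre-activation caused by the oversized weights
a = 1.0 / (1.0 + np.exp(-z))    # sigmoid(40) rounds to exactly 1.0 in float64
print(a == 1.0)                 # True
print(-np.log(1.0 - a))         # inf (with a divide-by-zero warning) -> the cost shows up as inf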
In summary:
My answer: parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.square(2/layers_dims[l-1])
Correct: parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2 / layers_dims[l-1])
Reflection: np.sqrt() is the square root (np.square() squares its input).
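For completeness, a minimal sketch of He initialization, which scales the random weights by sqrt(2 / n of the previous layer) and works well with ReLU activations (the body below is my own reconstruction):
def initialize_parameters_he(layers_dims):
    parameters = {}
    L = len(layers_dims)
    for l in range(1, L):
        # He et al. scaling: standard deviation sqrt(2 / size of the previous layer)
        parameters['W' + str(l)] = (np.random.randn(layers_dims[l], layers_dims[l-1])
                                    * np.sqrt(2 / layers_dims[l-1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters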
What you should remember from this notebook:
- Different initializations lead to different results.
- Random initialization is used to break symmetry and make sure different hidden units can learn different things.
- Don't initialize to values that are too large.
- He initialization works well for networks with ReLU activations.
My answer: L2_regularization_cost = (1/m)*(lambd/2)*(np.sum(np.square(Wl)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
NameError: name 'Wl' is not defined
Correct: L2_regularization_cost = 1./m * lambd/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
Reflection: a typing slip -- I wrote Wl (lowercase L) instead of W1 (the digit one).
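Putting the corrected line in context, here is a minimal sketch of the regularized cost for this 3-layer network, assuming the assignment's compute_cost(A3, Y) helper for the cross-entropy part; the L2 term is lambd/(2m) times the sum of the squared entries of W1, W2 and W3:
def compute_cost_with_regularization(A3, Y, parameters, lambd):
    m = Y.shape[1]
    W1 = parameters["W1"]
    W2 = parameters["W2"]
    W3 = parameters["W3"]
    cross_entropy_cost = compute_cost(A3, Y)        # unregularized cross-entropy part
    L2_regularization_cost = 1./m * lambd/2 * (np.sum(np.square(W1))
                                               + np.sum(np.square(W2))
                                               + np.sum(np.square(W3)))
    return cross_entropy_cost + L2_regularization_cost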
Of course, because you changed the cost, you have to change backward propagation as well! All the gradients have to be computed with respect to this new cost.
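Concretely, the only change is an extra (lambd/m) * W term added to each dW; a minimal sketch for the last layer, reusing dZ3, A2 and W3 from the assignment's cache (the db terms are not regularized):
# d/dW of the L2 term lambd/(2m) * ||W||^2 is (lambd/m) * W
dW3 = 1./m * np.dot(dZ3, A2.T) + (lambd / m) * W3
# dW2 and dW1 get the analogous extra term; db3, db2, db1 are unchanged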
Observations:
What you should remember -- the implications of L2-regularization on:
- the cost computation: a regularization term is added to the cost;
- the backpropagation function: there are extra terms in the gradients with respect to the weight matrices;
- the weights, which end up smaller ("weight decay"): they are pushed toward smaller values.
The dropped neurons don't contribute to the training in either the forward or the backward propagation of that iteration.
Question:
During forward and backward propagation, are the same neurons dropped?
Answer: yes. The same random matrix D is used in both passes (see the sketch after the backward-propagation code below).
When you shut some neurons down, you actually modify your model. The idea behind drop-out is that at each iteration, you train a different model that uses only a subset of your neurons. With dropout, your neurons thus become less sensitive to the activation of one other specific neuron, because that other neuron might be shut down at any time.
Instructions: You would like to shut down some neurons in the first and second layers. To do that, you are going to carry out 4 steps:
keep_prob - probability of keeping a neuron active during drop-out, scalar.
My answer:
D1 = np.random.rand(np.shape(A1))                # Step 1: initialize matrix D1 = np.random.rand(..., ...)
D1 = None                                        # Step 2: convert entries of D1 to 0 or 1 (using keep_prob as the threshold)
A1 = A1 * D1                                     # Step 3: shut down some neurons of A1
A1 = A1 / keep_prob                              # Step 4: scale the value of neurons that haven't been shut down
Correct:
D1 = np.random.rand(A1.shape[0], A1.shape[1])    # Step 1: initialize matrix D1 = np.random.rand(..., ...)
D1 = (D1 < keep_prob)                            # Step 2: convert entries of D1 to 0 or 1 (using keep_prob as the threshold)
A1 = A1 * D1                                     # Step 3: shut down some neurons of A1
A1 = A1 / keep_prob                              # Step 4: scale the value of neurons that haven't been shut down
Reflection: I didn't know to use D1 = (D1 < keep_prob); also, np.random.rand() takes the dimensions as separate arguments, not a shape tuple.
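To show how the four steps fit into the forward pass for one hidden layer (and why the mask has to be kept), here is a minimal sketch of my own; W1, b1, X and keep_prob are assumed to exist as in the assignment, and the seed is only an assumption for reproducibility:
np.random.seed(1)
Z1 = np.dot(W1, X) + b1
A1 = np.maximum(0, Z1)                            # ReLU activation
D1 = np.random.rand(A1.shape[0], A1.shape[1])     # Step 1: random matrix with A1's shape
D1 = (D1 < keep_prob)                             # Step 2: boolean mask, True with probability keep_prob
A1 = A1 * D1                                      # Step 3: zero out the dropped units
A1 = A1 / keep_prob                               # Step 4: inverted dropout keeps the expected value of A1
cache = (Z1, D1, A1, W1, b1)                      # keep D1 so backprop can reuse the same mask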
Instruction: Backpropagation with dropout is actually quite easy. You will have to carry out 2 Steps:
Backward-propagation code:
dA2 = dA2 * D2 # Step 1: Apply mask D2 to shut down the same neurons as during the forward propagation
dA2 = dA2/ keep_prob # Step 2: Scale the value of neurons that haven't been shut down
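In context, a minimal sketch of the backward steps for layers 2 and 1 that reuse the D2 and D1 masks cached during the forward pass, which is exactly why the same neurons are shut down in both passes (dZ3, W3, W2, A2, D2, D1 and keep_prob follow the assignment's names; the snippet itself is my own):
dA2 = np.dot(W3.T, dZ3)           # gradient flowing back from layer 3
dA2 = dA2 * D2                    # reuse the SAME mask D2 cached in the forward pass
dA2 = dA2 / keep_prob             # match the forward scaling
dZ2 = dA2 * np.int64(A2 > 0)      # ReLU backward
dA1 = np.dot(W2.T, dZ2)
dA1 = dA1 * D1                    # and the same D1 for layer 1
dA1 = dA1 / keep_prob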
Note: use dropout only during training; do not randomly eliminate nodes at test time.
What you should remember about dropout: