1. If you have 10,000,000 examples, how would you split the train/dev/test set?
- 33% train, 33% dev, 33% test
- 60% train, 20% dev, 20% test
- 98% train, 1% dev, 1% test (Correct)
See the course for the reasoning: with this much data, 1% each for the dev and test sets is more than enough.
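For illustration, a minimal numpy sketch of such a 98/1/1 split on a toy dataset (the array names and sizes are assumptions, not part of the quiz):

```python
import numpy as np

# Toy stand-in for a large dataset; in the quiz's scenario m would be 10,000,000.
m, n_features = 1000, 20
X = np.random.randn(m, n_features)
Y = np.random.randint(0, 2, size=m)

perm = np.random.permutation(m)          # shuffle before splitting
X, Y = X[perm], Y[perm]

n_train = int(0.98 * m)                  # 98% train
n_dev = int(0.01 * m)                    # 1% dev, the remaining ~1% is test

X_train, Y_train = X[:n_train], Y[:n_train]
X_dev, Y_dev = X[n_train:n_train + n_dev], Y[n_train:n_train + n_dev]
X_test, Y_test = X[n_train + n_dev:], Y[n_train + n_dev:]
```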
==============================================================
2. The dev and test sets should:
- Come from the same distribution (Correct)
- Come from different distributions
- Be identical to each other (same (x, y) pairs)
- Have the same number of examples
Make sure the dev set and the test set come from the same distribution. See the course.
===============================================================
3. If your neural network model seems to have high variance, which of the following would be promising things to try?
- Add regularization (Correct)
- Get more test data
- Increase the number of units in each hidden layer
- Make the NN deeper
- Get more training data (Correct)
Remedies for high bias are a bigger network or training longer; remedies for high variance are regularization and more training data. See the course.
=================================================================
4. You are working on an automated check-out kiosk for a supermarket, and are building a classifier for apples, bananas and oranges. Suppose your classifier obtains a training set error of 0.5%, and a dev set error of 7%. Which of the following are promising things to try to improve your classifier? (Check all that apply.)
- Increase the regularization parameter lambda (Correct)
- Decrease the regularization parameter lambda
- Get more training data (Correct)
- Use a bigger NN
Low training error with much higher dev error means overfitting, i.e. high variance. Similar to question 3; see the course.
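As a rough sketch of that reasoning, a hypothetical helper (the 2% threshold is an assumption chosen only for illustration) applied to the question's 0.5% train / 7% dev errors:

```python
def diagnose(train_err, dev_err, gap=0.02):
    """Very rough bias/variance read-out from train and dev error rates."""
    advice = []
    if train_err > gap:                      # training error itself is high -> high bias
        advice.append("high bias: try a bigger network or train longer")
    if dev_err - train_err > gap:            # large train/dev gap -> high variance
        advice.append("high variance: try more training data or stronger regularization")
    return advice or ["looks fine"]

print(diagnose(train_err=0.005, dev_err=0.07))
# ['high variance: try more training data or stronger regularization']
```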
=============================================================
5. What is weight decay?
- A regularization technique (such as L2 regularization) that results in gradient descent shrinking the weights on every iteration. (Correct)
- The process of gradually decreasing the learning rate during training.
- Gradual corruption of the weights in the NN if it is trained on noisy data.
- A technique to avoid vanishing gradient by imposing a ceiling on the values of the weights
L2 regularization is also known as "weight decay". See the course.
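To see where the name comes from, here is a minimal numerical sketch (the learning rate, λ and m values are assumptions): the L2 term turns each gradient step into a multiplication of every weight by a factor slightly below 1.

```python
import numpy as np

alpha, lam, m = 0.1, 5.0, 1000          # assumed learning rate, L2 strength, number of examples
W = np.random.randn(4, 4)
dW_unreg = np.zeros_like(W)             # pretend the unregularized gradient is zero for clarity

# L2-regularized update: W := W - alpha * (dW_unreg + (lam / m) * W)
#                           = (1 - alpha * lam / m) * W - alpha * dW_unreg
W_new = (1 - alpha * lam / m) * W - alpha * dW_unreg
print(np.linalg.norm(W_new) / np.linalg.norm(W))   # 0.9995 < 1: the weights decay every iteration
```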
=============================================================
6. What happens when you increase the regularization hyperparameter lambda?
- Weights are pushed toward becoming smaller (closer to 0) (Correct)
- Weights are pushed toward becoming bigger (further from 0)
- Doubling lambda should roughly result in doubling the weights
- Gradient descent taking bigger steps with each iteration (proportional to lambda)
If the regularization parameter λ is set large enough, the weight matrices W are pushed close to 0. See the course.
==============================================================
7. With the inverted dropout technique, at test time:
- You apply dropout (randomly eliminate units) and do not keep the 1/keep_prob factor in the calculations used in training
- You apply dropout (randomly eliminate units) but keep the 1/keep_prob factor in the calculations used in training
- You do not apply dropout (do not randomly eliminate units) but keep the 1/keep_prob factor in the calculations used in training
- You do not apply dropout (do not randomly eliminate units) and do not keep the 1/keep_prob factor in the calculations used in training (Correct)
With inverted dropout the 1/keep_prob scaling is already applied during training, so at test time dropout is not applied and no extra scaling factor is needed. See the course.
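A minimal numpy sketch of inverted dropout on one layer's activations (the shapes and keep_prob value are assumptions): the 1/keep_prob scaling is done at training time, so the test-time forward pass uses neither a mask nor any extra factor.

```python
import numpy as np

keep_prob = 0.8
A = np.random.rand(5, 10)                      # activations of some hidden layer (assumed shape)

# Training-time forward pass with inverted dropout
D = np.random.rand(*A.shape) < keep_prob       # random mask: keep each unit with prob keep_prob
A_train = A * D / keep_prob                    # scale up so the expected activation is unchanged

# Test-time forward pass: no mask and no 1/keep_prob factor
A_test = A
```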
==============================================================
8. Increasing the parameter keep_prob from (say) 0.5 to 0.6 will likely cause the following: (Check the two that apply)
- Increasing the regularization effect
- Reducing the regularization effect (Correct)
- Causing the neural network to end up with a higher training set error
- Causing the neural network to end up with a lower training set error (Correct)
The larger keep_prob is, the more neurons are kept, so the weaker the regularization effect. See the course.
==============================================================
9. Which of these techniques are useful for reducing variance (reducing overfitting)? (Check all that apply.)
- Xavier initialization
- Dropout (Correct)
- Gradient Checking
- Exploding gradient
- L2 regularization (Correct)
- Vanishing gradient
- Data augmentation (Correct)
Remedies for high bias are a bigger network or training longer; remedies for high variance are regularization (L2, dropout, data augmentation) and more training data. See the course.
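As an illustration of the data augmentation option, a minimal sketch that doubles a toy image training set with horizontal flips (the array names and shapes are assumptions):

```python
import numpy as np

# Toy image batch: (num_examples, height, width, channels)
X_train = np.random.rand(100, 32, 32, 3)
Y_train = np.random.randint(0, 10, size=100)

X_flipped = X_train[:, :, ::-1, :]             # mirror each image left-right
X_aug = np.concatenate([X_train, X_flipped])   # twice as many training examples
Y_aug = np.concatenate([Y_train, Y_train])     # the label is unchanged by flipping
```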
==============================================================
10. Why do we normalize the inputs x?
- It makes the parameter initialization faster
- It makes the cost function faster to optimize (Correct)
- Normalization is another word for regularization; it helps to reduce variance
- It makes it easier to visualize the data.
Normalizing the inputs is one of the ways to speed up training of a neural network. See the course.
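A minimal sketch of input normalization (the variable names and toy data are assumptions); note that the test set is normalized with the mean and variance computed on the training set:

```python
import numpy as np

X_train = np.random.rand(1000, 20) * 50 + 10   # toy features on very different scales
X_test = np.random.rand(200, 20) * 50 + 10

mu = X_train.mean(axis=0)                      # per-feature mean of the training set
sigma2 = X_train.var(axis=0)                   # per-feature variance of the training set

X_train_norm = (X_train - mu) / np.sqrt(sigma2 + 1e-8)
X_test_norm = (X_test - mu) / np.sqrt(sigma2 + 1e-8)   # reuse the SAME mu and sigma2
```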