2020-7-22 Andrew Ng - Improving Deep Neural Networks - Week 1: Practical Aspects of Deep Learning (Quiz)


1. If you have 10,000,000 examples, how would you split the train/dev/test set?

  • 33% train, 33% dev, 33% test
  • 60% train, 20% dev, 20% test
  • 98% train, 1% dev, 1% test (Correct)
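
With this much data, even 1% is a large set, which is why such a small dev/test share is enough. A minimal sketch of the arithmetic, where `split_sizes` is a hypothetical helper name and not something from the course:

```python
def split_sizes(n_examples, dev_frac=0.01, test_frac=0.01):
    """Return (train, dev, test) example counts for the given fractions.

    Hypothetical helper for illustration; the training set takes
    whatever is left after the dev and test sets are carved out.
    """
    n_dev = round(n_examples * dev_frac)
    n_test = round(n_examples * test_frac)
    n_train = n_examples - n_dev - n_test
    return n_train, n_dev, n_test


sizes = split_sizes(10_000_000)
print(sizes)  # (9800000, 100000, 100000)
```

Each of the dev and test sets still holds 100,000 examples, which is plenty for comparing models.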


2. The dev and test set should:

  • Come from the same distribution (Correct)
  • Come from different distributions
  • Be identical to each other (same (x,y) pairs)
  • Have the same number of examples


3. If your Neural Network model seems to have high variance, which of the following would be promising things to try?

  • Add regularization (Correct)
  • Get more test data
  • Increase the number of units in each hidden layer
  • Make the NN deeper
  • Get more training data (Correct)



4. You are working on an automated check-out kiosk for a supermarket, and are building a classifier for apples, bananas and oranges. Suppose your classifier obtains a training set error of 0.5%, and a dev set error of 7%. Which of the following are promising things to try to improve your classifier? (Check all that apply.)

  • Increase the regularization parameter lambda (Correct)
  • Decrease the regularization parameter lambda
  • Get more training data (Correct)
  • Use a bigger NN
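
The reasoning here is that the gap between training error and dev error is large while the training error itself is small, which points to high variance rather than high bias. A rough sketch of that check, assuming the optimal (Bayes) error is close to 0% for this task; the `diagnose` helper and the 2% tolerance are illustrative assumptions, not part of the quiz:

```python
def diagnose(train_err, dev_err, bayes_err=0.0, tol=0.02):
    """Label a model as high bias and/or high variance.

    bayes_err and tol are assumed values chosen for illustration.
    """
    avoidable_bias = train_err - bayes_err  # gap to the best achievable error
    variance = dev_err - train_err          # gap between train and dev error
    issues = []
    if avoidable_bias > tol:
        issues.append("high bias")
    if variance > tol:
        issues.append("high variance")
    return issues or ["looks fine"]


result = diagnose(train_err=0.005, dev_err=0.07)
print(result)  # ['high variance']
```

For a high-variance model, more regularization and more training data are the natural remedies; a bigger network mainly helps with bias.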



5. What is weight decay?

  • A regularization technique (such as L2 regularization) that results in gradient descent shrinking the weights on every iteration. (Correct)
  • The process of gradually decreasing the learning rate during training.
  • Gradual corruption of the weights in the NN if it is trained on noisy data.
  • A technique to avoid vanishing gradient by imposing a ceiling on the values of the weights
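
The name "weight decay" comes from rewriting the L2-regularized gradient step: the weight is first multiplied by a factor slightly below 1, then the usual gradient step is applied. A small sketch for a single scalar weight, assuming the penalty term is (lam / (2*m)) * w**2; the function names are hypothetical:

```python
def l2_step(w, grad, lr, lam, m):
    """Gradient step on loss + (lam / (2*m)) * w**2.

    The penalty contributes (lam / m) * w to the gradient.
    """
    return w - lr * (grad + (lam / m) * w)


def decay_step(w, grad, lr, lam, m):
    """Same update, written as 'shrink the weight, then step'."""
    return (1 - lr * lam / m) * w - lr * grad


w, grad, lr, lam, m = 1.0, 0.3, 0.1, 0.5, 10
a = l2_step(w, grad, lr, lam, m)
b = decay_step(w, grad, lr, lam, m)
print(abs(a - b) < 1e-12)  # True: the two forms agree

# With a zero data gradient, the weight simply shrinks toward 0.
shrunk = l2_step(1.0, 0.0, lr, lam, m)
```

The factor (1 - lr * lam / m) is below 1 for any positive lr and lam, so every iteration pulls the weight a little toward zero.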



6. What happens when you increase the regularization hyperparameter lambda?

  • Weights are pushed toward becoming smaller (closer to 0) (Correct)
  • Weights are pushed toward becoming bigger (further from 0)
  • Doubling lambda should roughly result in doubling the weights
  • Gradient descent taking bigger steps with each iteration (proportional to lambda)
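
A toy case makes the effect of lambda concrete. Assume a one-parameter cost J(w) = (w - a)**2 + lam * w**2 (an illustrative cost, not from the course); setting the derivative to zero gives the minimizer w = a / (1 + lam):

```python
def ridge_1d(a, lam):
    """Minimizer of (w - a)**2 + lam * w**2 over w.

    Derived from 2*(w - a) + 2*lam*w = 0.
    """
    return a / (1 + lam)


weights = [ridge_1d(2.0, lam) for lam in (0.0, 1.0, 3.0)]
print(weights)  # [2.0, 1.0, 0.5]
```

Larger lambda moves the optimum closer to 0. The weights shrink as lambda grows; they do not grow in proportion to it.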



7. With the inverted dropout technique, at test time:

  • You apply dropout (randomly eliminate units) and do not keep the 1/keep_prob factor in the calculations used in training
  • You apply dropout (randomly eliminate units) but keep the 1/keep_prob factor in the calculations used in training
  • You do not apply dropout (do not randomly eliminate units) but keep the 1/keep_prob factor in the calculations used in training
  • You do not apply dropout (do not randomly eliminate units) and do not keep the 1/keep_prob factor in the calculations used in training (Correct)
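
Inverted dropout divides the surviving activations by keep_prob during training so that their expected value is unchanged. Because that scaling already happens at training time, the test-time forward pass uses the activations as they are: no random mask and no 1/keep_prob factor. A pure-Python sketch on a flat list of activations; the `dropout_forward` name and signature are assumptions for illustration:

```python
import random


def dropout_forward(activations, keep_prob, training, rng=random):
    """Inverted dropout on a list of activations (illustrative sketch)."""
    if not training:
        # Test time: no units are dropped and nothing is rescaled.
        return list(activations)
    out = []
    for a in activations:
        keep = rng.random() < keep_prob
        # Survivors are scaled up by 1/keep_prob; dropped units become 0.
        out.append(a / keep_prob if keep else 0.0)
    return out


rng = random.Random(0)
acts = [1.0] * 100_000
train_out = dropout_forward(acts, keep_prob=0.8, training=True, rng=rng)
test_out = dropout_forward(acts, keep_prob=0.8, training=False)

train_mean = sum(train_out) / len(train_out)  # stays close to 1.0
```

The training-time mean stays near the original value of 1.0, which is why no correction is needed when dropout is switched off.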



8. Increasing the parameter keep_prob from (say) 0.5 to 0.6 will likely cause the following: (Check the two that apply)

  • Increasing the regularization effect
  • Reducing the regularization effect (Correct)
  • Causing the neural network to end up with a higher training set error
  • Causing the neural network to end up with a lower training set error (Correct)



9. Which of these techniques are useful for reducing variance (reducing overfitting)? (Check all that apply.)

  • Xavier initialization
  • Dropout (Correct)
  • Gradient Checking
  • Exploding gradient
  • L2 regularization (Correct)
  • Vanishing gradient
  • Data augmentation (Correct)



10. Why do we normalize the inputs x?

  • It makes the parameter initialization faster
  • It makes the cost function faster to optimize (Correct)
  • Normalization is another word for regularization; it helps to reduce variance
  • It makes it easier to visualize the data.
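
Normalizing puts every input feature on a similar scale, so the cost surface is less elongated and gradient descent can take larger, more direct steps. A minimal sketch for one feature column using only the standard library; the helper names are hypothetical, and the same mu and sigma computed on the training set should be reused for the dev and test sets:

```python
import statistics


def fit_normalizer(column):
    """Compute mean and (population) standard deviation on training data."""
    mu = statistics.mean(column)
    sigma = statistics.pstdev(column)
    return mu, sigma


def normalize(column, mu, sigma):
    """Shift to zero mean and scale to unit variance."""
    return [(x - mu) / sigma for x in column]


x1_train = [1000.0, 2000.0, 3000.0]  # a feature with a large raw scale
mu, sigma = fit_normalizer(x1_train)
x1_norm = normalize(x1_train, mu, sigma)

# Reuse the training statistics for unseen data; do not refit on it.
x1_test_norm = normalize([2500.0], mu, sigma)
```

After this step the training column has mean 0 and variance 1, regardless of the units the raw feature was measured in.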

