Stanford CS231n Course Notes: assignment2 Dropout

Contents

  • Dropout Principle
  • Dropout Implementation
  • Dropout Application
  • Assignment Questions
  • References

1. Dropout Principle

Purpose: dropout [1] regularizes neural networks by randomly setting some features to zero during the forward pass. In the inverted-dropout variant implemented below, each activation is kept with probability p and scaled up by 1/p at training time, so the test-time forward pass is simply the identity.
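As a toy illustration of the mechanics (made-up numbers, just to show what a mask does), each training-time forward pass samples a fresh random binary mask and multiplies it into the activations:

import numpy as np

np.random.seed(0)
p = 0.5                                 # keep probability
h = np.array([1.0, 2.0, 3.0, 4.0])      # some hidden-layer activations
mask = np.random.rand(*h.shape) < p     # each unit survives with probability p
print(h * mask)                         # surviving units pass through, the rest are zeroed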

2. Dropout Implementation

1. dropout_forward

import numpy as np

def dropout_forward(x, dropout_param):
    """Forward pass for (inverted) dropout.

    dropout_param contains:
    - 'p': keep probability; each activation survives with probability p
    - 'mode': 'train' or 'test'
    - 'seed' (optional): seeds the mask RNG for reproducibility
    """
    p, mode = dropout_param['p'], dropout_param['mode']
    if 'seed' in dropout_param:
        np.random.seed(dropout_param['seed'])

    mask = None
    out = None
    if mode == 'train':
        # Inverted dropout: dividing the binary mask by p at training time
        # keeps E[out] == x, so the test-time pass can be the identity.
        mask = (np.random.rand(*x.shape) < p) / p
        out = x * mask
    elif mode == 'test':
        out = x

    cache = (dropout_param, mask)
    out = out.astype(x.dtype, copy=False)

    return out, cache
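A quick sanity check in the spirit of the notebook (a minimal sketch, not part of the required code): in train mode roughly 1 − p of the activations are zeroed, yet the mean is approximately preserved thanks to the 1/p rescaling, and test mode is exactly the identity.

x = np.random.randn(500, 500) + 10  # activations centered well above zero

for p in [0.25, 0.5, 0.75]:
    out_train, _ = dropout_forward(x, {'mode': 'train', 'p': p})
    out_test, _ = dropout_forward(x, {'mode': 'test', 'p': p})
    # About 1 - p of the train-mode entries are zero, but the means stay close.
    print(p, x.mean(), out_train.mean(), out_test.mean())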

2. dropout_backward

def dropout_backward(dout, cache):
    """Backward pass for (inverted) dropout.

    Gradients flow only through the units that were kept; the 1/p
    scaling from the forward pass is already baked into the mask.
    """
    dropout_param, mask = cache
    mode = dropout_param['mode']

    dx = None
    if mode == 'train':
        dx = dout * mask
    elif mode == 'test':
        dx = dout
    return dx
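Because a fixed 'seed' makes the mask deterministic across calls, the backward pass can be verified with a central-difference numerical gradient. The snippet below is a self-contained check (the notebook uses its eval_numerical_gradient_array helper for the same purpose):

np.random.seed(231)
x = np.random.randn(10, 10) + 10
dout = np.random.randn(*x.shape)
dropout_param = {'mode': 'train', 'p': 0.2, 'seed': 123}

_, cache = dropout_forward(x, dropout_param)
dx = dropout_backward(dout, cache)

# Central differences: the fixed seed reproduces the same mask on every call.
h = 1e-5
dx_num = np.zeros_like(x)
it = np.nditer(x, flags=['multi_index'])
while not it.finished:
    ix = it.multi_index
    old = x[ix]
    x[ix] = old + h
    pos, _ = dropout_forward(x, dropout_param)
    x[ix] = old - h
    neg, _ = dropout_forward(x, dropout_param)
    x[ix] = old
    dx_num[ix] = np.sum((pos - neg) * dout) / (2 * h)
    it.iternext()

rel_err = np.max(np.abs(dx - dx_num) /
                 np.maximum(1e-8, np.abs(dx) + np.abs(dx_num)))
print('relative error:', rel_err)  # dropout is linear in x, so this should be tiny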

3. Dropout Application

We train two two-layer networks: one uses no dropout, and the other uses a keep probability of 0.25. The sketch below shows how the comparison is set up.
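A sketch of the experiment, assuming the assignment's FullyConnectedNet and Solver classes and its small_data subset of the training set; the dropout keyword and hyperparameters vary across assignment years (e.g. dropout vs. dropout_keep_ratio), so check your year's fc_net.py:

from cs231n.classifiers.fc_net import FullyConnectedNet
from cs231n.solver import Solver

solvers = {}
for keep_prob in [1.0, 0.25]:  # 1.0 means no dropout
    # Keyword name is an assumption; some versions call it dropout_keep_ratio.
    model = FullyConnectedNet([500], dropout=keep_prob)
    solver = Solver(model, small_data,
                    num_epochs=25, batch_size=100,
                    update_rule='adam',
                    optim_config={'learning_rate': 5e-4},
                    verbose=False)
    solver.train()
    solvers[keep_prob] = solver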

 

[Figure 1: training and validation accuracy curves for the two networks, with and without dropout]

4. Assignment Questions

Inline Question 1:

What happens if we do not divide the values being passed through inverse dropout by p in the dropout layer? Why does that happen?

Answer:

At test time dropout does nothing, so the expected value of the output is just the input itself. During training, however, dropout changes the expectation of the activations: if an input x is kept with probability p, the expected output is E(x̂) = p∗x + (1−p)∗0 = p∗x. Because we want the training-time and test-time expectations to match, the training-time forward pass must divide by p, which restores the expectation to x.
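This is easy to verify empirically with the dropout_forward above (a minimal sketch with made-up numbers):

np.random.seed(1)
x = np.full((1000, 1000), 3.0)  # constant input, so E[x] = 3
p = 0.25

# Without the 1/p rescaling, the train-time mean shrinks to p * x ...
mask = np.random.rand(*x.shape) < p
print((x * mask).mean())  # ~0.75, i.e. p * 3

# ... while inverted dropout keeps the expectation at x.
out, _ = dropout_forward(x, {'mode': 'train', 'p': p})
print(out.mean())  # ~3.0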

Inline Question 2:

Compare the validation and training accuracies with and without dropout -- what do your results suggest about dropout as a regularizer?

Answer:

When dropout is used, training accuracy is lower than without dropout, while validation accuracy is somewhat higher. This shows that dropout acts as a regularizer: it trades some fit to the training set for better generalization.

Inline Question 3:

Suppose we are training a deep fully-connected network for image classification, with dropout after hidden layers (parameterized by keep probability p). How should we modify p, if at all, if we decide to decrease the size of the hidden layers (that is, the number of nodes in each layer)?

Answer:

If we reduce the size of the hidden layers, i.e. the number of neurons per layer, the keep probability p of dropout should be increased. Consider the most extreme case, where a hidden layer is shrunk to a single neuron: if the keep probability remains very small, then most of the time the network is effectively not being trained at all. Smaller layers have less redundant capacity, so they can afford less aggressive dropout.
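To put numbers on the extreme case (a quick back-of-the-envelope check): with n units kept independently with probability p, the entire layer is zeroed with probability (1 − p)^n.

# Probability that a whole hidden layer of n units is dropped at once.
for n, p in [(1, 0.25), (1, 0.9), (100, 0.25)]:
    print(n, p, (1 - p) ** n)
# With n=1 and p=0.25 the lone neuron is dropped 75% of the time,
# so most updates propagate no signal through that layer at all.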

References

[1] Geoffrey E. Hinton et al., "Improving neural networks by preventing co-adaptation of feature detectors", arXiv:1207.0580, 2012.
