Dropout: this one post is all you need

Hi everyone. Dropout shows up in almost every network design, yet surprisingly few people can actually explain its concept and purpose. How is dropout computed? Don't tell me it's just "randomly deactivating neurons" — it is not that simple. And why does dropout help prevent overfitting?

If you draw a blank on all of these, that's a deal-breaker in an interview — these are basic questions.

For Recommendation in Deep learning QQ Group 102948747
For Visual in deep learning QQ Group 629530787
I'm here waiting for you 

No private chats / direct messages on this site, please!!!
 

1-First, look at the help output

tf1.15

dropout(x, keep_prob=None, noise_shape=None, seed=None, name=None, rate=None)
    Computes dropout. (deprecated arguments)
    
    Warning: SOME ARGUMENTS ARE DEPRECATED: `(keep_prob)`. They will be removed in a future version.
    Instructions for updating:
    Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
    
    For each element of `x`, with probability `rate`, outputs `0`, and otherwise
    scales up the input by `1 / (1-rate)`. The scaling is such that the expected
    sum is unchanged.
    
    By default, each element is kept or dropped independently.

Consider only the simplest case first:

When keep_prob=1, nothing is deactivated and there is no actual dropout: every element is kept with probability 1. The test below confirms this:

>>> import numpy as np
>>> import tensorflow as tf
>>> inputs=tf.random.normal((4,3,2))
>>> inputs2=tf.nn.dropout(inputs,1)
>>> with tf.Session() as sess:
...     sess.run(tf.global_variables_initializer())
...     x,x2=sess.run([inputs,inputs2])
>>> np.array_equal(x,x2)
True

Here keep_prob is the probability of keeping an element, while rate is the probability of dropping it, hence the rate = 1 - keep_prob stated above.
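To make the help text concrete, here is a minimal NumPy sketch (my own illustration, not TensorFlow's actual implementation) of the computation it describes: zero each element with probability rate, and scale the survivors by 1/(1-rate).

import numpy as np

def dropout_np(x, rate, seed=0):
    # Toy dropout: drop each element with probability `rate`,
    # scale the survivors by 1/(1-rate) so the expected sum is unchanged.
    rng = np.random.default_rng(seed)
    keep = rng.random(x.shape) >= rate            # True with probability 1-rate
    return np.where(keep, x / (1.0 - rate), 0.0)

x = np.ones((4, 3, 2), dtype=np.float32)
print(dropout_np(x, rate=0.5))                    # a mix of 0.0 and 2.0 entries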

In tf2:

class Dropout(tensorflow.python.keras.engine.base_layer.Layer)
 |  Dropout(rate, noise_shape=None, seed=None, **kwargs)
 |  
 |  Applies Dropout to the input.
 |  
 |  Dropout consists in randomly setting
 |  a fraction `rate` of input units to 0 at each update during training time,
 |  which helps prevent overfitting.
 |  
 |  Arguments:
 |    rate: Float between 0 and 1. Fraction of the input units to drop.
 |    noise_shape: 1D integer tensor representing the shape of the
 |      binary dropout mask that will be multiplied with the input.
 |      For instance, if your inputs have shape
 |      `(batch_size, timesteps, features)` and
 |      you want the dropout mask to be the same for all timesteps,
 |      you can use `noise_shape=(batch_size, 1, features)`.
 |    seed: A Python integer to use as random seed.

Here rate corresponds to the 1 - keep_prob above, and the same check holds. [I am calling the TF2 API from inside TF1, which is why the with tf.Session() block below is still needed.]

>>> dropout=tf.compat.v2.keras.layers.Dropout(0)
>>> inputs3=dropout(inputs)
>>> with tf.Session() as sess:
...     sess.run(tf.global_variables_initializer())
...     x,x3=sess.run([inputs,inputs3])
>>> np.array_equal(x,x3)
True
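The noise_shape argument from the docstring can be checked the same way. Below is a minimal sketch (my own example; the shapes are assumptions) using tf.nn.dropout, which accepts the same noise_shape argument, to share one dropout mask across all timesteps:

import numpy as np
import tensorflow as tf

inputs = tf.random.normal((4, 3, 2))                              # (batch_size, timesteps, features)
shared = tf.nn.dropout(inputs, rate=0.5, noise_shape=[4, 1, 2])   # one mask per (batch, feature) slot
with tf.Session() as sess:
    x, y = sess.run([inputs, shared])
mask = (y == 0)                        # dropped positions (kept values are essentially never exactly 0)
print(np.all(mask == mask[:, :1, :]))  # expected True: the mask is identical at every timestep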

So the parameter in tf2 is directly the drop probability. But is that all there is to dropout?

2-How exactly are units dropped? And why does that prevent overfitting?

As the TF1 help above also shows, the surviving values are scaled so that the expected sum stays unchanged. I raised this before and was confused then, and honestly still was until recently [for the details see 1 - my issue, 2 - link1, 3 - link2].

link1 answers the question: why does dropout help avoid overfitting?

This technique minimizes overfitting because each neuron becomes independently sufficient, in the sense that the neurons within the layers learn weight values that are not based on the cooperation of its neighbouring neurons.

In other words: each neuron becomes sufficiently independent and learns its weights on its own, rather than relying on the cooperation of neighbouring neurons.

link2 answers the question: why add the scaling at all? The scaling keeps the expectation unchanged (The scaling is such that the expected sum is unchanged).

The formula (with the scaling) is given below.

Adding the scaling also means adding a perturbation, and the perturbation makes the model more robust (Robustness through Perturbations).

In standard dropout regularization, the bias of each layer is removed by normalizing by the fraction of nodes that were retained (not dropped). In other words, with dropout probability p (i.e. the rate above), each intermediate activation x is replaced by a random variable x' as follows:

x' = \begin{cases} 0, & \text{with probability } rate \\ \dfrac{x}{1-rate}, & \text{otherwise} \end{cases}

With this definition, E(x') = x. [If you can't derive that, ask me — join the group and I'll walk you through it.]
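For completeness, the one-line derivation from the definition above (the expectation is taken over the random mask, with x treated as fixed):

E(x') = rate \cdot 0 + (1 - rate) \cdot \frac{x}{1-rate} = x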

Below is an example to check it. [A single example cannot actually verify it — which is exactly how the misunderstanding in my issue arose; a single value (x) says nothing about the overall expectation (E(x')).]

>>> inputs10=tf.nn.dropout(inputs,rate=0.5)
>>> with tf.Session() as sess:
...     sess.run(tf.global_variables_initializer())
...     x,x10=sess.run([inputs,inputs10])
>>> np.mean(x)
-0.049901545
>>> np.mean(x10)
0.2292403
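As a rough sanity check (my own addition, not part of the original session; the tensor size is arbitrary), averaging over a much larger tensor shows the expectation being preserved in aggregate, even though any single small sample, like the one above, will not match:

import numpy as np
import tensorflow as tf

big = tf.random.normal((1000, 1000))
big_drop = tf.nn.dropout(big, rate=0.5)   # drop half, scale survivors by 2
with tf.Session() as sess:
    b, bd = sess.run([big, big_drop])
print(np.mean(b), np.mean(bd))            # the two means should come out close, though not identical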

May we meet again someday, and may you still remember the topics we once discussed.

[Edited at 17:55]

The tf2 usage above is wrong: unless training=True is passed, the Dropout layer has no effect at all, no matter what rate is.

>>> import tensorflow as tf
>>> inputs=tf.random.normal((4,3,2))
>>> dropout=tf.keras.layers.Dropout(0.5,seed=2021)
>>> inputs2=dropout(inputs)
>>> inputs3=dropout(inputs,training=True)
>>> import numpy as np
>>> np.array_equal(inputs.numpy(),inputs2.numpy())
True
>>> np.array_equal(inputs.numpy(),inputs3.numpy())
False
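A small follow-up sketch (hypothetical model, my own addition): the same training flag applies when the Dropout layer sits inside a Keras model, and model.fit() passes training=True for you, which is why this is easy to miss in ordinary training scripts.

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dropout(0.5)])
x = tf.ones((8, 8))
print(np.array_equal(x.numpy(), model(x, training=False).numpy()))  # True: dropout inactive at inference
print(np.array_equal(x.numpy(), model(x, training=True).numpy()))   # almost surely False: dropout active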
