In training mode, a dropout layer zeroes each neuron's activation in the fully connected layer with a specified probability p. Since a different random subset is dropped for every batch, training can loosely be seen as training many different models whose results are fused, which yields more robust weights, improves generalization, and reduces overfitting. The transformation is:
$$
h'=
\begin{cases}
0 & \text{with probability } p \\
\frac{h}{1-p} & \text{otherwise}
\end{cases}
$$
Note: the formula above shows that dropout is performed in two steps:
(1) zero out the current layer's activations with probability p;
(2) multiply the surviving values by $1/(1-p)$.
Why multiply the surviving values by $1/(1-p)$? To keep the expectation of the output unchanged:
$E(h') = 0\times p + \frac{h}{1-p}\times(1-p) = h$
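As a quick numeric check: with $p=0.5$ and $h=2$, $E(h')=0\times 0.5+\frac{2}{1-0.5}\times(1-0.5)=2=h$, so the scaled dropout output matches the original activation in expectation.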
import torch

# define a hand-rolled dropout function
def dropout_test(x, dropout):
    """
    :param x: input tensor
    :param dropout: probability of zeroing an element
    :return: a tensor masked by dropout and rescaled by 1/(1 - dropout)
    """
    # dropout must be a probability in [0, 1]
    assert 0 <= dropout <= 1
    # dropout == 0: nothing is dropped, return x unchanged
    if dropout == 0:
        return x
    # dropout == 1: every element is dropped, return all zeros
    if dropout == 1:
        return torch.zeros_like(x)
    # torch.rand returns a tensor filled with values drawn uniformly from [0, 1);
    # elements whose draw exceeds dropout are kept (mask = 1), the rest are zeroed (mask = 0)
    mask = (torch.rand(x.shape) > dropout).float()
    # apply the mask, then scale by 1/(1 - dropout) so the expectation matches the input
    return mask * x / (1.0 - dropout)

x = torch.rand(3, 4)
dropout = 0.8
output = dropout_test(x, dropout)
print(f"input={x}")
print(f"output={output}")
PyTorch ships dropout in two forms: a function, torch.nn.functional.dropout, and a wrapped module class, torch.nn.Dropout.
During training, it randomly zeroes elements of the input tensor with probability p, using samples from a Bernoulli distribution. Each channel is zeroed independently on every forward call.
import torch
from torch import nn

# module form: drop each element with probability p=0.6
my_dropout = nn.Dropout(p=0.6)
x_input = torch.randn(2, 3, 4)
x_output = my_dropout(x_input)
print(f"x_input={x_input}")
print(f"my_dropout={my_dropout}")
print(f"x_output={x_output}")
x_input=tensor([[[-1.1249, 0.4000, 1.3708, 0.7556],
[-0.3823, 0.4001, 0.0950, 0.8916],
[-1.2449, -0.8080, 0.2976, -2.3220]],
[[ 0.6720, 1.9750, -0.5260, -1.6763],
[ 1.2277, -0.0918, -1.4739, 0.3409],
[ 0.2559, -0.8436, -0.5755, -0.4961]]])
my_dropout=Dropout(p=0.6, inplace=False)
x_output=tensor([[[-0.0000, 0.0000, 0.0000, 1.8891],
[-0.0000, 0.0000, 0.2375, 2.2291],
[-0.0000, -0.0000, 0.0000, -0.0000]],
[[ 0.0000, 4.9375, -1.3149, -0.0000],
[ 0.0000, -0.0000, -0.0000, 0.8523],
[ 0.0000, -2.1089, -0.0000, -0.0000]]])
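The functional form behaves the same way; note also that nn.Dropout is only active in training mode, so switching the module to .eval() turns dropout off. A minimal sketch of both points:

import torch
from torch import nn
from torch.nn import functional as F

x = torch.randn(2, 3, 4)
# functional form: training=True applies dropout, training=False is the identity
y_func = F.dropout(x, p=0.6, training=True)
# module form in eval mode: dropout is disabled and the input passes through unchanged
drop = nn.Dropout(p=0.6)
drop.eval()
y_eval = drop(x)
print(torch.equal(x, y_eval))  # True: nothing is dropped in eval mode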
If training mode only drops part of the activations without scaling by 1/(1-rate), then test mode must multiply its values by (1-rate); that keeps the expectation the same in both modes.
A NumPy sketch of this scheme, with train_mode and test_mode versions:
import numpy as np
def train(rate, x, w1, b1, w2, b2):
    """
    description:
        if training does not scale the kept activations by 1/(1-rate),
        then test mode must multiply its outputs by (1.0-rate)
        to keep the same expectation
    :param rate: probability of dropout
    :param x: input tensor
    :param w1: weights of layer 1
    :param b1: bias of layer 1
    :param w2: weights of layer 2
    :param b2: bias of layer 2
    :return: output of layer 2
    """
    # layer 1: ReLU, then drop each unit with probability rate (no scaling)
    layer1 = np.maximum(0, np.dot(x, w1) + b1)
    mask1 = np.random.binomial(1, 1.0 - rate, layer1.shape)
    layer1 = layer1 * mask1
    # layer 2: ReLU, then drop each unit with probability rate (no scaling)
    layer2 = np.maximum(0, np.dot(layer1, w2) + b2)
    mask2 = np.random.binomial(1, 1.0 - rate, layer2.shape)
    layer2 = layer2 * mask2
    return layer2

def test(rate, x, w1, b1, w2, b2):
    # no dropout at test time; scale each layer by (1-rate) to match the training expectation
    layer1 = np.maximum(0, np.dot(x, w1) + b1)
    layer1 = layer1 * (1.0 - rate)
    layer2 = np.maximum(0, np.dot(layer1, w2) + b2)
    layer2 = layer2 * (1.0 - rate)
    return layer2
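A quick check that the two modes agree in expectation. The shapes and weights below are made-up illustrative values (nonnegative, with zero biases, so the ReLUs act as the identity and the expectations match exactly); the averaged train() output should approach the test() output:

# hypothetical shapes for illustration: 4 inputs -> 8 hidden -> 2 outputs
np.random.seed(0)
x = np.random.rand(1, 4)
w1, b1 = np.random.rand(4, 8), np.zeros(8)
w2, b2 = np.random.rand(8, 2), np.zeros(2)
rate = 0.5
avg = np.mean([train(rate, x, w1, b1, w2, b2) for _ in range(20000)], axis=0)
print("averaged train output:", avg)
print("test output:          ", test(rate, x, w1, b1, w2, b2))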
If training mode drops activations and multiplies by 1/(1-rate), the expectation is already preserved, so test mode needs no adjustment at all; training and test expectations stay consistent.
Again with train_mode and test_mode versions: because dropout is not used at test time, keeping the expectations of train_mode and test_mode consistent means the test-mode values can simply be left unchanged:

$E(h') = h$
import numpy as np

def another_train(rate, x, w1, b1, w2, b2):
    # inverted dropout: drop units with probability rate, then scale the survivors
    # by 1/(1-rate) during training so that no correction is needed at test time
    layer1 = np.maximum(0, np.dot(x, w1) + b1)
    mask1 = np.random.binomial(1, 1.0 - rate, layer1.shape)
    layer1 = layer1 * mask1
    layer1 = layer1 / (1.0 - rate)
    layer2 = np.maximum(0, np.dot(layer1, w2) + b2)
    mask2 = np.random.binomial(1, 1.0 - rate, layer2.shape)
    layer2 = layer2 * mask2
    layer2 = layer2 / (1.0 - rate)
    return layer2

def another_test(x, w1, b1, w2, b2):
    # test time: a plain forward pass, no dropout and no scaling
    layer1 = np.maximum(0, np.dot(x, w1) + b1)
    layer2 = np.maximum(0, np.dot(layer1, w2) + b2)
    return layer2
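The same Monte Carlo check works here; reusing the illustrative x, w1, b1, w2, b2 and rate from the previous sketch, the averaged another_train() output should approach another_test() directly, with no (1-rate) correction:

avg = np.mean([another_train(rate, x, w1, b1, w2, b2) for _ in range(20000)], axis=0)
print("averaged another_train output:", avg)
print("another_test output:          ", another_test(x, w1, b1, w2, b2))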
Summary. The criterion (criterion 1) is that the training and test expectations of each activation must match.
(1) Dropout + return h during training leads to a training expectation of $E_{train}(h')=h(1-p)$. There is no dropout at test time, so satisfying criterion 1 requires $E_{test}(h')=h(1-p)$, and the test-mode return value must be $h(1-p)$.
(2) Dropout + return $h/(1-p)$ during training (inverted dropout) leads to a training expectation of $E_{train}(h')=\frac{h}{1-p}\times(1-p)=h$. There is no dropout at test time, so satisfying criterion 1 requires $E_{test}(h')=h$, and test mode simply returns $h$.
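As a concrete check with $p=0.6$ and an activation $h=1$: scheme (1) gives $E_{train}(h')=1\times(1-0.6)=0.4$, so test mode must return $0.4$; scheme (2) gives $E_{train}(h')=\frac{1}{0.4}\times(1-0.6)=1$, so test mode returns $1$ untouched. PyTorch's nn.Dropout follows scheme (2), scaling by $\frac{1}{1-p}$ during training.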