15 - The Principle of Dropout and Its Implementation in TF/PyTorch/NumPy

Table of Contents

  • 1. Purpose
    • 1.2 Formula
  • 2. nn.Dropout
  • 3. NumPy implementation of dropout
    • 3.1 First implementation
    • 3.2 Second implementation
  • 4. Summary

1. Purpose

In training mode, a dropout layer zeroes out the outputs (activations) of neurons in a fully connected layer with a given probability p. The network seen by each batch is therefore slightly different, which can loosely be viewed as training many different sub-models at once. This yields more robust weights, acts like an implicit model ensemble, improves generalization, and reduces overfitting.

1.2 Formula

$$h'=\begin{cases} 0 & \text{with probability } p \\ \dfrac{h}{1-p} & \text{otherwise} \end{cases}$$
Note: the formula above shows that dropout is carried out in two steps:
(1) zero out the activations of the current layer with probability p;
(2) multiply the remaining values by $1/(1-p)$.

Why do we multiply the remaining values by $1/(1-p)$? To keep the expectation of the layer unchanged:
$$E(h')=0\times p+\frac{h}{1-p}\times (1-p)=h$$

import torch

# define dropout function
def dropout_test(x, dropout):
	"""

	:param x: input tensor
	:param dropout: probability for dropout
	:return: the input tensor with dropout applied (masked and rescaled)
	"""
	# dropout must be between 0 and 1
	assert 0 <= dropout <= 1
	# if dropout equals 0, just return x unchanged
	if dropout == 0:
		return x
	# if dropout equals 1, zero out every value of x
	if dropout == 1:
		return torch.zeros_like(x)
	# torch.rand returns a tensor filled with values drawn from the uniform distribution [0, 1)
	# compare each value with dropout: entries greater than dropout become 1, the rest become 0
	mask = (torch.rand(x.shape) > dropout).float()
	# multiply x by the mask and rescale by 1/(1 - dropout) to keep the expectation unchanged
	return mask * x / (1.0 - dropout)


input = torch.rand(3, 4)
dropout = 0.8
output = dropout_test(input, dropout)

print(f"input={input}")
print(f"output={output}")

2. nn.Dropout

PyTorch provides dropout in two forms: the function torch.nn.functional.dropout and the module class torch.nn.Dropout.
During training, elements of the input tensor are randomly zeroed with probability p using samples from a Bernoulli distribution. Each channel is zeroed out independently on every forward call. (A minimal sketch of the functional form follows the example below.)

import torch
from torch import nn
from torch.nn import functional as F


my_dropout = nn.Dropout(p=0.6)
x_input = torch.randn(2,3,4)
x_output = my_dropout(x_input)
print(f"x_input={x_input}")
print(f"my_dropout={my_dropout}")
print(f"x_output={x_output}")
x_input=tensor([[[-1.1249,  0.4000,  1.3708,  0.7556],
         [-0.3823,  0.4001,  0.0950,  0.8916],
         [-1.2449, -0.8080,  0.2976, -2.3220]],

        [[ 0.6720,  1.9750, -0.5260, -1.6763],
         [ 1.2277, -0.0918, -1.4739,  0.3409],
         [ 0.2559, -0.8436, -0.5755, -0.4961]]])
my_dropout=Dropout(p=0.6, inplace=False)
x_output=tensor([[[-0.0000,  0.0000,  0.0000,  1.8891],
         [-0.0000,  0.0000,  0.2375,  2.2291],
         [-0.0000, -0.0000,  0.0000, -0.0000]],

        [[ 0.0000,  4.9375, -1.3149, -0.0000],
         [ 0.0000, -0.0000, -0.0000,  0.8523],
         [ 0.0000, -2.1089, -0.0000, -0.0000]]])
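For comparison, here is a minimal sketch of the functional form (the tensor x and p=0.6 are just illustrative values): torch.nn.functional.dropout takes a training flag instead of relying on the module's train()/eval() state.

import torch
from torch.nn import functional as F

x = torch.randn(2, 3)
# training=True applies the random mask and the 1/(1-p) rescaling
y_train = F.dropout(x, p=0.6, training=True)
# training=False returns the input unchanged (test-time behaviour)
y_test = F.dropout(x, p=0.6, training=False)
print(torch.equal(y_test, x))  # True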

3. NumPy implementation of dropout

3.1 First implementation:

In training mode we only drop some of the activations and do not multiply by the factor 1/(1-rate); the values at test time then have to be multiplied by (1-rate) so that the expectation stays the same.

  • train_mode:
    $$h'=\begin{cases} 0 & \text{with probability } p \\ h & \text{with probability } 1-p \end{cases}$$
    $$E(h')=0\times p + h\times(1-p)=h(1-p)$$
  • test_mode:
    Since dropout is not applied at test time, to keep the expectations of train_mode and test_mode equal we multiply all values by (1-p):
    $$E(h')=h(1-p)$$
import numpy as np

def train(rate, x, w1, b1, w2, b2):
	"""
	description:
	if training does not rescale the output by 1/(1 - rate),
	then the test pass must multiply by (1.0 - rate) to keep the same expectation
	:param rate: probability of dropout
	:param x: input tensor
	:param w1: weight_1 of layer1
	:param b1: bias_1 of layer1
	:param w2: weight_2 of layer2
	:param b2: bias_2 of layer2
	:return: layer2
	"""
	layer1 = np.maximum(0, (np.dot(x, w1) + b1))
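	# mask1: each entry is 1 with probability (1 - rate) and 0 with probability rate (Bernoulli samples)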
	mask1 = np.random.binomial(1, 1.0 - rate, layer1.shape)
	layer1 = layer1 * mask1

	layer2 = np.maximum(0, (np.dot(layer1, w2) + b2))
	mask2 = np.random.binomial(1, 1.0 - rate, layer2.shape)
	layer2 = layer2 * mask2
	return layer2


def test(rate, x, w1, b1, w2, b2):
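	# test mode: no random mask; scale the activations by (1 - rate) to match the training-time expectation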
	layer1 = np.maximum(0, np.dot(x, w1) + b1)
	layer1 = layer1 * (1.0 - rate)

	layer2 = np.maximum(0, np.dot(layer1, w2) + b2)
	layer2 = layer2 * (1.0 - rate)

	return layer2
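A quick sanity check of the expectation argument, reusing the train/test functions above (the sizes and weight initialisation below are illustrative assumptions): averaged over many stochastic forward passes, the training output should be roughly comparable to the test output.

import numpy as np

np.random.seed(0)
rate = 0.5
x = np.random.randn(256, 20)                      # 256 samples, 20 features
w1, b1 = 0.1 * np.random.randn(20, 64), np.zeros(64)
w2, b2 = 0.1 * np.random.randn(64, 10), np.zeros(10)

# average the stochastic training output over many passes
train_mean = np.mean([train(rate, x, w1, b1, w2, b2).mean() for _ in range(200)])
test_mean = test(rate, x, w1, b1, w2, b2).mean()
print(train_mean, test_mean)                      # roughly comparable (the ReLU makes the match approximate)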

3.2 Second implementation:

In training mode we drop some of the activations and multiply the rest by 1/(1-rate), which keeps the expectation unchanged; test mode then needs no extra processing, and the training and test expectations stay consistent.

  • train_mode:
    $$h'=\begin{cases} 0 & \text{with probability } p \\ \dfrac{h}{1-p} & \text{with probability } 1-p \end{cases}$$
    $$E(h')=0\times p + \frac{h}{1-p}\times(1-p)=h$$
  • test_mode:
    Since dropout is not applied at test time, to keep the expectations of train_mode and test_mode equal the values can simply be left unchanged at test time:
    $$E(h')=h$$

import numpy as np

def another_train(rate, x, w1, b1, w2, b2):
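	# inverted dropout: apply the mask and rescale by 1/(1 - rate) already at training time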
	layer1 = np.maximum(0, (np.dot(x, w1) + b1))
	mask1 = np.random.binomial(1, 1.0 - rate, layer1.shape)
	layer1 = layer1 * mask1
	layer1 = layer1 / (1.0 - rate)
	
	layer2 = np.maximum(0, (np.dot(layer1, w2) + b2))
	mask2 = np.random.binomial(1, 1.0 - rate, layer2.shape)
	layer2 = layer2 * mask2
	layer2 = layer2 / (1.0 - rate)

	return layer2


def another_test(x, w1, b1, w2, b2):
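	# test mode with inverted dropout: a plain forward pass, no dropout-related correction needed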
	layer1 = np.maximum(0, np.dot(x, w1) + b1)
	layer2 = np.maximum(0, np.dot(layer1, w2) + b2)
	return layer2
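The expectation claim for inverted dropout can also be checked numerically on a single vector of activations (a minimal sketch; h is just made-up data):

import numpy as np

np.random.seed(1)
rate = 0.5
h = np.random.rand(100000)                    # fake positive activations

# inverted dropout: mask, then rescale by 1/(1 - rate)
mask = np.random.binomial(1, 1.0 - rate, h.shape)
h_drop = h * mask / (1.0 - rate)

print(h.mean(), h_drop.mean())                # the two means are close: E(h') = h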

4. Summary

  • Rule 1:
    Whichever implementation we choose (1 or 2), the training and test expectations must be equal:
    $$E_{train}(h')=E_{test}(h')$$
  • Rule 2:
    Training and testing work differently: training applies dropout, testing does not.
  • Rule 3:
    How the two modes differ:
    (1) Mode 1
    Training: dropout + return h, so the training expectation is $E_{train}(h')=h(1-p)$
    Testing: no dropout; to satisfy Rule 1, $E_{test}(h')=h(1-p)$, so the test-time return value must be h(1-p).
    (2) Mode 2
    Training: dropout + return h/(1-p), so the training expectation is $E_{train}(h')=h/(1-p)\times(1-p)=h$
    Testing: no dropout; to satisfy Rule 1, $E_{test}(h')=h$, so the test-time return value is simply h.
    Because mode 2 leaves the test-time computation untouched, PyTorch adopts mode 2 (inverted dropout).
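This choice is easy to verify (a small sketch; p=0.6 is just an example value): in training mode nn.Dropout rescales the surviving entries by 1/(1-p), and after .eval() it becomes the identity, which is exactly the behaviour of mode 2.

import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.6)
x = torch.ones(8)

print(drop(x))    # training mode: each entry is either 0.0 or 1/(1 - 0.6) = 2.5
drop.eval()       # switch to test mode
print(drop(x))    # identity: all entries stay 1.0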
