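In deep learning research, the reproducibility of published results is a serious problem. Never mind the code released with papers: sometimes even code you wrote yourself is hard to reproduce. With the same network architecture, the same dataset, and the same machine, the results of two training runs can still differ. This is largely caused by randomness in the training process.
Tip: PyTorch reproducibility is also affected by the PyTorch version and the operating system platform.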
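Since results can depend on version and platform, it may help to log both next to every run. The snippet below is a minimal sketch of that idea, not part of the original example; the helper name describe_environment is hypothetical, and it assumes torch may or may not be installed.

```python
import importlib.util
import platform
import sys


def describe_environment():
    """Collect version and platform details worth logging next to a training run."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    # torch is imported lazily so the helper also works where it is not installed.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # None for CPU-only builds
        info["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Saving this output together with the loss log makes it easier to tell whether two differing runs were actually produced in the same environment.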
The example below shows how to set PyTorch's random seed.
# Train a model to fit a line y=mx using given data points
import torch
## Uncomment the two lines below to make the training reproducible.
#seed = 3
#torch.manual_seed(seed)
# set device to CUDA if available, else to CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Device:', device)
# N - number of data points
# n_inputs - number of input variables
# n_hidden - number of units in the hidden layer
# n_outputs - number of outputs
N, n_inputs, n_hidden, n_outputs = 8, 1, 100, 1
# Input 8 pairs of (x, y) values
x = torch.tensor([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]], device=device)
y = torch.tensor([[0.0], [10.0], [20.0], [30.0], [40.0], [50.0], [60.0], [70.0]], device=device)
# Make a 3 layer neural network with an input layer, hidden layer and output layer
model = torch.nn.Sequential(
    torch.nn.Linear(n_inputs, n_hidden),
    torch.nn.ReLU(),
    torch.nn.Linear(n_hidden, n_outputs)
)
# Move the model to the device
model.to(device)
# Define the loss function to be the mean squared error loss
loss_fn = torch.nn.MSELoss(reduction='sum')
# Do forward pass through the data points, compute loss, compute gradients using backward propagation and update the weights using the gradients.
learning_rate = 1e-4
for t in range(1000):
    y_out = model(x)
    loss = loss_fn(y_out, y)
    if t % 100 == 99:
        print(t, loss.item())
    # print(y_out)
    # Gradients are made to zero prior to backward pass.
    model.zero_grad()
    loss.backward()
    # Update weights using gradient descent
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
Run the code above twice.
Output of the first run:
Device: cuda
99 13.865872383117676
199 5.772928714752197
299 3.566026210784912
399 2.5292069911956787
499 1.8655864000320435
599 1.3915504217147827
699 1.0447190999984741
799 0.7871285676956177
899 0.5957959890365601
999 0.45342087745666504
Output of the second run:
Device: cuda
99 6.1840715408325195
199 3.0933115482330322
299 1.9355353116989136
399 1.3561317920684814
499 0.998731791973114
599 0.7554249167442322
699 0.5831341743469238
799 0.45905551314353943
899 0.3688798248767853
999 0.30284053087234497
The two runs differ. Now uncomment the two seed lines in the code:
seed = 3
torch.manual_seed(seed)
Run the code twice more.
Output of the first run:
Device: cuda
99 10.655608177185059
199 3.6195263862609863
299 1.653144359588623
399 0.9989959001541138
499 0.712784469127655
599 0.5509689450263977
699 0.44407185912132263
799 0.368024617433548
899 0.3116675019264221
999 0.2681158781051636
Output of the second run:
Device: cuda
99 10.655608177185059
199 3.6195263862609863
299 1.653144359588623
399 0.9989959001541138
499 0.712784469127655
599 0.5509689450263977
699 0.44407185912132263
799 0.368024617433548
899 0.3116675019264221
999 0.2681158781051636
This time the two runs produce identical results.
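One source of randomness in this small example is the random initialization of the layer weights, which draws from PyTorch's global generator. The sketch below illustrates that under the assumption of a CPU run; the helper make_layer is hypothetical and only mirrors the first layer of the model.

```python
import torch


def make_layer(seed):
    """Seed PyTorch's global generator, then build a freshly initialized layer."""
    torch.manual_seed(seed)
    return torch.nn.Linear(1, 100)


a = make_layer(3)
b = make_layer(3)
c = make_layer(4)

# Same seed: the initial weights and biases match exactly.
same_seed_equal = torch.equal(a.weight, b.weight) and torch.equal(a.bias, b.bias)
# Different seed: the initial weights differ.
different_seed_equal = torch.equal(a.weight, c.weight)

print("same seed, same weights:", same_seed_equal)
print("different seed, same weights:", different_seed_equal)
```

Because the seed has to be set before the model is constructed, torch.manual_seed belongs at the top of the script, as in the training example.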
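The training example above only sets PyTorch's random seed. Once convolution operations are involved, that is not enough, because cuDNN is then used to accelerate the GPU operations. In practice, adding the following lines takes care of it: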
seed = 3
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
Add the code above to the PyTorch image classification code.
Output of the first run:
Device: cuda
[1, 2000] loss: 2.192
[1, 4000] loss: 1.824
[1, 6000] loss: 1.613
[1, 8000] loss: 1.532
[1, 10000] loss: 1.470
[1, 12000] loss: 1.429
[2, 2000] loss: 1.378
[2, 4000] loss: 1.317
[2, 6000] loss: 1.291
[2, 8000] loss: 1.298
[2, 10000] loss: 1.264
[2, 12000] loss: 1.255
Finished Training
Output of the second run:
Device: cuda
[1, 2000] loss: 2.192
[1, 4000] loss: 1.824
[1, 6000] loss: 1.613
[1, 8000] loss: 1.532
[1, 10000] loss: 1.470
[1, 12000] loss: 1.429
[2, 2000] loss: 1.378
[2, 4000] loss: 1.317
[2, 6000] loss: 1.291
[2, 8000] loss: 1.298
[2, 10000] loss: 1.264
[2, 12000] loss: 1.255
Finished Training
If NumPy is also involved, its seed needs to be set as well (the code below also seeds Python's random module):
import random
import numpy as np
import torch

seed = 3
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
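The settings above can be collected into one helper that is called once at the start of a script. This is a minimal sketch rather than a definitive recipe: the names set_seed, deterministic_cudnn, and draw are hypothetical, and the explicit torch.cuda.manual_seed_all call is an addition not present in the original snippet.

```python
import random

import numpy as np
import torch


def set_seed(seed, deterministic_cudnn=True):
    """Seed the Python, NumPy and PyTorch RNGs; optionally request deterministic cuDNN."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Explicitly seed every CUDA device; silently ignored when CUDA is unavailable.
    torch.cuda.manual_seed_all(seed)
    if deterministic_cudnn:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


def draw():
    """Draw one value from each seeded generator."""
    return random.random(), float(np.random.rand()), torch.rand(1).item()


set_seed(3)
first = draw()
set_seed(3)
second = draw()
print("draws match after re-seeding:", first == second)
```

Disabling cudnn.benchmark can make some convolutional models slower, so the flag is left as a parameter here instead of being hard-coded.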