A Collection of Common PyTorch Code Snippets

The code in this article is based on PyTorch 1.0 and uses the following packages:

import collections
import os
import shutil
import tqdm
import numpy as np
import PIL.Image
import torch
import torchvision

1. Basic Configuration

Check the PyTorch version

torch.__version__               # PyTorch version
torch.version.cuda              # Corresponding CUDA version
torch.backends.cudnn.version()  # Corresponding cuDNN version
torch.cuda.get_device_name(0)   # GPU type

Update PyTorch

PyTorch will be installed under the anaconda3/lib/python3.7/site-packages/torch/ directory.

conda update pytorch torchvision -c pytorch

Fix the random seed

torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
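
For stricter end-to-end reproducibility, it is common to also seed Python's built-in and NumPy's random number generators (a minimal sketch; combine with the cuDNN deterministic setting below):

import random
random.seed(0)     # Python's built-in RNG
np.random.seed(0)  # NumPy's RNG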

Run the program on specific GPU cards

Specify the environment variable on the command line:

CUDA_VISIBLE_DEVICES=0,1 python train.py

Or specify it in code:

os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

Check whether CUDA is available

torch.cuda.is_available()

Enable cuDNN benchmark mode

Benchmark mode speeds up computation by letting cuDNN auto-tune its algorithms, but because the chosen algorithms can be nondeterministic, forward results differ slightly between runs.

torch.backends.cudnn.benchmark = True

To avoid this fluctuation in results, set:

torch.backends.cudnn.deterministic = True

Free GPU memory

Sometimes GPU memory is not released promptly after the program is interrupted with Control-C, and you have to clear it manually. Within PyTorch you can run:

torch.cuda.empty_cache()

Or, on the command line, first use ps to find the program's PID, then use kill to terminate the process:

ps aux | grep python
kill -9 [pid]

Or directly reset a GPU whose memory has not been cleared:

nvidia-smi --gpu-reset -i [gpu_id]

2. Tensor Processing

Basic tensor information

tensor.type()  # Data type
tensor.size()  # Shape of the tensor. It is a subclass of Python tuple
tensor.dim()   # Number of dimensions

Data type conversion

# Set default tensor type. Float in PyTorch is much faster than double.
torch.set_default_tensor_type(torch.FloatTensor)

# Type conversions.
tensor = tensor.cuda()
tensor = tensor.cpu()
tensor = tensor.float()
tensor = tensor.long()

Conversion between torch.Tensor and np.ndarray

# torch.Tensor -> np.ndarray.
ndarray = tensor.cpu().numpy()

# np.ndarray -> torch.Tensor.
tensor = torch.from_numpy(ndarray).float()
tensor = torch.from_numpy(ndarray.copy()).float()  # If ndarray has negative stride

Conversion between torch.Tensor and PIL.Image

Image tensors in PyTorch use the channel-first D×H×W layout by default (N×D×H×W for a batch), with values in [0, 1], so conversion requires a transpose and rescaling.

# torch.Tensor -> PIL.Image.
image = PIL.Image.fromarray(torch.clamp(tensor * 255, min=0, max=255
    ).byte().permute(1, 2, 0).cpu().numpy())
image = torchvision.transforms.functional.to_pil_image(tensor)  # Equivalent way

# PIL.Image -> torch.Tensor.
tensor = torch.from_numpy(np.asarray(PIL.Image.open(path))
    ).permute(2, 0, 1).float() / 255
tensor = torchvision.transforms.functional.to_tensor(PIL.Image.open(path))  # Equivalent way

Conversion between np.ndarray and PIL.Image

# np.ndarray -> PIL.Image.
image = PIL.Image.fromarray(ndarray.astype(np.uint8))

# PIL.Image -> np.ndarray.
ndarray = np.asarray(PIL.Image.open(path))

Extract the value from a single-element tensor

This is especially useful when tracking the loss during training; otherwise the computation graph keeps accumulating, and GPU memory usage grows larger and larger.

value = tensor.item()

Tensor reshaping

Reshaping is often needed to feed convolutional feature maps into a fully connected layer. Compared with torch.view, torch.reshape automatically handles non-contiguous input tensors.

tensor = torch.reshape(tensor, shape)
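
A quick illustration of the difference (sketch):

t = torch.rand(2, 3).t()  # Transposing makes the tensor non-contiguous
# t.view(6)               # Would raise a RuntimeError on the non-contiguous tensor
t = t.reshape(6)          # reshape handles the non-contiguous case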

Shuffle order

tensor = tensor[torch.randperm(tensor.size(0))]  # Shuffle the first dimension

Horizontal flip

PyTorch does not support negative-stride operations such as tensor[::-1]; a horizontal flip can be implemented with tensor indexing.

# Assume tensor has shape N*D*H*W.
tensor = tensor[:, :, :, torch.arange(tensor.size(3) - 1, -1, -1).long()]

Copy a tensor

There are three ways to copy a tensor, for different needs.

# Operation              | New/Shared memory | Still in computation graph |
tensor.clone()           # New               | Yes
tensor.detach()          # Shared            | No
tensor.detach().clone()  # New               | No

Concatenate tensors

Note the difference between torch.cat and torch.stack: torch.cat concatenates along a given dimension, while torch.stack adds a new dimension. For example, given three 10×5 tensors as input, torch.cat produces a 30×5 tensor, while torch.stack produces a 3×10×5 tensor.

tensor = torch.cat(list_of_tensors, dim=0)
tensor = torch.stack(list_of_tensors, dim=0)
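
Illustrating the example above (sketch):

list_of_tensors = [torch.rand(10, 5) for _ in range(3)]
assert torch.cat(list_of_tensors, dim=0).size() == (30, 5)
assert torch.stack(list_of_tensors, dim=0).size() == (3, 10, 5)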

Convert integer labels to one-hot encoding

Labels in PyTorch start from 0 by default.

N = tensor.size(0)
one_hot = torch.zeros(N, num_classes).long()
one_hot.scatter_(dim=1, index=torch.unsqueeze(tensor, dim=1),
                 src=torch.ones(N, num_classes).long())

Get non-zero/zero elements

torch.nonzero(tensor)               # Indices of non-zero elements
torch.nonzero(tensor == 0)          # Indices of zero elements
torch.nonzero(tensor).size(0)       # Number of non-zero elements
torch.nonzero(tensor == 0).size(0)  # Number of zero elements

Test whether two tensors are equal

torch.allclose(tensor1, tensor2)  # float tensors
torch.equal(tensor1, tensor2)     # int tensors

Tensor expansion

# Expand tensor of shape 64*512 to shape 64*512*7*7.
torch.reshape(tensor, (64, 512, 1, 1)).expand(64, 512, 7, 7)

Matrix multiplication

# Matrix multiplication: (m*n) * (n*p) -> (m*p).
result = torch.mm(tensor1, tensor2)

# Batch matrix multiplication: (b*m*n) * (b*n*p) -> (b*m*p).
result = torch.bmm(tensor1, tensor2)

# Element-wise multiplication.
result = tensor1 * tensor2

Compute pairwise Euclidean distances between two sets of data

# X1 is of shape m*d, X2 is of shape n*d.
dist = torch.sqrt(torch.sum((X1[:, None, :] - X2) ** 2, dim=2))
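
Newer PyTorch releases (1.1+) also provide a built-in for the same computation; a sketch, if your version has it:

dist = torch.cdist(X1, X2)  # Pairwise Euclidean distances, shape m*n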

3. Model Definition

Convolutional layers

The most commonly used convolutional layer configurations are:

conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=True)
conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=True)

If the convolutional layer configuration is complex and the output size is hard to compute by hand, the following visualization tool can help:

Convolution Visualizer: ezyang.github.io

GAP (global average pooling) layer

gap = torch.nn.AdaptiveAvgPool2d(output_size=1)
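
Usage sketch, assuming X is an N*D*H*W feature map:

X = gap(X)                             # N*D*1*1: spatial dimensions collapsed
X = torch.reshape(X, (X.size(0), -1))  # N*D: flatten for a fully connected layer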

Bilinear pooling [1]

X = torch.reshape(X, (N, D, H * W))                   # Assume X has shape N*D*H*W
X = torch.bmm(X, torch.transpose(X, 1, 2)) / (H * W)  # Bilinear pooling
assert X.size() == (N, D, D)
X = torch.reshape(X, (N, D * D))
X = torch.sign(X) * torch.sqrt(torch.abs(X) + 1e-5)   # Signed-sqrt normalization
X = torch.nn.functional.normalize(X)                  # L2 normalization

Synchronized BN (batch normalization) across multiple GPUs

When running code on multiple GPUs with torch.nn.DataParallel, PyTorch's BN layers by default compute the mean and standard deviation independently on each card. Synchronized BN uses the data on all cards together to compute the BN statistics, which mitigates inaccurate estimates of the mean and standard deviation when the batch size is small, and is an effective trick for improving performance in tasks such as object detection.

vacancy/Synchronized-BatchNorm-PyTorch (github.com)

PyTorch now officially supports synchronized BN:

sync_bn = torch.nn.SyncBatchNorm(num_features, eps=1e-05, momentum=0.1, affine=True,
                                 track_running_stats=True)

Convert all BN layers of an existing network to synchronized BN layers

def convertBNtoSyncBN(module, process_group=None):
    '''Recursively replace all BN layers with SyncBN layers.

    Args:
        module[torch.nn.Module]. Network
    '''
    if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
        sync_bn = torch.nn.SyncBatchNorm(module.num_features, module.eps, module.momentum,
                                         module.affine, module.track_running_stats, process_group)
        sync_bn.running_mean = module.running_mean
        sync_bn.running_var = module.running_var
        if module.affine:
            # Wrap in Parameter so the assignment to a registered parameter is valid.
            sync_bn.weight = torch.nn.Parameter(module.weight.clone().detach())
            sync_bn.bias = torch.nn.Parameter(module.bias.clone().detach())
        return sync_bn
    else:
        for name, child_module in module.named_children():
            setattr(module, name, convertBNtoSyncBN(child_module, process_group=process_group))
        return module
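
Note that PyTorch 1.1 and later also ship a built-in converter, which may be preferable when available:

# Built-in alternative in PyTorch >= 1.1 (sketch):
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)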

BN-style moving average

To implement a BN-style moving average, update the running average with an in-place operation in the forward function.

class BN(torch.nn.Module):
    def __init__(self):
        ...
        self.register_buffer('running_mean', torch.zeros(num_features))

    def forward(self, X):
        ...
        self.running_mean += momentum * (current - self.running_mean)

Count the total number of model parameters

num_parameters = sum(torch.numel(parameter) for parameter in model.parameters())
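
To count only the trainable parameters, filter on requires_grad (sketch):

num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)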

Print model information like Keras's model.summary()

sksq96/pytorch-summary (github.com)

Model weight initialization

Note the difference between model.modules() and model.children(): model.modules() recursively iterates over all submodules of the model, while model.children() only iterates over the immediate children.

# Common practice for initialization.
for layer in model.modules():
    if isinstance(layer, torch.nn.Conv2d):
        torch.nn.init.kaiming_normal_(layer.weight, mode='fan_out',
                                      nonlinearity='relu')
        if layer.bias is not None:
            torch.nn.init.constant_(layer.bias, val=0.0)
    elif isinstance(layer, torch.nn.BatchNorm2d):
        torch.nn.init.constant_(layer.weight, val=1.0)
        torch.nn.init.constant_(layer.bias, val=0.0)
    elif isinstance(layer, torch.nn.Linear):
        torch.nn.init.xavier_normal_(layer.weight)
        if layer.bias is not None:
            torch.nn.init.constant_(layer.bias, val=0.0)

# Initialization with a given tensor.
layer.weight = torch.nn.Parameter(tensor)

Use a pre-trained model for some layers

Note that if the saved model was wrapped in torch.nn.DataParallel, the current model also needs to be wrapped the same way; torch.nn.DataParallel(model).module == model.

model.load_state_dict(torch.load('model.pth'), strict=False)
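
If a checkpoint was saved from a torch.nn.DataParallel model but you want to load it into a plain model, one common workaround is to strip the 'module.' prefix from the keys (a sketch; assumes the file stores the state dict directly):

state_dict = torch.load('model.pth')
# Remove the 'module.' prefix that DataParallel adds to parameter names.
state_dict = {k.replace('module.', '', 1): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=False)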

Load a model saved on GPU onto the CPU

model.load_state_dict(torch.load('model.pth', map_location='cpu'))

4. Data Preparation, Feature Extraction, and Fine-tuning

Image block shuffling (image shuffle) / region confusion mechanism (RCM) [2]

# X is a torch.Tensor of size N*D*H*W.
# Shuffle rows.
Q = (torch.unsqueeze(torch.arange(num_blocks), dim=1) * torch.ones(1, num_blocks).long()
     + torch.randint(low=-neighbour, high=neighbour, size=(num_blocks, num_blocks)))
Q = torch.argsort(Q, dim=0)
assert Q.size() == (num_blocks, num_blocks)
X = [torch.chunk(row, chunks=num_blocks, dim=2)
     for row in torch.chunk(X, chunks=num_blocks, dim=1)]
X = [[X[Q[i, j].item()][j] for j in range(num_blocks)]
     for i in range(num_blocks)]

# Shuffle columns.
Q = (torch.ones(num_blocks, 1).long() * torch.unsqueeze(torch.arange(num_blocks), dim=0)
     + torch.randint(low=-neighbour, high=neighbour, size=(num_blocks, num_blocks)))
Q = torch.argsort(Q, dim=1)
assert Q.size() == (num_blocks, num_blocks)
X = [[X[i][Q[i, j].item()] for j in range(num_blocks)]
     for i in range(num_blocks)]
Y = torch.cat([torch.cat(row, dim=2) for row in X], dim=1)

Get basic information about video data

import cv2
video = cv2.VideoCapture(mp4_path)
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
fps = int(video.get(cv2.CAP_PROP_FPS))
video.release()

TSN: sample one frame per segment [3]

K = self._num_segments
if is_train:
    if num_frames > K:
        # Random index for each segment.
        frame_indices = torch.randint(
            high=num_frames // K, size=(K,), dtype=torch.long)
        frame_indices += num_frames // K * torch.arange(K)
    else:
        frame_indices = torch.randint(
            high=num_frames, size=(K - num_frames,), dtype=torch.long)
        frame_indices = torch.sort(torch.cat((
            torch.arange(num_frames), frame_indices)))[0]
else:
    if num_frames > K:
        # Middle index for each segment.
        frame_indices = num_frames // K // 2
        frame_indices += num_frames // K * torch.arange(K)
    else:
        frame_indices = torch.sort(torch.cat((
            torch.arange(num_frames), torch.arange(K - num_frames))))[0]
assert frame_indices.size() == (K,)
return [frame_indices[i] for i in range(K)]

Extract convolutional features of a given layer from an ImageNet pre-trained model

# VGG-16 relu5-3 feature.
model = torchvision.models.vgg16(pretrained=True).features[:-1]
# VGG-16 pool5 feature.
model = torchvision.models.vgg16(pretrained=True).features
# VGG-16 fc7 feature.
model = torchvision.models.vgg16(pretrained=True)
model.classifier = torch.nn.Sequential(*list(model.classifier.children())[:-3])
# ResNet GAP feature.
model = torchvision.models.resnet18(pretrained=True)
model = torch.nn.Sequential(collections.OrderedDict(
    list(model.named_children())[:-1]))

with torch.no_grad():
    model.eval()
    conv_representation = model(image)

Extract multi-layer convolutional features from an ImageNet pre-trained model

class FeatureExtractor(torch.nn.Module):
    """Helper class to extract several convolution features from the given
    pre-trained model.

    Attributes:
        _model, torch.nn.Module.
        _layers_to_extract, list or set.

    Example:
        >>> model = torchvision.models.resnet152(pretrained=True)
        >>> model = torch.nn.Sequential(collections.OrderedDict(
                list(model.named_children())[:-1]))
        >>> conv_representation = FeatureExtractor(
                pretrained_model=model,
                layers_to_extract={'layer1', 'layer2', 'layer3', 'layer4'})(image)
    """
    def __init__(self, pretrained_model, layers_to_extract):
        torch.nn.Module.__init__(self)
        self._model = pretrained_model
        self._model.eval()
        self._layers_to_extract = set(layers_to_extract)

    def forward(self, x):
        with torch.no_grad():
            conv_representation = []
            for name, layer in self._model.named_children():
                x = layer(x)
                if name in self._layers_to_extract:
                    conv_representation.append(x)
            return conv_representation

Other pre-trained models

Cadene/pretrained-models.pytorch (github.com)

Fine-tune the fully connected layer

model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
model.fc = torch.nn.Linear(512, 100)  # Replace the last fc layer
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-4)

Fine-tune the fully connected layer with a larger learning rate and the convolutional layers with a smaller one

model = torchvision.models.resnet18(pretrained=True)
finetuned_parameters = list(map(id, model.fc.parameters()))
conv_parameters = (p for p in model.parameters() if id(p) not in finetuned_parameters)
parameters = [{'params': conv_parameters, 'lr': 1e-3},
              {'params': model.fc.parameters()}]
optimizer = torch.optim.SGD(parameters, lr=1e-2, momentum=0.9, weight_decay=1e-4)

5. Model Training

Common training and validation data preprocessing

ToTensor converts a PIL.Image or an np.ndarray of shape H×W×D with values in [0, 255] into a torch.Tensor of shape D×H×W with values in [0.0, 1.0].

train_transform = torchvision.transforms.Compose([
    torchvision.transforms.RandomResizedCrop(size=224,
                                             scale=(0.08, 1.0)),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225)),
])
val_transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225)),
])

Basic training code skeleton

for t in range(80):
    for images, labels in tqdm.tqdm(train_loader, desc='Epoch %3d' % (t + 1)):
        images, labels = images.cuda(), labels.cuda()
        scores = model(images)
        loss = loss_function(scores, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Label smoothing [4]

for images, labels in train_loader:
    images, labels = images.cuda(), labels.cuda()
    N = labels.size(0)
    # C is the number of classes.
    smoothed_labels = torch.full(size=(N, C), fill_value=0.1 / (C - 1)).cuda()
    smoothed_labels.scatter_(dim=1, index=torch.unsqueeze(labels, dim=1), value=0.9)

    score = model(images)
    log_prob = torch.nn.functional.log_softmax(score, dim=1)
    loss = -torch.sum(log_prob * smoothed_labels) / N
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Mixup [5]

beta_distribution = torch.distributions.beta.Beta(alpha, alpha)
for images, labels in train_loader:
    images, labels = images.cuda(), labels.cuda()

    # Mixup images.
    lambda_ = beta_distribution.sample([]).item()
    index = torch.randperm(images.size(0)).cuda()
    mixed_images = lambda_ * images + (1 - lambda_) * images[index, :]

    # Mixup loss.
    scores = model(mixed_images)
    loss = (lambda_ * loss_function(scores, labels)
            + (1 - lambda_) * loss_function(scores, labels[index]))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

L1 regularization

l1_regularization = torch.nn.L1Loss(reduction='sum')
loss = ...  # Standard cross-entropy loss
for param in model.parameters():
    loss += lambda_ * torch.sum(torch.abs(param))
loss.backward()

Do not apply L2 regularization/weight decay to bias terms

bias_list = (param for name, param in model.named_parameters() if name[-4:] == 'bias')
others_list = (param for name, param in model.named_parameters() if name[-4:] != 'bias')
parameters = [{'params': bias_list, 'weight_decay': 0},
              {'params': others_list}]
optimizer = torch.optim.SGD(parameters, lr=1e-2, momentum=0.9, weight_decay=1e-4)

Gradient clipping

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=20)
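
Typical placement in the training loop (sketch): clip after backward, before the optimizer step.

optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=20)
optimizer.step()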

Compute accuracy from softmax outputs

score = model(images)
prediction = torch.argmax(score, dim=1)
num_correct = torch.sum(prediction == labels).item()
accuracy = num_correct / labels.size(0)

Visualize the forward computation graph of a model

szagoruyko/pytorchviz (github.com)

Visualize learning curves

There are two options: Visdom, developed by Facebook, and TensorBoard (still experimental at the time of writing).

facebookresearch/visdom (github.com)
torch.utils.tensorboard - PyTorch master documentation (pytorch.org)

# Example using Visdom.
import visdom
vis = visdom.Visdom(env='Learning curve', use_incoming_socket=False)
assert vis.check_connection()
vis.close()
options = collections.namedtuple('Options', ['loss', 'acc', 'lr'])(
    loss={'xlabel': 'Epoch', 'ylabel': 'Loss', 'showlegend': True},
    acc={'xlabel': 'Epoch', 'ylabel': 'Accuracy', 'showlegend': True},
    lr={'xlabel': 'Epoch', 'ylabel': 'Learning rate', 'showlegend': True})

for t in range(80):
    train(...)
    val(...)
    vis.line(X=torch.Tensor([t + 1]), Y=torch.Tensor([train_loss]),
             name='train', win='Loss', update='append', opts=options.loss)
    vis.line(X=torch.Tensor([t + 1]), Y=torch.Tensor([val_loss]),
             name='val', win='Loss', update='append', opts=options.loss)
    vis.line(X=torch.Tensor([t + 1]), Y=torch.Tensor([train_acc]),
             name='train', win='Accuracy', update='append', opts=options.acc)
    vis.line(X=torch.Tensor([t + 1]), Y=torch.Tensor([val_acc]),
             name='val', win='Accuracy', update='append', opts=options.acc)
    vis.line(X=torch.Tensor([t + 1]), Y=torch.Tensor([lr]),
             win='Learning rate', update='append', opts=options.lr)

Get the current learning rate

# If there is one global learning rate (which is the common case).
lr = next(iter(optimizer.param_groups))['lr']

# If there are multiple learning rates for different layers.
all_lr = []
for param_group in optimizer.param_groups:
    all_lr.append(param_group['lr'])

Learning rate decay

# Reduce learning rate when validation accuracy plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', patience=5, verbose=True)
for t in range(0, 80):
    train(...); val(...)
    scheduler.step(val_acc)

# Cosine annealing learning rate.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=80)
# Reduce learning rate by 10 at given epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 70], gamma=0.1)
for t in range(0, 80):
    scheduler.step()
    train(...); val(...)

# Learning rate warmup over 10 epochs.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda t: t / 10)
for t in range(0, 10):
    scheduler.step()
    train(...); val(...)

Save and load checkpoints

Note that to be able to resume training, we need to save the states of both the model and the optimizer, as well as the current epoch number.

# Save checkpoint.
is_best = current_acc > best_acc
best_acc = max(best_acc, current_acc)
checkpoint = {
    'best_acc': best_acc,
    'epoch': t + 1,
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
}
model_path = os.path.join('model', 'checkpoint.pth.tar')
torch.save(checkpoint, model_path)
if is_best:
    # Keep a separate copy of the best checkpoint.
    shutil.copy(model_path, os.path.join('model', 'best_checkpoint.pth.tar'))

# Load checkpoint.
if resume:
    model_path = os.path.join('model', 'checkpoint.pth.tar')
    assert os.path.isfile(model_path)
    checkpoint = torch.load(model_path)
    best_acc = checkpoint['best_acc']
    start_epoch = checkpoint['epoch']
    model.load_state_dict(checkpoint['model'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    print('Load checkpoint at epoch %d.' % start_epoch)

Compute accuracy, precision, and recall

# data['label'] and data['prediction'] are the groundtruth labels and the
# predictions for each image, respectively.
accuracy = np.mean(data['label'] == data['prediction']) * 100

# Compute precision and recall for each class.
for c in range(num_classes):
    tp = np.dot((data['label'] == c).astype(int),
                (data['prediction'] == c).astype(int))
    tp_fp = np.sum(data['prediction'] == c)
    tp_fn = np.sum(data['label'] == c)
    precision = tp / tp_fp * 100
    recall = tp / tp_fn * 100

6. Model Testing

Compute per-class precision, recall, F1 and overall metrics

import sklearn.metrics

all_label = []
all_prediction = []
for images, labels in tqdm.tqdm(data_loader):
    # Data.
    images, labels = images.cuda(), labels.cuda()

    # Forward pass.
    score = model(images)

    # Save labels and predictions.
    prediction = torch.argmax(score, dim=1)
    all_label.append(labels.cpu().numpy())
    all_prediction.append(prediction.cpu().numpy())

# Compute P/R and the confusion matrix.
all_label = np.concatenate(all_label)
assert len(all_label.shape) == 1
all_prediction = np.concatenate(all_prediction)
assert all_label.shape == all_prediction.shape
micro_p, micro_r, micro_f1, _ = sklearn.metrics.precision_recall_fscore_support(
    all_label, all_prediction, average='micro', labels=range(num_classes))
class_p, class_r, class_f1, class_occurence = sklearn.metrics.precision_recall_fscore_support(
    all_label, all_prediction, average=None, labels=range(num_classes))
# C[i, j] = #{y == i and y_hat == j}
confusion_mat = sklearn.metrics.confusion_matrix(
    all_label, all_prediction, labels=range(num_classes))
assert confusion_mat.shape == (num_classes, num_classes)

Write per-class results to a spreadsheet

import csv

# Write results onto disk.
with open(os.path.join(path, filename), 'wt', encoding='utf-8') as f:
    f = csv.writer(f)
    f.writerow(['Class', 'Label', '# occurrence', 'Precision', 'Recall', 'F1',
                'Confused class 1', 'Confused class 2', 'Confused class 3',
                'Confused class 4', 'Confused class 5'])
    for c in range(num_classes):
        index = np.argsort(confusion_mat[:, c])[::-1][:5]
        f.writerow([
            label2class[c], c, class_occurence[c], '%4.3f' % class_p[c],
            '%4.3f' % class_r[c], '%4.3f' % class_f1[c],
            '%s:%d' % (label2class[index[0]], confusion_mat[index[0], c]),
            '%s:%d' % (label2class[index[1]], confusion_mat[index[1], c]),
            '%s:%d' % (label2class[index[2]], confusion_mat[index[2], c]),
            '%s:%d' % (label2class[index[3]], confusion_mat[index[3], c]),
            '%s:%d' % (label2class[index[4]], confusion_mat[index[4], c])])
    f.writerow(['All', '', np.sum(class_occurence), micro_p, micro_r, micro_f1,
                '', '', '', '', ''])

7. Other PyTorch Notes

Model definition

  • It is recommended to define layers with parameters and pooling layers using torch.nn modules, and to call torch.nn.functional directly for activation functions. The difference between torch.nn and torch.nn.functional is that torch.nn modules call torch.nn.functional under the hood during computation, but a torch.nn module also holds the layer's parameters and handles both the training and evaluation states of the network. When using torch.nn.functional, you have to take care of the network state yourself, e.g.

def forward(self, x):
    ...
    x = torch.nn.functional.dropout(x, p=0.5, training=self.training)
  • Call model.train() and model.eval() to switch the network state before model(x).
  • Wrap code blocks that do not need gradient computation in with torch.no_grad(). The difference between model.eval() and torch.no_grad() is that model.eval() switches the network to evaluation mode, where e.g. BN and dropout behave differently than during training, while torch.no_grad() turns off PyTorch's automatic differentiation to reduce memory usage and speed up computation; results obtained under it cannot be used with loss.backward().
  • The input to torch.nn.CrossEntropyLoss does not need to be passed through Softmax: torch.nn.CrossEntropyLoss is equivalent to torch.nn.functional.log_softmax followed by torch.nn.NLLLoss, as the sketch after this list illustrates.
  • Call optimizer.zero_grad() before loss.backward() to clear accumulated gradients. optimizer.zero_grad() and model.zero_grad() have the same effect when the optimizer covers all model parameters.
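
A minimal sketch of the equivalence noted above, using the functional forms (logits are the raw, un-normalized scores):

logits = model(images)  # No softmax applied
loss_a = torch.nn.functional.cross_entropy(logits, labels)
loss_b = torch.nn.functional.nll_loss(
    torch.nn.functional.log_softmax(logits, dim=1), labels)
# loss_a and loss_b agree up to floating-point error.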

PyTorch performance and debugging

  • In torch.utils.data.DataLoader, set pin_memory=True whenever possible; for very small datasets such as MNIST, pin_memory=False can actually be faster. The best value of num_workers needs to be found experimentally.
  • Use del to delete intermediate variables that are no longer needed, saving GPU memory.
  • In-place operations save GPU memory, e.g.

x = torch.nn.functional.relu(x, inplace=True)

In addition, torch.utils.checkpoint can save GPU memory by keeping only part of the intermediate results during the forward pass; whatever is needed during the backward pass is recomputed from the nearest stored intermediate results.
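
A minimal sketch of gradient checkpointing (assumes block is a differentiable sub-network such as an nn.Sequential, and x requires grad so gradients can flow through the recomputation):

from torch.utils.checkpoint import checkpoint
y = checkpoint(block, x)  # Activations inside `block` are recomputed during backward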

  • Reduce data transfer between the CPU and GPU. For example, if you want the loss and accuracy of every mini-batch in an epoch, accumulating them on the GPU and transferring them back to the CPU once at the end of the epoch is faster than doing a GPU-to-CPU transfer for every mini-batch.
  • Using half-precision floats via half() gives some speedup, with the exact gain depending on the GPU model; watch out for stability issues caused by the reduced numerical precision.
  • Regularly use assert tensor.size() == (N, D, H, W) as a debugging aid to make sure tensor shapes match what you expect.
  • Apart from the labels y, avoid one-dimensional tensors; use n*1 two-dimensional tensors instead to avoid unexpected results from one-dimensional tensor computations.
  • Profile the time spent in each part of the code:

with torch.autograd.profiler.profile(enabled=True, use_cuda=False) as profile:
    ...
print(profile)

Or run on the command line:

python -m torch.utils.bottleneck main.py

Resources

  • PyTorch official examples: pytorch/examples
  • PyTorch forums: PyTorch Forums
  • PyTorch documentation: http://pytorch.org/docs/stable/index.html
  • Other public implementations based on PyTorch, too numerous to list

References

  1. T.-Y. Lin, A. RoyChowdhury, and S. Maji. Bilinear CNN models for fine-grained visual recognition. In ICCV, 2015.
  2. Y. Chen, Y. Bai, W. Zhang, and T. Mei. Destruction and construction learning for fine-grained image recognition. In CVPR, 2019.
  3. L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. V. Gool. Temporal segment networks: Towards good practices for deep action recognition. In ECCV, 2016.
  4. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception architecture for computer vision. In CVPR, 2016.
  5. H. Zhang, M. Cissé, Y. N. Dauphin, and D. Lopez-Paz. mixup: Beyond empirical risk minimization. In ICLR, 2018.
