PyTorch Notebook

Since these notes are edited with emacs-org, English is used here for convenience.

Table of Contents

  1. tensor
    1. create
    2. cloning
    3. operation
      1. in-place operations
      2. transpose (permute)
      3. about size and indexing
      4. add
    4. with numpy
    5. cuda
  2. autograd
    1. track and gradient computing
    2. function
    3. backward()
    4. torch.no_grad()
  3. neural network
    1. structured construction
      1. layers (no order)
      2. forward propagate structure (ordered)
    2. sequential construction
  4. data load
    1. torchvision
  5. optimizer
  6. train
    1. gpu support
    2. loss function
    3. train
    4. about step()s
      1. optimizer.step(self, closure = None)
      2. scheduler.step()
  7. model I/O
    1. method 1 (recommended)
    2. method 2
  8. evaluate
  9. models
    1. attributes
    2. pretrained models
      1. torchvision.models
  10. sundry
  11. troubleshooting
  • PyTorch is deep learning’s NumPy
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.utils.data as Data
    import torch.optim as optim
    import numpy as np
    

tensor

create

  • uninitialized tensor:
    x = torch.empty(5, 3)

  • random tensor:
    x = torch.rand(5, 3)

  • zeros:
    x = torch.zeros(5, 3)

  • define dtype:
    x = torch.zeros(5, 3, dtype = torch.long)

  • from known data:
    x = torch.tensor([5.5, 3])

cloning

Reuse an existing tensor’s properties.

  • new_* methods:

    x = x.new_ones(5, 3, dtype = torch.double)# 64-bit
    
  • copy the size:

    x = torch.randn_like(x, dtype = torch.float)  # 32-bit float, same size as x
    

operation

in-place operations

In-place operations have a trailing ‘_’ in their name and modify the tensor directly.

e.g. y.add_(x) is the in-place equivalent of y = y + x,

and x.t_() transposes x in place.
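
A minimal runnable sketch of both forms (x and y are arbitrary example tensors):

 x = torch.ones(2, 3)
 y = torch.zeros(2, 3)
 y.add_(x)   # in-place add: same effect as y += x
 x.t_()      # in-place transpose: x now has shape [3, 2]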

transpose (permute)

x = x.permute(1, 2, 0)  # reorder dimensions, e.g. CHW -> HWC

about size and indexing

  1. get size: x.size() (or x.size(dim) for a single dimension)

  2. resize:

     x = torch.randn(4, 4)
     y = x.view(16)
     z = x.view(-1, 8)
     # the '-1' size is inferred from the other dims

  3. use .item() to get the value of a one-element tensor as a Python number:

     x = torch.randn(1)
     num = x.item()
    
    

add

 # simply
 x + y
 torch.add(x, y)
 
 # write the result into a pre-allocated output tensor
 result = torch.empty(5, 3)
 torch.add(x, y, out = result)
 
 # in-place (y += x)
 y.add_(x)

with numpy

The numpy array and the torch tensor converted from each other
share the same memory location, so changing one changes the other.

  • torch.from_numpy(npdata)
  • torchdata.numpy()
    npdata = np.arange(6).reshape(2, 3)
    np2torch = torch.from_numpy(npdata)
    '''
    tensor([[0, 1, 2],
            [3, 4, 5]], dtype=torch.int32)
    '''
    torch2np = np2torch.numpy()
    

cuda

 if torch.cuda.is_available():
     device = torch.device('cuda')
     # directly create on GPU
     y = torch.ones_like(x, device = device)
     # copy to GPU (.to() returns a new tensor, so reassign)
     x = x.to(device)   # or x = x.to('cuda')
 
     z = x + y
     # e.g. tensor([0.1034], device='cuda:0')
     z = z.to('cpu')

autograd

track and gradient computing

  • set sometensor.requires_grad = True
    to keep track of all computations on it
    (required for training).

  • call .backward() to compute all gradients.

  • gradients accumulate into the .grad attribute.

  • stop tracking: .detach().

  • prevent tracking: wrap the code in a
    with torch.no_grad(): block.

function

for a tensor created by an operation,
tensor.grad_fn references the Function
that created it.

for a user-created tensor, .grad_fn is None.
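
A small sketch of the difference (tensor values are arbitrary):

 x = torch.ones(2, 2, requires_grad = True)
 print(x.grad_fn)   # None: x was created by the user
 y = x + 2
 print(y.grad_fn)   # <AddBackward0 ...>: y was created by an operation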

backward()

for a non-scalar tensor, pass a gradient argument
of matching shape to backward().
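
For example (values are arbitrary), y below is a 3-element tensor, so backward() needs a 3-element gradient:

 x = torch.randn(3, requires_grad = True)
 y = x * 2
 y.backward(torch.tensor([0.1, 1.0, 0.0001]))
 print(x.grad)   # the passed gradient times dy/dx = 2, i.e. [0.2, 2.0, 0.0002]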

torch.no_grad()

use with torch.no_grad(): when testing the model.

neural network

the typical training procedure:

  • define the network, define the learnable params.
  • iterate over a dataset of inputs.
  • process the input through the network.
  • compute the loss.
  • back-propagate.
  • update the params (see the sketch below).
    (weight = weight - learning_rate * gradient)
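
A hedged sketch of the last step only, doing the update by hand with plain SGD (learning_rate is an illustrative value, and net is assumed to be a module like the LeNet defined below; in practice torch.optim does this for you):

 learning_rate = 0.01
 with torch.no_grad():
     for param in net.parameters():
         # gradient descent step: weight = weight - learning_rate * gradient
         param -= learning_rate * param.grad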

structured construction

layers (no order)

import torch.nn as nn

define in net_class’s __init__()

 class LeNet(nn.Module):
     def __init__(self):
         super(LeNet, self).__init__()
         self.conv1 = nn.Conv2d(3, 6, 5)
         self.pool = nn.MaxPool2d(2, 2)
         self.conv2 = nn.Conv2d(6, 16, 5)
         self.fc1 = nn.Linear(16 * 5 * 5, 120)
         self.fc2 = nn.Linear(120, 84)
         self.fc3 = nn.Linear(84, 10)

forward propagate structure (ordered)

import torch.nn.functional as F

define in net_class’s forward()

 class LeNet(nn.Module):
     def __init__(self): ...  # layers defined as in the previous section
     def forward(self, x):
         x = self.conv1(x)
         x = F.relu(x)
         x = self.pool(x)
 
         # write simply with nested structure
         x = self.pool(F.relu(self.conv2(x)))
 
         x = x.view(-1, 16 * 5 * 5)
         # flatten: each sample becomes a row vector
         # -1 stands for the batch size, inferred automatically
 
         x = F.relu(self.fc1(x))
         x = F.relu(self.fc2(x))
         x = self.fc3(x)
         return x

sequential construction

 net = nn.Sequential(
     nn.Linear(2, 10),
     nn.ReLU(),  # note: nn.ReLU is a class (F.relu is the function form)
     nn.Linear(10, 2)
 )

data load

transforms.ToTensor() converts a PIL image (or ndarray) to a tensor; transforms.ToPILImage() is its inverse.

 import torch.utils.data as Data
 mydataset = Data.TensorDataset(x, y)  # (data tensor, target tensor)
 mydataloader = Data.DataLoader(
     dataset = mydataset,
     batch_size = BATCH_SIZE,
     shuffle = True,
     num_workers = 2
 )
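
A hedged usage sketch of iterating the loader (x and y are assumed to be tensors with the same first dimension):

 for epoch in range(3):
     for step, (batch_x, batch_y) in enumerate(mydataloader):
         # batch_x and batch_y hold BATCH_SIZE samples each (last batch may be smaller)
         pass  # feed the batch to the network here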

torchvision

 transform = transforms.Compose(
     [transforms.ToTensor(),
      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
 
 trainset = torchvision.datasets.CIFAR10(root = './data', train = True,
                                         download = True, transform = transform)
 trainloader = torch.utils.data.DataLoader(trainset, batch_size = BATCH_SIZE,
                                           shuffle = True, num_workers = 0)
 
 transforms.RandomResizedCrop((height, width))  # another common transform; height and width are placeholders

optimizer

 import torch.optim as optim
 optimizer = optim.SGD(net.parameters(), lr = 0.001, momentum = 0.9)

train

gpu support

 device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
 print(device)
 net = Net()
 net.to(device)
 '''...'''
 for epoch in range(epochs):
     for i, data in enumerate(trainloader, 0):
         inputs, labels = data[0].to(device), data[1].to(device)

loss function

 import torch.nn as nn
 criterion = nn.CrossEntropyLoss()
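
A minimal sketch of how the criterion is called (shapes are illustrative): it expects raw, unnormalized scores of shape [batch, classes] first and LongTensor class indices of shape [batch] second; log-softmax is applied internally.

 outputs = torch.randn(4, 10)          # raw scores: 4 samples, 10 classes
 labels = torch.tensor([1, 0, 9, 3])   # class indices (LongTensor)
 loss = criterion(outputs, labels)     # scalar loss tensor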

train

 for epoch in range(2):
     trainingloss = 0.0
     for i, data in enumerate(trainloader, 0):
         # for gpu support
         inputs, labels = data[0].to(device), data[1].to(device)
         # clear the gradient buffer
         optimizer.zero_grad()
         # forward
         outputs = net(inputs)
         # loss computing
         loss = criterion(outputs, labels)
         # back propagate
         loss.backward()
         # update weights
         optimizer.step()
         # accumulate the running loss for logging
         trainingloss += loss.item()

about step()s

optimizer.step(self, closure = None)

usually used every mini-batch to update the weights.

  • closure (callable, optional): a closure that re-evaluates the model,
    runs back-propagation, and returns the loss.

if no closure is passed, loss.backward() must be called
before optimizer.step().
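
A hedged sketch of the closure form, as used with optimizers such as LBFGS that re-evaluate the loss several times per step (net, inputs, labels and criterion are assumed to be defined as in the training section):

 optimizer = optim.LBFGS(net.parameters())
 
 def closure():
     optimizer.zero_grad()
     outputs = net(inputs)
     loss = criterion(outputs, labels)
     loss.backward()
     return loss
 
 optimizer.step(closure)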

scheduler.step()

usually called once per epoch to adjust the learning rate.
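
For example (StepLR and its parameters are chosen only for illustration):

 scheduler = optim.lr_scheduler.StepLR(optimizer, step_size = 30, gamma = 0.1)
 for epoch in range(epochs):
     # ... run the mini-batch training loop (optimizer.step() per batch) ...
     scheduler.step()   # adjust the learning rate once per epoch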

model I/O

method 1 (recommended)

only saves the weights, not the structure.

the network must be reconstructed before loading the weights for evaluation.

 PATH = './example-model.pth'
 # save
 torch.save(net.state_dict(), PATH)
 # load
 net = Net()# reconstruct the network
 net.load_state_dict(torch.load(PATH))

method 2

saves the whole model, but this is fragile when the code is refactored or the model is moved to another project.

 PATH = './example-model.pth'
 # save
 torch.save(net, PATH)
 # load
 net = torch.load(PATH)

evaluate

 class Net(nn.Module): ...  # same structure as the trained model
 net = Net()
 net.load_state_dict(torch.load(PATH))
 net.eval()  # put layers such as dropout/batchnorm into evaluation mode
 # evaluate
 class_correct = list(0. for i in range(10))# 10 classes example
 class_total = list(0. for i in range(10))
 with torch.no_grad():
     for data in testloader:
         images, labels = data
         outputs = net(images)
         _, predicted = torch.max(outputs, 1)
         c = (predicted == labels).squeeze()
 
         for i in range(4):  # 4 is the batch size of testloader in this example
             label = labels[i]
             class_correct[label] += c[i].item()
             class_total[label] += 1

models

attributes

  • modules() -> iterates over all modules in the network (the network itself and every submodule), e.g.:
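
For example, assuming the LeNet class defined in the neural network section:

 net = LeNet()
 for m in net.modules():
     print(m)   # the network itself, then conv1, pool, conv2, fc1, fc2, fc3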

pretrained models

torchvision.models

 import torchvision.models as models
 import torchvision.transforms as transforms
 vgg16 = models.vgg16(pretrained = True).eval()
 # all the models use the same normalization
 normalization = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                      std=[0.229, 0.224, 0.225])
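
A hedged inference sketch (img is assumed to be a PIL image; the 224x224 crop follows the usual ImageNet convention):

 preprocess = transforms.Compose([
     transforms.Resize(256),
     transforms.CenterCrop(224),
     transforms.ToTensor(),
     normalization,
 ])
 input_batch = preprocess(img).unsqueeze(0)   # add the batch dimension
 with torch.no_grad():
     scores = vgg16(input_batch)              # shape [1, 1000]
 predicted_class = scores.argmax(dim = 1)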

sundry

  • normalization (with mean and std): x = (x - mean) / std

it centers and rescales the data towards zero mean and unit
variance, which usually improves classification performance.

  • torch.max(input, dim) -> (Tensor, LongTensor): returns the max values
    along dim together with their indices (see the sketch after this list).

    • torch.max(a, 0) returns each column’s max value and its row index.
    • torch.max(a, 1) returns each row’s max value and its column index.
  • torch.nn.functional.softmax(input, dim) -> Tensor:

    • softmax(a, 0) rescales a so that every column sums to 1.
    • softmax(a, 1) rescales a so that every row sums to 1.
  • Tensor.squeeze(): removes all dimensions of size 1 from the Tensor.

     t = torch.Tensor([[1], [2], [3]])
     t.squeeze()
     # tensor([1., 2., 3.])
    
  • torch.bmm(batch1, batch2, out = None) -> Tensor: batch-matmul, say batch1.size() = [2, 3, 4],
    and batch2.size() = [2, 4, 5],
    so the result’s size() would be [2, 3, 5].

  • torch.unsqueeze(input, dim, out = None) -> Tensor: returns a new tensor with a dimension of size one
    inserted at the specified position.

    the new Tensor shares the same underlying data with this Tensor.

    • positive dim: range from 0 to input.dim().
    • negative dim: counting backward.
  • prediction first, label second: when calling loss functions, pass the
    prediction and then the label, in that order.

  • labels are LongTensor (64-bit) by default.

  • paddings

    • nn.ReflectionPad1d(padding) ~ nn.ReflectionPad3d(padding): pad with a mirror reflection of the input boundary.

      • padding as a number: pad every side by the same length.
      • or padding as a tuple, e.g. (left_padding, right_padding) in the 1d case.
    • nn.ReplicationPad1d(padding) ~ nn.ReplicationPad3d(padding): pad by replicating the boundary values.

    • nn.ConstantPad1d(padding, value) ~ nn.ConstantPad3d(padding, value): pad every side with the same constant value.

    • F.pad(input, pad, mode = 'constant', value = 0)
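
A small sketch pulling several of the points above together (values are arbitrary):

 a = torch.tensor([[1.0, 5.0], [4.0, 2.0]])
 values, indices = torch.max(a, 1)   # row-wise: values = [5., 4.], indices = [1, 0]
 probs = F.softmax(a, dim = 1)       # every row of probs sums to 1
 b = torch.unsqueeze(a, 0)           # shape [1, 2, 2]
 c = b.squeeze()                     # back to shape [2, 2]
 batch1 = torch.randn(2, 3, 4)
 batch2 = torch.randn(2, 4, 5)
 out = torch.bmm(batch1, batch2)     # shape [2, 3, 5]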

troubleshooting

  • BrokenPipeError: if you hit this on Windows when loading a downloaded dataset,
    set num_workers to 0 in the DataLoader.

  • TypeError: ‘module’ object is not callable: probably a capitalization mistake,

    e.g. datasets.MNIST (the class), not datasets.mnist (the module).

  • adding an extra softmax layer to the CIFAR10 LeNet makes training slower (CrossEntropyLoss already applies log-softmax internally).

  • "trying to backward multiple times without retain_graph=True": check whether the shapes passed to mse_loss match.
