Deep Learning Notes (4): Implementing a Feedforward Neural Network in PyTorch

1 AUTOGRAD AUTOMATIC DIFFERENTIATION

Central to all neural networks in PyTorch is the autograd package. Let’s first briefly visit this package, and then move on to training our first neural network.

The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-
run framework, which means that your backprop is defined by how your code is run, and that every single
iteration can be different.

1.1 Create a grad tracked tensor

torch.Tensor is the central class of the package. If you set its attribute .requires_grad as True, it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into .grad attribute.

define a function:
x = [[1,1],[1,1]]
y = x + 2
z = y ^ 2 * 3
out = z.mean()

import torch

# create a tensor, setting its .requires_grad to True
x = torch.ones(2, 2, requires_grad=True)
print(x)

x1 = torch.ones(2,2,requires_grad=False)
# x1.requires_grad_(True)
print(x1)
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[1., 1.],
        [1., 1.]])

1.2 Do a tensor operation

y = x + 2
print(y)

y1 = x1 + 2
print(y1)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
tensor([[3., 3.],
        [3., 3.]])

y was created as a result of an operation, so it has a grad_fn, but y1 does not.

print(y.grad_fn)
print(y1.grad_fn)

<AddBackward0 object at 0x...>
None

1.3 More operations on y

z = y * y * 3
z1 = y1 * y1 * 3
out = z.mean()   #calculate z average value
out1 = z1.mean()   #calculate z1 average value

print(z, out)
print(z1, out1)
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
tensor([[27., 27.],
        [27., 27.]]) tensor(27.)

.requires_grad_( ) changes an existing Tensor’s requires_grad flag in-place. The input flag defaults to False if not given.

Tensor and Function are interconnected and build up an acyclic graph, that encodes a complete history of
computation. Each tensor has a .grad_fn attribute that references a Function that has created the Tensor
(except for Tensors created by the user - their grad_fn is None).

a = torch.randn(2, 2)    # a is created by user, its .grad_fn is None
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)   # change the attribute .grad_fn of a
print(a.requires_grad)
b = (a * a).sum()        # sum all elements of a * a into the scalar b
print(b.grad_fn)
False
True
<SumBackward0 object at 0x...>

2 Gradients

2.1 Backprop

Because out contains a single scalar, out.backward( ) is equivalent to out.backward(torch.tensor(1.))

out.backward()
# out.backward(torch.tensor(1.))
# out1.backward()

You can read the accumulated gradients as below. Note that .grad is only populated for leaf tensors such as x; intermediate tensors like y and z do not retain their gradients by default, which is why they print as None:

x_grad = x.grad
y_grad = y.grad
z_grad = z.grad
print(x_grad)
print(y_grad)
print(z_grad)
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
None
None
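
As a sanity check (a small sketch using only the definitions above), the value 4.5 can be reproduced analytically: out = (1/4) * sum of 3 * (x_i + 2)^2, so d(out)/dx_i = 1.5 * (x_i + 2) = 4.5 for x_i = 1.

# verify the analytic gradient d(out)/dx_i = 1.5 * (x_i + 2) against autograd
analytic_grad = 1.5 * (x.detach() + 2)
print(analytic_grad)                           # tensor([[4.5000, 4.5000], [4.5000, 4.5000]])
print(torch.allclose(x.grad, analytic_grad))   # True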

2.2 Jacobian-vector product example

If you want to compute the derivatives, you can call .backward( ) on a Tensor. If Tensor is a scalar (i.e. it holds a one element data), you don’t need to specify any arguments to .backward(), however if it has more elements, you need to specify a gradient argument that is a tensor of matching shape.

define a function:
x = [1, 1, 1]
y = x + [1, 2, 3]
z = y ^ 3

x = torch.ones(3, requires_grad=True)
y = x + torch.tensor([1., 2., 3.])
z = y * y * y
print(z)

v = torch.tensor([1, 0.1, 0.01])
# z is a vector, so you need to specify a gradient whose size is the same as z
z.backward(v)    
print(x.grad)
tensor([ 8., 27., 64.], grad_fn=<MulBackward0>)
tensor([12.0000,  2.7000,  0.4800])

2.3 Problem 1

What is the meaning of the argument passed to the .backward() method? Try different inputs and answer.

The passed argument is the vector v of a vector-Jacobian product.

Specifically, calling obj.backward(v) computes v^T · J, where J is the Jacobian of obj with respect to the leaf tensors, and accumulates the result into their .grad attributes. Each element of v therefore scales the gradient contributed by the corresponding element of obj, as the v = [1, 0.1, 0.01] example in 2.2 shows.

Note that normally v = torch.ones_like(obj), and for a scalar obj, obj.backward() is equivalent to obj.backward(torch.tensor(1.)).
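
A small experiment (a sketch reusing the function from 2.2) makes this concrete: scaling the vector passed to .backward() scales the accumulated gradient by the same factor, exactly as grad = v^T · J predicts.

x = torch.ones(3, requires_grad=True)
y = x + torch.tensor([1., 2., 3.])
z = y * y * y                       # dz_i/dx_i = 3 * y_i**2 = [12., 27., 48.]

v = torch.ones(3)
z.backward(v, retain_graph=True)    # plain row sums of the Jacobian
print(x.grad)                       # tensor([12., 27., 48.])

x.grad.zero_()
z.backward(2 * v)                   # doubling v doubles every entry of x.grad
print(x.grad)                       # tensor([24., 54., 96.])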

3 NEURAL NETWORKS

A typical training procedure for a neural network is as follows:

  • Define the neural network that has some learnable parameters (or weights)
  • Iterate over a dataset of inputs
  • Process input through the network
  • Zero the gradient buffers held by the optimizer (they accumulate otherwise)
  • Compute the loss (how far is the output from being correct)
  • Propagate gradients back into the network’s parameters
  • Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
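
In code, one pass through this loop typically reduces to a few lines. The sketch below uses placeholder names (net, criterion, optimizer, loader) and mirrors the train() function defined in 3.4.

def train_one_epoch(net, loader, criterion, optimizer):
    for inputs, labels in loader:          # iterate over a dataset of inputs
        outputs = net(inputs)              # process input through the network
        optimizer.zero_grad()              # empty the gradient buffers
        loss = criterion(outputs, labels)  # compute the loss
        loss.backward()                    # propagate gradients back into the parameters
        optimizer.step()                   # weight = weight - learning_rate * gradient (plain SGD)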

Let’s define a network that classifies points drawn from Gaussian distributions into three classes.

3.1 Show all points

Show all the points (train set and test set together) that will be used.

# show all points, you can skip this cell
def show_original_points():
    # read the label file back; a with-block closes the file automatically
    with open('./labels/label.csv', 'r') as label_csv:
        label_reader = csv.reader(label_csv)
        class1_point = []
        class2_point = []
        class3_point = []
        for item in label_reader:
            if item[2] == '0':
                class1_point.append([item[0], item[1]])
            elif item[2] == '1':
                class2_point.append([item[0], item[1]])
            else:
                class3_point.append([item[0], item[1]])
    data1 = np.array(class1_point, dtype=float)
    data2 = np.array(class2_point, dtype=float)
    data3 = np.array(class3_point, dtype=float)
    x1, y1 = data1.T
    x2, y2 = data2.T
    x3, y3 = data3.T
    plt.figure()
    plt.scatter(x1, y1, c='b', marker='.')
    plt.scatter(x2, y2, c='r', marker='.')
    plt.scatter(x3, y3, c='g', marker='.')
    plt.axis()
    plt.title('scatter')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

3.2 Define a network

When you define a network, your class must inherit from nn.Module, and you should then override the __init__ method and the forward method. Printing an instance reproduces the structure below (see also the sketch after the class definition):

Network(
  (hidden): Linear(in_features=2, out_features=5, bias=True)
  (sigmoid): Sigmoid()
  (predict): Linear(in_features=5, out_features=3, bias=True)
)

import numpy as np
import matplotlib.pyplot as plt
import torchvision
import torch
import pandas as pd
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim
import time
import csv
class Network(nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        '''
        Args:
            n_feature(int): size of input tensor
            n_hidden(int): size of hidden layer 
            n_output(int): size of output tensor
        '''
        super(Network, self).__init__()
        # define a linear layer
        self.hidden = nn.Linear(n_feature, n_hidden)
        # define sigmoid activation 
        self.sigmoid = nn.Sigmoid()
        self.predict = nn.Linear(n_hidden, n_output)

    def forward(self, x):
        '''
        x(tensor): inputs of the network
        '''
        # hidden layer
        h1 = self.hidden(x)
        # activate function
        h2 = self.sigmoid(h1)
        # output layer
        out = self.predict(h2)
        '''
        A linear classifier is often followed by softmax to output probabilities;
        however, the CrossEntropyLoss we use already applies this operation
        internally, so we do not add a softmax here.
        '''
        return out
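
Printing an instance reproduces the structure shown at the start of this subsection, and the number of learnable parameters can be checked directly (a quick sketch; (2*5 + 5) + (5*3 + 3) = 33):

net = Network(2, 5, 3)
print(net)                                        # same structure as printed above
print(sum(p.numel() for p in net.parameters()))   # 33 learnable parameters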

CrossEntropyLoss documentation in PyTorch:
https://pytorch.org/docs/stable/nn.html?highlight=crossentropy#torch.nn.CrossEntropyLoss
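
A minimal sketch of why no softmax is needed in forward(): nn.CrossEntropyLoss applies log-softmax internally, so it expects raw scores (logits). The numbers below are made up for illustration.

logits = torch.tensor([[2.0, 0.5, -1.0]])     # one sample, three classes
target = torch.tensor([0])                    # ground-truth class index
ce = nn.CrossEntropyLoss()(logits, target)
manual = nn.NLLLoss()(torch.log_softmax(logits, dim=1), target)
print(ce.item(), manual.item())               # the two values are equal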

3.3 Overload dataset

You can skip over the cell below when you are just trying to train a model; it only wraps the CSV label file in a Dataset.

class PointDataset(Dataset):
    def __init__(self, csv_file, transform=None):
        '''
        Args:
            csv_file(string): path of label file
            transform (callable, optional): Optional transform to be applied
                on a sample.
        '''
        self.frame = pd.read_csv(csv_file, encoding='utf-8', header=None)
        print('csv_file source ---->', csv_file)
        self.transform = transform

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        x = self.frame.iloc[idx, 0]
        y = self.frame.iloc[idx, 1]
        point = np.array([x, y])
        label = int(self.frame.iloc[idx, 2])
        if self.transform is not None:
            point = self.transform(point)
        sample = {'point': point, 'label': label}
        return sample
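
For reference, a single item can be fetched directly from the dataset (a sketch that assumes ./labels/train.csv exists with columns x, y, label, as in the main script below):

dataset = PointDataset('./labels/train.csv', transform=torch.tensor)
print(len(dataset))                        # number of labelled points
sample = dataset[0]
print(sample['point'], sample['label'])    # a 2-element tensor and an integer label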

3.4 Train function

Train a model, then show the running-loss curve and the accuracy curve.

def train(classifier_net, trainloader, testloader, device, lr, optimizer):
    '''
    Args:
        classifier_net(nn.Module): the model to train
        trainloader(torch.utils.data.DataLoader): train loader
        testloader(torch.utils.data.DataLoader): test loader
        device(torch.device): the device the model is trained on
        lr(float): learning rate
        optimizer(torch.optim.Optimizer): optimizer used to update the weights
    '''
    # loss function
    criterion = nn.CrossEntropyLoss().to(device)
    
    
    # save the mean value of loss in an epoch
    running_loss = []
    
    running_accuracy = []
    
    # count loss in an epoch 
    temp_loss = 0.0
    
    # count the iteration number in an epoch
    iteration = 0 

    for epoch in range(epoches):   # `epoches` is defined in the main script below
        
        '''
        adjust learning rate when you are training the model
        '''
        # adjust learning rate manually (see also the scheduler sketch after this cell)
        # if epoch % 100 == 0 and epoch != 0:
        #     lr = lr * 0.1
        #     for param_group in optimizer.param_groups:
        #         param_group['lr'] = lr

        for i, data in enumerate(trainloader):
            point, label = data['point'], data['label']
            point, label = point.to(device).to(torch.float32), label.to(device)
            outputs = classifier_net(point)
            
            '''# TODO'''
            optimizer.zero_grad()
            loss = criterion(outputs, label) 
            loss.backward()
            optimizer.step()
            
            '''# TODO END'''
            
            # save loss in a list
            temp_loss += loss.item()
            iteration +=1
            # print loss value 
#             print('[{0:d},{1:5.0f}] loss {2:.5f}'.format(epoch + 1, i, loss.item()))
            #slow down speed of print function
            # time.sleep(0.5)
        running_loss.append(temp_loss / iteration)
        temp_loss = 0
        iteration = 0
        print('test {}:----------------------------------------------------------------'.format(epoch))
        
        # call test function and return accuracy
        running_accuracy.append(predict(classifier_net, testloader, device))
    
    # show loss curve
    show_running_loss(running_loss)
    
    # show accuracy curve
    show_accuracy(running_accuracy)
    
    return classifier_net
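
Instead of adjusting the learning rate by hand, as in the commented block inside train(), one of PyTorch's built-in schedulers can do the same job. A minimal sketch with StepLR (which multiplies the learning rate by gamma every step_size epochs), assuming optimizer and epoches are defined as in the main script below:

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)
for epoch in range(epoches):
    ...                   # one epoch over trainloader, exactly as in train()
    scheduler.step()      # decay the learning rate on schedule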

3.5 Test function

Test the performance of your model

# show accuracy curve, you can skip this cell.
def show_accuracy(running_accuracy):
    x = np.array([i for i in range(len(running_accuracy))])
    y = np.array(running_accuracy)
    plt.figure()
    plt.plot(x, y, c='b')
    plt.axis()
    plt.title('accuracy curve:')
    plt.xlabel('step')
    plt.ylabel('accuracy value')
    plt.show()
# show running loss curve, you can skip this cell.
def show_running_loss(running_loss):
    # generate x value
    x = np.array([i for i in range(len(running_loss))])
    # generate y value
    y = np.array(running_loss)
    # define a graph
    plt.figure()
    # generate curve
    plt.plot(x, y, c='b')
    # show axis
    plt.axis()
    # define title
    plt.title('loss curve:')
    #define the name of x axis
    plt.xlabel('step')
    plt.ylabel('loss value')
    # show graph
    plt.show()
def predict(classifier_net, testloader, device):
#     correct = [0 for i in range(3)]
#     total = [0 for i in range(3)]
    correct = 0
    total = 0
    
    with torch.no_grad():
        '''
        you can also stop autograd from tracking history on Tensors with .requires_grad=True 
        by wrapping the code block in with torch.no_grad():
        '''
        for data in testloader:
            point, label = data['point'], data['label']
            point, label = point.to(device).to(torch.float32), label.to(device)
            outputs = classifier_net(point)
            '''
            if you want to get probability of the model prediction,
            you can use softmax function here to transform outputs to probability.
            '''
            # transform the prediction to one-hot form
            _, predicted = torch.max(outputs, 1)
            print('model prediction: ', predicted)
            print('ground truth:', label, '\n')
            correct += (predicted == label).sum()
            total += label.size(0)
            print('current correct is:', correct.item())
            print('current total is:', total)
            
        print('the accuracy of the model is {0:5f}'.format(correct.item()/total))
        
    return correct.item() / total
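
As noted in the comment inside predict(), softmax can be applied to the raw outputs if class probabilities are wanted; this does not change the argmax, so the accuracy is unaffected. A self-contained sketch with made-up outputs:

outputs = torch.tensor([[1.5, 0.2, -0.7],
                        [0.1, 2.0,  0.3]])    # two hypothetical samples, three classes
probs = torch.softmax(outputs, dim=1)         # each row now sums to 1
_, predicted = torch.max(probs, 1)
print(probs)
print(predicted)                              # tensor([0, 1]), same as argmax of the raw outputs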

3.6 Main function

if __name__ == '__main__':
    '''
    change train epoches here
    '''
    # number of training epochs
    epoches = 100
    
    '''
    change learning rate here
    '''
    # learning rate (1e-3 means 10**-3)
    lr = 1e-3
    
    '''
    change batch size here
    '''
    # batch size
    batch_size = 16
    
    
    
    
    # define a transform to pretreat data
    transform = torch.tensor
    
    # define the device (CPU here; use torch.device('cuda') to train on a GPU)
    device = torch.device('cpu')
    
    # define a trainset
    trainset = PointDataset('./labels/train.csv', transform=transform)
    
    # define a trainloader
    trainloader = DataLoader(dataset=trainset, batch_size=batch_size, shuffle=True)
    
    # define a testset
    testset = PointDataset('./labels/test.csv', transform=transform)
    
    # define a testloader
    testloader = DataLoader(dataset=testset, batch_size=batch_size, shuffle=False)
    
    show_original_points()

    # define a network
    classifier_net = Network(2, 5, 3).to(device)   
    
    '''
    change optimizer here
    '''    
    # define an optimizer
    optimizer = optim.SGD(classifier_net.parameters(), lr=lr, momentum=0.9)
    # optimizer = optim.Adam(classifier_net.parameters(), lr=lr)
    # optimizer = optim.Rprop(classifier_net.parameters(), lr=lr)
    # optimizer = optim.ASGD(classifier_net.parameters(), lr=lr)
    # optimizer = optim.Adamax(classifier_net.parameters(), lr=lr)
    # optimizer = optim.RMSprop(classifier_net.parameters(), lr=lr)
    
    # get trained model
    classifier_net = train(classifier_net, trainloader, testloader, device, lr, optimizer)
csv_file source ----> ./labels/train.csv
csv_file source ----> ./labels/test.csv
[figure: scatter plot of all points]
[figure: training loss curve]
[figure: test accuracy curve]

3.7 Problem 2

Correct the order of the following statements and fill them into the # TODO block of the train cell above:

# update parameters in the optimizer (update weights)
optimizer.step()

# calculate loss value
loss = criterion(outputs, label)

# zero the gradients held by the optimizer
optimizer.zero_grad()

# back propagation
loss.backward()

Correct order is:

  1. optimizer.zero_grad()
  2. loss = criterion(outputs, label)
  3. loss.backward()
  4. optimizer.step()
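
The reason optimizer.zero_grad() is needed at all is that gradients accumulate across backward passes; without zeroing, every .backward() call adds to the stored .grad. A tiny sketch:

w = torch.ones(1, requires_grad=True)
(2 * w).sum().backward()
print(w.grad)              # tensor([2.])
(2 * w).sum().backward()   # a second backward pass ADDS to the stored gradient
print(w.grad)              # tensor([4.])
w.grad.zero_()             # this is what optimizer.zero_grad() does for every parameter
print(w.grad)              # tensor([0.])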

3.8 Problem 3

Adjust the learning rate and observe the loss and accuracy curves. Explain the influence of the learning rate on the loss and accuracy values, and its causes.

  • learning rate = 1e-1
    image
  • learning rate = 1e-2
    image
  • learning rate = 1e-3
    image
  • learning rate = 1e-4
    image
  • learning rate = 5 * 1e-5
    image
  • learning rate = 1e-5
    image

Influence: It shows that with a relatively large learning rate, convergence is reached in a short time and the loss curve fluctuates little, and vice versa.

Causes: With a large learning rate, the prediction accuracy can be affected by a few extreme samples. With a very small learning rate, the neural network can barely learn.

Lessons: We need an appropriate learning rate to ensure the validity of the experiment.

3.9 Problem 4

Adjust batch_size (batch_size = 1, batch_size = 210, and values in between). Explain the influence of the batch_size on the loss and accuracy values, and its causes.

  • batch_size = 1
    image
  • batch_size = 16
    image
  • batch_size = 32
    image
  • batch_size = 120
    image
  • batch_size = 210
    image

Influence: It shows that when the batch_size is relatively small, the accuracy is relatively higher, at the expense of longer computing time.

Causes: A smaller batch_size means more iterations per epoch, so the weights are updated more often, but more iterations also require more computing time.

Lessons: We need an appropriate batch_size to ensure the validity of the experiment. Sometimes a large batch_size is needed to accelerate convergence, but a batch_size that is too large may cause memory overflow and a drop in accuracy.

Ref.

3.10 Problem 5

Use the SGD optimizer and try adjusting momentum from 0 to 0.9. Explain the influence of the momentum on the loss and accuracy values.

  • momentum = 0
    image
  • momentum = 0.9
    image

Momentum indicates the degree to which the previous update direction is preserved. When updating, the previous update direction is kept to a certain extent, and the final update direction is fine-tuned using the current batch gradient. In this way stability is increased to a certain extent, learning is faster, and there is some ability to escape local optima. A sketch of the update rule follows below.

Influence: When momentum is relatively large, the update has more inertia. With momentum = 0 the accuracy is very low, perhaps because the last batch happened to be atypical; with momentum = 0.9 the accuracy is high.
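
For reference, a sketch of the update rule that optim.SGD uses when momentum > 0 (PyTorch's convention, with dampening = 0):

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    velocity = momentum * velocity + grad   # keep part of the previous update direction
    w = w - lr * velocity                   # with momentum = 0 this is plain gradient descent
    return w, velocity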

Ref.

3.11 Problem 6

Try different optimizers, such as Adam and Rprop, and repeat the experiment. Explain the influence of the optimizer on the loss and accuracy values.

  • Stochastic Gradient Descent
    image
  • Adam
    image
  • Rprop
    image

Influence: All three optimizers converge to the same accuracy, but Rprop is apparently the fastest of the three, although its loss fluctuates after converging. Adam and SGD converge more slowly, but their loss curves are smoother and show no fluctuation after convergence. I also looked at the reference below and tried some other optimizers.

  • ASGD
    image
  • Adamax
    image
  • RMSprop
    image

Ref.

3.12 Problem 7

Try adjusting the above parameters together, find the combination you think is most suitable (the one for which the model converges fastest), and discuss it.

After many tries, we finally chose the Adam optimizer with a learning rate of 0.01 and a batch size of 16. A high learning rate makes the weights converge quickly, but it also increases the risk of being disturbed by outlier samples. A batch size that is too large burdens memory, while one that is too small means many iterations and longer computing time, so we chose a relatively small batch_size = 16 as a trade-off. Finally, we chose the Adam optimizer for its convergence speed and stability.

image

4 DATA LOADING AND PROCESSING TUTORIAL

(Further content, read it when you are free)

A lot of effort in solving any machine learning problem goes into preparing the data. PyTorch provides many tools to make data loading easy and, hopefully, to make your code more readable. In this tutorial, we will see how to load and preprocess/augment data from a non-trivial dataset.

4.1 Packages installation

  • scikit-image: For image io and transforms
      • sudo apt-get install python-numpy
      • sudo apt-get install python-scipy
      • sudo apt-get install python-matplotlib
      • sudo pip install scikit-image
  • pandas: For easier csv parsing
      • sudo apt-get install python-pandas

import os
import torch
import pandas as pd
from skimage import io, transform
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils

plt.ion()   # interactive mode

4.2 Annotations in array

# read a csv file by pandas
landmarks_frame = pd.read_csv('data/faces/face_landmarks.csv')

n = 0
# read image name, image name was saved in column 1.
img_name = landmarks_frame.iloc[n, 0]
# points were saved in columns from 2 to the end
landmarks = landmarks_frame.iloc[n, 1:].values
# reshape the points into an (N, 2) array
landmarks = landmarks.astype('float').reshape(-1, 2)

print('Image name: {}'.format(img_name))
print('Landmarks shape: {}'.format(landmarks.shape))
print('First 4 Landmarks: {}'.format(landmarks[:4]))
Image name: 0805personali01.jpg
Landmarks shape: (68, 2)
First 4 Landmarks: [[ 27.  83.]
 [ 27.  98.]
 [ 29. 113.]
 [ 33. 127.]]
def show_landmarks(image, landmarks):
    """Show image with landmarks"""
    plt.imshow(image)
    plt.scatter(landmarks[:, 0], landmarks[:, 1], s=10, marker='.', c='r')
    plt.pause(0.001)  # pause a bit so that plots are updated

plt.figure()
show_landmarks(io.imread(os.path.join('data/faces/', img_name)),
               landmarks)
plt.show()
image
class FaceLandmarksDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        # combine the relative path of images 
        img_name = os.path.join(self.root_dir,
                                self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:].values
        landmarks = landmarks.astype('float').reshape(-1, 2)
        # save all data we may need during training a network in a dict
        sample = {'image': image, 'landmarks': landmarks}

        if self.transform:
            sample = self.transform(sample)

        return sample

Important note:

To define a dataset, we must first inherit from the class torch.utils.data.Dataset. When we write our own dataset, it is necessary to override the __init__ method, the __len__ method, and the __getitem__ method. Of course, you can define other methods as you like.

4.3 Instantiation and Iteration

face_dataset = FaceLandmarksDataset(csv_file='data/faces/face_landmarks.csv',
                                    root_dir='data/faces/')

fig = plt.figure()

for i in range(len(face_dataset)):
    sample = face_dataset[i]

    print(i, sample['image'].shape, sample['landmarks'].shape)
    
    # create subgraph
    ax = plt.subplot(1, 4, i + 1)
    plt.tight_layout()
    ax.set_title('Sample #{}'.format(i))
    ax.axis('off')
    show_landmarks(**sample)

    if i == 3:
        plt.show()
        break
0 (324, 215, 3) (68, 2)
image
1 (500, 333, 3) (68, 2)
image
2 (250, 258, 3) (68, 2)
image
3 (434, 290, 3) (68, 2)
image

5 More materials

  1. Training a Classifier
  2. Save Model and Load Model
  3. Visualize your training phase
  4. Exploding and Vanishing Gradients
  5. Gradient disappearance and gradient explosion in neural network training
  6. tensorboardX
