PyTorch [60-Day Training Plan] Stage 1 (Getting Started): Implementing Softmax Regression

DAY 5


3.6 Implementing Softmax Regression from Scratch

import torch
import torchvision
import numpy as np

3.6.1 Obtaining and Reading the Data

We use the Fashion-MNIST dataset and set the batch size to 256.

import torchvision.transforms as transforms
batch_size = 256
mnist_train = torchvision.datasets.FashionMNIST(root='./data', 
                                                train=True, download=True, transform=transforms.ToTensor())
mnist_test = torchvision.datasets.FashionMNIST(root='./data', 
                                               train=False, download=True, transform=transforms.ToTensor())
train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, 
                                         shuffle=True, num_workers=4)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, 
                                        shuffle=False, num_workers=4)
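As a quick sanity check (my own addition, not from the original notes), we can pull one batch and confirm the shapes the later code assumes:

X, y = next(iter(train_iter))
print(X.shape, y.shape)  # expected: torch.Size([256, 1, 28, 28]) torch.Size([256])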

3.6.2 Initializing Model Parameters

As in the linear regression example, we represent each sample as a vector.
Each input sample is an image 28 pixels high and 28 pixels wide, so the model's input vector has length 28 × 28 = 784, with one element per pixel. Since the images fall into 10 classes, the output layer of the single-layer network has 10 outputs, so the weight and bias parameters of softmax regression are matrices of shape 784 × 10 and 1 × 10, respectively.

num_inputs = 784
num_outputs = 10

W = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_outputs)), dtype = torch.float)
b = torch.zeros(num_outputs, dtype = torch.float)

# enable gradient tracking for the parameters
W.requires_grad_(requires_grad = True)
b.requires_grad_(requires_grad = True)
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)

3.6.3 Implementing the Softmax Operation

The softmax formula is

$$
y_k = \frac{\exp(x_k)}{\sum_{i=1}^{m} \exp(x_i)}, \quad k = 1, \dots, m
$$

where $\sum_{i=1}^{m} y_i = 1$.

# softmax
def softmax(X):
    X_exp = X.exp()
    # dim=0 sums elements within each column; dim=1 sums elements within each row
    partition = X_exp.sum(dim = 1, keepdim = True)
    return X_exp / partition # broadcast
    
# With random inputs, softmax turns every element into a non-negative number and makes each row sum to 1
X = torch.rand((2, 5))
X_prob = softmax(X)
print(X_prob, X_prob.sum(dim= 1))
tensor([[0.2833, 0.1352, 0.2536, 0.1701, 0.1577],
        [0.2230, 0.1497, 0.2517, 0.1424, 0.2332]]) tensor([1.0000, 1.0000])

3.6.4 Defining the Model

def net(X):
    return softmax(torch.mm(X.view(-1, num_inputs), W) + b)
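To confirm that the model produces one score per class for every image in a batch, a quick shape check (my addition) looks like this:

X, y = next(iter(train_iter))
print(net(X).shape)  # expected: torch.Size([256, 10])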

3.6.5 Defining the Loss Function

Softmax regression uses the cross-entropy loss function:

$$
H(y^{(i)}, \hat{y}^{(i)}) = -\sum_{j=1}^{q} y_j^{(i)} \log \hat{y}_j^{(i)}
$$

If the training set contains $n$ samples, the cross-entropy loss function is defined as

$$
\zeta(\Theta) = \frac{1}{n} \sum_{i=1}^{n} H(y^{(i)}, \hat{y}^{(i)})
$$

To pick out the predicted probability of each sample's label, we can use the gather function.

def cross_entropy(y_hat, y):
    return - torch.log(y_hat.gather(1, y.view(-1, 1)))
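As a small illustration (assuming two samples and three classes), gather picks out the predicted probability of each sample's true label:

y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.LongTensor([0, 2])
print(y_hat.gather(1, y.view(-1, 1)))  # tensor([[0.1000], [0.5000]])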

3.6.6 Computing Classification Accuracy

Given a predicted probability distribution y_hat over the classes, we take the class with the largest predicted probability as the output class. If it matches the true class y, the prediction is correct. Classification accuracy is the ratio of the number of correct predictions to the total number of predictions.

Below we define the accuracy function. y_hat.argmax(dim=1) returns the index of the largest element in each row of y_hat, and the result has the same shape as y. The equality test (y_hat.argmax(dim=1) == y) produces a ByteTensor; we use float() to convert it to a floating-point Tensor whose values are 0 (false) and 1 (true).

def accuracy(y_hat, y):
    return (y_hat.argmax(dim = 1) == y).float().mean().item()
# evaluate the accuracy of net over an entire dataset
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += (net(X).argmax(dim = 1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n
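With the same toy y_hat and y as above, the first sample is misclassified (its argmax is class 2 but its label is 0) and the second is correct, so we expect an accuracy of 0.5:

y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.LongTensor([0, 2])
print(accuracy(y_hat, y))  # 0.5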

3.6.7 Training the Model

We use mini-batch stochastic gradient descent to optimize the model's loss function. During training, the number of epochs num_epochs and the learning rate lr are tunable hyperparameters.

num_epochs, lr = 5, 0.1

def train(net, train_iter, test_iter, loss, num_epochs, batch_size, 
          params = None, lr = None, optimizer = None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()
            
            # zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
            
            l.backward()
            if optimizer is None:
                for param in params:
                    param.data -= lr * param.grad / batch_size  # note: this updates param in place
            else:
                optimizer.step()
            
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f' 
              %(epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))
train(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)
epoch 1, loss 0.7861, train acc 0.750, test acc 0.793
epoch 2, loss 0.5715, train acc 0.812, test acc 0.810
epoch 3, loss 0.5263, train acc 0.826, test acc 0.818
epoch 4, loss 0.5008, train acc 0.833, test acc 0.824
epoch 5, loss 0.4842, train acc 0.837, test acc 0.827
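After training, we can sketch a quick prediction check. The helper below is my own addition; the class names are Fashion-MNIST's ten categories in their standard index order:

def get_fashion_mnist_labels(labels):
    # map numeric class indices to Fashion-MNIST text labels
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]

X, y = next(iter(test_iter))
true_labels = get_fashion_mnist_labels(y[:5])
pred_labels = get_fashion_mnist_labels(net(X).argmax(dim=1)[:5])
print(list(zip(true_labels, pred_labels)))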

3.7 Concise Implementation of Softmax Regression

import torch.nn as nn
class LinearNet(nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super(LinearNet, self).__init__()
        self.linear = nn.Linear(num_inputs, num_outputs)
    
    def forward(self, x):  # x shape: (batch_size, 1, 28, 28); use view() to reshape x to (batch_size, 784)
        y = self.linear(x.view(x.shape[0], -1))
        return y
    
net = LinearNet(num_inputs, num_outputs)
# define a custom FlattenLayer class for this reshaping of x
class FlattenLayer(nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()
    def forward(self, x):
        return x.view(x.shape[0], -1)
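A quick check (my addition) that FlattenLayer performs the intended reshape:

print(FlattenLayer()(torch.rand(2, 1, 28, 28)).shape)  # torch.Size([2, 784])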

# define the model
from collections import OrderedDict
net = nn.Sequential(
    OrderedDict([
        ('flatten', FlattenLayer()),
        ('linear', nn.Linear(num_inputs, num_outputs))
    ])
)
# randomly initialize the model's weight parameters from a normal distribution with mean 0 and standard deviation 0.01
from torch.nn import init
init.normal_(net.linear.weight, mean = 0, std = 0.01)
init.constant_(net.linear.bias, val = 0)
Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)
# define the loss function and optimizer; nn.CrossEntropyLoss combines LogSoftmax and NLLLoss, so the network itself outputs raw scores (logits)
loss = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr = 0.1)
# train the model
num_epochs = 5
train(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)
epoch 1, loss 0.0031, train acc 0.748, test acc 0.788
epoch 2, loss 0.0022, train acc 0.813, test acc 0.813
epoch 3, loss 0.0021, train acc 0.826, test acc 0.813
epoch 4, loss 0.0020, train acc 0.832, test acc 0.821
epoch 5, loss 0.0019, train acc 0.836, test acc 0.826

3.8 Multilayer Perceptron

The previous sections dealt with single-layer networks, but neural networks in deep learning usually have many layers, so next we implement a multilayer network.

$$
H = \phi(X W_h + b_h),
$$
$$
O = H W_o + b_o
$$

where $\phi$ is an activation function. Commonly used activation functions include ReLU, sigmoid, and tanh.
For classification problems, we can apply the softmax operation to the output $O$ and use the cross-entropy loss from softmax regression. For regression problems, we set the number of outputs to 1 and feed $O$ directly into the squared loss used in linear regression.
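As a sketch of what these two equations mean in code (a from-scratch forward pass with ReLU as the activation; the section below uses nn.Sequential instead, and the parameter names here are illustrative):

def mlp_forward(X, W_h, b_h, W_o, b_o):
    # hidden layer: H = phi(X W_h + b_h), with phi = ReLU
    H = torch.relu(torch.mm(X.view(-1, W_h.shape[0]), W_h) + b_h)
    # output layer: O = H W_o + b_o (softmax / the loss is applied afterwards)
    return torch.mm(H, W_o) + b_o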

import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torchvision.transforms as transforms

batch_size = 256
mnist_train = torchvision.datasets.FashionMNIST(root='./data', 
                                                train=True, download=True, transform=transforms.ToTensor())
mnist_test = torchvision.datasets.FashionMNIST(root='./data', 
                                               train=False, download=True, transform=transforms.ToTensor())
train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, 
                                         shuffle=True, num_workers=4)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, 
                                        shuffle=False, num_workers=4)

3.8.1 Defining Model Parameters

Set the number of hidden units, a hyperparameter, to 256.

num_inputs, num_outputs, num_hiddens = 784, 10, 256

# define a custom FlattenLayer class for reshaping x
class FlattenLayer(nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()
    def forward(self, x):
        return x.view(x.shape[0], -1)
    
net = nn.Sequential(
    FlattenLayer(),
    nn.Linear(num_inputs, num_hiddens),
    nn.ReLU(),
    nn.Linear(num_hiddens, num_outputs),
)

from torch.nn import init

for params in net.parameters():
    init.normal_(params, mean = 0, std = 0.01)

3.8.2 Training Function

# evaluate the accuracy of net over an entire dataset
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += (net(X).argmax(dim = 1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n

def train(net, train_iter, test_iter, loss, num_epochs, batch_size, 
          params = None, lr = None, optimizer = None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()
            
            # zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
            
            l.backward()
            if optimizer is None:
                for param in params:
                    param.data -= lr * param.grad / batch_size  # note: this updates param in place
            else:
                optimizer.step()
            
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f' 
              %(epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))

3.8.3 Reading the Data and Training the Model

batch_size = 256
loss = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr = 0.5)

num_epochs = 5
train(net, train_iter, test_iter, loss, num_epochs,batch_size, None, None, optimizer)
epoch 1, loss 0.0019, train acc 0.823, test acc 0.815
epoch 2, loss 0.0017, train acc 0.841, test acc 0.829
epoch 3, loss 0.0015, train acc 0.856, test acc 0.844
epoch 4, loss 0.0015, train acc 0.863, test acc 0.806
epoch 5, loss 0.0014, train acc 0.868, test acc 0.859
