Building a CNN for CIFAR10 with PyTorch

The code is adapted, with minor modifications, from https://github.com/zergtant/pytorch-handbook/blob/master/chapter1/4_cifar10_tutorial.ipynb

The CIFAR10 dataset

CIFAR10 is a standard image dataset with ten classes: 50,000 training images and 10,000 test images, all at 32×32 resolution. PyTorch's torchvision makes downloading and using CIFAR10 straightforward:

import torch
import torchvision
import torchvision.transforms as transforms

# Hyperparameters
BATCH_SIZE = 4
EPOCH = 2

# Load the CIFAR10 dataset via torchvision; the transform scales images to [0,1] and then normalizes them to [-1,1]
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data',train = True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset,batch_size = BATCH_SIZE,
                                          shuffle = True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data',train = False,
                                        download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset,batch_size = BATCH_SIZE,
                                          shuffle = False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')                                          

Notes on the code

PyTorch provides convenient interfaces for downloading common datasets, including MNIST and CIFAR10, and returns both the training and test sets:

  1. torchvision.datasets.CIFAR10() downloads the full dataset; train=True/False selects the training or test split. The raw data are 32×32×3 RGB images with values in [0,255];
  2. Training data usually need normalization, which can be applied at load time via transform=transform. Typically transform = torchvision.transforms.ToTensor() converts the [0,255] images to float RGB in [0,1]; this article normalizes further on top of that;
  3. transforms.Compose chains multiple transforms. Here transforms.ToTensor() first maps values to [0,1], then transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5)) maps the [0,1] RGB values to [-1,1]. One might expect the first argument to have to be (0,0,0) to center the data at zero, but the transforms source code shows:
    output[channel] = (input[channel] - mean[channel]) / std[channel]
    so ([0,1] - 0.5) / 0.5 = [-1,1];
  4. torchvision.datasets.CIFAR10 already returns a dataset, so it can be passed directly to a DataLoader for batched loading. BATCH_SIZE is the number of samples per batch; here each training iteration takes four images;
  5. One more caveat with torch.utils.data.DataLoader: num_workers=2 runs two worker processes in parallel, but running the script directly may raise a RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.
    This happens because Python multiprocessing must be launched from the main module, guarded by if __name__ == '__main__':. See this Zhihu article for details: https://zhuanlan.zhihu.com/p/39542342;
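The guard described above can be sketched with the standard library's multiprocessing module (a minimal, hypothetical example unrelated to torchvision): without the guard, spawn-based platforms re-import the script in every worker and crash with the bootstrapping error.

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == '__main__':
    # Worker processes are only launched from the main module; without this
    # guard, spawn-start platforms (Windows, macOS) re-execute the top-level
    # code in every worker and raise the bootstrapping RuntimeError.
    with mp.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))
```

The same pattern applies to the DataLoader code: create the loaders and run the training loop inside the `if __name__ == '__main__':` block.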

Displaying images

plt.imshow(trainset.data[86]) # trainset.data keeps the raw data as numpy arrays
plt.show()

dataiter = iter(trainloader)
images, labels = next(dataiter)
images_comb = torchvision.utils.make_grid(images)
images_comb_unnor = (images_comb*0.5+0.5).numpy()
plt.imshow(np.transpose(images_comb_unnor, (1, 2, 0)))
plt.show()

Python's matplotlib module makes plotting easy:

  1. plt.imshow() accepts a numpy array and can render an (M,N,3) RGB image; the RGB values may be floats in [0,1] or ints in [0,255];
  2. Interestingly, one might assume that after the transform, trainset only holds [-1,1] tensors, but trainset.data still keeps the original [0,255] arrays with shape (32,32,3), which plt.imshow() can display directly;
  3. Indexing trainset yields a tuple of (image, label), where image is a torch.Tensor in [-1,1] with shape (3,32,32), as shown below:
trainset[0][1]
Out[13]: 6
trainset[0][0]
Out[14]: 
tensor([[[-0.5373, -0.6627, -0.6078,  ...,  0.2392,  0.1922,  0.1608],
         [-0.8745, -1.0000, -0.8588,  ..., -0.0353, -0.0667, -0.0431],
         [-0.8039, -0.8745, -0.6157,  ..., -0.0745, -0.0588, -0.1451],
         ...,
         [ 0.6314,  0.5765,  0.5529,  ...,  0.2549, -0.5608, -0.5843],
         [ 0.4118,  0.3569,  0.4588,  ...,  0.4431, -0.2392, -0.3490],
         [ 0.3882,  0.3176,  0.4039,  ...,  0.6941,  0.1843, -0.0353]],
        [[-0.5137, -0.6392, -0.6235,  ...,  0.0353, -0.0196, -0.0275],
         [-0.8431, -1.0000, -0.9373,  ..., -0.3098, -0.3490, -0.3176],
         [-0.8118, -0.9451, -0.7882,  ..., -0.3412, -0.3412, -0.4275],
         ...,
         [ 0.3333,  0.2000,  0.2627,  ...,  0.0431, -0.7569, -0.7333],
         [ 0.0902, -0.0353,  0.1294,  ...,  0.1608, -0.5137, -0.5843],
         [ 0.1294,  0.0118,  0.1137,  ...,  0.4431, -0.0745, -0.2784]],
        [[-0.5059, -0.6471, -0.6627,  ..., -0.1529, -0.2000, -0.1922],
         [-0.8431, -1.0000, -1.0000,  ..., -0.5686, -0.6078, -0.5529],
         [-0.8353, -1.0000, -0.9373,  ..., -0.6078, -0.6078, -0.6706],
         ...,
         [-0.2471, -0.7333, -0.7961,  ..., -0.4510, -0.9451, -0.8431],
         [-0.2471, -0.6706, -0.7647,  ..., -0.2627, -0.7333, -0.7333],
         [-0.0902, -0.2627, -0.3176,  ...,  0.0980, -0.3412, -0.4353]]])
trainset[0][0].shape
Out[15]: torch.Size([3, 32, 32])
type(trainset[0])
Out[16]: tuple
type(trainset[0])
Out[19]: tuple
type(trainset[0][0])
Out[20]: torch.Tensor
type(trainset[0][1])
Out[21]: int
  4. (images_comb*0.5+0.5).numpy() un-normalizes the data, restoring [-1,1] to a [0,1] array; np.transpose(images_comb_unnor, (1, 2, 0)) then rearranges it to channels-last (H,W,C), the layout imshow() expects.
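The un-normalization step can be checked in isolation with a tiny made-up array (numpy only; the 2×2 "image" is hypothetical, for illustration):

```python
import numpy as np

# A channels-first (C, H, W) array normalized to [-1, 1], as Normalize produces.
chw = np.array([[[-1.0, 0.0],
                 [ 0.5, 1.0]]] * 3)          # shape (3, 2, 2)

unnorm = chw * 0.5 + 0.5                     # invert Normalize: back to [0, 1]
hwc = np.transpose(unnorm, (1, 2, 0))        # to (H, W, C), the layout imshow expects

print(hwc.shape)                             # (2, 2, 3)
print(unnorm.min(), unnorm.max())            # 0.0 1.0
```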

Defining the convolutional neural network

class CNN_NET(torch.nn.Module):
    def __init__(self):
        super(CNN_NET,self).__init__()
        self.conv1 = torch.nn.Conv2d(in_channels = 3,
                                     out_channels = 6,
                                     kernel_size = 5,
                                     stride = 1,
                                     padding = 0)
        self.pool = torch.nn.MaxPool2d(kernel_size = 2,
                                       stride = 2)
        self.conv2 = torch.nn.Conv2d(6,16,5)
        self.fc1 = torch.nn.Linear(16*5*5,120)
        self.fc2 = torch.nn.Linear(120,84)
        self.fc3 = torch.nn.Linear(84,10)

    def forward(self,x):
        x=self.pool(F.relu(self.conv1(x)))
        x=self.pool(F.relu(self.conv2(x)))
        x=x.view(-1,16*5*5) # after the convolutions, flatten the feature maps into batch_size rows of 16*5*5 features, one sample per row
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = CNN_NET()
  1. The network is built by subclassing torch.nn.Module; the basic skeleton is:
class CNN_NET(torch.nn.Module):
    def __init__(self):
        super(CNN_NET,self).__init__()
        ...
    def forward(self,x):
        return x
  2. The basic CNN structure is convolutional layers, activation layers, pooling layers, and fully connected layers. torch.nn.Conv2d() takes the input channel count in_channels, output channel count out_channels, kernel size kernel_size, stride, padding, and so on; torch.nn.MaxPool2d() performs max pooling and likewise takes a kernel size and stride;
  3. The fully connected layers flatten the convolved feature maps and feed them into a dense network. The input feature count of the first fully connected layer must match the convolutional output: the flattened size is W = M×M×C, where M is the spatial size and C is the channel count, computed as follows:
    The same formula applies to both convolution and pooling: M = floor((N - kernel_size + 2*padding) / stride) + 1, where N is the side length of the N×N input;
  4. The final fully connected layer outputs 10 values, one per CIFAR10 class.
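The size formula above can be traced through this network with a few lines of plain Python (the helper name out_size is made up for illustration):

```python
def out_size(n, kernel_size, stride=1, padding=0):
    # Shared output-size formula for convolution and pooling (floor division).
    return (n - kernel_size + 2 * padding) // stride + 1

m = out_size(32, 5)       # conv1, 5x5 kernel: 28
m = out_size(m, 2, 2)     # 2x2 max pool, stride 2: 14
m = out_size(m, 5)        # conv2, 5x5 kernel: 10
m = out_size(m, 2, 2)     # 2x2 max pool, stride 2: 5
print(16 * m * m)         # 400, matching Linear(16*5*5, 120)
```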

Loss function and optimizer

import torch.optim as optim

optimizer = optim.SGD(net.parameters(),lr=0.001,momentum=0.9)
loss_func = torch.nn.CrossEntropyLoss() # cross-entropy loss between predictions and true labels
  1. Stochastic gradient descent is used as the optimizer, with cross-entropy as the loss function.

Training the CNN

for epoch in range(EPOCH):
    running_loss = 0.0
    for step, (b_x,b_y)in enumerate(trainloader):
        outputs = net(b_x)              # feed the training batch to the net and get predictions
        loss = loss_func(outputs, b_y)  # compute the loss between predictions and labels
        optimizer.zero_grad()           # clear gradients left over from the previous step
        loss.backward()                 # backpropagate to compute parameter gradients
        optimizer.step()                # apply the updates to the net's parameters
        # print statistics
        running_loss += loss.item()
        if step % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, step + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
  1. EPOCH controls how many passes are made over the training set;
  2. loss = loss_func(outputs, b_y) computes the cross-entropy between predictions and labels. Note that the targets passed in are not one-hot encoded but a 1D tensor of class indices with shape (batch,); torch.nn.CrossEntropyLoss() applies log-softmax to the network outputs internally and picks out the target class;
  3. The loss decreases steadily, but with EPOCH set to only 2 it has not dropped far enough. Also, with 50000 images and a batch size of 4, each epoch has 12500 steps:
[1,  2000] loss: 2.186
[1,  4000] loss: 1.879
[1,  6000] loss: 1.671
[1,  8000] loss: 1.594
[1, 10000] loss: 1.537
[1, 12000] loss: 1.479
[2,  2000] loss: 1.408
[2,  4000] loss: 1.400
[2,  6000] loss: 1.360
[2,  8000] loss: 1.342
[2, 10000] loss: 1.337
[2, 12000] loss: 1.283
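The point about class-index targets can be illustrated without torch: for a single sample, the cross-entropy loss reduces to the log-sum-exp of the logits minus the logit of the target index (a hand-rolled sketch, not the library implementation):

```python
import math

def cross_entropy(logits, target):
    # log-softmax followed by negative log-likelihood of the target class;
    # the target is an integer class index, not a one-hot vector.
    log_sum = math.log(sum(math.exp(z) for z in logits))
    return log_sum - logits[target]

logits = [2.0, 0.5, 0.1]          # hypothetical raw outputs for 3 classes
loss = cross_entropy(logits, 0)   # target class 0
print(round(loss, 4))
```

With uniform logits the loss is exactly ln(number of classes), which is a handy sanity check for an untrained classifier.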

Evaluating test-set accuracy

correct = 0
total = 0
with torch.no_grad():
    # no gradient computation, saves time
    for (images,labels) in testloader:
        outputs = net(images)
        numbers,predicted = torch.max(outputs.data,1)
        total +=labels.size(0)
        correct+=(predicted==labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

  1. with torch.no_grad() tells torch that the enclosed computations need no gradients, saving computation time and memory during inference; beyond that, it has no effect on the results;
  2. output is also a tensor;
  3. torch.max(input) → Tensor
    torch.max(a,1) returns the maximum element of each row along with its index (the column index of the maximum within that row);
    torch.max(a,0) returns the maximum element of each column along with its index (the row index of the maximum within that column);
    note that the return values are also tensors;
  4. One could also apply F.softmax() to outputs.data before taking the maximum, as in torch.max(F.softmax(outputs),1), but the resulting indices would be unchanged;
  5. In torch 1.0 and later, torch.Tensor and torch.Tensor.data appear to have been merged, so .data is no longer needed to read the tensor's values;
  6. The current network reaches about 53% accuracy; the parameter changes below improve on this.
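What torch.max(outputs, 1) returns can be mimicked with plain lists (hypothetical logits for a batch of two samples):

```python
# For each row, torch.max(outputs, 1) returns the maximum value and the
# column index of that maximum; the index is the predicted class.
batch_outputs = [
    [0.1, 2.3, -0.5],   # hypothetical logits for 3 classes
    [1.7, 0.2,  0.9],
]
values = [max(row) for row in batch_outputs]
predicted = [row.index(max(row)) for row in batch_outputs]
print(values, predicted)   # [2.3, 1.7] [1, 0]
```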

Tuning the network parameters

class CNN_NET(torch.nn.Module):
    def __init__(self):
        super(CNN_NET,self).__init__()
        self.conv1 = torch.nn.Conv2d(in_channels = 3,
                                     out_channels = 64,
                                     kernel_size = 5,
                                     stride = 1,
                                     padding = 0)
        self.pool = torch.nn.MaxPool2d(kernel_size = 3,
                                       stride = 2)
        self.conv2 = torch.nn.Conv2d(64,64,5)
        self.fc1 = torch.nn.Linear(64*4*4,384)
        self.fc2 = torch.nn.Linear(384,192)
        self.fc3 = torch.nn.Linear(192,10)

    def forward(self,x):
        x=self.pool(F.relu(self.conv1(x)))
        x=self.pool(F.relu(self.conv2(x)))
        x=x.view(-1,64*4*4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
  1. Increasing the network's capacity (more convolutional output channels, more fully connected nodes, and so on) raises accuracy to 63%;
  2. Increasing the epoch count to 8 on top of that raises it further to 73%.
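The flattened size for the modified network can be verified with the same output-size formula as before (the helper name out_size is made up for illustration):

```python
def out_size(n, kernel_size, stride=1, padding=0):
    # Shared output-size formula for convolution and pooling (floor division).
    return (n - kernel_size + 2 * padding) // stride + 1

m = out_size(32, 5)       # conv1, 5x5 kernel: 28
m = out_size(m, 3, 2)     # 3x3 max pool, stride 2: 13
m = out_size(m, 5)        # conv2, 5x5 kernel: 9
m = out_size(m, 3, 2)     # 3x3 max pool, stride 2: 4
print(64 * m * m)         # 1024, matching Linear(64*4*4, 384)
```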

Complete code

######################## Updated convolutional parameters ############################
import numpy as np
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import torch.nn.functional as F

#hyper parameter
BATCH_SIZE = 4
EPOCH = 2

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data',train = True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset,batch_size = BATCH_SIZE,
                                          shuffle = True, num_workers=1)

testset = torchvision.datasets.CIFAR10(root='./data',train = False,
                                        download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset,batch_size = BATCH_SIZE,
                                          shuffle = False, num_workers=1)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# plt.imshow(trainset.data[86]) # trainset.data keeps the raw data as numpy arrays
# plt.show()

# dataiter = iter(trainloader)
# images, labels = dataiter.next()
# images_comb = torchvision.utils.make_grid(images)
# images_comb_unnor = (images_comb*0.5+0.5).numpy()
# plt.imshow(np.transpose(images_comb_unnor, (1, 2, 0)))
# plt.show()

class CNN_NET(torch.nn.Module):
    def __init__(self):
        super(CNN_NET,self).__init__()
        self.conv1 = torch.nn.Conv2d(in_channels = 3,
                                     out_channels = 64,
                                     kernel_size = 5,
                                     stride = 1,
                                     padding = 0)
        self.pool = torch.nn.MaxPool2d(kernel_size = 3,
                                       stride = 2)
        self.conv2 = torch.nn.Conv2d(64,64,5)
        self.fc1 = torch.nn.Linear(64*4*4,384)
        self.fc2 = torch.nn.Linear(384,192)
        self.fc3 = torch.nn.Linear(192,10)

    def forward(self,x):
        x=self.pool(F.relu(self.conv1(x)))
        x=self.pool(F.relu(self.conv2(x)))
        x=x.view(-1,64*4*4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = CNN_NET()

import torch.optim as optim

optimizer = optim.SGD(net.parameters(),lr=0.001,momentum=0.9)
loss_func =torch.nn.CrossEntropyLoss()

for epoch in range(EPOCH):
    running_loss = 0.0
    for step, data in enumerate(trainloader):
        b_x,b_y=data
        outputs = net(b_x)   # call the module directly; this invokes forward()
        loss = loss_func(outputs, b_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if step % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, step + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

dataiter = iter(trainloader)
images, labels = next(dataiter)
images_comb = torchvision.utils.make_grid(images)
images_comb_unnor = (images_comb*0.5+0.5).numpy()
plt.imshow(np.transpose(images_comb_unnor, (1, 2, 0)))
plt.show()

predicts = net(images)




######## Test-set accuracy #######
correct = 0
total = 0
with torch.no_grad():
    # no gradient computation, saves time
    for (images,labels) in testloader:
        outputs = net(images)
        numbers,predicted = torch.max(outputs.data,1)
        total +=labels.size(0)
        correct+=(predicted==labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
