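These are my personal study notes. Most of the content follows Deep Learning with PyTorch: A 60 Minute Blitz from the tutorials section of the PyTorch website, with my own explanations added.
The PyTorch website already covers the material below thoroughly, so I repeat little of what is explained there.
When reviewing later, read these notes side by side with the PyTorch website.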
import torch
import torch.nn as nn
import torch.nn.functional as F
# An nn.Module contains layers, and a method forward(input) that returns the output.
class Net(nn.Module):
    # __init__ defines the layers, and with them the learnable parameters
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        # Question: does 6*6 have to be worked out by hand? With nn.Linear, yes: in_features is fixed when the
        # layer is created. For a 32x32 input: conv 3x3 -> 30, pool -> 15, conv 3x3 -> 13, pool -> 6.
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    # forward(input) returns the output: this defines the forward pass
    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # num_flat_features returns the number of features per sample, i.e. the product of every dimension
        # except the batch dimension (16*6*6 after the two pooling steps). view uses it to flatten each
        # sample into one row, so no value is dropped.
        x = x.view(-1, self.num_flat_features(x))
        # e.g. z = x.view(-1, 8): the size -1 is inferred from the other dimensions.
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
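The 6 * 6 in fc1 can be checked with a little arithmetic. The sketch below assumes the 32x32 input used later in these notes, no padding, and stride 1 for the convolutions. conv_out and pool_out are hypothetical helper names for illustration, not PyTorch functions.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Output side length of max pooling whose stride equals the kernel size."""
    return (size - kernel) // kernel + 1


side = 32                  # the input image is 32x32
side = conv_out(side, 3)   # conv1, 3x3 kernel
side = pool_out(side, 2)   # 2x2 max pooling
side = conv_out(side, 3)   # conv2, 3x3 kernel
side = pool_out(side, 2)   # 2x2 max pooling
flat_features = 16 * side * side  # conv2 produces 16 channels
print(side, flat_features)
```

Running the same arithmetic with 5x5 kernels gives a side length of 5, which is where the 16 * 5 * 5 in the CIFAR10 network further down comes from.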
net = Net()
print(net)
params = list(net.parameters())
print(len(params))
print(params)
# Each parameter is a tensor; list() collects them all into one Python list.
print(params[0].size()) # conv1's .weight
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
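# out is 2-D with shape (1, 10). The 4-D input goes through two convolutions and is then flattened to
# (batch, features) so it can be fed to the fully connected layers, whose result is also 2-D.
# The loss function below compares target with out, so the two must have the same 2-D shape.
net.zero_grad()
out.backward(torch.randn(1, 10))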
output = net(input)
target = torch.randn(10) # a dummy target, for example
target = target.view(1, -1) # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
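# What target = target.view(1, -1) does in the code above
target = torch.randn(10)
print(target)
# a 1-D tensor with shape (10,)
target = target.view(1, -1)
print(target)
# now 2-D with shape (1, 10): the same 10 values arranged as 1 row and 10 columns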
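A small sketch of how view changes the shape without changing the data. The tensor and variable names here are made up for illustration.

```python
import torch

t = torch.arange(10.0)   # 1-D tensor, shape (10,)
row = t.view(1, -1)      # -1 is inferred from the other dimension: shape (1, 10)
col = t.view(-1, 1)      # shape (10, 1)
print(t.shape, row.shape, col.shape)

# view does not copy: the reshaped tensor holds the same values in the same order
print(torch.equal(row[0], t))
```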
print(loss.grad_fn) # MSELoss
print(loss.grad_fn.next_functions[0][0]) # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # ReLU
# Run the backward pass and compare conv1's bias gradient before and after
net.zero_grad() # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
# Writing the parameter update by hand: weight = weight - learning_rate * gradient
learning_rate = 0.01
for f in net.parameters():
f.data.sub_(f.grad.data * learning_rate)
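The hand-written update, weight = weight - learning_rate * gradient, can be tried on a toy problem without PyTorch. This is a minimal sketch that assumes a single weight and the made-up loss (w - 3)^2; it is not how torch.optim is implemented.

```python
def grad(w):
    # derivative of the toy loss (w - 3) ** 2
    return 2.0 * (w - 3.0)


learning_rate = 0.1
w = 0.0
for step in range(100):
    # same rule as f.data.sub_(f.grad.data * learning_rate)
    w = w - learning_rate * grad(w)
print(w)  # ends up very close to 3, the minimum of the toy loss
```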
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # Does the update
TRAINING A CLASSIFIER
So far we have seen how to define a neural network, compute the loss, and update the network's weights. The next question is how to handle the data.
Packages for handling data
When you work with image, text, audio, or video data, you can use standard Python packages that load the data into a numpy array.
You can then convert that array into a torch.*Tensor.
Packages for the different kinds of data:
1. For images, packages such as Pillow and OpenCV are useful
2. For audio, packages such as scipy and librosa
3. For text, either raw Python or Cython based loading, or NLTK and SpaCy
The torchvision package
For vision, PyTorch provides the torchvision package.
torchvision has data loaders for common datasets such as ImageNet, CIFAR10, MNIST, etc.
It also has data transformers for images. The relevant modules are torchvision.datasets and torch.utils.data.DataLoader.
use the CIFAR10 dataset
10 classes:‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’.
The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels in size.
Training an image classifier
We will do the following steps:
1. Load and normalize the CIFAR10 training and test datasets using torchvision
2. Define a Convolutional Neural Network
3. Define a loss function
4. Train the network on the training data
5. Test the network on the test data
# Using torchvision to load CIFAR10.
import torch
import torchvision
import torchvision.transforms as transforms
The output of torchvision datasets are PILImage images of range [0, 1]. We transform them to Tensors of normalized range [-1, 1].
If running on Windows and you get a BrokenPipeError, try setting the num_workers argument of torch.utils.data.DataLoader() to 0.
transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# "Compose" means to put together.
# 1. torchvision.transforms provides image transforms; torchvision.transforms.Compose(transforms) chains them.
#    Parameters: a list of Transform objects.
# 2. transforms.ToTensor(): converts a PIL Image or numpy.ndarray (H x W x C) with pixel range [0, 255] into a
#    torch.FloatTensor of shape (C x H x W) with pixel range [0.0, 1.0].
# 3. transforms.Normalize(mean, std, inplace=False): takes the sequences of per-channel means and standard deviations,
#    mean: (M1,...,Mn) and std: (S1,..,Sn) for n channels, and computes output = (input - mean) / std for each channel.
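With mean 0.5 and std 0.5, Normalize maps the [0, 1] range produced by ToTensor onto [-1, 1]. The sketch below applies the same formula to plain Python floats; normalize and unnormalize are hypothetical helper names, not torchvision functions.

```python
def normalize(x, mean=0.5, std=0.5):
    # the per-channel formula used by Normalize: (input - mean) / std
    return (x - mean) / std


def unnormalize(y, mean=0.5, std=0.5):
    # the inverse, the same as y / 2 + 0.5 or (y + 1) / 2
    return y * std + mean


pixels = [0.0, 0.25, 0.5, 1.0]            # values as ToTensor would produce them
normalized = [normalize(p) for p in pixels]
restored = [unnormalize(v) for v in normalized]
print(normalized)
print(restored)
```

The inverse is what the display code later in these notes relies on when it computes (data + 1) / 2 or img / 2 + 0.5.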
trainset = torchvision.datasets.CIFAR10(root='data/',train=True,download=True, transform=transform)
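# First steps for using a dataset: the call specifies where it is stored, which split to use, whether to download it, and which transform to apply.
# 1. See Docs > torchvision > torchvision.datasets, class torchvision.datasets.CIFAR10
# 2. torchvision.datasets.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)
# 3. root (string): where the dataset is stored. If download is True, the dataset is downloaded to this location, which
#    takes some time; if the dataset is already there, it is not downloaded again.
#    train (bool, optional): True builds the dataset from the CIFAR10 training set, False from the test set.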
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,shuffle=True, num_workers=2)
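# Once a dataset has been created, the actual loading is handled by the torch.utils.data module.
# DataLoader packs the samples of a Dataset into batch-sized Tensors according to batch_size, shuffle, etc.,
# which gives the 4-D input the network expects.
# 1. DataLoader combines a dataset and a sampler and provides an iterator, so it is an iterable.
# 2. batch_size (int, optional): how many samples per batch to load (default: 1).
# 3. shuffle (bool, optional): True reshuffles the data at every epoch.
# 4. num_workers (int, optional): how many subprocesses to use for data loading; 0 means the data is loaded in the main process (default: 0).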
testset = torchvision.datasets.CIFAR10(root='data/', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
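DataLoader produces an iterable. It collates the individual samples returned by the dataset into batches, and it can load data in parallel worker processes and shuffle the data.
Once the program has gone through every sample in the dataset, one full pass over the DataLoader is complete.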
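The batching and shuffling can be pictured with a toy iterator in plain Python. This is only a sketch of the idea: iterate_batches and toy_dataset are made-up names, the batches are Python lists rather than Tensors, and nothing here uses worker processes.

```python
import random


def iterate_batches(dataset, batch_size, shuffle, seed=None):
    """Yield lists of samples, batch_size at a time, optionally in shuffled order."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# 10 made-up (data, label) pairs, indexable like a Dataset
toy_dataset = [(f"image{i}", i % 10) for i in range(10)]

batches = list(iterate_batches(toy_dataset, batch_size=4, shuffle=True, seed=0))
print(len(batches))                 # number of batches in one pass
print([len(b) for b in batches])    # the last batch is smaller when sizes do not divide evenly
```

One pass over all batches visits every sample exactly once, which is what one epoch means in the training loop.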
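A Dataset object is a dataset that can be indexed and returns (data, label) pairs. The following checks the dataset loaded above; it is not a required step.
from IPython.display import display
from torchvision.transforms import ToPILImage
show = ToPILImage()  # converts a Tensor into a PIL Image, which is handy for visualization
# The Dataset object (trainset) can be indexed and iterated; each item is a (data, label) pair
for i in range(0, 5):
    (data, label) = trainset[i]
    print(classes[label])
    # (data + 1) / 2 undoes the normalization
    # A notebook only auto-displays the last expression of a cell, so inside a loop the image has to be passed to display()
    display(show((data + 1) / 2).resize((100, 100)))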
Show some of the training images. This is another check of the dataset above, not a required step.
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
# functions to show an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize: the same as (data + 1) / 2
    npimg = img.numpy()  # Docs > torch.Tensor.numpy(): converts the tensor to a numpy ndarray
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # reorder (C, H, W) to (H, W, C), the layout matplotlib expects
    plt.show()
# get some random training images
dataiter = iter(trainloader)  # iter() creates an iterator; trainloader is an iterable
# iter(object[, sentinel]): with a single argument, object must support iteration, and iter returns an iterator
# that yields the next item each time its __next__() method is called.
images, labels = next(dataiter)  # next() advances the iterator once and returns one batch; a for loop would run through all batches
# show images
imshow(torchvision.utils.make_grid(images))  # torchvision.utils.make_grid() makes a grid of images
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
# print(' '.join('%5s' % classes[labels]))  # TypeError: only integer tensors of a single element can be converted to an index
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # the first argument is changed to 3 for 3-channel color images
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
print(net)
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
for epoch in range(2):  # loop over the dataset multiple times; one full pass over the dataset is one epoch
    # Recall trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2).
    # shuffle=True reshuffles the data at every epoch; batch_size=4 means each batch holds 4 samples.
    # torch.nn only supports mini-batches, not single samples. The input is 4-D: (nSamples, nChannels, Height, Width).
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # enumerate(iterable, start=0) pairs each item with its index, so the loop gets both the
        # batch index and the batch.
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)  # the network defined above; the input is 4-D
        loss = criterion(outputs, labels)
        loss.backward()
        # update the parameters
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches (i from 0 to 1999 is 2000 batches)
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0  # reset every 2000 batches, so the printed loss is always the mean over 2000 batches
print('Finished Training')
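Notes on the code above: running_loss += loss.item(). loss is a zero-dimensional (scalar) tensor, and indexing a scalar makes no sense (loss[0] raises an invalid-index error). loss.item() extracts the Python number from it, as in total_loss += loss.item().
If the loss is accumulated without converting it to a Python number, the program's memory usage may grow. The right-hand side of the expression is then a zero-dimensional tensor rather than a Python float, so the running total accumulates tensors together with their gradient history, which can build a large autograd graph and waste memory and compute.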
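The difference can be seen on a tiny example. This is a sketch with a made-up one-weight "model", not the CIFAR10 training loop: one running total uses loss.item(), the other adds the tensors themselves.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)   # stand-in for the model parameters

total_float = 0.0    # accumulates plain Python numbers
total_tensor = 0.0   # accumulates tensors and therefore their autograd history

for x in [1.0, 2.0, 3.0]:
    loss = (w * x - 1.0) ** 2               # a zero-dimensional tensor
    total_float += loss.item()              # Python float, no graph attached
    total_tensor = total_tensor + loss      # stays connected to every loss so far

print(type(total_float), total_float)
print(type(total_tensor), total_tensor.grad_fn)
```

total_float is an ordinary float, while total_tensor is a tensor with a grad_fn, so it keeps the graph of every iteration alive.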
# How enumerate works in the code above
seq = ['one', 'two', 'three']
for i, element in enumerate(seq, 0):
    print(i, element)
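# save our trained model
PATH = './cifar_net.pth'  # './' is the current directory
torch.save(net.state_dict(), PATH)  # torch.save(obj, f) saves obj to a file on disk; f is a file-like object (one that supports write and flush)
# or a string containing a file name.
# torch.nn.Module.state_dict() returns a dictionary containing the whole state of the module (parameters and buffers). Net inherits this method from its parent class nn.Module.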
dataiter = iter(testloader)
images, labels = next(dataiter)
# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
#print('GroundTruth: ', join('%5s' % classes[labels[j]] for j in range(4)))
#NameError: name 'join' is not defined
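Step 2: load the model that we trained and saved earlier (note: saving and re-loading the model is not necessary here; it only illustrates how to do it)
net = Net()
net.load_state_dict(torch.load(PATH))  # torch.nn.Module.load_state_dict(state_dict, strict=True)
# copies parameters and buffers from the saved state_dict into this module and its descendants. If strict is True, the keys of state_dict must match
# the keys returned by this module's state_dict().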
Step 3: see what the neural network thinks the examples above are
outputs = net(images)
print(outputs)
The outputs are energies for the 10 classes. The higher the energy for a class, the more the network thinks that the image is of the particular class. So, let’s get the index of the highest energy:
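The outputs are not per-class probabilities but per-class scores.
_, predicted = torch.max(outputs, 1)  # torch.max(input, dim) returns (values, indices)
# The 1 here is dim, not a threshold: the maximum is taken along dimension 1 (the 10 class scores of each image),
# and the index of that maximum is the predicted class
print(predicted)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))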
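Taking the index of the largest score in each row can be written out in plain Python. The sketch below uses made-up scores for 2 images and 3 classes, and hypothetical helpers argmax and softmax; softmax is included only to show that turning scores into probabilities does not change which class wins.

```python
import math


def argmax(values):
    """Index of the largest value, like the indices returned by torch.max(outputs, 1)."""
    return max(range(len(values)), key=lambda i: values[i])


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


outputs = [[0.2, 2.5, -1.0],   # made-up scores for image 0
           [1.7, 0.3, 0.9]]    # made-up scores for image 1

predicted = [argmax(row) for row in outputs]
probabilities = [softmax(row) for row in outputs]
print(predicted)
print(probabilities)
```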
Now let us look at how the network performs on the whole dataset.
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        # _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        # (predicted == labels) is a boolean tensor; sum() counts the correct predictions in the batch
        # and item() turns that count into a Python int
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))
print(outputs)
That looks way better than chance, which is 10% accuracy (randomly picking a class out of 10 classes). Seems like the network learnt something.
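The counting inside the accuracy loop, correct += (predicted == labels).sum().item(), has a direct plain-Python equivalent. The sketch below uses one made-up batch of 4 predictions and labels.

```python
predicted = [3, 8, 8, 0]   # made-up predicted class indices for one batch
labels = [3, 8, 1, 0]      # made-up true class indices

matches = [p == l for p, l in zip(predicted, labels)]  # like predicted == labels
correct = sum(matches)      # like .sum().item(): each True counts as 1
total = len(labels)         # like labels.size(0)
accuracy = 100 * correct / total
print(matches, correct, total, accuracy)
```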
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1
for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assuming that we are on a CUDA machine, this should print a CUDA device:
print(device)
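These methods recursively go over all modules and convert their parameters and buffers to CUDA tensors:
net.to(device)
Remember that the inputs and targets also have to be sent to the GPU at every step:
inputs, labels = data[0].to(device), data[1].to(device)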
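Put together, one training step on the selected device might look like the sketch below. It assumes a tiny stand-in nn.Linear model and random data instead of the CIFAR10 network, and it falls back to the CPU when CUDA is not available.

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 3).to(device)   # stand-in model; its parameters now live on device
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# a made-up batch: 8 samples with 4 features each, labels from 3 classes
data = [torch.randn(8, 4), torch.randint(0, 3, (8,))]
inputs, labels = data[0].to(device), data[1].to(device)

optimizer.zero_grad()
outputs = model(inputs)              # inputs and parameters are on the same device
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
print(outputs.device, loss.item())
```

If the model is on the GPU but a batch is left on the CPU, the forward pass fails with a device mismatch error, which is why every batch is moved inside the loop.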