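PyTorch is currently one of the most popular deep learning frameworks. This article introduces some basic concepts and operations in PyTorch, including tensors and automatic differentiation. It then walks through the basic workflow of training a neural network with PyTorch by training an image classifier. We hope it helps readers who are new to PyTorch.
This article is mainly based on the official PyTorch documentation and some online tutorials. If anything here infringes on your rights, please get in touch and it will be removed.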
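A Tensor is a specialized data structure, similar to arrays and matrices. In PyTorch, tensors encode a model's inputs and outputs as well as its parameters.
Tensors are similar to NumPy's ndarray. The difference is that PyTorch tensors can run on GPUs or other specialized hardware to accelerate computation.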
import torch
import numpy as np
There are several ways to initialize a Tensor:
# Initialize directly from data
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)
# Initialize from a NumPy array
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
# Initialize from another Tensor
x_ones = torch.ones_like(x_data) # retains the properties (shape, dtype) of x_data
x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the dtype of x_data
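Tensors are also often created directly from a shape rather than from existing data. The minimal sketch below uses torch.rand, torch.ones and torch.zeros with a shape tuple. It also checks which properties ones_like and rand_like inherit. The variable names are illustrative only.

```python
import torch

data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)  # integer data gives dtype torch.int64

# *_like functions inherit the shape (and, unless overridden, the dtype) of x_data
x_ones = torch.ones_like(x_data)
x_rand = torch.rand_like(x_data, dtype=torch.float)

# Create tensors directly from a shape tuple
shape = (2, 3)
rand_tensor = torch.rand(shape)    # uniform random values in [0, 1)
ones_tensor = torch.ones(shape)    # all ones, default dtype float32
zeros_tensor = torch.zeros(shape)  # all zeros, default dtype float32

print(x_ones.dtype, x_rand.dtype)  # torch.int64 torch.float32
print(rand_tensor.shape)           # torch.Size([2, 3])
```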
A Tensor's attributes include its shape, dtype (data type) and device (where it is stored):
tensor = torch.rand(3,4)
print("Shape of tensor: ", tensor.shape)
print("Datatype of tensor: ", tensor.dtype)
print("tensor is stored on: ", tensor.device)
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
tensor is stored on: cpu
If a GPU is available, we can move a Tensor to it for computation:
if torch.cuda.is_available():
    tensor = tensor.to('cuda')
    print("tensor is stored on: ", tensor.device)
tensor is stored on: cuda:0
PyTorch supports over 100 tensor operations, including transposing, indexing, slicing, mathematical operations, linear algebra and random sampling.
print(tensor, "\n")
# Slicing: all rows, second column
print(tensor[:,1], '\n')
# Element-wise product
print("tensor.mul(tensor):\n {} \n".format(tensor.mul(tensor)))
# Alternatively
print("tensor * tensor: \n {} \n".format(tensor * tensor))
# Matrix multiplication with the transpose
print("tensor.matmul(tensor.T): \n {} \n".format(tensor.matmul(tensor.T)))
# Alternatively
print("tensor @ tensor.T: \n {} \n".format(tensor @ tensor.T))
# In-place operation (trailing underscore): modifies tensor directly instead of creating a copy
tensor.add_(5)
print(tensor)
tensor([[0.7988, 0.5616, 0.4597, 0.0974],
[0.9885, 0.7940, 0.4081, 0.4274],
[0.5799, 0.3978, 0.6393, 0.6365]], device='cuda:0')
tensor([0.5616, 0.7940, 0.3978], device='cuda:0')
tensor.mul(tensor):
tensor([[0.6381, 0.3154, 0.2114, 0.0095],
[0.9771, 0.6305, 0.1665, 0.1826],
[0.3363, 0.1583, 0.4087, 0.4051]], device='cuda:0')
tensor * tensor:
tensor([[0.6381, 0.3154, 0.2114, 0.0095],
[0.9771, 0.6305, 0.1665, 0.1826],
[0.3363, 0.1583, 0.4087, 0.4051]], device='cuda:0')
tensor.matmul(tensor.T):
tensor([[1.1744, 1.4648, 1.0426],
[1.4648, 1.9568, 1.4220],
[1.0426, 1.4220, 1.3084]], device='cuda:0')
tensor @ tensor.T:
tensor([[1.1744, 1.4648, 1.0426],
[1.4648, 1.9568, 1.4220],
[1.0426, 1.4220, 1.3084]], device='cuda:0')
tensor([[5.7988, 5.5616, 5.4597, 5.0974],
[5.9885, 5.7940, 5.4081, 5.4274],
[5.5799, 5.3978, 5.6393, 5.6365]], device='cuda:0')
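The following small sketch shows a few more of these operations on a deterministic tensor so the results are easy to check by hand. It covers indexing, transposition, concatenation with torch.cat and the matrix product. The tensor built with torch.arange is a stand-in for the random tensor above.

```python
import torch

tensor = torch.arange(12, dtype=torch.float32).reshape(3, 4)  # rows 0-3, 4-7, 8-11

first_row = tensor[0]        # values 0, 1, 2, 3
last_col = tensor[..., -1]   # values 3, 7, 11
transposed = tensor.T        # shape (4, 3)

# Concatenate along the column dimension (dim=1): (3, 4) -> (3, 12)
joined = torch.cat([tensor, tensor, tensor], dim=1)

# Matrix product of (3, 4) and (4, 3) gives (3, 3)
gram = tensor @ tensor.T

print(joined.shape)  # torch.Size([3, 12])
print(gram.shape)    # torch.Size([3, 3])
```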
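A PyTorch Tensor on the CPU and a NumPy array can be converted into each other and share the same underlying memory. Changing the contents of one therefore also changes the other.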
t = torch.ones(3)
n = t.numpy()
print("t: ", t)
print("n: ", n)
# Modifying the Tensor is reflected in the NumPy array
t.add_(1)
print("\nt: ", t)
print("n: ", n)
# Initialize a Tensor from a NumPy array
n = np.ones(6)
t = torch.from_numpy(n)
print("\nn: ", n)
print("t: ", t)
# Modifying the NumPy array is reflected in the Tensor
np.add(n, 1, out=n)
print("\nn: ", n)
print("t: ", t)
t: tensor([1., 1., 1.])
n: [1. 1. 1.]
t: tensor([2., 2., 2.])
n: [2. 2. 2.]
n: [1. 1. 1. 1. 1. 1.]
t: tensor([1., 1., 1., 1., 1., 1.], dtype=torch.float64)
n: [2. 2. 2. 2. 2. 2.]
t: tensor([2., 2., 2., 2., 2., 2.], dtype=torch.float64)
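Not every conversion shares memory. torch.from_numpy shares its buffer with the array, whereas torch.tensor always copies the data. The sketch below contrasts the two under that assumption.

```python
import numpy as np
import torch

n = np.ones(3)
shared = torch.from_numpy(n)  # shares memory with n
copied = torch.tensor(n)      # always copies the data

n += 1  # in-place update of the NumPy array

print(shared)  # tensor([2., 2., 2.], dtype=torch.float64)
print(copied)  # tensor([1., 1., 1.], dtype=torch.float64)
```

Also note that .numpy() only works on CPU tensors. A CUDA tensor must first be moved with .cpu(), which makes a copy, so no memory is shared in that case.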
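The autograd package is at the core of all neural networks in PyTorch. It provides automatic differentiation for all operations on Tensors and drives backpropagation when training a network. If a Tensor's .requires_grad attribute is set to True, autograd tracks every operation performed on it. Once the computation is finished, calling .backward() computes all gradients automatically. The gradients with respect to this tensor are accumulated into its .grad attribute.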
x = torch.ones(3, 3)
if not x.requires_grad:
    x.requires_grad_(True)
print(x)
tensor([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]], requires_grad=True)
Perform some operations on the Tensor:
y = x * 2
print(y, '\n')
z = y * y * 2
print(z)
tensor([[2., 2., 2.],
[2., 2., 2.],
[2., 2., 2.]], grad_fn=<MulBackward0>)
tensor([[8., 8., 8.],
[8., 8., 8.],
[8., 8., 8.]], grad_fn=<MulBackward0>)
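Run backpropagation to compute the gradient of x, stored in x.grad. Because z is not a scalar, backward() needs a gradient argument with the same shape as z (the vector in the vector-Jacobian product). Passing a tensor of ones gives the gradient of z.sum():
z.backward(torch.ones_like(z))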
print('x.grad: {} \n'.format(x.grad))
x.grad: tensor([[16., 16., 16.],
[16., 16., 16.],
[16., 16., 16.]])
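The value 16 can be checked by hand: z = 2y² = 2(2x)² = 8x², so ∂z/∂x = 16x, which equals 16 at x = 1. Gradients accumulate in .grad, so a second backward pass adds another 16 instead of overwriting the first. The sketch below rebuilds the graph with a helper named forward (a hypothetical name for illustration) to show both effects.

```python
import torch

x = torch.ones(3, 3, requires_grad=True)

def forward(x):
    y = x * 2
    return y * y * 2  # z = 8 * x**2, so dz/dx = 16 * x

z = forward(x)
z.backward(torch.ones_like(z))
print(x.grad[0, 0].item())  # 16.0

# Rebuild the graph and backpropagate again: the new gradient is ADDED to x.grad
z = forward(x)
z.backward(torch.ones_like(z))
print(x.grad[0, 0].item())  # 32.0

# Reset the accumulated gradient before the next computation
x.grad.zero_()
print(x.grad.sum().item())  # 0.0
```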
Wrapping a block of code in `with torch.no_grad():` stops autograd from tracking the history of Tensors that have .requires_grad=True:
with torch.no_grad():
    print((x * 2).requires_grad)
False
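A related tool, not used in the original text, is Tensor.detach(). It returns a new tensor that is cut off from the graph but shares data with the original. The sketch below compares it with torch.no_grad().

```python
import torch

x = torch.ones(3, 3, requires_grad=True)

# Option 1: disable tracking for a whole block
with torch.no_grad():
    y = x * 2
print(y.requires_grad)  # False

# Option 2: detach a single tensor from the graph; it shares storage with x
x_detached = x.detach()
print(x_detached.requires_grad)               # False
print(x_detached.data_ptr() == x.data_ptr())  # True
```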
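In PyTorch, neural networks are built with the torch.nn package. A typical training procedure has these steps:
1. Define a network with learnable parameters.
2. Feed inputs through it.
3. Compute the loss.
4. Backpropagate to obtain the gradients.
5. Update the weights.

Each step is walked through below, starting with the network definition: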
import torch
import torch.nn as nn
import torch.nn.functional as F
# nn.Module is the base class for all neural network models
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5) # 1 input channel, 6 output channels, 5x5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120) # 16 feature maps of size 5x5 for a 32x32 input
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    def forward(self, x):
        # Convolution -> ReLU -> 2x2 max pooling
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        # Flatten each sample into a vector
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    def num_flat_features(self, x):
        size = x.size()[1:] # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
net = Net()
print(net)
Net(
(conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
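The in_features=400 of fc1 comes from the feature-map size after the two convolution and pooling stages. The sketch below reproduces the same conv1/conv2 layers as stand-alone modules and traces the shape of a 32x32 dummy input through them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same convolutional layers as in Net above
conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(1, 1, 32, 32)               # (batch, channels, height, width)
x = F.max_pool2d(F.relu(conv1(x)), (2, 2))  # 32 -> 28 (5x5 conv) -> 14 (pool)
x = F.max_pool2d(F.relu(conv2(x)), (2, 2))  # 14 -> 10 (5x5 conv) -> 5 (pool)
print(x.shape)       # torch.Size([1, 16, 5, 5])
print(x[0].numel())  # 400 = 16 * 5 * 5
```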
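We only need to define the forward function, and any tensor operation can be used inside it. The backward function, which computes the gradients, is defined automatically by autograd.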
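To illustrate that backward comes for free, the sketch below defines a hypothetical module TinyNet with only __init__ and forward. It calls backward() on a scalar output, and autograd fills in .grad for every learnable parameter returned by named_parameters().

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        # Any differentiable tensor operations may be used here
        return torch.tanh(self.fc(x)).sum()

net = TinyNet()
out = net(torch.randn(3, 4))
out.backward()  # no backward() method was written; autograd derives it

for name, p in net.named_parameters():
    print(name, tuple(p.shape), p.grad is not None)
# fc.weight (2, 4) True
# fc.bias (2,) True
```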
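Feed a 32x32 input to the network. nn.Conv2d expects a 4D input of shape (batch size, channels, height, width), so a single one-channel image is given a batch dimension of 1: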
input = torch.randn(1, 1, 32, 32)
output = net(input)
print(output)
tensor([[-0.1158, -0.0385, 0.1082, 0.0346, -0.0512, 0.0358, 0.1280, 0.1219,
-0.0250, 0.0036]], grad_fn=)
target = torch.randn(10) # dummy target values
target = target.view(1, -1) # reshape the target to match the output shape (1, 10)
criterion = nn.MSELoss() # mean squared error loss
loss = criterion(output, target)
print(loss)
tensor(1.6162, grad_fn=)
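With its default reduction='mean', nn.MSELoss averages the squared differences over all elements: loss = (1/N) Σ (output_i − target_i)². The sketch below uses random stand-in tensors of the same (1, 10) shape and checks this against a manual computation.

```python
import torch
import torch.nn as nn

output = torch.randn(1, 10)
target = torch.randn(1, 10)

criterion = nn.MSELoss()  # reduction='mean' by default
loss = criterion(output, target)

manual = ((output - target) ** 2).mean()
print(torch.allclose(loss, manual))  # True
```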
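Before backpropagating, zero the existing gradients; otherwise the new gradients are accumulated onto the old ones. Then call loss.backward() to backpropagate the error.
net.zero_grad() # zero the gradient buffers of all parameters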
print('conv1.bias.grad before backward:')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward:')
print(net.conv1.bias.grad)
conv1.bias.grad before backward:
None
conv1.bias.grad after backward:
tensor([ 0.0106, -0.0016, 0.0181, 0.0205, 0.0186, -0.0276])
Suppose we use stochastic gradient descent (SGD) to update the network weights:
$$\text{weight} = \text{weight} - \text{learning\_rate} \times \text{gradient}$$
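The update rule can be applied by hand or by optim.SGD. The sketch below uses two identical small stand-in nn.Linear models and checks that a manual update matches one optim.SGD step without momentum. The models and data are illustrative assumptions, not the article's network.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model_a = nn.Linear(3, 1)
model_b = nn.Linear(3, 1)
model_b.load_state_dict(model_a.state_dict())  # identical starting weights

x, y = torch.randn(8, 3), torch.randn(8, 1)
learning_rate = 0.01

# (a) Manual update: weight = weight - learning_rate * gradient
nn.MSELoss()(model_a(x), y).backward()
with torch.no_grad():  # the update itself must not be tracked by autograd
    for p in model_a.parameters():
        p -= learning_rate * p.grad

# (b) The same update performed by optim.SGD (no momentum)
optimizer = optim.SGD(model_b.parameters(), lr=learning_rate)
optimizer.zero_grad()
nn.MSELoss()(model_b(x), y).backward()
optimizer.step()

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)  # True
```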
import torch.optim as optim
# Create an SGD optimizer over all network parameters
optimizer = optim.SGD(net.parameters(), lr=0.01)
optimizer.step() # update the weights using the gradients from loss.backward()
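The torchvision package includes datasets commonly used in computer vision, such as ImageNet, CIFAR10 and MNIST, which are available through torchvision.datasets. PyTorch also provides the data loader torch.utils.data.DataLoader for loading datasets.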
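DataLoader works with any Dataset, not only the torchvision ones. The sketch below uses a hypothetical toy dataset built with TensorDataset, so nothing needs to be downloaded. It shows how DataLoader splits the data into mini-batches.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 10 samples, each with 3 features and an integer label
features = torch.randn(10, 3)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_sizes = []
for inputs, targets in loader:
    # Each iteration yields one mini-batch of (inputs, targets)
    batch_sizes.append(inputs.shape[0])

print(len(dataset), batch_sizes)  # 10 [4, 4, 2]
```

The last batch is smaller because drop_last defaults to False.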
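In this example, we use the CIFAR10 dataset to train a neural network that classifies images. CIFAR10 has 10 classes, and each image is a 32x32-pixel, 3-channel color image.
First, import the required packages: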
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
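The torchvision datasets return PIL images. transforms.ToTensor() converts them to tensors with values in [0, 1]. transforms.Normalize, with a mean and standard deviation of 0.5 per channel, then rescales them to [-1, 1]: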
transform = transforms.Compose(
[transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)
classes = ('airplane', 'automobile', 'bird', 'cat',
'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
Files already downloaded and verified
Files already downloaded and verified
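For each channel, the Normalize transform used above computes (x − mean) / std. With mean = std = 0.5, the endpoints of the [0, 1] range produced by ToTensor map as follows:

$$x' = \frac{x - 0.5}{0.5} = 2x - 1, \qquad x = 0 \mapsto x' = -1, \qquad x = 1 \mapsto x' = 1$$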
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Feature extractor: the input is a 3-channel 32x32 image
        self.conv = nn.Sequential(
            nn.Conv2d(3, 6, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(6, 16, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )
        # Classifier: 16 * 5 * 5 flattened features -> 10 class scores
        self.fc = nn.Sequential(
            nn.Linear(16*5*5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, 10)
        )
    def forward(self, x):
        feature = self.conv(x)
        # Flatten each sample before the fully connected layers
        output = self.fc(feature.view(x.shape[0], -1))
        return output
net = Net()
import torch.optim as optim
# Cross-entropy loss for multi-class classification
criterion = nn.CrossEntropyLoss()
# SGD optimizer with momentum
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
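nn.CrossEntropyLoss expects the raw, unnormalized scores (logits) from the last Linear layer together with integer class indices. Internally it combines log_softmax with the negative log-likelihood loss, which is why Net ends without a softmax. The sketch below checks this equivalence on random stand-in data with CIFAR10's label shape (a batch of 4, 10 classes).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 10)          # raw scores for a batch of 4 and 10 classes
labels = torch.tensor([3, 0, 9, 1])  # class indices, not one-hot vectors

ce = nn.CrossEntropyLoss()(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, manual))  # True
```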
Iterate over the data loader, feeding the training data to the network and the optimizer:
total_epochs = 5
for epoch in range(total_epochs):
    running_loss = 0.0
    for i, data in enumerate(trainloader):
        # Read a mini-batch
        inputs, labels = data
        # Zero the gradients of all network parameters
        optimizer.zero_grad()
        # Forward pass
        outputs = net(inputs)
        # Compute the loss
        loss = criterion(outputs, labels)
        # Backward pass
        loss.backward()
        # Update the parameters
        optimizer.step()
        running_loss += loss.item()
        # Print the average loss every 5000 mini-batches
        if i % 5000 == 4999:
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 5000))
            running_loss = 0.0
print('Finished training')
[1,  5000] loss: 2.036
[1, 10000] loss: 1.607
[2,  5000] loss: 1.384
[2, 10000] loss: 1.326
[3,  5000] loss: 1.216
[3, 10000] loss: 1.177
[4,  5000] loss: 1.103
[4, 10000] loss: 1.094
[5,  5000] loss: 1.012
[5, 10000] loss: 1.034
Finished training
Save the trained model:
SAVE_PATH = './cifar10_net.pth'
torch.save(net.state_dict(), SAVE_PATH)
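net.state_dict() contains only the parameter tensors, in an ordered dict keyed by names such as conv.0.weight. It does not contain the class itself, so the class definition must be available again when loading. The sketch below does a round trip through an in-memory buffer (io.BytesIO stands in for the file path) with a small stand-in model.

```python
import io
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
print(list(model.state_dict().keys()))
# ['0.weight', '0.bias', '2.weight', '2.bias']

buffer = io.BytesIO()  # stands in for a path like './cifar10_net.pth'
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# The architecture must be rebuilt before the saved weights can be loaded
restored = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
restored.load_state_dict(torch.load(buffer))

x = torch.randn(5, 4)
print(torch.equal(model(x), restored(x)))  # True
```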
Evaluate the model on the test set:
net = Net()
net.load_state_dict(torch.load(SAVE_PATH))
net.eval() # evaluation mode; matters for layers such as dropout or batch norm
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        # The index of the highest score is the predicted class
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('The accuracy of the network on the %d test images: %d %%' % (total, 100 * correct / total))
The accuracy of the network on the 10000 test images: 62 %
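device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# With multiple GPUs, train in data-parallel fashion across them
if torch.cuda.device_count() > 1:
    # By default, use all visible GPUs
    net = nn.DataParallel(net)
    # Or use specific GPUs only
    # net = nn.DataParallel(net, device_ids=[0,1])
    # Alternatively, restrict the visible GPUs with an environment variable before calling nn.DataParallel
    # (requires `import os` and must be set before CUDA is initialized)
    # os.environ["CUDA_VISIBLE_DEVICES"] = '1,2'
net.to(device)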
Net(
(conv): Sequential(
(0): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
(1): ReLU()
(2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(4): ReLU()
(5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(fc): Sequential(
(0): Linear(in_features=400, out_features=120, bias=True)
(1): ReLU()
(2): Linear(in_features=120, out_features=84, bias=True)
(3): ReLU()
(4): Linear(in_features=84, out_features=10, bias=True)
)
)
When training on the GPU, the training data must also be moved to the GPU, batch by batch inside the training loop:
inputs, labels = inputs.to(device), labels.to(device)
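Putting the pieces together, a device-agnostic training step moves the model to the device once and each batch inside the loop. The sketch below uses a small stand-in model and random batches in place of Net and trainloader. It runs on the CPU when no GPU is present.

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-ins for the article's Net and trainloader
model = nn.Linear(8, 3).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
batches = [(torch.randn(4, 8), torch.randint(0, 3, (4,))) for _ in range(5)]

for inputs, labels in batches:
    # Model and data must live on the same device
    inputs, labels = inputs.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()

print(loss.device.type == device.type)  # True
```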