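These notes mainly follow the book 《深度学习框架PyTorch:入门与实践》 (Deep Learning Framework PyTorch: Getting Started and Practice):
1.1 The Birth of PyTorch
1.2 An Overview of Common Deep Learning Frameworks
1.3 The Future Belongs to Dynamic Graphs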
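Calculus on Computational Graphs: Backpropagation
Backpropagation is the key algorithm that makes training deep models computationally tractable. For modern neural networks, it can make gradient-descent training as much as ten million times faster than a naive implementation. That is the difference between a model that takes a week to train and one that takes 200,000 years.
Beyond deep learning, backpropagation is a powerful computational tool in many other fields, from weather forecasting to numerical stability analysis; it simply goes by different names. In fact, the algorithm has been reinvented at least dozens of times in different fields (see Griewank (2010)). Its general, application-independent name is "reverse-mode differentiation".
Fundamentally, it is a technique for computing derivatives quickly. It is an essential trick to have in your bag, not only in deep learning but in a wide variety of numerical computing situations.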
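To make "computing derivatives quickly on a graph" concrete, here is a minimal sketch on a tiny graph, e = (a + b) * (b + 1). The input values are chosen only for illustration. A single backward pass yields the derivative of the output with respect to every input.

```python
import torch

# Inputs of the small graph e = (a + b) * (b + 1); the values are illustrative.
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

c = a + b      # intermediate node, c = 3
d = b + 1      # intermediate node, d = 2
e = c * d      # output node, e = 6

# One backward pass gives the derivative of e with respect to every input.
e.backward()

# By hand: de/da = d = 2, and de/db = d + c = 5 (b reaches e through both c and d).
print(e.item(), a.grad.item(), b.grad.item())
```

Computing the same two derivatives by forward perturbation would need one pass per input, which is why the reverse direction pays off when there are many parameters and a single scalar output.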
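Computational graphs are either static or dynamic. PyTorch uses dynamic graphs: the graph is built while the code runs, so it can differ from one forward pass to the next.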
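A small sketch of what "dynamic" means in practice, assuming nothing beyond core PyTorch: ordinary Python control flow decides how many operations end up in the graph. The helper name `double_until_large` and its threshold are made up for this illustration.

```python
import torch

def double_until_large(x, threshold=10.0):
    """Keep doubling x until its L2 norm reaches the threshold (hypothetical helper)."""
    y = x
    steps = 0
    # The number of iterations depends on the data, so the graph differs per input.
    while y.norm() < threshold:
        y = y * 2
        steps += 1
    return y.sum(), steps

x = torch.tensor([1.0, 2.0], requires_grad=True)
out, steps = double_until_large(x)
out.backward()

# With three doublings, each element of x contributes 2**3 = 8 to the sum.
print(steps, x.grad)
```

A different input would take a different number of loop iterations, and autograd would simply record whatever ran.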
1.4 Why Choose PyTorch
1.5 A Spark That Started a Prairie Fire
1.6 fast.ai Drops Keras+TensorFlow for PyTorch
Input:
import torch
x = torch.rand(5, 3)
print(x)
Output:
tensor([[0.9732, 0.1357, 0.9145],
[0.5854, 0.5963, 0.1376],
[0.2889, 0.1247, 0.4303],
[0.9021, 0.8625, 0.6613],
[0.4179, 0.7646, 0.3817]])
This output shows that the installation succeeded.
Pitfall:
Add the following entries to C:\Windows\System32\drivers\etc\hosts:
#github
140.82.112.4 github.com
199.232.68.133 raw.githubusercontent.com
199.232.69.194 github.global.ssl.fastly.net
185.199.108.153 assets-cdn.github.com
185.199.110.153 assets-cdn.github.com
185.199.111.153 assets-cdn.github.com
import torch as t
x = t.Tensor(5,3) # build a 5x3 matrix; memory is only allocated, not initialized, so the values are arbitrary
tensor([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
x = t.rand(5,3) # build a randomly initialized matrix, uniform on [0, 1)
tensor([[0.5630, 0.5831, 0.1832],
[0.3755, 0.1908, 0.2778],
[0.6050, 0.8965, 0.4945],
[0.7089, 0.0270, 0.5071],
[0.6807, 0.5496, 0.0910]])
x = t.zeros(5, 3, dtype=t.long) # all zeros, with dtype long
tensor([[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]])
x = t.tensor([5.5, 3]) # build a tensor directly from data
tensor([5.5000, 3.0000])
x = x.new_ones(5, 3, dtype=t.double)
print(x)
x = t.randn_like(x, dtype=t.float) # create a tensor from an existing one: same shape, dtype overridden
print(x)
tensor([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]], dtype=torch.float64)
tensor([[ 0.2489, 0.7820, 1.7745],
[ 0.3888, -0.0552, 0.3124],
[ 1.0615, 0.1501, 0.8262],
[-0.5375, 1.3126, -0.2629],
[ 0.8085, -0.3260, 1.7282]])
print(x[:,1])
tensor([ 0.7820, -0.0552, 0.1501, 1.3126, -0.3260])
print(x.size())
torch.Size([5, 3])
x.size(0) # torch.Size is a tuple
5
print(x.size()[0])
5
y = t.rand(5,3)
tensor([[0.6101, 0.8795, 0.0847],
[0.2647, 0.0896, 0.2071],
[0.5567, 0.5395, 0.7964],
[0.3699, 0.4707, 0.9524],
[0.4630, 0.4474, 0.4270]])
print(x+y) # addition: method 1
print(t.add(x,y)) # addition: method 2
result = t.empty(5,3)
t.add(x,y, out=result) # addition: write the result into a given output tensor
print(result)
y.add_(x) # in-place addition: adds x to y
print(y)
tensor([[ 0.8590, 1.6615, 1.8591],
[ 0.6535, 0.0344, 0.5195],
[ 1.6182, 0.6895, 1.6226],
[-0.1676, 1.7833, 0.6895],
[ 1.2715, 0.1214, 2.1552]])
x = t.randn(4,4)
y = x.view(16) # change the size or shape of a tensor
z = x.view(-1,8)
print(x.size(), y.size(), z.size())
torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
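One detail worth a quick check: `view()` does not copy data. The sketch below uses small illustrative values to show that a view and its source share storage, and that `reshape()` can be used when the tensor is not contiguous.

```python
import torch

x = torch.arange(6.0)        # tensor([0., 1., 2., 3., 4., 5.])
y = x.view(2, -1)            # -1 lets PyTorch infer the missing dimension (3 here)

# view() returns a new view of the same storage, so a write through y shows up in x.
y[0, 0] = 100.0
print(y.size(), x[0].item())

# After a transpose the tensor is no longer contiguous; reshape() copies when needed.
z = y.t().reshape(-1)
print(z.size())
```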
x = t.randn(1)
print(x)
print(x.item()) # use .item() to get the value as a Python number
tensor([0.1640])
0.16402670741081238
For operations that Tensor does not support, convert to a NumPy array first, do the work there, and then convert back to a Tensor.
a = t.ones(5)
b = a.numpy() # Tensor → Numpy
b
array([1., 1., 1., 1., 1.], dtype=float32)
import numpy as np
a = np.ones(5)
b = t.from_numpy(a) # Numpy → Tensor
b.add_(1)
print(b)
print(a)
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
[2. 2. 2. 2. 2.]
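A Tensor and the NumPy array it was converted from share memory (for CPU tensors), so the conversion is fast and costs almost nothing. As the output above shows, modifying one also modifies the other.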
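When the sharing is not wanted, an explicit copy avoids it. A minimal sketch contrasting `torch.from_numpy`, which shares memory, with `torch.tensor`, which copies its input:

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # torch.tensor() copies the data

a[0] = 5.0                     # modify the NumPy array in place
print(shared[0].item(), copied[0].item())
```

Only the shared tensor sees the change to `a`.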
import torch
x = torch.ones(2,2, requires_grad=True) # create a tensor with requires_grad=True so that operations on it are tracked
print(x)
tensor([[1., 1.],
[1., 1.]], requires_grad=True)
y = x + 2 # y was created by an operation, so it has a grad_fn
print(y)
tensor([[3., 3.],
[3., 3.]], grad_fn=<AddBackward0>)
print(y.grad_fn)
<AddBackward0 object at 0x0000026670D99040>
z = y * y * 3
out = z.mean()
print(z, out)
tensor([[27., 27.],
[27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)
a = torch.randn(2,2)
a = ((a * 3)/(a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)
False
True
<SumBackward0 object at 0x0000026670937D90>
out.backward()
print(x.grad)
tensor([[4.5000, 4.5000],
[4.5000, 4.5000]])
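The value 4.5 can be checked by hand. With out = (1/4) * sum of 3 * (x_i + 2)^2, the derivative with respect to each x_i is (3/2) * (x_i + 2), which is 4.5 at x_i = 1. The sketch below repeats the computation and compares autograd with this formula.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()   # same computation as y * y * 3 followed by mean()
out.backward()

# Analytic gradient: d(out)/dx_i = (1/4) * 3 * 2 * (x_i + 2) = 1.5 * (x_i + 2)
analytic = 1.5 * (x.detach() + 2)
print(torch.allclose(x.grad, analytic))
```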
x = torch.randn(3, requires_grad=True)
x
tensor([-0.3211, -0.7231, 0.9793], requires_grad=True)
y = x * 2
while y.data.norm() < 1000: # L2 norm of y: square each element, sum them, then take the square root
    y = y * 2
print(y)
tensor([-328.8195, -740.4708, 1002.8046], grad_fn=<MulBackward0>)
y.backward()
Error:
RuntimeError: grad can be implicitly created only for scalar outputs
Change it to:
y = y.sum()
y.backward()
x.grad
tensor([512., 512., 512.])
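Or, keeping the original non-scalar y, pass a vector to backward():
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v) # computes the vector-Jacobian product with v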
print(x.grad)
tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
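Why the gradient is just a scaled copy of `v`: after ten doublings y_i = 1024 * x_i, so the Jacobian is 1024 times the identity and the vector-Jacobian product is 1024 * v. A deterministic sketch, using a fixed `x` and a fixed factor of 1024 in place of the random input and the loop above:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 1024                     # y_i = 1024 * x_i, so the Jacobian is 1024 * I

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)                    # computes v^T J without forming J explicitly

print(x.grad)                    # each entry is 1024 * v_i
```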
print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)
True
True
False
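Besides the `torch.no_grad()` context, a single tensor can be cut off from the graph with `detach()`. A short sketch of both, with values chosen only for illustration:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2

# detach() returns a tensor that shares data with y but is cut off from the graph.
y_detached = y.detach()
print(y.requires_grad, y_detached.requires_grad)

# Inside no_grad(), new results are not tracked even though x requires grad.
with torch.no_grad():
    z = x * 2
print(z.requires_grad, z.grad_fn)
```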
from torch.autograd import Variable # deprecated since PyTorch 0.4: Variable(...) now simply returns a Tensor
x = Variable(torch.ones(2, 2), requires_grad=True)
x
tensor([[1., 1.],
[1., 1.]], requires_grad=True)
y = x.sum()
y
tensor(4., grad_fn=<SumBackward0>)
y.grad_fn
<SumBackward0 at 0x2666fec5190>
y.backward() # backpropagate to compute the gradients
x.grad
tensor([[1., 1.],
[1., 1.]])
y.backward()
x.grad
tensor([[2., 2.],
[2., 2.]])
y.backward() # every backward pass adds to the existing gradients, so zero them before backpropagating
x.grad
tensor([[3., 3.],
[3., 3.]])
x.grad.data.zero_()
tensor([[0., 0.],
[0., 0.]])
y.backward()
x.grad
tensor([[1., 1.],
[1., 1.]])
x = Variable(torch.ones(4, 5)) # Variable and Tensor have almost the same interface and are interchangeable in practice
y = torch.cos(x)
x_tensor_cos = torch.cos(x.data)
print(y)
print(x_tensor_cos)
tensor([[0.5403, 0.5403, 0.5403, 0.5403, 0.5403],
[0.5403, 0.5403, 0.5403, 0.5403, 0.5403],
[0.5403, 0.5403, 0.5403, 0.5403, 0.5403],
[0.5403, 0.5403, 0.5403, 0.5403, 0.5403]])
tensor([[0.5403, 0.5403, 0.5403, 0.5403, 0.5403],
[0.5403, 0.5403, 0.5403, 0.5403, 0.5403],
[0.5403, 0.5403, 0.5403, 0.5403, 0.5403],
[0.5403, 0.5403, 0.5403, 0.5403, 0.5403]])
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    def forward(self, x): # once forward is defined in an nn.Module subclass, backward is provided automatically by autograd
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    def num_flat_features(self, x):
        size = x.size()[1:] # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
net = Net()
print(net)
Net(
(conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
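The `in_features=400` of `fc1` comes from tracing a 32x32 input through the two convolution and pooling stages. The sketch below does that arithmetic with the standard output-size formula; `conv_output_size` is a helper written for this note, not part of PyTorch.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                          # 32x32 input image
size = conv_output_size(size, kernel=5)            # conv1: 32 -> 28
size = conv_output_size(size, kernel=2, stride=2)  # max pool: 28 -> 14
size = conv_output_size(size, kernel=5)            # conv2: 14 -> 10
size = conv_output_size(size, kernel=2, stride=2)  # max pool: 10 -> 5

flat_features = 16 * size * size                   # 16 channels of 5x5
print(size, flat_features)
```

This is also why the network expects 32x32 inputs: any other size would change the flattened length and no longer match `fc1`.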
torch.randn(1,1,32,32)
tensor([[[[ 0.9383, 0.2258, -1.6244, ..., 0.1584, 1.0838, -0.3517],
[ 0.0886, 0.1057, -2.0140, ..., 0.9298, 0.3106, -0.7022],
[-1.2740, -0.1253, -0.2867, ..., 0.6535, -0.6852, 0.5318],
...,
[-0.9887, 0.5887, 1.6174, ..., -0.0556, -0.7984, -0.4886],
[-1.3098, 0.0088, 0.1746, ..., 0.9128, 0.2602, 0.4438],
[ 0.6318, -0.7189, -0.5646, ..., 0.5879, 0.7369, -0.3482]]]])
params = list(net.parameters())
print(len(params))
print(params[0].size()) # conv1's .weight
for name,parameters in net.named_parameters():
print(name,':',parameters.size())
10
torch.Size([6, 1, 5, 5])
conv1.weight : torch.Size([6, 1, 5, 5])
conv1.bias : torch.Size([6])
conv2.weight : torch.Size([16, 6, 5, 5])
conv2.bias : torch.Size([16])
fc1.weight : torch.Size([120, 400])
fc1.bias : torch.Size([120])
fc2.weight : torch.Size([84, 120])
fc2.bias : torch.Size([84])
fc3.weight : torch.Size([10, 84])
fc3.bias : torch.Size([10])
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
tensor([[ 0.0509, 0.0137, 0.0444, -0.1061, -0.0869, 0.0608, -0.1220, 0.0101,
-0.0129, 0.0442]], grad_fn=<AddmmBackward>)
net.zero_grad()
out.backward(torch.randn(1, 10))
output = net(input)
target = torch.randn(10) # a dummy target, for example
target = target.view(1, -1) # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
Computation graph: input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d -> view -> linear -> relu -> linear -> relu -> linear -> MSELoss -> loss
tensor(1.0540, grad_fn=<MseLossBackward>)
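The graph can also be inspected from code by following `grad_fn` and `next_functions` backward from the loss. The sketch below uses a tiny stand-in model rather than the `Net` above, and follows only the first parent of each node, so it prints one path rather than the whole graph. The exact node names depend on the PyTorch version.

```python
import torch
import torch.nn as nn

# A tiny stand-in model (not the Net above), just to have a short graph to walk.
model = nn.Linear(4, 2)
loss = nn.MSELoss()(model(torch.randn(1, 4)), torch.randn(1, 2))

# Follow the first available parent of each node from the loss back toward the leaves.
node = loss.grad_fn
names = []
while node is not None:
    names.append(type(node).__name__)
    parents = [fn for fn, _ in node.next_functions if fn is not None]
    node = parents[0] if parents else None

print(' -> '.join(names))
```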
net.zero_grad() # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0016, -0.0151, 0.0127, 0.0059, 0.0009, -0.0097])
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # Does the update
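What `optimizer.step()` does for plain SGD (no momentum or weight decay) is the update w <- w - lr * grad. A minimal sketch on a two-element parameter, with illustrative values, comparing the optimizer with the manual rule:

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()          # gradient is 2 * w = [2, -4]
optimizer.zero_grad()
loss.backward()

expected = w.detach() - 0.1 * w.grad   # manual update: w <- w - lr * grad
optimizer.step()

print(w.detach(), expected)
```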
import torch
import torchvision
import torchvision.transforms as transforms
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='D:/University_Study/2021_Graduation_project/Code/data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='D:/University_Study/2021_Graduation_project/Code/data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to D:/University_Study/2021_Graduation_project/Code/data\cifar-10-python.tar.gz
100%|█████████▉| 170385408/170498071 [00:29<00:00, 2917300.60it/s]Extracting D:/University_Study/2021_Graduation_project/Code/data\cifar-10-python.tar.gz to D:/University_Study/2021_Graduation_project/Code/data
Files already downloaded and verified
170500096it [00:33, 5065648.50it/s]
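The `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` transform maps pixel values from [0, 1] to [-1, 1] through (x - mean) / std per channel, and `img / 2 + 0.5` undoes it before plotting. A sketch of that arithmetic in plain torch, using a made-up three-value "image" instead of real CIFAR10 data:

```python
import torch

# A fake 3-channel "image" with values in [0, 1], as ToTensor() would produce.
img = torch.tensor([0.0, 0.25, 1.0]).view(3, 1, 1)

mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

normalized = (img - mean) / std        # what Normalize computes per channel
restored = normalized / 2 + 0.5        # the "unnormalize" step used before plotting

print(normalized.flatten(), restored.flatten())
```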
import matplotlib.pyplot as plt
import numpy as np
# functions to show an image
def imshow(img):
    img = img / 2 + 0.5 # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter) # dataiter.next() no longer exists in recent PyTorch versions
# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
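Error:
The "freeze_support()" line can be omitted if the program is not going to be frozen to produce an executable.
Fix: put the code that runs the script under if __name__ == '__main__': (on Windows the DataLoader worker processes started by num_workers=2 re-import the script, so the entry code must be guarded).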
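A minimal sketch of the guarded layout, assuming a small random dataset in place of CIFAR10. The helper `count_batches` is hypothetical and exists only to give the guard something to call.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def count_batches(num_workers):
    """Iterate once over a small random dataset (hypothetical helper)."""
    dataset = TensorDataset(torch.randn(16, 3), torch.randint(0, 10, (16,)))
    loader = DataLoader(dataset, batch_size=4, num_workers=num_workers)
    return sum(1 for _ in loader)

# Worker processes re-import this file on Windows, so anything that starts them
# has to sit behind the guard; otherwise each worker would try to start workers again.
if __name__ == '__main__':
    print(count_batches(num_workers=2))
```

Definitions such as classes and functions can stay at module level; only the code that actually iterates over the loader needs to be guarded.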
Error:
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Workaround: move E:\ProgramData\Anaconda3\Lib\site-packages\torch\lib\libiomp5md.dll
to another location, so that only one copy of the OpenMP runtime gets loaded.
cat frog plane deer
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
if __name__ == '__main__':
    net = Net()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
    for epoch in range(2): # loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs
            inputs, labels = data
            # zero the parameter gradients
            optimizer.zero_grad()
            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # print statistics
            running_loss += loss.item()
            if i % 2000 == 1999: # print every 2000 mini-batches
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 2000))
                running_loss = 0.0
    print('Finished Training')
    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy of the network on the 10000 test images: %d %%' % (
        100 * correct / total))
    class_correct = list(0. for i in range(10))
    class_total = list(0. for i in range(10))
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = net(images)
            _, predicted = torch.max(outputs, 1)
            c = (predicted == labels).squeeze()
            for i in range(4):
                label = labels[i]
                class_correct[label] += c[i].item()
                class_total[label] += 1
    for i in range(10):
        print('Accuracy of %5s : %2d %%' % (
            classes[i], 100 * class_correct[i] / class_total[i]))
Results:
[1, 2000] loss: 2.228
[1, 4000] loss: 1.849
[1, 6000] loss: 1.649
[1, 8000] loss: 1.553
[1, 10000] loss: 1.504
[1, 12000] loss: 1.436
[2, 2000] loss: 1.388
[2, 4000] loss: 1.346
[2, 6000] loss: 1.305
[2, 8000] loss: 1.306
[2, 10000] loss: 1.290
[2, 12000] loss: 1.247
Finished Training
Accuracy of the network on the 10000 test images: 54 %
Accuracy of plane : 68 %
Accuracy of car : 58 %
Accuracy of bird : 39 %
Accuracy of cat : 14 %
Accuracy of deer : 47 %
Accuracy of dog : 75 %
Accuracy of frog : 58 %
Accuracy of horse : 56 %
Accuracy of ship : 59 %
Accuracy of truck : 68 %
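The per-class numbers say that cat is recognized only 14 % of the time, but not what cats are mistaken for. A confusion matrix answers that. Below is a possible sketch built on `torch.bincount`; the helper `confusion_matrix` and the labels and predictions are made up for illustration and are not results from the network above.

```python
import torch

def confusion_matrix(labels, predicted, num_classes):
    """Rows are true classes, columns are predicted classes (hypothetical helper)."""
    index = labels * num_classes + predicted
    counts = torch.bincount(index, minlength=num_classes * num_classes)
    return counts.view(num_classes, num_classes)

# Made-up labels and predictions for three classes, for illustration only.
labels = torch.tensor([0, 0, 1, 1, 2, 2])
predicted = torch.tensor([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(labels, predicted, num_classes=3)
per_class_accuracy = cm.diag().float() / cm.sum(dim=1).float()
print(cm)
print(per_class_accuracy)
```

In the evaluation loop, the same function could be called per batch with `labels` and `predicted` and the resulting matrices summed; the diagonal divided by the row sums reproduces the per-class accuracy.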
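I now have a rough understanding of Tensors, automatic differentiation, neural networks, and image classification with PyTorch. The next step is to find an image dataset of my own and practice by writing the code myself.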