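2. Getting Started with PyTorch and Computer Vision Competitions
Task 1: PyTorch tensor computation and conversion to and from NumPy
Key points: basic PyTorch usage, tensor computation
Step 1: Set up a local Notebook environment, or use Tianchi DSW: https://dsw-dev.data.aliyun.com/#/
Step 2: Learn the basic PyTorch syntax and successfully run the code below
- Basic PyTorch tutorial: https://zhuanlan.zhihu.com/p/25572330
- Official tutorial: https://pytorch.org/tutorials/beginner/basics/intro.html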
c = np.ones((3,3))
d = torch.from_numpy(c)  # NumPy array to tensor
Solution:
import torch
import numpy as np

c = np.ones((3, 3))
d = torch.from_numpy(c)  # NumPy array to tensor (shares memory with c)
e = d.numpy()            # tensor back to NumPy array
print(d, e)
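Task 2: Gradient computation and gradient descent
Key points: PyTorch gradient computation, stochastic gradient descent
Step 1: Learn how automatic differentiation works, https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html
Step 2: Learn how stochastic gradient descent works, https://www.cnblogs.com/BYRans/p/4700202.html
Step 3:
- Use NumPy to create data following y = 10*x + 4 + noise(0, 1), where x covers the range 0 to 100 as an arithmetic sequence with step 0.01
- Define w and b in PyTorch and fit the regression with stochastic gradient descent.
Solution:
Read the requirements carefully!
Given the task statement, I don't think nn.Linear() is appropriate here, because it would not use the w and b we are asked to define. Instead we initialize the parameters with nn.Parameter(); for an introduction see https://blog.csdn.net/qq_28753373/article/details/104179354
Another pitfall: with SGD the learning rate must be set small, otherwise the gradients explode. Adam does not have this problem.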
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim


class MLP(nn.Module):
    def __init__(self, w, b):
        super(MLP, self).__init__()
        self.weight = nn.Parameter(w)  # register w as a trainable parameter via nn.Parameter()
        self.bias = nn.Parameter(b)

    def forward(self, x):
        out = torch.matmul(x, self.weight) + self.bias
        return out


w = torch.tensor([[1.]])
b = torch.tensor([1.])
x = np.arange(0, 100, 0.01)
noise = np.random.normal(0, 1, 10000)  # Gaussian noise with mean 0 and std 1
y = 10 * x + 4 + noise
x = torch.from_numpy(x).float()
y = torch.from_numpy(y).float()  # cast float64 to float32 to match the parameters
x = torch.unsqueeze(x, dim=1)  # reshape [10000] to [10000, 1]
y = torch.unsqueeze(y, dim=1)  # reshape [10000] to [10000, 1]
model = MLP(w, b)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.0001)  # the learning rate must be small, otherwise the gradients explode
# Every step uses the whole dataset, so this is full-batch gradient descent run through the SGD optimizer
for epoch in range(100000):
    y_pre = model(x)
    loss = criterion(y_pre, y)
    if epoch % 1000 == 0:
        print("Epoch:{}, loss is {}".format(epoch, loss.item()))
    optimizer.zero_grad()  # clear the gradients
    loss.backward()  # backpropagate to compute the gradients
    optimizer.step()  # update the parameters
print("w =", model.weight.item())
print("b =", model.bias.item())
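Why the learning rate has to be this small can be estimated by hand. The sketch below is not part of the original solution; it assumes the same x grid and a mean-squared-error loss, and uses the standard result that full-batch gradient descent on a quadratic loss is only stable when the learning rate is below 2 divided by the largest eigenvalue of the Hessian. The variable names are illustrative.

```python
import math

# Same x grid as in the task: 0, 0.01, ..., 99.99 (10000 points).
xs = [i * 0.01 for i in range(10000)]
n = len(xs)

# For the loss mean((w*x + b - y)^2), the Hessian w.r.t. (w, b) is
# 2 * [[mean(x^2), mean(x)], [mean(x), 1]]; it does not depend on y.
a = 2 * sum(x * x for x in xs) / n   # d2L/dw2
c = 2 * sum(xs) / n                  # d2L/dw db
d = 2.0                              # d2L/db2

# Largest eigenvalue of the symmetric 2x2 matrix [[a, c], [c, d]].
lam_max = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + c ** 2)

# Full-batch gradient descent on a quadratic is stable only if lr < 2 / lam_max.
lr_limit = 2 / lam_max
print(f"largest curvature: {lam_max:.1f}, lr must stay below {lr_limit:.6f}")
```

Because x reaches 100, the curvature along w is large, which pushes the stable learning rate well below 0.001. Rescaling x to a smaller range would be one way to allow larger steps.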
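Task 3: How PyTorch fully connected layers work and how to use them
Key points: fully connected networks
Step 1: Learn how fully connected networks work, https://blog.csdn.net/xiaodong_11/article/details/82015456
Step 2: Implement a fully connected layer in PyTorch using matrix multiplication
Step 3: Use the nn.Linear layer in PyTorch
Solution: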
import torch
import torch.nn as nn


# Step 2: fully connected layer written with an explicit matrix multiplication
class Mylinear(nn.Module):
    def __init__(self, in_features, out_features):
        super(Mylinear, self).__init__()
        # torch.Tensor(...) would leave the memory uninitialized, so draw small random weights instead
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        out = x @ self.weight.t() + self.bias  # [N, in] @ [in, out] + [out]
        return out


model = Mylinear(784, 10)
x = torch.rand(100, 784)
out = model(x)
for name, param in model.named_parameters():
    print('%14s : %s' % (name, param.shape))  # print each parameter's name and shape
    # print('%s' % param)  # uncomment to print the parameter values


# Step 3: the same layer using nn.Linear
class linear(nn.Module):
    def __init__(self, in_features, out_features):
        super(linear, self).__init__()
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, x):
        out = self.fc(x)
        return out


model = linear(784, 10)
x = torch.rand(100, 784)
out = model(x)
for name, param in model.named_parameters():
    print('%14s : %s' % (name, param.shape))  # print each parameter's name and shape
    # print('%s' % param)  # uncomment to print the parameter values
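To make the arithmetic behind a fully connected layer explicit, here is a small dependency-free sketch of y = x @ W^T + b on nested Python lists, together with the parameter count. It is only an illustration; the helper names linear_forward and linear_param_count are made up for this example and are not PyTorch APIs.

```python
def linear_forward(x, weight, bias):
    """Compute y = x @ weight^T + bias for nested Python lists.

    x: [batch][in_features], weight: [out_features][in_features], bias: [out_features]
    """
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b for w_row, b in zip(weight, bias)]
        for row in x
    ]


def linear_param_count(in_features, out_features):
    # One weight per (input, output) pair plus one bias per output.
    return in_features * out_features + out_features


x = [[1.0, 2.0, 3.0]]                        # batch of 1, 3 input features
weight = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]  # 2 output features
bias = [0.0, 1.0]
print(linear_forward(x, weight, bias))  # [[1.0, 4.0]]
print(linear_param_count(784, 10))      # 7850
```

For the 784-to-10 layer used above, the weight has shape [10, 784] and the bias has shape [10], which gives 7850 parameters in total.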
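Task 4: How PyTorch activation functions work and how to use them
Key points: activation functions
Step 1: Learn how activation functions work, https://zhuanlan.zhihu.com/p/88429934
Step 2: Implement the activation functions above by hand in PyTorch
Solution: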
import numpy as np


# Scalar reference implementations: x is a single float.
# The signatures mirror the torch.nn modules; inplace and num_parameters are unused here.
def ELU(x, alpha=1.0, inplace=False):
    return max(0, x) + min(0, alpha * (np.exp(x) - 1))


def LeakyReLU(x, negative_slope=0.01, inplace=False):
    return max(0, x) + negative_slope * min(0, x)


def PReLU(x, num_parameters=1, init=0.25):
    # In nn.PReLU the slope is learnable; here it stays fixed at its initial value
    return max(0, x) + init * min(0, x)


def ReLU(x, inplace=False):
    return max(0, x)


def ReLU6(x, inplace=False):
    return min(max(0, x), 6)


def SELU(x, inplace=False):
    alpha = 1.6732632423543772848170429916717
    scale = 1.0507009873554804934193349852946
    return scale * (max(0, x) + min(0, alpha * (np.exp(x) - 1)))


def CELU(x, alpha=1.00, inplace=False):
    return max(0, x) + min(0, alpha * (np.exp(x / alpha) - 1))


def Sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def LogSigmoid(x):
    return np.log(1.0 / (1.0 + np.exp(-x)))


def Tanh(x):
    # Parentheses are required: (e^x - e^-x) / (e^x + e^-x)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))


def TanhShrink(x):
    return x - Tanh(x)


def Softplus(x, beta=1, threshold=20):
    # Above the threshold, fall back to the identity for numerical stability
    if beta * x > threshold:
        return x
    return (1.0 / beta) * np.log(1 + np.exp(beta * x))


def SoftShrink(x, lambd=0.5):
    if x > lambd:
        return x - lambd
    elif x < -lambd:
        return x + lambd
    return 0
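A quick way to validate a hand-written activation is to compare it against a library reference. The sketch below uses only the standard math module to check a tanh formula against math.tanh, and shows how dropping the parentheses silently changes the function. The function names are illustrative.

```python
import math


def tanh_manual(x):
    # Parentheses matter: the whole difference is divided by the whole sum.
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def tanh_missing_parens(x):
    # Without parentheses only exp(-x) / exp(x) is divided, which is a different function.
    return math.exp(x) - math.exp(-x) / math.exp(x) + math.exp(-x)


for x in (-2.0, 0.0, 0.5, 3.0):
    assert math.isclose(tanh_manual(x), math.tanh(x), rel_tol=1e-12, abs_tol=1e-12)

print(tanh_manual(0.0))          # 0.0
print(tanh_missing_parens(0.0))  # 1.0, although tanh(0) is 0
```

The same kind of check works for the other activations, for example by comparing each against the corresponding torch.nn.functional call on a handful of inputs.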
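Task 5: How PyTorch convolutional layers work and how to use them
Key points: convolutional layers
Step 1: Understand how convolutional layers work and how to use them
https://blog.csdn.net/qq_37385726/article/details/81739179
https://www.cnblogs.com/zhangxiann/p/13584415.html
Step 2: Compute the number of parameters of the following convolutional layer
nn.Conv2d(
    in_channels=1,
    out_channels=32,
    kernel_size=5,
    stride=1,
    padding=2
)
Solution:
Method 1: count with code
import torch.nn as nn
import torch


class net(nn.Module):
    def __init__(self):
        super(net, self).__init__()
        self.conv1 = nn.Conv2d(
            in_channels=1,
            out_channels=32,
            kernel_size=5,
            stride=1,
            padding=2
        )

    def forward(self, x):
        return self.conv1(x)


model = net()
p = sum(map(lambda p: p.numel(), model.parameters()))  # total number of elements over all parameters
print(p)
Method 2: direct calculation
in_channels is 1, so one kernel has 5x5x1 parameters. 32 output channels means 32 kernels, and there are 32 biases, so the total is 5x5x1x32 + 32 = 832. Stride and padding do not affect the parameter count.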
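The direct calculation generalizes to a one-line formula. The helper below is a sketch (conv2d_param_count is an illustrative name, not a PyTorch function) that assumes a square kernel.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True, groups=1):
    """Number of learnable parameters in a 2D convolution with a square kernel."""
    weights = out_channels * (in_channels // groups) * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


print(conv2d_param_count(1, 32, 5))              # 832
print(conv2d_param_count(3, 64, 3, bias=False))  # 1728
```

The second call matches the first convolution of the VGG model built in Task 8, which is created with bias=False.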
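Task 6: Common PyTorch loss functions and optimizers
Key points: loss functions, optimizers
Step 1: Learn the details of loss functions, https://www.cnblogs.com/wanghui-garcia/p/10862733.html
Step 2: Learn how to use optimizers, https://pytorch.org/docs/stable/optim.html
Step 3: Repeat the regression from Task 2 with different optimizers and learning rates
- Loss MSE, optimizer SGD, learning rate 0.1
- Loss MSE, optimizer SGD, learning rate 0.5
- Loss MSE, optimizer SGD, learning rate 0.01
Solution:
Here we construct new x, y, w, b and noise so that the problem is easy to optimize with SGD. With the data from Task 2 the results are terrible: the learning rates are too large, and the loss is already nan by the second epoch.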
import torch
import torch.nn as nn
import torch.optim as optim


class linear(nn.Module):
    def __init__(self, in_features, out_features):
        super(linear, self).__init__()
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, x):
        out = self.fc(x)
        return out


x = torch.randn(4, 3)
w = torch.randint(5, 10, size=(3, 1), dtype=torch.float)
b = torch.tensor(5.)
noise = torch.randn(4, 1)
y = x @ w + b + noise
res = {}
for lr in [0.5, 0.1, 0.01]:
    best_loss = float("inf")
    best_epoch = 0
    model = linear(3, 1)
    criterion = nn.MSELoss()
    optimizer = optim.SGD(model.parameters(), lr=lr)
    for epoch in range(100000):
        y_pre = model(x)
        loss = criterion(y_pre, y)
        if epoch % 1000 == 0:
            print("Epoch:{}, loss is {}".format(epoch, loss.item()))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() < best_loss:
            best_loss = loss.item()  # store a plain float rather than a tensor attached to the graph
            best_epoch = epoch
    res[lr] = {'loss': best_loss, 'epoch': best_epoch}
print(res)
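The effect of the learning rate can be reproduced on a toy problem without any library. The sketch below is an illustration under assumed data (three points on the line y = 2x, a single weight, no bias), not the experiment above. For this loss the update is stable only below a learning rate of about 0.21, so 0.5 diverges while 0.1 and 0.01 converge at different speeds.

```python
def fit_slope(lr, steps=100):
    """Full-batch gradient descent on mean((w*x - y)^2) for a single weight w."""
    xs = [1.0, 2.0, 3.0]
    ys = [2.0 * x for x in xs]  # noiseless target, the true slope is 2
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


for lr in (0.5, 0.1, 0.01):
    w = fit_slope(lr)
    print(f"lr={lr}: w={w:.6g}, |w - 2|={abs(w - 2):.3g}")
```

Each step multiplies the error w - 2 by (1 - lr * 2 * mean(x^2)). When the magnitude of that factor exceeds 1 the error grows without bound, which is what eventually appears as nan in a real training run.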
Task 7: PyTorch pooling and normalization layers
Key points: pooling layers, normalization layers
Step 1: Implement mean-pooling and max-pooling for 2D inputs in PyTorch
https://pytorch.org/docs/stable/nn.html#pooling-layers
https://blog.csdn.net/shanglianlm/article/details/85313924
Step 2: Learn how normalization works, https://blog.csdn.net/qq_23981335/article/details/106572171
Solution:
import torch
import torch.nn as nn

x = torch.randn(10, 3, 32, 32)
avg = nn.AvgPool2d(3, 3)  # kernel_size=3, stride=3
print(avg(x).shape)
maxp = nn.MaxPool2d(7, 3)  # kernel_size=7, stride=3
print(maxp(x).shape)
Output:
torch.Size([10, 3, 10, 10])
torch.Size([10, 3, 9, 9])
x = torch.randint(10, size=(10, 3, 32, 32)).float()
bn = nn.BatchNorm2d(3)
gn = nn.GroupNorm(num_groups=1, num_channels=3)  # num_channels must be divisible by num_groups
print(bn(x).shape, gn(x).shape)  # normalization layers keep the input shape
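The two pooled shapes follow from the usual output-size formula floor((size + 2*padding - kernel_size) / stride) + 1. A minimal sketch, assuming floor mode and no dilation (pool_output_size is an illustrative helper, not a PyTorch function):

```python
def pool_output_size(size, kernel_size, stride=None, padding=0):
    """Output length along one spatial dimension (floor mode, no dilation)."""
    if stride is None:
        stride = kernel_size  # PyTorch pooling layers default the stride to the kernel size
    return (size + 2 * padding - kernel_size) // stride + 1


print(pool_output_size(32, 3, 3))  # 10
print(pool_output_size(32, 7, 3))  # 9
```

The same formula applies to convolutions without dilation, so it can also be used to check the feature-map sizes in the later tasks.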
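Task 8: Building a VGG network with PyTorch
Key points: network construction
https://zhuanlan.zhihu.com/p/263527295
Step 1: Understand how the VGG network works.
Step 2: Build the VGG model in PyTorch.
Step 3: Print the feature-map size of every layer of the 11-layer VGG model, along with the parameter count.
Solution: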
# -*- coding: UTF-8 -*-
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary


class VGG(nn.Module):
    """
    VGG builder
    """

    def __init__(self, arch, num_classes=1000):
        super(VGG, self).__init__()
        self.in_channels = 3
        self.conv3_64 = self.__make_layer(64, arch[0])
        self.conv3_128 = self.__make_layer(128, arch[1])
        self.conv3_256 = self.__make_layer(256, arch[2])
        self.conv3_512a = self.__make_layer(512, arch[3])
        self.conv3_512b = self.__make_layer(512, arch[4])
        self.fc1 = nn.Linear(7 * 7 * 512, 4096)
        self.bn1 = nn.BatchNorm1d(4096)
        self.bn2 = nn.BatchNorm1d(4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, num_classes)

    def __make_layer(self, channels, num):
        layers = []
        for i in range(num):
            layers.append(nn.Conv2d(self.in_channels, channels, 3, stride=1, padding=1, bias=False))  # same padding
            layers.append(nn.BatchNorm2d(channels))
            layers.append(nn.ReLU())
            self.in_channels = channels
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv3_64(x)
        out = F.max_pool2d(out, 2)
        out = self.conv3_128(out)
        out = F.max_pool2d(out, 2)
        out = self.conv3_256(out)
        out = F.max_pool2d(out, 2)
        out = self.conv3_512a(out)
        out = F.max_pool2d(out, 2)
        out = self.conv3_512b(out)
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)  # flatten to [N, 7*7*512]
        out = self.fc1(out)
        out = self.bn1(out)
        out = F.relu(out)
        out = self.fc2(out)
        out = self.bn2(out)
        out = F.relu(out)
        return self.fc3(out)


def VGG_11():
    return VGG([1, 1, 2, 2, 2], num_classes=1000)


def VGG_13():
    return VGG([2, 2, 2, 2, 2], num_classes=1000)  # VGG-13 has two convolutions in every stage


def VGG_16():
    return VGG([2, 2, 3, 3, 3], num_classes=1000)


def VGG_19():
    return VGG([2, 2, 4, 4, 4], num_classes=1000)


def test():
    net = VGG_11()
    # net = VGG_13()
    # net = VGG_16()
    # net = VGG_19()
    summary(net, (3, 224, 224), device="cpu")


test()
Output:
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 1,728
BatchNorm2d-2 [-1, 64, 224, 224] 128
ReLU-3 [-1, 64, 224, 224] 0
Conv2d-4 [-1, 128, 112, 112] 73,728
BatchNorm2d-5 [-1, 128, 112, 112] 256
ReLU-6 [-1, 128, 112, 112] 0
Conv2d-7 [-1, 256, 56, 56] 294,912
BatchNorm2d-8 [-1, 256, 56, 56] 512
ReLU-9 [-1, 256, 56, 56] 0
Conv2d-10 [-1, 256, 56, 56] 589,824
BatchNorm2d-11 [-1, 256, 56, 56] 512
ReLU-12 [-1, 256, 56, 56] 0
Conv2d-13 [-1, 512, 28, 28] 1,179,648
BatchNorm2d-14 [-1, 512, 28, 28] 1,024
ReLU-15 [-1, 512, 28, 28] 0
Conv2d-16 [-1, 512, 28, 28] 2,359,296
BatchNorm2d-17 [-1, 512, 28, 28] 1,024
ReLU-18 [-1, 512, 28, 28] 0
Conv2d-19 [-1, 512, 14, 14] 2,359,296
BatchNorm2d-20 [-1, 512, 14, 14] 1,024
ReLU-21 [-1, 512, 14, 14] 0
Conv2d-22 [-1, 512, 14, 14] 2,359,296
BatchNorm2d-23 [-1, 512, 14, 14] 1,024
ReLU-24 [-1, 512, 14, 14] 0
Linear-25 [-1, 4096] 102,764,544
BatchNorm1d-26 [-1, 4096] 8,192
Linear-27 [-1, 4096] 16,781,312
BatchNorm1d-28 [-1, 4096] 8,192
Linear-29 [-1, 1000] 4,097,000
================================================================
Total params: 132,882,472
Trainable params: 132,882,472
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 170.10
Params size (MB): 506.91
Estimated Total Size (MB): 677.58
----------------------------------------------------------------
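The total in the summary can be cross-checked by hand. The sketch below recomputes it from the layer shapes of the model above (3x3 convolutions without bias, BatchNorm after every convolution and after the first two linear layers); count_vgg11_params is an illustrative helper name.

```python
def count_vgg11_params(num_classes=1000):
    # (in_channels, out_channels) of the eight 3x3 convolutions, bias=False
    convs = [(3, 64), (64, 128), (128, 256), (256, 256),
             (256, 512), (512, 512), (512, 512), (512, 512)]
    conv_params = sum(cin * cout * 3 * 3 for cin, cout in convs)
    bn2d_params = sum(2 * cout for _, cout in convs)  # weight and bias per channel

    # (in_features, out_features) of the three fully connected layers, with bias
    fcs = [(7 * 7 * 512, 4096), (4096, 4096), (4096, num_classes)]
    fc_params = sum(fin * fout + fout for fin, fout in fcs)
    bn1d_params = 2 * (2 * 4096)  # two BatchNorm1d(4096) layers

    return conv_params + bn2d_params + fc_params + bn1d_params


print(f"{count_vgg11_params():,}")  # 132,882,472
```

Most of the parameters sit in the first fully connected layer (7*7*512*4096 + 4096 = 102,764,544), which is why the convolutional part contributes comparatively little to the model size.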
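Task 9: Building a ResNet network with PyTorch
Key points: network construction
https://zhuanlan.zhihu.com/p/263526658
Step 1: Understand how the ResNet network works.
Step 2: Build the ResNet model in PyTorch.
Step 3: Print the feature-map size of every layer of the ResNet-18 model, along with the parameter count.
Solution: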
# -*- coding: UTF-8 -*-
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary


class SE(nn.Module):
    def __init__(self, in_chnls, ratio):
        super(SE, self).__init__()
        self.squeeze = nn.AdaptiveAvgPool2d((1, 1))
        self.compress = nn.Conv2d(in_chnls, in_chnls // ratio, 1, 1, 0)
        self.excitation = nn.Conv2d(in_chnls // ratio, in_chnls, 1, 1, 0)

    def forward(self, x):
        out = self.squeeze(x)
        out = self.compress(out)
        out = F.relu(out)
        out = self.excitation(out)
        return torch.sigmoid(out)  # F.sigmoid is deprecated


class BN_Conv2d(nn.Module):
    """
    BN_CONV, default activation is ReLU
    """

    def __init__(self, in_channels, out_channels, kernel_size, stride, padding,
                 dilation=1, groups=1, bias=False, activation=True):
        super(BN_Conv2d, self).__init__()
        layers = [nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride,
                            padding=padding, dilation=dilation, groups=groups, bias=bias),
                  nn.BatchNorm2d(out_channels)]
        if activation:
            layers.append(nn.ReLU(inplace=True))
        self.seq = nn.Sequential(*layers)

    def forward(self, x):
        return self.seq(x)


class BasicBlock(nn.Module):
    """
    basic building block for ResNet-18, ResNet-34
    """
    message = "basic"

    def __init__(self, in_channels, out_channels, strides, is_se=False):
        super(BasicBlock, self).__init__()
        self.is_se = is_se
        self.conv1 = BN_Conv2d(in_channels, out_channels, 3, stride=strides, padding=1, bias=False)  # same padding
        self.conv2 = BN_Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False, activation=False)
        if self.is_se:
            self.se = SE(out_channels, 16)
        # fit input with residual output
        self.short_cut = nn.Sequential()
        if strides != 1:  # compare by value; "is not 1" tests object identity
            self.short_cut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=strides, padding=0, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        if self.is_se:
            coefficient = self.se(out)
            out = out * coefficient
        out = out + self.short_cut(x)
        return F.relu(out)


class ResNet(nn.Module):
    """
    ResNet builder (used here for ResNet-18)
    """

    def __init__(self, block, groups, num_classes=1000):
        super(ResNet, self).__init__()
        self.channels = 64  # out channels from the first convolutional layer
        self.block = block
        self.conv1 = nn.Conv2d(3, self.channels, 7, stride=2, padding=3, bias=False)
        self.bn = nn.BatchNorm2d(self.channels)
        self.pool1 = nn.MaxPool2d(3, 2, 1)
        self.conv2_x = self._make_conv_x(channels=64, blocks=groups[0], strides=1, index=2)
        self.conv3_x = self._make_conv_x(channels=128, blocks=groups[1], strides=2, index=3)
        self.conv4_x = self._make_conv_x(channels=256, blocks=groups[2], strides=2, index=4)
        self.conv5_x = self._make_conv_x(channels=512, blocks=groups[3], strides=2, index=5)
        self.pool2 = nn.AvgPool2d(7)
        patches = 512 if self.block.message == "basic" else 512 * 4
        self.fc = nn.Linear(patches, num_classes)  # for 224 * 224 input size

    def _make_conv_x(self, channels, blocks, strides, index):
        """
        making convolutional group
        :param channels: output channels of the conv-group
        :param blocks: number of blocks in the conv-group
        :param strides: stride of the first block in the group
        :param index: index of the group, used to name its blocks
        :return: conv-group
        """
        list_strides = [strides] + [1] * (blocks - 1)  # only the first block of a group uses the given stride, the others use 1
        conv_x = nn.Sequential()
        for i in range(len(list_strides)):
            layer_name = str("block_%d_%d" % (index, i))  # names passed to add_module must be unique
            conv_x.add_module(layer_name, self.block(self.channels, channels, list_strides[i]))
            self.channels = channels if self.block.message == "basic" else channels * 4
        return conv_x

    def forward(self, x):
        out = self.conv1(x)
        out = F.relu(self.bn(out))
        out = self.pool1(out)
        out = self.conv2_x(out)
        out = self.conv3_x(out)
        out = self.conv4_x(out)
        out = self.conv5_x(out)
        out = self.pool2(out)
        out = out.view(out.size(0), -1)
        out = F.softmax(self.fc(out), dim=1)  # softmax over the class dimension
        return out


def ResNet_18(num_classes=1000):
    return ResNet(block=BasicBlock, groups=[2, 2, 2, 2], num_classes=num_classes)


# The deeper variants below need a BottleNeck block, which is not defined in this file.
# def ResNet_34(num_classes=1000):
#     return ResNet(block=BasicBlock, groups=[3, 4, 6, 3], num_classes=num_classes)
#
# def ResNet_50(num_classes=1000):
#     return ResNet(block=BottleNeck, groups=[3, 4, 6, 3], num_classes=num_classes)
#
# def ResNet_101(num_classes=1000):
#     return ResNet(block=BottleNeck, groups=[3, 4, 23, 3], num_classes=num_classes)
#
# def ResNet_152(num_classes=1000):
#     return ResNet(block=BottleNeck, groups=[3, 8, 36, 3], num_classes=num_classes)


def test():
    net = ResNet_18()
    # net = ResNet_34()
    # net = ResNet_50()
    # net = ResNet_101()
    # net = ResNet_152()
    summary(net, (3, 224, 224), device='cpu')


test()
Output:
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 112, 112] 9,408
BatchNorm2d-2 [-1, 64, 112, 112] 128
MaxPool2d-3 [-1, 64, 56, 56] 0
Conv2d-4 [-1, 64, 56, 56] 36,864
BatchNorm2d-5 [-1, 64, 56, 56] 128
ReLU-6 [-1, 64, 56, 56] 0
BN_Conv2d-7 [-1, 64, 56, 56] 0
Conv2d-8 [-1, 64, 56, 56] 36,864
BatchNorm2d-9 [-1, 64, 56, 56] 128
BN_Conv2d-10 [-1, 64, 56, 56] 0
BasicBlock-11 [-1, 64, 56, 56] 0
Conv2d-12 [-1, 64, 56, 56] 36,864
BatchNorm2d-13 [-1, 64, 56, 56] 128
ReLU-14 [-1, 64, 56, 56] 0
BN_Conv2d-15 [-1, 64, 56, 56] 0
Conv2d-16 [-1, 64, 56, 56] 36,864
BatchNorm2d-17 [-1, 64, 56, 56] 128
BN_Conv2d-18 [-1, 64, 56, 56] 0
BasicBlock-19 [-1, 64, 56, 56] 0
Conv2d-20 [-1, 128, 28, 28] 73,728
BatchNorm2d-21 [-1, 128, 28, 28] 256
ReLU-22 [-1, 128, 28, 28] 0
BN_Conv2d-23 [-1, 128, 28, 28] 0
Conv2d-24 [-1, 128, 28, 28] 147,456
BatchNorm2d-25 [-1, 128, 28, 28] 256
BN_Conv2d-26 [-1, 128, 28, 28] 0
Conv2d-27 [-1, 128, 28, 28] 8,192
BatchNorm2d-28 [-1, 128, 28, 28] 256
BasicBlock-29 [-1, 128, 28, 28] 0
Conv2d-30 [-1, 128, 28, 28] 147,456
BatchNorm2d-31 [-1, 128, 28, 28] 256
ReLU-32 [-1, 128, 28, 28] 0
BN_Conv2d-33 [-1, 128, 28, 28] 0
Conv2d-34 [-1, 128, 28, 28] 147,456
BatchNorm2d-35 [-1, 128, 28, 28] 256
BN_Conv2d-36 [-1, 128, 28, 28] 0
BasicBlock-37 [-1, 128, 28, 28] 0
Conv2d-38 [-1, 256, 14, 14] 294,912
BatchNorm2d-39 [-1, 256, 14, 14] 512
ReLU-40 [-1, 256, 14, 14] 0
BN_Conv2d-41 [-1, 256, 14, 14] 0
Conv2d-42 [-1, 256, 14, 14] 589,824
BatchNorm2d-43 [-1, 256, 14, 14] 512
BN_Conv2d-44 [-1, 256, 14, 14] 0
Conv2d-45 [-1, 256, 14, 14] 32,768
BatchNorm2d-46 [-1, 256, 14, 14] 512
BasicBlock-47 [-1, 256, 14, 14] 0
Conv2d-48 [-1, 256, 14, 14] 589,824
BatchNorm2d-49 [-1, 256, 14, 14] 512
ReLU-50 [-1, 256, 14, 14] 0
BN_Conv2d-51 [-1, 256, 14, 14] 0
Conv2d-52 [-1, 256, 14, 14] 589,824
BatchNorm2d-53 [-1, 256, 14, 14] 512
BN_Conv2d-54 [-1, 256, 14, 14] 0
BasicBlock-55 [-1, 256, 14, 14] 0
Conv2d-56 [-1, 512, 7, 7] 1,179,648
BatchNorm2d-57 [-1, 512, 7, 7] 1,024
ReLU-58 [-1, 512, 7, 7] 0
BN_Conv2d-59 [-1, 512, 7, 7] 0
Conv2d-60 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-61 [-1, 512, 7, 7] 1,024
BN_Conv2d-62 [-1, 512, 7, 7] 0
Conv2d-63 [-1, 512, 7, 7] 131,072
BatchNorm2d-64 [-1, 512, 7, 7] 1,024
BasicBlock-65 [-1, 512, 7, 7] 0
Conv2d-66 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-67 [-1, 512, 7, 7] 1,024
ReLU-68 [-1, 512, 7, 7] 0
BN_Conv2d-69 [-1, 512, 7, 7] 0
Conv2d-70 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-71 [-1, 512, 7, 7] 1,024
BN_Conv2d-72 [-1, 512, 7, 7] 0
BasicBlock-73 [-1, 512, 7, 7] 0
AvgPool2d-74 [-1, 512, 1, 1] 0
Linear-75 [-1, 1000] 513,000
================================================================
Total params: 11,689,512
Trainable params: 11,689,512
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 62.41
Params size (MB): 44.59
Estimated Total Size (MB): 107.58
----------------------------------------------------------------
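As with VGG, the total can be verified by hand. The sketch below recomputes it from the structure of the model above: convolutions without bias, BatchNorm after every convolution, and a 1x1 projection shortcut in the first block of each stride-2 group. The helper names are illustrative.

```python
def conv(cin, cout, k):
    return cin * cout * k * k  # bias=False everywhere


def bn(c):
    return 2 * c  # learnable scale and shift


def basic_block(cin, cout, stride):
    params = conv(cin, cout, 3) + bn(cout) + conv(cout, cout, 3) + bn(cout)
    if stride != 1:
        params += conv(cin, cout, 1) + bn(cout)  # 1x1 projection shortcut
    return params


def count_resnet18_params(num_classes=1000):
    total = conv(3, 64, 7) + bn(64)  # stem: 7x7 convolution plus BatchNorm
    cin = 64
    for cout, stride in [(64, 1), (128, 2), (256, 2), (512, 2)]:
        total += basic_block(cin, cout, stride) + basic_block(cout, cout, 1)
        cin = cout
    return total + 512 * num_classes + num_classes  # final Linear layer with bias


print(f"{count_resnet18_params():,}")  # 11,689,512
```

Global average pooling reduces the final feature map to 512 values, so the classifier needs only 513,000 parameters, compared with over 100 million in the first fully connected layer of VGG.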
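Task 10: Fashion-MNIST classification with PyTorch
https://github.com/masoudrostami/Fashion-MNIST-using-PyTorch/blob/main/MNIST%20Fashion%20Project.ipynb
Step 1: Build a classification model with 4 convolutional layers + 2 fully connected layers.
Step 2: Record the training-set and test-set accuracy for every epoch during training.
Solution: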
from torch import nn, optim
from torchvision import transforms
import torchvision
import torch

# Classes in the Fashion-MNIST dataset
# classes = ('Tshirt', 'Trouser', 'Pullover', 'Dress', 'Coat',
#            'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot')
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training and test data
trainset = torchvision.datasets.FashionMNIST('data', download=True, train=True, transform=transform)
testset = torchvision.datasets.FashionMNIST('data', download=True, train=False, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)


class Fashion(nn.Module):
    def __init__(self):
        super(Fashion, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=7, stride=1), nn.ReLU(),    # 28 -> 22
            nn.Conv2d(6, 16, kernel_size=7, stride=1), nn.ReLU(),   # 22 -> 16
            nn.Conv2d(16, 16, kernel_size=7, stride=1), nn.ReLU(),  # 16 -> 10
            nn.Conv2d(16, 32, kernel_size=7, stride=1), nn.ReLU(),  # 10 -> 4
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, 10),
            nn.LogSoftmax(dim=1),  # NLLLoss expects log-probabilities
        )

    def forward(self, x):
        out = self.model(x)
        return out


def get_acc(output, label):
    total = output.shape[0]
    _, pred_label = output.max(1)
    num_correct = (pred_label == label).sum().item()
    return num_correct / total


model = Fashion().to(device)
error = nn.NLLLoss().to(device)
learning_rate = 0.001  # 0.1 is too large for Adam
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
num_epochs = 50
for e in range(num_epochs):
    model.train()  # switch back to training mode after the previous evaluation
    train_acc = 0
    for images, labels in trainloader:
        images, labels = images.to(device), labels.to(device)
        log_ps = model(images)
        loss = error(log_ps, labels)
        optimizer.zero_grad()  # zero the gradients
        loss.backward()  # backward pass
        optimizer.step()
        train_acc += get_acc(log_ps, labels)
    print(f"Training Accuracy: {train_acc / len(trainloader) * 100:.2f}", end=" ")
    model.eval()
    with torch.no_grad():
        total = 0
        correct = 0
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            log_ps = model(images)
            mx_index = torch.argmax(log_ps, dim=1)
            total += labels.numel()
            correct += (mx_index == labels).sum().item()
        print(f"Test Accuracy {correct / total * 100:.2f}")
Note: nn.NLLLoss expects log-probabilities, which is why the model ends with nn.LogSoftmax. Without it, with no activations between the layers and with Adam at lr=0.1, the results were rather puzzling. This still needs tuning.
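One detail in the accuracy bookkeeping: the training accuracy is the mean of per-batch accuracies, while the test accuracy divides total correct predictions by total samples. The two differ slightly when the last batch is smaller, as it is here (60000 training images with batch_size=64 leave a final batch of 32). A small sketch with made-up counts:

```python
def mean_of_batch_accuracies(batches):
    """batches: list of (num_correct, batch_size). Averages the per-batch accuracy."""
    return sum(c / n for c, n in batches) / len(batches)


def global_accuracy(batches):
    """Total correct predictions divided by total samples."""
    return sum(c for c, _ in batches) / sum(n for _, n in batches)


# Hypothetical counts: two full batches of 64 and a final partial batch of 32.
batches = [(48, 64), (48, 64), (32, 32)]
print(round(mean_of_batch_accuracies(batches), 4))  # 0.8333
print(round(global_accuracy(batches), 4))           # 0.8
```

With hundreds of batches the gap is tiny, but counting correct predictions and samples globally gives the exact figure.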
Task 11: Facial keypoint detection with PyTorch
Dataset: https://ai-contest-static.xfyun.cn/2021/7afa865e-5ac8-48ab-9966-d88bb33cdc15/%E4%BA%BA%E8%84%B8%E5%85%B3%E9%94%AE%E7%82%B9%E6%A3%80%E6%B5%8B%E6%8C%91%E6%88%98%E8%B5%9B_%E6%95%B0%E6%8D%AE%E9%9B%86.zip
https://gitee.com/coggle/competition-baseline/blob/master/competition/%E7%A7%91%E5%A4%A7%E8%AE%AF%E9%A3%9EAI%E5%BC%80%E5%8F%91%E8%80%85%E5%A4%A7%E8%B5%9B2021/%E4%BA%BA%E8%84%B8%E5%85%B3%E9%94%AE%E7%82%B9%E6%A3%80%E6%B5%8B%E6%8C%91%E6%88%98%E8%B5%9B/face-keypoint2.ipynb
Step 1: Build a model with 4 convolutional layers + 2 fully connected layers for keypoint regression.
Step 2: Use a pretrained resnet18 model for keypoint regression.
Solution:
from PIL import Image
import numpy as np
import pandas as pd
from torchvision.models.resnet import resnet18
import torch
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error
import timm
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
from torch.utils.data.dataset import Dataset
import warnings

torch.backends.cudnn.benchmark = False
warnings.filterwarnings("ignore")


# Reads a single sample
class load_data(Dataset):
    def __init__(self, img, keypoint, transform=None):
        self.img = img
        self.transform = transform
        self.keypoint = keypoint

    def __getitem__(self, index):
        img = Image.fromarray(self.img[:, :, index]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img, self.keypoint[index] / 96.0  # scale pixel coordinates of the 96x96 image to [0, 1]

    def __len__(self):
        return self.img.shape[-1]


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5, stride=2), nn.ReLU(),    # 96 -> 46
            nn.Conv2d(6, 16, kernel_size=5, stride=2), nn.ReLU(),   # 46 -> 21
            nn.Conv2d(16, 16, kernel_size=5, stride=2), nn.ReLU(),  # 21 -> 9
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),  # 9 -> 3
            nn.Flatten(),
            nn.Linear(32 * 3 * 3, 784), nn.ReLU(),
            nn.Linear(784, 8),
        )

    def forward(self, x):
        out = self.model(x)
        return out


class resNet(nn.Module):
    def __init__(self, model):
        super(resNet, self).__init__()
        self.model = nn.Sequential(*list(model.children())[:-1],  # [b, 512, 1, 1]; * unpacks the list into separate modules
                                   nn.Flatten(),  # [b, 512, 1, 1] -> [b, 512*1*1]
                                   nn.Linear(512, 8))

    def forward(self, x):
        out = self.model(x)
        return out


class XunFeiNet(nn.Module):
    def __init__(self):
        super(XunFeiNet, self).__init__()
        self.model = timm.create_model('resnet18', num_classes=8, pretrained=True)

    def forward(self, img, labels=None):
        feat = self.model(img)
        return feat


def train(train_loader, model, criterion, optimizer, epoch):
    model.train()
    for i, (input, target) in enumerate(train_loader):
        input = input.cuda(non_blocking=True).float()
        target = target.cuda(non_blocking=True).float()
        output = model(input)
        loss = criterion(output, target)
        optimizer.zero_grad()
        loss.backward()
        # Clip after backward(), once the gradients exist; clipping before it has no effect
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        if i % 200 == 0:
            print(loss.item())


def validate(val_loader, model):
    model.eval()
    val_feats = []
    with torch.no_grad():
        for i, (input, target) in enumerate(val_loader):
            input = input.cuda().float()
            target = target.cuda().float()
            output = model(input)
            val_feats.append(output.data.cpu().numpy())
    return val_feats


if __name__ == '__main__':
    train_df = pd.read_csv('人脸关键点检测挑战赛_数据集/train.csv')
    train_df = train_df.fillna(48)  # fill missing coordinates with the image center
    train_img = np.load('人脸关键点检测挑战赛_数据集/train.npy')
    test_img = np.load('人脸关键点检测挑战赛_数据集/test.npy')
    # Single-sample reading -> batched reading; the last 500 training samples are held out for validation
    train_loader = torch.utils.data.DataLoader(
        load_data(train_img[:, :, :-500], train_df.values[:-500],
                  transforms.Compose([
                      transforms.ToTensor(),
                  ])
                  ),
        batch_size=10, shuffle=True
    )
    val_loader = torch.utils.data.DataLoader(
        load_data(train_img[:, :, -500:], train_df.values[-500:],
                  transforms.Compose([
                      transforms.ToTensor(),
                  ])
                  ),
        batch_size=10, shuffle=False
    )
    test_loader = torch.utils.data.DataLoader(
        load_data(test_img, np.zeros((test_img.shape[-1], 8)),  # dummy labels for the test set
                  transforms.Compose([
                      transforms.ToTensor(),
                  ])
                  ),
        batch_size=10, shuffle=False
    )
    # model = Net().cuda()
    # model = resNet(resnet18(pretrained=True)).cuda()
    model = XunFeiNet().cuda()
    criterion = nn.MSELoss().cuda()
    optimizer = torch.optim.Adam(model.parameters(), 0.001)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.85)
    for epoch in range(5):
        print('Epoch: ', epoch)
        train(train_loader, model, criterion, optimizer, epoch)
        val_feats = validate(val_loader, model)
        scheduler.step()
        val_feats = np.vstack(val_feats) * 96  # back to pixel coordinates
        print('Val', mean_absolute_error(val_feats, train_df.values[-500:]))
    pred_tta = []
    pred = []
    model.eval()
    with torch.no_grad():
        for t, (x, y) in enumerate(test_loader):
            x_var = x.cuda()
            scores = model(x_var)
            pred.append(scores.data.cpu().numpy())
    pred = np.concatenate(pred, 0)
    pred_tta.append(pred)
    pred = np.mean(pred_tta, axis=0)
    idx = 409
    xy = pred[idx].reshape(4, 2) * 96
    plt.scatter(xy[:, 0], xy[:, 1], c='r')
    plt.imshow(test_img[:, :, idx], cmap='gray')
    plt.show()
    # col = ['left_eye_center_x', 'left_eye_center_y', 'right_eye_center_x',
    #        'right_eye_center_y', 'nose_tip_x', 'nose_tip_y',
    #        'mouth_center_bottom_lip_x', 'mouth_center_bottom_lip_y']
    # pd.DataFrame(pred * 96, columns=col).to_csv('submit.csv', index=None)
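Net, resNet and XunFeiNet refer to the hand-built network, the adapted pretrained resnet18 from torchvision.models, and the pretrained resnet18 from timm, respectively. Note: when running the code with a pretrained model, you need to wait for the weights to finish downloading automatically.
[Figure: Net results]
[Figure: resNet results]
[Figure: XunFeiNet results]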
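The train function uses gradient-norm clipping. Clipping rescales the gradients that are currently stored, so it only has an effect when it runs after loss.backward() and before optimizer.step(). The sketch below is a simplified, dependency-free re-implementation of the idea on a flat list of numbers; it is an illustration, not PyTorch's actual code, and the eps value is an assumption.

```python
import math


def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Scale a flat list of gradient values so their L2 norm is at most max_norm.

    Returns the clipped gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


clipped, norm_before = clip_grad_norm([3.0, 4.0], max_norm=1.0)
print(norm_before)  # 5.0
print(clipped)      # roughly [0.6, 0.8]

# With all-zero gradients (for example right after zero_grad) nothing changes.
print(clip_grad_norm([0.0, 0.0], max_norm=1.0)[0])  # [0.0, 0.0]
```

The direction of the gradient is preserved and only its length is capped, which limits the size of a single update without changing where it points.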
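Task 12: Building a generative adversarial network with PyTorch
Step 1: Learn how generative adversarial networks work, https://blog.csdn.net/DFCED/article/details/105175097
Step 2: Study the DCGAN code implementation, https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html
Step 3: Take the face data from Task 11 (resized to 64*64), train a DCGAN on it, and generate faces.
Solution: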
# %matplotlib inline
import random
import torch
import torch.backends.cudnn as cudnn
import torch.utils.data
import torchvision.utils as vutils
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML
from PIL import Image
import numpy as np
import pandas as pd
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
from torch.utils.data.dataset import Dataset

torch.backends.cudnn.benchmark = False


class XunFeiDataset(Dataset):
    def __init__(self, img, keypoint, transform=None):
        self.img = img
        self.transform = transform
        self.keypoint = keypoint

    def __getitem__(self, index):
        img = Image.fromarray(self.img[:, :, index]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img, self.keypoint[index] / 96.0

    def __len__(self):
        return self.img.shape[-1]


# custom weights initialization called on netG and netD
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)


# Generator Code
# nz, ngf, ndf and nc are module-level globals set in the main block below
class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )

    def forward(self, input):
        return self.main(input)


# Discriminator Code
class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)
if __name__ == '__main__':
    # Set random seed for reproducibility
    manualSeed = 999
    # manualSeed = random.randint(1, 10000)  # use if you want new results
    print("Random Seed: ", manualSeed)
    random.seed(manualSeed)
    torch.manual_seed(manualSeed)
    # Root directory for dataset
    dataroot = "celeba/"  # not needed for the data we load here
    # Number of workers for dataloader
    workers = 0
    # Batch size during training
    batch_size = 128
    # Spatial size of training images. All images will be resized to this
    # size using a transformer.
    image_size = 64
    # Number of channels in the training images. For color images this is 3
    nc = 3
    # Size of z latent vector (i.e. size of generator input)
    nz = 100
    # Size of feature maps in generator
    ngf = 64
    # Size of feature maps in discriminator
    ndf = 64
    # Number of training epochs
    num_epochs = 5
    # Learning rate for optimizers
    lr = 0.0002
    # Beta1 hyperparam for Adam optimizers
    beta1 = 0.5
    # Number of GPUs available. Use 0 for CPU mode.
    ngpu = 1
    train_img = np.load('人脸关键点检测挑战赛_数据集/train.npy')
    test_img = np.load('人脸关键点检测挑战赛_数据集/test.npy')
    train_df = pd.read_csv('人脸关键点检测挑战赛_数据集/train.csv')
    train_df = train_df.fillna(48)
    dataloader = torch.utils.data.DataLoader(
        XunFeiDataset(train_img[:, :, :-500], train_df.values[:-500],
                      transform=transforms.Compose([
                          transforms.Resize([image_size, image_size]),
                          transforms.ToTensor(),
                          transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                      ])
                      ), batch_size=batch_size, shuffle=True, num_workers=workers)
    device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")
    # Visualize a batch of training images
    real_batch = next(iter(dataloader))
    plt.figure(figsize=(8, 8))
    plt.axis("off")
    plt.title("Training Images")
    plt.imshow(
        np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=2, normalize=True).cpu(), (1, 2, 0)))
    plt.show()
    # Create the generator
    netG = Generator(ngpu).to(device)
    # Handle multi-gpu if desired
    if (device.type == 'cuda') and (ngpu > 1):
        netG = nn.DataParallel(netG, list(range(ngpu)))
    # Apply the weights_init function to randomly initialize all weights
    # to mean=0, stdev=0.02.
    netG.apply(weights_init)
    # Print the model
    print(netG)
    # Create the Discriminator
    netD = Discriminator(ngpu).to(device)
    # Handle multi-gpu if desired
    if (device.type == 'cuda') and (ngpu > 1):
        netD = nn.DataParallel(netD, list(range(ngpu)))
    # Apply the weights_init function to randomly initialize all weights
    # to mean=0, stdev=0.02.
    netD.apply(weights_init)
    # Print the model
    print(netD)
    # Initialize BCELoss function
    criterion = nn.BCELoss()
    # Create batch of latent vectors that we will use to visualize
    # the progression of the generator
    fixed_noise = torch.randn(64, nz, 1, 1, device=device)
    # Establish convention for real and fake labels during training
    real_label = 1.
    fake_label = 0.
    # Setup Adam optimizers for both G and D
    optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
    optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))
    # Training Loop
    # Lists to keep track of progress
    img_list = []
    G_losses = []
    D_losses = []
    iters = 0
    print("Starting Training Loop...")
    # For each epoch
    for epoch in range(num_epochs):
        # For each batch in the dataloader
        for i, data in enumerate(dataloader, 0):
            ############################
            # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
            ###########################
            ## Train with all-real batch
            netD.zero_grad()
            # Format batch
            real_cpu = data[0].to(device)
            b_size = real_cpu.size(0)
            label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
            # Forward pass real batch through D
            output = netD(real_cpu).view(-1)
            # Calculate loss on all-real batch
            errD_real = criterion(output, label)
            # Calculate gradients for D in backward pass
            errD_real.backward()
            D_x = output.mean().item()
            ## Train with all-fake batch
            # Generate batch of latent vectors
            noise = torch.randn(b_size, nz, 1, 1, device=device)
            # Generate fake image batch with G
            fake = netG(noise)
            label.fill_(fake_label)
            # Classify all fake batch with D
            output = netD(fake.detach()).view(-1)
            # Calculate D's loss on the all-fake batch
            errD_fake = criterion(output, label)
            # Calculate the gradients for this batch, accumulated (summed) with previous gradients
            errD_fake.backward()
            D_G_z1 = output.mean().item()
            # Compute error of D as sum over the fake and the real batches
            errD = errD_real + errD_fake
            # Update D
            optimizerD.step()
            ############################
            # (2) Update G network: maximize log(D(G(z)))
            ###########################
            netG.zero_grad()
            label.fill_(real_label)  # fake labels are real for generator cost
            # Since we just updated D, perform another forward pass of all-fake batch through D
            output = netD(fake).view(-1)
            # Calculate G's loss based on this output
            errG = criterion(output, label)
            # Calculate gradients for G
            errG.backward()
            D_G_z2 = output.mean().item()
            # Update G
            optimizerG.step()
            # Output training stats
            if i % 50 == 0:
                print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                      % (epoch, num_epochs, i, len(dataloader),
                         errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))
            # Save Losses for plotting later
            G_losses.append(errG.item())
            D_losses.append(errD.item())
            # Check how the generator is doing by saving G's output on fixed_noise
            if (iters % 500 == 0) or ((epoch == num_epochs - 1) and (i == len(dataloader) - 1)):
                with torch.no_grad():
                    fake = netG(fixed_noise).detach().cpu()
                img_list.append(vutils.make_grid(fake, padding=2, normalize=True))
            iters += 1
    plt.figure(figsize=(10, 5))
    plt.title("Generator and Discriminator Loss During Training")
    plt.plot(G_losses, label="G")
    plt.plot(D_losses, label="D")
    plt.xlabel("iterations")
    plt.ylabel("Loss")
    plt.legend()
    plt.show()
    # %%capture
    fig = plt.figure(figsize=(8, 8))
    plt.axis("off")
    ims = [[plt.imshow(np.transpose(i, (1, 2, 0)), animated=True)] for i in img_list]
    ani = animation.ArtistAnimation(fig, ims, interval=1000, repeat_delay=1000, blit=True)
    HTML(ani.to_jshtml())  # only renders when run inside a notebook
    # Grab a batch of real images from the dataloader
    real_batch = next(iter(dataloader))
    # Plot the real images
    plt.figure(figsize=(15, 15))
    plt.subplot(1, 2, 1)
    plt.axis("off")
    plt.title("Real Images")
    plt.imshow(
        np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=5, normalize=True).cpu(), (1, 2, 0)))
    # Plot the fake images from the last epoch
    plt.subplot(1, 2, 2)
    plt.axis("off")
    plt.title("Fake Images")
    plt.imshow(np.transpose(img_list[-1], (1, 2, 0)))
    plt.show()
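Note: with only a few epochs the results are poor. D(G(z)) is small and D(x) dominates, which means the generated images are not good enough to fool the discriminator. We tested epoch=100, and some of the generated faces are still poor. With epoch=1000 the results are also mediocre; GAN generation is still something of a dark art.
[Figure: generated faces at epoch=100]
[Figure: generated faces at epoch=1000]
We also tested the CelebA dataset with epoch=5. The dataset itself is fairly large, so even 5 epochs took a considerable amount of time.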
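To help read the Loss_D, Loss_G, D(x) and D(G(z)) values printed during training, the sketch below evaluates the same binary cross-entropy objectives on single probabilities. It is a simplified illustration with made-up discriminator outputs; the function names are not from the tutorial.

```python
import math


def bce(prediction, target):
    """Binary cross-entropy for a single probability and a 0/1 target."""
    return -(target * math.log(prediction) + (1 - target) * math.log(1 - prediction))


def gan_losses(d_real, d_fake):
    """d_real = D(x), d_fake = D(G(z)), both probabilities in (0, 1)."""
    loss_d = bce(d_real, 1.0) + bce(d_fake, 0.0)  # D wants real -> 1 and fake -> 0
    loss_g = bce(d_fake, 1.0)                     # G wants D to call its fakes real
    return loss_d, loss_g


# Discriminator clearly winning: small Loss_D, large Loss_G.
print(gan_losses(d_real=0.95, d_fake=0.05))
# Discriminator cannot tell real from fake: Loss_D = 2*ln 2, Loss_G = ln 2.
print(gan_losses(d_real=0.5, d_fake=0.5))
```

When D(G(z)) stays close to 0, Loss_G is large and the generator is not yet fooling the discriminator. Values of D(x) and D(G(z)) near 0.5 indicate that the two networks are roughly balanced.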