Model quantization reduces the size of a model so that it can run on edge devices.
First, build the network:
import torch
import torch.nn as nn
from torchsummary import summary

device = torch.device("cpu")

class SimpleNet(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.conv3 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(in_channels=24, out_channels=24, kernel_size=3, stride=1, padding=1)
        # The single 2x2 max-pool halves the spatial size, so 16 * 16 * 24 implies 32x32 inputs.
        self.fc = nn.Linear(in_features=16 * 16 * 24, out_features=num_classes)

    def forward(self, input):
        output = self.conv1(input)
        output = nn.ReLU()(output)
        output = self.conv2(output)
        output = nn.ReLU()(output)
        output = self.pool(output)
        output = self.conv3(output)
        output = nn.ReLU()(output)
        output = self.conv4(output)
        output = nn.ReLU()(output)
        # Flatten to (batch, 6144) before the fully connected layer.
        output = output.view(-1, 16 * 16 * 24)
        output = self.fc(output)
        return output

model = SimpleNet().to(device=device)
print(model)
Output:
SimpleNet(
(conv1): Conv2d(3, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv2): Conv2d(12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv3): Conv2d(12, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv4): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fc): Linear(in_features=6144, out_features=10, bias=True)
)
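The `in_features=16 * 16 * 24` of the fc layer can be checked by tracking the spatial size through the layers. The sketch below assumes a 32x32 input, which the text does not state but which is the size implied by that product. The helper name `conv_out` is made up for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32  # assumed input height/width (inferred from in_features, not given in the text)
size = conv_out(size, kernel=3, stride=1, padding=1)  # conv1 keeps 32
size = conv_out(size, kernel=3, stride=1, padding=1)  # conv2 keeps 32
size = conv_out(size, kernel=2, stride=2)             # pool halves to 16
size = conv_out(size, kernel=3, stride=1, padding=1)  # conv3 keeps 16
size = conv_out(size, kernel=3, stride=1, padding=1)  # conv4 keeps 16

flat_features = size * size * 24  # conv4 has 24 output channels
print(size, flat_features)  # 16 6144
```

With any other input size the flattened length no longer equals 6144, and `view(-1, 16 * 16 * 24)` will either fail or silently change the batch dimension.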
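Quantization:
Dynamic quantization is used here. It stores the weights of the listed module types as int8 and quantizes the activations on the fly at inference time. In this model only nn.Linear is affected: SimpleNet contains no nn.LSTM, and the convolution layers stay in float32, as the output shows:
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)
print(quantized_model)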
Output:
SimpleNet(
(conv1): Conv2d(3, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv2): Conv2d(12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv3): Conv2d(12, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv4): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fc): DynamicQuantizedLinear(in_features=6144, out_features=10, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
)
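The fc layer now reports `qscheme=torch.per_tensor_affine`, meaning one scale and one zero point are shared by the whole weight tensor. The following is a simplified pure-Python illustration of that idea, not PyTorch's actual kernel; the function names and the sample weights are made up for the example.

```python
def quantize_per_tensor_affine(values, qmin=-128, qmax=127):
    """Map floats to int8 codes using a single scale and zero point."""
    # Extend the range to include 0 so that 0.0 is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any positive scale works
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    # Round to the nearest code and clamp into the int8 range.
    codes = [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]
    return codes, scale, zero_point

def dequantize(codes, scale, zero_point):
    """Recover approximate float values from the int8 codes."""
    return [(c - zero_point) * scale for c in codes]

weights = [-0.5, -0.1, 0.0, 0.25, 0.77]  # made-up example values
codes, scale, zero_point = quantize_per_tensor_affine(weights)
restored = dequantize(codes, scale, zero_point)
print(codes, scale, zero_point)
print(max(abs(w - r) for w, r in zip(weights, restored)))  # worst-case rounding error
```

Each weight then costs one byte instead of four, at the price of a rounding error on the order of the scale.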
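Now check how much the model was actually compressed:
import os

def print_size_of_model(model):
    # Serialize the weights to a temporary file and report its size on disk.
    torch.save(model.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p")/1e6)
    os.remove('temp.p')

print_size_of_model(model)
print_size_of_model(quantized_model)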
Output:
Size (MB): 0.287049
Size (MB): 0.103451
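The compression is substantial: the saved file shrinks from about 0.287 MB to about 0.103 MB, roughly 36% of the original size.
Quantization only changes how the weight values are stored. The comparison above is therefore a comparison of weight sizes, and by that measure the result looks good.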
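The two file sizes can be roughly reproduced by counting parameters by hand. This is a back-of-the-envelope sketch: it assumes that only the fc weight matrix is stored as int8 (1 byte per value) while the fc bias and all convolution parameters remain float32 (4 bytes per value), and it ignores serialization overhead.

```python
# (in_channels, out_channels) of the four 3x3 convolutions in SimpleNet
conv_channels = [(3, 12), (12, 12), (12, 24), (24, 24)]
conv_params = sum(cin * cout * 3 * 3 + cout for cin, cout in conv_channels)

fc_weights = 16 * 16 * 24 * 10  # 6144 inputs x 10 classes
fc_bias = 10

# Original model: every parameter is float32.
fp32_bytes = (conv_params + fc_weights + fc_bias) * 4
# Dynamically quantized model (assumption): only the fc weights become int8.
quant_bytes = conv_params * 4 + fc_weights * 1 + fc_bias * 4

print(fp32_bytes / 1e6)   # 0.283672
print(quant_bytes / 1e6)  # 0.099352
```

Both estimates fall a few kilobytes below the measured 0.287049 MB and 0.103451 MB, which is consistent with the extra metadata stored in the saved files. Since the fc weights account for most of the parameters, quantizing that single layer already removes most of the size.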
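Next, look at the per-layer summary from torchsummary (note that this call moves the model to the GPU):
summary(model.cuda(), input_size=(3, 512, 512))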
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 12, 512, 512]             336
            Conv2d-2         [-1, 12, 512, 512]           1,308
         MaxPool2d-3         [-1, 12, 256, 256]               0
            Conv2d-4         [-1, 24, 256, 256]           2,616
            Conv2d-5         [-1, 24, 256, 256]           5,208
            Linear-6                   [-1, 10]          61,450
================================================================
Total params: 70,918
Trainable params: 70,918
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 3.00
Forward/backward pass size (MB): 78.00
Params size (MB): 0.27
Estimated Total Size (MB): 81.27
----------------------------------------------------------------
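As the summary shows, the weights (Params size (MB): 0.27) are only one part of the memory footprint, and in this model a very small part: about 0.27 MB out of an estimated 81.27 MB, with most of the rest taken by intermediate activations (Forward/backward pass size). Note that the 512x512 input used here does not match the 32x32 input implied by the fc layer's 16 * 16 * 24 features. The call only runs because view(-1, ...) folds the extra elements into the batch dimension, so the activation figure should be read as illustrative.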
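The figures in the summary can be reproduced by hand from the output shapes in the table. The sketch below assumes that torchsummary counts every layer output twice (forward and backward) at 4 bytes per value and reports sizes in units of 1024 * 1024 bytes; that assumption matches the reported numbers.

```python
MIB = 1024 ** 2
BYTES_FP32 = 4

# Per-sample output shapes copied from the summary table above.
output_shapes = [
    (12, 512, 512),  # Conv2d-1
    (12, 512, 512),  # Conv2d-2
    (12, 256, 256),  # MaxPool2d-3
    (24, 256, 256),  # Conv2d-4
    (24, 256, 256),  # Conv2d-5
    (10,),           # Linear-6
]

def numel(shape):
    """Number of elements in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count

activation_values = sum(numel(shape) for shape in output_shapes)
forward_backward = 2 * activation_values * BYTES_FP32 / MIB
params = 70918 * BYTES_FP32 / MIB
input_size = numel((3, 512, 512)) * BYTES_FP32 / MIB
total = input_size + forward_backward + params

print(round(input_size, 2), round(forward_backward, 2), round(params, 2), round(total, 2))
# 3.0 78.0 0.27 81.27
print(round(100 * params / total, 2))  # weights as a percentage of the total: 0.33
```

Under these assumptions the weights make up well under 1% of the estimated total, so shrinking them alone barely changes the memory needed at this input size.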
Because summary cannot be run on the quantized model, the same breakdown cannot be observed directly for it.