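The code for computing the parameter count (params) and floating-point operations (FLOPs) can be found in the Utils folder of my template repository.
In short, the parameter count (params) is closely tied to GPU memory usage, while the FLOPs count is tied to how fast the GPU can run the network.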
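Parameter counting is done separately for each layer type: convolutional layers, pooling layers and fully connected layers.
For a convolutional layer, the quantities that matter are (kernel_size, in_channel, out_channel).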
Full version: $conv\_param = (kernel\_size \times in\_channel + bias) \times out\_channel$, with bias = 1 by default. Here kernel_size is the kernel area (e.g. 3 × 3 = 9), and out_channel is the number of filters (convolution kernels). Each filter has its own bias.
Simplified version: $conv\_param = kernel\_size \times in\_channel \times out\_channel$. The bias does not change the order of magnitude, so it is usually omitted.
For example, take the setup shown in the figure below:
image_size = 5x5x3
kernel_size = 3x3
in_channel = 3 (number of image channels)
out_channel = 2 (number of convolution kernels, i.e. filters)
The parameter count of this convolutional layer is then:
Full version: $conv\_param = (kernel\_size \times in\_channel + bias) \times out\_channel = (3 \times 3 \times 3 + 1) \times 2 = 56$
Simplified version: $conv\_param = kernel\_size \times in\_channel \times out\_channel = 3 \times 3 \times 3 \times 2 = 54$
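The two conv formulas can be written as a small helper. This is a plain-Python sketch, and the function name `conv_param_count` is hypothetical (it is not taken from the author's repository). It assumes a standard convolution with groups = 1, and takes `kernel_size` as a `(k_h, k_w)` pair whose area plays the role of kernel_size in the formula.

```python
def conv_param_count(kernel_size, in_channel, out_channel, bias=True):
    """Parameter count of a standard (groups=1) 2D convolution.

    kernel_size is a (k_h, k_w) tuple; its area k_h * k_w is the
    "kernel_size" term in the formula above.
    """
    k_h, k_w = kernel_size
    weights_per_filter = k_h * k_w * in_channel
    # each of the out_channel filters owns one bias term when bias=True
    return (weights_per_filter + (1 if bias else 0)) * out_channel


# the worked example: 3x3 kernel, 3 input channels, 2 filters
print(conv_param_count((3, 3), 3, 2))              # 56, full version
print(conv_param_count((3, 3), 3, 2, bias=False))  # 54, simplified version
```

In PyTorch, `sum(p.numel() for p in torch.nn.Conv2d(3, 2, kernel_size=3).parameters())` gives the same 56, because `nn.Conv2d` uses `bias=True` by default.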
Pooling layers have no parameters. For example, max pooling simply takes the maximum over each window, so there is nothing to learn.
A fully connected (FC) layer can follow either a convolutional layer or another fully connected layer, so the two cases are treated separately:
Conv → FC: $Conv\_FC\_param = featuremap\_size \times in\_channel \times out\_neural$ (bias omitted)
featuremap_size: spatial size (H × W) of the previous layer's feature map
in_channel: number of filters (output channels) of the previous convolutional layer
out_neural: number of neurons in the fully connected layer
FC → FC: $FC\_FC\_param = in\_neural \times out\_neural + bias$
Here in_neural is the number of neurons in the previous layer. The bias term equals out_neural, because every neuron has one bias. It can usually be ignored.
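Both FC cases reduce to "inputs × outputs, plus one bias per output neuron". The only difference is that in the conv → FC case the input count is the flattened feature map, featuremap_size × in_channel. Below is a plain-Python sketch with hypothetical helper names. It assumes featuremap_size is given as an (H, W) pair.

```python
def conv_fc_param_count(featuremap_size, in_channel, out_neural, bias=True):
    """Conv -> FC: every value of the flattened feature map feeds every neuron."""
    h, w = featuremap_size
    return h * w * in_channel * out_neural + (out_neural if bias else 0)


def fc_fc_param_count(in_neural, out_neural, bias=True):
    """FC -> FC: one weight per (input, output) pair plus one bias per neuron."""
    return in_neural * out_neural + (out_neural if bias else 0)


# a 32-channel 7x7 feature map flattened into a 10-neuron layer
print(conv_fc_param_count((7, 7), 32, 10))  # 15690
# a hypothetical 128 -> 10 fully connected layer
print(fc_fc_param_count(128, 10))           # 1290
```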
Many PyTorch packages can count a network's parameters, such as torchstat, thop, ptflops and torchsummary. A few of them are demonstrated here.
import torch
from torch import nn


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(         # input shape (1, 28, 28)
            nn.Conv2d(
                in_channels=1,              # grayscale input has a single channel
                out_channels=16,            # n_filters
                kernel_size=5,              # filter size
                stride=1,                   # filter movement/step
                padding=2,                  # (kernel_size - 1) / 2 keeps 28x28 when stride=1
            ),                              # output shape (16, 28, 28)
            nn.ReLU(),                      # activation
            nn.MaxPool2d(kernel_size=2),    # choose max value in 2x2 area, output shape (16, 14, 14)
        )
        self.conv2 = nn.Sequential(         # input shape (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),     # output shape (32, 14, 14)
            nn.ReLU(),                      # activation
            nn.MaxPool2d(2),                # output shape (32, 7, 7)
        )
        self.out = nn.Linear(32 * 7 * 7, 10)  # fully connected layer, output 10 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)           # flatten the output of conv2 to (batch_size, 32 * 7 * 7)
        output = self.out(x)
        return output, x                    # return x for visualization


cnn = CNN()
print(cnn)  # net architecture
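Before running any package, you can apply the per-layer formulas to the `CNN` above by hand as a cross-check. This plain-Python sketch hard-codes that model's layer shapes. Bias is included, and pooling and ReLU contribute nothing.

```python
# Layer-by-layer parameter tally for the CNN above (bias included).
conv1 = (5 * 5 * 1 + 1) * 16      # Conv2d(1, 16, kernel_size=5)
conv2 = (5 * 5 * 16 + 1) * 32     # Conv2d(16, 32, kernel_size=5)
pool_and_relu = 0                 # ReLU and MaxPool2d have no parameters
fc = 32 * 7 * 7 * 10 + 10         # Linear(32 * 7 * 7, 10)

total = conv1 + conv2 + pool_and_relu + fc
print(conv1, conv2, fc, total)    # 416 12832 15690 28938
```

These per-layer numbers are what the params column of the package outputs should show. In PyTorch, `sum(p.numel() for p in cnn.parameters())` also returns 28,938.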
Using the torchstat package
from torchstat import stat
# pass in the model and the size of one input image (C, H, W)
stat(cnn, (1, 28, 28))
Output:
       module name  input shape  output shape   params  memory(MB)         MAdd        Flops  MemRead(B)  MemWrite(B)  duration[%]  MemR+W(B)
0      conv1.0      1 28 28      16 28 28        416.0        0.05    627,200.0    326,144.0      4800.0      50176.0       46.17%    54976.0
1      conv1.1      16 28 28     16 28 28          0.0        0.05     12,544.0     12,544.0     50176.0      50176.0        0.22%   100352.0
2      conv1.2      16 28 28     16 14 14          0.0        0.01      9,408.0     12,544.0     50176.0      12544.0       12.20%    62720.0
3      conv2.0      16 14 14     32 14 14      12832.0        0.02  5,017,600.0  2,515,072.0     63872.0      25088.0       35.98%    88960.0
4      conv2.1      32 14 14     32 14 14          0.0        0.02      6,272.0      6,272.0     25088.0      25088.0        0.05%    50176.0
5      conv2.2      32 14 14     32 7 7            0.0        0.01      4,704.0      6,272.0     25088.0       6272.0        5.03%    31360.0
6      out          1568         10            15690.0        0.00     31,350.0     15,680.0     69032.0         40.0        0.35%    69072.0
total                                          28938.0        0.16  5,709,078.0  2,894,528.0     69032.0         40.0      100.00%   457616.0
=========================================================================================================================================
Total params: 28,938
-----------------------------------------------------------------------------------------------------------------------------------------
Total memory: 0.16MB
Total MAdd: 5.71MMAdd
Total Flops: 2.89MFlops
Total MemR+W: 446.89KB
Using the torchinfo package (right after writing this, I found that torchsummary has been succeeded by torchinfo, so use that one):
pip install torchinfo
# from torchsummary import summary
from torchinfo import summary

# pass in the model and the input size; torchinfo expects the batch dimension as well
# summary(cnn.cuda(), input_size=(1, 28, 28), batch_size=-1)  # old torchsummary call
batch_size = 1
summary(cnn, input_size=(batch_size, 1, 28, 28))
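Output (this listing is in the format of the original torchsummary package, i.e. the commented-out call. torchinfo reports the same parameter counts in a slightly different layout):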
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 16, 28, 28]             416
              ReLU-2           [-1, 16, 28, 28]               0
         MaxPool2d-3           [-1, 16, 14, 14]               0
            Conv2d-4           [-1, 32, 14, 14]          12,832
              ReLU-5           [-1, 32, 14, 14]               0
         MaxPool2d-6             [-1, 32, 7, 7]               0
            Linear-7                   [-1, 10]          15,690
================================================================
Total params: 28,938
Trainable params: 28,938
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.32
Params size (MB): 0.11
Estimated Total Size (MB): 0.44
----------------------------------------------------------------
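The two packages present their results differently, and each has its strengths. torchstat reports per-layer memory, MAdd and FLOPs. The torchsummary-style table reports each layer's output shape plus overall memory estimates. Choose whichever fits your needs.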
Using the thop package, and counting parameters with PyTorch's built-in API:
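import torch
from thop import profile

# build_detection_model and cfg come from the author's detection project and are not defined here;
# any nn.Module can be profiled the same way
model = build_detection_model(cfg).cuda()
print(model)
# dummy input: one 3-channel 300x300 image
dummy_input = torch.randn(1, 3, 300, 300).cuda()
# thop returns multiply-accumulate operations (MACs) and the parameter count
macs, params = profile(model, inputs=(dummy_input,))
print('MACs: %.2fM' % (macs / 1e6), 'Params: %.2fM' % (params / 1e6))
# PyTorch built-in: sum the element counts of all parameter tensors
total = sum(param.nelement() for param in model.parameters())
print('Number of parameters: %.2fM' % (total / 1e6))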
Memory occupied by the parameters = number of parameters × n bytes, where:
n = 4: float32
n = 2: float16
n = 8: float64 (double)
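The rule above as a plain-Python sketch. The helper name is hypothetical, and the result is in MiB (1024² bytes). For the example CNN's 28,938 parameters in float32 it gives about 0.11, which matches the "Params size (MB): 0.11" line in the torchsummary-style output above. That package appears to divide by 1024² as well.

```python
# Bytes per parameter for common dtypes (n in the formula above).
BYTES_PER_PARAM = {"float16": 2, "float32": 4, "float64": 8}


def param_memory_mib(num_params, dtype="float32"):
    """Memory taken by the parameters alone, in MiB (1024 ** 2 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024 ** 2


num_params = 28938  # total parameter count of the example CNN
for dtype in BYTES_PER_PARAM:
    print(dtype, round(param_memory_mib(num_params, dtype), 3))
```

This covers the weights only. Activations, gradients and optimizer state come on top.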
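Beyond the parameters, most of the GPU memory is taken up by the batch of input images (batch_size of them) and the feature maps they produce.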
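Model size is simply how big the model is. It is usually measured by the number of parameters, and note that the unit is a plain count. Because many models have a huge number of parameters, a more convenient unit, M (million), is generally used. For example, ResNet-152 has about 60 million = 60M parameters. Sometimes the model size computed in practice also includes the network architecture information and the optimizer state, in addition to the parameters. For example, storing a typical CNN model trained on ImageNet can take more than 300MB.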
Converting between M and MB:
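Suppose a model has 1M parameters. Common deep learning frameworks such as PyTorch store them in 32 bits by default, meaning each parameter occupies 32 bits. The storage this 1M-parameter model needs is therefore 1M × 32 bit = 32 Mb = 4 MB, because 1 Byte = 8 bit. Quantization reduces the number of bits each parameter occupies. With 8-bit storage, for example, the model needs 1M × 8 bit = 8 Mb = 1 MB.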
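The same arithmetic as a tiny helper. It is sketched under the decimal convention the paragraph uses (1M = 10^6 parameters, 1 MB = 10^6 bytes), and the function name is hypothetical.

```python
def storage_mb(num_params, bits_per_param):
    """Storage in MB (10**6 bytes) for num_params values of the given bit width."""
    total_bits = num_params * bits_per_param
    return total_bits / 8 / 10**6  # 8 bits per byte, 10**6 bytes per MB


one_million = 10**6
print(storage_mb(one_million, 32))       # 4.0   (float32)
print(storage_mb(one_million, 8))        # 1.0   (8-bit quantized)
print(storage_mb(60 * one_million, 32))  # 240.0 (about ResNet-152's parameter count)
```

At 32 bits, ResNet-152's roughly 60M parameters come to about 240 MB for the weights alone.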
Floating-point operations (FLOPs):
How FLOPs are computed
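One common convention counts one multiply-accumulate per weight use. Under it, a conv layer costs $k_h \times k_w \times C_{in} \times C_{out} \times H_{out} \times W_{out}$ and a fully connected layer costs in_neural × out_neural. The plain-Python sketch below uses hypothetical helper names and also adds one operation per conv output element for the bias. With that convention it reproduces the Flops column torchstat printed for the example CNN: 326,144 for conv1.0, 2,515,072 for conv2.0 and 15,680 for out. That torchstat counts exactly this way is an inference from the matching numbers, not something taken from its documentation. Counting multiplies and adds separately roughly doubles these figures, which is about what the MAdd column reports.

```python
def conv_flops(k_h, k_w, in_channel, out_channel, h_out, w_out, bias=True):
    """Multiply-accumulates of a conv layer, plus one add per output element for the bias."""
    macs = k_h * k_w * in_channel * out_channel * h_out * w_out
    return macs + (out_channel * h_out * w_out if bias else 0)


def linear_flops(in_neural, out_neural):
    """Multiply-accumulates of a fully connected layer (bias not counted)."""
    return in_neural * out_neural


# layer shapes of the example CNN (output sizes from the comments in its code)
print(conv_flops(5, 5, 1, 16, 28, 28))   # conv1.0 -> 326144
print(conv_flops(5, 5, 16, 32, 14, 14))  # conv2.0 -> 2515072
print(linear_flops(32 * 7 * 7, 10))      # out     -> 15680
```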
References:
https://blog.csdn.net/m0_51004308/article/details/118048504
https://blog.csdn.net/weixin_45292794/article/details/108227437
https://blog.csdn.net/Leo_whj/article/details/109636819