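FLOPS: all uppercase. Short for floating point operations per second, i.e. the number of floating-point operations executed per second. It describes computation speed and is a hardware performance metric.
FLOPs: lowercase s, which marks the plural. Short for floating point operations, i.e. the number of floating-point operations. It describes the amount of computation and can be used to measure the complexity of an algorithm or model.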
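FLOPs is an amount of work and FLOPS is a rate, so dividing one by the other gives an idealized lower bound on runtime. The sketch below assumes two things: the common convention that a dense matrix multiply costs 2·m·k·n FLOPs, and a hypothetical device with a 10 TFLOPS peak. Real runtimes are longer because hardware rarely sustains its peak rate.

```python
def matmul_flops(m: int, k: int, n: int) -> int:
    """FLOPs (amount of work) for an (m x k) @ (k x n) matrix multiply.

    Each of the m*n outputs needs k multiplications and roughly k additions,
    hence the usual 2*m*k*n convention.
    """
    return 2 * m * k * n


def ideal_seconds(flops: float, peak_flops_per_second: float) -> float:
    """Lower bound on runtime: work (FLOPs) divided by peak speed (FLOPS)."""
    return flops / peak_flops_per_second


work = matmul_flops(4096, 4096, 4096)   # FLOPs: how much computation
peak = 10e12                            # FLOPS: hypothetical 10 TFLOPS device
print(work, ideal_seconds(work, peak))
```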
The examples below use VGG16 (torchvision's vgg16).
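(1) Count all parameters, trainable and frozen (requires_grad=False) alike. Buffers such as BatchNorm running statistics are not included, because model.parameters() does not return them.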
sum(p.numel() for p in model.parameters())
(2) Count only the trainable parameters
sum(p.numel() for p in model.parameters() if p.requires_grad)
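The two counts differ only when some parameters are frozen, and neither includes buffers. A minimal sketch on a small hypothetical model (not VGG16) that freezes its convolution and contains a BatchNorm layer, whose running statistics are buffers rather than parameters:

```python
import torch.nn as nn

# Small hypothetical model, used only to illustrate the two counts
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),   # 8*3*3*3 weights + 8 biases = 224
    nn.BatchNorm2d(8),                # 8 weights + 8 biases = 16 (plus buffers)
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),                  # 8*2 weights + 2 biases = 18
)

# Freeze the convolution: its tensors stay in model.parameters() but are no longer trainable
for p in model[0].parameters():
    p.requires_grad = False

total = sum(p.numel() for p in model.parameters())                          # 258
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)   # 34
# running_mean, running_var and num_batches_tracked are buffers, not parameters
buffers = sum(b.numel() for b in model.buffers())                           # 17
print(total, trainable, buffers)
```

For the untouched VGG16 nothing is frozen, which is why both counts in the example output below are identical.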
Example:
import torch
import torchvision
model = torchvision.models.vgg16(pretrained = False)
device = torch.device('cpu')
model.to(device)
# from torchstat import stat
# stat(model.to(device), (3,224,224))
a = sum(p.numel() for p in model.parameters())
b = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(a)
print(b)
Output:
138357544
138357544
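Equivalently, multiply out the dimensions of each parameter tensor and add them up:
params = list(model.parameters())
num_params = 0
for param in params:
    curr_num_params = 1
    for size_count in param.size():  # product of all dimensions = number of elements
        curr_num_params *= size_count
    num_params += curr_num_params
print("total number of parameters: " + str(num_params))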
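The total of 138,357,544 can also be checked by hand. The sketch below assumes the standard VGG16 layout: thirteen 3×3 convolutions with bias, five 2×2 max pools, and three fully connected layers on a 512×7×7 feature map. Each convolution holds C_in·C_out·3·3 weights plus C_out biases, and each linear layer holds in·out weights plus out biases. The config list is written out by hand rather than imported from torchvision.

```python
# Hand-written VGG16 layout: numbers are conv output channels, "M" is a 2x2 max pool
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
# Classifier layers as (in_features, out_features); 512 * 7 * 7 = 25088
VGG16_FC = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]


def vgg16_param_count(in_channels=3):
    """Return (feature-extractor params, classifier params) for VGG16."""
    features, c_in = 0, in_channels
    for v in VGG16_CFG:
        if v == "M":
            continue                        # pooling layers have no parameters
        features += c_in * v * 3 * 3 + v    # 3x3 kernel weights + one bias per output channel
        c_in = v
    classifier = sum(i * o + o for i, o in VGG16_FC)
    return features, classifier


features, classifier = vgg16_param_count()
print(features, classifier, features + classifier)
```

The split (14,714,688 and 123,642,856) matches the 14.71 M / 123.64 M breakdown that ptflops reports for `features` and `classifier`, and shows that about 89% of VGG16's parameters sit in the fully connected layers.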
pip install torchsummary
import torch
import torchvision
model = torchvision.models.vgg16(pretrained = False)
device = torch.device('cpu')
model.to(device)
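import torchsummary
# torchsummary uses device="cuda" by default; pass device="cpu" to match the CPU model above
# The input is 3x244x244 here rather than VGG16's usual 224x224; parameter counts are unchanged
# because AdaptiveAvgPool2d always feeds a 512x7x7 map to the classifier
torchsummary.summary(model, (3, 244, 244), device="cpu")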
Output:
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 244, 244] 1,792
ReLU-2 [-1, 64, 244, 244] 0
Conv2d-3 [-1, 64, 244, 244] 36,928
ReLU-4 [-1, 64, 244, 244] 0
MaxPool2d-5 [-1, 64, 122, 122] 0
Conv2d-6 [-1, 128, 122, 122] 73,856
ReLU-7 [-1, 128, 122, 122] 0
Conv2d-8 [-1, 128, 122, 122] 147,584
ReLU-9 [-1, 128, 122, 122] 0
MaxPool2d-10 [-1, 128, 61, 61] 0
Conv2d-11 [-1, 256, 61, 61] 295,168
ReLU-12 [-1, 256, 61, 61] 0
Conv2d-13 [-1, 256, 61, 61] 590,080
ReLU-14 [-1, 256, 61, 61] 0
Conv2d-15 [-1, 256, 61, 61] 590,080
ReLU-16 [-1, 256, 61, 61] 0
MaxPool2d-17 [-1, 256, 30, 30] 0
Conv2d-18 [-1, 512, 30, 30] 1,180,160
ReLU-19 [-1, 512, 30, 30] 0
Conv2d-20 [-1, 512, 30, 30] 2,359,808
ReLU-21 [-1, 512, 30, 30] 0
Conv2d-22 [-1, 512, 30, 30] 2,359,808
ReLU-23 [-1, 512, 30, 30] 0
MaxPool2d-24 [-1, 512, 15, 15] 0
Conv2d-25 [-1, 512, 15, 15] 2,359,808
ReLU-26 [-1, 512, 15, 15] 0
Conv2d-27 [-1, 512, 15, 15] 2,359,808
ReLU-28 [-1, 512, 15, 15] 0
Conv2d-29 [-1, 512, 15, 15] 2,359,808
ReLU-30 [-1, 512, 15, 15] 0
MaxPool2d-31 [-1, 512, 7, 7] 0
AdaptiveAvgPool2d-32 [-1, 512, 7, 7] 0
Linear-33 [-1, 4096] 102,764,544
ReLU-34 [-1, 4096] 0
Dropout-35 [-1, 4096] 0
Linear-36 [-1, 4096] 16,781,312
ReLU-37 [-1, 4096] 0
Dropout-38 [-1, 4096] 0
Linear-39 [-1, 1000] 4,097,000
================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.68
Forward/backward pass size (MB): 258.51
Params size (MB): 527.79
Estimated Total Size (MB): 786.98
----------------------------------------------------------------
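The size estimates in this summary appear to assume 4-byte float32 values and 1 MB = 1024² bytes. The sketch below checks two of them under that assumption. The forward/backward figure depends on every layer's output size and is not reproduced here.

```python
BYTES_PER_FLOAT32 = 4
MB = 1024 ** 2

total_params = 138_357_544
input_shape = (3, 244, 244)   # the input size passed to summary() above

params_mb = total_params * BYTES_PER_FLOAT32 / MB
input_mb = input_shape[0] * input_shape[1] * input_shape[2] * BYTES_PER_FLOAT32 / MB
print(f"{params_mb:.2f} {input_mb:.2f}")   # 527.79 0.68
```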
pip install thop
import torch
import torchvision
model = torchvision.models.vgg16(pretrained = False)
device = torch.device('cpu')
model.to(device)
from thop import profile
from thop import clever_format
my_input = torch.zeros((1,3,224,224)).to(device)
# thop's profile() returns the number of multiply-accumulate operations (MACs) and of parameters
macs, params = profile(model.to(device), inputs=(my_input, ))
macs, params = clever_format([macs, params], '%.3f')
print(macs, params)
Output:
[INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.activation.ReLU'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.pooling.MaxPool2d'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.container.Sequential'>.
[INFO] Register count_adap_avgpool() for <class 'torch.nn.modules.pooling.AdaptiveAvgPool2d'>.
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.dropout.Dropout'>.
15.470G 138.358M
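The 15.470G figure counts multiply-accumulate operations (MACs) for one 224×224 image. The sketch below reproduces it by hand under these assumptions:

- the standard VGG16 layout;
- only convolutions (C_out·H·W·C_in·3·3) and linear layers (in·out) are counted;
- biases, activations and pooling are ignored.

It lands at about 15.470 G MACs, or roughly 30.9 G FLOPs if one MAC is taken as two FLOPs.

```python
# Hand-written VGG16 layout: numbers are conv output channels, "M" is a 2x2 max pool
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_FC = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]


def vgg16_macs(image_size=224, in_channels=3):
    """Multiply-accumulates of VGG16's conv and linear layers for one image."""
    macs, c_in, hw = 0, in_channels, image_size
    for v in VGG16_CFG:
        if v == "M":
            hw //= 2                          # 2x2 max pool with stride 2 halves H and W
            continue
        macs += v * hw * hw * c_in * 3 * 3    # padding=1 keeps H and W unchanged
        c_in = v
    macs += sum(i * o for i, o in VGG16_FC)  # one MAC per weight of each linear layer
    return macs


macs = vgg16_macs()
print(f"{macs / 1e9:.3f} G MACs, about {2 * macs / 1e9:.1f} G FLOPs")
```

Almost all of the work is in the convolutions, the opposite of the parameter split, because each conv weight is reused at every spatial position.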
torchstat prints a more detailed per-layer report, so it is the recommended option.
pip install torchstat
import torch
import torchvision
model = torchvision.models.vgg16(pretrained = False)
device = torch.device('cpu')
model.to(device)
from torchstat import stat
stat(model.to(device), (3,224,224))
Output:
[MAdd]: AdaptiveAvgPool2d is not supported!
[Flops]: AdaptiveAvgPool2d is not supported!
[Memory]: AdaptiveAvgPool2d is not supported!
[MAdd]: Dropout is not supported!
[Flops]: Dropout is not supported!
[Memory]: Dropout is not supported!
[MAdd]: Dropout is not supported!
[Flops]: Dropout is not supported!
[Memory]: Dropout is not supported!
module name input shape output shape params memory(MB) MAdd Flops MemRead(B) MemWrite(B) duration[%] MemR+W(B)
0 features.0 3 224 224 64 224 224 1792.0 12.25 173,408,256.0 89,915,392.0 609280.0 12845056.0 3.15% 13454336.0
1 features.1 64 224 224 64 224 224 0.0 12.25 3,211,264.0 3,211,264.0 12845056.0 12845056.0 0.67% 25690112.0
2 features.2 64 224 224 64 224 224 36928.0 12.25 3,699,376,128.0 1,852,899,328.0 12992768.0 12845056.0 10.89% 25837824.0
3 features.3 64 224 224 64 224 224 0.0 12.25 3,211,264.0 3,211,264.0 12845056.0 12845056.0 0.65% 25690112.0
4 features.4 64 224 224 64 112 112 0.0 3.06 2,408,448.0 3,211,264.0 12845056.0 3211264.0 2.21% 16056320.0
5 features.5 64 112 112 128 112 112 73856.0 6.12 1,849,688,064.0 926,449,664.0 3506688.0 6422528.0 5.10% 9929216.0
6 features.6 128 112 112 128 112 112 0.0 6.12 1,605,632.0 1,605,632.0 6422528.0 6422528.0 0.09% 12845056.0
7 features.7 128 112 112 128 112 112 147584.0 6.12 3,699,376,128.0 1,851,293,696.0 7012864.0 6422528.0 9.20% 13435392.0
8 features.8 128 112 112 128 112 112 0.0 6.12 1,605,632.0 1,605,632.0 6422528.0 6422528.0 0.09% 12845056.0
9 features.9 128 112 112 128 56 56 0.0 1.53 1,204,224.0 1,605,632.0 6422528.0 1605632.0 1.07% 8028160.0
10 features.10 128 56 56 256 56 56 295168.0 3.06 1,849,688,064.0 925,646,848.0 2786304.0 3211264.0 4.85% 5997568.0
11 features.11 256 56 56 256 56 56 0.0 3.06 802,816.0 802,816.0 3211264.0 3211264.0 0.12% 6422528.0
12 features.12 256 56 56 256 56 56 590080.0 3.06 3,699,376,128.0 1,850,490,880.0 5571584.0 3211264.0 8.59% 8782848.0
13 features.13 256 56 56 256 56 56 0.0 3.06 802,816.0 802,816.0 3211264.0 3211264.0 0.05% 6422528.0
14 features.14 256 56 56 256 56 56 590080.0 3.06 3,699,376,128.0 1,850,490,880.0 5571584.0 3211264.0 7.07% 8782848.0
15 features.15 256 56 56 256 56 56 0.0 3.06 802,816.0 802,816.0 3211264.0 3211264.0 0.05% 6422528.0
16 features.16 256 56 56 256 28 28 0.0 0.77 602,112.0 802,816.0 3211264.0 802816.0 0.52% 4014080.0
17 features.17 256 28 28 512 28 28 1180160.0 1.53 1,849,688,064.0 925,245,440.0 5523456.0 1605632.0 4.05% 7129088.0
18 features.18 512 28 28 512 28 28 0.0 1.53 401,408.0 401,408.0 1605632.0 1605632.0 0.05% 3211264.0
19 features.19 512 28 28 512 28 28 2359808.0 1.53 3,699,376,128.0 1,850,089,472.0 11044864.0 1605632.0 7.27% 12650496.0
20 features.20 512 28 28 512 28 28 0.0 1.53 401,408.0 401,408.0 1605632.0 1605632.0 0.04% 3211264.0
21 features.21 512 28 28 512 28 28 2359808.0 1.53 3,699,376,128.0 1,850,089,472.0 11044864.0 1605632.0 7.17% 12650496.0
22 features.22 512 28 28 512 28 28 0.0 1.53 401,408.0 401,408.0 1605632.0 1605632.0 0.04% 3211264.0
23 features.23 512 28 28 512 14 14 0.0 0.38 301,056.0 401,408.0 1605632.0 401408.0 0.26% 2007040.0
24 features.24 512 14 14 512 14 14 2359808.0 0.38 924,844,032.0 462,522,368.0 9840640.0 401408.0 2.67% 10242048.0
25 features.25 512 14 14 512 14 14 0.0 0.38 100,352.0 100,352.0 401408.0 401408.0 0.03% 802816.0
26 features.26 512 14 14 512 14 14 2359808.0 0.38 924,844,032.0 462,522,368.0 9840640.0 401408.0 2.32% 10242048.0
27 features.27 512 14 14 512 14 14 0.0 0.38 100,352.0 100,352.0 401408.0 401408.0 0.03% 802816.0
28 features.28 512 14 14 512 14 14 2359808.0 0.38 924,844,032.0 462,522,368.0 9840640.0 401408.0 2.45% 10242048.0
29 features.29 512 14 14 512 14 14 0.0 0.38 100,352.0 100,352.0 401408.0 401408.0 0.03% 802816.0
30 features.30 512 14 14 512 7 7 0.0 0.10 75,264.0 100,352.0 401408.0 100352.0 0.09% 501760.0
31 avgpool 512 7 7 512 7 7 0.0 0.10 0.0 0.0 0.0 0.0 0.10% 0.0
32 classifier.0 25088 4096 102764544.0 0.02 205,516,800.0 102,760,448.0 411158528.0 16384.0 15.39% 411174912.0
33 classifier.1 4096 4096 0.0 0.02 4,096.0 4,096.0 16384.0 16384.0 0.05% 32768.0
34 classifier.2 4096 4096 0.0 0.02 0.0 0.0 0.0 0.0 0.06% 0.0
35 classifier.3 4096 4096 16781312.0 0.02 33,550,336.0 16,777,216.0 67141632.0 16384.0 2.64% 67158016.0
36 classifier.4 4096 4096 0.0 0.02 4,096.0 4,096.0 16384.0 16384.0 0.04% 32768.0
37 classifier.5 4096 4096 0.0 0.02 0.0 0.0 0.0 0.0 0.02% 0.0
38 classifier.6 4096 1000 4097000.0 0.00 8,191,000.0 4,096,000.0 16404384.0 4000.0 0.81% 16408384.0
total 138357544.0 109.39 30,958,666,264.0 15,503,489,024.0 16404384.0 4000.0 100.00% 783170624.0
============================================================================================================================================================
Total params: 138,357,544
------------------------------------------------------------------------------------------------------------------------------------------------------------
Total memory: 109.39MB
Total MAdd: 30.96GMAdd
Total Flops: 15.5GFlops
Total MemR+W: 746.89MB
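torchstat's MAdd and Flops columns differ by roughly a factor of two. MAdd counts multiplications and additions separately, while its Flops column is closer to a MAC count. The per-layer formulas below are inferred from the numbers in the table, not taken from torchstat's source. Assuming a k×k convolution with bias, they reproduce the rows for features.0 and features.2:

```python
def conv2d_madd_and_flops(c_in, c_out, k, h_out, w_out, bias=True):
    """Counts matching torchstat's MAdd and Flops columns for a Conv2d (inferred from its output)."""
    outputs = c_out * h_out * w_out   # number of output elements
    kernel_ops = c_in * k * k         # multiplications per output element
    b = 1 if bias else 0
    # kernel_ops multiplies + (kernel_ops - 1) adds, plus one bias add per output
    madd = outputs * (2 * kernel_ops - 1 + b)
    flops = outputs * (kernel_ops + b)
    return madd, flops


print(conv2d_madd_and_flops(3, 64, 3, 224, 224))    # features.0
print(conv2d_madd_and_flops(64, 64, 3, 224, 224))   # features.2
```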
pip install ptflops
import torch
import torchvision
model = torchvision.models.vgg16(pretrained = False)
device = torch.device('cpu')
model.to(device)
from ptflops import get_model_complexity_info
flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
print('Flops: ' + flops)
print('Params: ' + params)
Output:
VGG(
138.36 M, 100.000% Params, 15.5 GMac, 100.000% MACs,
(features): Sequential(
14.71 M, 10.635% Params, 15.38 GMac, 99.202% MACs,
(0): Conv2d(1.79 k, 0.001% Params, 89.92 MMac, 0.580% MACs, 3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(0, 0.000% Params, 3.21 MMac, 0.021% MACs, inplace=True)
(2): Conv2d(36.93 k, 0.027% Params, 1.85 GMac, 11.951% MACs, 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(0, 0.000% Params, 3.21 MMac, 0.021% MACs, inplace=True)
(4): MaxPool2d(0, 0.000% Params, 3.21 MMac, 0.021% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(73.86 k, 0.053% Params, 926.45 MMac, 5.976% MACs, 64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(0, 0.000% Params, 1.61 MMac, 0.010% MACs, inplace=True)
(7): Conv2d(147.58 k, 0.107% Params, 1.85 GMac, 11.941% MACs, 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(0, 0.000% Params, 1.61 MMac, 0.010% MACs, inplace=True)
(9): MaxPool2d(0, 0.000% Params, 1.61 MMac, 0.010% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(295.17 k, 0.213% Params, 925.65 MMac, 5.971% MACs, 128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(0, 0.000% Params, 802.82 KMac, 0.005% MACs, inplace=True)
(12): Conv2d(590.08 k, 0.426% Params, 1.85 GMac, 11.936% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(0, 0.000% Params, 802.82 KMac, 0.005% MACs, inplace=True)
(14): Conv2d(590.08 k, 0.426% Params, 1.85 GMac, 11.936% MACs, 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(0, 0.000% Params, 802.82 KMac, 0.005% MACs, inplace=True)
(16): MaxPool2d(0, 0.000% Params, 802.82 KMac, 0.005% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(17): Conv2d(1.18 M, 0.853% Params, 925.25 MMac, 5.968% MACs, 256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU(0, 0.000% Params, 401.41 KMac, 0.003% MACs, inplace=True)
(19): Conv2d(2.36 M, 1.706% Params, 1.85 GMac, 11.933% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(0, 0.000% Params, 401.41 KMac, 0.003% MACs, inplace=True)
(21): Conv2d(2.36 M, 1.706% Params, 1.85 GMac, 11.933% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(0, 0.000% Params, 401.41 KMac, 0.003% MACs, inplace=True)
(23): MaxPool2d(0, 0.000% Params, 401.41 KMac, 0.003% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(24): Conv2d(2.36 M, 1.706% Params, 462.52 MMac, 2.983% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(25): ReLU(0, 0.000% Params, 100.35 KMac, 0.001% MACs, inplace=True)
(26): Conv2d(2.36 M, 1.706% Params, 462.52 MMac, 2.983% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(27): ReLU(0, 0.000% Params, 100.35 KMac, 0.001% MACs, inplace=True)
(28): Conv2d(2.36 M, 1.706% Params, 462.52 MMac, 2.983% MACs, 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(0, 0.000% Params, 100.35 KMac, 0.001% MACs, inplace=True)
(30): MaxPool2d(0, 0.000% Params, 100.35 KMac, 0.001% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(avgpool): AdaptiveAvgPool2d(0, 0.000% Params, 25.09 KMac, 0.000% MACs, output_size=(7, 7))
(classifier): Sequential(
123.64 M, 89.365% Params, 123.65 MMac, 0.798% MACs,
(0): Linear(102.76 M, 74.275% Params, 102.76 MMac, 0.663% MACs, in_features=25088, out_features=4096, bias=True)
(1): ReLU(0, 0.000% Params, 4.1 KMac, 0.000% MACs, inplace=True)
(2): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(3): Linear(16.78 M, 12.129% Params, 16.78 MMac, 0.108% MACs, in_features=4096, out_features=4096, bias=True)
(4): ReLU(0, 0.000% Params, 4.1 KMac, 0.000% MACs, inplace=True)
(5): Dropout(0, 0.000% Params, 0.0 Mac, 0.000% MACs, p=0.5, inplace=False)
(6): Linear(4.1 M, 2.961% Params, 4.1 MMac, 0.026% MACs, in_features=4096, out_features=1000, bias=True)
)
)
Flops: 15.5 GMac
Params: 138.36 M
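ptflops reports GMac. The three tools therefore agree on about 15.5 G multiply-accumulates for VGG16 at 224×224. torchstat's 30.96 G MAdd is essentially the same work with multiplies and adds counted separately. A small helper that puts the reported figures on a FLOPs scale, assuming the common approximation of 2 FLOPs per MAC:

```python
def macs_to_flops(macs: float) -> float:
    """Approximate FLOPs from multiply-accumulates, assuming 1 MAC = 1 multiply + 1 add."""
    return 2 * macs


# Totals reported above for VGG16 at 224x224 (all are MAC counts)
reported_macs = {
    "thop": 15.470e9,
    "torchstat (Flops column)": 15_503_489_024,
    "ptflops": 15.5e9,
}
for tool, macs in reported_macs.items():
    print(f"{tool}: {macs / 1e9:.2f} GMACs ~ {macs_to_flops(macs) / 1e9:.2f} GFLOPs")
```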