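An intuitive way to think about computational cost (FLOPs) and parameter count (Params): FLOPs correspond to time complexity and Params to space complexity. FLOPs indicate how long the network takes to run, and Params indicate how much GPU memory the weights occupy.
FLOPs: floating point operations, where the lowercase "s" is the plural. It is the number of floating-point operations needed for one forward pass and is the usual measure of a model's computational cost. It should not be confused with FLOPS (uppercase "S"), floating point operations per second, which measures hardware speed. All else being equal, smaller is better.
Params: the total number of trainable parameters in the model. All else being equal, smaller is better.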
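As a rough illustration of what these two numbers count, the sketch below works both out by hand for a single convolution layer. The helper name conv2d_cost is made up for this example, and the layer settings are those of the first convolution in torchvision's AlexNet. It counts multiply-accumulate operations (MACs), which tools often report in place of FLOPs; treating one MAC as two FLOPs is a common convention, not a universal one.

```python
def conv2d_cost(c_in, c_out, kernel, stride, padding, h_in, w_in, bias=True):
    """Return (params, macs) for one square-kernel 2D convolution (hypothetical helper)."""
    # Standard output-size formula for a convolution without dilation.
    h_out = (h_in + 2 * padding - kernel) // stride + 1
    w_out = (w_in + 2 * padding - kernel) // stride + 1
    weights_per_filter = c_in * kernel * kernel
    # Each output channel owns one filter plus, optionally, one bias term.
    params = c_out * (weights_per_filter + (1 if bias else 0))
    # Each output element needs one multiply-accumulate per filter weight.
    macs = c_out * h_out * w_out * weights_per_filter
    return params, macs


params, macs = conv2d_cost(c_in=3, c_out=64, kernel=11, stride=4, padding=2,
                           h_in=224, w_in=224)
print(f"params: {params}, MACs: {macs / 1e6:.2f} M")
```

Params depend only on the layer's shape, while MACs also scale with the output resolution, which is why the same model gets a different cost at a different input size.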
With these concepts in place, the next step is computing the two values.
A common way to do so is with the ptflops package.
# -*- coding: utf-8 -*-
import torchvision
from ptflops import get_model_complexity_info
model = torchvision.models.alexnet(pretrained=False)
flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
print('flops: ', flops, 'params: ', params)
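This snippet is essentially plug-and-play. Note that ptflops reports multiply-accumulate operations: with as_strings=True, the value stored in flops ends in a unit such as GMac.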
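Taking DAB-DETR as an example, I got the error below when running it. It is caused by a mismatch between the checkpoint file and the model configuration: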
RuntimeError: Error(s) in loading state_dict for DABDeformableDETR:
size mismatch for input_proj.0.0.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
size mismatch for input_proj.1.0.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
size mismatch for input_proj.2.0.weight: copying a param with shape torch.Size([256, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
size mismatch for input_proj.3.0.weight: copying a param with shape torch.Size([256, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
Changing the value of num_channels fixes it. It was originally [128, 256, 512]:
if return_interm_layers:
    # return_layers = {"layer1": "0", "layer2": "1", "layer3": "2", "layer4": "3"}
    return_layers = {"layer2": "0", "layer3": "1", "layer4": "2"}
    self.strides = [8, 16, 32]
    self.num_channels = [512, 1024, 2048]
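Before editing the backbone, it can help to list exactly which tensors disagree. The sketch below compares two name-to-shape mappings. find_shape_mismatches is a hypothetical helper, and the example shapes are copied from the error message above. With real objects, such mappings could be built as {k: tuple(v.shape) for k, v in state_dict.items()} from checkpoint['model'] and from model.state_dict().

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """List (name, checkpoint_shape, model_shape) for tensors whose shapes differ."""
    mismatches = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        # Names missing from the model are a different problem; skip them here.
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches.append((name, tuple(ckpt_shape), tuple(model_shape)))
    return mismatches


# Shapes taken from the RuntimeError above (first two entries only).
checkpoint_shapes = {
    "input_proj.0.0.weight": (256, 512, 1, 1),
    "input_proj.1.0.weight": (256, 1024, 1, 1),
}
model_shapes = {
    "input_proj.0.0.weight": (256, 128, 1, 1),
    "input_proj.1.0.weight": (256, 256, 1, 1),
}
mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, ckpt, current in mismatches:
    print(f"{name}: checkpoint {ckpt} vs model {current}")
```

The second dimension of a Conv2d weight is its input channel count, so the checkpoint expects 512 and 1024 input channels where the current model provides 128 and 256. That is the quantity num_channels controls.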
The inference code follows. Nearly all DETR-family models can share the same inference code.
import json
import os, sys
import torch
import numpy as np
from models import build_DABDETR
from models.dab_deformable_detr import build_dab_deformable_detr
from util.slconfig import SLConfig
from datasets import build_dataset
from util.visualizer import COCOVisualizer
from util import box_ops
model_config_path = "D:/graduate/others/DAB-DETR/config.json" # change the path of the model config file
model_checkpoint_path = "D:/graduate/others/DAB-DETR/checkpoint.pth" # change the path of the model checkpoint
# See our Model Zoo section in README.md for more details about our pretrained models.
args = SLConfig.fromfile(model_config_path)
model, criterion, postprocessors = build_DABDETR(args)
checkpoint = torch.load(model_checkpoint_path, map_location='cpu')
model.load_state_dict(checkpoint['model'])
_ = model.eval()
with open('util/coco_id2name.json') as f:
    id2name = json.load(f)
    id2name = {int(k): v for k, v in id2name.items()}
from PIL import Image
import datasets.transforms as T
image = Image.open("./figure/4.jpg").convert("RGB") # load image
# transform images
transform = T.Compose([
    T.RandomResize([800], max_size=1333),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
image, _ = transform(image, None)
from ptflops import get_model_complexity_info
model=model.to(args.device)
flops, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
print('flops: ', flops, 'params: ', params)
# predict images
with torch.no_grad():
    output = model.cuda()(image[None].cuda())
# visualize outputs
output = postprocessors['bbox'](output, torch.Tensor([[1.0, 1.0]]).cuda())[0]
threshold = 0.5 # set a score threshold
vslzr = COCOVisualizer()
scores = output['scores']
print(len(scores))
labels = output['labels']
boxes = box_ops.box_xyxy_to_cxcywh(output['boxes'])
select_mask = scores > threshold
box_label = [id2name[int(item)] for item in labels[select_mask]]
pred_dict = {
    'boxes': boxes[select_mask],
    'size': torch.Tensor([image.shape[1], image.shape[2]]),
    'box_label': box_label
}
vslzr.visualize(image, pred_dict, savedir=None, dpi=120)
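One thing to keep in mind when reading the numbers: the complexity call above uses a 224×224 input, while the transform resizes the actual image to a much larger size, and the cost of convolutional layers grows roughly in proportion to the pixel count. The sketch below estimates that gap. resized_hw is a hypothetical helper that only approximates a resize with the shorter side at 800 and the longer side capped at 1333; the exact rounding in the repository's RandomResize may differ, and the 640×480 source image is an assumption.

```python
def resized_hw(width, height, short_side=800, max_long_side=1333):
    """Approximate (height, width) after a shorter-side resize with a cap on the longer side."""
    scale = short_side / min(width, height)
    # If the longer side would exceed the cap, scale to the cap instead.
    if max(width, height) * scale > max_long_side:
        scale = max_long_side / max(width, height)
    return round(height * scale), round(width * scale)


h, w = resized_hw(width=640, height=480)
pixel_ratio = (h * w) / (224 * 224)
print(f"resized to {h}x{w}, about {pixel_ratio:.1f}x the pixels of 224x224")
```

To measure the model at the resolution it actually runs at, one option is to pass tuple(image.shape) taken after the transform as the input shape, in place of (3, 224, 224).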
The DN-DETR inference code is largely the same as the DAB-DETR one, but the problems it runs into are different.
indicator0 = torch.zeros([num_queries * num_patterns, 1]).cuda()
TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'
This is a None value problem: num_patterns is None. Assigning num_patterns = 1 fixes it.
boxes = boxes * scale_fct[:, None, :]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
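Some tensors are on the CPU and others on the GPU. Appending .cuda() to the end of boxes = boxes * scale_fct[:, None, :] fixes it. The call binds to the indexed scale_fct, so it is scale_fct that gets moved to the GPU before the multiplication.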
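A slightly more portable variant of the same fix moves scale_fct to whatever device boxes is on, so the line also works on a CPU-only machine. This is a sketch of the idea and not the repository's code; scale_boxes is a hypothetical wrapper around the failing line, and the tensor shapes are illustrative.

```python
import torch


def scale_boxes(boxes, scale_fct):
    """Multiply boxes by per-image scale factors after aligning devices (hypothetical helper)."""
    # Move the scale factors to the device the boxes already live on.
    scale_fct = scale_fct.to(boxes.device)
    return boxes * scale_fct[:, None, :]


boxes = torch.rand(1, 300, 4)                              # (batch, queries, 4)
scale_fct = torch.tensor([[640.0, 480.0, 640.0, 480.0]])   # (batch, 4)
scaled = scale_boxes(boxes, scale_fct)
print(scaled.shape, scaled.device)
```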
In addition, an error about indexing a tuple is raised:
TypeError: tuple indices must be integers or slices, not str
Change the following code
out_logits, out_bbox = outputs['pred_logits'], outputs['pred_boxes']
to:
out_logits = outputs[0]['pred_logits']
out_bbox = outputs[0]['pred_boxes']
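If the same post-processing code has to serve models that return a plain dict as well as ones that return a tuple, a small guard avoids editing it twice. A minimal sketch, assuming that when the output is a tuple or list its first element is the prediction dict, as in the fix above; first_output_dict is a hypothetical helper and the string values stand in for tensors.

```python
def first_output_dict(outputs):
    """Return the prediction dict whether the model returned it bare or wrapped in a tuple."""
    if isinstance(outputs, (tuple, list)):
        return outputs[0]
    return outputs


as_dict = {"pred_logits": "logits", "pred_boxes": "boxes"}
as_tuple = (as_dict, None)
for outputs in (as_dict, as_tuple):
    outputs = first_output_dict(outputs)
    out_logits, out_bbox = outputs["pred_logits"], outputs["pred_boxes"]
    print(out_logits, out_bbox)
```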
At this point the DN-DETR inference code runs correctly, but computing FLOPs and Params raises another problem:
File "D:\Anaconda\envs\deformable_detr\lib\site-packages\ptflops\pytorch_ops.py", line 162, in multihead_attention_counter_hook
q, k, v = input
ValueError: not enough values to unpack (expected 3, got 2)
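The error says the number of values being unpacked is wrong: the hook received two inputs, not three. Locate this line in pytorch_ops.py and change q, k, v = input
to:
q, k = input
v = k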
Likewise, the device mismatch error appears here as well and can be handled the same way.
File "E:\graduate\papers\DN-DETR\DN-DETR-main\models\DN_DAB_DETR\DABDETR.py", line 458, in forward
boxes = boxes * scale_fct[:, None, :]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Because DN-DAB-Deformable-DETR and DN-DAB-DETR share the same code, the change above now causes a problem here:
q, k= input
ValueError: too many values to unpack (expected 2)
Checking input shows it holds three values here, so the original form is correct for this model; revert to it:
q, k, v = input
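Since one model passes two inputs to the attention module and the other passes three, a version of the unpacking that accepts both avoids switching the line back and forth. This is a sketch of a possible local patch, not ptflops' own code; unpack_qkv is a hypothetical helper, and reusing k as v in the two-input case simply mirrors the edit made earlier.

```python
def unpack_qkv(inputs):
    """Unpack attention inputs given either (q, k) or (q, k, v)."""
    if len(inputs) == 3:
        q, k, v = inputs
    elif len(inputs) == 2:
        q, k = inputs
        v = k  # assumption carried over from the manual fix: value defaults to key
    else:
        raise ValueError(f"expected 2 or 3 attention inputs, got {len(inputs)}")
    return q, k, v


print(unpack_qkv(("q", "k")))
print(unpack_qkv(("q", "k", "v")))
```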
A batch-size error is also raised. It is easy to fix: since we are only running inference on a single image, setting the batch size to 1 is enough.
With that, inference and the FLOPs and Params computation for DETR-family models are sorted out.
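Next came the YOLO model. The calculation is similar, so I first reused the code above directly, but it did not work.
The parameter count always came out as 0, which puzzled me.
So I switched to another package.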
import torch
from thop import profile
print('==> Building model..')
# model is the already-built network, assumed to be on the GPU
dummy_input = torch.randn(1, 3, 224, 224).cuda()
flops, params = profile(model, (dummy_input,))
print('flops: %.2f M, params: %.2f M' % (flops / 1e6, params / 1e6))
And that works. As with the DETR models, this can be dropped straight into the model's inference code.
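When a profiling tool reports a suspicious value such as 0 parameters, counting them directly with PyTorch gives an independent check. The sketch below uses a tiny stand-in model, since the YOLO model itself is not shown here; count_parameters and toy_model are hypothetical names for this example.

```python
import torch.nn as nn


def count_parameters(model, trainable_only=True):
    """Sum the element counts of a model's parameters."""
    return sum(p.numel() for p in model.parameters()
               if p.requires_grad or not trainable_only)


toy_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # 8 * (3 * 3 * 3) + 8 = 224
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 10),                   # 128 * 10 + 10 = 1290
)
print(f"params: {count_parameters(toy_model) / 1e6:.4f} M")
```

If this direct count is non-zero while the tool prints 0, the problem lies in how the tool traverses the model and not in the model itself.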