The answer is clearly no: a model's runtime is not proportional to its parameter count. To demonstrate this, I ran comparison tests on the yolov5n model and got the following results:
Model | Pruning setup | Model size (ONNX) | Mean runtime (ncnn, s) |
---|---|---|---|
yolov5n | original yolov5n model | 7.1 MB | 0.0411731 |
yolov5n_0.3p | pruned at ratio 0.3 | 4.3 MB | 0.0395994 |
yolov5n_0.6p | pruned at ratio 0.6 | 2.3 MB | 0.0413083 |
yolov5n_0.7p | pruned at ratio 0.7 | 1.9 MB | 0.041187 |
yolov5n_0.7change_p | pruned at ratio 0.7, with the channel-rounding optimization | 2.0 MB | 0.0356186 |
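For reference on how such numbers are typically gathered: average many forward passes after a warm-up, since the first few runs pay one-off initialization costs. Below is a minimal timing sketch using a PyTorch model and a dummy input as stand-ins (the table above was measured with ncnn; the input shape and iteration counts here are assumptions):

```python
import time

import torch


def mean_inference_time(model, input_shape=(1, 3, 640, 640), warmup=10, runs=100):
    """Mean forward-pass latency in seconds over `runs` iterations."""
    model.eval()
    x = torch.randn(*input_shape)
    with torch.no_grad():
        for _ in range(warmup):   # warm-up passes: exclude one-off setup costs
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs
```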
A note on how the models were pruned: the initial version first runs sparsity training on the model, then prunes channels according to the magnitude of each BN layer's scale factor (γ).
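As a rough illustration of this criterion (a sketch in the spirit of Network Slimming, not the exact training script; `sparsity_lambda` and the ratio handling are my assumptions):

```python
import torch
import torch.nn as nn


def add_bn_l1_grad(model, sparsity_lambda=1e-4):
    """Sparsity training: after loss.backward(), add the subgradient of
    lambda * |gamma| so BN scale factors are pushed toward zero."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.data.add_(sparsity_lambda * torch.sign(m.weight.data))


def global_bn_threshold(model, prune_ratio=0.7):
    """Gather every BN |gamma| in the network and return the value below
    which `prune_ratio` of all channels fall."""
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    sorted_gammas, _ = torch.sort(gammas)
    return sorted_gammas[int(len(sorted_gammas) * prune_ratio)]
```

Pruning at ratio 0.7 with this kind of threshold, however, surfaces the following situation: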
prune 0.7:

| layer name | origin channels | remaining channels |
|---|---|---|
| model.0.bn | 16 | 13 |
| model.1.bn | 32 | 31 |
| model.2.cv1.bn | 16 | 16 |
| model.2.cv2.bn | 16 | 15 |
| model.2.cv3.bn | 32 | 32 |
| model.2.m.0.cv1.bn | 16 | 16 |
| model.2.m.0.cv2.bn | 16 | 16 |
| model.3.bn | 64 | 57 |
| model.4.cv1.bn | 32 | 32 |
| model.4.cv2.bn | 32 | 28 |
| model.4.cv3.bn | 64 | 57 |
| model.4.m.0.cv1.bn | 32 | 32 |
| model.4.m.0.cv2.bn | 32 | 32 |
| model.4.m.1.cv1.bn | 32 | 32 |
| model.4.m.1.cv2.bn | 32 | 32 |
| model.5.bn | 128 | 66 |
| model.6.cv1.bn | 64 | 64 |
| model.6.cv2.bn | 64 | 30 |
| model.6.cv3.bn | 128 | 54 |
| model.6.m.0.cv1.bn | 64 | 64 |
| model.6.m.0.cv2.bn | 64 | 64 |
| model.6.m.1.cv1.bn | 64 | 64 |
| model.6.m.1.cv2.bn | 64 | 64 |
| model.6.m.2.cv1.bn | 64 | 64 |
| model.6.m.2.cv2.bn | 64 | 64 |
| model.7.bn | 256 | 1 |
| model.8.cv1.bn | 128 | 128 |
| model.8.cv2.bn | 128 | 1 |
| model.8.cv3.bn | 256 | 4 |
| model.8.m.0.cv1.bn | 128 | 128 |
| model.8.m.0.cv2.bn | 128 | 128 |
| model.9.cv1.bn | 128 | 1 |
| model.9.cv2.bn | 256 | 18 |
| model.10.bn | 128 | 13 |
| model.13.cv1.bn | 64 | 56 |
| model.13.cv2.bn | 64 | 8 |
| model.13.cv3.bn | 128 | 50 |
| model.13.m.0.cv1.bn | 64 | 47 |
| model.13.m.0.cv2.bn | 64 | 51 |
| model.14.bn | 64 | 35 |
| model.17.cv1.bn | 32 | 18 |
| model.17.cv2.bn | 32 | 5 |
| model.17.cv3.bn | 64 | 60 |
| model.17.m.0.cv1.bn | 32 | 22 |
| model.17.m.0.cv2.bn | 32 | 27 |
| model.18.bn | 64 | 17 |
| model.20.cv1.bn | 64 | 19 |
| model.20.cv2.bn | 64 | 9 |
| model.20.cv3.bn | 128 | 75 |
| model.20.m.0.cv1.bn | 64 | 17 |
| model.20.m.0.cv2.bn | 64 | 38 |
| model.21.bn | 128 | 18 |
| model.23.cv1.bn | 128 | 11 |
| model.23.cv2.bn | 128 | 9 |
| model.23.cv3.bn | 256 | 63 |
| model.23.m.0.cv1.bn | 128 | 10 |
| model.23.m.0.cv2.bn | 128 | 30 |
Notice that the pruned model ends up with channel transitions such as 128 → 1 (model.8.cv2.bn) and even 256 → 1 (model.7.bn). Widths like this are not really sensible in network design: as I once read in a blog post, channel counts are best set to multiples of 4, 8, 16, or 32, which matches how inference engines vectorize their kernels. That suggested an idea: still rank channels by the magnitude of their BN scale factors, but round the number of channels kept to one of these friendly values. Comparing plain 0.7 pruning with the optimized 0.7 pruning, the optimized model is slightly larger (2.0 MB vs. 1.9 MB), yet its runtime improves considerably (0.041187 s → 0.0356186 s, about 13% faster). The modified mask functions are shown below.
```python
import numpy as np
import torch


def obtain_bn_mask_change(bn_module, thre):
    """Threshold mask over BN scale factors, keeping at least one channel."""
    thre = thre.cuda()
    mask = bn_module.weight.data.abs().ge(thre).float()
    # If every channel falls below the threshold, keep only the single
    # largest one so the layer is not pruned away entirely.
    if int(mask.sum().item()) == 0:
        max_val = bn_module.weight.data.abs().max().item() - 0.00001
        mask = bn_module.weight.data.abs().ge(max_val).float()
    return mask


def getchangemasklength(mask_length):
    """Round a kept-channel count to the nearest hardware-friendly width."""
    vals = [8, 16, 32, 64, 128, 256, 512]
    dists = [abs(mask_length - val) for val in vals]
    return vals[dists.index(min(dists))]


def obtain_bn_maskchange(bn_module, thre):
    """Threshold mask whose kept-channel count is rounded by getchangemasklength."""
    thre = thre.cuda()
    mask = bn_module.weight.data.abs().ge(thre).float()
    weights_numpy = bn_module.weight.data.abs().cpu().numpy()
    mask_length = int(mask.sum().item())
    top = getchangemasklength(mask_length)
    top = min(top, len(weights_numpy))        # cannot keep more channels than exist
    sort_numpy = np.sort(weights_numpy)       # ascending sort of |gamma|
    top_numpy = sort_numpy[-top:]             # the `top` largest scale factors
    mask = np.isin(weights_numpy, top_numpy)  # keep exactly those channels
    return torch.from_numpy(mask).float().cuda()
```
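As a quick sanity check of the rounding, feeding getchangemasklength the kept-channel counts from the 0.7 prune log above shows how the degenerate widths get repaired (this mapping is computed from the function itself, not output of the original script):

```python
# Kept-channel counts from the 0.7 prune log, before and after rounding.
for kept in [13, 31, 57, 66, 1, 4, 18, 35]:
    print(f"{kept:3d} -> {getchangemasklength(kept)}")
# 13 -> 16, 31 -> 32, 57 -> 64, 66 -> 64, 1 -> 8, 4 -> 8, 18 -> 16, 35 -> 32
```

A width like 128 → 1 therefore becomes 128 → 8, which is exactly why the optimized model is slightly larger on disk yet runs measurably faster in ncnn.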