First, a bit of rambling: why bother extracting the Yolo-Tiny-v4 weights and intermediate-layer outputs at all? Piecing the figure together took about a day and a half, most of it spent being tormented by KerasTensor during debugging! I'm writing this up so that anyone who later attempts quantization research on Yolo-Tiny-v4 can take fewer detours (or just wants a quick demo), and so that I have a backup record for my own follow-up work.
This post focuses on the technical implementation; the underlying theory is worth studying separately. I'll also leave myself a placeholder here: in future posts this series will work through the Yolo-Tiny-v4 (and broader Yolo family) algorithms, their hardware/software co-design, and classic commercial and academic accelerator architectures one by one.
My advisor came across a DAC 2020 paper, "Late Breaking Results: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick". It proposes an incremental improvement to an accelerator architecture (I'd call it squeezing the toothpaste): take a trained ResNet-18, run it on some image (hand-waving point 1: the paper never names which dataset), compress the original single-precision floats of the conv kernels, fc kernels, conv output feature maps, and fc outputs down to 8-bit precision, and count how many values need 1, 2, 3, ... 8 bits. The analysis finds that, for both Filters and Output Feature Maps, low bit counts dominate (hand-waving point 2: the paper never says whether the accuracy loss of the compressed ResNet-18 stays within a tolerable range, and float-to-INT compression leaves far too much freedom). From this the authors conclude that most Filters and Output Feature Maps can be stored off-chip at low bit widths, while a small fraction of the data stays at high bit width. In theory this storage scheme really does increase the number of values that fit in an off-chip row buffer line, which is a considerable win for off-chip memory and also helps PE performance and energy efficiency; even though the algorithmic side of the idea has plenty of holes, it is still an inspiring way to think about memory utilization. The authors additionally designed a parallel decoder to unpack the data, but the design details are not elaborated, so I'll set that aside for now (another placeholder for later).
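To make the counting idea concrete, here is a toy sketch of what "compress to 8 bits and tally the bit positions" means (my own illustration with random data, not the paper's code):
import numpy as np
# quantize float values to 8-bit integers, then tally how many bits each value actually needs
rng = np.random.default_rng(0)
values = rng.normal(0, 0.1, size=10000)        # stand-in for trained weights / feature maps
scale = np.max(np.abs(values)) / 127.0         # naive symmetric 8-bit quantization
q = np.clip(np.round(values / scale), -127, 127).astype(np.int8)
bit_widths = [max(abs(int(x)).bit_length(), 1) for x in q]   # sign ignored, 0 counted as 1 bit
for b in range(1, 9):
    print('%d-bit: %.2f%%' % (b, bit_widths.count(b) / len(bit_widths) * 100))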
Back to the main topic: the Yolo-Tiny-v4 weights and intermediate-layer outputs. First, a figure to admire:
The figure's legend covers the imap (input feature map) and fmap (filter map) of the fc and res block layers; roughly, it shows the cumulative distribution of both kinds of parameters after quantization. To reproduce a similar figure for Yolo-Tiny-v4, the target has to be the Filters and Output Feature Maps of its 21 conv2d layers, since this network has no fc layer.
Tip 1: There is surprisingly little systematic writing on extracting weight and feature-map parameters, so consider this roughly the first semi-hand-holding tutorial!
Tip 2: This walkthrough uses already-trained weight files. You can of course train your own, but since I only need some numbers to make the point, I'll save the lengthy training process for another time.
Tip 3: This walkthrough does not feed the quantized weights and feature maps back into the network for prediction. After digesting it, consider plugging the quantized values back in and re-running inference.
Tip 4: This walkthrough only does a simple tally of the minimum number of bits each value needs after quantization.
You can browse the .h5 files under ./model_data yourself, and the code below is worth trying (uncomment the commented lines as needed, and check the file path before running). If all goes well, you will see that the .h5 file is organized like a dict: the f.keys() call below prints the first level of keys, and after some poking around it turns out the keys are nested three levels deep.
import h5py
import numpy as np
import os
#f = h5py.File('yolov4_tiny_weights_voc.h5','r')
f = h5py.File('yolov4_tiny_weights_coco.h5','r')
print(f.keys())
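The three levels of nesting mentioned above can be confirmed with a small helper (my own sketch, not part of the repo) that walks every group and dataset in the file via h5py's visititems:
def print_item(name, obj):
    # name is the full nested path, e.g. 'conv2d/conv2d/kernel:0'
    if isinstance(obj, h5py.Dataset):
        print(name, obj.shape)
f.visititems(print_item)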
If the .h5 files are already available, this script is not strictly necessary; it is only used to convert a .weights file into a .h5 file. (Check that the file names are correct before running.)
#!/bin/bash
#python convert.py ///yolov4-tiny.cfg ///yolov4-tiny.weights ///yolov4-tiny.h5
python convert.py ///yolo4.cfg ///yolo4.weights ///yolov4.h5
You can also look into the arguments defined by convert.py's argparse module and pass other parameters on the command line.
tensorflow2
scipy==1.4.1
numpy==1.18.4
matplotlib==3.2.1
opencv_python==4.2.0.34
tensorflow_gpu==2.2.0
tqdm==4.46.1
Pillow==8.2.0
h5py==2.10.0
It is recommended to install the environment using the requirements.txt inside the yolov4-tiny-tf2-master folder:
pip3 install -r requirements.txt
This section briefly walks through inspecting the Yolo-Tiny-v4 network starting from the .h5 file and extracting the weight data.
import h5py
import numpy as np
import os
# flatten an arbitrarily nested list into a single flat list
def flat(nums):
    res = []
    for i in nums:
        if isinstance(i, list):
            res.extend(flat(i))
        else:
            res.append(i)
    return res
#f = h5py.File('yolov4_tiny_weights_voc.h5','r')
f = h5py.File('yolov4_tiny_weights_coco.h5','r')
print(f.keys())
This directly produces the layer information below, which Section 4 will corroborate by generating a summary with keras.models.Model.
The following code then writes the weights to a txt file, with each layer's 4-D kernel flattened into a single one-dimensional list:
import h5py
import numpy as np
import os
#f = h5py.File('yolov4_tiny_weights_voc.h5','r')
f = h5py.File('yolov4_tiny_weights_coco.h5','r')
fkeys_box = ['conv2d', 'conv2d_1', 'conv2d_10', 'conv2d_11', 'conv2d_12', 'conv2d_13', 'conv2d_14', 'conv2d_15', 'conv2d_16', 'conv2d_17', 'conv2d_18', 'conv2d_19', 'conv2d_2', 'conv2d_20', 'conv2d_3', 'conv2d_4', 'conv2d_5', 'conv2d_6', 'conv2d_7', 'conv2d_8', 'conv2d_9']
### output to .txt
dirpath = os.path.dirname(os.path.abspath(__file__))
output_txt = os.path.join(dirpath, 'weight.txt')
filetxt = open(output_txt, 'w')
dict_temp = dict()
for i in fkeys_box:
    dict_temp[i] = f[i][i]['kernel:0'][:]
    #print(f[i][i]['kernel:0'][:].shape)
for k, v in dict_temp.items():
    filetxt.write(str(k) + ' ')
    for L1 in range(v.shape[0]):
        for L2 in range(v.shape[1]):
            for L3 in range(v.shape[2]):
                for L4 in range(v.shape[3]):
                    filetxt.write(str(v[L1][L2][L3][L4]) + ' ')
    filetxt.write('\n')
filetxt.close()
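As a quick sanity check (my own sketch, not part of the repo), you can read weight.txt back and confirm that each layer's element count matches its kernel shape, e.g. 3*3*3*32 = 864 for the first conv2d:
check = open(output_txt, 'r')
for line in check.readlines():
    parts = line.strip().split(' ')
    # parts[0] is the layer name, the rest are the flattened kernel values
    print(parts[0], len(parts) - 1)
check.close()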
This part took quite a while to puzzle out; I'll spare you the messy debugging ordeal. There is not much written about it on CSDN either, so treat this as a starting point for others to build on.
Here is the working procedure. In yolo.py, inside the definition of detect_image(self, image, crop = False), add the following right after the line out_boxes, out_scores, out_classes = self.get_pred(image_data, input_image_shape):
self.yolo_model.summary()
for i, layer in enumerate(self.yolo_model.layers):
    print(i, layer.name)
conv2d_box = []
conv2d_list_whole = {}
#### select every layer whose name contains 'conv2d'
for layer in self.yolo_model.layers:
    tmp = []
    if 'conv2d' in layer.name:
        conv2d_box.append(layer)
        # build a functional sub-model that ends at this conv layer
        # (requires `from tensorflow.keras.models import Model` if yolo.py does not already import it)
        conv2d_layer_model = Model(inputs=self.yolo_model.input, outputs=self.yolo_model.get_layer(layer.name).output)
        out = conv2d_layer_model.predict([image_data, input_image_shape])
        #print(layer.name, ' type is ', type(out)) ### always numpy.ndarray
        #print(layer.name, ' shape is ', out.shape) ### always a 4-tuple (x,x,x,x)
        for L1 in range(out.shape[0]):
            for L2 in range(out.shape[1]):
                for L3 in range(out.shape[2]):
                    for L4 in range(out.shape[3]):
                        tmp.append(out[L1][L2][L3][L4])
        conv2d_list_whole[layer.name] = tmp
        #print('\n')
#print(conv2d_list_whole)
# create and open a text file
file_mid = open('layer_mid_results.txt', 'w')
# iterate over the dict, writing each layer name and its flattened values separated by spaces, one layer per line
for k, v in conv2d_list_whole.items():
    file_mid.write(str(k) + ' ')
    for num in range(len(v)):
        file_mid.write(str(v[num]) + ' ')
    file_mid.write('\n')
# remember to close the file
file_mid.close()
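Calling predict once per conv layer rebuilds and reruns a sub-model 21 times. If that becomes slow, one possible alternative (my own sketch, under the same assumptions about image_data and input_image_shape) is a single Model with all conv2d outputs:
conv_outputs = [l.output for l in self.yolo_model.layers if 'conv2d' in l.name]
multi_out_model = Model(inputs=self.yolo_model.input, outputs=conv_outputs)
# one forward pass returns a list of numpy arrays, one per conv2d layer, in layer order
all_conv_maps = multi_out_model.predict([image_data, input_image_shape])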
The self.yolo_model.summary() call above prints the layer information:
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, None, None, 0
__________________________________________________________________________________________________
zero_padding2d (ZeroPadding2D) (None, None, None, 3 0 input_1[0][0]
__________________________________________________________________________________________________
conv2d (Conv2D) (None, None, None, 3 864 zero_padding2d[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, None, None, 3 128 conv2d[0][0]
__________________________________________________________________________________________________
leaky_re_lu (LeakyReLU) (None, None, None, 3 0 batch_normalization[0][0]
__________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D (None, None, None, 3 0 leaky_re_lu[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, None, None, 6 18432 zero_padding2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, None, None, 6 256 conv2d_1[0][0]
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU) (None, None, None, 6 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, None, None, 6 36864 leaky_re_lu_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, None, None, 6 256 conv2d_2[0][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU) (None, None, None, 6 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
lambda (Lambda) (None, None, None, 3 0 leaky_re_lu_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, None, None, 3 9216 lambda[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, None, None, 3 128 conv2d_3[0][0]
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU) (None, None, None, 3 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, None, None, 3 9216 leaky_re_lu_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, None, None, 3 128 conv2d_4[0][0]
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU) (None, None, None, 3 0 batch_normalization_4[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate) (None, None, None, 6 0 leaky_re_lu_4[0][0]
leaky_re_lu_3[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, None, None, 6 4096 concatenate[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, None, None, 6 256 conv2d_5[0][0]
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU) (None, None, None, 6 0 batch_normalization_5[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, None, None, 1 0 leaky_re_lu_2[0][0]
leaky_re_lu_5[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D) (None, None, None, 1 0 concatenate_1[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D) (None, None, None, 1 147456 max_pooling2d[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, None, None, 1 512 conv2d_6[0][0]
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU) (None, None, None, 1 0 batch_normalization_6[0][0]
__________________________________________________________________________________________________
lambda_1 (Lambda) (None, None, None, 6 0 leaky_re_lu_6[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, None, None, 6 36864 lambda_1[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, None, None, 6 256 conv2d_7[0][0]
__________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU) (None, None, None, 6 0 batch_normalization_7[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D) (None, None, None, 6 36864 leaky_re_lu_7[0][0]
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, None, None, 6 256 conv2d_8[0][0]
__________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU) (None, None, None, 6 0 batch_normalization_8[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate) (None, None, None, 1 0 leaky_re_lu_8[0][0]
leaky_re_lu_7[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, None, None, 1 16384 concatenate_2[0][0]
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, None, None, 1 512 conv2d_9[0][0]
__________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU) (None, None, None, 1 0 batch_normalization_9[0][0]
__________________________________________________________________________________________________
concatenate_3 (Concatenate) (None, None, None, 2 0 leaky_re_lu_6[0][0]
leaky_re_lu_9[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, None, None, 2 0 concatenate_3[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, None, None, 2 589824 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, None, None, 2 1024 conv2d_10[0][0]
__________________________________________________________________________________________________
leaky_re_lu_10 (LeakyReLU) (None, None, None, 2 0 batch_normalization_10[0][0]
__________________________________________________________________________________________________
lambda_2 (Lambda) (None, None, None, 1 0 leaky_re_lu_10[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D) (None, None, None, 1 147456 lambda_2[0][0]
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, None, None, 1 512 conv2d_11[0][0]
__________________________________________________________________________________________________
leaky_re_lu_11 (LeakyReLU) (None, None, None, 1 0 batch_normalization_11[0][0]
__________________________________________________________________________________________________
conv2d_12 (Conv2D) (None, None, None, 1 147456 leaky_re_lu_11[0][0]
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, None, None, 1 512 conv2d_12[0][0]
__________________________________________________________________________________________________
leaky_re_lu_12 (LeakyReLU) (None, None, None, 1 0 batch_normalization_12[0][0]
__________________________________________________________________________________________________
concatenate_4 (Concatenate) (None, None, None, 2 0 leaky_re_lu_12[0][0]
leaky_re_lu_11[0][0]
__________________________________________________________________________________________________
conv2d_13 (Conv2D) (None, None, None, 2 65536 concatenate_4[0][0]
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, None, None, 2 1024 conv2d_13[0][0]
__________________________________________________________________________________________________
leaky_re_lu_13 (LeakyReLU) (None, None, None, 2 0 batch_normalization_13[0][0]
__________________________________________________________________________________________________
concatenate_5 (Concatenate) (None, None, None, 5 0 leaky_re_lu_10[0][0]
leaky_re_lu_13[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, None, None, 5 0 concatenate_5[0][0]
__________________________________________________________________________________________________
conv2d_14 (Conv2D) (None, None, None, 5 2359296 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, None, None, 5 2048 conv2d_14[0][0]
__________________________________________________________________________________________________
leaky_re_lu_14 (LeakyReLU) (None, None, None, 5 0 batch_normalization_14[0][0]
__________________________________________________________________________________________________
conv2d_15 (Conv2D) (None, None, None, 2 131072 leaky_re_lu_14[0][0]
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, None, None, 2 1024 conv2d_15[0][0]
__________________________________________________________________________________________________
leaky_re_lu_15 (LeakyReLU) (None, None, None, 2 0 batch_normalization_15[0][0]
__________________________________________________________________________________________________
conv2d_18 (Conv2D) (None, None, None, 1 32768 leaky_re_lu_15[0][0]
__________________________________________________________________________________________________
batch_normalization_17 (BatchNo (None, None, None, 1 512 conv2d_18[0][0]
__________________________________________________________________________________________________
leaky_re_lu_17 (LeakyReLU) (None, None, None, 1 0 batch_normalization_17[0][0]
__________________________________________________________________________________________________
up_sampling2d (UpSampling2D) (None, None, None, 1 0 leaky_re_lu_17[0][0]
__________________________________________________________________________________________________
concatenate_6 (Concatenate) (None, None, None, 3 0 up_sampling2d[0][0]
leaky_re_lu_13[0][0]
__________________________________________________________________________________________________
conv2d_16 (Conv2D) (None, None, None, 5 1179648 leaky_re_lu_15[0][0]
__________________________________________________________________________________________________
conv2d_19 (Conv2D) (None, None, None, 2 884736 concatenate_6[0][0]
__________________________________________________________________________________________________
batch_normalization_16 (BatchNo (None, None, None, 5 2048 conv2d_16[0][0]
__________________________________________________________________________________________________
batch_normalization_18 (BatchNo (None, None, None, 2 1024 conv2d_19[0][0]
__________________________________________________________________________________________________
leaky_re_lu_16 (LeakyReLU) (None, None, None, 5 0 batch_normalization_16[0][0]
__________________________________________________________________________________________________
leaky_re_lu_18 (LeakyReLU) (None, None, None, 2 0 batch_normalization_18[0][0]
__________________________________________________________________________________________________
conv2d_17 (Conv2D) (None, None, None, 2 130815 leaky_re_lu_16[0][0]
__________________________________________________________________________________________________
conv2d_20 (Conv2D) (None, None, None, 2 65535 leaky_re_lu_18[0][0]
__________________________________________________________________________________________________
input_2 (InputLayer) [(1, 2)] 0
__________________________________________________________________________________________________
yolo_eval (Lambda) ((None, 4), (None,), 0 conv2d_17[0][0]
conv2d_20[0][0]
input_2[0][0]
==================================================================================================
Total params: 6,062,814
Trainable params: 6,056,606
Non-trainable params: 6,208
__________________________________________________________________________________________________
You can check this against the layer listing printed earlier; the Yolo-Tiny-v4 layer structure is as follows:
You can also add self.yolo_model.get_weights() to the code above: this Model method dumps the convolution weights as well, which completes what Section 3 set out to do.
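A minimal sketch of that route, placed at the same spot in detect_image (my own lines, not the repo's):
all_weights = self.yolo_model.get_weights()   # list of numpy arrays for every layer, in order
print(len(all_weights), all_weights[0].shape)
w = self.yolo_model.get_layer('conv2d').get_weights()
print([a.shape for a in w])                   # the first conv should just be the 3x3x3x32 kernel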
Change mode = "dir_predict" in /predict.py and run it directly to get the following localization and classification results:
Found 12 boxes for img
b'person 0.99' 548 69 954 270
b'person 0.98' 501 915 999 1161
b'person 0.95' 525 475 859 683
b'person 0.92' 543 442 858 545
b'person 0.75' 570 359 701 408
b'person 0.74' 584 209 692 266
b'person 0.60' 564 402 709 445
b'person 0.54' 562 330 696 369
b'bicycle 0.93' 739 804 1023 1217
b'car 0.99' 587 652 765 962
b'car 0.86' 607 1 685 54
b'car 0.55' 547 594 719 719
If you do not go through the functional-API route and instead read self.yolo_model.output directly, you cannot get the intermediate-layer results. Adding the .output attribute inside /nets/CSPdarknet53_tiny.py or /nets/yolo.py does not work either: all you get back is a KerasTensor, which, as far as I could find out, cannot be converted into printable numeric values. The fix is to build a functional-API Model:
conv2d_layer_model = Model(inputs=self.yolo_model.input,outputs=self.yolo_model.get_layer(layer.name).output)
out = conv2d_layer_model.predict([image_data, input_image_shape])
The same method works for dumping the outputs of non-conv2d layers, so I won't belabor it; a small example follows.
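For example, dumping a LeakyReLU activation only requires changing the layer name passed to get_layer (layer name taken from the summary above):
leaky_layer_model = Model(inputs=self.yolo_model.input, outputs=self.yolo_model.get_layer('leaky_re_lu_5').output)
leaky_out = leaky_layer_model.predict([image_data, input_image_shape])
print(leaky_out.shape)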
You can use the following code to quantize the weight file and the output feature map file:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
from math import pow
dirpath = os.path.dirname(os.path.abspath(__file__))
weights_file = 'weights.txt' ### TODO: change this to your weights txt file
mid_res_file = 'results.txt' ### TODO: change this to your results txt file
weights_path = os.path.join(dirpath, weights_file)
mid_res_path = os.path.join(dirpath, mid_res_file)
## TODO: step 1 : read the two files into flat lists
weight_list = []
mid_res_list = []
file_weight = open(weights_path,'r')
for line in file_weight.readlines():
    line = line.strip()
    k = line.split(' ')[0]
    v = line.split(' ')[1:]   # everything after the layer name
    for v_ele in v:
        weight_list.append(float(v_ele))
file_weight.close()
file_res = open(mid_res_path,'r')
for line in file_res.readlines():
    line = line.strip()
    k = line.split(' ')[0]
    v = line.split(' ')[1:]   # everything after the layer name
    for v_ele in v:
        mid_res_list.append(float(v_ele))
file_res.close()
## TODO: step 2 : extract the largest and smallest absolute values
## this quantization ignores two's-complement encoding of negatives and does no round-to-nearest; all fractional bits are simply truncated below
weight_list_max, weight_list_min = np.max(np.abs(weight_list)), np.min(np.abs(weight_list))
res_list_max, res_list_min = np.max(np.abs(mid_res_list)), np.min(np.abs(mid_res_list))
## TODO: step 3 : bit widths
#### configurable power-of-ten scaling
weight_upper, result_upper = 2, 0 ### try e.g. 9, 7; scientific-notation style: shift the decimal point of the weights/outputs right by this many places to tune the data precision dynamically
weight_bigger_list = [int(i*pow(10,weight_upper)) for i in weight_list]
result_bigger_list = [int(j*pow(10,result_upper)) for j in mid_res_list]
weight_bigger_bit_list = [len(str(bin(i)).split('b')[1]) for i in weight_bigger_list]
result_bigger_bit_list = [len(str(bin(j)).split('b')[1]) for j in result_bigger_list]
weight_dict, result_dict = {}, {}
for weight_bitlen in weight_bigger_bit_list:
    weight_dict[weight_bitlen] = weight_dict.get(weight_bitlen, 0) + 1
for result_bitlen in result_bigger_bit_list:
    result_dict[result_bitlen] = result_dict.get(result_bitlen, 0) + 1
weight_dict_keys=sorted(weight_dict.keys())
weight_dict_vals=[weight_dict[i] for i in weight_dict_keys]
weight_v_sum = np.sum(weight_dict_vals)
result_dict_keys=sorted(result_dict.keys())
result_dict_vals=[result_dict[i] for i in result_dict_keys]
result_v_sum = np.sum(result_dict_vals)
## TODO: step 4 : draw
weight_draw_keys = weight_dict_keys[0:7]
weight_draw_vals = [i / weight_v_sum * 100 for i in weight_dict_vals[0:7]]
weight_draw_vals = [np.sum(weight_draw_vals[0:i + 1]) for i in range(len(weight_draw_vals))]   # cumulative percentage up to each bit width
result_draw_keys = result_dict_keys[0:7]
result_draw_vals = [i / result_v_sum * 100 for i in result_dict_vals[0:7]]
result_draw_vals = [np.sum(result_draw_vals[0:i + 1]) for i in range(len(result_draw_vals))]
weight_draw_keys.append(8)
result_draw_keys.append(8)
weight_draw_vals.append(weight_draw_vals[-1])
result_draw_vals.append(result_draw_vals[-1]) ## being a bit lazy here: the two lists don't reach length 8, so the curve is extended flat to 8 bits; with a different precision setting they can exceed 8, try it yourself
fig, ax1 = plt.subplots()
lns1 = ax1.plot(weight_draw_keys, weight_draw_vals, c='orangered', label='Filter', marker='o', markersize=3, markerfacecolor='white')
lns2 = ax1.plot(result_draw_keys, result_draw_vals, c='green', label='Input Feature Map', marker='o', markersize=3, markerfacecolor='white')
lns = lns1+lns2
labs = [l.get_label() for l in lns]
ax1.legend(lns, labs, loc=0)
ax1.set_title("Percentage of values - Width",size=15)
ax1.set_xlabel('Width(bit)',size=10)
ax1.set_ylabel('Percentage of values(%)',size=10)
plt.fill_between(x=result_draw_keys, y1=0, y2=result_draw_vals,facecolor='green' , alpha=0.2)
plt.fill_between(x=weight_draw_keys, y1=result_draw_vals, y2=weight_draw_vals, facecolor='orangered', alpha=0.2)
plt.show()
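If you also want the headline number behind the curve, a couple of extra lines (my own addition; put any plt.savefig call before plt.show if you want to save the figure) print the share of values that fit within 8 bits:
weight_within_8 = sum(v for k, v in weight_dict.items() if k <= 8) / weight_v_sum * 100
result_within_8 = sum(v for k, v in result_dict.items() if k <= 8) / result_v_sum * 100
print('Filter values needing <= 8 bits: %.2f%%' % weight_within_8)
print('Feature-map values needing <= 8 bits: %.2f%%' % result_within_8)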