During model conversion, the weight tensors and activation tensors are quantized from 32-bit floating point to 8-bit integers.
# Data preprocessing: scale pixel values to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0
# Build the model
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
# Compile and train
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(
    train_images, train_labels,
    epochs=5, validation_split=0.1,
)
Epoch 1/5
1688/1688 [==============================] - 8s 2ms/step - loss: 0.5397 - accuracy: 0.8512 - val_loss: 0.1348 - val_accuracy: 0.9643
Epoch 2/5
1688/1688 [==============================] - 4s 2ms/step - loss: 0.1416 - accuracy: 0.9593 - val_loss: 0.0937 - val_accuracy: 0.9738
Epoch 3/5
1688/1688 [==============================] - 4s 2ms/step - loss: 0.0920 - accuracy: 0.9720 - val_loss: 0.0759 - val_accuracy: 0.9797
Epoch 4/5
1688/1688 [==============================] - 4s 2ms/step - loss: 0.0780 - accuracy: 0.9774 - val_loss: 0.0735 - val_accuracy: 0.9805
Epoch 5/5
1688/1688 [==============================] - 4s 3ms/step - loss: 0.0620 - accuracy: 0.9820 - val_loss: 0.0651 - val_accuracy: 0.9828
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
tflite_name = "tflite_model"
open(tflite_name, "wb").write(tflite_model)
83640
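To quantize variable data (such as the input, the output, and the data in some intermediate layers), we need a RepresentativeDataset that captures the distribution of that data, such as its maximum and minimum values.
Roughly 100-500 samples can be taken from the training or validation set.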
def representative_data_gen():
    # Yield 100 training images, each as a float32 batch of shape (1, 28, 28)
    for image in train_images[0:100, :, :]:
        yield [image.reshape(-1, train_images.shape[1], train_images.shape[2]).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
tflite_model_quant = converter.convert()
# Save the converted model
FullInt_name = "quantify_Full.tflite"
open(FullInt_name, "wb").write(tflite_model_quant)
23840
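To make the role of the representative dataset more concrete, the following is a minimal sketch of how a minimum and maximum observed on calibration samples could be turned into an affine int8 mapping. It is a simplified illustration, not the converter's actual calibration code; the function names `calibrate`, `quantize`, and `dequantize` are hypothetical.

```python
def calibrate(samples, qmin=-128, qmax=127):
    """Derive an affine scale and zero point from representative float samples."""
    # Extend the range to include 0.0 so that zero maps to an exact integer.
    lo = min(0.0, min(samples))
    hi = max(0.0, max(samples))
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:  # all samples are zero; any positive scale works
        scale = 1.0
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to an integer code, clamping to the representable range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate float value of an integer code."""
    return scale * (q - zero_point)


# Hypothetical calibration samples, e.g. pixel values already scaled to [0, 1].
samples = [0.0, 0.25, 0.5, 1.0]
scale, zero_point = calibrate(samples)

for x in samples:
    q = quantize(x, scale, zero_point)
    print(x, q, dequantize(q, scale, zero_point))
```

Values outside the calibrated range are clamped, which is why the samples should cover the data the model will see at inference time.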
Run inference with the converted TF Lite models to check the result:
import numpy as np

def evaluate(interpreter_path):
    # Load the model and allocate tensors
    interpreter = tf.lite.Interpreter(model_path=interpreter_path)
    interpreter.allocate_tensors()
    # Get the input and output tensor details
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    index = input_details[0]['index']
    shape = input_details[0]['shape']
    acc_count = 0
    image_count = test_images.shape[0]
    # Classify each test image and count the correct predictions
    for i in range(image_count):
        interpreter.set_tensor(index, test_images[i].reshape(shape).astype("float32"))
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])
        label = np.argmax(output_data)
        if label == test_labels[i]:
            acc_count += 1
    print("test_images accuracy is {:.2%}".format(acc_count / image_count))

evaluate(tflite_name)
evaluate(FullInt_name)
test_images accuracy is 98.02%
test_images accuracy is 97.94%
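The size drops from 83,640 bytes to 23,840 bytes, again a reduction approaching 4x (about 3.5x here), while accuracy falls by 0.08 percentage points.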
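The figures can be checked with a few lines of arithmetic, using only the sizes and accuracies printed above:

```python
float_size = 83640   # bytes, float32 TF Lite model
int8_size = 23840    # bytes, quantized TF Lite model
float_acc = 98.02    # percent
int8_acc = 97.94     # percent

ratio = float_size / int8_size
acc_drop = float_acc - int8_acc

print(f"size ratio: {ratio:.2f}x")
print(f"accuracy drop: {acc_drop:.2f} percentage points")
```

The ratio comes out somewhat below the ideal 4x. One plausible explanation, stated here as an assumption rather than something measured, is that part of the file (graph structure and metadata, for example) does not shrink with the weights.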
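At this point the weights and activations in the model have been quantized to 8 bits, but to preserve compatibility the input and output tensors remain float32 with this form of quantization.
If TensorFlow Lite has no quantized implementation of some operation, the quantization process may leave that operation in floating-point format, which is where the name float fallback quantization comes from.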
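Integer-only quantization, by contrast, quantizes every tensor to 8 bits; if that cannot be done, an exception is raised.
Implementing it is simple: only a few lines need to be added to the code from section 1.2.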
def representative_data_gen():
    for image in train_images[0:100, :, :]:
        yield [image.reshape(-1, train_images.shape[1], train_images.shape[2]).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
#--------Newly added code----------------------------------------------------
# Raise an exception if an operation cannot be quantized
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set the input and output tensors to uint8
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
#----------------------------------------------------------------------------
tflite_model_quant = converter.convert()
# Save the converted model
FullInt_name = "quantify_Full.tflite"
open(FullInt_name, "wb").write(tflite_model_quant)
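You can test the effect of this method yourself by following the steps in section 1.2. Note that the input tensor is now uint8, so the images must be converted to uint8 before being passed to the interpreter instead of being fed as float32.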
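When adapting the evaluation to a uint8 model, the float pixels have to be mapped to integer codes with the input tensor's scale and zero point, which `tf.lite.Interpreter` reports as `input_details[0]['quantization']`. The sketch below shows only that mapping in plain Python; the helper names and the parameter values are hypothetical stand-ins for the ones a real model would report.

```python
def quantize_input(pixels, scale, zero_point):
    """Map float values to uint8 codes: q = round(x / scale) + zero_point."""
    return [max(0, min(255, int(round(x / scale)) + zero_point)) for x in pixels]


def dequantize_output(codes, scale, zero_point):
    """Map uint8 codes back to floats: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in codes]


# Assumed values for illustration; read the real ones from
# input_details[0]['quantization'] and output_details[0]['quantization'].
in_scale, in_zero_point = 1.0 / 255.0, 0
out_scale, out_zero_point = 1.0 / 256.0, 0

pixels = [0.0, 0.5, 1.0]
q_pixels = quantize_input(pixels, in_scale, in_zero_point)
scores = dequantize_output([0, 128, 255], out_scale, out_zero_point)
print(q_pixels)
print(scores)
```

Because the output scale is positive, taking the argmax of the raw uint8 output gives the same label as taking it after dequantization, so for an accuracy check only the input side needs converting.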
Switching to float16 quantization is straightforward: only one line needs to be added to the code from section 1.2.
def representative_data_gen():
    for image in train_images[0:100, :, :]:
        yield [image.reshape(-1, train_images.shape[1], train_images.shape[2]).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
#--------Added code----------------------------------------------------------
converter.target_spec.supported_types = [tf.float16]
#----------------------------------------------------------------------------
tflite_model_quant = converter.convert()
# Save the converted model
FullInt_name = "quantify_Full.tflite"
open(FullInt_name, "wb").write(tflite_model_quant)
The results compare as follows:
83640
43488
test_images accuracy is 98.02%
test_images accuracy is 98.02%
The model shrinks to roughly half its original size (83,640 to 43,488 bytes), and accuracy does not drop.
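Both observations are consistent with how float16 storage behaves: each value takes 2 bytes instead of 4, and the rounding error per value is small. The sketch below illustrates this with Python's `struct` module, whose `'e'` format code is IEEE 754 half precision. The weight values are made up for illustration and are not taken from the trained model.

```python
import struct


def to_float16_and_back(x):
    """Round a Python float to half precision and return it as a float."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


# Hypothetical weight values, not taken from the model above.
weights = [0.1, -0.7312, 0.0042]
roundtrip = [to_float16_and_back(w) for w in weights]
rel_errors = [abs(r - w) / abs(w) for w, r in zip(weights, roundtrip)]

bytes_fp32 = struct.calcsize('<f') * len(weights)
bytes_fp16 = struct.calcsize('<e') * len(weights)

for w, r, e in zip(weights, roundtrip, rel_errors):
    print(f"{w!r} -> {r!r} (relative error {e:.2e})")
print(bytes_fp32, bytes_fp16)
```

For values in float16's normal range the relative rounding error stays below 2^-11, about 0.05%.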
def representative_data_gen():
    for image in train_images[0:100, :, :]:
        yield [image.reshape(-1, train_images.shape[1], train_images.shape[2]).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
#--------Added code----------------------------------------------------------
converter.target_spec.supported_ops = [tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8]
#----------------------------------------------------------------------------
tflite_model_quant = converter.convert()
# Save the converted model
FullInt_name = "quantify_Full.tflite"
open(FullInt_name, "wb").write(tflite_model_quant)
84684
25008
test_images accuracy is 98.02%
test_images accuracy is 98.02%
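Note: this method is still experimental. If an error reports that the attribute EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8 does not exist, update your TensorFlow version; the environment used for this experiment was tensorflow == 2.4.1.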
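As the name of the option indicates, this mode keeps weights at 8 bits but stores activations with 16 bits. A rough way to see what the extra bits buy is to compare the width of one quantization step over the same value range. The range below is a hypothetical example, and the uniform-step formula is a simplification rather than TensorFlow Lite's exact scheme.

```python
def step_size(range_min, range_max, bits):
    """Width of one step when a range is divided into 2**bits uniform levels."""
    return (range_max - range_min) / (2 ** bits - 1)


# Hypothetical activation range, chosen only for illustration.
act_min, act_max = 0.0, 6.0

step8 = step_size(act_min, act_max, 8)
step16 = step_size(act_min, act_max, 16)

print(f"8-bit step:  {step8:.6f}")
print(f"16-bit step: {step16:.8f}")
print(f"ratio: {step8 / step16:.0f}")
```

Under this simplification the 16-bit steps are 257 times finer than the 8-bit ones.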
The code flow differs little from the procedures above; for details, see quantization-aware training.