TensorFlow Model Quantization (2): Full Integer Quantization, Float16 Quantization, and Quantization-Aware Training

Contents

  • 1 Full integer quantization
    • 1.1 Train a Keras model and convert it to TFLite format
    • 1.2 Float fallback quantization
    • 1.3 Integer-only quantization
    • 1.4 Float16 quantization
    • 1.5 8-bit weights with 16-bit activations (integer quantization with int16 activations)
  • 2 Quantization-aware training
  • Chapter navigation
    • Previous: [TensorFlow Model Quantization (1): Quantization Methods and Dynamic Range Quantization](https://blog.csdn.net/weixin_43490422/article/details/114961890)
    • Next: to be continued

1 Full integer quantization

During model conversion, both the weight tensors and the activation tensors are quantized from 32-bit floating point to 8-bit integers.
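For reference, this mapping is usually written as the affine quantization scheme below. The symbols are introduced here for illustration and do not appear in the original post: r is the real (float32) value, q its 8-bit code, S a positive floating-point scale, and Z an integer zero point.

```latex
q = \operatorname{clip}\!\left(\operatorname{round}\!\left(\frac{r}{S}\right) + Z,\; q_{\min},\; q_{\max}\right),
\qquad
r \approx S\,(q - Z)
```

For int8 the limits are q_min = -128 and q_max = 127; for uint8 they are 0 and 255.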

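1.1 Train a Keras model and convert it to TFLite format

import tensorflow as tf
from tensorflow import keras

# Load the data (assumed here to be MNIST, which matches the 28x28 inputs and 10 classes)
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()

# Preprocess: scale pixel values to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0

# Build the model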
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10, activation=tf.nn.softmax)
])

# Compile and train
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(
  train_images, train_labels,
  epochs=5, validation_split=0.1,
)
Epoch 1/5
1688/1688 [==============================] - 8s 2ms/step - loss: 0.5397 - accuracy: 0.8512 - val_loss: 0.1348 - val_accuracy: 0.9643
Epoch 2/5
1688/1688 [==============================] - 4s 2ms/step - loss: 0.1416 - accuracy: 0.9593 - val_loss: 0.0937 - val_accuracy: 0.9738
Epoch 3/5
1688/1688 [==============================] - 4s 2ms/step - loss: 0.0920 - accuracy: 0.9720 - val_loss: 0.0759 - val_accuracy: 0.9797
Epoch 4/5
1688/1688 [==============================] - 4s 2ms/step - loss: 0.0780 - accuracy: 0.9774 - val_loss: 0.0735 - val_accuracy: 0.9805
Epoch 5/5
1688/1688 [==============================] - 4s 3ms/step - loss: 0.0620 - accuracy: 0.9820 - val_loss: 0.0651 - val_accuracy: 0.9828

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
tflite_name = "tflite_model"
open(tflite_name, "wb").write(tflite_model)
83640

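1.2 Float fallback quantization

To quantize variable data (such as the model input, the output, and the intermediate activations between layers), the converter needs a RepresentativeDataset that captures the distribution of that data, for example its minimum and maximum values.
Around 100 to 500 samples taken from the training or validation set are enough.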

def representative_data_gen():
    # Yield 100 training images one at a time, each with a leading batch dimension of 1
    for image in train_images[0:100, :, :]:
        yield [image.reshape(-1, train_images.shape[1], train_images.shape[2]).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen

tflite_model_quant = converter.convert()
# Save the converted model
FullInt_name = "quantify_Full.tflite"
open(FullInt_name, "wb").write(tflite_model_quant)
23840
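
To make the role of the representative dataset more concrete, here is a minimal pure-Python sketch of range calibration: it tracks the minimum and maximum seen across the samples and derives a scale and zero point for int8. The function name `calibrate_int8` and the toy samples are hypothetical; the real converter does this internally for each activation tensor and may differ in its details.

```python
def calibrate_int8(samples, qmin=-128, qmax=127):
    """Derive (scale, zero_point) from the value range seen in `samples`.

    `samples` is an iterable of flat lists of floats, a hypothetical
    stand-in for the arrays yielded by a representative dataset generator.
    """
    lo, hi = 0.0, 0.0  # start at 0 so the range always contains 0.0
    for sample in samples:
        lo = min(lo, min(sample))
        hi = max(hi, max(sample))
    if hi == lo:
        return 1.0, 0  # degenerate case: every value was zero
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    # Keep the zero point inside the representable integer range
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


# Toy "activations" in [0, 1], like the normalized pixel values above
samples = [[0.0, 0.25, 1.0], [0.1, 0.9, 0.5]]
scale, zero_point = calibrate_int8(samples)
print(scale, zero_point)
```

With values spanning [0, 1], the real value 0.0 lands on the lowest int8 code and 1.0 on the highest, so the full 8-bit range is used.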

Run inference with the converted TFLite models to check the results:

import numpy as np

def evaluate(interpreter_path):
    # Load the model and allocate tensors
    interpreter = tf.lite.Interpreter(model_path=interpreter_path)
    interpreter.allocate_tensors()

    # Get the input and output tensor details
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    index = input_details[0]['index']
    shape = input_details[0]['shape']
    acc_count = 0
    image_count = test_images.shape[0]
    for i in range(image_count):
        # Feed one test image and run the model
        interpreter.set_tensor(index, test_images[i].reshape(shape).astype("float32"))
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])
        # The predicted class is the index of the largest output
        label = np.argmax(output_data)
        if label == test_labels[i]:
            acc_count += 1
    print("test_images accuracy is {:.2%}".format(acc_count / image_count))

evaluate(tflite_name)
evaluate(FullInt_name)
test_images accuracy is 98.02%
test_images accuracy is 97.94%

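The model size drops from 83,640 bytes to 23,840 bytes, a reduction of roughly 3.5x, while accuracy falls by 0.08 percentage points (98.02% to 97.94%).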
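
The figures can be checked with a few lines of arithmetic. The helper name `size_report` is hypothetical; the byte counts and accuracies are the ones reported in this section.

```python
def size_report(baseline_bytes, quantized_bytes):
    """Return (compression ratio, fraction of bytes saved)."""
    ratio = baseline_bytes / quantized_bytes
    saved = 1.0 - quantized_bytes / baseline_bytes
    return ratio, saved


ratio, saved = size_report(83640, 23840)
print("compression: {:.2f}x, saved: {:.1%}".format(ratio, saved))

# Accuracy change, expressed in percentage points
drop = 98.02 - 97.94
print("accuracy drop: {:.2f} points".format(drop))
```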

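At this point the weights and activations in the model are quantized to 8 bits, but to preserve compatibility the input and output tensors are still float32 in this mode.
If TensorFlow Lite has no quantized implementation for an operation, the converter may leave that operation in floating point, which is where the name float fallback quantization comes from.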

1.3 Integer-only quantization

This method quantizes every tensor to 8 bits; if some operation cannot be quantized, the conversion raises an error.
It takes only a few extra lines on top of the code in 1.2.

def representative_data_gen():
    for image in train_images[0:100, :, :]:
        yield [image.reshape(-1, train_images.shape[1], train_images.shape[2]).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen

#--------Newly added code----------------------------------------------------
# Raise an error if an operation has no quantized implementation
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set the input and output tensors to uint8
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
#----------------------------------------------------------------------------

tflite_model_quant = converter.convert()
# Save the converted model
FullInt_name = "quantify_Full.tflite"
open(FullInt_name, "wb").write(tflite_model_quant)

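You can test this model yourself by following the steps in 1.2. Note that its input tensor is now uint8, so the float32 images must be quantized before they are passed to set_tensor.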
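
Feeding a uint8-input model means converting the float pixel values with the scale and zero point stored in the model. The sketch below shows one way to do this, assuming the input detail dictionary has the `quantization` (scale, zero point) and `dtype` entries that `interpreter.get_input_details()` returns. The helper name `quantize_input` and the stand-in dictionary are hypothetical.

```python
import numpy as np


def quantize_input(image, input_detail):
    """Convert a float image to the dtype the TFLite input tensor expects."""
    scale, zero_point = input_detail["quantization"]
    dtype = input_detail["dtype"]
    if scale == 0:
        # No quantization parameters: the model takes float input as-is
        return image.astype(dtype)
    info = np.iinfo(dtype)
    quantized = np.round(image / scale + zero_point)
    # Clip to the integer range before casting to avoid wrap-around
    return np.clip(quantized, info.min, info.max).astype(dtype)


# Stand-in for interpreter.get_input_details()[0] of a uint8-input model
# (hypothetical values: scale 1/255 and zero point 0 map [0, 1] onto [0, 255])
fake_detail = {"quantization": (1.0 / 255.0, 0), "dtype": np.uint8}
pixels = np.array([0.0, 0.2, 1.0], dtype=np.float32)
print(quantize_input(pixels, fake_detail))
```

In `evaluate`, a call such as `quantize_input(test_images[i].reshape(shape), input_details[0])` would take the place of the `.astype("float32")` conversion. The uint8 output needs no dequantization before `np.argmax`, because the mapping back to real values is monotonic.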

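1.4 Float16 quantization

Switching to float16 quantization is simple: add one line to the code from 1.2. Only the weights are converted to float16, so the representative dataset is not strictly required in this mode.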

def representative_data_gen():
    for image in train_images[0:100, :, :]:
        yield [image.reshape(-1, train_images.shape[1], train_images.shape[2]).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen

#--------Added code----------------------------------------------------------
# Store the weights as float16
converter.target_spec.supported_types = [tf.float16]
#----------------------------------------------------------------------------

tflite_model_quant = converter.convert()
# Save the converted model
FullInt_name = "quantify_Full.tflite"
open(FullInt_name, "wb").write(tflite_model_quant)

The results compare as follows (original model first, float16 model second):

83640
43488
test_images accuracy is 98.02%
test_images accuracy is 98.02%

The model shrinks to roughly half its original size (83,640 to 43,488 bytes), with no drop in accuracy.
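
The halving follows from the storage format: each weight takes 2 bytes instead of 4. The sketch below illustrates this with Python's standard `struct` module, whose `e` format is IEEE 754 half precision. The weight values are made up for illustration.

```python
import struct

weights = [0.1234567, -1.5, 0.00042, 3.1415926]  # hypothetical float weights

# Pack the same values as float32 ("f") and as float16 ("e")
packed32 = struct.pack("<{}f".format(len(weights)), *weights)
packed16 = struct.pack("<{}e".format(len(weights)), *weights)
print(len(packed32), len(packed16))  # 16 bytes vs 8 bytes

# Round-trip through float16 to see how much precision is lost
restored = struct.unpack("<{}e".format(len(weights)), packed16)
max_rel_err = max(abs(r - w) / abs(w) for r, w in zip(restored, weights))
print(max_rel_err)
```

The relative error stays small because float16 still keeps about three significant decimal digits, which is consistent with the unchanged accuracy reported above.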

1.5 8-bit weights with 16-bit activations (integer quantization with int16 activations)

def representative_data_gen():
    for image in train_images[0:100, :, :]:
        yield [image.reshape(-1, train_images.shape[1], train_images.shape[2]).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen

#--------Added code----------------------------------------------------------
# Quantize activations to int16 and weights to int8
converter.target_spec.supported_ops = [tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8]
#----------------------------------------------------------------------------

tflite_model_quant = converter.convert()
# Save the converted model
FullInt_name = "quantify_Full.tflite"
open(FullInt_name, "wb").write(tflite_model_quant)
84684
25008
test_images accuracy is 98.02%
test_images accuracy is 98.02%

Note: this mode is still experimental. If you get an error saying that EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8 does not exist, upgrade TensorFlow; the results here were obtained with tensorflow == 2.4.1.
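
To give a feel for why 16-bit activations can help, the following sketch compares the round-trip error of the same values quantized with 8 and with 16 bits. It uses a simple symmetric scheme (zero point fixed at 0) written from scratch; the function name and sample values are hypothetical, and the actual TFLite kernels are more involved.

```python
def roundtrip_error(values, bits):
    """Max absolute error after symmetric quantization to `bits` bits and back."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 32767 for 16 bits
    scale = max(abs(v) for v in values) / qmax
    worst = 0.0
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))  # quantize and clamp
        worst = max(worst, abs(q * scale - v))       # dequantize and compare
    return worst


activations = [0.013, -0.72, 2.5, 1.337, -3.9]  # made-up activation values
err8 = roundtrip_error(activations, 8)
err16 = roundtrip_error(activations, 16)
print(err8, err16)
```

The 16-bit step size is about 256 times finer than the 8-bit one, so small activations that would collapse to zero at 8 bits keep their value.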

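2 Quantization-aware training

The code flow is not very different from the steps above; for details, see the quantization-aware training guide.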

Chapter navigation

Previous: TensorFlow Model Quantization (1): Quantization Methods and Dynamic Range Quantization

Next: to be continued
