Three quantization approaches are commonly used:
tf.lite.TFLiteConverter
https://tensorflow.google.cn/lite/api_docs/python/tf/lite/TFLiteConverter
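This is post-training quantization. It can quantize directly from a Keras model object or from a saved model file (SavedModel).
Quantization can cover weights only; weights and activations; or weights, activations, and input/output tensors.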
tensorflow_model_optimization.quantization
https://tensorflow.google.cn/model_optimization/guide/quantization/training_comprehensive_guide
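This is quantization-aware training. Post-training quantization is easier to use, but quantization-aware training usually gives better model accuracy.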
tf.quantization
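This applies fake quantization to parameters: the values are quantized to, for example, 8-bit levels, but the parameter dtype remains float.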
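To make "quantized values, float dtype" concrete, here is a minimal pure-Python sketch of fake quantization. It is an illustration only, not the `tf.quantization` implementation: the function name `fake_quantize` and the range [-1, 1] are assumptions, and the range nudging that TensorFlow applies so that zero is exactly representable is omitted.

```python
def fake_quantize(x, min_val, max_val, num_bits=8):
    """Snap x to one of 2**num_bits evenly spaced levels in [min_val, max_val].

    The result is still a Python float: only the set of representable
    values shrinks, not the storage type.
    """
    levels = 2 ** num_bits - 1               # number of steps between min and max
    scale = (max_val - min_val) / levels     # width of one quantization step
    clamped = min(max(x, min_val), max_val)  # values outside the range saturate
    q = round((clamped - min_val) / scale)   # integer level index in [0, levels]
    return min_val + q * scale               # map the level back to a float


values = [-1.5, -0.3, 0.0, 0.123456, 2.0]
quantized = [fake_quantize(v, -1.0, 1.0) for v in values]
print(quantized)
```

Each output stays within half a step of its (clamped) input, which is the rounding error a model sees during quantization-aware training.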
TensorFlow 2.x official documentation:
tf.lite: https://tensorflow.google.cn/lite/api_docs/python/tf/lite
tf.lite.TFLiteConverter: https://tensorflow.google.cn/lite/api_docs/python/tf/lite/TFLiteConverter
quantization aware training: https://tensorflow.google.cn/model_optimization/guide/quantization/training
post-training quantization: https://tensorflow.google.cn/lite/performance/post_training_quantization
post-training integer quantization: https://tensorflow.google.cn/lite/performance/post_training_integer_quant
Note: all URLs here can be opened directly from mainland China (they use tensorflow.google.cn rather than tensorflow.org).
ref: https://zhuanlan.zhihu.com/p/66346329
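TFLite is Google's lightweight inference library, aimed mainly at mobile devices. The typical workflow is to convert a pre-trained model into a TFLite model file and deploy that file on the device. The source model can be a TensorFlow SavedModel or frozen model, or a Keras model.
TFLite is a toolkit for deploying deep learning models on mobile and embedded devices. Through three steps (conversion, deployment, and optimization), a trained TF model runs faster and uses less memory and GPU memory.
As the figure shows, TFLite consists mainly of the Converter (left) and the Interpreter (right). The Converter converts a trained TensorFlow model and writes it out as a .tflite file (FlatBuffer format). During conversion it also optimizes the network, for example by quantizing it. The Interpreter deploys the .tflite file on mobile devices, embedded devices (embedded Linux) and microcontrollers, runs inference efficiently, and exposes APIs for Python, Objective-C, Swift, Java and other languages. In short, the Converter packages and optimizes the model, and the Interpreter runs inference efficiently and conveniently.
The official site lists three main conversion methods: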
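Because the Converter output is a FlatBuffer, a quick sanity check on a generated file is to look for the TFLite file identifier. The sketch below assumes the identifier `TFL3` stored at byte offsets 4 to 7, which is what current TFLite files carry; the helper name `looks_like_tflite` is hypothetical, and the check says nothing about whether the model itself is valid.

```python
def looks_like_tflite(data: bytes) -> bool:
    """Return True if data carries the TFLite FlatBuffer file identifier.

    A FlatBuffer starts with a 4-byte root offset, followed by an
    optional 4-byte file identifier ("TFL3" for TFLite models).
    """
    return len(data) >= 8 and data[4:8] == b"TFL3"


# Synthetic headers for illustration; real use would read the file bytes:
#   with open("model.tflite", "rb") as f:
#       ok = looks_like_tflite(f.read(8))
fake_tflite_header = b"\x1c\x00\x00\x00TFL3" + b"\x00" * 8
not_a_model = b"<!DOCTYPE html><html>"
print(looks_like_tflite(fake_tflite_header), looks_like_tflite(not_a_model))
```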
# Converting a SavedModel to a TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()
# Converting a tf.Keras model to a TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
# Converting ConcreteFunctions to a TensorFlow Lite model.
converter = tf.lite.TFLiteConverter.from_concrete_functions([func])
tflite_model = converter.convert()
Suppose a Keras model already exists, for example DS-CNN (ref: https://blog.csdn.net/u010637291/article/details/108257312)
ML-KWS-for-MCU on GitHub: https://github.com/ARM-software/ML-KWS-for-MCU
## Convert to TFLite without quantization
## Approach 1: convert directly from the in-memory Keras model
models.convert_from_unquant_keras_model_to_tflite(model, './saved_model/tflite/converted_from_unquant_keras_model.tflite')
def convert_from_unquant_keras_model_to_tflite(model, filename_tflite):
    '''
    Convert an unquantized tf.keras model to a .tflite file (tested OK).
    :param model: trained tf.keras model
    :param filename_tflite: path of the .tflite file to write
    :return: None
    '''
    # Converting a tf.Keras model to a TensorFlow Lite model.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    # Weight quantization is left disabled: enabling it caused an error
    # when the resulting model was later passed to vela.
    # converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()
    with open(filename_tflite, "wb") as f:
        f.write(tflite_model)
## Approach 2: save the Keras model first, then convert the saved model
model.save('./saved_model/saved_keras_model')
models.convert_from_unquant_saved_model_to_tflite('./saved_model/saved_keras_model', './saved_model/tflite/converted_from_unquant_saved_model.tflite')
For how to save models, see: https://blog.csdn.net/u010637291/article/details/107357308
def convert_from_unquant_saved_model_to_tflite(saved_model_dir, filename_tflite):
    '''
    Convert an unquantized SavedModel directory to a .tflite file (tested OK).
    :param saved_model_dir: directory containing the SavedModel
    :param filename_tflite: path of the .tflite file to write
    :return: None
    '''
    if saved_model_dir != '':
        # from_saved_model only supports the SavedModel format (.pb/.pbtxt)
        converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
        # Weight quantization did not succeed here:
        # DEFAULT/OPTIMIZE_FOR_LATENCY raised an error.
        # converter.optimizations = [tf.lite.Optimize.DEFAULT]
        tflite_model = converter.convert()
        with open(filename_tflite, "wb") as f:
            f.write(tflite_model)
    else:
        print('saved_model_dir is empty, please specify it!')
Note: the TFLite files generated above are all unquantized. In practice you may need to quantize the model and then convert it to TFLite.
Model quantization is usually divided into quantization-aware training and post-training quantization:
Quantization aware training: https://tensorflow.google.cn/model_optimization/guide/quantization/training
Post-training quantization: https://tensorflow.google.cn/lite/performance/post_training_quantization
ref: https://tensorflow.google.cn/model_optimization/guide/quantization/training_comprehensive_guide
Install the packages:
pip uninstall -y tensorflow
pip install -q tf-nightly
pip install -q tensorflow-model-optimization
Suppose there is a Keras model:
import tensorflow as tf
from tensorflow import keras
# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Normalize the input image so that each pixel value is between 0 and 1.
train_images = train_images / 255.0
test_images = test_images / 255.0
# Define the model architecture.
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10)
])
# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(
    train_images,
    train_labels,
    epochs=1,
    validation_split=0.1,
)
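Quantization is applied as follows:
import tensorflow_model_optimization as tfmot
quantize_model = tfmot.quantization.keras.quantize_model
# q_aware stands for quantization aware.
q_aware_model = quantize_model(model)
# `quantize_model` requires a recompile.
q_aware_model.compile(optimizer='adam',
                      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                      metrics=['accuracy'])
q_aware_model.summary()
In practice, models that contain custom layers are not supported by `quantize_model` as-is, and I did not find a working solution. (The comprehensive guide linked above has a section on quantizing custom Keras layers with `quantize_annotate_layer` and a custom `QuantizeConfig`.)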
Overview of post-training quantization:
For an already trained model, you can quantize only the weights (i.e. the fixed parameters) by using dynamic range quantization:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
# This line
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model_quant = converter.convert()
Compared with converting an unquantized model to TFLite, the only addition is the line converter.optimizations = [tf.lite.Optimize.DEFAULT].
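What weight-only quantization does to a tensor can be sketched in a few lines of pure Python. This is a simplified illustration, assuming symmetric per-tensor int8 quantization with a single scale; the function names and sample weights are hypothetical, and the real converter may choose per-axis scales and skip small tensors.

```python
def quantize_weights_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # one float per tensor
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [scale * v for v in q]


weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_weights_int8(weights)
restored = dequantize(q, scale)
print(q, scale)
print(restored)
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half a step (`scale / 2`) per weight.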
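To also quantize activations (variable data), provide a representative dataset so the converter can calibrate their value ranges:
def representative_data_gen():
    for input_value in tf.data.Dataset.from_tensor_slices(train_images).batch(1).take(100):
        # Model has only one input so each data point has one element.
        # train_images / 255.0 is float64, so cast to the float32 the model expects.
        yield [tf.cast(input_value, tf.float32)]
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
tflite_model_quant = converter.convert()
That is, you additionally set converter.representative_dataset = representative_data_gen.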
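The role of the representative dataset can be illustrated with a small calibration sketch: observe the value range over sample batches, then derive an int8 scale and zero point from it. This is a simplified min/max scheme under stated assumptions (plain lists instead of tensors, hypothetical function names and sample values), not the converter's actual calibration code.

```python
def calibrate_range(batches):
    """Track the smallest and largest value seen across all batches."""
    lo, hi = 0.0, 0.0  # the range always includes zero
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi


def affine_params(lo, hi, qmin=-128, qmax=127):
    """Map the float range [lo, hi] onto the integer range [qmin, qmax]."""
    if hi == lo:
        return 1.0, 0  # degenerate range: any scale works
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)  # integer that represents 0.0
    return scale, zero_point


representative_batches = [[0.0, 0.25, 0.9], [0.1, 1.0, 0.4]]
lo, hi = calibrate_range(representative_batches)
scale, zero_point = affine_params(lo, hi)
print(lo, hi, scale, zero_point)
```

If the samples do not cover the values seen at inference time, the calibrated range is too narrow and activations saturate, which is why the samples should resemble real inputs.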
To additionally quantize the input and output tensors, so that the model is integer-only:
def representative_data_gen():
    for input_value in tf.data.Dataset.from_tensor_slices(train_images).batch(1).take(100):
        # train_images / 255.0 is float64, so cast to the float32 the model expects.
        yield [tf.cast(input_value, tf.float32)]
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Ensure that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set the input and output tensors to uint8 (APIs added in r2.3)
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model_quant = converter.convert()
That is, on top of the previous settings you also set:
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8],
converter.inference_input_type = tf.uint8,
converter.inference_output_type = tf.uint8.
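Once inputs and outputs are uint8, the caller has to convert between float values and integers. A minimal sketch of that conversion follows. The scale and zero point used here are hypothetical; for a real model they can be read from the `"quantization"` entry of `tf.lite.Interpreter.get_input_details()` and `get_output_details()`. The helper names are assumptions as well.

```python
def quantize_input(x, scale, zero_point, qmin=0, qmax=255):
    """Convert float inputs to uint8 values: q = round(x / scale) + zero_point."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in x]


def dequantize_output(q, scale, zero_point):
    """Convert uint8 outputs back to floats: x = scale * (q - zero_point)."""
    return [scale * (v - zero_point) for v in q]


# Hypothetical parameters for an input normalized to [0, 1].
scale, zero_point = 1.0 / 255, 0
pixels = [0.0, 0.2, 1.0]
q = quantize_input(pixels, scale, zero_point)
print(q)
print(dequantize_output(q, scale, zero_point))
```

With these particular parameters the quantized input is simply the original 0 to 255 pixel value, so raw image bytes could be fed to such a model directly.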
When converting a trained Keras model to a TFLite model with tf.lite.TFLiteConverter, you can also quantize it:
convert_from_quant_keras_model_to_tflite(model, rep_dataset, './saved_model/tflite/converted_from_quant_keras_model.tflite')
def convert_from_quant_keras_model_to_tflite(model, rep_dataset, filename_tflite):
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    # Quantize weights: set the optimizations flag.
    # OPTIMIZE_FOR_SIZE is deprecated and behaves the same as tf.lite.Optimize.DEFAULT.
    converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
    # Quantize variable data (activations): calibrate with a representative dataset
    def representative_data_gen():
        for input_value in rep_dataset.take(100):
            yield [input_value]
    converter.representative_dataset = representative_data_gen
    # Quantize input/output tensors
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    tflite_quant_model = converter.convert()
    # Write to tflite file
    with open(filename_tflite, "wb") as f:
        f.write(tflite_quant_model)
The same quantization can also start from a saved model:
model.save("./saved_model/saved_keras_model")
convert_from_quant_saved_model_to_tflite('./saved_model/saved_keras_model', rep_dataset, './saved_model/tflite/converted_from_quant_saved_model.tflite')
def convert_from_quant_saved_model_to_tflite(saved_model_dir, rep_dataset, filename_tflite):
    '''
    Quantize a SavedModel and write it as a .tflite file (tested OK).
    :param saved_model_dir: directory containing the SavedModel
    :param rep_dataset: tf.data.Dataset of representative inputs
    :param filename_tflite: path of the .tflite file to write
    :return: None
    '''
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # Quantize weights: set the optimizations flag.
    # OPTIMIZE_FOR_SIZE is deprecated and behaves the same as tf.lite.Optimize.DEFAULT.
    converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
    # Quantize variable data (activations): calibrate with a representative dataset
    def representative_data_gen():
        for input_value in rep_dataset.take(100):
            yield [input_value]
    converter.representative_dataset = representative_data_gen
    # Quantize input/output tensors
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    tflite_quant_model = converter.convert()
    # Write to tflite file
    with open(filename_tflite, "wb") as f:
        f.write(tflite_quant_model)
That is, first save the model, then load and quantize the saved model with tf.lite.TFLiteConverter.from_saved_model, and generate the TFLite file.
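Netron makes it easy to inspect many kinds of TensorFlow models, including TensorFlow Lite models:
Homepage: https://www.electronjs.org/apps/netron
GitHub: https://github.com/lutzroeder/netron
Download: https://github.com/lutzroeder/netron/releases/tag/v4.5.1
Ubuntu:
snap install netron
I also tried the .AppImage, but it did not install successfully. The download command was:
wget https://github.com/lutzroeder/netron/releases/tag/v4.5.1 -o Netron-4.5.1.AppImage
Note that this URL is the release page rather than the AppImage file itself, and that wget's -o option writes the log to the named file (-O sets the output file). The resulting file is therefore not an executable, which probably explains the failure.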
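When a downloaded AppImage refuses to run, one way to check what was actually saved is to look at the first bytes of the file. The sketch below assumes that an AppImage is an ELF executable (starting with `\x7fELF`) and that a mistakenly saved web page starts with an HTML tag; the function name `classify_download` is hypothetical.

```python
def classify_download(data: bytes) -> str:
    """Classify a downloaded file by its leading bytes."""
    if data.startswith(b"\x7fELF"):
        return "elf"       # AppImage files are ELF executables
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"      # a web page was saved instead of the binary
    return "unknown"


# Synthetic samples for illustration; real use would read the file:
#   with open("Netron-4.5.1.AppImage", "rb") as f:
#       kind = classify_download(f.read(512))
print(classify_download(b"\x7fELF\x02\x01\x01"))
print(classify_download(b"\n<!DOCTYPE html><html>"))
```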
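On Ubuntu, create a new virtual environment and run:
netron
Then select the model file. Example: viewing a TensorFlow Lite model (.tflite):
You can inspect the overall model architecture and the parameters of each layer. In the figure above, for example, the convolution layer shows its padding mode, stride, filter/bias parameters, and the data and dtype of its input and output tensors (float32 indicates an unquantized model).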
Homepage: https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-vela/+/refs/heads/master
Vela option reference: https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-vela/+/refs/heads/master/OPTIONS.md
Requirements:
TensorFlow: Vela supports TensorFlow 2.1.0.
OS: Vela runs on the Linux operating system.
Others: Python >= 3.6, pip3, GNU toolchain (GCC, Binutils and libraries) or alternative C compiler/linker toolchain.
Installation on Ubuntu:
# Create a virtual environment (Python 3.6 is given by path because it is not the default python3)
virtualenv -p /usr/local/bin/python3.6 ~/venv/vela
# Activate the virtual environment
source ~/venv/vela/bin/activate
# Install ethos-u-vela (inside the activated environment, `python` is the Python 3.6 interpreter)
python -m pip install ethos-u-vela
# In my tests, cloning the git repository and installing from source did not succeed.
vela path/to/network.tflite
Comparison of a TFLite model before and after running Vela:
1) Original TFLite model (viewed in Netron):
2) After optimization by Vela (viewed in Netron):