TensorFlow量化训练

前段时间研究了tflite和量化相关的操作, 经测试量化尤其在具有专门DSP加速的硬件上(比如MTK8183)有着很好的加速效果,大约3X的提升;

tensorflow提供了tflite转化工具toco,使用命令大致如下:

bazel-bin/tensorflow/contrib/lite/toco/toco --input_file=mobilenet_v1_1.0_128_frozen.pb \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --output_file=/tmp/mobilenet_v1_1.0_128.tflite \
  --inference_type=FLOAT \
  --input_data_types=FLOAT \
  --input_arrays=input \
  --output_arrays=MobilenetV1/Predictions/Reshape_1 \
  --input_shapes=1,128,128,3 \
  --logtostderr

这里面遇到的一个坑是,发现用pip直接安装的tensorflow, 运行这个命令会包error:, 因为option的名字不一样,那是用python重新封装的,而从源码编译toco(用官方的bazel工具)是C++的;

上述命令只是转化为tflite的格式,如果需要量化,只需要改成这样:

bazel-bin/tensorflow/contrib/lite/toco/toco --input_file=mobilenet_v1_1.0_128_frozen.pb \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --output_file=/tmp/mobilenet_v1_1.0_128.tflite \
  --inference_type=QUANTIZED_UINT8 \
  --input_data_types=QUANTIZED_UINT8 \
  --input_arrays=input \
  --output_arrays=MobilenetV1/Predictions/Reshape_1 \
  --input_shapes=1,128,128,3 \
  --logtostderr

对于, 一般的模型, 它会报错, 原因是没有提供MinMax的范围(量化需要这两个值进行缩放), 对于不同的类型,这两个值的获取方式不一样,对于里面的weights比较好办,只要统计一下最大最小值就好了,但是对于graph里面的其他tensor,因为输入的不同,所以每层输出的值也会不同,这里就需要在训练的时候去统计每个node的[Min, Max]; 当然你可以用这个语句去手动制定范围,但是精度可能就会收到很大影响.

tf.quantization.fake_quant_with_min_max_args
而且使用toco的时候需要加上:
--reorder_across_fake_quant=true

或者你也可以在使用toco的时候加上:

--default_ranges_min=0 \
--default_ranges_max=1

但是这样就更加糟糕了,它会给所有没有[Min,Max]的node,使用这两个值作为默认值

所以,由此我们知道:

1) 我们需要在训练的时候统计[Min, Max]

2) 为了量化能有更好的精度,我们希望[Min, Max]最好在一个比较小的范围内, 所以一些best practise包括: a)使用relu6作为激励函数; b) 使用batch normalization

Google提供了 quantization-aware的训练方法,可以参考 https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize

简单总结下:

1) 在你定义好网络结构之后, 加上

tf.contrib.quantize.create_training_graph(input_graph=g,
                                          quant_delay=2000000)

它会把网络重新改造,具体可以看tensorboar;

这里面,要注意quant_delay这个参数, 这个参数的意思的在此之前,网络会使用float进行正常的训练,收敛到一个比较好的水平;之后,会执行一个fake-quant的操作.

改造之前的一个卷积层:

改造之后的一个卷积层:

关键的操作就在于这个act_quant节点,统计了最大最小的值,然后执行量化操作,最上面的delay_quant里面有两个switch op, 就用根据设置的quant_delay来决定是否使用量化的结果进行forward

2) 按照正常的方式保存你的checkpoint;

3) 导出用于eval的网络,这个最好是在另外一个文件里面(否则可能会有这个问题 https://github.com/tensorflow/tensorflow/issues/19936),同样的方式创建网络结构后,调用:

tf.contrib.quantize.create_eval_graph(input_graph=g)

然后load checkpoint,最后保存graph和新的checkpoint.

tf.train.write_graph(sess.graph_def, output_dir, 'graph.pb', as_text=True)
saver = tf.train.Saver(max_to_keep=100)
saver.save(sess, './workspace/'+args.model+'/chk', global_step=1)

4) freeze graph

python3 -m tensorflow.python.tools.freeze_graph \
--input_graph=./graph.pb \
--output_graph=exported_freezed_inference_graph.pb \
--input_checkpoint=./chk-1 \
--output_node_names="your_output_name"

5) 现在就可以使用toco命令导出量化的模型了;

也许等你执行完以上操作,发现自己的模型里面仍然会报某些node还是没有[Min, Max], 根据我的观察,在某些情况下(具体不明),发现只有添加了BatchNorm的层在调用cerate_train_graph的时候才会添加fake_quant; 但是有的时候不需要BatchNorm也可以,很奇怪~

最后,你还可以用这个命令查看你到处tflite量化模型的op和[Min,Max]的范围

bazel-bin/tensorflow/contrib/lite/toco/toco \
--input_file=exported_freezed_inference_graph.pb \
--input_format=TENSORFLOW_GRAPHDEF   \
--output_format=GRAPHVIZ_DOT   \
--output_file=exported_freezed_inference_graph_opt.dot   \
--inference_type=QUANTIZED_UINT8 \
--input_data_types=QUANTIZED_UINT8  \
--input_arrays=image   \
--output_arrays=Openpose/MConv_Stage1_L_5_pointwise/Conv2D \
--input_shapes=1,128,128,3   \
--logtostderr \
--default_ranges_min=0 \
--default_ranges_max=1

dot -Tpdf exported_freezed_inference_graph_opt.dot -o exported_freezed_inference_graph_opt.pdf

TensorFlow量化训练

你可能感兴趣的:(TensorFlow量化训练)