Speeding up TFS models: freezing and converting to half precision

fp16 vs. fp32 speed comparison for the attention-based punctuation model

NVIDIA-SMI Driver Version: 410.104, CUDA Version: 10.0, served with TensorFlow Serving (Docker)

model        length     batch=1  batch=8  batch=16  batch=32  batch=64  batch=128
fp32+freeze  length64   2.67ms   3.95ms   5.61ms    10.30ms   17.45ms   33.21ms
fp32+freeze  length128  2.81ms   6.06ms   10.50ms   17.70ms   34.18ms   65.86ms
fp16+freeze  length64   7.94ms   3.13ms   4.08ms    5.36ms    9.46ms    16.18ms
fp16+freeze  length128  2.79ms   4.05ms   5.59ms    9.44ms    16.17ms   31.35ms

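"freeze" refers to the frozen model (variables converted to constants). It is served through TFS here; for other formats, see the post on save-format conversion.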

NVIDIA-SMI Driver Version: 418.56 CUDA Version: 10.1 TensorFlow Version: 1.13(GPU)

model\batch length 1 8 16 32 64 128
fp32+freeze length64 4.7ms 13.55ms 21.47ms 38.20ms
length128 6.09ms 27.52ms 39.60ms 98.16ms
fp16+freeze length64 7.94ms 29.08ms 49.44ms 82.36ms
length128 10.44ms 54.47ms 80.03ms 170.35ms
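  • The results show a large speedup, and the model file shrinks from 118M (fp32, before conversion) to 74M (fp16, after conversion);

  • Running directly on the GPU (second table), fp16 gives no speedup and is even slower. Comparing the TFS and CPU results at the same precision, batch size and length, the compute bottleneck appears to take up too large a share of the time; in raw hardware computation, fp32 is clearly faster than fp16;

  • Under TFS, by contrast, the speedup is clear, approaching 2x at large batch sizes. Given the smaller model file and the smaller parameter footprint, the time saved is probably mostly in data loading;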
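
The speedup mentioned in the bullets can be recomputed from the first (TFS) table. The following is only a small sketch: the numbers are copied from that table, while the names `FP32_MS`, `FP16_MS` and `speedups` are made up for this illustration.

```python
# Latencies in ms copied from the TFS table, keyed by sequence length.
# Each list follows the batch sizes in BATCH_SIZES.
BATCH_SIZES = [1, 8, 16, 32, 64, 128]
FP32_MS = {
    64: [2.67, 3.95, 5.61, 10.30, 17.45, 33.21],
    128: [2.81, 6.06, 10.50, 17.70, 34.18, 65.86],
}
FP16_MS = {
    64: [7.94, 3.13, 4.08, 5.36, 9.46, 16.18],
    128: [2.79, 4.05, 5.59, 9.44, 16.17, 31.35],
}


def speedups(fp32_ms, fp16_ms):
    """Return fp32/fp16 latency ratios per sequence length (values > 1 mean fp16 is faster)."""
    return {
        length: [round(a / b, 2) for a, b in zip(fp32_ms[length], fp16_ms[length])]
        for length in fp32_ms
    }


SPEEDUP = speedups(FP32_MS, FP16_MS)
# Model file size after conversion relative to before (74M vs. 118M).
SIZE_RATIO = round(74 / 118, 2)

for length, ratios in SPEEDUP.items():
    print("length", length, dict(zip(BATCH_SIZES, ratios)))
print("fp16 file size / fp32 file size:", SIZE_RATIO)
```

At batch size 128 the ratio comes out slightly above 2 for both sequence lengths, which matches the "approaching 2x" observation, while the file is about 0.63 of its original size.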

The conversion code is as follows:

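import os
import sys

import numpy as np
import tensorflow as tf
from tensorflow.python.framework import graph_util

import modeling
from modeling import BertConfig
# Assumption: BertModel (used below) is defined in the author's own modeling.py
from modeling import BertModel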

if __name__ == '__main__':
    # Holds the variable values read from the source model
    weight_list = []
    # 1. A graph is needed first
    loaded_graph = tf.Graph()
    sconfig = tf.ConfigProto(log_device_placement=False)
    with tf.Session(graph=loaded_graph, config=sconfig) as sess1:
        # 2. With a session in place, load the fp32 TFS SavedModel. A ckpt works as well: once the graph
        #    structure is loaded, it can be converted to any other save format (see the previous post on format conversion).
        tf.saved_model.loader.load(sess1, [tf.saved_model.tag_constants.SERVING], "/mnt/lustre02/jiangsu/aispeech/home/jbl01/80w_offline/cc/saved_model_no_freeze")
        # for op in loaded_graph.get_operations():
        #    print(op.name)
        # 3. Freeze the variables; the result can then be saved as a pb file
        frozen_graph_def = graph_util.convert_variables_to_constants(sess1,
                                                                     tf.get_default_graph().as_graph_def(),
                                                                     ['whichPun/output'])  # note: this is the name of the output op
        with tf.gfile.FastGFile('./punc_model/pb_model_fp32/graph.pb', mode='wb') as f:
            f.write(frozen_graph_def.SerializeToString())
        # fp32 inputs and outputs; they can be obtained from Module.input or from the graph node names
        inputs = sess1.graph.get_tensor_by_name("inputs/input_ids:0")
        print(inputs)
        inputmask = sess1.graph.get_tensor_by_name("inputs/input_mask:0")
        print(inputmask)

        pout = sess1.graph.get_tensor_by_name("whichPun/output:0")
        print(pout)

        aaa = np.asarray([111]).reshape((-1, 1))
        bbb = np.asarray([1]).reshape((-1, 1))
        print(sess1.run([pout], feed_dict={inputs: aaa, inputmask: bbb}))

        ddd = sess1.run(sess1.graph.get_tensor_by_name("bert/encoder/layer_1/attention/self/key/bias:0"),feed_dict={inputs: aaa, inputmask: bbb})
        print("key bias in sess1 from tfs32", ddd)
        print("key bias astype to fp16", ddd.astype(np.float16))
        num = 0
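        for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES):
            # if num == 24:
            #     print("num in sess1:", sess1.run(v))
            # Variable 24 (a bias) showed some NaNs after conversion, so the values before conversion are printed here
            value = sess1.run(v)
            print(value.shape)
            # Cast each fp32 variable read from the TFS model to fp16 and store it
            weight_list.append(value.astype(np.float16))
            # Alternatively store the fp32 values and force the cast by changing the dtypes defined in modeling
            # weight_list.append(value)
            num += 1
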
    model = BertModel(bert_config="bert_config_online.json", ckpt="/mnt/lustre02/jiangsu/aispeech/home/jbl01/80w_offline/cc/-1")
    print(model.input_ids)
    print(model.input_mask)
    print(model.logit)

    with tf.Session() as sess:
       list_id = 0
       # sess.run(tf.global_variables_initializer())
       # Key step: tf.global_variables_initializer() would initialize the variables;
       # keep it commented out and set the values with assign instead
       for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES):
           print(list_id, v)
           sess.run(tf.assign(v, weight_list[list_id]))
           list_id += 1
       print("weight_list[24]", weight_list[24])

       # 1. Save as ckpt (two lines)
       # saver = tf.train.Saver()

       ccc = sess.run([model.logit],
                      feed_dict={model.input_ids: aaa, model.input_mask: bbb})
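
The code comments above mention that one bias variable showed NaNs after the cast to fp16. A quick way to look into this kind of problem is to compare each weight array before and after the cast. The following is only a sketch using NumPy; the helper name `fp16_cast_report` and the sample values are hypothetical and not part of the original script.

```python
import numpy as np


def fp16_cast_report(weights):
    """Summarize what happens to an fp32 array when it is cast to fp16."""
    w32 = np.asarray(weights, dtype=np.float32)
    # Values beyond the fp16 range become inf; silence the overflow warning of the cast.
    with np.errstate(over="ignore"):
        w16 = w32.astype(np.float16)
    finite32 = np.isfinite(w32)
    return {
        # NaN/inf already present in the fp32 source
        "nonfinite_before": int((~finite32).sum()),
        # NaN/inf present after the cast
        "nonfinite_after": int((~np.isfinite(w16)).sum()),
        # finite fp32 values that overflowed to +/-inf in fp16
        "overflowed": int((finite32 & np.isinf(w16)).sum()),
        # non-zero fp32 values that were rounded to zero in fp16
        "flushed_to_zero": int(((w32 != 0) & (w16 == 0)).sum()),
    }


# Hypothetical sample: a normal value, a too-large value, a too-small value and a NaN.
print(fp16_cast_report([0.5, 1e5, 1e-10, float("nan")]))
```

In the conversion loop, such a report could be printed for every `sess1.run(v)` result before appending it to `weight_list`, which would show whether a non-finite value already existed in the fp32 weights or only appeared through the cast.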
        
        

After the conversion, save from within the same sess (not sess1):

       # 2. Save as pb (two statements inside sess)
       frozen_graph_def = graph_util.convert_variables_to_constants(sess,
                                                                    tf.get_default_graph().as_graph_def(),
                                                                    ['whichPun/output'])
       # The target (an empty file) needs to be created beforehand
       with tf.gfile.FastGFile('./punc_model/pb_model_fp16/graph.pb', mode='wb') as f:
           f.write(frozen_graph_def.SerializeToString())

       # 3. Save as a TFS model
       with tf.Graph().as_default() as graph:
           # To export the frozen model, importing frozen_graph_def is all that is needed
           tf.import_graph_def(frozen_graph_def, name="")
           with tf.Session() as sess:
               export_path = "./punc_model/tfs_model_fp16/"
               if export_path:
                   os.system("rm -rf " + export_path)
                   # Create a builder and specify the model export path
                   builder = tf.saved_model.builder.SavedModelBuilder(export_path)
                   # Declare the model's inputs and outputs
                   inids = tf.saved_model.utils.build_tensor_info(model.input_ids)
                   inmask = tf.saved_model.utils.build_tensor_info(model.input_mask)
                   poutput = tf.saved_model.utils.build_tensor_info(model.logit)

                   # signature_def wraps the input/output information; the tensor keys used here can be named freely
                   prediction_signature = (
                       tf.saved_model.signature_def_utils.build_signature_def(
                           inputs={'input': inids, 'mask': inmask},
                           outputs={'punc_output': poutput},
                           method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))
                   # Add the graph and variable information
                   builder.add_meta_graph_and_variables(
                       sess, [tf.saved_model.tag_constants.SERVING],
                       signature_def_map={
                           'ac_forward': prediction_signature,
                       })

                   builder.save()
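
Once the exported fp16 model is loaded by TensorFlow Serving, it can be queried through the REST predict API. The following is a minimal sketch using only the standard library. The signature name `ac_forward` and the input keys `input` and `mask` come from the export code above; the model name `punc`, the host and the port in the commented example are assumptions for illustration and have to match how the serving container is actually started.

```python
import json
import urllib.request


def build_predict_request(input_ids, input_mask, signature_name="ac_forward"):
    """Build a columnar-format request body for the TensorFlow Serving REST predict API."""
    return {
        "signature_name": signature_name,
        "inputs": {"input": input_ids, "mask": input_mask},
    }


def predict(url, payload, timeout=10.0):
    """POST the payload to a TensorFlow Serving predict endpoint and return its "outputs" field."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["outputs"]


# Same toy input as in the conversion script: one token id and a mask of 1.
payload = build_predict_request([[111]], [[1]])
print(json.dumps(payload))

# Example call (assumed model name and address, requires a running server):
# outputs = predict("http://localhost:8501/v1/models/punc:predict", payload)
```

Comparing the outputs of the fp32 and fp16 models for the same payload is a simple sanity check that the conversion did not change the predictions noticeably.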
