Accelerating PyTorch models with TensorRT

Saving PyTorch models

Simple save methods

# Save the entire network (architecture + weights)
torch.save(net, PATH)
# Save only the parameters: faster and takes less space on disk
torch.save(net.state_dict(), PATH)
#--------------------------------------------------
# The corresponding loading calls for the two save methods above:
model = torch.load(PATH)
model.load_state_dict(torch.load(PATH))

In practice, however, you often need to save more than the weights, e.g. the optimizer state. A fuller checkpoint can be saved like this:

torch.save({'epoch': epochID + 1,
            'state_dict': model.state_dict(),
            'best_loss': lossMIN,
            'optimizer': optimizer.state_dict(),
            'alpha': loss.alpha,
            'gamma': loss.gamma},
           checkpoint_path + '/m-' + launchTimestamp + '-' + str("%.4f" % lossMIN) + '.pth.tar')
                           
def load_checkpoint(model, checkpoint_PATH, optimizer):
    if checkpoint_PATH is not None:
        model_CKPT = torch.load(checkpoint_PATH)
        model.load_state_dict(model_CKPT['state_dict'])
        print('loading checkpoint!')
        optimizer.load_state_dict(model_CKPT['optimizer'])
    return model, optimizer

Freezing the model with jit.trace (to be expanded)
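A minimal sketch of tracing with torch.jit.trace, assuming a model that takes a single 1x28x28 image (the input shape and file name here are assumptions):

model.eval()
example_input = torch.randn(1, 1, 28, 28)        # dummy input, only used to record the graph
traced = torch.jit.trace(model, example_input)   # runs the model once and records the executed ops
traced.save('model_traced.pt')                   # serialized TorchScript module
# later: loaded = torch.jit.load('model_traced.pt')

Note that tracing records only the ops executed for this particular input, so data-dependent control flow in the model is not captured.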

ONNX models

PyTorch ships with an ONNX export tool: a model can be exported to ONNX directly, or loaded as usual and then converted. In some cases, especially when the model contains custom layers, the conversion fails and you have to implement those custom layers yourself.

# dummy input with the network's expected input shape and device
dummy_input = torch.randn(self.config.BATCH_SIZE, 1, 28, 28, device='cuda')
# model is your own network definition; the architecture still has to be defined in code
torch.onnx.export(model, dummy_input, filepath, verbose=True)
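To catch a failed or malformed export early, the exported file can be sanity-checked with the onnx package (a minimal sketch; 'model.onnx' stands in for the filepath used above):

import onnx

onnx_model = onnx.load('model.onnx')                   # load the exported file
onnx.checker.check_model(onnx_model)                   # raises if the graph is structurally invalid
print(onnx.helper.printable_graph(onnx_model.graph))   # human-readable dump of the graph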

Loading the ONNX model with TensorRT

TensorRT's ONNX parser builds an engine from the ONNX model, and the engine is then used for inference. Plenty of similar code exists; this version is simple and easy to follow. It assumes the usual imports (import tensorrt as trt, import pycuda.driver as cuda, import pycuda.autoinit, import numpy as np) and uses the pre-TensorRT-8 builder API (max_batch_size / build_cuda_engine).

def ONNX_build_engine(self, onnx_file_path):
    '''
    Build a TensorRT engine from an ONNX file.
    :param onnx_file_path: path to the ONNX file
    :return: engine
    '''
    # logger for TensorRT build/runtime messages
    G_LOGGER = trt.Logger(trt.Logger.WARNING)
    with trt.Builder(G_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, G_LOGGER) as parser:
        builder.max_batch_size = 100
        builder.max_workspace_size = 1 << 20
        print('Loading ONNX file from path {}...'.format(onnx_file_path))
        with open(onnx_file_path, 'rb') as model:
            print('Beginning ONNX file parsing')
            if not parser.parse(model.read()):
                # surface parser errors instead of silently building a broken network
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
        print('Completed parsing of ONNX file')

        print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))
        engine = builder.build_cuda_engine(network)
        print("Completed creating Engine")

        # optionally serialize the engine to a plan file:
        # with open(engine_file_path, "wb") as f:
        #     f.write(engine.serialize())
        return engine

def loadONNX2TensorRT(self, filepath):
    '''
    Build a TensorRT engine from an ONNX file and run inference with it.
    :param filepath: path to the ONNX file
    '''

    engine = self.ONNX_build_engine(filepath)

    # load one batch from the test set
    datas = DataLoaders()
    test_loader = datas.testDataLoader()
    img, target = next(iter(test_loader))
    img = img.numpy()
    target = target.numpy()
    img = img.ravel()

    context = engine.create_execution_context()
    output = np.empty((100, 10), dtype=np.float32)

    # allocate device memory for input and output
    d_input = cuda.mem_alloc(1 * img.size * img.dtype.itemsize)
    d_output = cuda.mem_alloc(1 * output.size * output.dtype.itemsize)
    bindings = [int(d_input), int(d_output)]

    # pycuda stream for asynchronous transfers and execution
    stream = cuda.Stream()
    # copy the input data to the device
    cuda.memcpy_htod_async(d_input, img, stream)
    # run inference (batch size 100)
    context.execute_async(100, bindings, stream.handle, None)
    # copy the predictions back from the device buffer
    cuda.memcpy_dtoh_async(output, d_output, stream)
    # wait for the stream to finish
    stream.synchronize()

    print("Test Case: " + str(target))
    print("Prediction 100: " + str(np.argmax(output, axis=1)))
    del context
    del engine
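Rebuilding the engine on every run is slow. A minimal sketch of saving the serialized plan once and reloading it with trt.Runtime on later runs (the plan file name is an assumption; the calls match the same pre-TensorRT-8 API used above):

# right after building, serialize the engine to a plan file
with open('model.trt', 'wb') as f:
    f.write(engine.serialize())

# on later runs, skip the build step and deserialize the plan directly
G_LOGGER = trt.Logger(trt.Logger.WARNING)
with open('model.trt', 'rb') as f, trt.Runtime(G_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())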

TensorRT parsers

Engines can be built from ONNX, UFF, and other model formats. When building the engine you can also enable reduced precision, producing a float16 or int8 engine and thereby quantizing the model; int8 engines additionally require calibration.
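A minimal sketch of enabling reduced precision with the same pre-TensorRT-8 builder API used above (the int8 calibrator is only indicated; a real trt.IInt8EntropyCalibrator2 subclass fed with representative data would be needed):

# inside ONNX_build_engine, after creating the builder:
builder.fp16_mode = True                      # build a float16 engine (requires GPU support)

# int8 additionally needs a calibrator
# builder.int8_mode = True
# builder.int8_calibrator = my_calibrator     # hypothetical calibrator instance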

https://github.com/jkjung-avt/tensorrt_demos/blob/fabb10313d2af1dbc8e97766d464e1b81cf7e2b6/modnet/onnx_to_tensorrt.py
