Running Caffe Models with NVidia TensorRT

Preface

NVidia has released TensorRT, which supports fp16 and can run half-precision inference on the TX1 and on Pascal-architecture GPUs such as the GTX 1080. The official claim is that TensorRT accelerates inference noticeably, often doubling performance, and it can also load Caffe models directly.
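(For reference: on the builder interface NVidia ships in NvInfer.h, half precision is switched on when the engine is built. A minimal sketch, assuming an nvinfer1::IBuilder* named builder as in NVidia's samples; in the early releases the Caffe weights also had to be parsed as DataType::kHALF for this to take effect.)

// Sketch only: enable fp16 ("half2") mode if the GPU has fast fp16 support
if (builder->platformHasFastFp16())
     builder->setHalf2Mode(true);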
The downside is that custom layers are not yet supported; only a set of common layers is available, and posts on the NVidia forums suggest custom-layer support is unlikely in the near term.
There is still not much material online about how to run Caffe models with TensorRT. I finally found what looks like a clear walkthrough in the slides from an NVidia workshop; it is copied below.

Code

/* Importing a Caffe Model */
// create the network definition
INetworkDefinition* network = infer->createNetwork();

// create a map from caffe blob names to GIE tensors
std::unordered_map<std::string, infer1::Tensor> blobNameToTensor;

// populate the network definition and map
CaffeParser* parser = new CaffeParser;
parser->parse(deployFile, modelFile, *network, blobNameToTensor);

// tell GIE which tensors are required outputs
for (auto& s : outputs)
     network->setOutput(blobNameToTensor[s]);

/*Engine Creation*/
// Specify the maximum batch size and scratch size
CudaEngineBuildContext buildContext;
buildContext.maxBatchSize = maxBatchSize;
buildContext.maxWorkspaceSize = 1 << 20;

// create the engine
ICudaEngine* engine =
     infer->createCudaEngine(buildContext, *network);

// serialize to a C++ stream
engine->serialize(gieModelStream);

/*Binding Buffers*/
// get array bindings for input and output
int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME),
     outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);

// set array of input and output buffers
void* buffers[2];
buffers[inputIndex] = gpuInputBuffer;
buffers[outputIndex] = gpuOutputBuffer;  // device output pointer (not shown on the slide; assumed to mirror gpuInputBuffer)

/*Running the Engine*/
// Specify the batch size
CudaEngineContext context;
context.batchSize = batchSize;

// add GIE kernels to the given stream
engine->enqueue(context, buffers, stream, NULL);

//<…>

// wait on the stream
cudaStreamSynchronize(stream);
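The names in these slides (infer, infer1::Tensor, CudaEngineBuildContext, CudaEngineContext) come from a pre-release build of GIE and do not match the headers NVidia actually shipped. As a rough guide only, the same workflow written against the released NvInfer.h / NvCaffeParser.h looks roughly like the sketch below. It follows NVidia's sampleMNIST and assumes a user-supplied gLogger (an ILogger implementation), pre-allocated device buffers gpuInputBuffer / gpuOutputBuffer, and a cudaStream_t stream; it is not a drop-in replacement for the slide code.

#include "NvInfer.h"
#include "NvCaffeParser.h"
#include <cuda_runtime_api.h>

using namespace nvinfer1;
using namespace nvcaffeparser1;

// build the network definition from the Caffe prototxt / caffemodel
IBuilder* builder = createInferBuilder(gLogger);        // gLogger: user-defined ILogger
INetworkDefinition* network = builder->createNetwork();
ICaffeParser* parser = createCaffeParser();
const IBlobNameToTensor* blobNameToTensor =
     parser->parse(deployFile, modelFile, *network, DataType::kFLOAT);

// mark the required output blobs
for (auto& s : outputs)
     network->markOutput(*blobNameToTensor->find(s.c_str()));

// build the engine with the chosen batch size and scratch space
builder->setMaxBatchSize(maxBatchSize);
builder->setMaxWorkspaceSize(1 << 20);
ICudaEngine* engine = builder->buildCudaEngine(*network);

// look up binding slots and run inference on a CUDA stream
IExecutionContext* context = engine->createExecutionContext();
int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
void* buffers[2];
buffers[inputIndex] = gpuInputBuffer;     // device pointer holding the input batch
buffers[outputIndex] = gpuOutputBuffer;   // device pointer receiving the results
context->enqueue(batchSize, buffers, stream, nullptr);
cudaStreamSynchronize(stream);

The built engine can also be serialized to disk and reloaded later to skip the build step, but the exact serialize/deserialize signatures changed between versions, so that part is left out here.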
