In FP16 computational mode, TensorRT can accept input and output data in either FP32 or FP16.
You can use any of the following combinations for input and output:
• Input FP32, output FP32
• Input FP16, output FP32
• Input FP16, output FP16
• Input FP32, output FP16
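For example, a helper that switches every network input to FP16:
static void setAllNetworkInputsToHalf(INetworkDefinition* network){
    for (int i = 0; i < network->getNbInputs(); i++)
        network->getInput(i)->setType(DataType::kHALF); // assumed loop body (cut off in the original): mark input i as FP16
}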
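When an input is switched to FP16, the buffer copied to the device has to hold 16-bit half values rather than 32-bit floats, and an FP16 output has to be widened again before it is used on the host. Below is a minimal host-side sketch of both conversions in plain C++, assuming the IEEE 754 binary16 layout and round-to-nearest-even. The helper names floatToHalfBits, halfBitsToFloat and toHalfBuffer are hypothetical and not part of TensorRT; with the CUDA toolkit available, the conversion routines in cuda_fp16.h would normally be used instead.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical helper: convert one FP32 value to FP16 bits (round to nearest even).
inline std::uint16_t floatToHalfBits(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));

    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;
    std::uint32_t mantissa = bits & 0x007FFFFFu;

    if (exponent == 0xFFu) // Inf or NaN
        return static_cast<std::uint16_t>(sign | 0x7C00u | (mantissa ? 0x0200u : 0u));

    const int halfExponent = static_cast<int>(exponent) - 127 + 15;
    if (halfExponent >= 31) // too large for FP16: becomes infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);

    if (halfExponent <= 0) // result is an FP16 subnormal or zero
    {
        if (halfExponent < -10) // below half of the smallest subnormal
            return static_cast<std::uint16_t>(sign);
        mantissa |= 0x00800000u;             // restore the implicit leading 1
        const int shift = 14 - halfExponent; // between 14 and 24
        std::uint32_t half = mantissa >> shift;
        const std::uint32_t remainder = mantissa & ((1u << shift) - 1u);
        const std::uint32_t halfway = 1u << (shift - 1);
        if (remainder > halfway || (remainder == halfway && (half & 1u)))
            ++half;
        return static_cast<std::uint16_t>(sign | half);
    }

    std::uint32_t half = (static_cast<std::uint32_t>(halfExponent) << 10) | (mantissa >> 13);
    const std::uint32_t remainder = mantissa & 0x1FFFu;
    if (remainder > 0x1000u || (remainder == 0x1000u && (half & 1u)))
        ++half; // a carry into the exponent field is the correct rounding result
    return static_cast<std::uint16_t>(sign | half);
}

// Hypothetical helper: widen FP16 bits back to FP32 (exact, no rounding needed).
inline float halfBitsToFloat(std::uint16_t h)
{
    const int exponent = (h >> 10) & 0x1F;
    const int mantissa = h & 0x03FF;
    float magnitude;
    if (exponent == 0x1F)
        magnitude = mantissa ? std::numeric_limits<float>::quiet_NaN()
                             : std::numeric_limits<float>::infinity();
    else if (exponent == 0)
        magnitude = std::ldexp(static_cast<float>(mantissa), -24); // subnormal or zero
    else
        magnitude = std::ldexp(static_cast<float>(mantissa + 1024), exponent - 25);
    return (h & 0x8000) ? -magnitude : magnitude;
}

// Hypothetical helper: convert a whole FP32 host buffer before copying it to an FP16 input.
inline std::vector<std::uint16_t> toHalfBuffer(const std::vector<float>& values)
{
    std::vector<std::uint16_t> out;
    out.reserve(values.size());
    for (float v : values)
        out.push_back(floatToHalfBits(v));
    return out;
}
```

The converted buffer is half the size of the FP32 one, so the device allocation and the copy size for that input have to be computed with 2 bytes per element.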
Location of the samples on Jetson:
You can refer to our TensorRT samples, which are located at ‘/usr/src/gie_samples/’.
For example,
Split your network into: input -> networkA -> networkSelf -> networkB -> output
NetworkA and networkB can run inference directly in TensorRT.
NetworkSelf needs to be implemented in CUDA.
So, the flow will be:
IExecutionContext *contextA = engineA->createExecutionContext(); // create the execution context for networkA
IExecutionContext *contextB = engineB->createExecutionContext(); // create the execution context for networkB
contextA->enqueue(batchSize, buffersA, stream, nullptr); // run inference on networkA
myLayer(outputFromA, inputToB, stream); // run networkSelf; your CUDA code goes here
contextB->enqueue(batchSize, buffersB, stream, nullptr); // run inference on networkB
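The part of this flow that is easy to get wrong is the buffer hand-off: the output binding of networkA is the input of the custom layer, and the output of the custom layer is the input binding of networkB. The sketch below shows only that wiring, on the host, so that it compiles without TensorRT or CUDA. The functions enqueueA, myLayerHost and enqueueB are hypothetical stand-ins with made-up arithmetic, and the fixed binding order (index 0 input, index 1 output) is an assumption; in a real engine the binding indices should be queried from the engine.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for contextA->enqueue: reads buffers[0], writes buffers[1].
void enqueueA(void** buffers, std::size_t n)
{
    const float* in = static_cast<const float*>(buffers[0]);
    float* out = static_cast<float*>(buffers[1]);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * 2.0f; // placeholder for networkA
}

// Hypothetical stand-in for the custom CUDA layer (networkSelf).
void myLayerHost(const float* outputFromA, float* inputToB, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        inputToB[i] = outputFromA[i] + 1.0f; // placeholder for the custom op
}

// Hypothetical stand-in for contextB->enqueue: reads buffers[0], writes buffers[1].
void enqueueB(void** buffers, std::size_t n)
{
    const float* in = static_cast<const float*>(buffers[0]);
    float* out = static_cast<float*>(buffers[1]);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * in[i]; // placeholder for networkB
}

// Runs the three stages in order, sharing the intermediate buffers between them.
std::vector<float> runPipeline(std::vector<float> input)
{
    const std::size_t n = input.size();
    std::vector<float> outputFromA(n), inputToB(n), output(n);

    // Assumed binding layout: {input, output} for each engine.
    void* buffersA[] = {input.data(), outputFromA.data()};
    void* buffersB[] = {inputToB.data(), output.data()};

    enqueueA(buffersA, n);                                // networkA
    myLayerHost(outputFromA.data(), inputToB.data(), n);  // networkSelf
    enqueueB(buffersB, n);                                // networkB
    return output;
}
```

With the real API the three calls are queued on the same CUDA stream, as in the snippet above, so they execute in order without explicit synchronization between stages; the host only needs to synchronize on the stream before reading the final output.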