Using The CuDLA API To Run A TensorRT Engine

Table Of Contents

  • Description
  • How does this sample work?
    • TensorRT API layers and ops
  • Prerequisites
  • Running the sample
    • Sample --help options
  • Additional resources
  • License
  • Changelog
  • Known issues

Description

This sample, sampleCudla, uses the TensorRT API to construct a network with a single ElementWise layer and builds the engine. The engine runs in DLA standalone mode using the cuDLA runtime. To do this, the sample uses cuDLA APIs for engine conversion, cuDLA runtime preparation, and inference.

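The build step follows the standard TensorRT C++ builder flow with the DLA targeted in the builder configuration. The following is a minimal sketch of how such a loadable could be produced, not the sample's actual code: the `buildDlaLoadable` helper, the logger, the tensor names, and the 1x3x32x32 shape are illustrative assumptions, and the real sample additionally constrains the I/O tensor formats to those supported by the DLA.

```cpp
#include <NvInfer.h>

#include <cstdio>
#include <memory>
#include <vector>

using namespace nvinfer1;

// Minimal logger required by the TensorRT builder (prints warnings and errors).
class SampleLogger : public ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
        {
            std::printf("%s\n", msg);
        }
    }
};

// Builds a single-ElementWise-layer network and serializes it as a DLA standalone
// loadable that cudlaModuleLoadFromMemory can consume.
std::vector<char> buildDlaLoadable(SampleLogger& logger)
{
    auto builder = std::unique_ptr<IBuilder>(createInferBuilder(logger));
    auto const flags = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = std::unique_ptr<INetworkDefinition>(builder->createNetworkV2(flags));

    // Two inputs feeding one ElementWise (sum) layer; shape and names are hypothetical.
    Dims4 dims{1, 3, 32, 32};
    ITensor* in0 = network->addInput("input0", DataType::kHALF, dims);
    ITensor* in1 = network->addInput("input1", DataType::kHALF, dims);
    IElementWiseLayer* ew = network->addElementWise(*in0, *in1, ElementWiseOperation::kSUM);
    ew->getOutput(0)->setName("output");
    network->markOutput(*ew->getOutput(0));

    auto config = std::unique_ptr<IBuilderConfig>(builder->createBuilderConfig());
    config->setFlag(BuilderFlag::kFP16);             // DLA requires FP16 or INT8 precision
    config->setFlag(BuilderFlag::kDIRECT_IO);        // standalone loadables use direct I/O
    config->setDefaultDeviceType(DeviceType::kDLA);  // place all layers on the DLA
    config->setDLACore(0);
    config->setEngineCapability(EngineCapability::kDLA_STANDALONE);

    // The serialized blob is the DLA loadable handed to the cuDLA runtime.
    auto blob = std::unique_ptr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));
    const char* data = static_cast<const char*>(blob->data());
    return std::vector<char>(data, data + blob->size());
}
```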
How does this sample work?

After the network is constructed, the cuDLA module is loaded from the serialized engine data. The input and output tensors are then allocated and registered with cuDLA. Once the input tensors are copied from the CPU to the GPU, the cuDLA task can be submitted and executed. The sample then waits for the stream operations to finish and copies the output buffer back to the CPU, where it is verified for correctness. A sketch of this call sequence follows the list below.

Specifically:

  • The single-layered network is built by TensorRT.
  • cudlaCreateDevice is called to create a DLA device.
  • cudlaModuleLoadFromMemory is called to load the engine memory for DLA use.
  • cudaMalloc and cudlaMemRegister are called to first allocate memory on the GPU and then register the CUDA pointer with the DLA.
  • cudlaModuleGetAttributes is called to get module attributes from the loaded module.
  • cudlaSubmitTask is called to submit the inference task.
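The list above maps onto the cuDLA runtime roughly as sketched below. This is a simplified illustration rather than the sample's exact code: the `runOnDla` helper and the buffer sizes are hypothetical, error handling is abbreviated, and in practice the tensor sizes come from the descriptors returned by cudlaModuleGetAttributes. It assumes cuDLA hybrid (CUDA + DLA) mode, in which CUDA device pointers are registered with the DLA and the task is submitted on a CUDA stream.

```cpp
#include <cudla.h>
#include <cuda_runtime.h>

#include <cstdint>
#include <cstdio>
#include <vector>

// Runs one inference on DLA core 0. 'loadable' holds the serialized DLA loadable;
// 'inputSize'/'outputSize' are the tensor sizes in bytes (hypothetical here).
bool runOnDla(const std::vector<char>& loadable, size_t inputSize, size_t outputSize)
{
    // 1. Create a cuDLA device handle in hybrid (CUDA + DLA) mode.
    cudlaDevHandle dev = nullptr;
    if (cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA) != cudlaSuccess) return false;

    // 2. Load the serialized loadable so the DLA can execute it.
    cudlaModule module = nullptr;
    if (cudlaModuleLoadFromMemory(dev, reinterpret_cast<const uint8_t*>(loadable.data()),
                                  loadable.size(), &module, 0) != cudlaSuccess) return false;

    // 3. Query module attributes (tensor counts; descriptors carry sizes and formats).
    cudlaModuleAttribute attr{};
    cudlaModuleGetAttributes(module, CUDLA_NUM_INPUT_TENSORS, &attr);
    std::printf("module expects %u input tensor(s)\n", attr.numInputTensors);

    // 4. Allocate CUDA memory and register the pointers so the DLA can access them.
    void* inputGpu = nullptr;
    void* outputGpu = nullptr;
    cudaMalloc(&inputGpu, inputSize);
    cudaMalloc(&outputGpu, outputSize);
    uint64_t* inputDla = nullptr;
    uint64_t* outputDla = nullptr;
    cudlaMemRegister(dev, reinterpret_cast<uint64_t*>(inputGpu), inputSize, &inputDla, 0);
    cudlaMemRegister(dev, reinterpret_cast<uint64_t*>(outputGpu), outputSize, &outputDla, 0);

    // Input data would be copied from the host into inputGpu here (e.g. cudaMemcpyAsync).
    cudaStream_t stream = nullptr;
    cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);

    // 5. Describe the inference work and submit it to the DLA on the CUDA stream.
    cudlaTask task{};
    task.moduleHandle = module;
    task.numInputTensors = 1;
    task.inputTensor = &inputDla;
    task.numOutputTensors = 1;
    task.outputTensor = &outputDla;
    task.waitEvents = nullptr;
    task.signalEvents = nullptr;
    if (cudlaSubmitTask(dev, &task, 1, stream, 0) != cudlaSuccess) return false;

    // Wait for the DLA work queued on the stream; the output would then be copied
    // back to the host and checked for correctness.
    cudaStreamSynchronize(stream);

    // 6. Release cuDLA registrations, CUDA memory, the module, and the device handle.
    cudlaMemUnregister(dev, inputDla);
    cudlaMemUnregister(dev, outputDla);
    cudaFree(inputGpu);
    cudaFree(outputGpu);
    cudaStreamDestroy(stream);
    cudlaModuleUnload(module, 0);
    cudlaDestroyDevice(dev);
    return true;
}
```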

TensorRT API layers and ops

In this sample, the ElementWise layer is used. For more information, see the TensorRT Developer Guide: Layers documentation.

Prerequisites

This sample must be compiled with the macro ENABLE_DLA=1; otherwise, it prints the following error message:

Unsupported platform, please make sure it is running on aarch64, QNX or android.

and quits.

Running the sample

  1. Compile this sample by running make in the `<TensorRT root directory>/samples/sampleCudla` directory. The binary named sample_cudla will be created in the `<TensorRT root directory>/bin` directory.
    cd <TensorRT root directory>/samples/sampleCudla
    make ENABLE_DLA=1

    Where `<TensorRT root directory>` is where you installed TensorRT.
    
  2. Run the sample to perform inference on DLA.
    ./sample_cudla

  3. Verify that the sample ran successfully. If the sample runs successfully, you should see output similar to the following:
    &&&& RUNNING TensorRT.sample_cudla # ./sample_cudla
    [I] [TRT]
    [I] [TRT] --------------- Layers running on DLA:
    [I] [TRT] [DlaLayer] {ForeignNode[(Unnamed Layer* 0) [ElementWise]]},
    [I] [TRT] --------------- Layers running on GPU:
    [I] [TRT] …(omit messages)
    &&&& PASSED TensorRT.sample_cudla

     This output shows that the sample ran successfully; `PASSED`.
    

Sample --help options

To see the full list of available options and their descriptions, use the ./sample_cudla -h command line option.

Additional resources

The following resources provide a deeper understanding of sampleCudla.

Documentation

  • Introduction To NVIDIA’s TensorRT Samples
  • Working With TensorRT Using The C++ API
  • NVIDIA’s TensorRT Documentation Library
  • Developer Guide for cuDLA APIs

License

For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.

Changelog

June 2022
This is the first release of the README.md file.

Known issues

There are no known issues with this tool.
