Accelerating YOLOv5 Inference with TensorRT (Part 2)
tensorrt_inference
Converting a YOLOv5 PyTorch model to TensorRT
YOLOv5 pruning, distillation, and compression
Environment
Operating System + Version: Ubuntu + 16.04
TensorRT Version: 7.1.3.4
GPU Type: GeForce GTX 1650, 4GB
Nvidia Driver Version: 470.63.01
CUDA Version: 10.2.300
CUDNN Version: 7.6.5
Python Version (if applicable): 3.7.3
Anaconda Version:4.10.3
gcc:7.5.0
g++:7.5.0
name: tensorRT-yolov5
channels:
- >
- http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
- http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
- http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=4.5=1_gnu
- blas=1.0=mkl
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2021.7.5=h06a4308_1
- certifi=2021.5.30=py37h06a4308_0
- cudatoolkit=10.2.89=hfd86e86_1
- ffmpeg=4.2.2=h20bf706_0
- freetype=2.10.4=h5ab3b9f_0
- gmp=6.2.1=h2531618_2
- gnutls=3.6.15=he1e5248_0
- jpeg=9b=h024ee3a_2
- lame=3.100=h7b6447c_0
- lcms2=2.12=h3be6417_0
- libedit=3.1.20210714=h7f8727e_0
- libffi=3.2.1=hf484d3e_1007
- libgcc-ng=9.3.0=h5101ec6_17
- libgomp=9.3.0=h5101ec6_17
- libidn2=2.3.2=h7f8727e_0
- libopus=1.3.1=h7b6447c_0
- libpng=1.6.37=hbc83047_0
- libstdcxx-ng=9.3.0=hd4cf53a_17
- libtasn1=4.16.0=h27cfd23_0
- libtiff=4.2.0=h85742a9_0
- libunistring=0.9.10=h27cfd23_0
- libuv=1.40.0=h7b6447c_0
- libvpx=1.7.0=h439df22_0
- libwebp-base=1.2.0=h27cfd23_0
- lz4-c=1.9.3=h295c915_1
- mkl_fft=1.3.0=py37h42c9631_2
- mkl_random=1.2.2=py37h51133e4_0
- ncurses=6.2=he6710b0_1
- nettle=3.7.3=hbbd107a_1
- ninja=1.10.2=hff7bd54_1
- numpy-base=1.20.3=py37h74d4b33_0
- openh264=2.1.0=hd408876_0
- openjpeg=2.4.0=h3ad879b_0
- openssl=1.1.1l=h7f8727e_0
- pip=21.2.2=py37h06a4308_0
- python=3.7.3=h0371630_0
- pytorch=1.8.0=py3.7_cuda10.2_cudnn7.6.5_0
- readline=7.0=h7b6447c_5
- setuptools=52.0.0=py37h06a4308_0
- six=1.16.0=pyhd3eb1b0_0
- sqlite=3.33.0=h62c20be_0
- tk=8.6.10=hbc83047_0
- torchvision=0.9.0=py37_cu102
- typing_extensions=3.10.0.0=pyh06a4308_0
- wheel=0.37.0=pyhd3eb1b0_0
- x264=1!157.20191217=h7b6447c_0
- xz=5.2.5=h7b6447c_0
- zlib=1.2.11=h7b6447c_3
- zstd=1.4.9=haebb681_0
- pip:
- appdirs==1.4.4
- charset-normalizer==2.0.4
- cycler==0.10.0
- dpcpp-cpp-rt==2021.3.0
- flatbuffers==2.0
- graphsurgeon==0.4.5
- idna==3.2
- intel-cmplr-lib-rt==2021.3.0
- intel-cmplr-lic-rt==2021.3.0
- intel-opencl-rt==2021.3.0
- intel-openmp==2021.3.0
- kiwisolver==1.3.1
- mako==1.1.5
- markupsafe==2.0.1
- matplotlib==3.4.3
- mkl==2021.3.0
- mkl-fft==1.3.0
- mkl-service==2.4.0
- netron==5.1.6
- numpy==1.21.2
- olefile==0.46
- onnx==1.10.1
- onnx-simplifier==0.3.6
- onnxoptimizer==0.2.6
- onnxruntime==1.8.1
- opencv-python==4.5.3.56
- pandas==1.3.2
- pillow==8.3.2
- protobuf==3.17.3
- pycuda==2021.1
- pyparsing==2.4.7
- python-dateutil==2.8.2
- pytools==2021.2.8
- pytz==2021.1
- pyyaml==5.4.1
- requests==2.26.0
- scipy==1.7.1
- seaborn==0.11.2
- tbb==2021.3.0
- tensorrt==7.1.3.4
- torchsummary==1.5.1
- tqdm==4.62.2
- typing-extensions==3.10.0.2
- uff==0.6.9
- urllib3==1.26.6
prefix: /home/yichao/miniconda3/envs/tensorRT-yolov5
appdirs==1.4.4
certifi==2021.5.30
charset-normalizer==2.0.4
cycler==0.10.0
dpcpp-cpp-rt==2021.3.0
flatbuffers==2.0
graphsurgeon @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
idna==3.2
intel-cmplr-lib-rt==2021.3.0
intel-cmplr-lic-rt==2021.3.0
intel-opencl-rt==2021.3.0
intel-openmp==2021.3.0
kiwisolver==1.3.1
Mako==1.1.5
MarkupSafe==2.0.1
matplotlib==3.4.3
mkl==2021.3.0
mkl-fft==1.3.0
mkl-random @ file:///tmp/build/80754af9/mkl_random_1626179032232/work
mkl-service==2.4.0
netron==5.1.6
numpy==1.21.2
olefile==0.46
onnx==1.10.1
onnx-simplifier==0.3.6
onnxoptimizer==0.2.6
onnxruntime==1.8.1
opencv-python==4.5.3.56
pandas==1.3.2
Pillow==8.3.2
protobuf==3.17.3
pycuda==2021.1
pyparsing==2.4.7
python-dateutil==2.8.2
pytools==2021.2.8
pytz==2021.1
PyYAML==5.4.1
requests==2.26.0
scipy==1.7.1
seaborn==0.11.2
six @ file:///tmp/build/80754af9/six_1623709665295/work
tbb==2021.3.0
tensorrt @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/python/tensorrt-7.1.3.4-cp37-none-linux_x86_64.whl
torch==1.8.0
torchsummary==1.5.1
torchvision==0.9.0
tqdm==4.62.2
typing-extensions==3.10.0.2
uff @ file:///home/yichao/360Downloads/TensorRT-7.1.3.4/uff/uff-0.6.9-py2.py3-none-any.whl
urllib3==1.26.6
time ./yolov5 -s yolov5s.wts yolov5s.engine s
Output:
real 7m29.211s
user 5m10.066s
sys 0m42.794s
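The `real`/`user`/`sys` values printed by the shell's `time` are easier to compare across builds once they are converted to seconds. The following is a minimal sketch, assuming the bash-style `XmY.YYYs` format shown above; `parse_time_field` is a hypothetical helper name, not part of tensorrtx.

```python
import re


def parse_time_field(field: str) -> float:
    """Convert a bash `time` field such as '7m29.211s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", field.strip())
    if match is None:
        raise ValueError(f"unrecognized time field: {field!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


if __name__ == "__main__":
    # Values copied from the build run above.
    for label, field in [("real", "7m29.211s"), ("user", "5m10.066s"), ("sys", "0m42.794s")]:
        print(f"{label}: {parse_time_field(field):.3f} s")
```

Wall-clock time (`real`) is the figure that matters when comparing engine build times across precisions.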
| | no TensorRT | TensorRT FP32 | TensorRT FP16 | TensorRT INT8 |
|---|---|---|---|---|
| Engine size | ~ | 38.3MB | 21.5MB | 10.8MB |
| Latency, FPS | 12 ms/image, 83 fps | 11 ms/image, 90 fps | 7 ms/image, 142 fps | 5 ms/image, 200 fps |
| Engine build time | ~ | 31s | 7m12s | 7m27s |
| C++ API GPU memory | ~ | 752MB | 544MB | 526MB |
| Python API GPU memory | 1133MB | 2285MB | 2075MB | 2057MB |
| Accuracy | ~ | ~ | ~ | no boxes detected |
| mAP | ~ | ~ | ~ | ~ |
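The FPS figures in the table follow directly from the per-image latency (fps = 1000 / ms, rounded down). A small sketch of that arithmetic, using only the latencies reported in the table; the function names are illustrative:

```python
def fps_from_latency(ms_per_image: float) -> int:
    """Frames per second for a given per-image latency, rounded down."""
    return int(1000 / ms_per_image)


def speedup(baseline_ms: float, accelerated_ms: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_ms / accelerated_ms


if __name__ == "__main__":
    # Per-image latencies (ms) taken from the table above.
    latencies_ms = {"no TensorRT": 12, "FP32": 11, "FP16": 7, "INT8": 5}
    baseline = latencies_ms["no TensorRT"]
    for name, ms in latencies_ms.items():
        print(f"{name}: {fps_from_latency(ms)} fps, {speedup(baseline, ms):.2f}x vs. no TensorRT")
```

Because the latencies are whole milliseconds, the derived FPS values are coarse; they are a summary of the logs, not a precise benchmark.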
import onnx
# Replace with the path to your own ONNX model
filename = "your_model.onnx"
model = onnx.load(filename)
# Raises an exception if the model is malformed
onnx.checker.check_model(model)
import onnx
import numpy as np
import onnxruntime as rt
import cv2
model_path = '/home/oldpan/code/models/Resnet34_3inputs_448x448_20200609.onnx'
# Verify that the model is well-formed
onnx_model = onnx.load(model_path)
onnx.checker.check_model(onnx_model)
# Read the image and resize it to the model's input dimensions
image = cv2.imread("data/images/person.png")
image = cv2.resize(image, (448,448))
# HWC -> CHW, then add a batch dimension and convert to float32
image = image.transpose(2,0,1)
image = np.array(image)[np.newaxis, :, :, :].astype(np.float32)
# Create the inference session and look up the input names
sess = rt.InferenceSession(model_path)
input_name1 = sess.get_inputs()[0].name
input_name2 = sess.get_inputs()[1].name
input_name3 = sess.get_inputs()[2].name
# This example model takes three inputs; the same image is fed to each
output = sess.run(None, {input_name1: image, input_name2: image, input_name3: image})
print(output)
This step is the same for every precision: run cmake to generate the build configuration files.
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/yolov5/build$ time cmake ..
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
Compatibility with CMake < 2.8.12 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDA: /usr/local/cuda (found version "10.2")
-- Found OpenCV: /usr/local/opencv3.3.0 (found version "3.3.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/yichao/MyDocuments/tensorrtx/yolov5/build
real 0m0.241s
user 0m0.201s
sys 0m0.042s
/home/yichao/MyDocuments/tensorrtx/yolov5/CMakeLists.txt
cmake_minimum_required(VERSION 2.6)
project(yolov5)
add_definitions(-std=c++11)
add_definitions(-DAPI_EXPORTS)
option(CUDA_USE_STATIC_CUDA_RUNTIME OFF)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_BUILD_TYPE Debug)
find_package(CUDA REQUIRED)
if(WIN32)
enable_language(CUDA)
endif(WIN32)
include_directories(${PROJECT_SOURCE_DIR}/include)
# include and link dirs of cuda and tensorrt, you need adapt them if yours are different
# cuda
# change these directories to match your system
include_directories(/usr/local/cuda/include)
link_directories(/usr/local/cuda/lib64)
# tensorrt
# change these directories to match your system
include_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/include/)
link_directories(/home/yichao/360Downloads/TensorRT-7.1.3.4/lib/)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wall -Ofast -Wfatal-errors -D_MWAITXINTRIN_H_INCLUDED")
cuda_add_library(myplugins SHARED ${PROJECT_SOURCE_DIR}/yololayer.cu)
target_link_libraries(myplugins nvinfer cudart)
find_package(OpenCV)
include_directories(${OpenCV_INCLUDE_DIRS})
add_executable(yolov5 ${PROJECT_SOURCE_DIR}/calibrator.cpp ${PROJECT_SOURCE_DIR}/yolov5.cpp)
target_link_libraries(yolov5 nvinfer)
target_link_libraries(yolov5 cudart)
target_link_libraries(yolov5 myplugins)
target_link_libraries(yolov5 ${OpenCV_LIBS})
if(UNIX)
add_definitions(-O2 -pthread)
endif(UNIX)
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/yolov5/build$ time make -j6
[ 20%] Building NVCC (Device) object
...
[100%] Linking CXX executable yolov5
[100%] Built target yolov5
real 0m4.723s
user 0m5.887s
sys 0m0.421s
Edit the file
/home/yichao/MyDocuments/tensorrtx/yolov5/yolov5.cpp
#define USE_FP16 // set USE_INT8 or USE_FP16 or USE_FP32
cd /home/yichao/MyDocuments/tensorrtx/yolov5
mkdir build
cd build
cp {ultralytics}/yolov5/yolov5s.wts {tensorrtx}/yolov5/build
cmake ..
(yolov5-pytorch) yichao@yichao:~/MyDocuments/tensorrtx/yolov5/build$ time make -j6
[ 20%] Building NVCC (Device) object
...
[100%] Linking CXX executable yolov5
[100%] Built target yolov5
real 0m4.702s
user 0m5.841s
sys 0m0.406s
(yolov5-pytorch) yichao@yichao:~/MyDocuments/tensorrtx/yolov5/build$ time ./yolov5 -s yolov5s.wts yolov5s.engine s
Loading weights: yolov5s.wts
Building engine, please wait for a while...
Build engine successfully!
real 0m31.284s
user 0m24.642s
sys 0m1.750s
yolov5s.engine, 38.3MB
GPU memory usage:
Thu Sep 9 14:23:23 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 27% 40C P0 24W / 75W | 829MiB / 3903MiB | 24% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1658 G /usr/lib/xorg/Xorg 206MiB |
| 0 N/A N/A 13920 C ./yolov5 619MiB |
+-----------------------------------------------------------------------------+
(yolov5-pytorch) yichao@yichao:~/MyDocuments/tensorrtx/yolov5/build$ time ./yolov5 -d yolov5s.engine ../samples
375ms
13ms
12ms
13ms
14ms
12ms
12ms
...
10ms
10ms
10ms
11ms
real 0m41.621s
user 0m29.085s
sys 0m3.601s
1000 images, each at 640x640 resolution
Average: 11 ms per image, i.e. about 90 fps
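The average can be recomputed from the per-image timings that `./yolov5 -d` prints, one `Nms` line per image. The sketch below assumes that output format and drops the first measurement, which includes one-time initialization (375ms in the run above). `average_latency_ms` is a hypothetical helper, and the sample contains only the lines shown in the excerpt, not all 1000.

```python
import re
from statistics import mean


def average_latency_ms(lines, skip_first=1):
    """Average the 'Nms' timings, ignoring the first `skip_first` warm-up values."""
    values = []
    for line in lines:
        match = re.fullmatch(r"(\d+)ms", line.strip())
        if match:  # lines such as '...' are ignored
            values.append(int(match.group(1)))
    steady = values[skip_first:]
    if not steady:
        raise ValueError("no timings left after skipping warm-up values")
    return mean(steady)


if __name__ == "__main__":
    excerpt = ["375ms", "13ms", "12ms", "13ms", "14ms", "12ms", "12ms",
               "...", "10ms", "10ms", "10ms", "11ms"]
    avg = average_latency_ms(excerpt)
    print(f"average over excerpt: {avg:.1f} ms, about {1000 / avg:.0f} fps")
```

In practice the program's output would be redirected to a file and the full set of lines passed in.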
GPU memory usage:
Thu Sep 9 14:25:15 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 27% 42C P0 35W / 75W | 962MiB / 3903MiB | 43% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1658 G /usr/lib/xorg/Xorg 206MiB |
| 0 N/A N/A 13988 C ./yolov5 752MiB |
+-----------------------------------------------------------------------------+
# Path of the output images after inference
/home/yichao/MyDocuments/tensorrtx/yolov5/build
// install python-tensorrt, pycuda, etc.
// ensure the yolov5s.engine and libmyplugins.so have been built
python yolov5_trt.py
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/yolov5$ time python yolov5_trt.py
----------- True
bingding: data (3, 640, 640)
bingding: prob (6001, 1, 1)
batch size is 1
warm_up->(640, 640, 3), time->416.93ms
warm_up->(640, 640, 3), time->11.84ms
warm_up->(640, 640, 3), time->13.25ms
warm_up->(640, 640, 3), time->12.98ms
warm_up->(640, 640, 3), time->12.79ms
warm_up->(640, 640, 3), time->12.70ms
warm_up->(640, 640, 3), time->11.82ms
warm_up->(640, 640, 3), time->11.90ms
warm_up->(640, 640, 3), time->13.13ms
warm_up->(640, 640, 3), time->11.89ms
input->['samples/COCO_train2014_000000421903.jpg'], time->10.30ms, saving into output/
input->['samples/COCO_train2014_000000145736.jpg'], time->11.23ms, saving into output/
input->['samples/COCO_train2014_000000482834.jpg'], time->11.26ms, saving into output/
...
input->['samples/COCO_train2014_000000221565.jpg'], time->10.94ms, saving into output/
input->['samples/COCO_train2014_000000366274.jpg'], time->10.30ms, saving into output/
input->['samples/COCO_train2014_000000048824.jpg'], time->10.77ms, saving into output/
real 1m14.491s
user 0m53.540s
sys 0m8.307s
1000 images, each at 640x640 resolution
Average: 11 ms per image, i.e. about 90 fps
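The log reports an output binding `prob` of shape `(6001, 1, 1)`. One plausible reading, which is an assumption based on how tensorrtx lays out its YOLO output and is not stated in this article, is 1 + 1000 × 6: the first element holds the number of detections, followed by up to 1000 records of six floats (center x, center y, width, height, confidence, class id). Under that assumption, a flat buffer could be decoded roughly as follows; the constants and `decode_prob` are illustrative names, and non-maximum suppression is deliberately left out.

```python
MAX_BOXES = 1000  # assumed maximum number of detections per image
FIELDS = 6        # assumed record layout: cx, cy, w, h, confidence, class_id


def decode_prob(flat, conf_threshold=0.5):
    """Decode a flat 6001-element output buffer into a list of detections."""
    if len(flat) != 1 + MAX_BOXES * FIELDS:
        raise ValueError(f"expected {1 + MAX_BOXES * FIELDS} values, got {len(flat)}")
    count = min(int(flat[0]), MAX_BOXES)  # element 0: number of valid records
    detections = []
    for i in range(count):
        start = 1 + i * FIELDS
        cx, cy, w, h, conf, cls = flat[start:start + FIELDS]
        if conf >= conf_threshold:
            detections.append({"box": (cx, cy, w, h), "confidence": conf, "class_id": int(cls)})
    return detections


if __name__ == "__main__":
    buffer = [0.0] * (1 + MAX_BOXES * FIELDS)
    buffer[0] = 1
    buffer[1:7] = [320.0, 320.0, 100.0, 50.0, 0.9, 0.0]
    print(decode_prob(buffer))
```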
GPU memory usage:
Thu Sep 9 14:35:54 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 28% 43C P0 27W / 75W | 2495MiB / 3903MiB | 33% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1658 G /usr/lib/xorg/Xorg 206MiB |
| 0 N/A N/A 15510 C python 2285MiB |
+-----------------------------------------------------------------------------+
The default is FP16.
/home/yichao/MyDocuments/tensorrtx/yolov5/yolov5.cpp
#define USE_FP16 // set USE_INT8 or USE_FP16 or USE_FP32
cd /home/yichao/MyDocuments/tensorrtx/yolov5
mkdir build
cd build
cp {ultralytics}/yolov5/yolov5s.wts {tensorrtx}/yolov5/build
cmake ..
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/yolov5/build$ time make -j6
[ 20%] Building NVCC (Device) object
...
[100%] Linking CXX executable yolov5
[100%] Built target yolov5
real 0m4.723s
user 0m5.887s
sys 0m0.421s
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/yolov5/build$ time ./yolov5 -s yolov5s.wts yolov5s.engine s
Loading weights: yolov5s.wts
Building engine, please wait for a while...
Build engine successfully!
real 7m11.939s
user 4m43.104s
sys 0m39.300s
yolov5s.engine, 21.5MB
GPU memory usage:
Thu Sep 9 15:20:15 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 29% 44C P0 24W / 75W | 843MiB / 3903MiB | 16% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1658 G /usr/lib/xorg/Xorg 216MiB |
| 0 N/A N/A 17616 C ./yolov5 623MiB |
+-----------------------------------------------------------------------------+
# Download the coco_calib images
[GoogleDrive](https://drive.google.com/drive/folders/1s7jE9DtOngZMzJC1uL307J2MiaGwdRSI?usp=sharing)
[BaiduPan](https://pan.baidu.com/s/1GOm_-JobpyLMAqZWCDUhKg) pwd: a9wh
# Image path
# /home/yichao/MyDocuments/tensorrtx/yolov5/build-int8/coco_calib
# Create a symbolic link
ln -s /home/yichao/MyDocuments/tensorrtx/yolov5/build-int8/coco_calib /home/yichao/MyDocuments/tensorrtx/yolov5/samples
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/yolov5/build$ time ./yolov5 -d yolov5s.engine ../samples
7ms
8ms
7ms
7ms
7ms
7ms
8ms
8ms
7ms
7ms
7ms
...
7ms
real 0m37.748s
user 0m27.568s
sys 0m2.609s
1000 images, each at 640x640 resolution
Average: 7 ms per image, i.e. about 142 fps
GPU memory usage:
Wed Sep 8 14:35:53 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 27% 39C P0 18W / 75W | 790MiB / 3903MiB | 4% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1469 G /usr/lib/xorg/Xorg 242MiB |
| 0 N/A N/A 8440 C ./yolov5 544MiB |
+-----------------------------------------------------------------------------+
# Path of the output images after inference
/home/yichao/MyDocuments/tensorrtx/yolov5/build
// install python-tensorrt, pycuda, etc.
// ensure the yolov5s.engine and libmyplugins.so have been built
python yolov5_trt.py
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/yolov5$ time python yolov5_trt.py
----------- True
bingding: data (3, 640, 640)
bingding: prob (6001, 1, 1)
batch size is 1
warm_up->(640, 640, 3), time->7.03ms
warm_up->(640, 640, 3), time->6.38ms
warm_up->(640, 640, 3), time->6.99ms
warm_up->(640, 640, 3), time->6.42ms
warm_up->(640, 640, 3), time->6.42ms
warm_up->(640, 640, 3), time->6.42ms
warm_up->(640, 640, 3), time->6.99ms
warm_up->(640, 640, 3), time->7.30ms
warm_up->(640, 640, 3), time->6.98ms
warm_up->(640, 640, 3), time->7.28ms
input->['samples/COCO_train2014_000000421903.jpg'], time->7.25ms, saving into output/
input->['samples/COCO_train2014_000000145736.jpg'], time->6.71ms, saving into output/
input->['samples/COCO_train2014_000000482834.jpg'], time->6.70ms, saving into output/
input->['samples/COCO_train2014_000000393241.jpg'], time->6.79ms, saving into output/
...
input->['samples/COCO_train2014_000000221565.jpg'], time->6.70ms, saving into output/
input->['samples/COCO_train2014_000000366274.jpg'], time->6.61ms, saving into output/
input->['samples/COCO_train2014_000000048824.jpg'], time->6.70ms, saving into output/
real 0m51.729s
user 0m44.069s
sys 0m5.622s
1000 images, each at 640x640 resolution
Average: 7 ms per image, i.e. about 142 fps
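The `yolov5_trt.py` log mixes warm-up iterations with real inputs, so it helps to average the two groups separately. A minimal sketch, assuming the `warm_up->..., time->X.XXms` and `input->..., time->X.XXms` line formats shown in the log above; `summarize_log` is a hypothetical helper.

```python
import re
from statistics import mean

# Captures the line kind and the reported time in milliseconds.
LOG_LINE = re.compile(r"^(?P<kind>warm_up|input)->.*time->(?P<ms>\d+(?:\.\d+)?)ms")


def summarize_log(lines):
    """Return the mean latency in ms per line kind ('warm_up' and 'input')."""
    buckets = {"warm_up": [], "input": []}
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            buckets[match.group("kind")].append(float(match.group("ms")))
    return {kind: mean(values) for kind, values in buckets.items() if values}


if __name__ == "__main__":
    excerpt = [
        "warm_up->(640, 640, 3), time->7.03ms",
        "warm_up->(640, 640, 3), time->6.38ms",
        "input->['samples/COCO_train2014_000000421903.jpg'], time->7.25ms, saving into output/",
        "input->['samples/COCO_train2014_000000145736.jpg'], time->6.71ms, saving into output/",
    ]
    for kind, avg in summarize_log(excerpt).items():
        print(f"{kind}: {avg:.2f} ms, about {1000 / avg:.0f} fps")
```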
GPU memory usage:
Wed Sep 8 15:52:45 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 27% 39C P0 22W / 75W | 2321MiB / 3903MiB | 29% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1469 G /usr/lib/xorg/Xorg 242MiB |
| 0 N/A N/A 11220 C python 2075MiB |
+-----------------------------------------------------------------------------+
Prepare calibration images: you can randomly select about 1000 images from your training set.
For COCO, you can also download my calibration images, coco_calib,
from GoogleDrive or BaiduPan (pwd: a9wh).
/home/yichao/MyDocuments/tensorrtx/yolov5/build/coco_calib
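If you build the calibration set from your own training data instead of downloading coco_calib, the random selection can be scripted. This is a sketch under assumptions: the training images sit in a single flat directory, and the function name, extension list and seed are illustrative rather than anything tensorrtx prescribes.

```python
import random
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def sample_calibration_images(train_dir, calib_dir, count=1000, seed=0):
    """Copy `count` randomly chosen images from train_dir into calib_dir."""
    train_dir, calib_dir = Path(train_dir), Path(calib_dir)
    images = sorted(p for p in train_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    if len(images) < count:
        raise ValueError(f"only {len(images)} images available, need {count}")
    calib_dir.mkdir(parents=True, exist_ok=True)
    # A fixed seed makes the selection reproducible between runs.
    chosen = random.Random(seed).sample(images, count)
    for src in chosen:
        shutil.copy2(src, calib_dir / src.name)
    return chosen


# Example call (paths are placeholders):
# sample_calibration_images("path/to/train_images", "build/coco_calib", count=1000)
```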
Edit the file
/home/yichao/MyDocuments/tensorrtx/yolov5/yolov5.cpp
#define USE_FP16 // set USE_INT8 or USE_FP16 or USE_FP32
and change it to
#define USE_INT8 // set USE_INT8 or USE_FP16 or USE_FP32
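When switching between precisions repeatedly, the macro edit can be scripted instead of done by hand. The sketch below rewrites the first `#define USE_...` line in the source text and leaves the trailing comment untouched. `set_precision` is a hypothetical helper, and it assumes the macro line looks like the one shown above.

```python
import re

PRECISIONS = ("USE_FP32", "USE_FP16", "USE_INT8")
MACRO_LINE = re.compile(r"^#define\s+USE_(?:FP32|FP16|INT8)\b", flags=re.MULTILINE)


def set_precision(source: str, precision: str) -> str:
    """Return `source` with the precision macro replaced by `precision`."""
    if precision not in PRECISIONS:
        raise ValueError(f"precision must be one of {PRECISIONS}")
    new_source, replaced = MACRO_LINE.subn(f"#define {precision}", source, count=1)
    if replaced == 0:
        raise ValueError("no precision macro found in source")
    return new_source


if __name__ == "__main__":
    line = "#define USE_FP16 // set USE_INT8 or USE_FP16 or USE_FP32"
    print(set_precision(line, "USE_INT8"))
```

To apply it to the real file, read `yolov5.cpp` with `pathlib.Path.read_text()`, pass the text through `set_precision`, write it back, and then rebuild so the change takes effect.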
cd /home/yichao/MyDocuments/tensorrtx/yolov5
mkdir build
cd build
cp {ultralytics}/yolov5/yolov5s.wts {tensorrtx}/yolov5/build
cmake ..
# If you have already built before, clean the previous build first
make clean
make -j6
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/yolov5/build$ time make -j6
[ 20%] Building NVCC (Device) object
...
[100%] Linking CXX executable yolov5
[100%] Built target yolov5
real 0m4.709s
user 0m5.902s
sys 0m0.373s
sudo ./yolov5 -s [.wts] [.engine] [s/m/l/x/s6/m6/l6/x6 or c/c6 gd gw] // serialize model to plan file
// For example yolov5s
sudo ./yolov5 -s yolov5s.wts yolov5s.engine s
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/yolov5/build$ time ./yolov5 -s yolov5s.wts yolov5s.engine s
Loading weights: yolov5s.wts
Your platform support int8: true
Building engine, please wait for a while...
reading calib cache: int8calib.table
COCO_train2014_000000421903.jpg 0
COCO_train2014_000000145736.jpg 1
...
COCO_train2014_000000048824.jpg 999
reading calib cache: int8calib.table
writing calib cache: int8calib.table size: 13506
Build engine successfully!
real 7m27.392s
user 6m58.768s
sys 0m38.621s
yolov5s.engine, 10.8MB
GPU memory usage:
Wed Sep 8 15:20:47 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 33% 46C P0 18W / 75W | 920MiB / 3903MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1469 G /usr/lib/xorg/Xorg 242MiB |
| 0 N/A N/A 9326 C ./yolov5 674MiB |
+-----------------------------------------------------------------------------+
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/yolov5/build$ time ./yolov5 -d yolov5s.engine ../samples
5ms
6ms
5ms
5ms
...
5ms
6ms
5ms
real 0m24.968s
user 0m23.439s
sys 0m1.660s
1000 images, each at 640x640 resolution
Average: 5 ms per image, i.e. about 200 fps
GPU memory usage:
Wed Sep 8 15:23:56 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 30% 43C P0 24W / 75W | 772MiB / 3903MiB | 38% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1469 G /usr/lib/xorg/Xorg 242MiB |
| 0 N/A N/A 9573 C ./yolov5 526MiB |
+-----------------------------------------------------------------------------+
# Path of the output images after inference
/home/yichao/MyDocuments/tensorrtx/yolov5/build
// install python-tensorrt, pycuda, etc.
// ensure the yolov5s.engine and libmyplugins.so have been built
python yolov5_trt.py
(tensorRT-yolov5) yichao@yichao:~/MyDocuments/tensorrtx/yolov5$ python yolov5_trt.py
----------- True
bingding: data (3, 640, 640)
bingding: prob (6001, 1, 1)
batch size is 1
warm_up->(640, 640, 3), time->7.82ms
warm_up->(640, 640, 3), time->4.51ms
warm_up->(640, 640, 3), time->4.55ms
warm_up->(640, 640, 3), time->4.61ms
warm_up->(640, 640, 3), time->5.11ms
warm_up->(640, 640, 3), time->4.81ms
warm_up->(640, 640, 3), time->4.56ms
warm_up->(640, 640, 3), time->4.75ms
warm_up->(640, 640, 3), time->4.52ms
warm_up->(640, 640, 3), time->4.91ms
input->['samples/COCO_train2014_000000421903.jpg'], time->4.57ms, saving into output/
input->['samples/COCO_train2014_000000145736.jpg'], time->5.38ms, saving into output/
input->['samples/COCO_train2014_000000482834.jpg'], time->4.66ms, saving into output/
input->['samples/COCO_train2014_000000393241.jpg'], time->5.01ms, saving into output/
input->['samples/COCO_train2014_000000548377.jpg'], time->5.28ms, saving into output/
input->['samples/COCO_train2014_000000329954.jpg'], time->4.93ms, saving into output/
...
input->['samples/COCO_train2014_000000141181.jpg'], time->5.19ms, saving into output/
input->['samples/COCO_train2014_000000221565.jpg'], time->5.15ms, saving into output/
input->['samples/COCO_train2014_000000366274.jpg'], time->4.76ms, saving into output/
input->['samples/COCO_train2014_000000048824.jpg'], time->4.80ms, saving into output/
1000 images, each at 640x640 resolution
Average: 5 ms per image, i.e. about 200 fps
GPU memory usage:
Wed Sep 8 15:39:14 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01 Driver Version: 470.63.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 27% 40C P0 20W / 75W | 2303MiB / 3903MiB | 23% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1469 G /usr/lib/xorg/Xorg 242MiB |
| 0 N/A N/A 9979 C python 2057MiB |
+-----------------------------------------------------------------------------+
References
README_mAP.md
tensorrt_demo
Three TensorRT acceleration approaches for Yolov5 and benchmark results on an RTX 3090 (C++ and Python torchtrt versions)