花花少年

Accelerating EfficientDet Inference with TensorRT (Part 1)

I. References

EfficientDet tensorrt

II. Environment

1. System environment

Environment
Operating System + Version: Ubuntu 16.04
TensorRT Version: 8.0.1.6
GPU Type: GeForce GTX 1650, 4 GB
Nvidia Driver Version: 470.63.01
CUDA Version: 11.0.194
CUDNN Version: 8.0.5
Python Version (if applicable): 3.7.11
Anaconda Version: 4.10.3
gcc: 7.5.0
g++: 7.5.0

2. requirements-gpu.txt

absl-py==0.13.0
appdirs==1.4.4
astunparse==1.6.3
attrs==21.2.0
cached-property==1.5.2
cachetools==4.2.2
certifi==2021.5.30
charset-normalizer==2.0.5
clang==5.0
cycler==0.10.0
Cython==0.29.24
dm-tree==0.1.6
flatbuffers==1.12
gast==0.4.0
google-auth==1.35.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
graphsurgeon @ file:///.../TensorRT-8.0.1.6/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
grpcio==1.34.1
h5py==3.1.0
idna==3.2
importlib-metadata==4.8.1
keras==2.6.0
keras-nightly==2.5.0.dev2021032900
Keras-Preprocessing==1.1.2
kiwisolver==1.3.2
lxml==4.6.3
Mako==1.1.5
Markdown==3.3.4
MarkupSafe==2.0.1
matplotlib==3.4.3
neural-structured-learning==1.3.1
numpy==1.19.5
oauthlib==3.1.1
onnx==1.8.1
onnx-graphsurgeon==0.3.11
onnxruntime==1.8.0
opt-einsum==3.3.0
Pillow==8.3.2
protobuf==3.18.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycocotools==2.0
pycuda==2021.1
pyparsing==2.4.7
python-dateutil==2.8.2
pytools==2021.2.8
PyYAML==5.4.1
requests==2.26.0
requests-oauthlib==1.3.0
rsa==4.7.2
scipy==1.7.1
six==1.15.0
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorflow-addons==0.14.0
tensorflow-estimator==2.5.0
tensorflow-gpu @ file:///.../tensorflow_gpu-2.5.0-cp37-cp37m-manylinux2010_x86_64.whl
tensorflow-hub==0.12.0
tensorflow-model-optimization==0.6.0
tensorrt @ file:///.../TensorRT-8.0.1.6/python/tensorrt-8.0.1.6-cp37-none-linux_x86_64.whl
termcolor==1.1.0
tf2onnx==1.8.1
typeguard==2.12.1
typing-extensions==3.7.4.3
uff @ file:///.../TensorRT-8.0.1.6/uff/uff-0.6.9-py2.py3-none-any.whl
urllib3==1.26.6
Werkzeug==2.0.1
wrapt==1.12.1
zipp==3.5.0
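To catch version drift against the pins above, a small check can compare the `name==version` lines with what is actually installed in the active environment. This is a minimal sketch, not part of either repository: `parse_pins` and `check_pins` are hypothetical helper names, and lines of the form `name @ file://...` (graphsurgeon, tensorflow-gpu, tensorrt, uff) are deliberately skipped because they carry no `==` pin.

```python
import re

try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: fall back to the importlib-metadata backport pinned above
    import importlib_metadata as metadata

# Only plain "name==version" lines are treated as pins.
PIN_LINE = re.compile(r"^([A-Za-z0-9_.\-]+)==(\S+)$")


def parse_pins(text):
    """Return {name: version} for 'name==version' lines; other lines are ignored."""
    pins = {}
    for line in text.splitlines():
        match = PIN_LINE.match(line.strip())
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def check_pins(pins):
    """Yield (name, pinned, installed) for every package that differs from its pin.

    installed is None when the package is not installed at all.
    """
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            yield name, pinned, installed
```

For example, `list(check_pins(parse_pins(open("requirements-gpu.txt").read())))` lists every pinned package whose installed version does not match.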

3. tensorRT-efficientdet.yaml

name: tensorRT-efficientdet
channels:
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=4.5=1_gnu
  - ca-certificates=2021.7.5=h06a4308_1
  - certifi=2021.5.30=py37h06a4308_0
  - ld_impl_linux-64=2.35.1=h7274673_9
  - libffi=3.3=he6710b0_2
  - libgcc-ng=9.3.0=h5101ec6_17
  - libgomp=9.3.0=h5101ec6_17
  - libstdcxx-ng=9.3.0=hd4cf53a_17
  - ncurses=6.2=he6710b0_1
  - openssl=1.1.1l=h7f8727e_0
  - pip=21.0.1=py37h06a4308_0
  - python=3.7.11=h12debd9_0
  - readline=8.1=h27cfd23_0
  - setuptools=58.0.4=py37h06a4308_0
  - sqlite=3.36.0=hc218d9a_0
  - tk=8.6.10=hbc83047_0
  - wheel=0.37.0=pyhd3eb1b0_1
  - xz=5.2.5=h7b6447c_0
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - absl-py==0.13.0
    - appdirs==1.4.4
    - astunparse==1.6.3
    - attrs==21.2.0
    - cached-property==1.5.2
    - cachetools==4.2.2
    - charset-normalizer==2.0.5
    - clang==5.0
    - cycler==0.10.0
    - cython==0.29.24
    - dm-tree==0.1.6
    - flatbuffers==1.12
    - gast==0.4.0
    - google-auth==1.35.0
    - google-auth-oauthlib==0.4.6
    - google-pasta==0.2.0
    - graphsurgeon==0.4.5
    - grpcio==1.34.1
    - h5py==3.1.0
    - idna==3.2
    - importlib-metadata==4.8.1
    - keras==2.6.0
    - keras-nightly==2.5.0.dev2021032900
    - keras-preprocessing==1.1.2
    - kiwisolver==1.3.2
    - lxml==4.6.3
    - mako==1.1.5
    - markdown==3.3.4
    - markupsafe==2.0.1
    - matplotlib==3.4.3
    - neural-structured-learning==1.3.1
    - numpy==1.19.5
    - oauthlib==3.1.1
    - onnx==1.8.1
    - onnx-graphsurgeon==0.3.11
    - onnxruntime==1.8.0
    - opt-einsum==3.3.0
    - pillow==8.3.2
    - protobuf==3.18.0
    - pyasn1==0.4.8
    - pyasn1-modules==0.2.8
    - pycocotools==2.0
    - pycuda==2021.1
    - pyparsing==2.4.7
    - python-dateutil==2.8.2
    - pytools==2021.2.8
    - pyyaml==5.4.1
    - requests==2.26.0
    - requests-oauthlib==1.3.0
    - rsa==4.7.2
    - scipy==1.7.1
    - six==1.15.0
    - tensorboard==2.6.0
    - tensorboard-data-server==0.6.1
    - tensorboard-plugin-wit==1.8.0
    - tensorflow-addons==0.14.0
    - tensorflow-estimator==2.5.0
    - tensorflow-gpu==2.5.0
    - tensorflow-hub==0.12.0
    - tensorflow-model-optimization==0.6.0
    - tensorrt==8.0.1.6
    - termcolor==1.1.0
    - tf2onnx==1.8.1
    - typeguard==2.12.1
    - typing-extensions==3.7.4.3
    - uff==0.6.9
    - urllib3==1.26.6
    - werkzeug==2.0.1
    - wrapt==1.12.1
    - zipp==3.5.0
prefix: /home/yichao/miniconda3/envs/tensorRT-efficientdet

III. Notes

If you are performing this conversion to run inference on the edge, such as for NVIDIA Jetson devices, it might be easier to do the ONNX conversion on a PC first.
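graphsurgeon is the tool used to edit a TensorFlow .pb graph (for example, to replace unsupported ops with plugin nodes) before it is converted to .uff.
Custom TensorRT plugins (for operations that TensorRT does not support natively).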

IV. Background

Pre-processing and post-processing

Pre-processing and inference: the network outputs class confidences and anchor-based bounding-box offset predictions for feature maps at 5 different scales.
Post-processing is written with the TensorFlow 1.x API, so it can be exported into the TensorFlow GraphDef. The post-processing pipeline is:
- merge all outputs into one tensor;
- select the top-K results by confidence;
- decode class labels and bounding-box coordinates from the top-K results;
- run NMS for each class.
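The four post-processing steps can be sketched in NumPy as below. This is an illustration under stated assumptions, not the automl implementation: per-level class outputs are assumed to be logits of shape `[N_l, num_classes]`, box outputs `(ty, tx, th, tw)` offsets of shape `[N_l, 4]`, and anchors `(y1, x1, y2, x2)` boxes concatenated in the same level order; the box coding is the common Faster R-CNN style, which should be checked against the actual model. In the TensorRT pipeline described later, this stage is handled by the `EfficientNMS_TRT` plugin that appears in the engine-build logs.

```python
import numpy as np


def decode_boxes(deltas, anchors):
    """Decode (ty, tx, th, tw) offsets against (y1, x1, y2, x2) anchors.

    Assumes Faster R-CNN-style box coding; verify against your model's box coder.
    """
    ya = (anchors[:, 0] + anchors[:, 2]) / 2
    xa = (anchors[:, 1] + anchors[:, 3]) / 2
    ha = anchors[:, 2] - anchors[:, 0]
    wa = anchors[:, 3] - anchors[:, 1]
    ty, tx, th, tw = deltas.T
    yc, xc = ty * ha + ya, tx * wa + xa
    h, w = np.exp(th) * ha, np.exp(tw) * wa
    return np.stack([yc - h / 2, xc - w / 2, yc + h / 2, xc + w / 2], axis=1)


def iou(box, boxes):
    """IoU between one (y1, x1, y2, x2) box and an [N, 4] array of boxes."""
    y1 = np.maximum(box[0], boxes[:, 0])
    x1 = np.maximum(box[1], boxes[:, 1])
    y2 = np.minimum(box[2], boxes[:, 2])
    x2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(y2 - y1, 0, None) * np.clip(x2 - x1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)


def nms(boxes, scores, iou_thresh):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(best)
        order = rest[iou(boxes[best], boxes[rest]) < iou_thresh]
    return np.array(keep, dtype=np.int64)


def postprocess(cls_outputs, box_outputs, anchors, top_k=5000, iou_thresh=0.5, max_det=100):
    # Step 1: merge the per-level outputs into one tensor each.
    cls = np.concatenate(cls_outputs, axis=0)  # [N, num_classes] logits
    box = np.concatenate(box_outputs, axis=0)  # [N, 4] offsets
    num_classes = cls.shape[1]
    # Step 2: top-K over all (anchor, class) pairs.
    flat = cls.reshape(-1)
    k = min(top_k, flat.size)
    top = np.argpartition(-flat, k - 1)[:k]
    anchor_idx, class_idx = np.divmod(top, num_classes)
    scores = 1.0 / (1.0 + np.exp(-flat[top]))  # sigmoid of the logits
    # Step 3: decode class labels (class_idx) and box coordinates.
    boxes = decode_boxes(box[anchor_idx], anchors[anchor_idx])
    # Step 4: NMS separately for each class.
    dets = []
    for c in np.unique(class_idx):
        sel = np.flatnonzero(class_idx == c)
        for j in sel[nms(boxes[sel], scores[sel], iou_thresh)]:
            dets.append((boxes[j], float(scores[j]), int(c)))
    dets.sort(key=lambda d: -d[1])
    return dets[:max_det]
```

The defaults `top_k=5000` and `iou_thresh=0.5` are illustrative; `max_det=100` simply mirrors the `(1, 100)` detection outputs reported in the engine logs.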

Typical workflow

Option 1: checkpoint (model.index / model.meta / model.data-00000-of-00001) --> frozen_inference_graph.pb --> model.onnx --> model.trt --> inference.
Option 2: frozen_inference_graph.pb --> UFF --> TensorRT.

V. Key steps

1. Clone the official EfficientDet repository (google/automl) and install its dependencies

git clone https://github.com/google/automl.git

cd /home/yichao/MyDocuments/automl/efficientdet

pip install -r requirements.txt

pip install pyyaml
pip install tensorflow-model-optimization
pip install matplotlib

2. Download and extract the pre-trained model

efficientdet-d0

# /home/yichao/Downloads/efficientdet-d0

├── efficientdet-d0
│   ├── checkpoint
│   ├── d0_coco_test-dev2017.txt
│   ├── d0_coco_val.txt
│   ├── model.data-00000-of-00001
│   ├── model.index
│   └── model.meta

3. Model conversion: export to .pb files

cd /home/yichao/MyDocuments/automl/efficientdet

python model_inspect.py \
    --runmode saved_model \
    --model_name efficientdet-d0 \
    --ckpt_path /home/yichao/Downloads/efficientdet-d0 \
    --saved_model_dir /home/yichao/Downloads/saved_model

# /home/yichao/Downloads/saved_model

├── saved_model
│   ├── efficientdet-d0_frozen.pb
│   ├── saved_model.pb
│   └── variables

# saved_model.pb, 7.3 MB

4. Clone NVIDIA's official TensorRT repository (EfficientDet sample) and generate the ONNX model

git clone https://github.com/NVIDIA/TensorRT.git

cd /home/yichao/MyDocuments/TensorRT/samples/python/efficientdet

pip install -r requirements.txt

pip install onnx-graphsurgeon --index-url https://pypi.ngc.nvidia.com

cd /home/yichao/MyDocuments/TensorRT/samples/python/efficientdet

python create_onnx.py \
    --input_shape '1,512,512,3' \
    --saved_model /home/yichao/Downloads/saved_model \
    --onnx /home/yichao/Downloads/saved_model_onnx/model.onnx
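`--input_shape '1,512,512,3'` fixes the model input to a single NHWC image of 512x512x3. A minimal NumPy sketch of preparing such a batch follows. The aspect-preserving resize, bottom/right padding, nearest-neighbour sampling and the assumption that normalisation happens inside the exported graph are all choices made for illustration; check them against the sample's own image-loading code before relying on them. `resize_pad_nhwc` is a hypothetical name.

```python
import numpy as np


def resize_pad_nhwc(image, size=512, pad_value=0.0):
    """Letterbox an HxWx3 image into a (1, size, size, 3) float32 NHWC batch.

    Assumptions for illustration: nearest-neighbour resize, padding on the
    bottom/right, and raw [0, 255] pixel values (normalisation assumed to be
    part of the exported graph).
    """
    h, w = image.shape[:2]
    scale = size / max(h, w)
    new_h = min(size, max(1, round(h * scale)))
    new_w = min(size, max(1, round(w * scale)))
    # Nearest-neighbour source indices for every output row and column.
    rows = np.minimum((np.arange(new_h) / scale).astype(np.int64), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(np.int64), w - 1)
    resized = image[rows[:, None], cols[None, :]]
    canvas = np.full((size, size, 3), pad_value, dtype=np.float32)
    canvas[:new_h, :new_w] = resized
    # Leading batch dimension to match --input_shape '1,512,512,3'.
    return canvas[np.newaxis], scale
```

Boxes predicted on the padded 512x512 image map back to the original image by dividing their coordinates by the returned `scale`.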

# /home/yichao/Downloads/saved_model_onnx

├── saved_model_onnx
│   └── model.onnx

# model.onnx, 16.5 MB

5. Build the TensorRT engine

TensorRT FP32

cd /home/yichao/MyDocuments/TensorRT/samples/python/efficientdet

python build_engine.py \
    --onnx /home/yichao/Downloads/saved_model_onnx/model.onnx \
    --engine /home/yichao/Downloads/saved_model_trt_fp32/engine.trt \
    --precision fp32

(tensorRT-efficientdet) yichao@yichao:~/MyDocuments/TensorRT/samples/python/efficientdet$ time python build_engine.py \
>     --onnx /home/yichao/Downloads/saved_model_onnx/model.onnx \
>     --engine /home/yichao/Downloads/saved_model_trt_fp32/engine.trt \
>     --precision fp32
[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +320, GPU +0, now: CPU 343, GPU 476 (MiB)
[TensorRT] WARNING: onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] INFO: No importer registered for op: EfficientNMS_TRT. Attempting to import as plugin.
[TensorRT] INFO: Searching for plugin: EfficientNMS_TRT, plugin_version: 1, plugin_namespace: 
[TensorRT] INFO: Successfully created plugin: EfficientNMS_TRT
INFO:EngineBuilder:Network Description
INFO:EngineBuilder:Input 'image_arrays:0' with shape (1, 512, 512, 3) and dtype DataType.FLOAT
INFO:EngineBuilder:Output 'num_detections' with shape (1, 1) and dtype DataType.INT32
INFO:EngineBuilder:Output 'detection_boxes' with shape (1, 100, 4) and dtype DataType.FLOAT
INFO:EngineBuilder:Output 'detection_scores' with shape (1, 100) and dtype DataType.FLOAT
INFO:EngineBuilder:Output 'detection_classes' with shape (1, 100) and dtype DataType.INT32
INFO:EngineBuilder:Building fp32 Engine in /home/yichao/Downloads/saved_model_trt_fp32/engine.trt
build_engine.py:203: DeprecationWarning: Use build_serialized_network instead.
  with self.builder.build_engine(self.network, self.config) as engine, open(engine_path, "wb") as f:
[TensorRT] INFO: [MemUsageSnapshot] Builder begin: CPU 361 MiB, GPU 476 MiB
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.1.0
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +357, GPU +148, now: CPU 721, GPU 624 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +118, GPU +148, now: CPU 839, GPU 772 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
[TensorRT] INFO: Detected 1 inputs and 4 output network tensors.
[TensorRT] INFO: Total Host Persistent Memory: 321392
[TensorRT] INFO: Total Device Persistent Memory: 16182272
[TensorRT] INFO: Total Scratch Memory: 106066176
[TensorRT] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 12 MiB, GPU 8 MiB
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.1.0
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1267, GPU 970 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 1268, GPU 978 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1268, GPU 962 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1267, GPU 944 (MiB)
[TensorRT] INFO: [MemUsageSnapshot] Builder end: CPU 1266 MiB, GPU 944 MiB
INFO:EngineBuilder:Serializing engine to file: /home/yichao/Downloads/saved_model_trt_fp32/engine.trt

real	1m51.712s
user	1m33.469s
sys	0m3.339s

# engine.trt, 29.7 MB

GPU memory usage

Sat Sep 18 11:39:08 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   42C    P0    19W /  75W |   1750MiB /  3903MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1465      G   /usr/lib/xorg/Xorg                285MiB |
|    0   N/A  N/A     20266      C   python                           1461MiB |
+-----------------------------------------------------------------------------+
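The table above is a one-off `nvidia-smi` snapshot. To record memory usage programmatically (for example once per engine precision), `nvidia-smi` can print just the needed fields as CSV with `--query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits`; the sketch below wraps that call. `parse_memory_csv` and `query_gpu_memory` are hypothetical helper names.

```python
import subprocess

# Standard nvidia-smi query options; with nounits, memory values are plain MiB numbers.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Turn 'index, used, total' CSV rows into a list of dicts (values in MiB)."""
    rows = []
    for line in text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        rows.append({"gpu": index, "used_mib": used, "total_mib": total})
    return rows


def query_gpu_memory():
    """Run nvidia-smi once and return per-GPU memory usage."""
    result = subprocess.run(
        QUERY_CMD, check=True, stdout=subprocess.PIPE, universal_newlines=True
    )
    return parse_memory_csv(result.stdout)
```

On the machine above, the FP32 snapshot would correspond to a row such as `{"gpu": 0, "used_mib": 1750, "total_mib": 3903}`. Per-process usage, as in the lower half of the table, is available separately through `--query-compute-apps`.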

TensorRT FP16

cd /home/yichao/MyDocuments/TensorRT/samples/python/efficientdet

python build_engine.py \
    --onnx /home/yichao/Downloads/saved_model_onnx/model.onnx \
    --engine /home/yichao/Downloads/saved_model_trt_fp16/engine.trt \
    --precision fp16

(tensorRT-efficientdet) yichao@yichao:~/MyDocuments/TensorRT/samples/python/efficientdet$ time python build_engine.py \
>     --onnx /home/yichao/Downloads/saved_model_onnx/model.onnx \
>     --engine /home/yichao/Downloads/saved_model_trt_fp16/engine.trt \
>     --precision fp16
[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +320, GPU +0, now: CPU 343, GPU 496 (MiB)
[TensorRT] WARNING: onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] INFO: No importer registered for op: EfficientNMS_TRT. Attempting to import as plugin.
[TensorRT] INFO: Searching for plugin: EfficientNMS_TRT, plugin_version: 1, plugin_namespace: 
[TensorRT] INFO: Successfully created plugin: EfficientNMS_TRT
INFO:EngineBuilder:Network Description
INFO:EngineBuilder:Input 'image_arrays:0' with shape (1, 512, 512, 3) and dtype DataType.FLOAT
INFO:EngineBuilder:Output 'num_detections' with shape (1, 1) and dtype DataType.INT32
INFO:EngineBuilder:Output 'detection_boxes' with shape (1, 100, 4) and dtype DataType.FLOAT
INFO:EngineBuilder:Output 'detection_scores' with shape (1, 100) and dtype DataType.FLOAT
INFO:EngineBuilder:Output 'detection_classes' with shape (1, 100) and dtype DataType.INT32
INFO:EngineBuilder:Building fp16 Engine in /home/yichao/Downloads/saved_model_trt_fp16/engine.trt
build_engine.py:203: DeprecationWarning: Use build_serialized_network instead.
  with self.builder.build_engine(self.network, self.config) as engine, open(engine_path, "wb") as f:
[TensorRT] INFO: [MemUsageSnapshot] Builder begin: CPU 361 MiB, GPU 496 MiB
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.1.0
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +357, GPU +148, now: CPU 721, GPU 644 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +118, GPU +148, now: CPU 839, GPU 792 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
[TensorRT] INFO: Detected 1 inputs and 4 output network tensors.
[TensorRT] INFO: Total Host Persistent Memory: 295824
[TensorRT] INFO: Total Device Persistent Memory: 8877568
[TensorRT] INFO: Total Scratch Memory: 88388864
[TensorRT] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 12 MiB, GPU 8 MiB
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.1.0
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 1294, GPU 984 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1294, GPU 992 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1294, GPU 976 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1293, GPU 958 (MiB)
[TensorRT] INFO: [MemUsageSnapshot] Builder end: CPU 1290 MiB, GPU 958 MiB
INFO:EngineBuilder:Serializing engine to file: /home/yichao/Downloads/saved_model_trt_fp16/engine.trt

real	12m14.915s
user	10m47.976s
sys	0m17.289s

# engine.trt, 23.4 MB
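The FP32 and FP16 build logs can also be compared directly. Total Device Persistent Memory falls from 16182272 bytes (about 15.4 MiB) to 8877568 bytes (about 8.5 MiB), and Total Scratch Memory falls from 106066176 bytes (about 101.2 MiB) to 88388864 bytes (about 84.3 MiB). A small regex helper makes that conversion repeatable. It only assumes the `Total ... Memory: <bytes>` line format shown in the logs above; `memory_stats_mib` is a hypothetical name.

```python
import re

# Matches lines such as "[TensorRT] INFO: Total Scratch Memory: 106066176" (bytes).
MEMORY_LINE = re.compile(r"Total (Host Persistent|Device Persistent|Scratch) Memory: (\d+)")


def memory_stats_mib(log_text):
    """Map each 'Total ... Memory' kind to its value in MiB (last occurrence wins)."""
    return {kind: int(num) / 2**20 for kind, num in MEMORY_LINE.findall(log_text)}
```

If a log contains several such groups (the INT8 build prints one during calibration and another for the final engine), the returned dictionary keeps the last value of each kind.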

GPU memory usage

Sat Sep 18 11:20:07 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 28%   42C    P0    19W /  75W |   2640MiB /  3903MiB |     10%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1465      G   /usr/lib/xorg/Xorg                305MiB |
|    0   N/A  N/A     19261      C   python                           2331MiB |
+-----------------------------------------------------------------------------+

TensorRT INT8
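INT8 engines need a scale for every tensor that runs in INT8. TensorRT uses symmetric quantization (the zero-point is 0), so a single per-tensor scale maps floats to 8-bit integers. The calibration run in the log that follows exists to estimate those scales, and its `Missing scale and zero-point` warnings list tensors that received none, so layers touching them fall back to a non-INT8 implementation. The sketch below shows the arithmetic with simple max-abs calibration. This is deliberately a simpler technique than the entropy calibration (`EntropyCalibration2`) named in the log, and the [-127, 127] range and round-half-to-even behaviour are illustrative choices; all function names are hypothetical.

```python
import numpy as np


def symmetric_int8_scale(amax):
    """Per-tensor scale for symmetric INT8: maps [-amax, amax] onto [-127, 127]."""
    return amax / 127.0


def quantize(x, scale):
    # Round to nearest (NumPy rounds halves to even), then saturate to the INT8 range.
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)


def dequantize(q, scale):
    """Map INT8 values back to floats; the error is at most scale / 2 inside [-amax, amax]."""
    return q.astype(np.float32) * scale


def max_abs_amax(calibration_tensors):
    """Max calibration: amax is the largest |value| seen over all calibration batches.

    NOTE: this is NOT TensorRT's entropy (KL-divergence) calibration; it only
    illustrates what a calibrated scale is used for.
    """
    return max(float(np.max(np.abs(t))) for t in calibration_tensors)
```

Values beyond `amax` saturate to ±127, which is why calibration data should resemble real inputs: outliers either stretch the scale (losing resolution) or get clipped.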

cd /home/yichao/MyDocuments/TensorRT/samples/python/efficientdet

python build_engine.py \
    --onnx /home/yichao/Downloads/saved_model_onnx/model.onnx \
    --engine /home/yichao/Downloads/saved_model_trt_int8/engine.trt \
    --precision int8 \
    --calib_input /home/yichao/Downloads/coco_calib \
    --calib_cache /home/yichao/Downloads/calibration/calibration.cache
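`--calib_input` points at a directory of calibration images. Judging from the log below (`Calibrating image 8 / 1000` through `Calibrated batch 124`), 1000 images are consumed in batches of 8, i.e. 1000 / 8 = 125 batches numbered 0 to 124. The sketch below reproduces that bookkeeping. `list_calibration_images`, `iter_batches` and the accepted file extensions are hypothetical, and dropping a trailing partial batch is a design choice of this sketch, not necessarily what `build_engine.py` does.

```python
import os

# Hypothetical set of accepted image extensions.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp")


def list_calibration_images(calib_dir, max_images=None):
    """Sorted list of image paths in calib_dir, optionally capped at max_images."""
    paths = sorted(
        os.path.join(calib_dir, name)
        for name in os.listdir(calib_dir)
        if name.lower().endswith(IMAGE_EXTS)
    )
    return paths[:max_images] if max_images else paths


def iter_batches(paths, batch_size):
    """Yield full batches only; a trailing partial batch is dropped."""
    for start in range(0, len(paths) - batch_size + 1, batch_size):
        yield paths[start:start + batch_size]
```

Dropping the remainder keeps every batch the same shape, which a fixed batch-size input needs; the cost is that up to `batch_size - 1` images go unused.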

(tensorRT-efficientdet) yichao@yichao:~/MyDocuments/TensorRT/samples/python/efficientdet$ time python build_engine.py \
>     --onnx /home/yichao/Downloads/saved_model_onnx/model.onnx \
>     --engine /home/yichao/Downloads/saved_model_trt_int8/engine.trt \
>     --precision int8 \
>     --calib_input /home/yichao/Downloads/coco_calib \
>     --calib_cache /home/yichao/Downloads/calibration/calibration.cache
[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +320, GPU +0, now: CPU 343, GPU 476 (MiB)
[TensorRT] WARNING: onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] INFO: No importer registered for op: EfficientNMS_TRT. Attempting to import as plugin.
[TensorRT] INFO: Searching for plugin: EfficientNMS_TRT, plugin_version: 1, plugin_namespace: 
[TensorRT] INFO: Successfully created plugin: EfficientNMS_TRT
INFO:EngineBuilder:Network Description
INFO:EngineBuilder:Input 'image_arrays:0' with shape (1, 512, 512, 3) and dtype DataType.FLOAT
INFO:EngineBuilder:Output 'num_detections' with shape (1, 1) and dtype DataType.INT32
INFO:EngineBuilder:Output 'detection_boxes' with shape (1, 100, 4) and dtype DataType.FLOAT
INFO:EngineBuilder:Output 'detection_scores' with shape (1, 100) and dtype DataType.FLOAT
INFO:EngineBuilder:Output 'detection_classes' with shape (1, 100) and dtype DataType.INT32
INFO:EngineBuilder:Building int8 Engine in /home/yichao/Downloads/saved_model_trt/engine.trt
build_engine.py:203: DeprecationWarning: Use build_serialized_network instead.
  with self.builder.build_engine(self.network, self.config) as engine, open(engine_path, "wb") as f:
[TensorRT] INFO: [MemUsageSnapshot] Builder begin: CPU 361 MiB, GPU 500 MiB
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.1.0
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +357, GPU +148, now: CPU 720, GPU 648 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +128, GPU +148, now: CPU 848, GPU 796 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[TensorRT] INFO: Detected 1 inputs and 4 output network tensors.
[TensorRT] INFO: Total Host Persistent Memory: 34896
[TensorRT] INFO: Total Device Persistent Memory: 0
[TensorRT] INFO: Total Scratch Memory: 106066176
[TensorRT] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 8 MiB
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.1.0
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1258, GPU 990 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1258, GPU 998 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1258, GPU 982 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1257, GPU 964 (MiB)
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation begin: CPU 1257 MiB, GPU 964 MiB
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.1.0
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +10, now: CPU 1258, GPU 974 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1258, GPU 982 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation end: CPU 1258 MiB, GPU 1138 MiB
[TensorRT] INFO: Starting Calibration.
INFO:EngineBuilder:Calibrating image 8 / 1000
[TensorRT] INFO:   Calibrated batch 0 in 0.299519 seconds.
INFO:EngineBuilder:Calibrating image 16 / 1000
...
[TensorRT] INFO:   Calibrated batch 123 in 0.275864 seconds.
INFO:EngineBuilder:Calibrating image 1000 / 1000
[TensorRT] INFO:   Calibrated batch 124 in 0.289 seconds.
INFO:EngineBuilder:Finished calibration batches
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1349, GPU 1122 (MiB)
[TensorRT] INFO:   Post Processing Calibration data in 81.7665 seconds.
[TensorRT] INFO: Calibration completed in 129.299 seconds.
[TensorRT] INFO: Writing Calibration Cache for calibrator: TRT-8001-EntropyCalibration2
INFO:EngineBuilder:Writing calibration cache data to: /home/yichao/Downloads/calibration/calibration.cache
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 82) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 177) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 180) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 263) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 266) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 270) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 273) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 277) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 284) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 297) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 300) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 305) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 308) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 314) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 329) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 332) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 338) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 341) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 356) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 361) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 366) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 370) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 385) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 390) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 395) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 398) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 413) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 418) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 421) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 425) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 434) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 440) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 449) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 453) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 462) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 465) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 469) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 478) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 481) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 485) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 494) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 497) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 501) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 510) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 515) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 524) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 530) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 539) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 545) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 554) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 560) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 572) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 600) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 644) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: Missing scale and zero-point for tensor (Unnamed Layer* 692) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.1.0
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 1322, GPU 958 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1322, GPU 966 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
[TensorRT] INFO: Detected 1 inputs and 4 output network tensors.
[TensorRT] INFO: Total Host Persistent Memory: 306784
[TensorRT] INFO: Total Device Persistent Memory: 6846464
[TensorRT] INFO: Total Scratch Memory: 88388864
[TensorRT] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 20 MiB, GPU 158 MiB
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.1.0
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1369, GPU 986 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1369, GPU 994 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1369, GPU 978 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1368, GPU 962 (MiB)
[TensorRT] INFO: [MemUsageSnapshot] Builder end: CPU 1364 MiB, GPU 962 MiB
INFO:EngineBuilder:Serializing engine to file: /home/yichao/Downloads/saved_model_trt/engine.trt

real	25m27.581s
user	23m12.319s
sys	0m43.315s

# engine.trt, 22.2 MB

GPU memory usage

Sat Sep 18 09:48:55 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 28%   42C    P0    18W /  75W |   1784MiB /  3903MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1465      G   /usr/lib/xorg/Xorg                285MiB |
|    0   N/A  N/A     15585      C   python                           1495MiB |
+-----------------------------------------------------------------------------+
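The GPU memory figures in this post are nvidia-smi snapshots taken while each script was running. If you want to sample them from a script rather than copying the table by hand, here is a minimal sketch. It assumes `nvidia-smi` is on `PATH` and uses its CSV query mode. The helper names are hypothetical.

```python
import subprocess


def parse_memory_csv(text):
    """Parse `nvidia-smi ... --format=csv,noheader,nounits` output into tuples of ints."""
    rows = []
    for line in text.strip().splitlines():
        first, second = (int(field.strip()) for field in line.split(","))
        rows.append((first, second))
    return rows


def query_gpu_memory():
    """Return [(used_MiB, total_MiB), ...] for every GPU visible to nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_csv(out)


def query_process_memory():
    """Return {pid: used_MiB} for compute processes (the 'C' rows of the table)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {pid: used for pid, used in parse_memory_csv(out)}
```

`query_process_memory()` should roughly correspond to the `python` row of the table (1495 MiB in the run above). Some drivers report `[N/A]` for per-process memory, and this parser does not handle that case.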

6. Inference

TensorRT FP32

cd /home/yichao/MyDocuments/TensorRT/samples/python/efficientdet

python infer.py \
    --engine /home/yichao/Downloads/saved_model_trt_fp32/engine.trt \
    --input /home/yichao/Downloads/coco_calib \
    --output /home/yichao/Downloads/infer_fp32
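The same infer.py command is repeated below for the FP16 and INT8 engines. Only the engine directory and the output directory change. Here is a small driver sketch that builds all three commands. It assumes the directory layout used here (`saved_model_trt_<precision>/engine.trt` and `infer_<precision>`) and that it is run from the `samples/python/efficientdet` directory. `build_infer_cmd` and `run_all` are hypothetical helpers.

```python
import shlex
import subprocess

# Root directory copied from the commands above; adjust to your own layout.
ROOT = "/home/yichao/Downloads"


def build_infer_cmd(precision, root=ROOT):
    """Build the infer.py argument list for one engine precision (fp32/fp16/int8)."""
    return [
        "python", "infer.py",
        "--engine", f"{root}/saved_model_trt_{precision}/engine.trt",
        "--input", f"{root}/coco_calib",
        "--output", f"{root}/infer_{precision}",
    ]


def run_all(precisions=("fp32", "fp16", "int8"), dry_run=True):
    """Print (dry_run=True) or execute the infer.py command for each precision."""
    for precision in precisions:
        cmd = build_infer_cmd(precision)
        # shlex.join only exists from Python 3.8; this environment is Python 3.7.
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Call `run_all()` to print the three commands, or `run_all(dry_run=False)` to run them one after another.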

(tensorRT-efficientdet) yichao@yichao:/media/yichao/蚁巢文件/YOYOFile/YOYOFile/PyProjects/TensorRT/samples/python/efficientdet$ time python infer.py     --engine /media/yichao/蚁巢文件/YOYOFile/EfficientDet/saved_model_trt_fp32/engine.trt     --input /media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib     --output /media/yichao/蚁巢文件/YOYOFile/EfficientDet/infer_fp32
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000000009.jpg']
Processing Image 1 / 1000
time_1: 511
len(images): 1
detections: [[{'ymin': -10.355606079101562, 'xmin': 299.74897384643555, 'ymax': 240.54559707641602, 'xmax': 637.5265502929688, 'score': 0.6842542, 'class': 50}, {'ymin': 80.77005386352539, 'xmin': -1.2041282653808594, 'ymax': 436.90185546875, 'xmax': 459.16343688964844, 'score': 0.62642604, 'class': 50}, {'ymin': 188.0276870727539, 'xmin': 27.04761505126953, 'ymax': 471.3081741333008, 'xmax': 601.4738464355469, 'score': 0.57705814, 'class': 50}, {'ymin': 222.76105880737305, 'xmin': 249.59787368774414, 'ymax': 473.05328369140625, 'xmax': 562.2936248779297, 'score': 0.5430005, 'class': 55}, {'ymin': 69.5292615890503, 'xmin': 388.29090118408203, 'ymax': 141.84276580810547, 'xmax': 470.1576614379883, 'score': 0.45278752, 'class': 54}, {'ymin': 6.308698654174805, 'xmin': 19.28295135498047, 'ymax': 293.5408401489258, 'xmax': 427.21588134765625, 'score': 0.4251722, 'class': 50}]]
time_2: 159
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000000151.jpg']
Processing Image 2 / 1000
time_1: 15
len(images): 1
detections: [[{'ymin': 326.95018768310547, 'xmin': 211.58906936645508, 'ymax': 365.53333282470703, 'xmax': 250.12731552124023, 'score': 0.84729725, 'class': 12}, {'ymin': 38.47867965698242, 'xmin': 200.59566497802734, 'ymax': 634.6967315673828, 'xmax': 473.6790084838867, 'score': 0.7619643, 'class': 6}]]
time_2: 108
...
...
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000580385.jpg']
Processing Image 998 / 1000
time_1: 14
len(images): 1
detections: [[{'ymin': 47.34233856201172, 'xmin': 125.20095825195312, 'ymax': 359.3656539916992, 'xmax': 525.5948638916016, 'score': 0.9364282, 'class': 6}]]
time_2: 117
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000581218.jpg']
Processing Image 999 / 1000
time_1: 15
len(images): 1
detections: [[{'ymin': 309.59455490112305, 'xmin': 241.9521713256836, 'ymax': 348.5172653198242, 'xmax': 271.9435501098633, 'score': 0.6911016, 'class': 4}, {'ymin': 147.70557403564453, 'xmin': 271.3910484313965, 'ymax': 188.7246322631836, 'xmax': 300.4056739807129, 'score': 0.6408937, 'class': 4}, {'ymin': 269.51610565185547, 'xmin': 260.4315185546875, 'ymax': 310.3202438354492, 'xmax': 289.44976806640625, 'score': 0.63147306, 'class': 4}, {'ymin': 234.6424102783203, 'xmin': 308.0194282531738, 'ymax': 274.24530029296875, 'xmax': 339.48490142822266, 'score': 0.52650064, 'class': 4}, {'ymin': 273.0849838256836, 'xmin': 298.9488983154297, 'ymax': 313.12862396240234, 'xmax': 325.66226959228516, 'score': 0.4886203, 'class': 4}, {'ymin': 233.1707763671875, 'xmin': 282.28904724121094, 'ymax': 268.44730377197266, 'xmax': 312.5209426879883, 'score': 0.41873103, 'class': 4}]]
time_2: 169
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000581766.jpg']
Processing Image 1000 / 1000
time_1: 15
len(images): 1
detections: [[{'ymin': 148.56109023094177, 'xmin': 204.04024422168732, 'ymax': 283.86542201042175, 'xmax': 297.4337339401245, 'score': 0.8775655, 'class': 69}, {'ymin': 153.7458896636963, 'xmin': 17.941415309906006, 'ymax': 285.7228219509125, 'xmax': 127.41473317146301, 'score': 0.81447136, 'class': 69}, {'ymin': 146.68381214141846, 'xmin': 373.2069432735443, 'ymax': 278.8565158843994, 'xmax': 481.57575726509094, 'score': 0.75143397, 'class': 69}]]
time_2: 111

get_batch time: 158031


Finished Processing

real	2m40.346s
user	2m30.624s
sys	0m4.989s

Conclusion: TensorRT FP32 takes 15 ms per image on average, i.e. about 66 fps.
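The `time_1` values printed by the author's modified infer.py look like per-image latencies in milliseconds: 511 for the first (warm-up) image, then 14 to 15. That reading is an assumption. Here is a sketch that pulls these values out of a saved log and averages them, skipping the warm-up iteration. The function names are hypothetical.

```python
import re

# Matches debug prints such as "time_1: 15", including the carriage-return-garbled
# form where the progress counter is glued on ("time_1: 511Image 1 / 1000").
TIME_1 = re.compile(r"time_1:\s*(\d+)")


def time_1_values(log_text):
    """Extract every time_1 value (assumed to be milliseconds) from an infer.py log."""
    return [int(v) for v in TIME_1.findall(log_text)]


def mean_latency_ms(values, skip_warmup=1):
    """Average the values after dropping the first `skip_warmup` warm-up iterations."""
    kept = values[skip_warmup:]
    if not kept:
        raise ValueError("no timings left after skipping warm-up iterations")
    return sum(kept) / len(kept)
```

For example, `mean_latency_ms(time_1_values(open("infer_fp32.log").read()))`, where `infer_fp32.log` is a hypothetical file holding the redirected output. The regular expression only reads the digits right after `time_1:`, so it works on both the raw terminal form and the cleaned-up form.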

GPU memory usage

Sat Sep 18 11:44:30 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   40C    P0    20W /  75W |   1146MiB /  3903MiB |     14%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1465      G   /usr/lib/xorg/Xorg                305MiB |
|    0   N/A  N/A     20470      C   python                            837MiB |
+-----------------------------------------------------------------------------+

TensorRT FP16

cd /home/yichao/MyDocuments/TensorRT/samples/python/efficientdet

python infer.py \
    --engine /home/yichao/Downloads/saved_model_trt_fp16/engine.trt \
    --input /home/yichao/Downloads/coco_calib \
    --output /home/yichao/Downloads/infer_fp16

(tensorRT-efficientdet) yichao@yichao:/media/yichao/蚁巢文件/YOYOFile/YOYOFile/PyProjects/TensorRT/samples/python/efficientdet$ time python infer.py \
>     --engine /media/yichao/蚁巢文件/YOYOFile/EfficientDet/saved_model_trt_fp16/engine.trt \
>     --input /media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib \
>     --output /media/yichao/蚁巢文件/YOYOFile/EfficientDet/infer_fp16
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000000009.jpg']
Processing Image 1 / 1000
time_1: 3582
len(images): 1
detections: [[{'ymin': -10.390625, 'xmin': 299.84375, 'ymax': 240.625, 'xmax': 637.5, 'score': 0.6845703, 'class': 50}, {'ymin': 80.9375, 'xmin': -1.25, 'ymax': 436.875, 'xmax': 459.0625, 'score': 0.625, 'class': 50}, {'ymin': 188.125, 'xmin': 27.1875, 'ymax': 471.25, 'xmax': 601.25, 'score': 0.57666016, 'class': 50}, {'ymin': 222.8125, 'xmin': 249.6875, 'ymax': 472.8125, 'xmax': 562.1875, 'score': 0.54296875, 'class': 55}, {'ymin': 69.53125, 'xmin': 388.125, 'ymax': 141.875, 'xmax': 470.0, 'score': 0.453125, 'class': 54}, {'ymin': 6.328125, 'xmin': 19.375, 'ymax': 293.4375, 'xmax': 427.1875, 'score': 0.42456055, 'class': 50}]]
time_2: 230
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000000151.jpg']
Processing Image 2 / 1000
time_1: 10
len(images): 1
detections: [[{'ymin': 326.875, 'xmin': 211.5625, 'ymax': 365.625, 'xmax': 250.0, 'score': 0.84716797, 'class': 12}, {'ymin': 38.28125, 'xmin': 200.625, 'ymax': 635.0, 'xmax': 473.75, 'score': 0.7636719, 'class': 6}]]
time_2: 100
...
...
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000580385.jpg']
Processing Image 998 / 1000
time_1: 11
len(images): 1
detections: [[{'ymin': 47.265625, 'xmin': 125.15625, 'ymax': 359.375, 'xmax': 525.625, 'score': 0.9370117, 'class': 6}]]
time_2: 114
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000581218.jpg']
Processing Image 999 / 1000
time_1: 11
len(images): 1
detections: [[{'ymin': 309.6875, 'xmin': 241.875, 'ymax': 348.4375, 'xmax': 271.875, 'score': 0.6899414, 'class': 4}, {'ymin': 147.734375, 'xmin': 271.40625, 'ymax': 188.75, 'xmax': 300.46875, 'score': 0.640625, 'class': 4}, {'ymin': 269.375, 'xmin': 260.46875, 'ymax': 310.3125, 'xmax': 289.53125, 'score': 0.6303711, 'class': 4}, {'ymin': 234.53125, 'xmin': 308.125, 'ymax': 274.21875, 'xmax': 339.375, 'score': 0.52734375, 'class': 4}, {'ymin': 273.125, 'xmin': 299.0625, 'ymax': 313.125, 'xmax': 325.625, 'score': 0.48632812, 'class': 4}, {'ymin': 233.125, 'xmin': 282.1875, 'ymax': 268.4375, 'xmax': 312.5, 'score': 0.41870117, 'class': 4}]]
time_2: 170
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000581766.jpg']
Processing Image 1000 / 1000
time_1: 10
len(images): 1
detections: [[{'ymin': 148.5595703125, 'xmin': 204.1015625, 'ymax': 283.69140625, 'xmax': 297.36328125, 'score': 0.8774414, 'class': 69}, {'ymin': 153.80859375, 'xmin': 17.9443359375, 'ymax': 285.64453125, 'xmax': 127.44140625, 'score': 0.81396484, 'class': 69}, {'ymin': 146.728515625, 'xmin': 373.291015625, 'ymax': 278.80859375, 'xmax': 481.689453125, 'score': 0.7519531, 'class': 69}]]
time_2: 113

get_batch time: 158293


Finished Processing

real	3m5.879s
user	2m28.488s
sys	0m5.079s

Conclusion: TensorRT FP16 takes 11 ms per image on average, i.e. about 90 fps.
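Compare the first detection on COCO_train2014_000000000009.jpg between the FP32 and FP16 runs. FP16 shifts each coordinate by only a fraction of a pixel. Intersection over union (IoU) summarizes that in one number. Here is a minimal sketch that works on the `ymin`/`xmin`/`ymax`/`xmax` dictionaries printed by infer.py. `box_iou` is a hypothetical helper, not part of the sample.

```python
def box_iou(a, b):
    """IoU of two boxes given as dicts with ymin/xmin/ymax/xmax keys (pixels)."""
    inter_h = max(0.0, min(a["ymax"], b["ymax"]) - max(a["ymin"], b["ymin"]))
    inter_w = max(0.0, min(a["xmax"], b["xmax"]) - max(a["xmin"], b["xmin"]))
    inter = inter_h * inter_w
    area_a = (a["ymax"] - a["ymin"]) * (a["xmax"] - a["xmin"])
    area_b = (b["ymax"] - b["ymin"]) * (b["xmax"] - b["xmin"])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# First detection on COCO_train2014_000000000009.jpg, copied from the FP32 and FP16 logs.
fp32_box = {"ymin": -10.355606079101562, "xmin": 299.74897384643555,
            "ymax": 240.54559707641602, "xmax": 637.5265502929688}
fp16_box = {"ymin": -10.390625, "xmin": 299.84375, "ymax": 240.625, "xmax": 637.5}
print(f"IoU(FP32, FP16) = {box_iou(fp32_box, fp16_box):.4f}")
```

For these two boxes the IoU comes out well above 0.99, so the FP16 box is practically the same as the FP32 one.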

GPU memory usage

Sat Sep 18 11:22:24 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   40C    P0    20W /  75W |   1094MiB /  3903MiB |     11%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1465      G   /usr/lib/xorg/Xorg                305MiB |
|    0   N/A  N/A     19579      C   python                            785MiB |
+-----------------------------------------------------------------------------+

TensorRT INT8

cd /home/yichao/MyDocuments/TensorRT/samples/python/efficientdet

python infer.py \
    --engine /home/yichao/Downloads/saved_model_trt_int8/engine.trt \
    --input /home/yichao/Downloads/coco_calib \
    --output /home/yichao/Downloads/infer_int8

(tensorRT-efficientdet) yichao@yichao:/media/yichao/蚁巢文件/YOYOFile/YOYOFile/PyProjects/TensorRT/samples/python/efficientdet$ time python infer.py \
>     --engine /media/yichao/蚁巢文件/YOYOFile/EfficientDet/saved_model_trt_int8/engine.trt \
>     --input /media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib \
>     --output /media/yichao/蚁巢文件/YOYOFile/EfficientDet/infer_int8
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000000009.jpg']
Processing Image 1 / 1000
time_1: 499
len(images): 1
detections: [[{'ymin': 224.6875, 'xmin': 251.40625, 'ymax': 476.5625, 'xmax': 561.875, 'score': 0.57666016, 'class': 55}, {'ymin': 1.09375, 'xmin': 311.5625, 'ymax': 238.75, 'xmax': 630.3125, 'score': 0.50146484, 'class': 50}, {'ymin': 68.125, 'xmin': -3.4375, 'ymax': 458.125, 'xmax': 527.5, 'score': 0.49902344, 'class': 50}]]
time_2: 165
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000000151.jpg']
Processing Image 2 / 1000
time_1: 6
len(images): 1
detections: [[{'ymin': 18.28125, 'xmin': 204.6875, 'ymax': 639.375, 'xmax': 505.9375, 'score': 0.5888672, 'class': 6}]]
time_2: 104
...
...
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000580385.jpg']
Processing Image 998 / 1000
time_1: 6
len(images): 1
detections: [[{'ymin': 45.0, 'xmin': 122.03125, 'ymax': 361.875, 'xmax': 525.625, 'score': 0.9316406, 'class': 6}]]
time_2: 120
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000581218.jpg']
Processing Image 999 / 1000
time_1: 6
len(images): 1
detections: [[]]
time_2: 177
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000581766.jpg']
Processing Image 1000 / 1000
time_1: 6
len(images): 1
detections: [[{'ymin': 147.705078125, 'xmin': 204.1015625, 'ymax': 284.423828125, 'xmax': 297.36328125, 'score': 0.7817383, 'class': 69}, {'ymin': 151.3671875, 'xmin': 17.63916015625, 'ymax': 281.982421875, 'xmax': 128.173828125, 'score': 0.7714844, 'class': 69}, {'ymin': 145.99609375, 'xmin': 373.779296875, 'ymax': 277.34375, 'xmax': 482.666015625, 'score': 0.7338867, 'class': 69}]]
time_2: 108

get_batch time: 153597


Finished Processing

real	2m37.937s
user	2m25.907s
sys	0m2.906s

Conclusion: TensorRT INT8 takes 7 ms per image on average, i.e. about 142 fps.
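The fps figures in the three conclusions are simply 1000 / latency (ms), rounded down. Here is a sketch that computes throughput and speed-up over FP32 from the quoted FP32, FP16 and INT8 latencies. The names are illustrative only.

```python
# Average per-image latencies (ms) quoted in the FP32, FP16 and INT8 conclusions.
LATENCY_MS = {"FP32": 15, "FP16": 11, "INT8": 7}


def fps(latency_ms):
    """Throughput in frames per second for a per-image latency in milliseconds."""
    return 1000.0 / latency_ms


def speedup(latency_ms, baseline_ms=LATENCY_MS["FP32"]):
    """Speed-up factor relative to the FP32 baseline latency."""
    return baseline_ms / latency_ms


for name, ms in LATENCY_MS.items():
    print(f"{name}: {ms} ms -> {fps(ms):.1f} fps, {speedup(ms):.2f}x vs FP32")
```

With 15, 11 and 7 ms this gives roughly 1.36x for FP16 and 2.14x for INT8 relative to FP32. These are single-image (batch size 1) numbers on the GTX 1650 used here.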

GPU memory usage

Tue Sep 28 18:35:05 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 28%   43C    P0    18W /  75W |    935MiB /  3903MiB |     10%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1553      G   /usr/lib/xorg/Xorg                156MiB |
|    0   N/A  N/A      7282      C   python                            775MiB |
+-----------------------------------------------------------------------------+

7. Evaluation metrics

TensorRT FP32

cd /home/yichao/MyDocuments/TensorRT/samples/python/efficientdet

python eval_coco.py \
    --engine /home/yichao/Downloads/saved_model_trt_fp32/engine.trt \
    --input /home/yichao/Downloads/COCO/val2017 \
    --annotations /home/yichao/Downloads/COCO/annotations/instances_val2017.json \
    --automl_path /home/yichao/MyDocuments/automl

(tensorRT-efficientdet) yichao@yichao:~/MyDocuments/TensorRT/samples/python/efficientdet$ time python eval_coco.py \
>     --engine /home/yichao/Downloads/saved_model_trt_fp32/engine.trt \
>     --input /home/yichao/Downloads/COCO/val2017 \
>     --annotations /home/yichao/Downloads/COCO/annotations/instances_val2017.json \
>     --automl_path /home/yichao/MyDocuments/automl
2021-09-18 11:47:43.730693: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Processing Image 5000 / 5000
loading annotations into memory...
Done (t=0.60s)
creating index...
index created!
Loading and preparing results...
Converting ndarray to lists...
(18145, 7)
0/18145
DONE (t=0.13s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=8.57s).
Accumulating evaluation results...
DONE (t=1.46s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.282
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.397
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.315
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.053
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.319
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.494
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.242
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.315
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.316
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.052
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.349
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.562

real	3m11.921s
user	2m3.908s
sys	0m17.163s

GPU memory usage

Sat Sep 18 11:47:57 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   45C    P0    38W /  75W |   1146MiB /  3903MiB |     34%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1465      G   /usr/lib/xorg/Xorg                305MiB |
|    0   N/A  N/A     20581      C   python                            837MiB |
+-----------------------------------------------------------------------------+

TensorRT FP16

cd /home/yichao/MyDocuments/TensorRT/samples/python/efficientdet

python eval_coco.py \
    --engine /home/yichao/Downloads/saved_model_trt_fp16/engine.trt \
    --input /home/yichao/Downloads/COCO/val2017 \
    --annotations /home/yichao/Downloads/COCO/annotations/instances_val2017.json \
    --automl_path /home/yichao/MyDocuments/automl

(tensorRT-efficientdet) yichao@yichao:~/MyDocuments/TensorRT/samples/python/efficientdet$ python eval_coco.py \
>     --engine /home/yichao/Downloads/saved_model_trt_fp16/engine.trt \
>     --input /home/yichao/Downloads/COCO/val2017 \
>     --annotations /home/yichao/Downloads/COCO/annotations/instances_val2017.json \
>     --automl_path /home/yichao/MyDocuments/automl
2021-09-17 18:20:02.568124: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Processing Image 5000 / 5000
loading annotations into memory...
Done (t=0.63s)
creating index...
index created!
Loading and preparing results...
Converting ndarray to lists...
(18146, 7)
0/18146
DONE (t=0.13s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=8.83s).
Accumulating evaluation results...
DONE (t=1.46s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.282
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.397
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.315
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.053
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.319
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.494
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.242
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.316
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.316
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.052
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.349
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.562

GPU memory usage

Sat Sep 18 11:27:42 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   43C    P0    28W /  75W |   1094MiB /  3903MiB |     36%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1465      G   /usr/lib/xorg/Xorg                305MiB |
|    0   N/A  N/A     19780      C   python                            785MiB |
+-----------------------------------------------------------------------------+

TensorRT INT8

cd /home/yichao/MyDocuments/TensorRT/samples/python/efficientdet

python eval_coco.py \
    --engine /home/yichao/Downloads/saved_model_trt_int8/engine.trt \
    --input /home/yichao/Downloads/COCO/val2017 \
    --annotations /home/yichao/Downloads/COCO/annotations/instances_val2017.json \
    --automl_path /home/yichao/MyDocuments/automl

(tensorRT-efficientdet) yichao@yichao:~/MyDocuments/TensorRT/samples/python/efficientdet$ time python eval_coco.py \
>     --engine /home/yichao/Downloads/saved_model_trt_int8/engine.trt \
>     --input /home/yichao/Downloads/COCO/val2017 \
>     --annotations /home/yichao/Downloads/COCO/annotations/instances_val2017.json \
>     --automl_path /home/yichao/MyDocuments/automl
2021-09-18 10:20:40.389173: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Processing Image 5000 / 5000
loading annotations into memory...
Done (t=0.65s)
creating index...
index created!
Loading and preparing results...
Converting ndarray to lists...
(14482, 7)
0/14482
DONE (t=0.13s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=7.95s).
Accumulating evaluation results...
DONE (t=1.38s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.227
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.318
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.254
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.024
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.231
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.430
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.199
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.254
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.254
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.022
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.245
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.492

real	2m40.845s
user	1m45.329s
sys	0m7.975s
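The FP16 engine matches FP32 to three decimals on every AP figure; only AR at maxDets=10 differs (0.315 vs 0.316). INT8, however, loses noticeably. Here is a small sketch that quantifies the drop from the numbers printed above. The dictionaries are copied from the logs and `accuracy_drop` is a hypothetical helper.

```python
# COCO summary numbers copied from the FP32 and INT8 eval_coco.py runs.
FP32 = {"AP": 0.282, "AP50": 0.397, "AP75": 0.315,
        "AP_small": 0.053, "AP_medium": 0.319, "AP_large": 0.494}
INT8 = {"AP": 0.227, "AP50": 0.318, "AP75": 0.254,
        "AP_small": 0.024, "AP_medium": 0.231, "AP_large": 0.430}


def accuracy_drop(reference, candidate):
    """Return {metric: (absolute drop, relative drop)} of candidate vs reference."""
    return {
        name: (reference[name] - candidate[name],
               (reference[name] - candidate[name]) / reference[name])
        for name in reference
    }


for name, (absolute, relative) in accuracy_drop(FP32, INT8).items():
    print(f"{name:10s} -{absolute:.3f}  ({relative:.1%})")
```

Overall AP falls by 0.055 (about 19.5% relative), and the relative loss is largest for small objects (about 55%).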

GPU memory usage

Sat Sep 18 10:23:19 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   42C    P0    18W /  75W |   1279MiB /  3903MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1465      G   /usr/lib/xorg/Xorg                305MiB |
|    0   N/A  N/A     16664      C   python                            972MiB |
+-----------------------------------------------------------------------------+

8. Comparing native TensorFlow with TensorRT

TensorRT FP32

cd /home/yichao/MyDocuments/TensorRT/samples/python/efficientdet

python compare_tf.py \
    --engine /home/yichao/Downloads/saved_model_trt_fp32/engine.trt \
    --saved_model /home/yichao/Downloads/saved_model \
    --input /home/yichao/Downloads/coco_calib \
    --output /home/yichao/Downloads/output_fp32
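compare_tf.py also loads the TensorFlow saved_model, so TensorFlow has to open the CUDA libraries through the dynamic loader. In the log that follows, it could not open `libcusolver.so.11` from the listed `LD_LIBRARY_PATH`. Here is a rough pre-check sketch using `ctypes.CDLL`, which goes through a similar `dlopen` search. The library names are taken from that log and `check_shared_libs` is a hypothetical helper.

```python
import ctypes


def check_shared_libs(names):
    """Return {library name: True if the dynamic loader can open it, else False}."""
    status = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            status[name] = True
        except OSError:
            status[name] = False
    return status


# Library names as they appear in the TensorFlow log of this run.
CUDA_LIBS = ["libcudart.so.11.0", "libcublas.so.11", "libcublasLt.so.11",
             "libcufft.so.10", "libcurand.so.10", "libcusolver.so.11",
             "libcusparse.so.11", "libcudnn.so.8"]

for lib, ok in check_shared_libs(CUDA_LIBS).items():
    print(f"{lib:20s} {'found' if ok else 'MISSING'}")
```

Run it in the same shell (same `LD_LIBRARY_PATH`) as compare_tf.py. A `MISSING` entry should correspond to a "Could not load dynamic library" warning from TensorFlow.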

(tensorRT-efficientdet) yichao@yichao:~/MyDocuments/TensorRT/samples/python/efficientdet$ time python compare_tf.py \
>     --engine /home/yichao/Downloads/saved_model_trt_fp32/engine.trt \
>     --saved_model /home/yichao/Downloads/saved_model \
>     --input /home/yichao/Downloads/coco_calib \
>     --output /home/yichao/Downloads/output_fp32
2021-09-18 11:51:59.627768: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-09-18 11:52:05.397166: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-09-18 11:52:05.397267: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-18 11:52:05.397505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1650 computeCapability: 7.5
coreClock: 1.68GHz coreCount: 14 deviceMemorySize: 3.81GiB deviceMemoryBandwidth: 119.24GiB/s
2021-09-18 11:52:05.397524: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-09-18 11:52:05.397555: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-09-18 11:52:05.397577: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-09-18 11:52:05.466217: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-09-18 11:52:05.466545: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-09-18 11:52:05.466862: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/home/yichao/360Downloads/TensorRT-8.0.1.6/lib:/home/yichao/miniconda3/envs/tensorRT-yolov5/lib/libmkl_intel_lp64.so
2021-09-18 11:52:05.495741: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-09-18 11:52:05.495966: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-09-18 11:52:05.496021: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-09-18 11:52:05.803052: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-18 11:52:05.803585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-18 11:52:05.803601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      
2021-09-18 11:52:12.462233: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-09-18 11:52:12.505942: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2899885000 Hz
Processing 100 / 100 images (TensorFlow)
Processing 100 / 100 images (TensorRT)
Processing 100 / 100 images (Visualization)

real	0m52.613s
user	1m23.590s
sys	0m3.248s
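The warning lines above show that `libcusolver.so.11` could not be opened. TensorFlow then prints `Skipping registering GPU devices...`, which suggests that the TensorFlow baseline in this run executed on the CPU. One plausible cause, not confirmed in this post, is a version mismatch: TensorFlow 2.5.0 was built against CUDA 11.2, while this machine has CUDA 11.0.

The sketch below assumes Linux and Python's standard `ctypes` module. It tries to `dlopen` each library named in the log, so you can see which one is missing before running the comparison. `check_libs` is a hypothetical helper.

```python
import ctypes

# Library sonames taken from the TensorFlow log above.
CUDA_LIBS = [
    "libcudart.so.11.0", "libcublas.so.11", "libcublasLt.so.11",
    "libcufft.so.10", "libcurand.so.10", "libcusolver.so.11",
    "libcusparse.so.11", "libcudnn.so.8",
]


def check_libs(names, loader=ctypes.CDLL):
    """Return {soname: True/False} depending on whether loading succeeds."""
    status = {}
    for name in names:
        try:
            loader(name)
            status[name] = True
        except OSError:  # raised by ctypes.CDLL when dlopen fails
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_libs(CUDA_LIBS).items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```

The search path is the same `LD_LIBRARY_PATH` that TensorFlow uses. Run the check from the same shell and conda environment as `compare_tf.py`.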

GPU memory usage

Sat Sep 18 11:52:37 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 33%   44C    P0    18W /  75W |   1146MiB /  3903MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1465      G   /usr/lib/xorg/Xorg                305MiB |
|    0   N/A  N/A     20720      C   python                            837MiB |
+-----------------------------------------------------------------------------+
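You can also query per-process GPU memory in CSV form instead of reading the full `nvidia-smi` table. This sketch assumes that `nvidia-smi` is on `PATH` and supports `--query-compute-apps`, as recent drivers such as the 470 series shown here do. `parse_compute_apps` and `query_compute_apps` are hypothetical helpers. The query reports only compute processes (type `C`, such as the `python` process above), so the Xorg graphics process is not listed.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Turn 'pid, name, MiB' CSV lines into (pid, name, mib) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # pid is the first field and memory the last; the name may contain commas.
        pid, rest = line.split(",", 1)
        name, mib = rest.rsplit(",", 1)
        rows.append((int(pid), name.strip(), int(mib)))
    return rows


def query_compute_apps():
    """Run nvidia-smi and return the parsed compute-process list."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Call `query_compute_apps()` from a second terminal while `compare_tf.py` is running. A row like `20720, python, 837` corresponds to the process entry in the table above.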

TensorRT FP16

cd /home/yichao/MyDocuments/TensorRT/samples/python/efficientdet

python compare_tf.py \
    --engine /home/yichao/Downloads/saved_model_trt_fp16/engine.trt \
    --saved_model /home/yichao/Downloads/saved_model \
    --input /home/yichao/Downloads/coco_calib \
    --output /home/yichao/Downloads/output_fp16

(tensorRT-efficientdet) yichao@yichao:/media/yichao/蚁巢文件/YOYOFile/YOYOFile/PyProjects/TensorRT/samples/python/efficientdet$ time python compare_tf.py     --engine /media/yichao/蚁巢文件/YOYOFile/EfficientDet/saved_model_trt_fp16/engine.trt     --saved_model /media/yichao/蚁巢文件/YOYOFile/EfficientDet/saved_model     --input /media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib     --output /media/yichao/蚁巢文件/YOYOFile/EfficientDet/output_fp16
2021-10-22 16:23:31.638538: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-10-22 16:23:32.897147: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-10-22 16:23:32.897287: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-22 16:23:32.897550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1650 computeCapability: 7.5
coreClock: 1.68GHz coreCount: 14 deviceMemorySize: 3.81GiB deviceMemoryBandwidth: 119.24GiB/s
2021-10-22 16:23:32.897571: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-10-22 16:23:32.897604: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-10-22 16:23:32.897625: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-10-22 16:23:32.898508: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-10-22 16:23:32.898545: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-10-22 16:23:32.898692: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cuda-10.2/lib64:/home/yichao/MyDocuments/PyProjects/software/TensorRT-7.0.0.11/lib:/home/yichao/miniconda3/envs/tensorRT-yolov5/lib/libmkl_intel_lp64.so
2021-10-22 16:23:32.899198: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-10-22 16:23:32.899227: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-10-22 16:23:32.899236: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-10-22 16:23:33.134158: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-22 16:23:33.134560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-10-22 16:23:33.134572: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      
2021-10-22 16:23:40.393064: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-10-22 16:23:40.438255: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2899885000 Hz
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000000009.jpg']
...
...
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000061328.jpg']
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000061399.jpg']
Processing 100 / 100 images (TensorFlow)
get_batch time: 15944


time_1: 15944
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000000009.jpg']
...
...
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000061328.jpg']
len(batch_images): ['/media/yichao/蚁巢文件/YOYOFile/EfficientDet/coco_calib/COCO_train2014_000000061399.jpg']
Processing 100 / 100 images (TensorRT)
get_batch time: 2519


time_2: 2519
Processing 100 / 100 images (Visualization)

real	0m35.517s
user	1m5.616s
sys	0m3.689s
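The FP32 and INT8 runs do not print the `get_batch time`, `time_1` and `time_2` lines, so these appear to be timing prints the author added to `compare_tf.py` for this run. Likewise, `len(batch_images)` prints the list of file names rather than its length.

Below is a minimal sketch of that kind of instrumentation. It assumes each value is the total wall-clock time in milliseconds over the 100 images. `timed_ms` is a hypothetical helper. The callable must block until its result is back on the host (for TensorRT, after the CUDA stream has been synchronized); otherwise asynchronous GPU work is not counted.

```python
import time


def timed_ms(fn, items):
    """Apply fn to every item; return (results, total elapsed time in ms).

    fn must block until its result is available on the host, otherwise
    asynchronous GPU work would finish after the clock has stopped.
    """
    start = time.perf_counter()
    results = [fn(item) for item in items]
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return results, elapsed_ms


if __name__ == "__main__":
    # Stand-in workload; in compare_tf.py the callable would run one batch.
    _, ms = timed_ms(lambda x: sum(range(x)), [10_000] * 100)
    print(f"time: {ms:.0f} ms")
```

`time.perf_counter` is a monotonic high-resolution clock, which makes it a better fit for interval timing than `time.time`.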

Data set 1:

Test data	Resolution	TensorFlow time (ms)	TensorRT time (ms)	Speedup
COCO, 100 images	640x480	16658	2577	6.5
COCO, 100 images	640x480	15944	2519	6.3
COCO, 100 images	640x480	16982	2559	6.6
COCO, 100 images	640x480	16026	2533	6.3
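The table reports the total time for 100 images. Dividing by the image count gives per-image latency and throughput. This sketch assumes each total covers exactly 100 images and excludes no warm-up. `per_image` is a hypothetical helper.

For the second row, TensorRT takes 2519 ms / 100 = 25.19 ms per image (about 39.7 images/s). TensorFlow takes 159.44 ms per image (about 6.3 images/s).

```python
def per_image(total_ms, n_images=100):
    """Return (latency in ms per image, throughput in images per second)."""
    latency_ms = total_ms / n_images
    return latency_ms, 1000.0 / latency_ms


if __name__ == "__main__":
    # Second row of data set 1: TensorFlow 15944 ms, TensorRT 2519 ms.
    for name, total in (("TensorFlow", 15944), ("TensorRT", 2519)):
        latency, fps = per_image(total)
        print(f"{name}: {latency:.2f} ms/image, {fps:.1f} images/s")
```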

Data set 2:

Test data	Resolution	TensorFlow time (ms)	TensorRT time (ms)	Speedup
COCO, 100 images	640x480	16089	5268	3.1
COCO, 100 images	640x480	11888	2509	4.7
COCO, 100 images	640x480	12012	3008	4.0
COCO, 100 images	640x480	11803	3183	3.7
COCO, 100 images	640x480	11776	3140	3.8

Test data	Resolution	TensorFlow time (ms)	TensorRT time (ms)	Speedup
person_horse, 100 images	1280x720	12891	3452	3.7
person_horse, 100 images	1280x720	14989	3480	4.3
person_horse, 100 images	1280x720	15565	3515	4.4
person_horse, 100 images	1280x720	12883	3456	3.7

Note: $\text{Speedup} = \frac{\text{TensorFlow time}}{\text{TensorRT time}}$
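The speedup column can be recomputed from the times with the formula above, rounded to one decimal. The sketch below uses the values from the three tables. `pooled_speedup` (the ratio of summed times) is an extra summary chosen here for illustration and is not part of the original measurements.

If TensorFlow fell back to the CPU, as the `Skipping registering GPU devices...` log lines suggest, these ratios compare TensorRT on the GPU against TensorFlow on the CPU.

```python
# (TensorFlow ms, TensorRT ms) pairs copied from the tables above.
DATA_1 = [(16658, 2577), (15944, 2519), (16982, 2559), (16026, 2533)]
DATA_2_COCO = [(16089, 5268), (11888, 2509), (12012, 3008),
               (11803, 3183), (11776, 3140)]
PERSON_HORSE = [(12891, 3452), (14989, 3480), (15565, 3515), (12883, 3456)]


def speedups(rows):
    """Per-run speedup = TensorFlow time / TensorRT time, rounded to 0.1."""
    return [round(tf_ms / trt_ms, 1) for tf_ms, trt_ms in rows]


def pooled_speedup(rows):
    """Speedup of the summed times, less sensitive to one noisy run."""
    return sum(tf for tf, _ in rows) / sum(trt for _, trt in rows)


if __name__ == "__main__":
    for name, rows in (("data set 1", DATA_1), ("data set 2 (COCO)", DATA_2_COCO),
                       ("person_horse", PERSON_HORSE)):
        print(name, speedups(rows), f"pooled {pooled_speedup(rows):.2f}")
```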

GPU memory usage

Sat Sep 18 11:33:21 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 30%   41C    P0    18W /  75W |   1094MiB /  3903MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1465      G   /usr/lib/xorg/Xorg                305MiB |
|    0   N/A  N/A     20023      C   python                            785MiB |
+-----------------------------------------------------------------------------+

TensorRT INT8

cd /home/yichao/MyDocuments/TensorRT/samples/python/efficientdet

python compare_tf.py \
    --engine /home/yichao/Downloads/saved_model_trt_int8/engine.trt \
    --saved_model /home/yichao/Downloads/saved_model \
    --input /home/yichao/Downloads/coco_calib \
    --output /home/yichao/Downloads/output_int8

(tensorRT-efficientdet) yichao@yichao:~/MyDocuments/TensorRT/samples/python/efficientdet$ time python compare_tf.py \
>     --engine /home/yichao/Downloads/saved_model_trt_int8/engine.trt \
>     --saved_model /home/yichao/Downloads/saved_model \
>     --input /home/yichao/Downloads/coco_calib \
>     --output /home/yichao/Downloads/output_int8
2021-09-18 09:25:28.721580: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-09-18 09:25:34.855431: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-09-18 09:25:34.855675: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-18 09:25:34.856186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1650 computeCapability: 7.5
coreClock: 1.68GHz coreCount: 14 deviceMemorySize: 3.81GiB deviceMemoryBandwidth: 119.24GiB/s
2021-09-18 09:25:34.856231: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-09-18 09:25:34.856312: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-09-18 09:25:34.856366: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-09-18 09:25:34.938908: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-09-18 09:25:34.939165: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-09-18 09:25:34.939584: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/home/yichao/360Downloads/TensorRT-8.0.1.6/lib:/home/yichao/miniconda3/envs/tensorRT-yolov5/lib/libmkl_intel_lp64.so
2021-09-18 09:25:34.974029: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-09-18 09:25:34.974272: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-09-18 09:25:34.974324: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-09-18 09:25:35.443757: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-18 09:25:35.445486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-18 09:25:35.445558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      
2021-09-18 09:25:42.378910: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-09-18 09:25:42.434208: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2899885000 Hz
Processing 100 / 100 images (TensorFlow)
Processing 100 / 100 images (TensorRT)
Processing 100 / 100 images (Visualization)

real	0m53.631s
user	1m19.881s
sys	0m3.897s
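The `real` value from bash `time` is the wall-clock time of the whole script. It includes loading the SavedModel, TensorFlow inference, TensorRT inference and visualization, so it does not directly measure the TensorRT speedup. `user` exceeding `real` indicates that several CPU threads were busy at once.

Below is a small sketch that converts the three `real` values from this section to seconds. It assumes bash's default `XmY.YYYs` format. `parse_bash_time` is a hypothetical helper. The FP16 run was recorded on a different date and from a different directory, so the three wall-clock times are only loosely comparable.

```python
import re


def parse_bash_time(value):
    """Convert bash `time` output such as '1m23.590s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Wall-clock ('real') values copied from the three runs in this section.
REAL = {"fp32": "0m52.613s", "fp16": "0m35.517s", "int8": "0m53.631s"}

if __name__ == "__main__":
    for precision, value in REAL.items():
        print(f"{precision}: {parse_bash_time(value):.1f} s")
```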

GPU memory usage

Sat Sep 18 10:27:43 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   40C    P0    18W /  75W |   1094MiB /  3903MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1465      G   /usr/lib/xorg/Xorg                305MiB |
|    0   N/A  N/A     16827      C   python                            785MiB |
+-----------------------------------------------------------------------------+
