[Windows/C++/YOLO Development and Deployment 03] Exporting an Instance Segmentation Model from ONNX to a TensorRT Engine: A Complete Record

[Full project download]:

[TensorRT deployment of YOLO: instance segmentation + object detection] + [C++ and Python implementations] + [supports Linux and Windows], resource on CSDN Library

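Contents

Preface

Environment Setup

1. Converting the ONNX model to a TensorRT engine with trtexec

2. Verifying the TensorRT engine

2.1 TensorRT version

2.2 GPU information

2.3 TensorRT engine information

2.4 Inference request

2.5 Inference performance

2.6 Warnings

2.7 Other information

2.8 Summary

Pitfalls and Fixes

1. Missing nvinfer_plugin.dll

2. Engine file fails to save

Summary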


Preface

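In deep learning deployment, converting a model from PyTorch format (.pt) into a TensorRT engine, typically with ONNX as the intermediate format, is a common requirement. TensorRT engines deliver high throughput and low latency, which makes them well suited to real-time inference.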

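In the previous article, [Windows/C++/yolo Development and Deployment 02: The Correct Method] Exporting a Custom Instance Segmentation Model to ONNX (CSDN blog), we exported our custom instance segmentation model to ONNX format.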

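This article records in detail how to convert the instance segmentation model best.onnx into a TensorRT engine, including the pitfalls I ran into and how I solved them.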

Environment Setup

Before you begin, make sure the following tools and libraries are installed:

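  1. Ultralytics YOLO: for model training and export. [Installed in the previous post]

  2. ONNX: for model conversion. [Installed in the previous post]

  3. ONNX Runtime: for model inference. [Installed in the previous post]

  4. TensorRT: for model conversion and inference. [Installed in the previous post]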
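Before running trtexec, it can help to confirm the whole environment in one go. The sketch below is a minimal check, assuming the pip distribution names `ultralytics`, `onnx`, `onnxruntime` and `tensorrt` (if you installed `onnxruntime-gpu`, pass that name instead); `check_environment` is a hypothetical helper, not part of any of these libraries. It also looks up `trtexec` on PATH, because the conversion command is invoked by its bare name.

```python
import importlib.metadata
import shutil


def check_environment(packages=("ultralytics", "onnx", "onnxruntime", "tensorrt")):
    """Hypothetical helper: report installed package versions and the trtexec path."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None  # not installed in the active Python environment
    # trtexec ships as a standalone executable (TensorRT's bin folder), not a pip package
    trtexec_path = shutil.which("trtexec")
    return versions, trtexec_path


if __name__ == "__main__":
    versions, trtexec_path = check_environment()
    for name, ver in versions.items():
        print(f"{name:12s} {ver or 'NOT FOUND'}")
    print(f"trtexec      {trtexec_path or 'NOT FOUND on PATH'}")
```

If `trtexec` is reported as missing, adding TensorRT's `bin` directory to PATH is usually enough for the command to be found.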

1. Converting the ONNX model to a TensorRT engine with trtexec

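Use the trtexec tool to convert the simplified ONNX model into a TensorRT engine. Here --saveEngine sets the output path of the serialized engine, --fp16 allows FP16 kernels in addition to FP32, --staticPlugins loads the custom plugin library from the TensorRT-YOLO project, and --setPluginsToSerialize serializes that same plugin library together with the engine: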

trtexec --onnx=best.onnx --saveEngine=models/best.engine --fp16 --staticPlugins="E:\TensorRT-YOLO\lib\plugin\custom_plugins.dll" --setPluginsToSerialize="E:\TensorRT-YOLO\lib\plugin\custom_plugins.dll"
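If you convert models repeatedly, the same command can be scripted. The following is a minimal Python sketch, assuming trtexec is on PATH; `build_trtexec_cmd` and `run_trtexec` are hypothetical helper names. It passes the arguments as a list so that paths containing spaces need no manual quoting, and, as a precaution, it creates the output folder (here `models/`) before calling trtexec, since the engine is written to a relative subdirectory.

```python
import subprocess
from pathlib import Path


def build_trtexec_cmd(onnx_path, engine_path, plugin_dll=None, fp16=True):
    """Hypothetical helper: assemble the trtexec arguments used in this article."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # allow FP16 kernels in addition to FP32
    if plugin_dll:
        # load the custom plugin library and serialize it with the engine
        cmd.append(f"--staticPlugins={plugin_dll}")
        cmd.append(f"--setPluginsToSerialize={plugin_dll}")
    return cmd


def run_trtexec(cmd, engine_path):
    """Hypothetical helper: create the output folder, then run trtexec."""
    Path(engine_path).parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if trtexec exits with a non-zero code
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    plugin = r"E:\TensorRT-YOLO\lib\plugin\custom_plugins.dll"
    cmd = build_trtexec_cmd("best.onnx", "models/best.engine", plugin)
    print(" ".join(cmd))
    # run_trtexec(cmd, "models/best.engine")  # uncomment on a machine with trtexec on PATH
```

Pointing `--staticPlugins` and `--setPluginsToSerialize` at the same DLL mirrors the command above; drop `plugin_dll` for models that need no custom plugins.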


Output:

(tensorrt_yolo_export_pt_2_onnx--2) PS E:\ultralytics-main\runs\segment\train4\weights> trtexec --onnx=best.onnx --saveEngine=models/best.engine --fp16 --staticPlugins="E:\TensorRT-YOLO\lib\plugin\custom_plugins.dll" --setPluginsToSerialize="E:\TensorRT-YOLO\lib\plugin\custom_plugins.dll"
&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # C:\Program Files\NVIDIA GPU Computing Toolkit\TensorRT\v8.6\bin\trtexec.exe --onnx=best.onnx --saveEngine=models/best.engine --fp16 --staticPlugins=E:\TensorRT-YOLO\lib\plugin\custom_plugins.dll --setPluginsToSerialize=E:\TensorRT-YOLO\lib\plugin\custom_plugins.dll
[02/08/2025-09:06:59] [I] === Model Options ===
[02/08/2025-09:06:59] [I] Format: ONNX
[02/08/2025-09:06:59] [I] Model: best.onnx
[02/08/2025-09:06:59] [I] Output:
[02/08/2025-09:06:59] [I] === Build Options ===
[02/08/2025-09:06:59] [I] Max batch: explicit batch
[02/08/2025-09:06:59] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[02/08/2025-09:06:59] [I] minTiming: 1
[02/08/2025-09:06:59] [I] avgTiming: 8
[02/08/2025-09:06:59] [I] Precision: FP32+FP16
[02/08/2025-09:06:59] [I] LayerPrecisions:
[02/08/2025-09:06:59] [I] Layer Device Types:
[02/08/2025-09:06:59] [I] Calibration:
[02/08/2025-09:06:59] [I] Refit: Disabled
[02/08/2025-09:06:59] [I] Version Compatible: Disabled
[02/08/2025-09:06:59] [I] TensorRT runtime: full
[02/08/2025-09:06:59] [I] Lean DLL Path:
[02/08/2025-09:06:59] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[02/08/2025-09:06:59] [I] Exclude Lean Runtime: Disabled
[02/08/2025-09:06:59] [I] Sparsity: Disabled
[02/08/2025-09:06:59] [I] Safe mode: Disabled
[02/08/2025-09:06:59] [I] Build DLA standalone loadable: Disabled
[02/08/2025-09:06:59] [I] Allow GPU fallback for DLA: Disabled
[02/08/2025-09:06:59] [I] DirectIO mode: Disabled
[02/08/2025-09:06:59] [I] Restricted mode: Disabled
[02/08/2025-09:06:59] [I] Skip inference: Disabled
[02/08/2025-09:06:59] [I] Save engine: models/best.engine
[02/08/2025-09:06:59] [I] Load engine:
[02/08/2025-09:06:59] [I] Profiling verbosity: 0
[02/08/2025-09:06:59] [I] Tactic sources: Using default tactic sources
[02/08/2025-09:06:59] [I] timingCacheMode: local
[02/08/2025-09:06:59] [I] timingCacheFile:
[02/08/2025-09:06:59] [I] Heuristic: Disabled
[02/08/2025-09:06:59] [I] Preview Features: Use default preview flags.
[02/08/2025-09:06:59] [I] MaxAuxStreams: -1
[02/08/2025-09:06:59] [I] BuilderOptimizationLevel: -1
[02/08/2025-09:06:59] [I] Input(s)s format: fp32:CHW
[02/08/2025-09:06:59] [I] Output(s)s format: fp32:CHW
[02/08/2025-09:06:59] [I] Input build shapes: model
[02/08/2025-09:06:59] [I] Input calibration shapes: model
[02/08/2025-09:06:59] [I] === System Options ===
[02/08/2025-09:06:59] [I] Device: 0
[02/08/2025-09:06:59] [I] DLACore:
[02/08/2025-09:06:59] [I] Plugins: E:\TensorRT-YOLO\lib\plugin\custom_plugins.dll
[02/08/2025-09:06:59] [I] setPluginsToSerialize: E:\TensorRT-YOLO\lib\plugin\custom_plugins.dll
[02/08/2025-09:06:59] [I] dynamicPlugins:
[02/08/2025-09:06:59] [I] ignoreParsedPluginLibs: 0
[02/08/2025-09:06:59] [I]
[02/08/2025-09:06:59] [I] === Inference Options ===
[02/08/2025-09:06:59] [I] Batch: Explicit
[02/08/2025-09:06:59] [I] Input inference shapes: model
[02/08/2025-09:06:59] [I] Iterations: 10
[02/08/2025-09:06:59] [I] Duration: 3s (+ 200ms warm up)
[02/08/2025-09:06:59] [I] Sleep time: 0ms
[02/08/2025-09:06:59] [I] Idle time: 0ms
[02/08/2025-09:06:59] [I] Inference Streams: 1
[02/08/2025-09:06:59] [I] ExposeDMA: Disabled
[02/08/2025-09:06:59] [I] Data transfers: Enabled
[02/08/2025-09:06:59] [I] Spin-wait: Disabled
[02/08/2025-09:06:59] [I] Multithreading: Disabled
[02/08/2025-09:06:59] [I] CUDA Graph: Disabled
[02/08/2025-09:06:59] [I] Separate profiling: Disabled
[02/08/2025-09:06:59] [I] Time Deserialize: Disabled
[02/08/2025-09:06:59] [I] Time Refit: Disabled
[02/08/2025-09:06:59] [I] NVTX verbosity: 0
[02/08/2025-09:06:59] [I] Persistent Cache Ratio: 0
[02/08/2025-09:06:59] [I] Inputs:
[02/08/2025-09:06:59] [I] === Reporting Options ===
[02/08/2025-09:06:59] [I] Verbose: Disabled
[02/08/2025-09:06:59] [I] Averages: 10 inferences
[02/08/2025-09:06:59] [I] Percentiles: 90,95,99
[02/08/2025-09:06:59] [I] Dump refittable layers:Disabled
[02/08/2025-09:06:59] [I] Dump output: Disabled
[02/08/2025-09:06:59] [I] Profile: Disabled
[02/08/2025-09:06:59] [I] Export timing to JSON file:
[02/08/2025-09:06:59] [I] Export output to JSON file:
[02/08/2025-09:06:59] [I] Export profile to JSON file:
[02/08/2025-09:06:59] [I]
[02/08/2025-09:06:59] [I] === Device Information ===
[02/08/2025-09:06:59] [I] Selected Device: NVIDIA GeForce RTX 3050 Ti Laptop GPU
[02/08/2025-09:06:59] [I] Compute Capability: 8.6
[02/08/2025-09:06:59] [I] SMs: 20
[02/08/2025-09:06:59] [I] Device Global Memory: 4095 MiB
[02/08/2025-09:06:59] [I] Shared Memory per SM: 100 KiB
[02/08/2025-09:06:59] [I] Memory Bus Width: 128 bits (ECC disabled)
[02/08/2025-09:06:59] [I] Application Compute Clock Rate: 1.485 GHz
[02/08/2025-09:06:59] [I] Application Memory Clock Rate: 6.001 GHz
[02/08/2025-09:06:59] [I]
[02/08/2025-09:06:59] [I] Note: The application cloc
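The console output above is long, but the lines that matter for verification follow a fixed `[I] key: value` pattern grouped under `=== Section ===` banners. As a minimal sketch, assuming the trtexec output has been saved as text, the hypothetical `parse_trtexec_log` below groups those lines by section so you can, for example, confirm that `Precision` reads `FP32+FP16` (the effect of `--fp16`) and that `Plugins` lists the custom plugin DLL. The regular expressions are written against the log format shown here and may need adjusting for other TensorRT versions.

```python
import re

# Section banners such as "[I] === Build Options ==="
SECTION_RE = re.compile(r"\[I\] === (.+?) ===")
# "[I] key: value" lines; the key stops at the first colon, the value may be empty
ENTRY_RE = re.compile(r"\[I\] ([^:]+?):\s*(.*)$")


def parse_trtexec_log(text):
    """Hypothetical helper: group [I] 'key: value' lines by their === section ===."""
    sections = {}
    current = None
    for line in text.splitlines():
        banner = SECTION_RE.search(line)
        if banner:
            current = banner.group(1)
            sections[current] = {}
            continue
        entry = ENTRY_RE.search(line)
        if entry and current is not None:
            sections[current][entry.group(1).strip()] = entry.group(2).strip()
    return sections
```

Fed the log above, `parse_trtexec_log(log)["Device Information"]["Selected Device"]` would give `NVIDIA GeForce RTX 3050 Ti Laptop GPU`, and values that themselves contain colons, such as `Memory Pools` or `Plugins`, are kept whole because only the first colon splits key from value.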
