冬日and暖阳

TensorRT trtexec的用法说明

TensorRT Command-Line Wrapper: trtexec

Table Of Contents

Description
Building trtexec
Using trtexec
- Example 1: Simple MNIST model from Caffe
- Example 2: Profiling a custom layer
- Example 3: Running a network on DLA
- Example 4: Running an ONNX model with full dimensions and dynamic shapes
- Example 5: Collecting and printing a timing trace
- Example 6: Tune throughput with multi-streaming
Tool command line arguments
Additional resources
License
Changelog
Known issues

Description

Included in the samples directory is a command line wrapper tool, called trtexec. trtexec is a tool to quickly utilize TensorRT without having to develop your own application. The trtexec tool has two main purposes:

It’s useful for benchmarking networks on random data.
It’s useful for generating serialized engines from models.

Benchmarking network - If you have a model saved as a UFF file, ONNX file, or if you have a network description in a Caffe prototxt format, you can use the trtexec tool to test the performance of running inference on your network using TensorRT. The trtexec tool has many options for specifying inputs and outputs, iterations for performance timing, precision allowed, and other options.

Serialized engine generation - If you generate a saved serialized engine file, you can pull it into another application that runs inference. For example, you can use the TensorRT Laboratory to run the engine with multiple execution contexts from multiple threads in a fully pipelined asynchronous way to test parallel inference performance. There are some caveats, for example, if you used a Caffe prototxt file and a model is not supplied, random weights are generated. Also, in INT8 mode, random weights are used, meaning trtexec does not provide calibration capability.

Building `trtexec`

trtexec can be used to build engines, using different TensorRT features (see command line arguments), and run inference. trtexec also measures and reports execution time and can be used to understand performance and possibly locate bottlenecks.

Compile this sample by running make in the /samples/trtexec directory. The binary named trtexec will be created in the /bin directory.

cd /samples/trtexec
make

Where is where you installed TensorRT.

Using `trtexec`

trtexec can build engines from models in Caffe, UFF, or ONNX format.

Example 1: Simple MNIST model from Caffe

The example below shows how to load a model description and its weights, build the engine that is optimized for batch size 16, and save it to a file.
trtexec --deploy=/path/to/mnist.prototxt --model=/path/to/mnist.caffemodel --output=prob --batch=16 --saveEngine=mnist16.trt

Then, the same engine can be used for benchmarking; the example below shows how to load the engine and run inference on batch 16 inputs (randomly generated).
trtexec --loadEngine=mnist16.trt --batch=16

Example 2: Profiling a custom layer

You can profile a custom layer using the IPluginRegistry for the plugins and trtexec. You’ll need to first register the plugin with IPluginRegistry.

If you are using TensorRT shipped plugins, you should load the libnvinfer_plugin.so file, as these plugins are pre-registered.

If you have your own plugin, then it has to be registered explicitly. The following macro can be used to register the plugin creator YourPluginCreator with the IPluginRegistry.
REGISTER_TENSORRT_PLUGIN(YourPluginCreator);

Example 3: Running a network on DLA

To run the AlexNet network on NVIDIA DLA (Deep Learning Accelerator) using trtexec in FP16 mode, issue:

./trtexec --deploy=data/AlexNet/AlexNet_N2.prototxt --output=prob --useDLACore=1 --fp16 --allowGPUFallback

To run the AlexNet network on DLA using trtexec in INT8 mode, issue:

./trtexec --deploy=data/AlexNet/AlexNet_N2.prototxt --output=prob --useDLACore=1 --int8 --allowGPUFallback

To run the MNIST network on DLA using trtexec, issue:

./trtexec --deploy=data/mnist/mnist.prototxt --output=prob --useDLACore=0 --fp16 --allowGPUFallback

For more information about DLA, see Working With DLA.

Example 4: Running an ONNX model with full dimensions and dynamic shapes

To run an ONNX model in full-dimensions mode with static input shapes:

./trtexec --onnx=model.onnx

The following examples assumes an ONNX model with one dynamic input with name input and dimensions [-1, 3, 244, 244]

To run an ONNX model in full-dimensions mode with an given input shape:

./trtexec --onnx=model.onnx --shapes=input:32x3x244x244

To benchmark your ONNX model with a range of possible input shapes:

./trtexec --onnx=model.onnx --minShapes=input:1x3x244x244 --optShapes=input:16x3x244x244 --maxShapes=input:32x3x244x244 --shapes=input:5x3x244x244

Example 5: Collecting and printing a timing trace

When running, trtexec prints the measured performance, but can also export the measurement trace to a json file:

./trtexec --deploy=data/AlexNet/AlexNet_N2.prototxt --output=prob --exportTimes=trace.json

Once the trace is stored in a file, it can be printed using the tracer.py utility. This tool prints timestamps and duration of input, compute, and output, in different forms:

./tracer.py trace.json

Similarly, profiles can also be printed and stored in a json file. The utility profiler.py can be used to read and print the profile from a json file.

Example 6: Tune throughput with multi-streaming

Tuning throughput may require running multiple concurrent streams of execution. This is the case for example when the latency achieved is well within the desired
threshold, and we can increase the throughput, even at the expense of some latency. For example, saving engines for batch sizes 1 and 2 and assume that both
execute within 2ms, the latency threshold:

trtexec --deploy=GoogleNet_N2.prototxt --output=prob --batch=1 --saveEngine=g1.trt --int8 --buildOnly
trtexec --deploy=GoogleNet_N2.prototxt --output=prob --batch=2 --saveEngine=g2.trt --int8 --buildOnly

Now, the saved engines can be tried to find the combination batch/streams below 2 ms that maximizes the throughput:

trtexec --loadEngine=g1.trt --batch=1 --streams=2
trtexec --loadEngine=g1.trt --batch=1 --streams=3
trtexec --loadEngine=g1.trt --batch=1 --streams=4
trtexec --loadEngine=g2.trt --batch=2 --streams=2

Tool command line arguments

To see the full list of available options and their descriptions, issue the ./trtexec --help command.

Note: Specifying the --safe parameter turns the safety mode switch ON. By default, the --safe parameter is not specified; the safety mode switch is OFF. The layers and parameters that are contained within the --safe subset are restricted if the switch is set to ON. The switch is used for prototyping the safety restricted flows until the TensorRT safety runtime is made available. For more information, see the Working With Automotive Safety section in the TensorRT Developer Guide.

Additional resources

The following resources provide more details about trtexec:

Documentation

NVIDIA trtexec
TensorRT Sample Support Guide
NVIDIA’s TensorRT Documentation Library

License

For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement
documentation.

Changelog

April 2019
This is the first release of this README.md file.

Known issues

There are no known issues in this sample.

=== Model Options ===
  --uff=<file>                UFF model
  --onnx=<file>               ONNX model
  --model=<file>              Caffe model (default = no model, random weights used)
  --deploy=<file>             Caffe prototxt file
  --output=<name>[,<name>]*   Output names (it can be specified multiple times); at least one output is required for UFF and Caffe
  --uffInput=<name>,X,Y,Z     Input blob name and its dimensions (X,Y,Z=C,H,W), it can be specified multiple times; at least one is required for UFF models
  --uffNHWC                   Set if inputs are in the NHWC layout instead of NCHW (use X,Y,Z=H,W,C order in --uffInput)

=== Build Options ===
  --maxBatch                  Set max batch size and build an implicit batch engine (default = 1)
  --explicitBatch             Use explicit batch sizes when building the engine (default = implicit)
  --minShapes=spec            Build with dynamic shapes using a profile with the min shapes provided
  --optShapes=spec            Build with dynamic shapes using a profile with the opt shapes provided
  --maxShapes=spec            Build with dynamic shapes using a profile with the max shapes provided
                              Note: if any of min/max/opt is missing, the profile will be completed using the shapes
                                    provided and assuming that opt will be equal to max unless they are both specified;
                                    partially specified shapes are applied starting from the batch size;
                                    dynamic shapes imply explicit batch
                                    input names can be wrapped with single quotes (ex: 'Input:0')
                              Input shapes spec ::= Ishp[","spec]
                                           Ishp ::= name":"shape
                                          shape ::= N[["x"N]*"*"]
  --inputIOFormats=spec       Type and formats of the input tensors (default = all inputs in fp32:chw)
  --outputIOFormats=spec      Type and formats of the output tensors (default = all outputs in fp32:chw)
                              IO Formats: spec  ::= IOfmt[","spec]
                                          IOfmt ::= type:fmt
                                          type  ::= "fp32"|"fp16"|"int32"|"int8"
                                          fmt   ::= ("chw"|"chw2"|"chw4"|"hwc8"|"chw16"|"chw32")["+"fmt]
  --workspace=N               Set workspace size in megabytes (default = 16)
  --minTiming=M               Set the minimum number of iterations used in kernel selection (default = 1)
  --avgTiming=M               Set the number of times averaged in each iteration for kernel selection (default = 8)
  --fp16                      Enable fp16 algorithms, in addition to fp32 (default = disabled)
  --int8                      Enable int8 algorithms, in addition to fp32 (default = disabled)
  --calib=<file>              Read INT8 calibration cache file
  --safe                      Only test the functionality available in safety restricted flows
  --saveEngine=<file>         Save the serialized engine
  --loadEngine=<file>         Load a serialized engine

=== Inference Options ===
  --batch=N                   Set batch size for implicit batch engines (default = 1)
  --shapes=spec               Set input shapes for dynamic shapes inputs. Input names can be wrapped with single quotes(ex: 'Input:0')
                              Input shapes spec ::= Ishp[","spec]
                                           Ishp ::= name":"shape
                                          shape ::= N[["x"N]*"*"]
  --loadInputs=spec           Load input values from files (default = generate random inputs). Input names can be wrapped with single quotes (ex: 'Input:0')
                              Input values spec ::= Ival[","spec]
                                           Ival ::= name":"file
  --iterations=N              Run at least N inference iterations (default = 10)
  --warmUp=N                  Run for N milliseconds to warmup before measuring performance (default = 200)
  --duration=N                Run performance measurements for at least N seconds wallclock time (default = 3)
  --sleepTime=N               Delay inference start with a gap of N milliseconds between launch and compute (default = 0)
  --streams=N                 Instantiate N engines to use concurrently (default = 1)
  --exposeDMA                 Serialize DMA transfers to and from device. (default = disabled)
  --useSpinWait               Actively synchronize on GPU events. This option may decrease synchronization time but increase CPU usage and power (default = disabled)
  --threads                   Enable multithreading to drive engines with independent threads (default = disabled)
  --useCudaGraph              Use cuda graph to capture engine execution and then launch inference (default = disabled)
  --buildOnly                 Skip inference perf measurement (default = disabled)

=== Build and Inference Batch Options ===
                              When using implicit batch, the max batch size of the engine, if not given,
                              is set to the inference batch size;
                              when using explicit batch, if shapes are specified only for inference, they
                              will be used also as min/opt/max in the build profile; if shapes are
                              specified only for the build, the opt shapes will be used also for inference;
                              if both are specified, they must be compatible; and if explicit batch is
                              enabled but neither is specified, the model must provide complete static
                              dimensions, including batch size, for all inputs

=== Reporting Options ===
  --verbose                   Use verbose logging (default = false)
  --avgRuns=N                 Report performance measurements averaged over N consecutive iterations (default = 10)
  --percentile=P              Report performance for the P percentage (0<=P<=100, 0 representing max perf, and 100 representing min perf; (default = 99%)
  --dumpOutput                Print the output tensor(s) of the last inference iteration (default = disabled)
  --dumpProfile               Print profile information per layer (default = disabled)
  --exportTimes=<file>        Write the timing results in a json file (default = disabled)
  --exportOutput=<file>       Write the output tensors to a json file (default = disabled)
  --exportProfile=<file>      Write the profile information per layer in a json file (default = disabled)

=== System Options ===
  --device=N                  Select cuda device N (default = 0)
  --useDLACore=N              Select DLA core N for layers that support DLA (default = none)
  --allowGPUFallback          When DLA is enabled, allow GPU fallback for unsupported layers (default = disabled)
  --plugins                   Plugin library (.so) to load (can be specified multiple times)

=== Help ===
  --help                      Print this message
Note: CUDA graphs is not supported in this version.

机器学习专栏（62）：手把手实现工业级ResNet-34及调优全攻略
目录一、ResNet革命性突破解析1.1残差学习核心思想1.2ResNet-34结构详解二、工业级Keras实现详解2.1数据预处理流水线2.2完整模型实现三、模型训练调优策略3.1学习率动态调整3.2混合精度训练四、性能优化技巧4.1分布式训练配置4.2TensorRT推理加速五、实战应用案例5.1医疗影像分类5.2工业质检系统六、模型可视化分析6.1特征热力图6.2参数量分析七、常见问题解决方
英伟达Triton 推理服务详解 leo0308 基础知识机器人 Triton 人工智能
1.TritonInferenceServer简介TritonInferenceServer（简称Triton，原名NVIDIATensorRTInferenceServer）是英伟达推出的一个开源、高性能的推理服务器，专为AI模型的部署和推理服务而设计。它支持多种深度学习框架和硬件平台，能够帮助开发者和企业高效地将AI模型部署到生产环境中。Triton主要用于模型推理服务化，即将训练好的模型通过
模型实战（21）之 C++ - tensorRT部署yolov8-det 目标检测明月醉窗台 #深度学习实战例程人工智能 c++YOLO 目标检测计算机视觉人工智能
C++-tensorRT部署yolov8-det目标检测python环境下如何直接调用推理模型转换并导出：pt->onnx->.engineC++tensorrt部署检测模型不写废话了，直接上具体实现过程+all代码1.Python环境下推理直接命令行推理，巨简单yolodetectpredictmodel=yolov8n.ptsource='https
【深度学习】大模型GLM-4-9B Chat ，微调与部署(3) TensorRT-LLM、TensorRT量化加速、Triton部署 XD742971636 深度学习机器学习深度学习人工智能
文章目录获取TensorRT-LLM代码：构建docker镜像并安装TensorRT-LLM：运行docker镜像：安装依赖魔改下部分package代码：量化：构建图：全局参数插件配置常用配置参数测试推理是否可以代码推理CLI推理性能测试小结验证是否严重退化使用NVIDIATriton部署在线推理服务器代码弄下来编译镜像启动容器安装依赖量化构建trtengines图Triton模板说明实操发起Tr
Jetson Orin NX Super安装TensorRT-LLM u013250861 #LLM/部署&推理 elasticsearch 大数据搜索引擎
根据图片中显示的JetsonOrinNXSuper系统环境（JetPack6.2+CUDA12.6+TensorRT10.7），以下是针对该平台的TensorRT-LLM安装优化方案：一、环境适配调整基于你的实际配置：JetPack6.2（含CUDA12.6,TensorRT10.7）Python3.10.12aarch64架构需选择适配的TensorRT-LLM版本。由于官方预编译包可能未覆盖此
TensorRT-LLM：大模型推理加速引擎的架构与实践
前言：技术背景与发展历程：随着GPT-4、LLaMA等千亿级参数模型的出现，传统推理框架面临三大瓶颈：显存占用高（单卡可达80GB）、计算延迟大（生成式推理需迭代处理）、硬件利用率低（Transformer结构存在计算冗余）。根据MLPerf基准测试，原始PyTorch推理的token生成速度仅为12.3tokens/s（A100显卡）。一、TensorRT-LLM介绍：TensorRT-LLM是
【TensorRT】TensorRT及加速原理浩瀚之水_csdn tensorrt
一、TensorRT架构概览TensorRT是NVIDIA推出的高性能推理优化器，专为GPU加速设计。其核心架构分为三层：前端解析器支持ONNX/UFF/Caffe等格式的模型解析执行格式验证和初步结构优化优化引擎核心优化层（层融合、精度校准、内存优化等）生成优化后的计算图（OptimizedGraph）运行时环境管理GPU内存分配执行优化后的计算图二、核心加速原理（8大关键技术）1.层融合（La
使用numpy或pytorch校验两个张量是否相等
文章目录1、numpy2、pytorch做算法过程中，如果涉及到模型落地，那必然会将原始的深度学习的框架训练好的模型转换成目标硬件模型的格式，如onnx,tensorrt,openvino,tflite;那么就有对比不同格式模型输出的一致性，从而判断模型转换是否成功。1、numpy用到的核心代码就一行，就是：importnumpyasnpnp.testing.assert_allclose(act
YOLOV10的tensorrt C++部署 dddccc1234 YOLO
根据博客进行python版本安装YOLOv10最全使用教程（含ONNX和TensorRT推理）-CSDN博客并将pt转为onnx：yoloexportmodel=yolov10s.ptformat=onnxopset=13simplify然后采用：https://github.com/hamdiboukamcha/yolov10-tensorrt.git进行c++编译配置好cuda11.7tens
tensorRT 与 torchserve-GPU性能对比 joker-G 计算机视觉 pytorch python
实验对比前端时间搭建了TensorRT、Torchserve-GPU，最近抽时间将这两种方案做一个简单的实验对比。实验数据Cuda11.0、Xeon®62423.1*80、RTX309024G、Resnet50TensorRT、Torchserve-GPU各自一张卡搭建10进程接口，感兴趣的可以查看我个人其他文章。30进程并发、2000张1200*720像素图像的总量数据TensorRT的部署使用
YOLOv8模型在RDK5开发板上的部署指南：.pt到.bin转换与优化实践 pk_xz123456 python 算法仿真模型 YOLO 人工智能 rnn 深度学习开发语言 lstm
以下是针对在RDK5开发板（基于NVIDIAJetsonOrin平台）部署YOLOv8模型的详细技术指南，涵盖从模型转换、优化到部署的全流程：YOLOv8模型在RDK5开发板上的部署指南：.pt到.bin转换与优化实践——基于TensorRT的高性能嵌入式部署方案第一章：技术背景与核心概念1.1RDK5开发板硬件架构NVIDIAJetsonOrinNX核心参数：1024-coreAmpereGPU
Pytorch模型安卓部署 python&java pytorch 人工智能 python
Pytorch是一种流行的深度学习框架，用于算法开发，而Android是一种广泛应用的操作系统，多应用于移动设备当中。目前多数的研究都是在于算法上，个人觉得把算法落地是一件很有意思的事情，因此本人准备分享一些模型落地的文章(后续可能分享微信小程序部署，PyQt部署以及exe打包，ncnn部署，tensorRT部署，MNN部署)。本篇文章主要分享Pytorch的Android端部署。看这篇文章的读者
昇腾AI生态组件全解析：与英伟达生态的深度对比
随着人工智能技术的快速发展，国产AI芯片的崛起正在改变全球计算产业的格局。华为昇腾（Ascend）系列AI处理器凭借自主创新的达芬奇架构，构建了完整的软硬件生态体系。本文将从核心组件对比、显卡性能对标两个维度，深入剖析昇腾与英伟达（NVIDIA）生态的技术差异与适用场景。一、昇腾核心组件与英伟达对标分析1.推理引擎：MindIEvsTensorRT昇腾MindIE1.0.0基于昇腾芯片的深度学习推
【推理加速】TensorRT C++ 部署YOLO11全系模型 gloomyfish c++开发语言
YOLO11YOLO11C++推理YOLO11是Ultralytics最新发布的目标检测、实例分割、姿态评估的系列模型视觉轻量化框架，基于前代YOLO8版本进行了多项改进和优化。YOLO11在特征提取、效率和速度、准确性以及环境适应性方面都有显著提升，达到SOTA。TensorRTC++SDK最新版本的TensorRT10.x版本已经修改了推理的接口函数与查询输入输出层的函数，其中以YOLO11对
Java全栈AI平台实战：从模型训练到部署的革命性突破——Spring AI+Deeplearning4j+TensorFlow Java API深度解析墨夶 Java学习资料3 java 人工智能 spring
一、背景与需求：为什么需要Java驱动的AI平台？某医疗影像公司面临以下挑战：多语言开发混乱：Python训练模型，C++部署推理，Java调用服务，导致维护成本高昂部署效率低下：PyTorch模型需手动转换ONNX格式，TensorRT优化耗时2小时/模型实时性不足：视频流分析延迟达3秒，无法满足急诊场景需求通过Java全栈AI平台，我们实现了：端到端开发：Java调用PyTorch训练模型，直
【Bug】Could not locate zlibwapi.dll. Please make sure it is in your library path!
报错信息：使用tensort加速，cmake编译失败，提示缺少zlibwapi.dll文件Couldnotlocatezlibwapi.dll.Pleasemakesureitisinyourlibrarypath!解决方案：从以下链接下载zlibwapi.dllhttp://www.winimage.com/zLibDll/我是在windows10系统下进行的TensorRT加速下载得到的压缩包
win10安装wsl2(ubuntu20.04)并安装 TensorRT-8.6.1.6、cuda_11.6、cudnn 狄龙疤 wsl wsl2 win10 tensorrt cuda cudnn ubuntu
参考博客：1.CUDA】如何在windows上安装Ollama3+openwebui（docker+WSL2+ubuntu+nvidia-container）：https://blog.csdn.net/smileyan9/article/details/1403916672.在Windows10上安装WSL2：https://download.csdn.net/blog/column/10991
【代码分析】TensorRT sampleINT8 详解 HaoBBNuanMM
目录前言代码分析Main入口构建(Build)网络BatchStream推理(Infer)过程资源释放前言TensorRT可以通过INT8量化处理网络，然后大幅加速网络推理速度，本文旨在详细分析MNISTINT8Sample的代码，解释如何使用TensorRT对网络做INT8量化处理。关于INT8量化的背景知识可以参考博文TensorRTINT8校准与量化原理代码分析sampleINT8的gith
TensorRT × TVM 联合优化实战：多架构异构平台的统一推理加速与性能调优全流程观熵大模型高阶优化技术专题架构人工智能
TensorRT×TVM联合优化实战：多架构异构平台的统一推理加速与性能调优全流程关键词TensorRT、TVM、异构推理优化、跨平台部署、GPU加速、NPU融合、自动调度、深度学习推理引擎、性能调优摘要在深度学习模型推理部署场景中，面对GPU、NPU、CPU等多架构异构平台的并存，如何实现统一的高性能推理优化成为企业工程落地的关键挑战。本文聚焦TensorRT与TVM的联合优化策略，从平台结构适
retinaface在ubuntu20.04(wsl2)下使用tensorrt(c++)部署狄龙疤 c++retinaface tensorrt cuda opencv 人脸识别神经网络模型
1.参考博客：1.RetinafaceTensorrtPython/C++部署：https://blog.csdn.net/weixin_45747759/article/details/1245340792.B站视频教程：https://www.bilibili.com/video/BV1Nv4y1K727/3.Retinaface_Tensorrtgithub仓库：https://github
独家首发！低照度环境下YOLOv8的增强方案——从理论到TensorRT部署向哆哆 YOLO 架构 yolov8
文章目录引言一、低照度图像增强技术现状1.1传统低照度增强方法局限性1.2深度学习-based方法进展二、Retinexformer网络原理2.1Retinex理论回顾2.2Retinexformer创新架构2.2.1光照感知Transformer2.2.2多尺度Retinex分解2.2.3自适应特征融合三、YOLOv8-Retinexformer实现3.1网络架构修改3.2联合训练策略四、实验与
win10 环境进行 python + pytorch + yolov8 + tensorRT( c++版 ) 测试过程记录狄龙疤 python pytorch c++cuda tensorRT yolov8 计算机视觉
参考博客：1.YOLOv8模型转换pt-＞onnx(附上代码)：https://blog.csdn.net/2303_80018785/article/details/1381949612.yolov8的TensorRT部署（C++版本）：https://blog.csdn.net/liujiahao123987/article/details/133892746test.cpp就是使用此博客的d
【实战分享】TensorRT+LLM：大模型推理性能优化初探 fengbeely java
TensorRT-LLM初体验千呼万唤始出来，备受期待的Tensorrt-LLM终于发布，发布版本0.5.0。github:https://github.com/NVIDIA/TensorRT-LLM/tree/main1.介绍TensorRT-LLM可以视为TensorRT和FastTransformer的结合体，旨为大模型推理加速而生。1.1丰富的优化特性除了FastTransformer对T
NIPS-2013《Distributed PCA and $k$-Means Clustering》 Christo3 机器学习 kmeans 算法大数据人工智能
推荐深蓝学院的《深度神经网络加速：cuDNN与TensorRT》，课程面向就业，细致讲解CUDA运算的理论支撑与实践，学完可以系统化掌握CUDA基础编程知识以及TensorRT实战，并且能够利用GPU开发高性能、高并发的软件系统，感兴趣可以直接看看链接：深蓝学院《深度神经网络加速：cuDNN与TensorRT》核心思想该论文的核心思想是将主成分分析（PCA）与分布式kkk-均值聚类相结合，提出一种
NVIDIA 实现通义千问 Qwen3 的生产级应用集成和部署【2025年 5月 2日】 u013250861 #LLM/部署&推理 jetson
阿里巴巴近期发布了其开源的混合推理大语言模型（LLM）通义千问Qwen3，此次Qwen3开源模型系列包含两款混合专家模型(MoE)235B-A22B（总参数2,350亿，激活参数220亿）和30B-A3B，以及六款稠密（Dense）模型0.6B、1.7B、4B、8B、14B、32B。现在，开发者能够基于NVIDIAGPU，使用NVIDIATensorRT-LLM、Ollama、SGLang、vLL
YOLO学习笔记｜ YOLO11对象检测，实例分割，姿态评估的TensorRT部署c++ 单北斗SLAMer YOLO学习从零到1 YOLO 机器学习深度学习 c++python
以下是YOLOv11在TensorRT上部署的步骤指南，涵盖对象检测、实例分割和姿态评估：1.模型导出与转换1.1导出ONNX模型importtorchfrommodels.experimentalimportattempt_loadmodel=attempt_load('yolov11s.pt',fuse=True)model.eval
✅ TensorRT Python 安装精简流程（适用于 Ubuntu 20.04+） dbcccccsds python ubuntu 开发语言
安装TensorRTPython轮子的步骤确保pip和wheel模块已更新并安装：参考链接python3-mpipinstall--upgradepippython3-mpipinstallwheel1.确认环境要求Python：版本3.8-3.13OS：Ubuntu20.04+或Windows10+CPU：x86_64或ARMSBSA架构安装前确保pip、wheel是最新的：python3-mp
TensorRT-LLM——优化大型语言模型推理以实现最大性能的综合指南知来者逆 LLM 语言模型人工智能自然语言处理 TensorRT LLM 大语言模型深度学习
引言随着对大型语言模型(LLM)的需求不断增长，确保快速、高效和可扩展的推理变得比以往任何时候都更加重要。NVIDIA的TensorRT-LLM通过提供一套专为LLM推理设计的强大工具和优化，TensorRT-LLM可以应对这一挑战。TensorRT-LLM提供了一系列令人印象深刻的性能改进，例如量化、内核融合、动态批处理和多GPU支持。这些改进使推理速度比传统的基于CPU的方法快8倍，从而改变了
tensorrt部署yolov8 张张张子 YOLO python 边缘计算
记录一下部署过程遇到的问题，我是要再jstson上部署，首先导出onnx文件，没什么问题，然后又两种方案转为engine文件1：trtexec.exe--onnx=best.onnx--saveEngine=best.engine--fp16tensorrt库命令转换，过程中会遇到一些问题，这里不细讲了，可以查。2：用yolov8官方版本转换，较为容易，官方库写的比较好最后会得到trt文件或eng
YOLOv8 TensorRT 部署（Python 推理）保姆级教程码农的日常搅屎棍 YOLO python
本教程手把手教你如何在NVIDIAGPU或RK3588上部署YOLOv8TensorRT推理，让你从零基础到高性能AI推理！1.部署前的准备1.1硬件要求NVIDIAGPU（如RTX3060/4090、Jetson系列）或RK3588NPU（支持TensorRT）CUDA（如11.x）、cuDNN、TensorRT已正确安装可运行nvcc--version、dpkg-l|grepTensorRT检
多线程编程之存钱与取钱周凡杨 java thread 多线程存钱取钱
生活费问题是这样的：学生每月都需要生活费，家长一次预存一段时间的生活费，家长和学生使用统一的一个帐号，在学生每次取帐号中一部分钱，直到帐号中没钱时通知家长存钱，而家长看到帐户还有钱则不存钱，直到帐户没钱时才存钱。问题分析：首先问题中有三个实体，学生、家长、银行账户，所以设计程序时就要设计三个类。其中银行账户只有一个，学生和家长操作的是同一个银行账户，学生的行为是
java中数组与List相互转换的方法征客丶 JavaScript java jsonp
1.List转换成为数组。（这里的List是实体是ArrayList) 　　调用ArrayList的toArray方法。　　toArray 　　public T[] toArray(T[] a)返回一个按照正确的顺序包含此列表中所有元素的数组；返回数组的运行时类型就是指定数组的运行时类型。如果列表能放入指定的数组，则返回放入此列表元素的数组。否则，将根据指定数组的运行时类型和此列表的大小分
Shell 流程控制 daizj 流程控制 if else while case shell
Shell 流程控制和Java、PHP等语言不一样，sh的流程控制不可为空，如(以下为PHP流程控制写法)： <?php if(isset($_GET["q"])){ search(q);}else{// 不做任何事情} 在sh/bash里可不能这么写，如果else分支没有语句执行，就不要写这个else，就像这样 if else if if 语句语
Linux服务器新手操作之二周凡杨 Linux 简单操作
1.利用关键字搜寻Man Pages man -k keyword 其中-k 是选项，keyword是要搜寻的关键字如果现在想使用whoami命令，但是只记住了前3个字符who，就可以使用 man -k who来搜寻关键字who的man命令 [haself@HA5-DZ26 ~]$ man -k
socket聊天室之服务器搭建朱辉辉33 socket
因为我们做的是聊天室，所以会有多个客户端，每个客户端我们用一个线程去实现，通过搭建一个服务器来实现从每个客户端来读取信息和发送信息。我们先写客户端的线程。 public class ChatSocket extends Thread{ Socket socket; public ChatSocket(Socket socket){ this.sock
利用finereport建设保险公司决策分析系统的思路和方法老A不折腾 finereport 金融保险分析系统报表系统项目开发
决策分析系统呈现的是数据页面，也就是俗称的报表，报表与报表间、数据与数据间都按照一定的逻辑设定，是业务人员查看、分析数据的平台，更是辅助领导们运营决策的平台。底层数据决定上层分析，所以建设决策分析系统一般包括数据层处理（数据仓库建设）。项目背景介绍通常，保险公司信息化程度很高，基本上都有业务处理系统（像集团业务处理系统、老业务处理系统、个人代理人系统等）、数据服务系统（通过
始终要页面在ifream的最顶层林鹤霄
index.jsp中有ifream，但是session消失后要让login.jsp始终显示到ifream的最顶层。。。始终没搞定，后来反复琢磨之后，得到了解决办法，在这儿给大家分享下。。 index.jsp--->主要是加了颜色的那一句 <html> <iframe name="top" ></iframe> <ifram
MySQL binlog恢复数据 aigo mysql
1，先确保my.ini已经配置了binlog： # binlog log_bin = D:/mysql-5.6.21-winx64/log/binlog/mysql-bin.log log_bin_index = D:/mysql-5.6.21-winx64/log/binlog/mysql-bin.index log_error = D:/mysql-5.6.21-win
OCX打成CBA包并实现自动安装与自动升级 alxw4616 ocx cab
近来手上有个项目,需要使用ocx控件 (ocx是什么? http://baike.baidu.com/view/393671.htm) 在生产过程中我遇到了如下问题. 1. 如何让 ocx 自动安装? a) 如何签名? b) 如何打包? c) 如何安装到指定目录? 2.
Hashmap队列和PriorityQueue队列的应用百合不是茶 Hashmap队列 PriorityQueue队列
HashMap队列已经是学过了的,但是最近在用的时候不是很熟悉,刚刚重新看以一次, HashMap是K,v键 ,值 put()添加元素 //下面试HashMap去掉重复的 package com.hashMapandPriorityQueue; import java.util.H
JDK1.5 returnvalue实例 bijian1013 java thread java多线程 returnvalue
Callable接口：返回结果并且可能抛出异常的任务。实现者定义了一个不带任何参数的叫做 call 的方法。 Callable 接口类似于 Runnable，两者都是为那些其实例可能被另一个线程执行的类设计的。但是 Runnable 不会返回结果，并且无法抛出经过检查的异常。 ExecutorService接口方
angularjs指令中动态编译的方法(适用于有异步请求的情况) 内嵌指令无效 bijian1013 JavaScript AngularJS
在directive的link中有一个$http请求，当请求完成后根据返回的值动态做element.append('......');这个操作，能显示没问题，可问题是我动态组的HTML里面有ng-click，发现显示出来的内容根本不执行ng-click绑定的方法！
【Java范型二】Java范型详解之extend限定范型参数的类型 bit1129 extend
在第一篇中，定义范型类时，使用如下的方式： public class Generics<M, S, N> { //M,S,N是范型参数 } 这种方式定义的范型类有两个基本的问题： 1. 范型参数定义的实例字段，如private M m = null;由于M的类型在运行时才能确定，那么我们在类的方法中，无法使用m，这跟定义pri
【HBase十三】HBase知识点总结 bit1129 hbase
1. 数据从MemStore flush到磁盘的触发条件有哪些？ a.显式调用flush，比如flush 'mytable' b.MemStore中的数据容量超过flush的指定容量，hbase.hregion.memstore.flush.size,默认值是64M 2. Region的构成是怎么样？ 1个Region由若干个Store组成
服务器被DDOS攻击防御的SHELL脚本 ronin47
mkdir /root/bin vi /root/bin/dropip.sh #!/bin/bash/bin/netstat -na|grep ESTABLISHED|awk ‘{print $5}’|awk -F:‘{print $1}’|sort|uniq -c|sort -rn|head -10|grep -v -E ’192.168|127.0′|awk ‘{if($2!=null&a
java程序员生存手册-craps 游戏-一个简单的游戏 bylijinnan java
import java.util.Random; public class CrapsGame { /** * *一个简单的赌*博游戏，游戏规则如下： *玩家掷两个骰子，点数为1到6，如果第一次点数和为7或11，则玩家胜， *如果点数和为2、3或12，则玩家输， *如果和为其它点数，则记录第一次的点数和，然后继续掷骰，直至点数和等于第一次掷出的点
TOMCAT启动提示NB: JAVA_HOME should point to a JDK not a JRE解决开窍的石头 JAVA_HOME
当tomcat是解压的时候，用eclipse启动正常，点击startup.bat的时候启动报错; 报错如下： The JAVA_HOME environment variable is not defined correctly This environment variable is needed to run this program NB: JAVA_HOME shou
[操作系统内核]操作系统与互联网 comsci 操作系统
我首先申明：我这里所说的问题并不是针对哪个厂商的，仅仅是描述我对操作系统技术的一些看法操作系统是一种与硬件层关系非常密切的系统软件，按理说，这种系统软件应该是由设计CPU和硬件板卡的厂商开发的，和软件公司没有直接的关系，也就是说，操作系统应该由做硬件的厂商来设计和开发
富文本框ckeditor_4.4.7 文本框的简单使用支持IE11 cuityang 富文本框
<html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <title>知识库内容编辑</tit
Property null not found darrenzhu datagrid Flex Advanced propery null
When you got error message like "Property null not found ***", try to fix it by the following way: 1)if you are using AdvancedDatagrid, make sure you only update the data in the data prov
MySQl数据库字符串替换函数使用 dcj3sjt126com mysql 函数替换
需求：需要将数据表中一个字段的值里面的所有的 . 替换成 _ 原来的数据是 site.title site.keywords .... 替换后要为 site_title site_keywords 使用的SQL语句如下： updat
mac上终端起动MySQL的方法 dcj3sjt126com mysql mac
首先去官网下载: http://www.mysql.com/downloads/ 我下载了5.6.11的dmg然后安装,安装完成之后..如果要用终端去玩SQL.那么一开始要输入很长的:/usr/local/mysql/bin/mysql 这不方便啊,好想像windows下的cmd里面一样输入mysql -uroot -p1这样...上网查了下..可以实现滴. 打开终端,输入: 1
Gson使用一（Gson） eksliang json gson
转载请出自出处：http://eksliang.iteye.com/blog/2175401 一.概述从结构上看Json，所有的数据（data）最终都可以分解成三种类型：第一种类型是标量（scalar），也就是一个单独的字符串（string）或数字（numbers），比如"ickes"这个字符串。第二种类型是序列（sequence），又叫做数组（array）
android点滴4 gundumw100 android
Android 47个小知识 http://www.open-open.com/lib/view/open1422676091314.html Android实用代码七段（一） http://www.cnblogs.com/over140/archive/2012/09/26/2611999.html http://www.cnblogs.com/over140/arch
JavaWeb之JSP基本语法 ihuning javaweb
目录 JSP模版元素 JSP表达式 JSP脚本片断 EL表达式 JSP注释特殊字符序列的转义处理如何查找JSP页面中的错误 JSP模版元素 JSP页面中的静态HTML内容称之为JSP模版元素，在静态的HTML内容之中可以嵌套JSP
App Extension编程指南（iOS8/OS X v10.10）中文版啸笑天 ext
当iOS 8.0和OS X v10.10发布后，一个全新的概念出现在我们眼前，那就是应用扩展。顾名思义，应用扩展允许开发者扩展应用的自定义功能和内容，能够让用户在使用其他app时使用该项功能。你可以开发一个应用扩展来执行某些特定的任务，用户使用该扩展后就可以在多个上下文环境中执行该任务。比如说，你提供了一个能让用户把内容分
SQLServer实现无限级树结构 macroli oracle sql SQL Server
表结构如下：数据库id path titlesort 排序 1 0 首页 0 2 0,1 新闻 1 3 0,2 JAVA 2 4 0,3 JSP 3 5 0,2,3 业界动态 2 6 0,2,3 国内新闻 1 创建一个存储过程来实现，如果要在页面上使用可以设置一个返回变量将至传过去 create procedure test as begin decla
Css居中div，Css居中img，Css居中文本，Css垂直居中div qiaolevip 众观千象学习永无止境每天进步一点点 css
/**********Css居中Div**********/ div.center { width: 100px; margin: 0 auto; } /**********Css居中img**********/ img.center { display: block; margin-left: auto; margin-right: auto; }
Oracle 常用操作(实用) 吃猫的鱼 oracle
SQL>select text from all_source where owner=user and name=upper('&plsql_name'); SQL>select * from user_ind_columns where index_name=upper('&index_name'); 将表记录恢复到指定时间段以前
iOS中使用RSA对数据进行加密解密 witcheryne ios rsa iPhone objective c
RSA算法是一种非对称加密算法,常被用于加密数据传输.如果配合上数字摘要算法, 也可以用于文件签名. 本文将讨论如何在iOS中使用RSA传输加密数据. 本文环境 mac os openssl-1.0.1j, openssl需要使用1.x版本, 推荐使用[homebrew](http://brew.sh/)安装. Java 8 RSA基本原理 RS