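I. Install the dependencies: Ubuntu 18.04 / CUDA 10.0 / cuDNN 7.6.5 / TensorRT 7.0.0 / OpenCV 3.3
1. Install CUDA
Download the .deb installer from the cuda-10.0-download page.
Install it with the following commands: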
sudo dpkg -i cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
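2. Install TensorRT
Download it from the nvidia-tensorrt-7x-download page. You need to register and log in with an NVIDIA account.
When choosing a TensorRT package, make sure it matches your CUDA and Ubuntu versions.
Install it with the following commands: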
sudo dpkg -i nv-tensorrt-repo-ubuntu1604-cuda10.0-trt7.0.0.11-ga-20191216_1-1_amd64.deb
sudo apt update
sudo apt install tensorrt
3. Install OpenCV
sudo add-apt-repository ppa:timsc/opencv-3.3
sudo apt-get update
sudo apt install libopencv-dev
4. Verify the installation
dpkg -l | grep cuda
dpkg -l | grep nvinfer
dpkg -l | grep opencv
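The three checks above can also be scripted. The following is only a sketch of one way to do it in Python: `installed_packages` and `check_install` are hypothetical helper names, and the code assumes the usual `dpkg -l` layout in which an installed package line starts with `ii`, followed by the package name and version. Unlike `grep`, it matches the keyword against the package name only, not the description.

```python
import shutil
import subprocess


def installed_packages(dpkg_output, keyword):
    """Return (name, version) pairs for installed packages whose name contains keyword."""
    found = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Assumed column layout: status, name, version, architecture, description.
        if len(fields) >= 3 and fields[0] == "ii" and keyword in fields[1]:
            found.append((fields[1], fields[2]))
    return found


def check_install(keywords=("cuda", "nvinfer", "opencv")):
    """Run `dpkg -l` once and map each keyword to its installed packages."""
    if shutil.which("dpkg") is None:
        raise RuntimeError("dpkg not found; this check only works on Debian/Ubuntu")
    output = subprocess.run(
        ["dpkg", "-l"], capture_output=True, text=True, check=True
    ).stdout
    return {key: installed_packages(output, key) for key in keywords}
```

Calling `check_install()` on the target machine returns a dictionary; an empty list for any keyword suggests that the corresponding package is missing.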
II. Deploy and build the LeNet-5 classification network with TensorRTx
1. Create a PyTorch virtual environment with conda
conda create -n tensorrtx_tool python=3.7 -y
conda activate tensorrtx_tool
conda install pytorch==1.7 torchvision -c pytorch
pip install -r requirements.txt
2. Clone the model conversion toolkit
git clone https://github.com/wang-xinyu/pytorchx
cd pytorchx/lenet
3. Generate the PyTorch model and convert it to a .wts weight file for TensorRT
python lenet5.py
python inference.py
If the layer shapes and the network output are printed as below, the conversion succeeded:
cuda device count: 2
input: torch.Size([1, 1, 32, 32])
conv1 torch.Size([1, 6, 28, 28])
pool1: torch.Size([1, 6, 14, 14])
conv2 torch.Size([1, 16, 10, 10])
pool2 torch.Size([1, 16, 5, 5])
view: torch.Size([1, 400])
fc1: torch.Size([1, 120])
lenet out: tensor([[0.0950, 0.0998, 0.1101, 0.0975, 0.0966, 0.1097, 0.0948, 0.1056, 0.0992,
0.0917]], device='cuda:0', grad_fn=)
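The shapes in this log can be checked by hand with the standard output-size formula for convolution and pooling layers, out = floor((in + 2 * padding - kernel) / stride) + 1. The sketch below assumes 5x5 convolutions with stride 1 and no padding, and 2x2 pooling with stride 2. These hyperparameters are not printed in the log, but they are consistent with the shapes it shows; `out_size` and `lenet5_shapes` are illustrative names.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_shapes(input_size=32):
    """Trace (channels, height, width) through the conv/pool stack."""
    shapes = {}
    size = out_size(input_size, kernel=5)       # conv1: 32 -> 28
    shapes["conv1"] = (6, size, size)
    size = out_size(size, kernel=2, stride=2)   # pool1: 28 -> 14
    shapes["pool1"] = (6, size, size)
    size = out_size(size, kernel=5)             # conv2: 14 -> 10
    shapes["conv2"] = (16, size, size)
    size = out_size(size, kernel=2, stride=2)   # pool2: 10 -> 5
    shapes["pool2"] = (16, size, size)
    shapes["view"] = (16 * size * size,)        # flatten: 16 * 5 * 5 = 400
    return shapes


for name, shape in lenet5_shapes().items():
    print(name, shape)
```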
4. Download and build the lenet project
git clone https://github.com/wang-xinyu/tensorrtx
cd tensorrtx/lenet
cp [PATH-OF-pytorchx]/pytorchx/lenet/lenet5.wts .
mkdir build
cd build
cmake ..
make
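Serialize the model. This writes lenet5.engine to the build directory:
./lenet -s
Deserialize the engine and run forward inference:
./lenet -d
The output is as follows: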
Output:
0.0949623, 0.0998472, 0.110072, 0.0975036, 0.0965564, 0.109736, 0.0947979, 0.105618, 0.099228, 0.0916792,
5. Check that the PyTorch and TensorRT outputs agree
The pytorch output is
0.0950, 0.0998, 0.1101, 0.0975, 0.0966, 0.1097, 0.0948, 0.1056, 0.0992, 0.0917
The tensorrt output is
0.0949623, 0.0998472, 0.110072, 0.0975036, 0.0965564, 0.109736, 0.0947979, 0.105618, 0.099228, 0.0916792
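The two outputs are printed at different precision. Rounded to four decimal places, the values are identical, which shows that the conversion worked. Your own PyTorch models can be converted the same way as lenet5.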
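Instead of comparing the numbers by eye, the check can be done with an absolute tolerance. The sketch below uses the two output vectors printed above; the tolerance of 5e-5 (half a unit in the fourth decimal place) is an assumption chosen to match the four-decimal PyTorch printout, and `outputs_match` is an illustrative name.

```python
import math

pytorch_out = [0.0950, 0.0998, 0.1101, 0.0975, 0.0966,
               0.1097, 0.0948, 0.1056, 0.0992, 0.0917]
tensorrt_out = [0.0949623, 0.0998472, 0.110072, 0.0975036, 0.0965564,
                0.109736, 0.0947979, 0.105618, 0.099228, 0.0916792]


def outputs_match(a, b, abs_tol=5e-5):
    """True if both vectors have the same length and agree element-wise within abs_tol."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol) for x, y in zip(a, b)
    )


max_diff = max(abs(x - y) for x, y in zip(pytorch_out, tensorrt_out))
print("match:", outputs_match(pytorch_out, tensorrt_out))
print("max abs difference:", max_diff)
```

For a model of your own, a tighter tolerance is reasonable if both outputs are available at full precision.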
6. A look at the contents of the .wts model file
10
conv1.weight 150 be40ee1b bd20bab8 bdc4bc53 .......
conv1.bias 6 bd327058 .......
conv2.weight 2400 3c6f2220 3c693090 ......
conv2.bias 16 bd183967 bcb1ac8a .......
fc1.weight 48000 3c162c20 bd25196a ......
fc1.bias 120 3d3c3d49 bc64b948 ......
fc2.weight 10080 bce095a4 3d33b9dc ......
fc2.bias 84 bc71eaa0 3d9b276c .......
fc3.weight 840 3c252870 3d855351 .......
fc3.bias 10 bdbe4bb8 3b119ee0 ......
The first line gives the number of lines that follow, not counting itself.
Each of the following lines has this structure:
[weight name] [value count = N] [value1] [value2], ..., [valueN]
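A small reader makes the format concrete. The sketch below assumes that each value is the 8-digit hexadecimal bit pattern of a big-endian float32, which matches the look of the values above but is not stated in this post. `hex_to_float`, `float_to_hex` and `parse_wts` are illustrative names, and the sample data is made up rather than taken from lenet5.wts.

```python
import struct


def hex_to_float(word):
    """Decode up to 8 hex digits as a big-endian IEEE-754 float32 (assumed encoding)."""
    return struct.unpack(">f", bytes.fromhex(word.zfill(8)))[0]


def float_to_hex(value):
    """Encode a float as the 8-hex-digit big-endian float32 bit pattern."""
    return struct.pack(">f", value).hex()


def parse_wts(text):
    """Parse .wts text into {weight name: list of floats}, checking the declared counts."""
    lines = text.strip().splitlines()
    entry_count = int(lines[0])
    if len(lines) - 1 != entry_count:
        raise ValueError(f"header says {entry_count} entries, found {len(lines) - 1}")
    weights = {}
    for line in lines[1:]:
        name, count, *words = line.split()
        if len(words) != int(count):
            raise ValueError(f"{name}: expected {count} values, found {len(words)}")
        weights[name] = [hex_to_float(w) for w in words]
    return weights


# Round trip on a tiny made-up file (not real lenet5 weights).
sample = "\n".join([
    "2",
    "demo.weight 3 " + " ".join(float_to_hex(v) for v in (0.5, -1.0, 2.0)),
    "demo.bias 1 " + float_to_hex(0.25),
])
print(parse_wts(sample))
```

To inspect a real file, read it as text and pass the contents to `parse_wts`. For the lenet5.wts file shown above, the per-entry counts should be 150, 6, 2400, 16, 48000, 120, 10080, 84, 840 and 10.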
Reference: https://github.com/wang-xinyu/tensorrtx