LPR training with NVIDIA TAO

Reference (official blog): https://developer.nvidia.com/blog/creating-a-real-time-license-plate-detection-and-recognition-app/
Reference (blog): https://blog.csdn.net/zong596568821xp/article/details/114143709

Install the dependencies in a virtual environment

I created a new conda environment:

conda create -n tao python=3.6
pip3 install nvidia-pyindex
pip3 install nvidia-tao
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/tao/cv_samples/versions/v1.2.0/zip -O cv_samples_v1.2.0.zip
unzip -u cv_samples_v1.2.0.zip  -d ./cv_samples_v1.2.0 && rm -rf cv_samples_v1.2.0.zip && cd ./cv_samples_v1.2.0

Prepare the LPR dataset

$ git clone https://github.com/openalpr/benchmarks benchmarks
$ # use the raw URL; the github.com/.../blob link downloads the HTML page instead of the script
$ wget https://raw.githubusercontent.com/NVIDIA-AI-IOT/deepstream_tlt_apps/release/tlt3.0/misc/dev_blog/LPDR/lpr/preprocess_openalpr_benchmark.py

$ python preprocess_openalpr_benchmark.py --input_dir=./benchmarks/endtoend/us --output_dir=./data/openalpr
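The preprocessing script reorganizes the OpenALPR benchmark into the layout LPRNet expects: an image directory plus a label directory where each .txt file holds the plate string for the matching image. A quick consistency check can be sketched as below; the `check_split` helper and the throwaway demo layout are illustrative, not part of the TAO tooling:

```shell
#!/bin/sh
# Check that every image in <root>/image has a matching label in <root>/label.
check_split() {
    root="$1"
    missing=0
    for img in "$root"/image/*; do
        base=$(basename "$img")
        label="$root/label/${base%.*}.txt"
        if [ ! -f "$label" ]; then
            echo "missing label for $base"
            missing=$((missing + 1))
        fi
    done
    echo "$missing missing"
}

# Demo on a throwaway layout; point it at e.g. ./data/openalpr/train instead.
demo=$(mktemp -d)
mkdir -p "$demo/image" "$demo/label"
touch "$demo/image/car1.jpg" "$demo/image/car2.jpg"
echo "ABC123" > "$demo/label/car1.txt"   # car2.jpg deliberately unlabeled
check_split "$demo"
rm -rf "$demo"
```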

Training

# -e: experiment spec file, -r: results dir, -k: key for the encrypted model, -m: pretrained model to fine-tune
$ tlt lprnet train -e /workspace/tlt-experiments/lprnet/tutorial_spec.txt \
                   -r /workspace/tlt-experiments/lprnet/ \
                   -k nvidia_tlt \
                   -m /workspace/tlt-experiments/lprnet/us_lprnet_baseline18_trainable.tlt

Environment errors during training:
Error 1

No file found at: /.docker/config.json. Did you run docker login?

The docker client needs to be logged in first:

sudo docker login

Then enter your username and password; if you don't have an account, register one at https://hub.docker.com/.
Reference: https://www.503error.com/2019/dockerconfig-json-%E6%96%87%E4%BB%B6%E9%87%8C%E5%AD%98%E5%82%A8%E7%9A%84%E6%98%AF%E4%BB%80%E4%B9%88/1629.html

After logging in, the same error persisted. It turned out the generated config.json was not at ~/.docker/config.json; copying it there still failed, apparently due to a permissions issue, since the file could only be opened with sudo. After changing its permissions and pointing the code at the correct path, the error went away.
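When `sudo docker login` creates the config as root, the cleanest fix is usually to hand ~/.docker back to your own user instead of opening the file with sudo each time. A sketch of that cleanup (standard chown/chmod, nothing TAO-specific; adjust the path if your config lives elsewhere):

```shell
# Give the current user ownership of the docker client config created by root,
# and restrict the credentials file to the owner only.
sudo chown -R "$USER":"$USER" "$HOME/.docker"
chmod 700 "$HOME/.docker"
chmod 600 "$HOME/.docker/config.json"
```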

Error 2:
Please run docker login nvcr.io
The docker client was logged in, but not to nvcr.io.
Fix:

$ docker login nvcr.io 
Username: $oauthtoken 
Password: <Your Key>

This regenerates .docker/config.json.
Reference: https://blog.csdn.net/chqfeiyang/article/details/89674055

Error 3:
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', Perm…
Fix:
Add the current user to the docker group:

sudo groupadd docker
sudo gpasswd -a ${USER} docker
sudo service docker restart
# or run the following instead, no re-login required
newgrp docker
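Once the group change takes effect, the daemon socket should be reachable without sudo; if `docker info` still fails, log out and back in (or use newgrp as above). Both checks below are standard commands:

```shell
# Is the current user in the docker group?
id -nG | grep -qw docker && echo "in docker group" || echo "NOT in docker group"
# Can we reach the daemon without sudo?
docker info >/dev/null 2>&1 && echo "daemon reachable" || echo "daemon NOT reachable"
```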

Solved.

Run the program:

~/tao/cv_samples_v1.2.0/lprnet$ ./train.sh 
~/.tao_mounts.json wasn't found. Falling back to obtain mount points and docker configs from ~/.tlt_mounts.json.
Please note that this will be deprecated going forward.
2021-11-04 19:42:27,266 [INFO] root: Registry: ['nvcr.io']
zl
2021-11-04 19:42:27,404 [INFO] tlt.components.docker_handler.docker_handler: The required docker doesn't exist locally/the manifest has changed. Pulling a new docker.
2021-11-04 19:42:27,404 [INFO] tlt.components.docker_handler.docker_handler: Pulling the required container. This may take several minutes if you're doing this for the first time. Please wait here.
...
Repository name: nvcr.io/nvidia/tao/tao-toolkit-tf
2021-11-04 20:08:41,025 [INFO] tlt.components.docker_handler.docker_handler: Container pull complete.
2021-11-04 20:08:41,025 [INFO] root: No mount points were found in the /home/zhanglu/.tlt_mounts.json file.
2021-11-04 20:08:41,026 [WARNING] tlt.components.docker_handler.docker_handler: 

The pulled image is:

nvcr.io/nvidia/tao/tao-toolkit-tf    v3.21.08-py3        8672180cbf38        2 months ago        15.7GB

Install CUDA 11.2:
https://blog.csdn.net/qq_42167046/article/details/113246994

The image only needs to be pulled on the first run; later runs reuse the local copy.

Running tao train **** reports path-not-found errors because one step is missing: mapping local paths to paths inside the docker container. Reference:
https://blog.csdn.net/chenjambo/article/details/118399910

Create ~/.tlt_mounts.json and list every path pair that needs to be mounted.

{
    "Mounts": [
        {
            "source": "/home//tlt-experiments",
            "destination": "/workspace/tlt-experiments"
        },
        {
            "source": "/home//openalpr",
            "destination": "/workspace/openalpr"
        }

    ]
}
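A broken mounts file fails silently until a path lookup inside the container errors out, so it is worth validating the JSON and the host-side source directories up front. The snippet below is a convenience check of my own, not part of the TAO launcher; it assumes python3 is available:

```shell
#!/bin/sh
# Validate a TAO mounts file and confirm each "source" directory exists on the host.
CONF="${CONF:-$HOME/.tlt_mounts.json}"
if python3 -m json.tool "$CONF" > /dev/null 2>&1; then
    python3 -c 'import json, sys
for m in json.load(open(sys.argv[1]))["Mounts"]:
    print(m["source"])' "$CONF" |
    while read -r src; do
        [ -d "$src" ] && echo "ok: $src" || echo "MISSING: $src"
    done
else
    echo "cannot read or parse $CONF"
fi
```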

LPR DeepStream deployment

Reference: https://blog.csdn.net/zong596568821xp/article/details/114143709
Download the code

git clone https://github.com/NVIDIA-AI-IOT/deepstream_lpr_app.git
cd deepstream_lpr_app/

Download the related models

./download_ch.sh

Download tlt-converter from the official site, then convert the model.
Address: https://docs.nvidia.com/tao/tao-toolkit/text/tensorrt.html#id2

./tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 models/LP/LPR/ch_lprnet_baseline18_deployable.etlt -t fp16 -e models/LP/LPR/lpr_ch_onnx_b16.engine
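The `-p` flag defines the TensorRT optimization profile for the dynamic `image_input` tensor as `<name>,<min>,<opt>,<max>` shapes in NCHW order. The same command with the shapes broken out:

```shell
# min: smallest batch the engine must accept
# opt: batch size TensorRT tunes kernels for
# max: largest batch allowed at runtime (16 here, matching the b16 engine name)
MIN=1x3x48x96
OPT=4x3x48x96
MAX=16x3x48x96
./tlt-converter -k nvidia_tlt \
    -p image_input,$MIN,$OPT,$MAX \
    models/LP/LPR/ch_lprnet_baseline18_deployable.etlt \
    -t fp16 \
    -e models/LP/LPR/lpr_ch_onnx_b16.engine
```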

Build and run

make
cd deepstream-lpr-app
cp dict_ch.txt dict.txt
# args per the repo README: <1 US | 2 Chinese model> <1 h264 file | 2 fakesink | 3 display> <0/1 ROI> <input mp4> <output h264>
sudo ./deepstream-lpr-app 2 2 0 ch_car_test.mp4 output.264

Follow-up 1:
The newly pulled image is:

nvcr.io/nvidia/tao/tao-toolkit-tf   v3.21.11-tf1.15.5-py3   c607b0237bc5   2 weeks ago    16.3GB

Training, evaluation, inference, and export all worked fine, but converting to an engine on the Jetson NX module failed:

$ sudo ./tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 /data/models/deepstream_lpr_app/models/LP/myLPR/lprnet_epoch-300.etlt -t fp16 -e /data/models/deepstream_lpr_app/models/LP/myLPR/lprnet_epoch-300.engine
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
terminate called after throwing an instance of 'std::out_of_range'
  what():  Attribute not found: axes
Aborted

The root cause was the image update. I re-downloaded the v3.21.08-py3 image, but changing the image alone is not enough: the launcher automatically pulls the v3.21.11 version again at runtime. So I also went back to tao 0.1.19.
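To keep the launcher and container in sync, pin the launcher package to the release that pulls the v3.21.08-py3 image (0.1.19 here, per the versions above; `pip3 show nvidia-tao` tells you what you currently have):

```shell
# Downgrade the TAO launcher so `tao train` stops auto-pulling the newer image.
PIN="0.1.19"
pip3 uninstall -y nvidia-tao
pip3 install "nvidia-tao==${PIN}"
# Confirm which TAO containers are cached locally:
docker images nvcr.io/nvidia/tao/tao-toolkit-tf
```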
