1. Set up the environment on the local PC:
# Update the package index
$ sudo apt update
# Enable HTTPS transport for apt
$ sudo apt install -y \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
# Add Docker's official GPG key
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# Add the stable repository
$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
# Update the package index again
$ sudo apt update
# Install Docker CE
$ sudo apt install -y docker-ce
Verify Docker CE
If the following output appears, the installation succeeded.
$ sudo docker run hello-world
...
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. ...
# Add the nvidia-docker repository
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
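The `$distribution` variable above is built by sourcing /etc/os-release and concatenating `ID` and `VERSION_ID`. A minimal sketch of the same expansion, with hypothetical Ubuntu 18.04 values substituted for the sourced file:

```shell
# Reproduce the $distribution expansion with sample values (hypothetical:
# on a real machine, ID and VERSION_ID come from sourcing /etc/os-release).
ID=ubuntu
VERSION_ID=18.04
distribution="${ID}${VERSION_ID}"
echo "$distribution"   # prints ubuntu18.04
```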
# Install the toolkit and restart Docker
$ sudo apt update && sudo apt install -y nvidia-container-toolkit
$ sudo systemctl restart docker
$ sudo docker run --gpus all nvidia/cuda:9.0-base nvidia-smi
# Start a container with access to two GPUs
$ sudo docker run --gpus 2 nvidia/cuda:9.0-base nvidia-smi
# Run a container pinned to one GPU (device=0 selects the first GPU)
$ sudo docker run --gpus device=0 nvidia/cuda:9.0-base nvidia-smi
If the GPU information is displayed, everything is working. The current image is based on Ubuntu 16.04.
Reference: https://blog.csdn.net/weixin_30764771/article/details/99831184?spm=1001.2101.3001.6650.4&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7Edefault-4.no_search_link&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7Edefault-4.no_search_link
2. Docker is now configured. Next, pull the image:
sudo docker pull nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3
Start a container from the image:
sudo docker run --gpus 2 -itd --name tlt -v /home/tao/contain/:/workspace/tlt-experiments -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3 /bin/bash
The first 8888 is the port on your own machine; the second 8888 is the port inside the container.
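As an illustration of how the `-p HOST:CONTAINER` spec above splits into the two ports, here is a small shell sketch (the spec string is the one from this log; Docker itself also accepts richer forms such as IP:HOST:CONTAINER/proto, which this does not handle):

```shell
# Illustrative parsing of a "-p HOST:CONTAINER" port spec only.
spec="8888:8888"
host_port="${spec%%:*}"       # text before the first ':' -> port on your PC
container_port="${spec##*:}"  # text after the last ':'  -> port in the container
echo "host=${host_port} container=${container_port}"   # prints host=8888 container=8888
```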
Enter the container:
docker exec -it tlt /bin/bash
Later I hit these errors: docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]. and docker: Error response from daemon: Unknown runtime specified nvidia. See 'docker run --help'.
I tried many fixes; it finally turned out the wrong image had been pulled: it was not v3.0-dp-py3.
3. Train on your own dataset: https://blog.csdn.net/weixin_38106878/article/details/108979178
https://zongxp.blog.csdn.net/article/details/107386744
Command log:
sudo docker run --gpus 2 -itd --name tlt -v /home/zhanglu/tao/contain/:/workspace/tlt-experiments -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3 /bin/bash
sudo docker cp ./resize_kitti_labels tlt:/workspace/tlt-experiments/
docker exec -it tlt /bin/bash
sudo docker exec -it tlt /bin/bash
Train: tlt-train detectnet_v2 -e specs/detectnet_v2_train_resnet18_car.txt -r /workspace/examples/detectnet_v2/backup/train_model/ -k <your key> -n car --gpus 2
Evaluate: tlt-evaluate detectnet_v2 -e specs/detectnet_v2_train_resnet18_car.txt -m backup-1101/train_model/model.step-529470.tlt -k <your key>
Prune: tlt-prune -m backup-1101/train_model/model.step-529470.tlt -o backup-1101/train_model/model.step-529470-pruned.tlt -eq union -pth 0.3 -k <your key>
The pth threshold trades speed against accuracy. The larger the threshold, the more aggressive the pruning: the resulting model is smaller and inference is faster, but the accuracy loss is greater. Conversely, a smaller threshold yields a larger pruned model that preserves accuracy better, but with a less noticeable speedup. Tune this value for your own case, for example by trying 0.3, 0.5, and 0.7 in turn.
Retrain: tlt-train detectnet_v2 -e specs/detectnet_v2_train_resnet18_car.txt -r /workspace/examples/detectnet_v2/backup/train_model/ -k <your key> -n car --gpus 2
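The threshold sweep suggested above (0.3, 0.5, 0.7) can be turned into a small loop. This is only a sketch that prints the tlt-prune invocations rather than running them; the model path and key placeholder follow the commands in this log, while the pruned-output naming convention is my own assumption:

```shell
# Print (not run) one tlt-prune command per threshold; this is only a
# sketch -- drop the 'echo' to actually execute inside the container.
model=backup-1101/train_model/model.step-529470.tlt
for pth in 0.3 0.5 0.7; do
  echo tlt-prune -m "$model" \
    -o "${model%.tlt}-pruned-pth${pth}.tlt" \
    -eq union -pth "$pth" -k "<your key>"
done
```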
It turned out TLT 2 has no LPR, and the TLT 3 image kept failing for me with Server Error: Internal Server Error ("could not select device driver "" with capabilities: [[gpu]]").
So I started setting up the TAO environment instead. References: