NOTE:
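The NVIDIA kernel module version must match the version of the installed user-space driver.
Run the following command to check which kernel module version your graphics driver is using: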
cat /proc/driver/nvidia/version
g@g-Inspiron-5675:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 430.26 Tue Jun 4 17:40:52 CDT 2019
GCC version: gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
g@g-Inspiron-5675:~$
My kernel module version is 430 (430.26).
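If you want to read the major version in a script instead of by eye, here is a minimal sketch. The helper name nvidia_major_version is my own invention, not part of any NVIDIA tool, and it assumes the "NVRM version" line has the same layout as in the output above.

```shell
# nvidia_major_version is a hypothetical helper (not an NVIDIA tool).
# It reads the text of /proc/driver/nvidia/version on stdin and prints
# the major driver version taken from the "NVRM version" line.
nvidia_major_version() {
    sed -n 's/^NVRM version:.*Kernel Module *\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Demonstration on the line shown above.
echo 'NVRM version: NVIDIA UNIX x86_64 Kernel Module 430.26 Tue Jun 4 17:40:52 CDT 2019' \
    | nvidia_major_version    # prints 430

# On a machine with the NVIDIA module loaded, read the real file instead.
if [ -r /proc/driver/nvidia/version ]; then
    nvidia_major_version < /proc/driver/nvidia/version
fi
```

The printed number is the one to use in the driver package name further down.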
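Run the following command to remove the installed driver (the pattern is quoted so that the shell does not expand it against files in the current directory):
sudo apt-get purge 'nvidia*'
Run the following commands to add the graphics-drivers PPA (Personal Package Archive, an Ubuntu-only package source that works a bit like an app store):
sudo add-apt-repository ppa:graphics-drivers/ppa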
sudo apt-get update
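Run the following command to install the driver. This example uses version 415; install whichever NVIDIA driver version suits your GPU, and use that same version number in every later command: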
sudo apt-get install nvidia-415 nvidia-settings nvidia-prime
Now you can check the NVIDIA driver with the nvidia-smi command.
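To make the NOTE at the top concrete, the sketch below compares the version reported by nvidia-smi with the kernel module version. The helper name same_driver_version is hypothetical; the `--query-gpu=driver_version --format=csv,noheader` options are standard nvidia-smi flags. The comparison only runs when both nvidia-smi and the /proc file are present.

```shell
# same_driver_version is a hypothetical helper: it succeeds only when
# both arguments are non-empty and identical version strings.
same_driver_version() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Run the real comparison only where the NVIDIA driver is installed.
if command -v nvidia-smi >/dev/null 2>&1 && [ -r /proc/driver/nvidia/version ]; then
    userspace=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    kernel=$(sed -n 's/^NVRM version:.*Kernel Module *\([0-9.]*\).*/\1/p' /proc/driver/nvidia/version)
    if same_driver_version "$userspace" "$kernel"; then
        echo "driver versions match: $kernel"
    else
        echo "driver mismatch: nvidia-smi=$userspace kernel module=$kernel"
    fi
fi
```

If the two differ, a reboot after reinstalling the driver is usually the first thing to try, since the old kernel module may still be loaded.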
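Optional: put the driver package on hold so that apt does not upgrade it automatically (this machine ended up on 430; use your own version number):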
sudo apt-mark hold nvidia-430
Remove any Docker packages already installed on the Ubuntu system:
sudo apt-get remove docker docker-engine docker-ce docker.io
Update apt:
sudo apt-get update
Install the following packages so that apt can use a repository over HTTPS:
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
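Add Docker's official GPG key, add the Docker stable repository, and update apt (without the repository line, apt has no source for docker-ce):
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update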
List all available docker-ce versions:
sudo apt-cache madison docker-ce
The output looks like this:
docker-ce | 5:18.09.4~3-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 5:18.09.3~3-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 5:18.09.2~3-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 5:18.09.1~3-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 5:18.09.0~3-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 18.06.3~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 18.06.2~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 18.06.1~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 18.06.0~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 18.03.1~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 18.03.0~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 17.12.1~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 17.12.0~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
Pick the version you need and install it:
sudo apt-get install docker-ce=18.06.3~ce~3-0~ubuntu
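The string after `docker-ce=` has to be the exact value from the second column of the madison listing. As a rough sketch, the lookup can be scripted like this; pick_docker_version is a hypothetical helper name of mine, and the sample lines are copied from the listing above.

```shell
# pick_docker_version is a hypothetical helper. It reads the output of
# `apt-cache madison docker-ce` on stdin and prints the first package
# version string that contains the text given as $1.
pick_docker_version() {
    awk -F'|' -v want="$1" '
        { gsub(/^[ \t]+|[ \t]+$/, "", $2) }   # trim the version column
        index($2, want) { print $2; exit }    # first match wins
    '
}

# Demonstration on three lines from the listing above.
pick_docker_version "18.06.3" <<'EOF'
docker-ce | 5:18.09.4~3-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 18.06.3~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 18.06.2~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
EOF
# prints 18.06.3~ce~3-0~ubuntu

# Real usage would look like:
#   version=$(apt-cache madison docker-ce | pick_docker_version "18.06.3")
#   sudo apt-get install docker-ce="$version"
```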
To check that the installation works, run:
sudo systemctl start docker
sudo docker info
Or check the Docker version:
docker -v
Docker version 18.06.3-ce, build d7080c1
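The plain version number (18.06.3 here) is what you need later when choosing a matching nvidia-docker2 package. A small sketch for pulling it out of the `docker -v` output follows; docker_upstream_version is a hypothetical helper name, and it assumes the "Docker version X.Y.Z..." layout shown above.

```shell
# docker_upstream_version is a hypothetical helper. It reads the output
# of `docker -v` on stdin and prints only the numeric version.
docker_upstream_version() {
    sed -n 's/^Docker version \([0-9][0-9.]*\).*/\1/p'
}

# Demonstration on the line shown above.
echo 'Docker version 18.06.3-ce, build d7080c1' | docker_upstream_version   # prints 18.06.3

# Real usage would look like:
#   docker -v | docker_upstream_version
```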
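Uninstall nvidia-docker 1.0 (if it is installed); the first command removes the containers that still use its volumes: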
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge nvidia-docker
Add the nvidia-docker apt repository and update:
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
# Get the distribution string, e.g. ubuntu16.04 on my machine (you can run this without understanding it)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
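In case the `distribution=` line looks like magic: it sources /etc/os-release and glues ID and VERSION_ID together. The sketch below shows the same idea on a temporary stand-in file, so it can be tried on any machine. repo_list_url is a hypothetical helper name; the URL pattern is the one used in the curl command above.

```shell
# repo_list_url is a hypothetical helper. It sources an os-release style
# file in a subshell (so ID and VERSION_ID do not leak into the caller)
# and prints the nvidia-docker.list URL built from those two values.
repo_list_url() {
    (
        . "$1"
        echo "https://nvidia.github.io/nvidia-docker/$ID$VERSION_ID/nvidia-docker.list"
    )
}

# Demonstration with a temporary file standing in for /etc/os-release.
sample=$(mktemp)
printf 'ID=ubuntu\nVERSION_ID="16.04"\n' > "$sample"
repo_list_url "$sample"   # prints https://nvidia.github.io/nvidia-docker/ubuntu16.04/nvidia-docker.list
rm -f "$sample"

# Real usage would look like:
#   repo_list_url /etc/os-release
```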
Install nvidia-docker 2.0:
sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd
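This is where luck comes in: if your Docker version happens to be compatible with the nvidia-docker2 version that apt selects by default, the installation is complete at this point. In most cases, however, you will hit the following error: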
Abort
Test nvidia-docker:
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
It fails:
g@g-Inspiron-5675:~$ sudo nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.
This is expected: nvidia-docker2 has to be installed at the version built for the Docker version on your machine.
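"Unknown runtime specified nvidia" means the Docker daemon has no runtime called nvidia registered. One way to check this without starting a container is to look at the "Runtimes:" line of `docker info`. The sketch below does that; has_nvidia_runtime is a hypothetical helper name, and it assumes the line lists runtime names separated by spaces.

```shell
# has_nvidia_runtime is a hypothetical helper. It reads `docker info`
# output on stdin and succeeds when the "Runtimes:" line lists nvidia.
has_nvidia_runtime() {
    awk '
        /^[ \t]*Runtimes:/ {
            for (i = 2; i <= NF; i++) if ($i == "nvidia") found = 1
        }
        END { exit !found }
    '
}

# Demonstration on two sample lines.
if echo 'Runtimes: nvidia runc' | has_nvidia_runtime; then
    echo "nvidia runtime registered"
fi
if ! echo 'Runtimes: runc' | has_nvidia_runtime; then
    echo "nvidia runtime missing"
fi

# Real usage would look like:
#   sudo docker info | has_nvidia_runtime && echo "nvidia runtime registered"
```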
First, list all the versions available in the repository:
sudo apt-cache madison nvidia-docker2 nvidia-container-runtime
nvidia-docker2 | 2.0.3+docker18.06.3-3 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.06.2-2 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.06.2-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.06.1-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.06.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.03.1-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.03.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker17.12.1-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker17.12.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker17.09.1-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
Since I installed docker-ce 18.06.3, I chose:
sudo apt-get install nvidia-docker2=2.0.3+docker18.06.3-3
sudo pkill -SIGHUP dockerd
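The matching rule is simply that the part after "+docker" in the nvidia-docker2 version has to equal your Docker version. A minimal sketch of that lookup follows; pick_nvidia_docker2 is a hypothetical helper name, and the sample lines are copied from the listing above.

```shell
# pick_nvidia_docker2 is a hypothetical helper. It reads the output of
# `apt-cache madison nvidia-docker2` on stdin and prints the first
# package version built for the Docker version given as $1.
pick_nvidia_docker2() {
    awk -F'|' -v want="+docker$1-" '
        { gsub(/[ \t]/, "", $2) }            # strip padding around the version
        index($2, want) { print $2; exit }   # first match wins
    '
}

# Demonstration on three lines from the listing above.
pick_nvidia_docker2 "18.06.3" <<'EOF'
nvidia-docker2 | 2.0.3+docker18.06.3-3 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.06.2-2 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.06.2-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
EOF
# prints 2.0.3+docker18.06.3-3

# Real usage would look like:
#   version=$(apt-cache madison nvidia-docker2 | pick_nvidia_docker2 "18.06.3")
#   sudo apt-get install nvidia-docker2="$version"
```

When several builds exist for one Docker version (as with 18.06.2 above), this takes the first one listed.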
Finally, run the nvidia-docker 2.0 test again:
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
It works:
g@g-Inspiron-5675:~$ sudo nvidia-docker run --rm nvidia/cuda nvidia-smi
Tue Nov 19 06:29:24 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:09:00.0  On |                  N/A |
| 41%   35C    P8     5W / 120W |    547MiB /  6069MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
I'll stop writing here. To install the Docker version of TensorRT, follow the official documentation:
https://github.com/NVIDIA/TensorRT#prerequisites