Check the GPU model:
lspci | grep -i nvidia
02:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 12GB] (rev a1)
03:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 12GB] (rev a1)
83:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 12GB] (rev a1)
84:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 12GB] (rev a1)
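On a machine with several cards, it can help to reduce the lspci listing to the distinct model names before looking up a driver. A minimal sketch, assuming a hypothetical helper named gpu_models (not part of any NVIDIA tooling) that reads lspci-style text on stdin and takes the model name from the square brackets:

```shell
# Print each distinct NVIDIA model name found in lspci-style text on stdin,
# prefixed by the number of devices that report it.
# gpu_models is a hypothetical helper name used for illustration.
gpu_models() {
    grep -i 'nvidia' \
        | sed -n 's/.*\[\(.*\)\].*/\1/p' \
        | sort | uniq -c \
        | awk '{ n = $1; $1 = ""; sub(/^ /, ""); print n " x " $0 }'
}

# Usage on a real machine:
#   lspci | gpu_models
```

For the listing above this would report the Tesla P100 PCIe 12GB model once, with a count of four.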
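Download the driver from https://www.geforce.cn/drivers/beta-legacy, picking the build that matches your GPU model.
Uninstall any existing NVIDIA packages (the pattern is quoted so the shell does not expand it against files in the current directory):
sudo apt-get remove --purge 'nvidia*'
To disable the nouveau driver, append the following lines to the end of /etc/modprobe.d/blacklist.conf: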
blacklist nouveau
options nouveau modeset=0
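Repeating this step by hand can leave duplicate lines in the file. A minimal sketch of an idempotent append, assuming a hypothetical helper named append_once; the file path is passed as an argument, so it can be tried on a scratch file first:

```shell
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper name used for illustration.
append_once() {
    line=$1
    file=$2
    # -x matches whole lines, -F treats the text literally, -q stays silent.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage on a real machine (the target file is root-owned, so run this in a root shell):
#   append_once 'blacklist nouveau' /etc/modprobe.d/blacklist.conf
#   append_once 'options nouveau modeset=0' /etc/modprobe.d/blacklist.conf
```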
Regenerate the initramfs so the blacklist takes effect:
sudo update-initramfs -u
Check that nouveau is disabled; no output means success:
lsmod | grep nouveau
If there is still output, try rebooting.
Stop lightdm:
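The same check can be scripted so that it returns an exit status instead of relying on reading the output. A minimal sketch, assuming a hypothetical helper named nouveau_disabled that reads lsmod-style text on stdin and looks only at the module-name column:

```shell
# Succeed (exit 0) when the lsmod-style text on stdin lists no module named nouveau.
# nouveau_disabled is a hypothetical helper name used for illustration.
nouveau_disabled() {
    ! awk '{ print $1 }' | grep -qx 'nouveau'
}

# Usage on a real machine:
#   if lsmod | nouveau_disabled; then
#       echo "nouveau is disabled"
#   else
#       echo "nouveau is still loaded; try rebooting"
#   fi
```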
sudo service lightdm stop
Make the installer executable:
sudo chmod a+x NVIDIA-Linux-x86_64-440.59.run
Run the installer:
sudo ./NVIDIA-Linux-x86_64-440.59.run
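Follow the prompts to complete the installation; if the steps above were followed, it should normally go through without problems.
Run nvidia-smi: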
tilyp@tilyp:~/$ nvidia-smi
Thu Mar 5 14:39:58 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59 Driver Version: 440.59 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:02:00.0 Off | N/A |
| 17% 37C P0 63W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:82:00.0 Off | N/A |
| 11% 38C P0 29W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
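The GPU information is displayed, which means the driver was installed successfully. (This sample output lists two GeForce RTX cards, so it was evidently captured on a different machine from the lspci listing above, which shows four Tesla P100 cards.)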
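For scripts, the table above is awkward to parse; nvidia-smi can also print selected fields as CSV. A minimal sketch, assuming a hypothetical wrapper named gpu_summary that reports an error instead of failing obscurely when the driver is missing:

```shell
# Print one CSV line per GPU: index, name, driver version, total memory.
# gpu_summary is a hypothetical wrapper name used for illustration.
gpu_summary() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=index,name,driver_version,memory.total \
                   --format=csv,noheader
    else
        echo "nvidia-smi not found: the driver is not installed or not on PATH" >&2
        return 1
    fi
}

# Usage on a real machine:
#   gpu_summary
```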
Download CUDA from https://developer.nvidia.com/cuda-downloads, choosing the CUDA version that matches your driver version.
Make the installer executable:
sudo chmod a+x cuda_10.2.89_440.33.01_linux.run
Run the installer:
sudo ./cuda_10.2.89_440.33.01_linux.run
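Note: do not select the option to reinstall the NVIDIA driver here, since the driver is already installed.
Append the following lines to the end of ~/.bashrc: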
export PATH=/usr/local/cuda-10.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
Reload the configuration:
source ~/.bashrc
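To confirm that the two exports took effect in the current shell, the variables can be checked entry by entry. A minimal sketch, assuming a hypothetical helper named path_contains that compares whole colon-separated entries rather than substrings:

```shell
# Succeed when directory $1 is one of the colon-separated entries of $2.
# path_contains is a hypothetical helper name used for illustration.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after `source ~/.bashrc`:
#   path_contains /usr/local/cuda-10.2/bin "$PATH" && echo "CUDA bin is on PATH"
#   path_contains /usr/local/cuda-10.2/lib64 "$LD_LIBRARY_PATH" && echo "CUDA libs are on LD_LIBRARY_PATH"
```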
Output like the following confirms that CUDA was installed successfully:
tilyp@tilyp:~/$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
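When choosing the matching cuDNN build in the next step, only the release number from this output matters. A minimal sketch, assuming a hypothetical helper named nvcc_release that reads `nvcc -V` style text on stdin:

```shell
# Extract the "release X.Y" number from `nvcc -V` style text on stdin.
# nvcc_release is a hypothetical helper name used for illustration.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage on a real machine:
#   nvcc -V | nvcc_release
```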
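Download cuDNN from https://developer.nvidia.com/rdp/cudnn-archive, again choosing the build that matches your CUDA version.
Extract the archive (the file name below is the build for CUDA 10.1; substitute the archive you actually downloaded):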
tar -zxvf cudnn-10.1-linux-x64-v7.6.4.38.tgz
Copy the files into the CUDA directory:
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Check the cuDNN version:
tilyp@tilyp:~/$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 4
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
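The three defines can be combined into a single version string. A minimal sketch, assuming a hypothetical helper named cudnn_version that takes the header path as its argument and reads only the `#define` lines shown above:

```shell
# Read CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL from a cudnn.h-style header
# and print them as "major.minor.patch".
# cudnn_version is a hypothetical helper name used for illustration.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}

# Usage on a real machine:
#   cudnn_version /usr/local/cuda/include/cudnn.h
```

For the header shown above this gives 7.6.4, which corresponds to CUDNN_VERSION 7604 under the formula in the header (7 * 1000 + 6 * 100 + 4).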
Next, install Docker. Update the package index:
sudo apt-get update
Install the packages needed to use an apt repository over HTTPS:
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
Add Docker's GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
Verify the key fingerprint:
sudo apt-key fingerprint 0EBFCD88
Add the Docker repository to the system:
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
Update the package index again:
sudo apt-get update
Install Docker:
sudo apt-get install docker-ce docker-ce-cli containerd.io
Verify that the installation succeeded:
tilyp@tilyp:~/$ sudo docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
1b930d010525: Pull complete
Digest: sha256:fc6a51919cfeb2e6763f62b6d9e8815acbf7cd2e476ea353743570610737b752
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
Docker is now installed.
Next, add the nvidia-docker repository and update the package index:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
Install nvidia-docker:
sudo apt-get install nvidia-docker2
Reload the Docker daemon configuration:
sudo pkill -SIGHUP dockerd
Verify nvidia-docker:
tilyp@tilyp:~/$ sudo docker run --runtime=nvidia --rm nvidia/cuda:10.2-base-ubuntu16.04 nvidia-smi
Thu Mar 5 06:11:30 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59 Driver Version: 440.59 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:02:00.0 Off | N/A |
| 16% 36C P0 63W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:82:00.0 Off | N/A |
| 14% 38C P0 30W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
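Running nvidia-smi inside an nvidia-docker container gives the same result as running it directly on the host, so the installation is complete. One note on the --runtime=nvidia flag: installing nvidia-docker registers the nvidia runtime in /etc/docker/daemon.json, so passing this flag is all it takes to run a container with nvidia-docker. When "default-runtime" is set to "nvidia", as in the configuration below, the flag can be omitted.
The /etc/docker/daemon.json configuration is as follows: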
{
    "insecure-registries": ["192.168.0.198:5000"],
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
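Whether the default runtime is really set can be checked without starting a container. A minimal sketch, assuming a hypothetical helper named default_runtime_is_nvidia; it does a plain text match rather than parsing JSON, so treat it as a quick check only:

```shell
# Succeed when the given daemon.json-style file sets "default-runtime" to "nvidia".
# default_runtime_is_nvidia is a hypothetical helper name used for illustration.
# This is a text match, not a JSON parser.
default_runtime_is_nvidia() {
    grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$1"
}

# Usage on a real machine:
#   default_runtime_is_nvidia /etc/docker/daemon.json && echo "nvidia is the default runtime"
# The running daemon can also be asked directly:
#   sudo docker info --format '{{.DefaultRuntime}}'
```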
That's all.
If you have questions, join the QQ group: 526855734