Installing the Nvidia Driver and Nvidia-Docker on Ubuntu 16.04

1 Install the Nvidia driver

    Check the GPU model:

lspci | grep -i nvidia

02:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 12GB] (rev a1)
03:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 12GB] (rev a1)
83:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 12GB] (rev a1)
84:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 12GB] (rev a1)
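    On machines with several cards a small filter makes it easier to see how many GPUs there are and which model to download a driver for. A minimal sketch in POSIX sh, assuming GPUs appear as "VGA compatible controller" or "3D controller" lines with the model name in square brackets, as in the output above; the function names are hypothetical helpers, not part of any tool:

```shell
# Hypothetical helpers (not part of lspci): filter and summarize NVIDIA GPUs
# from `lspci` output read on stdin.
# Assumption: GPUs are listed as "VGA compatible controller" or "3D controller";
# other functions of the same card (e.g. an HD-audio device) are skipped.
nvidia_gpu_lines() {
    grep -i 'nvidia' | grep -E '(VGA compatible|3D) controller'
}

# Number of NVIDIA GPUs (tr strips the padding some wc versions print).
count_nvidia_gpus() {
    nvidia_gpu_lines | wc -l | tr -d ' '
}

# Distinct model names, taken from the text between square brackets.
nvidia_gpu_models() {
    nvidia_gpu_lines | sed -n 's/.*\[\(.*\)\].*/\1/p' | sort -u
}
```

    For the listing above, `lspci | count_nvidia_gpus` prints 4 and `lspci | nvidia_gpu_models` prints Tesla P100 PCIe 12GB.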

    Download the driver from https://www.geforce.cn/drivers/beta-legacy, choosing the one that matches your GPU model.

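1.1 Remove the existing driver

    Uninstall any NVIDIA packages installed through apt (quote the pattern so the shell does not expand it against files in the current directory):

sudo apt-get remove --purge 'nvidia*'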

1.2 Disable the nouveau driver

    Append the following lines to the end of /etc/modprobe.d/blacklist.conf:

blacklist nouveau
options nouveau modeset=0
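    Appending by hand is easy to repeat by accident when this step is redone, which leaves duplicate lines. A minimal sketch of an idempotent version, assuming a POSIX shell; blacklist_nouveau is a hypothetical helper, and the target file is a parameter so the logic can be tried on a scratch file first:

```shell
# Hypothetical helper: append the nouveau blacklist lines to a modprobe
# config file, skipping any line that is already present verbatim.
# Defaults to the file used in this guide; writing there requires root.
blacklist_nouveau() {
    conf=${1:-/etc/modprobe.d/blacklist.conf}
    for line in 'blacklist nouveau' 'options nouveau modeset=0'; do
        # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
        if ! grep -qxF "$line" "$conf" 2>/dev/null; then
            printf '%s\n' "$line" >> "$conf"
        fi
    done
}
```

    From a root shell (for example after `sudo -i`), call `blacklist_nouveau` with no argument, then rebuild the initramfs as in the next step. Running it twice leaves the file unchanged the second time.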

    Rebuild the initramfs:

sudo update-initramfs -u

    Check that nouveau is disabled; no output means it worked:

lsmod | grep nouveau

    If there is still output, reboot and check again.

1.3 Stop the display manager

    Stop lightdm:

sudo service lightdm stop

1.4 Make the installer executable

    Grant execute permission:

sudo chmod a+x NVIDIA-Linux-x86_64-440.59.run

1.5 Install the driver

    Run the installer:

sudo ./NVIDIA-Linux-x86_64-440.59.run

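    Follow the prompts to finish the installation; if you have followed the steps above, it should generally go through without problems.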

1.6 Verify the driver installation

    Run nvidia-smi:

tilyp@tilyp:~/$ nvidia-smi

Thu Mar  5 14:39:58 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:02:00.0 Off |                  N/A |
| 17%   37C    P0    63W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:82:00.0 Off |                  N/A |
| 11%   38C    P0    29W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

    The GPU information is listed, so the driver was installed successfully.
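    When a script needs the numbers from this header rather than a visual check, one option is to pull them out of the text. A sketch assuming the header layout shown above (Driver Version: 440.59 and CUDA Version: 10.2 on one line); smi_field is a hypothetical helper. For the driver version alone, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` reports it directly, one line per GPU.

```shell
# Hypothetical helper: print the dotted number that follows "<label>:"
# in nvidia-smi output read on stdin (first match only).
# Assumption: labels look like "Driver Version: 440.59" / "CUDA Version: 10.2".
smi_field() {
    sed -n "s/.*$1: *\([0-9][0-9.]*\).*/\1/p" | head -n 1
}
```

    On this machine, `nvidia-smi | smi_field 'Driver Version'` prints 440.59 and `nvidia-smi | smi_field 'CUDA Version'` prints 10.2.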

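2 Install CUDA (not required for nvidia-docker)

2.1 Download the CUDA installer

    Download it from https://developer.nvidia.com/cuda-downloads, choosing a CUDA version supported by the installed driver. The CUDA Version field in the nvidia-smi header (10.2 above) is the newest CUDA version the driver supports.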
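    Checking that a CUDA release fits the driver comes down to comparing dotted version numbers field by field; a plain string comparison gets cases like 10.10 versus 10.2 wrong. A minimal sketch in POSIX sh plus awk; version_le is a hypothetical helper name:

```shell
# Hypothetical helper: succeed (exit 0) if dotted version $1 <= $2.
# Fields are compared numerically left to right; missing fields count as 0,
# so 10.2 == 10.2.0 and 10.2 < 10.10.
version_le() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi < yi) exit 0
            if (xi > yi) exit 1
        }
        exit 0
    }'
}
```

    For example, `version_le 10.2 "$(nvidia-smi | sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p')"` checks that CUDA 10.2 is within what the driver reports. The runfile name cuda_10.2.89_440.33.01_linux.run also carries the version of the driver bundled with it, and `version_le 440.33.01 440.59` confirms the installed driver is at least that new.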

2.2 Make the installer executable

    Grant execute permission:

sudo chmod a+x cuda_10.2.89_440.33.01_linux.run

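2.3 Install

    Run the installer:

sudo ./cuda_10.2.89_440.33.01_linux.run

    Note: do not select the bundled NVIDIA driver when the installer offers it; the driver is already installed.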

2.4 Configure environment variables

    Append the following lines to the end of ~/.bashrc:

export PATH=/usr/local/cuda-10.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH

    Reload the configuration:

source ~/.bashrc
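    Each `source ~/.bashrc` prepends the CUDA directories again, and when LD_LIBRARY_PATH starts out empty the export lines above leave a trailing colon, i.e. an empty entry that the dynamic loader treats as the current directory. A sketch of an idempotent alternative, assuming a POSIX-compatible shell such as bash; path_prepend is a hypothetical helper name:

```shell
# Hypothetical helper: print the colon-separated list $2 with directory $1
# prepended, unless $1 is already one of its entries. An empty list yields
# just $1, so no empty (current-directory) entry is introduced.
path_prepend() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        "::")     printf '%s\n' "$1" ;;
        *)        printf '%s\n' "$1:$2" ;;
    esac
}

# Intended use in ~/.bashrc, in place of the two export lines above:
export PATH="$(path_prepend /usr/local/cuda-10.2/bin "$PATH")"
export LD_LIBRARY_PATH="$(path_prepend /usr/local/cuda-10.2/lib64 "${LD_LIBRARY_PATH:-}")"
```

    With this, sourcing ~/.bashrc again leaves PATH and LD_LIBRARY_PATH unchanged after the first time.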

2.5 Check the CUDA version

    Output like the following means CUDA was installed successfully:

tilyp@tilyp:~/$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

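3 Install cuDNN (not required for nvidia-docker)

3.1 Download cuDNN

    Download it from https://developer.nvidia.com/rdp/cudnn-archive, again choosing the build that matches your CUDA version.

    Extract the archive: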

tar -zxvf cudnn-10.1-linux-x64-v7.6.4.38.tgz

    Copy the files into the CUDA directory:

sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

    Check the cuDNN version:

tilyp@tilyp:~/$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 4
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
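    The CUDNN_VERSION define shows how the three numbers combine into one integer: 7 * 1000 + 6 * 100 + 4 = 7604 for cuDNN 7.6.4. A sketch that computes the same value from the header, useful when a build script needs a single comparable number; cudnn_version is a hypothetical helper, and it assumes the plain integer #define lines of cuDNN 7's cudnn.h (cuDNN 8 and later keep these macros in cudnn_version.h instead):

```shell
# Hypothetical helper: read cuDNN header text on stdin and print
# CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL.
# Exits non-zero if no CUDNN_MAJOR define is found.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { ma = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { mi = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { pl = $3 }
         END {
             if (ma == "") exit 1
             print ma * 1000 + mi * 100 + pl
         }'
}
```

    Here, `cudnn_version < /usr/local/cuda/include/cudnn.h` prints 7604.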

4 Install Docker

    Update the package index:

sudo apt-get update

    Install the packages apt needs to use a repository over HTTPS:

sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

    Add Docker's official GPG key:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

    Verify the key fingerprint; it should be 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88:

sudo apt-key fingerprint 0EBFCD88
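    Comparing 40 hex digits by eye is error-prone. A sketch that removes the spaces from the command's output and looks for Docker's published release-key fingerprint (hard-coded below); it assumes the fingerprint is printed on one line as space-separated groups, as apt-key does, and docker_key_ok is a hypothetical helper name:

```shell
# Docker's published release-key fingerprint, with the spaces removed.
DOCKER_FPR=9DC858229FC7DD38854AE2D88D81803C0EBFCD88

# Hypothetical helper: succeed if `apt-key fingerprint` output on stdin
# contains DOCKER_FPR once spaces and tabs are stripped.
docker_key_ok() {
    tr -d ' \t' | grep -q "$DOCKER_FPR"
}
```

    Usage: `sudo apt-key fingerprint 0EBFCD88 | docker_key_ok && echo "key OK"`.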

    Add the Docker repository:

sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

    Update the package index again:

sudo apt-get update

    Install Docker:

sudo apt-get install docker-ce docker-ce-cli containerd.io

    Verify the installation:

tilyp@tilyp:~/$ sudo docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
1b930d010525: Pull complete 
Digest: sha256:fc6a51919cfeb2e6763f62b6d9e8815acbf7cd2e476ea353743570610737b752
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

    Docker is now installed.

5 Install nvidia-docker

    Add the repository and update the package index:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
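    The first line builds the distribution string by sourcing /etc/os-release and joining its ID and VERSION_ID fields; on Ubuntu 16.04 that gives ubuntu16.04, so the list file is fetched from https://nvidia.github.io/nvidia-docker/ubuntu16.04/nvidia-docker.list. A sketch of the same logic against any os-release file, run in a subshell so the sourced variables do not leak into the current shell; distribution_of is a hypothetical helper:

```shell
# Hypothetical helper: print "$ID$VERSION_ID" from an os-release file
# (default /etc/os-release). The parentheses run it in a subshell so
# NAME, ID, VERSION_ID, ... are not left set in the calling shell.
distribution_of() {
    ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}
```

    On this server, `distribution_of` prints ubuntu16.04.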

    Install nvidia-docker2:

sudo apt-get install nvidia-docker2

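    Reload the Docker daemon so it picks up the new nvidia runtime (SIGHUP makes dockerd reload its configuration without a full restart):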

sudo pkill -SIGHUP dockerd

    Verify nvidia-docker:

tilyp@tilyp:~/$ sudo docker run --runtime=nvidia --rm nvidia/cuda:10.2-base-ubuntu16.04 nvidia-smi
Thu Mar  5 06:11:30 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:02:00.0 Off |                  N/A |
| 16%   36C    P0    63W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:82:00.0 Off |                  N/A |
| 14%   38C    P0    30W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

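    Running nvidia-smi inside the container gives the same result as running it directly on the server, so the installation is complete. One point worth explaining is the --runtime=nvidia option: installing nvidia-docker2 registers a runtime called nvidia in /etc/docker/daemon.json, and this option tells Docker to run the container with it. Because the configuration below also sets "default-runtime": "nvidia", containers on this machine use the nvidia runtime even when the option is omitted.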

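/etc/docker/daemon.json on this machine is configured as follows (the insecure-registries entry is for a private registry and is unrelated to nvidia-docker):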

{
    "insecure-registries": ["192.168.0.198:5000"],
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
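    To confirm that the running daemon actually picked up this file, `docker info` can be checked instead of the file itself. A sketch assuming its output contains `Runtimes:` and `Default Runtime:` lines, as recent Docker CE releases print; nvidia_runtime_status is a hypothetical helper:

```shell
# Hypothetical helper: read `docker info` output on stdin, print the
# default runtime, and succeed only if "nvidia" is a registered runtime.
nvidia_runtime_status() {
    awk -F': *' '
        /^ *Runtimes:/ {
            n = split($2, r, " ")
            for (i = 1; i <= n; i++) if (r[i] == "nvidia") ok = 1
        }
        /^ *Default Runtime:/ { print "default runtime: " $2 }
        END { exit !ok }'
}
```

    With the configuration above, `sudo docker info 2>/dev/null | nvidia_runtime_status` should print `default runtime: nvidia` and exit with status 0.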

 

Done.

Questions? Join the QQ group: 526855734
