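Contents
1. Host environment
2. Install Docker
3. Install nvidia-docker (required if containers need to use the NVIDIA driver)
4. Pull the image
4.1 Verify the --gpus option
4.2 Run an Ubuntu container that uses the GPU
4.3 Write a script that pulls the image
4.4 Run the script
5. Install CUDA
5.1 Download the .run installer (recommended) and follow its prompts
5.2 Set the environment variables after installation
6. Install cuDNN
6.1 Download the installation files
6.2 Install cuDNN
6.3 Check the cuDNN version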
System: Ubuntu 20.04
GPU driver: nvidia-driver-418-server
CUDA version: CUDA 10.1
cuDNN version: cuDNN 7.6.4
Install Docker with the following command:
curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
Alternatively, use the one-line installer from DaoCloud, a mirror hosted in China:
curl -sSL https://get.daocloud.io/docker | sh
Test the installation:
docker run hello-world
# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker
# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi
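The repository URL above depends on the `distribution` string, which is built by sourcing /etc/os-release and concatenating `ID` and `VERSION_ID`. The following is a minimal sketch of that step, not part of the official instructions. The helper name `distribution_id` and its optional file argument are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print ID followed by VERSION_ID from an os-release file.
# The file argument defaults to /etc/os-release, as in the command above.
distribution_id() {
    os_release_file="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak into the caller.
    (
        . "$os_release_file"
        printf '%s%s\n' "$ID" "$VERSION_ID"
    )
}

# Print the string that is substituted into the repository URL.
if [ -r /etc/os-release ]; then
    distribution_id
fi
```

On the Ubuntu 20.04 host described in this post, the string should be ubuntu20.04, which selects the matching nvidia-docker.list file.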
After nvidia-docker is installed, a daemon.json file is created automatically under /etc/docker. Edit daemon.json so that it reads as follows:
lu@computer:~$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia",
    "registry-mirrors": [
        "https://kfwkfulq.mirror.aliyuncs.com",
        "https://2lqq34jg.mirror.aliyuncs.com",
        "https://pee6w651.mirror.aliyuncs.com",
        "https://registry.docker-cn.com",
        "https://hub-mirror.c.163.com"
    ],
    "dns": ["8.8.8.8", "8.8.4.4"]
}
lu@computer:~$
After editing the file, restart the Docker service:
sudo systemctl restart docker
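To confirm that the restart picked up the new configuration, one option is to check which runtime Docker reports as its default. The sketch below assumes that `docker info` prints a `Default Runtime:` line. `parse_default_runtime` is a hypothetical helper name, not a Docker command.

```shell
#!/bin/sh
# Hypothetical helper: read `docker info` text on stdin and print the default runtime.
parse_default_runtime() {
    sed -n 's/^[[:space:]]*Default Runtime:[[:space:]]*//p'
}

# Only query Docker when the CLI is present; otherwise do nothing.
if command -v docker >/dev/null 2>&1; then
    runtime=$(docker info 2>/dev/null | parse_default_runtime)
    echo "default runtime: ${runtime:-unknown}"
fi
```

With the daemon.json shown above, the reported runtime should be nvidia. If it is still runc, the file was probably not read, for example because of a JSON syntax error.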
$ docker run --help | grep -i gpus
--gpus gpu-request GPU devices to add to the container ('all' to pass all GPUs)
$ docker run -it --rm --gpus all ubuntu nvidia-smi
Troubleshooting
Do you see the following error message?
$ docker run -it --rm --gpus all ubuntu
docker: Error response from daemon: linux runtime spec devices: could not select device driver "" with capabilities: [[gpu]].
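This error means that the NVIDIA runtime is not registered with Docker. In practice it usually means the driver is not installed correctly on the host. It can also mean that the NVIDIA container tools were installed but the Docker daemon was not restarted afterwards, in which case restarting the daemon fixes it.
Go back and verify that nvidia-container-runtime is installed, or restart the Docker daemon.
Install nvidia-container-runtime: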
$ sudo apt-get install nvidia-container-runtime
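A quick way to narrow down which piece is missing is to check that each tool is on the PATH. This is a minimal sketch. `have_tool` is a hypothetical helper, and the three tool names are the ones used elsewhere in this post.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
have_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the host-side pieces that GPU containers depend on.
for tool in docker nvidia-container-runtime nvidia-smi; do
    have_tool "$tool" || true
done
```

A missing nvidia-smi points at the driver, while a missing nvidia-container-runtime points at the container tools.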
If all of the checks above pass, you can go ahead and pull the image.
lu@computer:~/docker_home/ubuntu16.04_nvidia$ pwd
/home/lu/docker_home/ubuntu16.04_nvidia
lu@computer:~/docker_home/ubuntu16.04_nvidia$
lu@computer:~/docker_home/ubuntu16.04_nvidia$ cat run-ubuntu16.04_nvidia_docker.sh
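#!/bin/bash
# Name the container after the current user, e.g. ubuntu16.04_nvidia-lu
export MY_CONTAINER="ubuntu16.04_nvidia-$(whoami)"
# Count existing containers (running or stopped) whose name ends with $MY_CONTAINER
num=$(sudo docker ps -a | grep -w "$MY_CONTAINER$" | wc -l)
echo "$num"
echo "$MY_CONTAINER"
if [ "$num" -eq 0 ]; then
    # First run: allow X11 connections, then create the container with GPU access
    sudo xhost +
    sudo docker run \
        -e DISPLAY=unix$DISPLAY --net=host --ipc=host --pid=host \
        -it --runtime=nvidia --privileged --name "$MY_CONTAINER" \
        -v "$PWD":/home/share --gpus all ubuntu:16.04 bash
else
    # Later runs: start the existing container and open a shell in it
    sudo docker start "$MY_CONTAINER"
    sudo docker exec -ti "$MY_CONTAINER" /bin/bash
fi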
lu@computer:~/docker_home/ubuntu16.04_nvidia$
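The script decides between `docker run` and `docker start` by grepping the output of `docker ps -a`. A stricter variant of that existence check is sketched below as an alternative, not as the script's own method. `container_exists` is a hypothetical helper, and the sketch calls `docker` without `sudo`, which assumes your user may talk to the Docker daemon.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a container with exactly this name exists.
container_exists() {
    # --format '{{.Names}}' prints one container name per line;
    # grep -Fx matches the whole line literally, so dots are not wildcards.
    docker ps -a --format '{{.Names}}' | grep -Fxq -- "$1"
}

# Usage, assuming the naming scheme from the script above:
#   container_exists "ubuntu16.04_nvidia-$(whoami)" && echo "exists"
```

Matching the whole name avoids false positives from other columns of the `docker ps` table, such as an image name that happens to contain the same text.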
./run-ubuntu16.04_nvidia_docker.sh
# The script then pulls the ubuntu:16.04 image automatically and drops you into the container
root@computer:/home/share# nvidia-smi
Sun May 23 01:25:39 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.197.02 Driver Version: 418.197.02 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce 940M Off | 00000000:04:00.0 Off | N/A |
| N/A 57C P0 N/A / N/A | 374MiB / 2004MiB | 16% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 773 G /usr/lib/xorg/Xorg 109MiB |
| 0 1449 G /usr/bin/gnome-shell 111MiB |
| 0 1887 G ...AAgAAAAAAAAACAAAAAAAAAA= --shared-files 150MiB |
+-----------------------------------------------------------------------------+
root@computer:/home/share#
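The output shows that the Docker container can now use the GPU driver.
Next, install CUDA 10.1 and cuDNN 7.6.4 inside the container, exactly as you would on the host.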
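Before installing CUDA, it can be worth confirming that the driver visible in the container is recent enough. The sketch below assumes the 418.39 minimum that NVIDIA's release notes list for CUDA 10.1 on Linux, and it relies on GNU `sort -V`. `version_ge` and `MIN_DRIVER` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2 (uses GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

MIN_DRIVER="418.39"   # assumed minimum Linux driver for CUDA 10.1

# Only query the GPU when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if version_ge "$driver" "$MIN_DRIVER"; then
        echo "driver $driver is new enough for CUDA 10.1"
    else
        echo "driver $driver is older than $MIN_DRIVER"
    fi
fi
```

The 418.197.02 driver shown in the nvidia-smi output above passes this check.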
Download link: https://developer.nvidia.com/cuda-toolkit-archive
Run the following command:
sudo bash cuda_10.1.105_418.39_linux.run
Hold down Enter until the license agreement reaches 100%, then answer the prompts as follows:
accept
n (do not install the driver)
y
y
y
If the installer reports that libxml2 is missing, install it:
root@computer:/home/share# bash cuda_10.1.105_418.39_linux.run
./cuda-installer: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory
root@computer:/home/share# apt install libxml2
Open the .bashrc file in your home directory and add the following lines. For example, my .bashrc is under /home/lu/.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64
export PATH=$PATH:/usr/local/cuda-10.1/bin
export CUDA_HOME=/usr/local/cuda-10.1
Then run in a terminal: source ~/.bashrc
Verify: nvcc --version
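Appending to PATH and LD_LIBRARY_PATH unconditionally adds a duplicate entry every time .bashrc is sourced, and leaves a leading colon when the variable starts out empty. The sketch below shows one way to avoid both. `append_path` is a hypothetical helper, and /usr/local/cuda-10.1 is assumed to be the install prefix chosen by the CUDA 10.1 installer.

```shell
#!/bin/sh
# Hypothetical helper: print list $1 with directory $2 appended, unless already present.
append_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;       # already present: leave unchanged
        "::")     printf '%s\n' "$2" ;;       # empty list: avoid a leading colon
        *)        printf '%s\n' "$1:$2" ;;    # otherwise append
    esac
}

CUDA_HOME=/usr/local/cuda-10.1   # assumed install prefix for CUDA 10.1
PATH=$(append_path "$PATH" "$CUDA_HOME/bin")
LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" "$CUDA_HOME/lib64")
export CUDA_HOME PATH LD_LIBRARY_PATH
```

Sourcing these lines more than once leaves the variables unchanged after the first time.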
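Download the cuDNN package that matches your setup: https://developer.nvidia.com/rdp/cudnn-archive
Extract the downloaded archive; it contains a cuda folder. Open a terminal in that directory and run the following commands: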
sudo cp cuda/include/cudnn* /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
In a terminal, run:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
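The grep above prints the raw `#define` lines. To get a single version string instead, the three fields can be joined with awk. This is a minimal sketch, assuming the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` as cuDNN 7.x does. `cudnn_version` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cuDNN header file.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

# Report the installed version when the header is present.
header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
    cudnn_version "$header"
fi
```

For the cuDNN release used in this post, the expected result is 7.6.4.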
References:
https://blog.csdn.net/BigData_Mining/article/details/104991349
https://blog.csdn.net/EasonCcc/article/details/108098930