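I installed nvidia-docker so that I could use the GPU inside Docker. After following tutorials found online, though, verifying the installation with nvidia-docker run nvidia/cuda nvidia-smi always failed with: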
Error response from daemon: oci runtime error:
container_linux.go:247: starting container process caused "exec: \"nvidia-smi\":
executable file not found in $PATH".
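It was a real headache, but I eventually got the installation working, so I am writing the steps down.
If Docker has been installed before, remove the old version first.
# Remove docker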
$ sudo yum remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine
# Remove docker-ce
$ sudo yum -y remove docker \
docker-common \
docker-selinux \
docker-engine \
docker-engine-selinux \
container-selinux docker-ce
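# Remove old images, containers, and volumes
$ sudo rm -rf /var/lib/docker
# Switch the CentOS base repo to the Aliyun mirror, then install EPEL and container-selinux
$ sudo wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
$ sudo yum install epel-release
$ sudo yum install container-selinux
# Without container-selinux, installing docker-ce fails with:
# Requires: container-selinux >= 2.9
# Install the required dependencies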
$ sudo yum install -y yum-utils \
device-mapper-persistent-data \
lvm2
# Set up the Docker CE repository
$ sudo yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
$ sudo yum install docker-ce docker-ce-cli containerd.io
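As of March 1, 2019, the first method (installing the latest docker-ce, as above) can fail because the Docker version is incompatible with nvidia-docker. The error is: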
Error: Package: nvidia-docker2-2.0.3-1.docker18.09.2.ce.noarch (nvidia-docker)
Requires: docker-ce = 3:18.09.2
Installed: 3:docker-ce-18.09.3-3.el7.x86_64 (@docker-ce-stable)
docker-ce = 3:18.09.3-3.el7
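This error means that nvidia-docker2-2.0.3 requires docker-ce 3:18.09.2, while the latest docker-ce that the first method installed, 3:18.09.3, is incompatible with it. Upstream says this bug has since been fixed; readers with time to spare can verify that. The second method is to list the available docker-ce versions and install one that nvidia-docker2 accepts.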
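The required version can be read straight out of yum's error text. The sketch below is only an illustration: required_docker_ce is a hypothetical helper name (not part of yum or nvidia-docker), and it assumes the error keeps the "Requires: docker-ce = <epoch>:<version>" shape quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of yum or nvidia-docker): read yum's error
# text on stdin and print the docker-ce version that nvidia-docker2 requires.
required_docker_ce() {
    # Matches lines such as "Requires: docker-ce = 3:18.09.2";
    # the epoch prefix ("3:") is optional and is dropped.
    sed -n 's/^[[:space:]]*Requires: docker-ce = \([0-9]*:\)\{0,1\}\([0-9][0-9.]*\).*$/\2/p' | head -n 1
}

# Demo on the error text quoted in this post.
printf '%s\n' \
    'Error: Package: nvidia-docker2-2.0.3-1.docker18.09.2.ce.noarch (nvidia-docker)' \
    'Requires: docker-ce = 3:18.09.2' \
    'Installed: 3:docker-ce-18.09.3-3.el7.x86_64 (@docker-ce-stable)' |
    required_docker_ce
# prints 18.09.2
```

The printed value is the version to append to the package names when pinning, for example docker-ce-18.09.2.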
$ yum list docker-ce --showduplicates | sort -r
docker-ce.x86_64 3:18.09.1-3.el7 docker-ce-stable
docker-ce.x86_64 3:18.09.0-3.el7 docker-ce-stable
docker-ce.x86_64 18.06.1.ce-3.el7 docker-ce-stable
docker-ce.x86_64 18.06.0.ce-3.el7 docker-ce-stable
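Before pinning a version, it can be worth confirming that the repository actually offers it. The following is a minimal sketch, assuming the three-column listing format shown above; version_available is a hypothetical helper name, and the demo feeds it two lines copied from that listing rather than calling yum.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `yum list docker-ce --showduplicates`
# listing on stdin offers docker-ce in the requested version ($1).
version_available() {
    awk -v want="$1" '
        $1 ~ /^docker-ce\./ {
            v = $2
            sub(/^[0-9]+:/, "", v)   # drop the epoch, e.g. "3:"
            sub(/-.*$/, "", v)       # drop the release, e.g. "-3.el7"
            if (v == want) found = 1
        }
        END { exit !found }'
}

# Demo on two lines of the listing shown above.
listing='docker-ce.x86_64 3:18.09.1-3.el7 docker-ce-stable
docker-ce.x86_64 18.06.1.ce-3.el7 docker-ce-stable'

if printf '%s\n' "$listing" | version_available 18.09.1; then
    echo "18.09.1 is available"
fi
```

On a real machine the listing would come from the command itself: yum list docker-ce --showduplicates | version_available 18.09.2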
$ sudo yum install docker-ce-18.09.2 docker-ce-cli-18.09.2 containerd.io
# Enable Docker at boot and start it now
$ sudo systemctl enable docker.service
$ sudo systemctl start docker
$ sudo docker run hello-world
If it worked, the output looks like this:
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
1b930d010525: Pull complete
Digest: sha256:2557e3c07ed1e38f26e389462d03ed943586f744621577a99efb77324b0fe535
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
# If nvidia-docker 1.0 is already installed, remove it first.
$ docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
$ sudo yum remove nvidia-docker
# Add the repository
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
# Install nvidia-docker2 and reload the Docker daemon configuration
$ sudo yum install -y nvidia-docker2
$ sudo pkill -SIGHUP dockerd
# Test whether the installation succeeded
$ docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
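If the installation succeeded, nvidia-smi prints its status table (driver version and the GPUs visible inside the container).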
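If the test fails instead, one thing to check is whether a runtime named nvidia is registered with the Docker daemon, since that is what --runtime=nvidia refers to. As far as I know, the nvidia-docker2 package writes this entry to /etc/docker/daemon.json, but treat that path as an assumption. The sketch below uses a hypothetical helper, has_nvidia_runtime, which does a plain text match rather than real JSON parsing.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given Docker daemon config mentions a
# runtime called "nvidia". This is a plain text match, not a JSON parser.
has_nvidia_runtime() {
    config=${1:-/etc/docker/daemon.json}   # assumed default location
    [ -r "$config" ] && grep -q '"nvidia"[[:space:]]*:' "$config"
}

if has_nvidia_runtime; then
    echo "nvidia runtime is registered"
else
    echo "nvidia runtime not found in daemon.json"
fi
```

A different config file can be passed as the first argument, which also makes the check easy to try against a copy of the file.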
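Before using anything, look for the official documentation first. CSDN is good, but it is not a first-hand source!
When deploying to a server with Docker, install only the NVIDIA driver on the host and put every other dependency inside the nvidia-docker image.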
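A quick pre-flight check on the host can confirm that those two pieces, Docker and the NVIDIA driver, are present before any image is pulled. This is only a sketch: missing_commands is a hypothetical helper, and finding nvidia-smi on PATH is used here as a rough stand-in for "the driver is installed".

```shell
#!/bin/sh
# Hypothetical pre-flight helper: report which of the named commands are
# missing from PATH; succeeds only when all of them are present.
missing_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            status=1
        fi
    done
    return "$status"
}

# On a GPU host prepared as described, both should be found.
if missing_commands docker nvidia-smi; then
    echo "host looks ready"
fi
```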
References:
1. https://docs.docker.com/install/linux/docker-ce/centos/#install-docker-ce
2. https://github.com/NVIDIA/nvidia-docker#centos-7-docker-ce-rhel-7475-docker-ce-amazon-linux-12