Environment: Ubuntu Server 18.04
Install: Docker CE
Reference: Installing Docker CE on Ubuntu 18.04
The following is taken from: NVIDIA Container Toolkit
Make sure you have installed the NVIDIA driver and Docker 19.03 for your Linux distribution. Note that you do not need to install the CUDA toolkit on the host, but the driver needs to be installed.
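A minimal sketch for checking these two prerequisites before going further. The helper names version_ge and check_prereqs are made up for this note, and the version comparison assumes GNU sort with -V, which Ubuntu 18.04 provides.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Assumes GNU sort, whose -V flag orders version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the driver tools and a new enough Docker are present.
check_prereqs() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "NVIDIA driver: nvidia-smi found"
    else
        echo "NVIDIA driver: nvidia-smi not found"
    fi
    if command -v docker >/dev/null 2>&1; then
        # Empty when the daemon cannot be reached.
        docker_version=$(docker version --format '{{.Server.Version}}' 2>/dev/null || true)
        if version_ge "$docker_version" 19.03; then
            echo "Docker: $docker_version (>= 19.03)"
        else
            echo "Docker: '$docker_version' is older than 19.03 or could not be queried"
        fi
    else
        echo "Docker: not installed"
    fi
}

check_prereqs
```

The function only reports; it does not install or change anything.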
The following is taken from: CUDA TOOLKIT DOCUMENTATION
Pre-installation Actions:
The NVIDIA CUDA Toolkit is available at http://developer.nvidia.com/cuda-downloads.
Choose the platform you are using and download the NVIDIA CUDA Toolkit
The CUDA Toolkit contains the CUDA driver and tools needed to create, build and run a CUDA application as well as libraries, header files, CUDA samples source code, and other resources.
Package Manager Installation:
Perform the pre-installation actions.
Install repository meta-data
$ sudo dpkg -i cuda-repo-<distro>_<version>_<architecture>.deb
Installing the CUDA public GPG key
When installing using the local repo:
$ sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
When installing using network repo on Ubuntu 18.04/18.10:
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/<distro>/<architecture>/7fa2af80.pub
When installing using network repo on Ubuntu 16.04:
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/<distro>/<architecture>/7fa2af80.pub
Update the Apt repository cache
$ sudo apt-get update
Install CUDA
$ sudo apt-get install cuda
Perform the post-installation actions.
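The <distro> and <architecture> placeholders above have to be filled in by hand. The sketch below shows one way to derive them, assuming the naming used by the NVIDIA repositories for Ubuntu (ID plus VERSION_ID without the dot, for example ubuntu1804) and an x86_64 host. The function names cuda_distro and cuda_key_url are hypothetical helpers, not part of any NVIDIA tool.

```shell
# Derive the <distro> token from an os-release file.
# Assumption: Ubuntu repos are named ID + VERSION_ID without the dot,
# e.g. ubuntu1804. The optional path argument is only there for testing.
cuda_distro() {
    ( . "${1:-/etc/os-release}"; printf '%s%s\n' "$ID" "$VERSION_ID" | tr -d '.' )
}

# Build the network-repo GPG key URL from <distro> and <architecture>.
cuda_key_url() {
    printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/%s/7fa2af80.pub\n' "$1" "$2"
}

# Example for Ubuntu 18.04 on x86_64; prints the URL without fetching it.
cuda_key_url ubuntu1804 x86_64
```

The printed URL can then be passed to the apt-key adv --fetch-keys command shown in the steps.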
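Possible problems:
Reference: Setting up a deep learning environment on Ubuntu 18.04 (TensorFlow CPU/GPU, Keras, PyTorch, PyCharm, Jupyter)
After the graphics driver is installed, the system must be rebooted to load it. Note that if the driver was installed by following the procedure above, a blue screen titled Perform MOK management appears during the reboot:
(1) On the blue Perform MOK management screen, select Enroll MOK.
(2) On the Enroll MOK screen, select Continue.
(3) On the Enroll the key screen, select Yes.
(4) Enter the password you set while installing the driver.
(5) You are returned to the blue Perform MOK management screen. Select the first option, Reboot. After the reboot the NVIDIA driver is loaded.
Simpler installation method: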
ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
Reboot the system.
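After the reboot it is worth confirming that the driver really loaded. The sketch below relies on the NVIDIA kernel module exposing /proc/driver/nvidia/version once it is loaded. The function name driver_status and its optional path argument are additions for illustration.

```shell
# Print "loaded" when the NVIDIA kernel module has registered itself
# under /proc, "not loaded" otherwise. The path argument exists only
# so the check can be exercised against another file.
driver_status() {
    if [ -r "${1:-/proc/driver/nvidia/version}" ]; then
        echo "loaded"
    else
        echo "not loaded"
    fi
}

driver_status

# If Secure Boot is enabled, the module must be signed with an enrolled
# key, which is what the MOK screens handle. Show the Secure Boot state
# when mokutil is available.
if command -v mokutil >/dev/null 2>&1; then
    mokutil --sb-state || true
fi
```

If the result is "not loaded" while Secure Boot is enabled, the MOK enrollment was probably skipped or the password did not match.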
The following is taken from: NVIDIA Container Toolkit
Ubuntu 16.04/18.04, Debian Jessie/Stretch/Buster
# Add the package repositories
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
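The distribution variable is the part of these commands most likely to go wrong, so here is a small sketch that shows what it evaluates to and which list URL results. The wrapper functions and the optional os-release path are assumptions added for illustration. The expression itself is the one from the commands above.

```shell
# Same expression as in the repository setup, wrapped so that the
# os-release path can be overridden (the parameter is an addition).
nvidia_docker_distribution() {
    ( . "${1:-/etc/os-release}"; echo "$ID$VERSION_ID" )
}

# URL of the apt source list that the second curl command downloads.
nvidia_docker_list_url() {
    echo "https://nvidia.github.io/nvidia-docker/$1/nvidia-docker.list"
}

# On Ubuntu 18.04 the distribution string is ubuntu18.04.
nvidia_docker_list_url ubuntu18.04
```

Unlike the CUDA repository name, this string keeps the dot in the version number.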
The nvidia-docker version installed above also requires nvidia-container-runtime.
The following is taken from: nvidia-container-runtime
Installation:Ubuntu distributions
Install the repository for your distribution by following the instructions here.
Install the nvidia-container-runtime package:
sudo apt-get install nvidia-container-runtime
Docker Engine setup:
Do not follow this section if you installed the nvidia-docker2 package; it already registers the runtime.
To register the nvidia runtime, use the method below that is best suited to your environment.
You might need to merge the new argument with your existing configuration.
Systemd drop-in file
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
Daemon configuration file
sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo pkill -SIGHUP dockerd
You can optionally reconfigure the default runtime by adding the following to /etc/docker/daemon.json:
"default-runtime": "nvidia"
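Because the default-runtime line is only a fragment, it may help to see where it sits in the complete file. The sketch below writes a combined daemon.json to a scratch file, assuming there are no other settings to merge. The function name write_daemon_json is hypothetical. The key belongs at the top level, next to "runtimes".

```shell
# Write a daemon.json that registers the nvidia runtime and also makes
# it the default. "default-runtime" is a top-level key, a sibling of
# "runtimes". Assumes there is no existing configuration to preserve.
write_daemon_json() {
    cat > "$1" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}

# Write to a scratch file first and review it before copying it to
# /etc/docker/daemon.json.
scratch=$(mktemp)
write_daemon_json "$scratch"
cat "$scratch"
```

Writing to a scratch file avoids overwriting an existing /etc/docker/daemon.json that already holds other options.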
Command line
sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime [...]
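Whichever of the three methods is used, the result can be checked from the output of docker info, which lists the registered runtimes on a "Runtimes:" line. This is a rough sketch. The function name has_nvidia_runtime is made up here, and the exact layout of docker info may differ between Docker versions.

```shell
# Succeed when the text on stdin (expected: `docker info` output)
# lists "nvidia" on its "Runtimes:" line. The "Default Runtime:" line
# is not matched because it lacks the plural "s".
has_nvidia_runtime() {
    grep -i 'runtimes:' | grep -qw 'nvidia'
}

if command -v docker >/dev/null 2>&1; then
    if docker info 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime registered"
    else
        echo "nvidia runtime not found in docker info"
    fi
fi
```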
The following is taken from: [Official] TensorFlow/install/Docker
Note: nvidia-docker v1 uses the nvidia-docker alias, while v2 uses docker --runtime=nvidia.
Start a bash shell session in a container using the latest TensorFlow GPU image:
docker run --runtime=nvidia -it tensorflow/tensorflow:latest-gpu bash
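To make the v1/v2 difference concrete, the sketch below maps each nvidia-docker version to its command prefix and assembles a one-off GPU check without running it. The function name gpu_run_prefix is hypothetical, and the Python one-liner assumes the image contains TensorFlow 2.1 or later, where tf.config.list_physical_devices is available.

```shell
# Print the command prefix for the given nvidia-docker major version.
gpu_run_prefix() {
    case "$1" in
        v1) echo "nvidia-docker run" ;;
        v2) echo "docker run --runtime=nvidia" ;;
        *)  echo "unknown nvidia-docker version: $1" >&2; return 1 ;;
    esac
}

# Assemble (but do not execute) a GPU check for the TensorFlow image.
# Assumption: the image ships TensorFlow 2.1 or later.
echo "$(gpu_run_prefix v2) --rm tensorflow/tensorflow:latest-gpu python -c \"import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))\""
```

Running the printed command on a working setup should list at least one GPU device. An empty list suggests the container cannot see the driver.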