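Most guides to setting up a remote TensorFlow environment with Docker are built around the tensorflow image. Its advantage is a simple installation: the steps are roughly "Nvidia/CUDA >> Nvidia-Docker2 >> Tensorflow-xx-xx-...". Its drawbacks are:
--- There is no Anaconda environment, so after installing one you have to work out how a new conda environment can use the TensorFlow that ships in the image
--- The default Jupyter Notebook is clumsy; try setting a password for remote access and, well, good luck
--- If, like me, you need the Google models, again, good luck
So why not go "Nvidia/CUDA >> Nvidia-Docker2 >> Anaconda" and then install TensorFlow with pip?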
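Docker installation guides are everywhere online, so I won't repeat them here; pick one that matches your operating system version.
A recommended tutorial
As for the GPU driver: first confirm that you have an NVIDIA card, then that it supports GPU computing; after that, just follow the official site.
Official guide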
Ubuntu 16.04/18.04, Debian Jessie/Stretch/Buster
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
CentOS 7 (docker-ce), RHEL 7.4/7.5 (docker-ce), Amazon Linux 1/2
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
$ sudo yum install -y nvidia-container-toolkit
$ sudo systemctl restart docker
1. First, configure a registry mirror inside China
$ curl -sSL https://get.daocloud.io/daotools/set_mirror.sh | sh -s http://f1361db2.m.daocloud.io
2. Check which versions to install
2.2 Check the GPU model (1080)
$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1b80 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b80 (rev a1)
02:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
2.3 Check that the driver works, and its version (OK)
$ nvidia-smi
Tue Jan 14 17:43:43 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0 Off |                  N/A |
| 29%   24C    P0    38W / 200W |      0MiB /  8117MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:02:00.0 Off |                  N/A |
| 30%   23C    P0    36W / 200W |      0MiB /  8119MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
2.4 Check the cuDNN version (7)
$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 3
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
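The three macros printed above combine into a single version number through the CUDNN_VERSION formula on the last line of the grep output. As a minimal sketch, the same lookup can be done in Python; the function names parse_cudnn_version and cudnn_version_number are made up for this illustration, and the sample text is simply the output shown above rather than a read of the real header file.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from the text of cudnn.h."""
    fields = {}
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7"
        match = re.search(r"#define\s+CUDNN_%s\s+(\d+)" % name, header_text)
        if match is None:
            raise ValueError("CUDNN_%s not found" % name)
        fields[name] = int(match.group(1))
    return fields["MAJOR"], fields["MINOR"], fields["PATCHLEVEL"]


def cudnn_version_number(major, minor, patch):
    """Apply the same formula as the CUDNN_VERSION macro."""
    return major * 1000 + minor * 100 + patch


# Sample text copied from the grep output above.
sample = """#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 3
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""

version = parse_cudnn_version(sample)
print(version)                         # (7, 1, 3)
print(cudnn_version_number(*version))  # 7103
```

To run it against a real installation, read the header file into a string first; the path used in the grep command above is where it lives on my machine and may differ on yours.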
3. Installation
Putting the above together, the image I need is nvidia/cuda:10.0-cudnn7-devel-centos7 (any OS variant will do, but the choice affects the Dockerfile below).
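The way the tag falls out of the checks in step 2 can be sketched in a few lines of Python. This is only an illustration: parse_smi_header and cuda_image_tag are hypothetical helper names, the tag pattern is inferred from the one tag used in this post, and whether a given combination actually exists on Docker Hub still has to be checked by hand. Note also that the "CUDA Version" in the nvidia-smi banner is the highest version the driver supports, so an older CUDA image is fine as well.

```python
import re


def parse_smi_header(line):
    """Extract (driver_version, cuda_version) from the nvidia-smi banner line."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", line)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", line)
    if driver is None or cuda is None:
        raise ValueError("not an nvidia-smi banner line")
    return driver.group(1), cuda.group(1)


def cuda_image_tag(cuda_version, cudnn_major, flavor="devel", os_name="centos7"):
    """Compose a tag following the pattern of the one used in this post."""
    return "nvidia/cuda:%s-cudnn%d-%s-%s" % (
        cuda_version, cudnn_major, flavor, os_name)


# Banner line copied from the nvidia-smi output in step 2.3.
banner = "| NVIDIA-SMI 410.104 Driver Version: 410.104 CUDA Version: 10.0 |"
driver_version, cuda_version = parse_smi_header(banner)

# cuDNN major version 7 comes from step 2.4.
print(cuda_image_tag(cuda_version, 7))  # nvidia/cuda:10.0-cudnn7-devel-centos7
```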
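The next step is to build an image on top of nvidia/cuda:10.0-cudnn7-devel-centos7. I have already gathered the installation files:
https://pan.baidu.com/s/1UDBINARdtqvS3BMUhnsrdQ (extraction code: 2auu)
On the host, create the directories /.../install and /.../DockerDir.
Put the four installation files and the Dockerfile into /.../install, change into that directory, and run: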
$ docker build -t tf-gpu .
This is the finished Dockerfile; the only thing you need to change is the Jupyter access password.
### Base image
FROM nvidia/cuda:10.0-cudnn7-devel-centos7
### Build arguments
# Anaconda installer
ARG conda_install=Anaconda3-2019.10-Linux-x86_64.sh
# TensorFlow wheel
ARG tensorflow_install=tensorflow_gpu-1.15.0-cp37-cp37m-manylinux2010_x86_64.whl
# Google models archive
ARG models_install=models-master.zip
# Protobuf (protoc) source archive and the directory it unpacks to
ARG protoc_install=protobuf-python-3.11.2.tar.gz
ARG protoc_dir=protobuf-3.11.2
# Jupyter access password (change this; build args stay visible in `docker history`)
ARG jupyter_password=6789@jkl
### Install basic tools in a single layer
RUN yum install -y unzip zip vim wget gcc automake autoconf libtool make && \
    yum clean all
### Install Anaconda3
COPY ${conda_install} /root/
RUN /usr/bin/bash /root/${conda_install} -b -p /usr/local/anaconda3
ENV CONDA_HOME=/usr/local/anaconda3
ENV PATH=${CONDA_HOME}/bin:$PATH
### Install tensorflow-gpu 1.15.0
COPY ${tensorflow_install} /root/
RUN ${CONDA_HOME}/bin/pip install /root/${tensorflow_install}
### Install the Google models
COPY ${models_install} /root/
RUN /usr/bin/unzip /root/${models_install} -d /usr/local/ && \
    mv /usr/local/models-master /usr/local/models
# Put the models on Python's import path through a .pth file
RUN cd ${CONDA_HOME}/lib/python3.7/site-packages && \
    echo "/usr/local/models/research" >> tensorflow_model.pth && \
    echo "/usr/local/models/research/slim" >> tensorflow_model.pth
### Build and install protoc
COPY ${protoc_install} /root/
RUN cd /root/ && \
    /usr/bin/tar -zxvf ${protoc_install}
# Chain with && so that a failed step stops the build
RUN cd /root/${protoc_dir} && \
    ./configure --prefix=/usr/local/protobuf && \
    make && \
    make install
ENV PROTOC_HOME=/usr/local/protobuf
ENV PATH=${PROTOC_HOME}/bin:$PATH
ENV PKG_CONFIG_PATH=/usr/local/protobuf/lib/pkgconfig/
RUN echo "/usr/local/protobuf/lib" >> /etc/ld.so.conf && \
    ldconfig
### Configure Jupyter
RUN ${CONDA_HOME}/bin/jupyter notebook --generate-config
# Helper script that prints the hashed form of the password
RUN cd /root/ && \
    echo "from notebook.auth import passwd" >> get_pw.py && \
    echo "pwd = passwd('${jupyter_password}')" >> get_pw.py && \
    echo "print(pwd)" >> get_pw.py
# Append the remote-access settings; the password hash is computed at build time
RUN cd /root/.jupyter/ && \
    echo "c.NotebookApp.allow_remote_access = True" >> jupyter_notebook_config.py && \
    echo "c.NotebookApp.allow_root = True" >> jupyter_notebook_config.py && \
    echo "c.NotebookApp.ip = '*'" >> jupyter_notebook_config.py && \
    echo "c.NotebookApp.open_browser = False" >> jupyter_notebook_config.py && \
    echo "c.NotebookApp.password = u'$(${CONDA_HOME}/bin/python /root/get_pw.py)'" >> jupyter_notebook_config.py && \
    echo "c.NotebookApp.password_required = True" >> jupyter_notebook_config.py && \
    echo "c.NotebookApp.port = 5555" >> jupyter_notebook_config.py && \
    echo "c.NotebookApp.quit_button = False" >> jupyter_notebook_config.py
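The value written to c.NotebookApp.password is not the plain password but a salted hash. As far as I know, in the Notebook release bundled with Anaconda3-2019.10 it has the form "algorithm:salt:digest" with SHA-1 as the default; newer releases may default to a different algorithm, so treat the details as an assumption. The sketch below only illustrates that format with the standard library. make_hash and check_hash are names invented here and are not part of Jupyter; in the Dockerfile the real notebook.auth.passwd does the work.

```python
import hashlib
import hmac
import secrets


def make_hash(passphrase, algorithm="sha1", salt_len=12):
    """Build an 'algorithm:salt:digest' string (illustrative, not Jupyter's code)."""
    salt = secrets.token_hex(salt_len // 2)  # salt_len hexadecimal characters
    digest = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    return ":".join((algorithm, salt, digest))


def check_hash(stored, passphrase):
    """Return True if passphrase matches a string produced by make_hash."""
    try:
        algorithm, salt, digest = stored.split(":", 2)
    except ValueError:
        return False
    candidate = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(candidate, digest)


stored = make_hash("6789@jkl")
print(check_hash(stored, "6789@jkl"))  # True
print(check_hash(stored, "wrong"))     # False
```

Because the salt is random, every build produces a different string for the same password, which is expected.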
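Once everything is ready, start the container as follows (the bind-mounted directory and the mapped port can both be changed). The nvidia-docker wrapper ships with the nvidia-docker2 package; with only nvidia-container-toolkit installed as above, on Docker 19.03 or later, replace "nvidia-docker run" with "docker run --gpus all".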
nvidia-docker run -i \
-p 5555:5555 \
-v /.../DockerDir:/LocalDir \
tf-gpu:latest \
/bin/bash -c "jupyter notebook" \
> tf-jupyter.log 2>&1 &
Now just visit the host's IP:5555 in a browser. Done!
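To confirm that TensorFlow inside the container really sees the GPUs, a quick check in a notebook cell helps. This is a minimal sketch: gpu_status is a name made up for this post, and it returns None when TensorFlow cannot be imported so that the cell never crashes. tf.test.is_gpu_available() is the call available in TensorFlow 1.15; the fallback covers releases where it is no longer present.

```python
def gpu_status():
    """Return True/False for GPU visibility, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this interpreter
    if hasattr(tf.test, "is_gpu_available"):
        # Available in TensorFlow 1.15 (deprecated in later 2.x releases)
        return bool(tf.test.is_gpu_available())
    # Fallback for releases without the call above
    return len(tf.config.list_physical_devices("GPU")) > 0


print(gpu_status())
```

If this prints False inside the container while nvidia-smi works on the host, the container was most likely started without GPU access.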