Ubuntu + Docker + TensorFlow + GPU Installation

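Docker is very helpful for quickly setting up a deep learning working environment on Linux. Following a few articles, I finished the installation in about 2 hours.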

 

0. Prerequisites

GCC, Python, CUDA, and so on need to be installed in advance.

I had already installed CUDA (version 9.1) when I set up Kaldi earlier.

 

1. Install Docker [1]

$ sudo apt-get remove docker docker-engine docker.io
$ sudo apt-get update
$ sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
$ sudo apt-get update

On a production system it is best to pin a specific Docker version; otherwise just install the latest one (optional but recommended).

$ apt-cache madison docker-ce
 docker-ce | 17.09.0~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
 docker-ce | 17.06.2~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
 docker-ce | 17.06.1~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
 docker-ce | 17.06.0~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
 docker-ce | 17.03.2~ce-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
 docker-ce | 17.03.1~ce-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
 docker-ce | 17.03.0~ce-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
$ sudo apt-get install docker-ce=17.09.0~ce-0~ubuntu
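If the version is pinned from a script rather than by hand, the listing above can be parsed programmatically. The following is a minimal Python sketch, assuming the three-column `package | version | source` layout shown above; `parse_madison` and `SAMPLE` are hypothetical names introduced only for illustration.

```python
def parse_madison(output):
    """Return the version strings from `apt-cache madison` output, in listed order."""
    versions = []
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Each useful line has the form: package | version | source
        if len(fields) == 3 and fields[1]:
            versions.append(fields[1])
    return versions


# Two lines copied from the listing above, used as sample input.
SAMPLE = """\
 docker-ce | 17.09.0~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
 docker-ce | 17.06.2~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
"""

versions = parse_madison(SAMPLE)
# Build the package=version argument that apt-get install expects.
pinned = "docker-ce=" + versions[0]
print(pinned)  # docker-ce=17.09.0~ce-0~ubuntu
```

The resulting string is what gets passed to `apt-get install`, exactly as in the manual command above.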

Or install the latest version directly (fastest in a dev environment):

$ sudo apt-get install -y docker-ce

Post-installation steps:

$ sudo groupadd docker
$ sudo usermod -aG docker $USER

Log out and back in, then test whether docker runs without sudo:

$ docker run hello-world
$ sudo systemctl enable docker
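Logging in again matters because a session picks up its group memberships at login, so a shell opened before the `usermod` call does not yet belong to the docker group. A small Python sketch of that check, using only the standard `grp` and `os` modules (Unix only); `session_has_group` is a hypothetical helper name:

```python
import grp
import os


def session_has_group(group_name="docker"):
    """Return True if the current process already carries the named group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group does not exist on this machine.
        return False
    # os.getgroups() lists the supplementary groups of this process,
    # which reflect the state at login time, not later usermod calls.
    return gid in os.getgroups()


if session_has_group("docker"):
    print("docker group is active in this session")
else:
    print("docker group is not active; log out and back in, or keep using sudo")
```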

To uninstall:

$ sudo apt-get purge docker-ce
$ sudo rm -rf /var/lib/docker

 

 

2. Install nvidia-docker


wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb

# Test with nvidia-smi to verify that the installation succeeded

nvidia-docker run --rm nvidia/cuda nvidia-smi

Then run the following so that nvidia-docker is used in place of the docker command by default:

echo 'alias docker=nvidia-docker' >> ~/.bashrc
bash


3. Download and use the TensorFlow image
Pull the images from the Aliyun registry in mainland China:

sudo docker pull ubuntu # fetch the official ubuntu image
sudo docker pull registry.cn-hangzhou.aliyuncs.com/denverdino/tensorflow

List the current images:

$ sudo docker images

Create a new container from an image and run bash in it (the image can also be given by its IMAGE ID):

sudo docker run  -i -t registry.cn-hangzhou.aliyuncs.com/denverdino/tensorflow /bin/bash

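In another terminal window, inspect the containers:
sudo docker ps # list running containers; ps -a lists all containers on the system

To open a second shell in the running container, pass its CONTAINER ID to docker exec. An ID that does not match any container fails, as the first attempt below shows:

sunfoot@sunfoot-BigBoy:~$ sudo docker exec -it 6665676662b1 /bin/bash
[sudo] password for sunfoot: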
Error: No such container: 6665676662b1
sunfoot@sunfoot-BigBoy:~$ sudo docker ps
CONTAINER ID        IMAGE                                                     COMMAND             CREATED             STATUS              PORTS                NAMES
be332eead58b        registry.cn-hangzhou.aliyuncs.com/denverdino/tensorflow   "/bin/bash"         2 minutes ago       Up 2 minutes        6006/tcp, 8888/tcp   dreamy_lamport
sunfoot@sunfoot-BigBoy:~$ sudo docker exec -it be332eead58b /bin/bash
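To avoid typing a stale container ID, the ID can be looked up by image name. The sketch below is one possible way to do it in Python, assuming `docker ps` is called with its `--format` option and a tab-separated `{{.ID}}`/`{{.Image}}` template; `running_containers` and `container_ids_for_image` are hypothetical helper names, and the sample string stands in for real command output.

```python
import subprocess


def running_containers():
    """Return raw `docker ps` output as tab-separated ID and image columns."""
    return subprocess.check_output(
        ["docker", "ps", "--format", "{{.ID}}\t{{.Image}}"],
        universal_newlines=True,
    )


def container_ids_for_image(ps_output, image):
    """Pick the container IDs whose image column equals `image`."""
    ids = []
    for line in ps_output.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1] == image:
            ids.append(parts[0])
    return ids


IMAGE = "registry.cn-hangzhou.aliyuncs.com/denverdino/tensorflow"
# Sample output matching the docker ps listing above.
sample = "be332eead58b\t" + IMAGE + "\n"
print(container_ids_for_image(sample, IMAGE))  # ['be332eead58b']
```

On a machine with Docker running, `container_ids_for_image(running_containers(), IMAGE)` would return the live IDs to pass to `docker exec`.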

Run TensorFlow: open a Python interpreter inside the Docker container

$ python

>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
Hello, TensorFlow!
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> print(sess.run(a+b))
42

4. Run the MNIST dataset with TF under NotebookApp [3]

 

5. How to share files between the container and the host

Copy a file from inside the Docker container to the host; here bf1bdfb8d32c is the container ID.

docker cp bf1bdfb8d32c:notebooks/1_hello_tensorflow.ipynb  /home/sunfoot/Work/tmp

Copy a file from the host to the container

Use -v to mount a host directory as a volume inside the container; here db2de2a7c410 is the IMAGE ID.

docker run -it -v /home/sunfoot/Work/tmp:/mnt db2de2a7c410 /bin/bash
Then copy the file inside the container:
cp /mnt/sourcefile /path/to/destfile  
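When the mount is scripted, it helps to build the argument list in one place and to insist on absolute paths, since Docker treats a non-absolute host part of `-v` as a named volume rather than a directory. A minimal Python sketch under that assumption; `docker_run_with_mount` is a hypothetical helper, and the paths and IMAGE ID are the ones used above:

```python
import os


def docker_run_with_mount(host_dir, container_dir, image, command="/bin/bash"):
    """Build the argument list for an interactive `docker run` with a bind mount."""
    if not os.path.isabs(host_dir) or not os.path.isabs(container_dir):
        raise ValueError("host_dir and container_dir should both be absolute paths")
    # -v takes host_dir:container_dir
    return ["docker", "run", "-it", "-v", host_dir + ":" + container_dir, image, command]


cmd = docker_run_with_mount("/home/sunfoot/Work/tmp", "/mnt", "db2de2a7c410")
print(" ".join(cmd))
# docker run -it -v /home/sunfoot/Work/tmp:/mnt db2de2a7c410 /bin/bash
```

The list form can be handed to `subprocess.call(cmd)` as is, which avoids shell quoting problems with paths that contain spaces.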

6. Run TF on the GPU

The image list shows that image 49b48d227d6e (tensorflow/tensorflow, tag latest-gpu) supports the GPU.

sunfoot@sunfoot-BigBoy:~$ docker image ls -a
REPOSITORY                                                TAG                 IMAGE ID            CREATED             SIZE
nvidia/cuda                                               latest              04a9ce0dec6d        3 months ago        1.96GB
tensorflow/tensorflow                                     latest-gpu          49b48d227d6e        3 months ago        3.1GB
ubuntu                                                    latest              735f80812f90        3 months ago        83.5MB
hello-world                                               latest              2cb0d9787c4d        4 months ago        1.85kB
registry.cn-hangzhou.aliyuncs.com/denverdino/tensorflow   latest              db2de2a7c410        7 months ago        1.27GB

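So start a container from that image. With the alias from step 2 in place, docker here invokes nvidia-docker:

docker run -it -v /home/sunfoot/Work/tmp:/mnt 49b48d227d6e /bin/bash

Then run python tf_test_2.py. The script, shown below, places its ops on the GPU, and it ran successfully.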

import tensorflow as tf

# Pin these ops to the first GPU.
with tf.device("/gpu:0"):
  matrix1 = tf.constant([[3., 3.]])
  matrix2 = tf.constant([[2.],[2.]])
  product = tf.matmul(matrix1, matrix2)

# Run the graph; the with block closes the session automatically.
with tf.Session() as sess:
  result = sess.run(product)
  print(result)
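A quick way to double-check that the container really exposes a GPU to TensorFlow is to list the devices it can see. This is only a sketch, assuming a TensorFlow build that ships `tensorflow.python.client.device_lib`; `gpu_device_names` is a hypothetical helper, and it returns None when TensorFlow is not importable.

```python
def gpu_device_names():
    """Return the names of GPU devices visible to TensorFlow, or None without TensorFlow."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    # Each entry describes one local device; keep only the GPUs.
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]


names = gpu_device_names()
if names is None:
    print("TensorFlow is not installed in this environment")
elif names:
    print("GPUs visible to TensorFlow:", names)
else:
    print("TensorFlow is installed but sees no GPU")
```

An empty list inside the container usually suggests it was started with plain docker instead of nvidia-docker, or from an image without GPU support.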

 

References:

[1] https://wyde.github.io/2017/11/08/How-to-Install-Docker-CE-on-Ubuntu-16-04-and-Fedora-26/

[2] https://wyde.github.io/2017/11/09/How-to-Install-Tensorflow-using-Docker-on-Ubuntu-16-04/ 
[3] https://blog.csdn.net/baobei0112/article/details/79025309

[4] https://yeasy.gitbooks.io/docker_practice/content/container/run.html

 

 

 

 

 
