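Docker is very helpful for quickly setting up a deep learning environment on Linux. Following a few articles, I finished the installation in about 2 hours.
0. Prerequisites
GCC, Python, CUDA, etc. need to be installed beforehand.
I had already installed CUDA (version 9.1) when I set up Kaldi earlier.
1. Installing Docker [1]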
$ sudo apt-get remove docker docker-engine docker.io
$ sudo apt-get update
$ sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
$ sudo apt-get update
On a production system it is best to pin a fixed Docker version; otherwise just install the latest (optional but recommended).
$ apt-cache madison docker-ce
docker-ce | 17.09.0~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 17.06.2~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 17.06.1~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 17.06.0~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 17.03.2~ce-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 17.03.1~ce-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 17.03.0~ce-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
$ sudo apt-get install docker-ce=17.09.0~ce-0~ubuntu
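The version string passed to apt-get install has to match one of the entries printed by apt-cache madison exactly. As a minimal sketch (the helper name parse_madison is hypothetical, and the sample text is copied from the listing above), the version column can be extracted like this. It assumes the newest version is listed first, as it is in that listing.

```python
def parse_madison(output):
    """Return the version strings from `apt-cache madison <pkg>` output, in listed order."""
    versions = []
    for line in output.splitlines():
        # Each entry has three pipe-separated fields: package | version | source
        fields = [f.strip() for f in line.split("|")]
        if len(fields) == 3 and fields[1]:
            versions.append(fields[1])
    return versions


sample = """\
docker-ce | 17.09.0~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 17.06.2~ce-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 17.03.2~ce-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
"""

versions = parse_madison(sample)
newest = versions[0]  # assumption: madison lists the newest version first
print("sudo apt-get install docker-ce=" + newest)
```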
Or install the latest version directly (fast for a dev environment)
$ sudo apt-get install -y docker-ce
Post-installation steps
$ sudo groupadd docker
$ sudo usermod -aG docker $USER
Log out and back in, then test whether docker can run without sudo
$ docker run hello-world
$ sudo systemctl enable docker
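If docker still asks for sudo, it can help to confirm that the usermod step actually added the account to the docker group. The following is a rough sketch using Python's standard grp module; the helper name user_in_group is hypothetical. It only reads the group database, so it can report True even before you log in again, while the running shell picks up the new group only after a fresh login.

```python
import grp
import os


def user_in_group(user, group_name):
    """Return True if `user` is listed as a supplementary member of `group_name`."""
    try:
        members = grp.getgrnam(group_name).gr_mem
    except KeyError:
        # The group does not exist on this machine.
        return False
    return user in members


# Assumption: the USER environment variable holds the login name.
user = os.environ.get("USER", "")
print(user_in_group(user, "docker"))
```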
To uninstall
$ sudo apt-get purge docker-ce
$ sudo rm -rf /var/lib/docker
2. Installing nvidia-docker
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
# Run nvidia-smi in a container to verify that the installation succeeded
nvidia-docker run --rm nvidia/cuda nvidia-smi
Then run the following so that nvidia-docker is used by default in place of the docker command:
echo 'alias docker=nvidia-docker' >> ~/.bashrc
bash
3. Downloading and using a TensorFlow image
Pull the images, using the Alibaba Cloud registry in China for TensorFlow:
sudo docker pull ubuntu # fetch the official ubuntu image
sudo docker pull registry.cn-hangzhou.aliyuncs.com/denverdino/tensorflow
List the current images:
$ sudo docker images
Create a new container from an image and run bash in it (the image can also be given by its IMAGE ID):
sudo docker run -i -t registry.cn-hangzhou.aliyuncs.com/denverdino/tensorflow /bin/bash
In another terminal window, inspect the containers:
sudo docker ps # list running containers; ps -a lists all containers on the system
sunfoot@sunfoot-BigBoy:~$ sudo docker exec -it 6665676662b1 /bin/bash
[sudo] password for sunfoot:
Error: No such container: 6665676662b1
sunfoot@sunfoot-BigBoy:~$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
be332eead58b registry.cn-hangzhou.aliyuncs.com/denverdino/tensorflow "/bin/bash" 2 minutes ago Up 2 minutes 6006/tcp, 8888/tcp dreamy_lamport
sunfoot@sunfoot-BigBoy:~$ sudo docker exec -it be332eead58b /bin/bash
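The first exec in the transcript above failed because the container ID was stale; the ID has to be read from docker ps each time a container is created. As a hedged sketch, the lookup can be scripted with docker ps --format, which prints only the requested columns. The helper names parse_ps, ids_for_image and running_containers are hypothetical, and the sample line is taken from the docker ps output above.

```python
import subprocess

# Go-template format understood by `docker ps --format`: container ID, then image name.
PS_FORMAT = "{{.ID}} {{.Image}}"


def parse_ps(output):
    """Parse lines of '<container id> <image>' into (id, image) tuples."""
    pairs = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            pairs.append((parts[0], parts[1]))
    return pairs


def ids_for_image(pairs, image):
    """Return the IDs of all running containers created from `image`."""
    return [cid for cid, img in pairs if img == image]


def running_containers():
    """Ask the local Docker daemon for running containers (needs docker installed)."""
    out = subprocess.check_output(["docker", "ps", "--format", PS_FORMAT])
    return parse_ps(out.decode())


# Offline example using the container shown in the transcript above.
sample = "be332eead58b registry.cn-hangzhou.aliyuncs.com/denverdino/tensorflow\n"
image = "registry.cn-hangzhou.aliyuncs.com/denverdino/tensorflow"
print(ids_for_image(parse_ps(sample), image))
```

On a machine with Docker running, ids_for_image(running_containers(), image) would give the ID to pass to docker exec.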
Run TensorFlow: open a Python shell inside the Docker container
$ python
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
Hello, TensorFlow!
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> print(sess.run(a+b))
42
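4. Running the MNIST dataset with TF in NotebookApp [3]
5. Sharing files between the container and the host
Copy a file from inside the Docker container to the host; here bf1bdfb8d32c is the container ID.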
docker cp bf1bdfb8d32c:notebooks/1_hello_tensorflow.ipynb /home/sunfoot/Work/tmp
Copy files from the host into the container:
Use -v to mount a host directory as a volume inside the container; here db2de2a7c410 is the IMAGE ID.
docker run -it -v /home/sunfoot/Work/tmp:/mnt db2de2a7c410 /bin/bash
Then copy inside the container: cp /mnt/sourcefile /path/to/destfile
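The -v argument has the form host_path:container_path, and the host path should be absolute, since Docker treats a bare name as a named volume rather than a directory. A small sketch of assembling the command follows; the helper name docker_run_argv is hypothetical, and the image ID and paths are the ones used above.

```python
import os


def docker_run_argv(image, host_dir, container_dir="/mnt", command="/bin/bash"):
    """Build the argument list for an interactive `docker run` with one bind mount."""
    # Bind mounts need an absolute host path; a bare name would be read as a named volume.
    host_dir = os.path.abspath(host_dir)
    if not container_dir.startswith("/"):
        raise ValueError("container path must be absolute")
    return ["docker", "run", "-it", "-v", host_dir + ":" + container_dir, image, command]


argv = docker_run_argv("db2de2a7c410", "/home/sunfoot/Work/tmp")
print(" ".join(argv))
```

The resulting list can be handed to subprocess.call(argv) on a machine where Docker is installed.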
6. Running TF on the GPU
The image list shows that image 49b48d227d6e (tensorflow/tensorflow, tag latest-gpu) supports the GPU.
sunfoot@sunfoot-BigBoy:~$ docker image ls -a
REPOSITORY TAG IMAGE ID CREATED SIZE
nvidia/cuda latest 04a9ce0dec6d 3 months ago 1.96GB
tensorflow/tensorflow latest-gpu 49b48d227d6e 3 months ago 3.1GB
ubuntu latest 735f80812f90 3 months ago 83.5MB
hello-world latest 2cb0d9787c4d 4 months ago 1.85kB
registry.cn-hangzhou.aliyuncs.com/denverdino/tensorflow latest db2de2a7c410 7 months ago 1.27GB
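So start a container from that image (assuming the alias from step 2 is active, docker here resolves to nvidia-docker):
docker run -it -v /home/sunfoot/Work/tmp:/mnt 49b48d227d6e /bin/bash
Then run python tf_test_2.py. The script, shown below, places its ops on the GPU, and it ran successfully.
import tensorflow as tf
# Pin the constants and the matmul to the first GPU
with tf.device("/gpu:0"):
    matrix1 = tf.constant([[3., 3.]])
    matrix2 = tf.constant([[2.],[2.]])
    product = tf.matmul(matrix1, matrix2)
sess = tf.Session()
# Evaluate the 1x2 by 2x1 product
result = sess.run(product)
print(result)
sess.close()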
References:
[1] https://wyde.github.io/2017/11/08/How-to-Install-Docker-CE-on-Ubuntu-16-04-and-Fedora-26/
[2] https://wyde.github.io/2017/11/09/How-to-Install-Tensorflow-using-Docker-on-Ubuntu-16-04/
[3] https://blog.csdn.net/baobei0112/article/details/79025309
[4] https://yeasy.gitbooks.io/docker_practice/content/container/run.html