docker: deploy a deep learning environment manually

Since a server is currently shared among numerous lab members, I usually have to run my code as a non-root user. All too often I find it indispensable to install new packages and tune the running environment, which is extremely inconvenient without root privileges. Therefore I plan to use Docker, which isolates each user's running environment from the others. Here I record how I build a deep learning Docker image from a basic Ubuntu image.

First I check which versions of CUDA and the NVIDIA driver were previously installed on the server (a stable version is preferred, and checking first avoids unnecessary pitfalls):

cat /proc/driver/nvidia/version
cat /usr/local/cuda/version.txt

The command nvcc --version prints the CUDA compiler version, which matches the toolkit version.
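If only the bare version number is needed (for example, to pick a matching base image later), it can be extracted from version.txt. A minimal sketch, assuming the file holds a line in the usual format (the sample line below is an illustration, not output from the server):

```shell
# /usr/local/cuda/version.txt normally contains a single line like the
# sample below; extract the major.minor toolkit version from it.
line="CUDA Version 9.0.176"
echo "$line" | grep -oE '[0-9]+\.[0-9]+' | head -n1   # prints 9.0
```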

Then we should use nvidia-docker so that Docker containers can access the server's GPUs (see the QuickStart guide):


# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker

# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
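Once nvidia-smi works inside a container, a typical way to work interactively is to start a container with the GPU runtime and a mounted code directory. A sketch (the host path ~/project and the image tag are assumptions; adjust them to your setup):

```shell
# Start an interactive container with GPU access; the host directory
# ~/project (an assumed path) is mounted at /workspace inside it, and
# --rm removes the container when the shell exits.
docker run --runtime=nvidia -it --rm \
    -v "$HOME/project:/workspace" \
    -w /workspace \
    nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 \
    bash
```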

Moreover, I imitate a Dockerfile previously written by a machine learning server provider:
Dockerfile.gpu1
And I use a docker image built by NVIDIA (find the version that suits you on the nvidia/cuda image page), which is 9.0-cudnn7-devel-ubuntu16.04 (Dockerfile).

  1. Enter a container started from the image just downloaded and install the basic packages:
apt-get update
apt-get install -y bc \
	build-essential \
	cmake \
	curl \
	g++ \
	gfortran \
	git \
	libopenblas-dev \
	software-properties-common \
	vim \
	wget
  2. Clean the installation caches to keep the image size down:
apt-get clean
apt-get autoremove
rm -rf /var/lib/apt/lists/*
  3. Link the BLAS library to OpenBLAS using the alternatives mechanism (https://www.scipy.org/scipylib/building/linux.html#debian-ubuntu):
update-alternatives --set libblas.so.3 /usr/lib/openblas-base/libblas.so.3
  4. Install pip:
curl -O https://bootstrap.pypa.io/get-pip.py && \
python3 get-pip.py && \
rm get-pip.py
  5. Install the TensorFlow GPU version:
pip --no-cache-dir install tensorflow-gpu
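The steps above can also be collected into a Dockerfile so the image is reproducible instead of being assembled by hand inside a running container. A sketch under the same assumptions (base image 9.0-cudnn7-devel-ubuntu16.04, package list as above); python3 and python3-dev are added explicitly here, an assumption since the base image may not ship Python:

```dockerfile
FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04

# Steps 1-2: install the basic packages and clean the apt caches in the
# same layer, so the cache removal actually shrinks the image.
RUN apt-get update && apt-get install -y \
        bc build-essential cmake curl g++ gfortran git \
        libopenblas-dev python3 python3-dev \
        software-properties-common vim wget && \
    apt-get clean && apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*

# Step 3: point the BLAS alternative at OpenBLAS.
RUN update-alternatives --set libblas.so.3 /usr/lib/openblas-base/libblas.so.3

# Steps 4-5: install pip, then the GPU build of TensorFlow.
RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
    python3 get-pip.py && rm get-pip.py && \
    pip --no-cache-dir install tensorflow-gpu
```

Combining install and cleanup into one RUN is a deliberate choice: each RUN creates a layer, and files deleted in a later layer still occupy space in earlier ones.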
