Notes on setting up tensorflow-gpu, Docker, and CUDA

Docker installation pitfalls

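My recommendation, if you have the option: start with Ubuntu 18 from the very beginning, so that the Python and gcc versions are reasonably new, then get PyCharm Professional and use Docker to configure the runtime environment.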

 

1.

Follow the guide at https://docs.docker.com/install/linux/docker-ce/ubuntu/#install-using-the-repository up to the step "Install the latest version of Docker Engine - Community and containerd, or go to the next step to install a specific version":

$ sudo apt-get install docker-ce docker-ce-cli containerd.io

At this step I found that a newer libseccomp2 was required (libseccomp2_2.4.1-0ubuntu0.16.04.2_amd64.deb), so I downloaded it from https://ubuntu.pkgs.org/16.04/ubuntu-updates-main-amd64/libseccomp2_2.4.1-0ubuntu0.16.04.2_amd64.deb.html and installed it directly with sudo apt install ./libseccomp2_2.4.1-0ubuntu0.16.04.2_amd64.deb, which upgraded libseccomp2. After that, Docker installed successfully.
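Before downloading a .deb by hand, it can help to check which libseccomp2 is already installed. The following is a minimal sketch, not an official procedure: the helper names version_ge and upstream_version are made up for illustration, and the threshold 2.4.1 is simply the version installed in this post, not a documented minimum. The comparison only looks at the dotted upstream part of the version; on a real Debian or Ubuntu system, dpkg --compare-versions is the more precise tool.

```shell
# Return success (0) if dotted version $1 is >= dotted version $2.
# Hypothetical helper: compares numeric fields left to right.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Upstream part of a Debian package version,
# e.g. 2.4.1-0ubuntu0.16.04.2 -> 2.4.1 (assumes no epoch prefix).
upstream_version() {
    printf '%s\n' "${1%%-*}"
}

# Assumed threshold: the version installed in this post.
wanted="2.4.1"

# Only query the package database where dpkg-query exists.
if command -v dpkg-query >/dev/null 2>&1; then
    installed=$(dpkg-query -W -f='${Version}' libseccomp2 2>/dev/null || true)
    if [ -z "$installed" ]; then
        echo "libseccomp2 is not installed"
    elif version_ge "$(upstream_version "$installed")" "$wanted"; then
        echo "libseccomp2 $installed is new enough"
    else
        echo "libseccomp2 $installed is older than $wanted; upgrade it first"
    fi
fi
```

If the check reports an older version, the manual .deb install described above is one way to get past the dependency error.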

 

(base) gpu604@gpu604:~$ sudo docker version
Client: Docker Engine - Community
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        afacb8b7f0
 Built:             Wed Mar 11 01:25:58 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b7f0
  Built:            Wed Mar 11 01:24:30 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

(base) gpu604@gpu604:~$ cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module 410.48 Thu Sep 6 06:36:33 CDT 2018

GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11)
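The driver version is the number that matters later when choosing a CUDA and TensorFlow image, so it is handy to pull it out of that output on its own. This is a small sketch that assumes the "NVRM version: ... Kernel Module <version> ..." line format shown above; the function name nvidia_driver_version is made up for illustration, and other driver builds may format the line differently.

```shell
# Print the driver version found on the "NVRM version:" line of stdin.
# Assumes the version number directly follows the words "Kernel Module".
nvidia_driver_version() {
    sed -n 's/^NVRM version:.*Kernel Module[[:space:]]*\([0-9][0-9.]*\).*/\1/p'
}

# Only read the proc file where the NVIDIA kernel module exposes it.
if [ -r /proc/driver/nvidia/version ]; then
    nvidia_driver_version < /proc/driver/nvidia/version
fi
```

For the output shown above this prints 410.48. Where nvidia-smi is available, nvidia-smi --query-gpu=driver_version --format=csv,noheader reports the same value without any parsing.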

 

 

2.

Following https://github.com/NVIDIA/nvidia-docker, run the commands below:

# Add the package repositories

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

 

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

sudo systemctl restart docker
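The distribution variable in the commands above is built by sourcing /etc/os-release and joining its ID and VERSION_ID fields, and that string then selects the repository list URL. The sketch below shows the expansion in isolation so it can be checked before anything is written under /etc/apt; the function name distribution_of is made up for illustration.

```shell
# Print $ID$VERSION_ID from an os-release style file
# (default: /etc/os-release). Runs in a subshell so the
# sourced variables do not leak into the calling shell.
distribution_of() {
    ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Show what the repository URL would expand to on this machine.
if [ -r /etc/os-release ]; then
    d=$(distribution_of)
    echo "distribution=$d"
    echo "list URL: https://nvidia.github.io/nvidia-docker/$d/nvidia-docker.list"
fi
```

On an Ubuntu 16.04 machine, which the gcc version string earlier in this post suggests, the value would presumably be ubuntu16.04. If the printed URL does not exist, the curl command fetches an error page instead of a package list, so it is worth looking at the value first.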

 

After installation, test it with the command from the Usage section:

#### Test nvidia-smi with the latest official CUDA image

docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

After this, sudo docker images seemed to list the downloaded image.

 

3.

Then pull the TensorFlow image:

docker pull tensorflow/tensorflow:latest-gpu-jupyter # latest release w/ GPU support and Jupyter

At this point sudo docker images also listed tensorflow.

Later I removed it again with sudo docker image rm tensorflow/tensorflow:latest-gpu-jupyter. Some useful commands for this kind of cleanup:

sudo docker ps -a # Lists containers (and tells you which images they are spun from)

sudo docker images # Lists images

sudo docker rm <container_id> # Removes a container

sudo docker rmi <image_id> # Removes an image
# Will fail if there is a running instance of that image i.e. container

sudo docker rmi -f <image_id> # Forces removal of image even if it is referenced in multiple repositories,
# i.e. same image id given multiple names/tags
# Will still fail if there is a docker container referencing image

In the end, make sure the TensorFlow GPU version you pick matches your own CUDA version and driver version.
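That version matching can be written down as a small lookup. The sketch below is illustrative only: the function names are made up, and the two tables are my transcription of the TensorFlow tested build configurations (TensorFlow 2.0 with CUDA 10.0, TensorFlow 2.1 to 2.3 with CUDA 10.1) and of the minimum Linux driver versions in NVIDIA's CUDA release notes (410.48 for CUDA 10.0, 418.39 for CUDA 10.1). Verify them against the official tables before relying on them.

```shell
# Return success (0) if dotted version $1 is >= dotted version $2.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# CUDA version a TensorFlow GPU release was built against
# (assumed table, deliberately limited to a few releases).
cuda_for_tf() {
    case "$1" in
        2.0.*)             echo "10.0" ;;
        2.1.*|2.2.*|2.3.*) echo "10.1" ;;
        *) return 1 ;;
    esac
}

# Minimum Linux driver version for a CUDA release (assumed table).
min_driver_for_cuda() {
    case "$1" in
        10.0) echo "410.48" ;;
        10.1) echo "418.39" ;;
        *) return 1 ;;
    esac
}

# tf_image_ok TF_VERSION DRIVER_VERSION
# Returns 0 if the driver is new enough, 1 if it is too old,
# and 2 if the TensorFlow or CUDA version is not in the tables.
tf_image_ok() {
    cuda=$(cuda_for_tf "$1") || return 2
    need=$(min_driver_for_cuda "$cuda") || return 2
    version_ge "$2" "$need"
}

# Example with the 410.48 driver from this post.
for tf in 2.0.0 2.1.0; do
    if tf_image_ok "$tf" 410.48; then
        echo "TensorFlow $tf: driver 410.48 is new enough"
    else
        echo "TensorFlow $tf: driver 410.48 is too old or the version is unknown"
    fi
done
```

Under these assumed tables, driver 410.48 is sufficient for a TensorFlow 2.0.0 image but not for 2.1.0 and later, which is consistent with using a 2.0.0 tag rather than a latest tag on this machine.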

 

4.

Download and run a GPU-enabled TensorFlow image:

sudo docker run --gpus all -it --rm tensorflow/tensorflow:2.0.0-gpu-py3 python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

 

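Start bash: sudo docker run --gpus all -it tensorflow/tensorflow:2.0.0-gpu-py3 bash

This printed a warning that the container was running as root and that it is better to start it with a user ID. So I first looked up my user ID:

(base) gpu604@gpu604:~$ id -u gpu604

The result was 1000, so I entered bash with the following command: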

sudo docker run -u 1000:1000 --gpus all -it tensorflow/tensorflow:2.0.0-gpu-py3 bash
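Instead of hard-coding 1000:1000, the user and group IDs can be taken from id at the moment the command is typed. The sketch below only prints the resulting command rather than running Docker; the function names current_user_spec and print_run_cmd are made up for illustration.

```shell
# Print "UID:GID" of the invoking user, in the form docker run -u expects.
current_user_spec() {
    printf '%s:%s\n' "$(id -u)" "$(id -g)"
}

# Hypothetical helper: print (not execute) the docker run command
# from this post for a given image, with the current user filled in.
print_run_cmd() {
    printf 'sudo docker run -u %s --gpus all -it %s bash\n' \
        "$(current_user_spec)" "$1"
}

print_run_cmd tensorflow/tensorflow:2.0.0-gpu-py3
```

Because the shell expands $(id -u) and $(id -g) before sudo runs, the IDs are those of the logged-in user and not of root. Note that id -u alone only gives the user ID; the group ID happens to be 1000 as well on many single-user Ubuntu installs, but that is not guaranteed.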

 

5.

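From here you can pip install new libraries, and use exit to leave bash.

But I then found that only PyCharm Professional can use a remote interpreter such as Docker.

So I had no choice but to go back to Anaconda. However, using a newer TensorFlow inside Anaconda means upgrading CUDA and cuDNN first, which is far too much trouble.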

 

6.

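So, to repeat the recommendation: if you have the option, start with Ubuntu 18 from the very beginning, so that the Python and gcc versions are reasonably new, then get PyCharm Professional and use Docker to configure the runtime environment.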
