Building a docker image: Anaconda3 + tensorflow-gpu + nvidia-docker

Building a docker image with Anaconda3 + tensorflow-gpu 1.12: pitfalls I ran into

Prerequisite: docker and nvidia-docker are already installed

I. Build the image from a Dockerfile

1. Create the folder that will hold the Dockerfile

mkdir tfgpu
cd tfgpu
vim Dockerfile

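2. Dockerfile contents

Note: my first attempt at setting the mirror, RUN echo -e "[global]\nindex-url = https://pypi.mirrors.ustc.edu.cn/simple/" >> ~/pip.conf, did not seem to take effect. The most likely reasons are that pip does not read ~/pip.conf (the per-user file is ~/.pip/pip.conf) and that the /bin/sh used by RUN on Ubuntu does not understand echo -e. The step is optional: passing a mirror with -i on each pip command works just as well.
Note: tensorflow-gpu can be pinned to 1.12.0 directly. Left unpinned, pip downloads 2.1.0, which has to be uninstalled again later.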

FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04
MAINTAINER bobwang

# install basic dependencies
RUN apt-get update \
	&& apt-get install -y wget \
		vim \
		cmake

# install Anaconda3
RUN wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-5.2.0-Linux-x86_64.sh -O ~/anaconda3.sh
RUN bash ~/anaconda3.sh -b -p /home/anaconda3 \
	&& rm ~/anaconda3.sh
ENV PATH /home/anaconda3/bin:$PATH

# change mirror: pip reads ~/.pip/pip.conf; printf is used because echo -e is not portable under /bin/sh
RUN mkdir -p ~/.pip \
	&& printf '[global]\nindex-url = https://pypi.mirrors.ustc.edu.cn/simple/\n' > ~/.pip/pip.conf

# install tensorflow (unpinned here; use tensorflow-gpu==1.12.0 to match CUDA 9.0)
RUN /home/anaconda3/bin/pip install wrapt --ignore-installed \
	&& /home/anaconda3/bin/pip install --upgrade pip -i https://mirrors.aliyun.com/pypi/simple/ \
	&& /home/anaconda3/bin/pip install tensorflow-gpu -i https://mirrors.aliyun.com/pypi/simple/
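
For reference, here is a small Python sketch of what a per-user pip mirror config looks like and where it has to live. The helper name write_pip_conf is my own, and the demo writes into a temporary directory rather than a real home directory; it only illustrates the file layout, not the exact step used in the image.

```python
import configparser
import tempfile
from pathlib import Path


def write_pip_conf(home: Path, index_url: str) -> Path:
    """Write <home>/.pip/pip.conf with a [global] index-url and return its path."""
    # ~/.pip/pip.conf is the legacy per-user location that pip reads on Linux.
    conf_path = home / ".pip" / "pip.conf"
    conf_path.parent.mkdir(parents=True, exist_ok=True)
    parser = configparser.ConfigParser()
    parser["global"] = {"index-url": index_url}
    with conf_path.open("w") as fh:
        parser.write(fh)
    return conf_path


if __name__ == "__main__":
    # Demo in a throwaway directory standing in for the home directory.
    with tempfile.TemporaryDirectory() as fake_home:
        path = write_pip_conf(Path(fake_home), "https://pypi.mirrors.ustc.edu.cn/simple/")
        print(path)
        print(path.read_text())
```

A file at ~/pip.conf (without the .pip directory) is simply ignored by pip, which matches the symptom described in the note above.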


3. Build the image

docker build -t tf-gpu .

4. Run the image

sudo nvidia-docker run -it --rm --name test c225e83ca98e /bin/bash

5. Use tensorflow

import tensorflow as tf

Error:


Key message:

W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64

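Fix:

Possible cause 1: the TensorRT package is missing.

Possible cause 2: the TensorFlow version is incompatible with CUDA. The unpinned install pulled in tensorflow-gpu 2.1.0, while the base image provides CUDA 9.0.

Compatibility table: see the tested build configurations in the TensorFlow GPU install documentation (the original screenshot is unavailable).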
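
Since the compatibility screenshot did not survive, here is a partial stand-in as a Python lookup. The rows are the ones I am reasonably sure of from TensorFlow's published tested build configurations; it is an excerpt, not the full table, and the helper name versions_for_cuda is my own. Check the official table before relying on it.

```python
# Partial excerpt of tensorflow-gpu tested build configurations (Linux).
# Verify against the official TensorFlow documentation before relying on it.
TESTED_GPU_BUILDS = {
    "1.12.0": {"cuda": "9", "cudnn": "7"},
    "1.13.1": {"cuda": "10.0", "cudnn": "7.4"},
    "1.15.0": {"cuda": "10.0", "cudnn": "7.4"},
    "2.0.0": {"cuda": "10.0", "cudnn": "7.4"},
    "2.1.0": {"cuda": "10.1", "cudnn": "7.6"},
}


def versions_for_cuda(cuda_major: str) -> list:
    """Return the listed tensorflow-gpu versions built against the given CUDA major version."""
    return [
        tf_version
        for tf_version, req in TESTED_GPU_BUILDS.items()
        if req["cuda"].split(".")[0] == cuda_major
    ]


if __name__ == "__main__":
    # The base image in this post ships CUDA 9.0.
    print(versions_for_cuda("9"))
```

With a CUDA 9.0 base image, only the 1.12.0 row in this excerpt matches, which is why the default 2.1.0 install fails to find its libraries.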

Check the CUDA version

cat /usr/local/cuda/version.txt

Check the cuDNN version

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

No version information was shown. The header may be installed at a different path in this image; one place worth checking is /usr/include/cudnn.h.
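
The same cuDNN check can be sketched in Python, which makes it easy to try more than one header location. The candidate paths and the sample version numbers below are assumptions for illustration, not values taken from this image.

```python
import os
import re

# Assumed candidate locations; adjust for your image.
CANDIDATE_HEADERS = ["/usr/local/cuda/include/cudnn.h", "/usr/include/cudnn.h"]


def parse_cudnn_version(header_text: str):
    """Return (major, minor, patch) parsed from cudnn.h text, or None if any define is missing."""
    parts = []
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(
            r"^\s*#define\s+CUDNN_%s\s+(\d+)" % name, header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    # Hypothetical header excerpt, used only to show the expected format.
    sample = "#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 6\n#define CUDNN_PATCHLEVEL 5\n"
    print(parse_cudnn_version(sample))  # (7, 6, 5)

    for path in CANDIDATE_HEADERS:
        if os.path.isfile(path):
            with open(path, errors="replace") as fh:
                print(path, parse_cudnn_version(fh.read()))
```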

Attempt 1: install the TensorFlow version that matches this CUDA version, 1.12

pip uninstall tensorflow-gpu
pip install tensorflow-gpu==1.12.0 -i https://mirrors.aliyun.com/pypi/simple/

Verify that the installation works

import tensorflow as tf
a = tf.constant([1.0,2.0,3.0,4.0,5.0,6.0],shape=[2,3],name='a')
b = tf.constant([1.0,2.0,3.0,4.0,5.0,6.0],shape=[3,2],name='b')
c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))
Output:
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2020-04-20 08:48:12.497338: I tensorflow/core/common_runtime/placer.cc:927] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
Const: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2020-04-20 08:48:12.497482: I tensorflow/core/common_runtime/placer.cc:927] Const: (Const)/job:localhost/replica:0/task:0/device:CPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2020-04-20 08:48:12.497788: I tensorflow/core/common_runtime/placer.cc:927] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2020-04-20 08:48:12.497971: I tensorflow/core/common_runtime/placer.cc:927] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
 [49. 64.]]
# success
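
The printed result is what you get from multiplying a 2x3 matrix by a 3x2 matrix built from the same six values. A quick cross-check in plain Python, with no TensorFlow needed (the helper names reshape and matmul are my own):

```python
def reshape(flat, rows, cols):
    """Split a flat list into `rows` lists of `cols` items each (row-major order)."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
a = reshape(values, 2, 3)  # [[1, 2, 3], [4, 5, 6]]
b = reshape(values, 3, 2)  # [[1, 2], [3, 4], [5, 6]]

if __name__ == "__main__":
    print(matmul(a, b))  # [[22.0, 28.0], [49.0, 64.0]]
```

Two 2x3 matrices cannot be multiplied directly, so the second operand has to have shape [3, 2] for the output above to appear.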

II. Import the project and verify that it runs

1. Install the environment

pip install numpy==1.16.2 -i https://mirrors.aliyun.com/pypi/simple/
pip install django==2.2.5 -i https://mirrors.aliyun.com/pypi/simple/
pip install filetype
pip install pymysql
pip install django-cors-headers
pip install djangorestframework
pip install opencv-python==4.1.1.26

dpkg --add-architecture i386
apt-get update
apt-get upgrade

# shared libraries that opencv-python (cv2) needs at import time
apt-get install libsm6
apt-get install libxrender1
apt-get install libxext-dev

pip install dlib
#  apt-get libSM-1.2.2-2.e17.x86_64 -- setopt=protected_multilib=false
conda install keras=2.2.4
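
Before committing the container, it helps to confirm that everything installed above is actually importable. A minimal sketch, assuming the usual import names for these packages (django-cors-headers imports as corsheaders, djangorestframework as rest_framework, opencv-python as cv2); the function name missing_modules is my own:

```python
import importlib.util

# Import names, which differ from the pip package names for some packages.
REQUIRED_MODULES = [
    "numpy", "django", "filetype", "pymysql", "corsheaders",
    "rest_framework", "cv2", "dlib", "keras", "tensorflow",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", missing if missing else "none")
```

This only checks that each module can be located. A package that is found may still fail at import time, for example cv2 when a shared library such as libSM is absent, so an actual import is the final test.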

III. Commit the container as a new image

sudo docker commit -a bobwang -m "tf12gpu" 1bfd0ffe6988 tf-gpu1.12-emotion:1.0

IV. Run the image

sudo nvidia-docker run --rm -it -p8000:8000 -v /home/bob/tfgpu/emotion-classifier/:/root/projects  --name emotion a6313cb19987 /bin/bash

V. Error

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)


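Cause: this is a Python encoding problem. With no UTF-8 locale configured in the container, Python most likely falls back to ASCII for stdout.

Fix:

# edit the environment variables
vim /etc/profile
# add
export PYTHONIOENCODING=utf-8
export LANG='en_US.UTF-8'
# reload so the variables take effect, e.g. source /etc/profile
# then check from inside Python
import sys
sys.stdout.encoding
'utf-8'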
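
To see why the stream encoding matters, the error can be reproduced without touching the real stdout. This is only an illustrative sketch: it wraps an in-memory buffer in a text stream with a chosen encoding, and the helper name can_write is my own.

```python
import io


def can_write(text: str, encoding: str) -> bool:
    """Return True if `text` can be written to a text stream using `encoding`."""
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    text = "\u4e2d\u6587"  # two Chinese characters
    print("ascii:", can_write(text, "ascii"))
    print("utf-8:", can_write(text, "utf-8"))
```

An ASCII stream rejects any non-ASCII character, which is exactly the UnicodeEncodeError shown above; setting PYTHONIOENCODING=utf-8 makes stdout behave like the second case.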

VI. Commit again

sudo docker commit -a bobwang -m "tf12gpu-utf8" 34085eec0108 tf-gpu1.12-emotion-utf8:2.0
sha256:8cd080380d7371461df4b65cbc832a7b7dc876110af8fa4bc1aedec2a9813636
