Installing TensorFlow on Linux

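Installing TensorFlow on Ubuntu by following the official documentation is relatively easy. On CentOS, which uses yum instead of apt-get, it is somewhat harder. This article covers some of the pitfalls encountered during installation and how to work around them.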

Official documentation: https://www.tensorflow.org/versions/r0.11/get_started/os_setup.html#download-and-setup

System: Ubuntu/CentOS. Linux instructions usually target Ubuntu (for example, those for Bazel, which is used when building from source).
Python: Anaconda3, Python 2

(I) Conda installation:
conda install -c conda-forge tensorflow

Error:
Missing write permissions in: /export/App/anaconda3
# ls -l /export/App/anaconda3
total 80
drwxrwxr-x 2 hadoop hadoop 12288 Sep 5 15:57 bin

Fix:
# chmod -R 777 /export/App/anaconda3/
# ls -l /export/App/anaconda3
total 80
drwxrwxrwx 2 hadoop hadoop 12288 Sep 5 15:57 bin
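Recursively granting 777 works, but it opens the whole tree to every user. Before changing anything, it can help to confirm whether the current user can actually write to the install prefix. The sketch below is only an illustration: check_writable is a hypothetical helper name, not part of conda, and the path in the comment is the one from the error message above.

```shell
#!/bin/sh
# check_writable DIR: succeed only if DIR exists and the current user can write to it.
check_writable() {
    cw_dir=$1
    if [ -d "$cw_dir" ] && [ -w "$cw_dir" ]; then
        echo "writable: $cw_dir"
    else
        echo "not writable: $cw_dir" >&2
        return 1
    fi
}

# Example (path taken from the error message above):
# check_writable /export/App/anaconda3 || echo "fix ownership or permissions first"
```

If the check fails, changing the owner of the tree to the user who runs conda is a narrower alternative to chmod -R 777.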

Error while downloading packages:
The following NEW packages will be INSTALLED:
funcsigs: 1.0.2-py27_0 conda-forge
mkl: 11.3.3-0
mock: 2.0.0-py27_0 conda-forge
numpy: 1.11.2-py27_0
pbr: 1.10.0-py27_0 conda-forge
protobuf: 3.0.0-py27_0 conda-forge
six: 1.10.0-py27_1 conda-forge
tensorflow: 0.11.0rc2-py27_0 conda-forge
Fetching packages ...
protobuf-3.0.0 1% ##############################################
CondaRuntimeError: Runtime error: RuntimeError: Runtime error: Could not open u'/root/anaconda2/pkgs/protobuf-3.0.0-py27_0.tar.bz2.part' for writing (HTTPSConnectionPool(host='binstar-cio-packages-prod.s3.amazonaws.com', port=443): Read timed out.).
Fix:
The protobuf download failed. Download the matching version directly from https://anaconda.org/anaconda/protobuf/files?version=3.0.0 and place it in the anaconda2/pkgs directory. The file has to be renamed:
mv linux-64-protobuf-3.0.0-py27_0.tar.bz2 protobuf-3.0.0-py27_0.tar.bz2
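The rename simply drops the linux-64- platform prefix that the downloaded file carries. A minimal sketch of that step, assuming every manually downloaded package uses the same prefix; both function names are hypothetical and exist only for this example:

```shell
#!/bin/sh
# strip_platform_prefix FILE: print FILE's base name without a leading "linux-64-".
strip_platform_prefix() {
    spp_name=$(basename "$1")
    echo "${spp_name#linux-64-}"
}

# install_downloaded_pkg FILE PKGS_DIR: copy FILE into PKGS_DIR under the stripped name.
install_downloaded_pkg() {
    cp "$1" "$2/$(strip_platform_prefix "$1")"
}

# Example (the pkgs path is an assumption; use your own Anaconda location):
# install_downloaded_pkg linux-64-protobuf-3.0.0-py27_0.tar.bz2 /root/anaconda2/pkgs
```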

Installation completes, but importing TensorFlow fails at run time:
>>>import tensorflow as tf
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/export/App/anaconda3/envs/ml2/lib/python2.7/site-packages/tensorflow/__init__.py", line 23, in <module>
from tensorflow.python import *
File "/export/App/anaconda3/envs/ml2/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/export/App/anaconda3/envs/ml2/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/export/App/anaconda3/envs/ml2/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
ImportError: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /export/App/anaconda3/envs/ml2/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so)
Analysis:
The glibc on this system is older than the glibc that TensorFlow was compiled against. Upgrading glibc is risky and can leave the system unable to run.
References: http://blog.csdn.net/levy_cui/article/details/51251095
http://blog.csdn.net/daluguishou/article/details/51773830#t26
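Before touching glibc it is worth checking which version the system actually runs and comparing it with the version named in the ImportError. The sketch below is an illustration under two assumptions: the system uses glibc (so getconf GNU_LIBC_VERSION is available), and versions have the form major.minor. The helper names version_ge and current_glibc are hypothetical.

```shell
#!/bin/sh
# version_ge A B: succeed if dotted version A (major.minor) is >= B.
# Compares numerically, so 2.14 counts as newer than 2.5.
version_ge() {
    a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%.*}
    b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%.*}
    [ "$a_major" -gt "$b_major" ] && return 0
    [ "$a_major" -lt "$b_major" ] && return 1
    [ "$a_minor" -ge "$b_minor" ]
}

# current_glibc: print the running glibc version, e.g. "2.12" (glibc systems only).
current_glibc() {
    getconf GNU_LIBC_VERSION | awk '{print $2}'
}

# Example:
# version_ge "$(current_glibc)" 2.14 || echo "glibc is older than 2.14"
```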

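Running these steps as root is recommended.
glibc downloads: http://ftp.gnu.org/pub/gnu/glibc/
To unpack a tar.xz file, first run xz -d xxx.tar.xz to decompress xxx.tar.xz into xxx.tar, then run tar xvf xxx.tar to extract it.
Install glibc 2.14 under /opt/glibc-2.14/ and add it to the library path: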
export LD_LIBRARY_PATH=/opt/glibc-2.14/lib:$LD_LIBRARY_PATH
ImportError: /opt/glibc-2.14/lib/libc.so.6: version `GLIBC_2.17' not found(required by /export/App/anaconda3/envs/ml2/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so)
Install 2.17 the same way (the steps are omitted here):
export LD_LIBRARY_PATH=/opt/glibc-2.17/lib:$LD_LIBRARY_PATH
error while loading shared libraries: __vdso_time: invalid mode for dlopen(): Invalid argument
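This failure also involves ldconfig (the dynamic linker cache).
The fix is to upgrade the system glibc directly.
Instead of pointing LD_LIBRARY_PATH at a separate copy with export, install glibc 2.17 over the system one: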
wget http://ftp.gnu.org/pub/gnu/glibc/glibc-2.17.tar.xz
xz -d glibc-2.17.tar.xz
tar -xvf glibc-2.17.tar
cd glibc-2.17
mkdir build
cd build
../configure --prefix=/usr --disable-profile --enable-add-ons --with-headers=/usr/include --with-binutils=/usr/bin
// configure may fail with: *** LD_LIBRARY_PATH shouldn't contain the current directory when building glibc. Please change the environment variable and run configure again.
// echo $LD_LIBRARY_PATH printed /usr/local/lib64, which was set in .bashrc. Temporary workaround: export LD_LIBRARY_PATH=
// Another possible failure: configure: error: support for --no-whole-archive is needed. This one is still unresolved.
make && make install
This takes about 10 minutes.
Running strings /lib64/libc.so.6 | grep GLIBC shows that the library has been updated.
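The strings output lists every symbol version the library provides, so the number that matters is the highest one. A small filter can pick it out. This is a sketch, with max_symbol_version as a hypothetical helper name; it reads strings output on standard input and assumes version tags of the form PREFIX_x.y or PREFIX_x.y.z.

```shell
#!/bin/sh
# max_symbol_version PREFIX: read `strings` output on stdin and print the
# highest PREFIX_x.y[.z] version found (without the prefix).
max_symbol_version() {
    grep -o "^${1}_[0-9][0-9.]*" \
        | sed "s/^${1}_//" \
        | sort -t. -k1,1n -k2,2n -k3,3n \
        | tail -n 1
}

# Examples:
# strings /lib64/libc.so.6 | max_symbol_version GLIBC
# strings /usr/lib64/libstdc++.so.6 | max_symbol_version GLIBCXX
```

The sort is numeric per field, so 2.17 ranks above 2.4, which a plain text sort would get wrong.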
Importing TensorFlow again then fails with a libstdc++ (GLIBCXX) error:
ImportError: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.19' not found (required by /opt/jmr/anaconda2/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so)
As before, fix it by upgrading libstdc++ directly:
strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX // mind the path; the library may be at /usr/local/lib64/libstdc++.so.6
wget http://ftp.de.debian.org/debian/pool/main/g/gcc-4.9/libstdc++6_4.9.2-10_amd64.deb
ar -x libstdc++6_4.9.2-10_amd64.deb
xz -d data.tar.xz
tar xvf data.tar
cd usr/lib/x86_64-linux-gnu/
find / -name libstdc++.so.6
mv /usr/lib64/libstdc++.so.6 /usr/lib64/libstdc++.so.6.bak
cp libstdc++.so.6.0.20 /usr/lib64/
cd /usr/lib64/
chmod +x libstdc++.so.6.0.20
ln -s libstdc++.so.6.0.20 libstdc++.so.6
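The copy, backup and symlink steps above can be wrapped in one function so that the old library is always kept as a .bak file. This is a sketch of the same sequence, not a packaged tool: swap_soname is a hypothetical name, and it copies the new file into place before moving the old link aside.

```shell
#!/bin/sh
# swap_soname LIBDIR NEWFILE SONAME:
#   copy NEWFILE into LIBDIR, keep the old SONAME as SONAME.bak,
#   then point SONAME at the new file with a relative symlink.
swap_soname() {
    ss_libdir=$1
    ss_newfile=$2
    ss_soname=$3
    ss_base=$(basename "$ss_newfile")
    cp "$ss_newfile" "$ss_libdir/$ss_base" || return 1
    chmod +x "$ss_libdir/$ss_base"
    if [ -e "$ss_libdir/$ss_soname" ] || [ -L "$ss_libdir/$ss_soname" ]; then
        mv "$ss_libdir/$ss_soname" "$ss_libdir/$ss_soname.bak" || return 1
    fi
    ln -s "$ss_base" "$ss_libdir/$ss_soname"
}

# Example, run from the directory extracted from the .deb:
# swap_soname /usr/lib64 libstdc++.so.6.0.20 libstdc++.so.6
```

To roll back, remove the new symlink and move the .bak file back to its original name.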
Running it again fails with:
ImportError: /lib64/libc.so.6: version `GLIBC_2.18' not found (required by libstdc++.so.6)
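Install the newer version directly. Newer glibc releases are backward compatible with older ones, although some newer releases cannot be installed in an older software environment.
Download http://ftp.gnu.org/pub/gnu/glibc/glibc-2.18.tar.xz and repeat the steps above.
Once all required dependencies have been upgraded, the installation is complete and the latest version works.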

(II) Pip installation
https://www.tensorflow.org/versions/r0.9/get_started/os_setup.html#pip-installation
python3.5:
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.9.0-cp35-cp35m-linux_x86_64.whl
sudo pip3 install --upgrade $TF_BINARY_URL
Error:
Installing collected packages: setuptools, protobuf, numpy, tensorflow
Found existing installation: setuptools 23.0.0
Cannot remove entries from nonexistent file /export/App/anaconda3/envs/ml3/lib/python3.5/site-packages/easy-install.pth

(III) Installing from source:
https://www.tensorflow.org/versions/r0.11/get_started/os_setup.html#installing-from-sources
(1) git clone https://github.com/tensorflow/tensorflow --recursive
(2) Install Google's Bazel for the build:
Download: https://github.com/bazelbuild/bazel/releases
chmod +x PATH_TO_INSTALL.SH
./PATH_TO_INSTALL.SH --user

If the installer fails, try building Bazel from source:
git clone https://github.com/ibmsoe/bazel
cd bazel
git checkout master
./compile.sh
Build successful! Binary is here: /export/App/bazel/output/bazel
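Note that the freshly built binary is at the path printed above. Because the installer script from the previous step already put a bazel at /usr/local/bin/bazel, using the new build requires changing the bazel invocation in tensorflow/configure to that absolute path.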

(3) Install the other dependencies:
# For Python 2.7:
$ sudo apt-get install python-numpy swig python-dev python-wheel
# For Python 3.x:
$ sudo apt-get install python3-numpy swig python3-dev python3-wheel
(4) Configuration:
./configure
The script warns that this can take more than several minutes and prompts for parameter changes:
bazel clean --expunge_async
Error:
ERROR: /root/.cache/bazel/_bazel_root/9571b1b1b31378f1c909e87f9c9ba23a/external/com_googlesource_code_re2/BUILD:11:1: C++ compilation of rule '@com_googlesource_code_re2//:re2' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 ... (remaining 29 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from external/com_googlesource_code_re2/re2/prog.h:19,
from external/com_googlesource_code_re2/re2/bitstate.cc:25:
This is caused by running out of RAM or swap. Lower the --jobs value or the --ram_utilization_factor value, or look for another process that is using a lot of memory and kill it. In my case two Bazel servers were running at once, so I had to kill one of them.
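As an illustration of the two flags, the sketch below prints a build command with reduced parallelism and memory use. The function name bazel_build_cmd and the example values are hypothetical. --ram_utilization_factor takes a percentage in the Bazel releases contemporary with this guide; later releases replaced it with --local_ram_resources, so check bazel help build on your version.

```shell
#!/bin/sh
# bazel_build_cmd JOBS RAM_PERCENT: print a bazel build command for the pip
# package target with limited parallelism (JOBS) and RAM share (RAM_PERCENT).
bazel_build_cmd() {
    echo "bazel build -c opt --jobs=$1 --ram_utilization_factor=$2 //tensorflow/tools/pip_package:build_pip_package"
}

# Example: print the command, inspect it, then run it by hand.
# bazel_build_cmd 2 30
```

If a stale Bazel server is holding memory, bazel shutdown stops the server for the current workspace before you retry.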
(5) Create the pip package and install:
bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
Error:
/usr/local/bin/bazel: line 88: /usr/local/lib/bazel/bin/bazel-real: cannot execute binary file
The fix is to reinstall Bazel by building it from source, as described in step (2).
(6) Install
mkdir _python_build
cd _python_build
ln -s ../bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/org_tensorflow/* .
ln -s ../tensorflow/tools/pip_package/* .
python setup.py develop
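After the install step, a quick import check confirms whether the glibc and libstdc++ problems described earlier are really gone. A minimal sketch, assuming the interpreter you pass in is the one from the environment you installed into; verify_import is a hypothetical helper name.

```shell
#!/bin/sh
# verify_import PYTHON MODULE: report whether MODULE imports under interpreter PYTHON.
# Any traceback from the interpreter is left visible on stderr.
verify_import() {
    vi_py=$1
    vi_module=$2
    if "$vi_py" -c "import $vi_module"; then
        echo "ok: $vi_module imports under $vi_py"
    else
        echo "failed: $vi_module does not import under $vi_py" >&2
        return 1
    fi
}

# Example:
# verify_import python tensorflow
```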

