Component | Version |
---|---|
CPU | Intel® Core™ i5-8400 CPU @ 2.80GHz × 6 |
GPU | GeForce RTX 2070 SUPER/PCIe/SSE2 |
OS | Ubuntu 18.04 |
openjdk | 1.8.0_242 |
python | 3.6.9 |
bazel | 0.21.0 |
gcc | 4.8.5 |
g++ | 4.8.5 |
Hint:
Download bazel 0.21.0 from:
https://github.com/bazelbuild/bazel/releases?after=0.25.3
After downloading, run the installer:
sudo bash bazel-0.21.0-installer-linux-x86_64.sh
Check the bazel version:
bazel version
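The check above can also be scripted. The following is a minimal sketch, assuming the output of `bazel version` contains a `Build label:` line; the helper name `bazel_build_label` and the sample text are illustrative and not part of the original notes.

```python
import re

EXPECTED_BAZEL = "0.21.0"  # version listed in the configuration table


def bazel_build_label(version_output):
    """Extract the version from the 'Build label:' line of `bazel version` output."""
    match = re.search(r"^Build label:\s*(\S+)", version_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Build label:' line found")
    return match.group(1)


# Shortened sample text (assumption); paste your own `bazel version` output here.
sample = "Build label: 0.21.0\n"
label = bazel_build_label(sample)
print(label, "matches" if label == EXPECTED_BAZEL else "differs from", EXPECTED_BAZEL)
```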
To uninstall bazel:
rm -rf ~/.bazel
rm -rf /usr/bin/bazel
First, confirm that gcc and g++ are the same version by running the following commands:
gcc --version
g++ --version
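Comparing the two outputs by eye is easy to get wrong, so here is a small sketch that compares them programmatically. It assumes the version number is the last dotted number on the first line of each `--version` output; the function names and the sample strings are hypothetical.

```python
import re


def parse_version(version_output):
    """Return the last dotted version number on the first line of `--version` output."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    found = re.findall(r"\d+\.\d+(?:\.\d+)?", first_line)
    if not found:
        raise ValueError("no version number in: " + repr(first_line))
    return found[-1]


def same_compiler_version(gcc_output, gxx_output):
    """True when gcc and g++ report the same version."""
    return parse_version(gcc_output) == parse_version(gxx_output)


# Hypothetical sample outputs; replace them with the real text from your machine.
gcc_out = "gcc (Ubuntu 4.8.5-4ubuntu8) 4.8.5"
gxx_out = "g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
print(parse_version(gcc_out), parse_version(gxx_out),
      same_compiler_version(gcc_out, gxx_out))
```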
If the versions differ, install matching versions with the following command:
sudo apt-get install gcc-4.8 g++-4.8
Check that the installation completed:
ls /usr/bin/gcc*
Output:
/usr/bin/g++
/usr/bin/g++-4.8
/usr/bin/gcc
/usr/bin/gcc-4.8
/usr/bin/gcc-ar-4.8
/usr/bin/gcc-nm-4.8
/usr/bin/gcc-ranlib-4.8
Select gcc 4.8 as the active gcc:
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 100
sudo update-alternatives --config gcc
Select g++ 4.8 as the active g++:
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 100
sudo update-alternatives --config g++
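Finally, check the versions again; if gcc and g++ both report the expected version, the switch succeeded.
If several gcc and g++ versions are installed on the system,
you can switch between them with the following command (use g++ in place of gcc for g++):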
sudo update-alternatives --config gcc
Download the source from:
https://github.com/tensorflow/tensorflow/tree/r1.13
After downloading, extract the archive.
Change into the tensorflow-r1.13 directory and run:
./configure
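The configuration prompts start. Answer n to most of them, except the CUDA and cuDNN prompts if those are installed; where the default is correct, just press Enter.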
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python
Found possible Python library paths:
/usr/lib/python3.6/dist-packages
/usr/local/lib/python3.6/dist-packages
Please input the desired Python library path to use. Default is [/usr/lib/python3.6/dist-packages]
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10]:10
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7
Please specify the locally installed NCCL version you want to use. [Leave empty to use http://github.com/nvidia/nccl]:
Please specify the comma-separated list of base paths to look for CUDA libraries and headers. [Leave empty to use the default]:
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 6.1,6.1]:
Do you want to use clang as CUDA compiler? [y/N]: n
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]:n
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
In the tensorflow-r1.13 directory, run the following commands:
bazel build --config=opt tensorflow/python/tools:freeze_graph
bazel build --config=opt tensorflow/contrib/lite/toco:toco
Error message:
this rule is missing dependency declarations for the following files included by 'external/com_google_absl/absl/types/optional.cc':
'/usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/limits.h'
'/usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/syslimits.h'
'/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h'
'/usr/lib/gcc/x86_64-linux-gnu/8/include/stdarg.h'
'/usr/lib/gcc/x86_64-linux-gnu/8/include/stdint.h'
Installing a lower version of gcc and g++ is recommended. The author used gcc 4.8 and g++ 4.8; see section 2.2 above for the installation steps.
After changing some configuration options, the following error appeared:
ERROR: /home/gezp/.cache/bazel/_bazel_gezp/2e4f7705435d0bd99b2c7f0d4e7595e7/external/protobuf_archive/BUILD:93:1: undeclared inclusion(s) in rule '@protobuf_archive//:protobuf_lite':
this rule is missing dependency declarations for the following files included by 'external/protobuf_archive/src/google/protobuf/stubs/int128.cc'
Delete the bazel cache and rebuild:
rm -rf /home/gezp/.cache/bazel
The following error may be reported:
/home/xxh/tensorflow/tensorflow/contrib/boosted_trees/BUILD:559:1: Linking of rule '//tensorflow/contrib/boosted_trees:gen_gen_stats_accumulator_ops_py_wrap_py_wrappers_cc' failed (Exit 1)
/usr/bin/ld: warning: libcudnn.so.7, needed by bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Scontrib_Sboosted_Utrees_Cgen_Ugen_Ustats_Uaccumulator_Uops_Upy_Uwrap_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so, not found (try using -rpath or -rpath-link)
or:
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
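Run the following commands to fix it (the file must be written as root, so the path is piped through sudo tee; with sudo echo, the redirection would still run as the normal user and fail):
echo "/usr/local/cuda/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf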
sudo ldconfig
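After running ldconfig, it can help to confirm that libcudnn.so.7 is now in the linker cache. The sketch below parses the text printed by `ldconfig -p`; the helper name `find_in_ldconfig` and the sample output are assumptions for illustration, and the real output on your machine will list many more libraries.

```python
def find_in_ldconfig(ldconfig_output, library):
    """Return the paths that `ldconfig -p` lists for the given library name."""
    paths = []
    for line in ldconfig_output.splitlines():
        # Entries look like: "libname.so.N (libc6,x86-64) => /path/to/libname.so.N"
        name, sep, path = line.strip().partition(" => ")
        if sep and name.split(" ")[0] == library:
            paths.append(path)
    return paths


# Hypothetical sample of `ldconfig -p` output; paste the real output instead.
sample = (
    "2 libs found in cache `/etc/ld.so.cache'\n"
    "\tlibcudnn.so.7 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn.so.7\n"
    "\tlibc.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libc.so.6\n"
)
print(find_in_ldconfig(sample, "libcudnn.so.7"))
```

An empty list would suggest that /usr/local/cuda/lib64 is still not on the linker search path.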
no such package '@curl
no such package '@grpc
no such package '@org_python_pypi_backports_weakref
no such package '@aws
no such package '@termcolor_archive
no such package '@boringssl
no such package '@icu
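Fix: set up a local web server, download the required files manually, and serve them locally so that the build does not need to download them over the network.
Install the server: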
sudo apt-get install nginx-light
sudo service nginx start
Then open http://127.0.0.1 in a browser. A page such as "Welcome to nginx!" means the installation succeeded.
Copy the manually downloaded files into /var/www/html/:
cp *.tar.gz /var/www/html/
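Each tf_http_archive rule in workspace.bzl carries a sha256 value, so a manually downloaded archive that is truncated or mismatched will be rejected by the build. A minimal sketch for checking a file before serving it follows; the function names are illustrative, and the demo uses a throwaway file rather than a real archive.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_matches(path, expected_sha256):
    """True when the file's digest equals the sha256 value from workspace.bzl."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Self-contained demo on a temporary file (not a real TensorFlow dependency).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"demo archive contents")
demo_path = tmp.name
try:
    expected = hashlib.sha256(b"demo archive contents").hexdigest()
    print(archive_matches(demo_path, expected))
finally:
    os.remove(demo_path)
```

For a real check, pass the path of the downloaded .tar.gz and the sha256 string copied from the corresponding rule.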
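Edit the module definitions, which are in the following file:
tensorflow-r1.13/tensorflow/workspace.bzl
Taking grpc as an example, find the following code and add the local URL as the first entry of urls: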
# WARNING: make sure ncteisen@ and vpai@ are cc-ed on any CL to change the below rule
tf_http_archive(
    name = "grpc",
    sha256 = "1aa84387232dda273ea8fdfe722622084f72c16f7b84bfc519ac7759b71cdc91",
    strip_prefix = "grpc-69b6c047bc767b4d80e7af4d00ccb7c45b683dae",
    system_build_file = clean_dep("//third_party/systemlibs:grpc.BUILD"),
    urls = [
        "http://127.0.0.1/grpc-69b6c047bc767b4d80e7af4d00ccb7c45b683dae.tar.gz",
        "https://mirror.bazel.build/github.com/grpc/grpc/archive/69b6c047bc767b4d80e7af4d00ccb7c45b683dae.tar.gz",
        "https://github.com/grpc/grpc/archive/69b6c047bc767b4d80e7af4d00ccb7c45b683dae.tar.gz",
    ],
)
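Since the same edit has to be repeated for every missing package, the following sketch shows one way to generate the urls entries with the local copy first. The helper `with_local_mirror` is hypothetical; it only builds strings to paste into workspace.bzl and does not modify any file.

```python
def with_local_mirror(urls, filename, base_url="http://127.0.0.1"):
    """Return a new URL list with the local copy first, without duplicates."""
    local = base_url.rstrip("/") + "/" + filename
    return [local] + [url for url in urls if url != local]


# Values taken from the grpc rule shown above.
commit = "69b6c047bc767b4d80e7af4d00ccb7c45b683dae"
upstream = [
    "https://mirror.bazel.build/github.com/grpc/grpc/archive/" + commit + ".tar.gz",
    "https://github.com/grpc/grpc/archive/" + commit + ".tar.gz",
]

# Print the entries in the form used inside the urls = [...] list.
for url in with_local_mirror(upstream, "grpc-" + commit + ".tar.gz"):
    print('"' + url + '",')
```

The file name passed in must match the name of the archive placed under /var/www/html/.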
The icu and aws modules are defined in the following files, respectively:
tensorflow-r1.13/third_party/icu/workspace.bzl
tensorflow-r1.13/third_party/aws/workspace.bzl
When running ./configure for TensorFlow, be careful not to select the wrong Python path.
# Python interpreter path
/usr/bin/python
# Python package path
/usr/local/lib/python3.6/dist-packages
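To see which interpreter and package directories belong together, one option is to run a short script with the interpreter you intend to give to ./configure. This is a sketch using only the standard library; the function name `describe_python` is illustrative, and the printed paths depend on your system.

```python
import site
import sys
import sysconfig


def describe_python():
    """Collect the interpreter path, version, and package directories."""
    info = {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "purelib": sysconfig.get_paths()["purelib"],
    }
    # site.getsitepackages() is missing in some older virtualenv setups.
    if hasattr(site, "getsitepackages"):
        info["site_packages"] = site.getsitepackages()
    return info


for key, value in sorted(describe_python().items()):
    print(key + ": " + str(value))
```

If the reported version is not the Python 3.6 listed in the configuration table, the interpreter path given to ./configure is probably not the one you want.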
The build may fail with the following error:
compilation of rule '//tensorflow/python:bfloat16_lib' failed
Fix:
downgrade numpy to 1.18.5
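Before rebuilding, the installed numpy version can be checked with a short script. The sketch below treats 1.18.5 as the highest acceptable version, which is an assumption drawn from the fix above rather than a documented limit; the helper names are hypothetical.

```python
MAX_NUMPY = (1, 18, 5)  # assumed upper bound, taken from the fix above


def version_tuple(version):
    """Convert '1.19.2' to (1, 19, 2); suffixes such as 'rc1' are dropped."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_downgrade(version, maximum=MAX_NUMPY):
    """True when the given numpy version is newer than the assumed maximum."""
    return version_tuple(version) > maximum


try:
    import numpy
    installed = numpy.__version__
except ImportError:
    installed = None

if installed is None:
    print("numpy is not installed")
else:
    print(installed, "needs downgrade" if needs_downgrade(installed) else "ok")
```

If a downgrade is needed, `pip3 install numpy==1.18.5` is one way to do it, assuming pip3 belongs to the same Python that ./configure was pointed at.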
TensorFlow >= 1.13 does not support CUDA <= 9.0 and explicitly requires a GPU compute capability [8] of >= 3.5.
TensorFlow 1.12 supports CUDA 9.0 and does not state a compute capability [8] requirement.
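The two notes above can be written down as a small pre-build check. This sketch encodes only what those notes state and is not an official compatibility table; the function name `check_build` and the tuple encoding of versions are assumptions.

```python
def check_build(tf_version, cuda_version, compute_capability):
    """Return a list of problems according to the two notes above.

    All arguments are (major, minor) tuples, e.g. (1, 13) for TensorFlow 1.13.
    """
    problems = []
    if tf_version >= (1, 13):
        if cuda_version <= (9, 0):
            problems.append("TensorFlow >= 1.13 does not support CUDA <= 9.0")
        if compute_capability < (3, 5):
            problems.append("compute capability must be >= 3.5")
    return problems


# CUDA 10 and the 6.1 default from the configure log: no problems expected.
print(check_build((1, 13), (10, 0), (6, 1)))
# CUDA 9.0 with an older GPU: both notes apply.
print(check_build((1, 13), (9, 0), (3, 0)))
```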
[1] https://blog.csdn.net/qq_17130909/article/details/78637329
[2] https://www.jianshu.com/p/d92913173d5b
[3] https://blog.csdn.net/surtol/article/details/97638399#0_1
[4] https://blog.csdn.net/qq_26535271/article/details/84930868#commentBox
[5] https://blog.csdn.net/xiaolt90/article/details/104971920
[6] https://blog.csdn.net/qq_26535271/article/details/83031412#commentBox
[7] http://blog.leanote.com/post/mrwaterzhou/0417e050f84c
[8] https://github.com/tensorflow/tensorflow/issues/15889