Ubuntu: Building TensorFlow from Source with Bazel

Building TensorFlow from source with bazel

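  • 1. Configuration
  • 2. The ideal installation process
    • 2.1 Installing bazel
    • 2.2 Installing gcc and g++
    • 2.3 Building tensorflow-r1.13 from source
  • 3. Starting the build
  • 4. Pitfalls
    • 4.1 gcc version too high, or gcc and g++ versions differ
    • 4.2 Clearing the bazel cache
    • 4.3 libcudnn.so not found
    • 4.4 Network problems: some packages not found
    • 4.4 Python ModuleNotFoundError
    • 4.5 numpy problem
    • 4.6 Building the GPU version
  • References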

1. Configuration

Item      Version
CPU       Intel® Core™ i5-8400 CPU @ 2.80GHz × 6
GPU       GeForce RTX 2070 SUPER/PCIe/SSE2
OS        Ubuntu 18.04
openjdk   1.8.0_242
python    3.6.9
bazel     0.21.0
gcc       4.8.5
g++       4.8.5

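Hints:

  1. gcc and g++ must be the same version, otherwise the build fails.
  2. I tried tensorflow-master and tensorflow-r1.14, and both had all sorts of problems. In the end I downloaded tensorflow-r1.13. It also had some problems, but they were much easier to solve.
  3. When running ./configure for tensorflow, make sure the paths to python and to the python library are correct.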

2. The ideal installation process

2.1 Installing bazel

Download bazel 0.21.0 from:
https://github.com/bazelbuild/bazel/releases?after=0.25.3
Once it is downloaded, install it:

sudo bash bazel-0.21.0-installer-linux-x86_64.sh

Check the bazel version:

bazel version
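Since this build was done with bazel 0.21.0 specifically, it can help to check the version in a script instead of by eye. A minimal sketch, assuming the output of bazel version contains a "Build label:" line; bazel_label and check_bazel_version are made-up helper names, not part of bazel:

```shell
# Hypothetical helper: print the version found on the "Build label:" line
# of `bazel version` output read from stdin.
bazel_label() {
    sed -n 's/^Build label: *//p'
}

# Hypothetical helper: succeed only if that label equals the expected version.
check_bazel_version() {
    expected="$1"
    actual="$(bazel_label)"
    if [ "$actual" = "$expected" ]; then
        echo "bazel $actual OK"
    else
        echo "expected bazel $expected, found '$actual'" >&2
        return 1
    fi
}

# Usage: bazel version | check_bazel_version 0.21.0
```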

If a version is printed, the installation succeeded.

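To uninstall bazel, remove its files. Which paths exist depends on how it was installed: ~/.bazel belongs to a per-user install, while the sudo install above normally goes under /usr/local.

rm -rf ~/.bazel
sudo rm -rf /usr/bin/bazel /usr/local/bin/bazel /usr/local/lib/bazel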

2.2 Installing gcc and g++

First, check whether gcc and g++ are the same version by running:

gcc --version
g++ --version
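The same comparison can be scripted. This is only a sketch: versions_match is a made-up helper name, and it assumes both compilers support the -dumpversion flag, which prints just the version number.

```shell
# Hypothetical helper: compare two version strings and report a mismatch.
versions_match() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: gcc=$1 g++=$2" >&2
        return 1
    fi
}

# Usage: versions_match "$(gcc -dumpversion)" "$(g++ -dumpversion)"
```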

If the versions differ, install matching versions:

sudo apt-get install gcc-4.8 g++-4.8

Check that they were installed:

ls /usr/bin/gcc* /usr/bin/g++*

Result:

/usr/bin/g++  
/usr/bin/g++-4.8  
/usr/bin/gcc  
/usr/bin/gcc-4.8  
/usr/bin/gcc-ar-4.8  
/usr/bin/gcc-nm-4.8  
/usr/bin/gcc-ranlib-4.8

Make gcc 4.8 the active gcc:

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 100
sudo update-alternatives --config gcc

Make g++ 4.8 the active g++:

sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 100
sudo update-alternatives --config g++

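Finally, check the versions again. If gcc --version and g++ --version both report the version you selected, the switch worked.

If the system has several gcc and g++ versions installed, switch between them with:

sudo update-alternatives --config gcc
sudo update-alternatives --config g++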

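2.3 Building tensorflow-r1.13 from source

Download from:
https://github.com/tensorflow/tensorflow/tree/r1.13
After downloading, unpack the archive, change into the tensorflow-r1.13 directory, and run:

./configure

Answer the prompts. Most answers are n unless you are building with CUDA and cuDNN. Where the default is already correct, just press Enter.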

Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python
 
Found possible Python library paths:
  /usr/lib/python3.6/dist-packages
  /usr/local/lib/python3.6/dist-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python3.6/dist-packages]
 
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.
 
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
 
Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.
 
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
 
Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.
 
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10]:10
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7
  
Please specify the locally installed NCCL version you want to use. [Leave empty to use http://github.com/nvidia/nccl]:
 
Please specify the comma-separated list of base paths to look for CUDA libraries and headers. [Leave empty to use the default]:
 
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 6.1,6.1]:
 
Do you want to use clang as CUDA compiler? [y/N]: n
 
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
 
Do you wish to build TensorFlow with MPI support? [y/N]:n
 
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
 
 
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:  n
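The answers above can also be supplied up front through environment variables, so they do not have to be retyped after every cache wipe. The variable names below are my recollection of what configure.py reads, so treat them as assumptions and check them against configure.py in your own checkout. Any prompt whose variable is missing or misnamed should simply be asked interactively as before.

```shell
# ASSUMPTION: these variable names are read by configure.py in this checkout.
# The values mirror the answers given in the transcript above.
export PYTHON_BIN_PATH=/usr/bin/python
export PYTHON_LIB_PATH=/usr/local/lib/python3.6/dist-packages
export TF_ENABLE_XLA=0
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_ROCM=0
export TF_NEED_CUDA=1
export TF_NEED_TENSORRT=0
export TF_CUDA_VERSION=10
export TF_CUDNN_VERSION=7
# 7.5 is the compute capability of the RTX 2070 SUPER listed in section 1.
export TF_CUDA_COMPUTE_CAPABILITIES=7.5
export TF_CUDA_CLANG=0
export GCC_HOST_COMPILER_PATH=/usr/bin/gcc
export TF_NEED_MPI=0
export CC_OPT_FLAGS="-march=native -Wno-sign-compare"
export TF_SET_ANDROID_WORKSPACE=0

# Then, from the tensorflow-r1.13 directory, run: ./configure
```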

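3. Starting the build

In the tensorflow-r1.13 directory, run:

bazel build --config=opt tensorflow/python/tools:freeze_graph
bazel build --config=opt tensorflow/contrib/lite/toco:toco

If bazel reports that the toco target does not exist, try tensorflow/lite/toco:toco instead: as far as I know, TensorFlow Lite moved out of contrib in the 1.13 release.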
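Each target can take a long time to build, so it is convenient to queue them and stop at the first failure. A minimal sketch: build_targets and the BAZEL override are made-up names for this example, and BAZEL exists only so the loop can be dry-run with echo.

```shell
# BAZEL is a hypothetical override; it defaults to the real bazel binary.
BAZEL="${BAZEL:-bazel}"

# Hypothetical helper: build each target in order with --config=opt and
# stop at the first one that fails.
build_targets() {
    for target in "$@"; do
        "$BAZEL" build --config=opt "$target" || {
            echo "build failed: $target" >&2
            return 1
        }
    done
}

# Usage:
# build_targets tensorflow/python/tools:freeze_graph tensorflow/contrib/lite/toco:toco
```

On success the binaries should appear under bazel-bin/, following the target path, for example bazel-bin/tensorflow/python/tools/freeze_graph.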

4. Pitfalls

4.1 gcc version too high, or gcc and g++ versions differ

Error message:

this rule is missing dependency declarations for the following files included by 'external/com_google_absl/absl/types/optional.cc':
  '/usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/limits.h'
  '/usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/syslimits.h'
  '/usr/lib/gcc/x86_64-linux-gnu/8/include/stddef.h'
  '/usr/lib/gcc/x86_64-linux-gnu/8/include/stdarg.h'
  '/usr/lib/gcc/x86_64-linux-gnu/8/include/stdint.h'

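The paths in the error show that gcc 8 was picked up. Install an older gcc and g++ instead. I used gcc 4.8 and g++ 4.8; the steps are in section 2.2 above.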
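To see at a glance which compiler version a failing build picked up, the version directory can be pulled out of the include paths in the error. A rough sketch, assuming the paths look like the /usr/lib/gcc/<triplet>/<version>/ ones above; gcc_versions_in_log is a made-up helper name:

```shell
# Hypothetical helper: list the distinct gcc versions that appear in
# /usr/lib/gcc/<triplet>/<version>/ paths in the text read from stdin.
gcc_versions_in_log() {
    sed -n "s|.*/usr/lib/gcc/[^/]*/\([0-9][0-9.]*\)/.*|\1|p" | sort -u
}

# Usage: bazel build --config=opt <target> 2>&1 | gcc_versions_in_log
```

If this prints a version other than the one you selected with update-alternatives, the build is still using a different compiler.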

4.2 Clearing the bazel cache

After changing some configuration options, the build reports:

ERROR: /home/gezp/.cache/bazel/_bazel_gezp/2e4f7705435d0bd99b2c7f0d4e7595e7/external/protobuf_archive/BUILD:93:1: undeclared inclusion(s) in rule '@protobuf_archive//:protobuf_lite':
this rule is missing dependency declarations for the following files included by 'external/protobuf_archive/src/google/protobuf/stubs/int128.cc'

Delete the bazel cache (under your own home directory) and rebuild:

rm -rf ~/.cache/bazel
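The deletion can be wrapped in a small function so it is harder to mistype the path. This is a sketch only: clear_bazel_cache is a made-up name, and the default directory is the .cache/bazel location under the home directory, as in the error path above. Running bazel clean --expunge from inside the source tree is the built-in alternative for a single workspace.

```shell
# Hypothetical helper: delete bazel's per-user cache directory.
# Defaults to $HOME/.cache/bazel; pass another directory to override.
clear_bazel_cache() {
    cache_dir="${1:-$HOME/.cache/bazel}"
    if [ -d "$cache_dir" ]; then
        # Some cached entries may be read-only, so restore write access first.
        chmod -R u+w "$cache_dir"
        rm -rf "$cache_dir"
        echo "removed $cache_dir"
    else
        echo "nothing to remove at $cache_dir"
    fi
}

# Usage: clear_bazel_cache
```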

4.3 libcudnn.so not found

The build fails with:

 /home/xxh/tensorflow/tensorflow/contrib/boosted_trees/BUILD:559:1: Linking of rule '//tensorflow/contrib/boosted_trees:gen_gen_stats_accumulator_ops_py_wrap_py_wrappers_cc' failed (Exit 1)/usr/bin/ld: warning: libcudnn.so.7, needed by bazel-out/host/bin/_solib_local
 /_U_S_Stensorflow_Scontrib_Sboosted_Utrees_Cgen_Ugen_Ustats_Uaccumulator_Uops_Upy_Uwrap_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so, not found (try using -rpath or -rpath-link)

or with:

bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]' 

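Register the CUDA library directory with the dynamic linker. The file has to be written as root, so pipe through sudo tee instead of redirecting with >:

echo "/usr/local/cuda/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig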
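Whether the linker cache now knows about cuDNN can be checked from the output of ldconfig -p. A minimal sketch; has_library is a made-up helper name, and it assumes the usual "name (arch) => path" line format of ldconfig -p:

```shell
# Hypothetical helper: succeed if the `ldconfig -p` listing read from
# stdin contains an entry for the given library name.
has_library() {
    grep -qF -- "$1 ("
}

# Usage: ldconfig -p | has_library libcudnn.so.7 && echo "libcudnn.so.7 found"
```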

4.4 Network problems: some packages not found

no such package '@curl
no such package '@grpc
no such package '@org_python_pypi_backports_weakref
no such package '@aws
no such package '@termcolor_archive
no such package '@boringssl
no such package '@icu
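
Fix: set up a local web server, download the required files by hand, and serve them locally so that the build does not have to fetch them over the network.
Install the server: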

sudo apt-get install nginx-light
sudo service nginx start

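Then open http://127.0.0.1 in a browser. A page saying something like "Welcome to nginx!" means the server is running.
Copy the manually downloaded files into /var/www/html/:

sudo cp *.tar.gz /var/www/html/

Next, edit the rules that reference those files. Most of them are in:
tensorflow-r1.13/tensorflow/workspace.bzl
Taking grpc as an example, find the rule below and add the local URL as the first entry of urls: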

   # WARNING: make sure ncteisen@ and vpai@ are cc-ed on any CL to change the below rule
   tf_http_archive(
       name = "grpc",
       sha256 = "1aa84387232dda273ea8fdfe722622084f72c16f7b84bfc519ac7759b71cdc91",
       strip_prefix = "grpc-69b6c047bc767b4d80e7af4d00ccb7c45b683dae",
       system_build_file = clean_dep("//third_party/systemlibs:grpc.BUILD"),
       urls = [
           "http://127.0.0.1/grpc-69b6c047bc767b4d80e7af4d00ccb7c45b683dae.tar.gz",
           "https://mirror.bazel.build/github.com/grpc/grpc/archive/69b6c047bc767b4d80e7af4d00ccb7c45b683dae.tar.gz",
           "https://github.com/grpc/grpc/archive/69b6c047bc767b4d80e7af4d00ccb7c45b683dae.tar.gz",
       ],
   )

The icu and aws rules are in:
tensorflow-r1.13/third_party/icu/workspace.bzl
tensorflow-r1.13/third_party/aws/workspace.bzl
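Each rule carries a sha256 value, so a manually downloaded archive has to match it or the fetch is rejected. It is quicker to check the hash before copying the file to the server. A minimal sketch using sha256sum; verify_sha256 is a made-up helper name:

```shell
# Hypothetical helper: succeed only if FILE hashes to EXPECTED (hex sha256).
verify_sha256() {
    file="$1"
    expected="$2"
    actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
    if [ "$actual" = "$expected" ]; then
        echo "OK $file"
    else
        echo "sha256 mismatch for $file: got $actual" >&2
        return 1
    fi
}

# Usage, with the file name and hash from the grpc rule above:
# verify_sha256 grpc-69b6c047bc767b4d80e7af4d00ccb7c45b683dae.tar.gz \
#     1aa84387232dda273ea8fdfe722622084f72c16f7b84bfc519ac7759b71cdc91
```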

4.4 Python ModuleNotFoundError

This happens when the wrong python path is chosen during ./configure. The values that worked here:

# python path
/usr/bin/python
# python module path
/usr/local/lib/python3.6/dist-packages
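To find the right values on another machine, the interpreter can be asked directly. A small sketch: python_paths is a made-up helper name, and it assumes the interpreter provides site.getsitepackages(), which some older virtualenv setups do not.

```shell
# Hypothetical helper: print the interpreter's own path, followed by the
# package directories it searches, one per line.
python_paths() {
    py="${1:-python}"
    "$py" -c 'import sys, site; print(sys.executable); print("\n".join(site.getsitepackages()))'
}

# Usage: python_paths /usr/bin/python
```

The first line is the candidate for the python location prompt, and one of the remaining lines is the candidate for the library path prompt.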

4.5 numpy problem

The build fails with:

compilation of rule '//tensorflow/python:bfloat16_lib' failed 

Fix: downgrade numpy to 1.18.5.
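The downgrade can be made conditional on the installed version. This sketch assumes that only versions newer than 1.18.5 need downgrading, which is my reading and not something I verified for every release. version_gt is a made-up helper name, and it relies on GNU sort -V.

```shell
# Hypothetical helper: succeed if version $1 is strictly newer than $2.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Usage:
# current="$(python -c 'import numpy; print(numpy.__version__)')"
# if version_gt "$current" 1.18.5; then
#     python -m pip install 'numpy==1.18.5'
# fi
```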

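4.6 Building the GPU version

tensorflow>=1.13 does not support CUDA<=9.0, and it explicitly requires a GPU compute capability[8] of at least 3.5.
tensorflow 1.12 supports CUDA 9.0 and does not state a compute capability[8] requirement.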
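The 3.5 threshold can be checked before starting a long GPU build. A tiny sketch using awk for the decimal comparison; cc_supported is a made-up helper name, and the capability value has to be looked up by hand, for example on the NVIDIA page mentioned in the ./configure prompt.

```shell
# Hypothetical helper: succeed if compute capability $1 is at least $2
# (the minimum defaults to the 3.5 quoted above).
cc_supported() {
    awk -v cc="$1" -v min="${2:-3.5}" 'BEGIN { exit !(cc + 0 >= min + 0) }'
}

# Usage: cc_supported 7.5 && echo "GPU meets the minimum"
```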

References

[1] https://blog.csdn.net/qq_17130909/article/details/78637329
[2] https://www.jianshu.com/p/d92913173d5b
[3] https://blog.csdn.net/surtol/article/details/97638399#0_1
[4] https://blog.csdn.net/qq_26535271/article/details/84930868#commentBox
[5] https://blog.csdn.net/xiaolt90/article/details/104971920
[6] https://blog.csdn.net/qq_26535271/article/details/83031412#commentBox
[7] http://blog.leanote.com/post/mrwaterzhou/0417e050f84c
[8] https://github.com/tensorflow/tensorflow/issues/15889
