Reference links:
https://blog.csdn.net/u012440550/article/details/113361176?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.control&dist_request_id=&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.control
https://zhuanlan.zhihu.com/p/259789357
Prebuilt shared library downloads:
CPU version: https://download.csdn.net/download/weixin_43140187/15745733
GPU version: https://download.csdn.net/download/weixin_43140187/15745707
Environment Setup
With 8 parallel jobs (the default parallelism equals the number of CPU threads), you should have at least 10 GB of RAM; otherwise the build fails with compiler heap-space errors.
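If RAM is tight, the parallelism can be lowered with Bazel's standard --jobs flag instead of adding memory. A sketch (the target name matches the build commands used later in this guide; the job count of 4 is an arbitrary choice):

```shell
rem Limit Bazel to 4 concurrent actions to reduce peak memory usage.
rem Combine with the other flags from the full build commands as needed.
bazel build --jobs=4 //tensorflow:tensorflow_cc.dll
```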
First, install a few Python packages: six, numpy, wheel, setuptools, keras_applications, and keras_preprocessing. Open a Command Prompt with administrator privileges:
pip install six numpy wheel setuptools
pip install keras_applications --no-deps
pip install keras_preprocessing --no-deps
Note that the Python path must not contain spaces. The default Windows install path, C:\Program Files\Python39, will cause errors at build time, so if Python is installed there, create a link (not a shortcut) in a directory without spaces using the mklink command.
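A sketch of that link, assuming the default install path above; C:\Python39 is a hypothetical spaceless location, so pick any path you like:

```shell
rem /J creates a directory junction, which does not require admin rights.
mklink /J C:\Python39 "C:\Program Files\Python39"
```

When configure.py later asks for the Python location, give it the junction path.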
(I used the Python environment bundled with Anaconda3.)
I chose CUDA 11.1; download the installer from the NVIDIA site and install it, nothing special there.
Next is Bazel. Bazel is simple: it is a single exe that needs to be added to the Path environment variable. Out of laziness I dropped it straight into CUDA's bin directory. I used version 3.7.2.
Then install MSYS2, and likewise add its msys64\usr\bin directory to the Path.
The official guide omits the zip package, so the install command becomes:
pacman -S git patch unzip zip
Next is Visual Studio. Download the VS installer and, to avoid trouble, install to the default path on C: (I did not try a non-C: path this time, so I don't know whether the compiler-not-found bug is still there). If you are not otherwise a VS user, you only need, besides the required components, the MSVC v142 - VS 2019 C++ x64/x86 build tools (any version works; I picked the latest) and a Windows 10 SDK (likewise any version; I picked the latest).
(Note: if you are building the CPU version, you can answer N at the CUDA prompt.)
D:\tf2\tensorflow-2.4.1>python configure.py
You have bazel 3.7.2 installed.
Please specify the location of python. [Default is C:\Users\XJWT\anaconda3\python.exe]:
Found possible Python library paths:
C:\Users\XJWT\anaconda3\lib\site-packages
Please input the desired Python library path to use. Default is [C:\Users\XJWT\anaconda3\lib\site-packages]
Do you wish to build TensorFlow with ROCm support? [y/N]: N
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Found CUDA 11.1 in:
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1/lib/x64
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1/include
Found cuDNN 8 in:
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1/lib/x64
C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1/include
Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 3.5,7.5
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is /arch:AVX]:
Would you like to override eigen strong inline for some C++ compilation to reduce the compilation time? [Y/n]: Y
Eigen strong inline overridden.
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=mkl_aarch64 # Build with oneDNN support for Aarch64.
--config=monolithic # Config for mostly static monolithic build.
--config=ngraph # Build with Intel nGraph support.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v2 # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=nonccl # Disable NVIDIA NCCL support.
Changing the build output path:
Find the .bazelrc file in the source tree and append the following at the end:
try-import %workspace%/.bazelrc.user
startup --output_user_root=D:/tf2/out
Then run the build commands.
GPU build commands:
Dll:bazel --output_user_root=D:/tf2/out2 --output_base=D:/tf2/out1 build --config=mkl --config=numa --config=monolithic --define=tensorflow_enable_mlir_generated_gpu_kernels=0 --experimental_strict_action_env=false //tensorflow:tensorflow_cc.dll
Lib:bazel --output_user_root=D:/tf2/out2 --output_base=D:/tf2/out1 build --config=mkl --config=numa --config=monolithic --define=tensorflow_enable_mlir_generated_gpu_kernels=0 --experimental_strict_action_env=false //tensorflow:tensorflow_cc_dll_import_lib
Include:bazel --output_user_root=D:/tf2/out2 --output_base=D:/tf2/out1 build --config=mkl --config=numa --config=monolithic --define=tensorflow_enable_mlir_generated_gpu_kernels=0 --experimental_strict_action_env=false //tensorflow:install_headers
CPU build commands:
bazel build //tensorflow:tensorflow_cc.dll
bazel build //tensorflow:tensorflow_cc_dll_import_lib
bazel build //tensorflow:install_headers
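Once the three targets finish, the outputs land under the bazel-bin convenience symlink in the source root. A sketch of gathering them (the destination D:\tf2\dist is made up, and the exact import-library file name may differ on your build):

```shell
rem Gather the build artifacts; these paths are assumptions based on the
rem default bazel-bin layout, and D:\tf2\dist is an arbitrary destination.
copy bazel-bin\tensorflow\tensorflow_cc.dll D:\tf2\dist\
copy bazel-bin\tensorflow\tensorflow_cc.lib D:\tf2\dist\
xcopy /E /I bazel-bin\tensorflow\include D:\tf2\dist\include
```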
Fixing build errors:
Starting with 2.4, TensorFlow changed how it obtains the OpenMP runtime used by MKL: instead of downloading a prebuilt binary, it downloads the open-source code from the LLVM project and compiles it, which causes a string of problems. I recommend downloading the LLVM installer (Pre-Built Binaries) from the official site and installing it (remember to let the installer add LLVM to the system environment variables). This serves two purposes:
(1) The prebuilt libiomp5md.lib and libiomp5md.dll replace the compile step; building LLVM's OpenMP with MSVC fails.
(2) Copy the DLLs LLVM uses (under the LLVM install directory's bin) into the msys install directory's usr\bin. Otherwise, later build steps that invoke msys's bash run without the system Path and fail (for example, building the Lite-related targets reports that api-ms-win-crt-locale-l1-1-0.dll cannot be found).
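A sketch of step (2), assuming the default install locations for LLVM and MSYS2; adjust both paths to your machine:

```shell
rem Copy LLVM's runtime DLLs into MSYS2's bin so tools launched from
rem msys bash can find them without the system Path.
copy "C:\Program Files\LLVM\bin\*.dll" C:\msys64\usr\bin\
```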
Put libiomp5md.lib into the third_party\mkl directory, and insert the following after line 75 of third_party\mkl\mkl.BUILD:
cc_import(
    name = "iomp5",
    interface_library = "libiomp5md.lib",
    system_provided = 1,
)
Then change the Windows build configuration below it to:
cc_library(
    name = "mkl_libs_windows",
    deps = [
        "iomp5",
    ],
    visibility = ["//visibility:public"],
)
Add the directory containing libiomp5md.dll to the system Path.
Change line 74 of third_party\llvm_openmp\BUILD to 0, to stop forcing MSVC:
omp_vars_win = {
    "MSVC": 0,
}
In addition, the DLL cannot be found at the final link step, so copy LLVM's libiomp5md.dll into the same directory as python.exe.
After the build finishes, write a simple program and run it; you will hit missing-symbol errors. For older versions of TF someone made a patch that adds the symbols; for newer versions you have to add TF_EXPORT where needed, guided by the error messages.
If you only need to run inference on a pb graph, you can follow the changes from the links at the top; the concrete edits are described below. If you need anything else, adapt them to your own code and the errors you see.
In tensorflow-master\tensorflow\core\public\session.h, add TF_EXPORT at the affected declarations,
and add #include "tensorflow/core/platform/macros.h".
In /tensorflow/core/public/session_options.h, add