Compiling TensorFlow 2.4.1 with CUDA 11.2 on Windows


Reference links: https://blog.csdn.net/u012440550/article/details/113361176?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.control&dist_request_id=&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.control

https://zhuanlan.zhihu.com/p/259789357

Download links for the pre-built dynamic libraries:

CPU version: https://download.csdn.net/download/weixin_43140187/15745733

GPU version: https://download.csdn.net/download/weixin_43140187/15745707

 

Environment Setup

1. Memory requirements

With 8 parallel jobs (the default number of jobs equals the number of CPU threads), you should have at least 10 GB of RAM; otherwise the compiler will fail with heap-space errors.
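If you cannot free up that much memory, a workaround is to lower Bazel's parallelism with its --jobs flag when running the build commands given further below, for example (the value 4 and the target are only illustrative):

bazel build --jobs=4 //tensorflow:tensorflow_cc.dll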

2. Python & Pip

First, Python needs a few packages installed: six, numpy, wheel, setuptools, keras_applications, and keras_preprocessing. Open a command prompt with administrator privileges and run:

pip install six numpy wheel setuptools

pip install keras_applications --no-deps

pip install keras_preprocessing --no-deps

Note that the Python path must not contain spaces: the default Windows install path C:\Program Files\Python39 will cause errors during the build. If Python is installed there, create a link (not a shortcut) to it in a directory without spaces using the mklink command.
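For example, a directory junction could be created from a command prompt like this (the paths here are only placeholders; adjust them to your own install location):

mklink /J C:\Python39 "C:\Program Files\Python39"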

(I used the Python environment that ships with Anaconda3.)

3. CUDA

I chose CUDA 11.1 here; download and install it from the CUDA website. Nothing special about this step.

4. Bazel

Next is Bazel. Bazel is just a single exe; it needs to be on the Path, and out of laziness I simply dropped it into CUDA's bin directory. I chose version 3.7.2.
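A quick way to confirm Bazel is on the Path and check its version:

bazel --version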

5. MSYS2

Then install MSYS2; likewise, add the msys64\usr\bin directory to the Path environment variable.

The official guide omits the zip package, so the install command is:

pacman -S git patch unzip zip
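If Bazel later complains that it cannot find bash, you may also need to point it at MSYS2 explicitly (assuming the default C:\msys64 install path):

set BAZEL_SH=C:\msys64\usr\bin\bash.exe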

6. Visual Studio 2019

Next is Visual Studio. Download the VS installer and, to avoid trouble, install to the default path on the C drive (I did not try a non-C-drive path this time, so I don't know whether the "compiler not found" bug is still around). If you don't otherwise use VS, besides the mandatory components you only need the MSVC v142 - VS 2019 C++ x64/x86 build tools (any version works; I picked the latest) and a Windows 10 SDK (again, any one; I picked the latest).
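If Bazel fails to locate the MSVC toolchain, setting BAZEL_VC explicitly may help; the exact path depends on the edition you installed, e.g. for the standalone Build Tools it would look roughly like:

set BAZEL_VC=C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC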

 

Building

Configuration

Download the TensorFlow 2.4.1 source code, open CMD in the extracted root directory, and run: python configure.py

(Note: if you are building the CPU-only version, you can answer N at the CUDA step.)

D:\tf2\tensorflow-2.4.1>python configure.py

You have bazel 3.7.2 installed.

Please specify the location of python. [Default is C:\Users\XJWT\anaconda3\python.exe]:

Found possible Python library paths:

  C:\Users\XJWT\anaconda3\lib\site-packages

Please input the desired Python library path to use.  Default is [C:\Users\XJWT\anaconda3\lib\site-packages]

Do you wish to build TensorFlow with ROCm support? [y/N]: N

No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y

CUDA support will be enabled for TensorFlow.

Found CUDA 11.1 in:

    C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1/lib/x64

    C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1/include

Found cuDNN 8 in:

    C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1/lib/x64

    C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1/include

Please specify a list of comma-separated CUDA compute capabilities you want to build with.

You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.

Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 3.5,7.5

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is /arch:AVX]:

Would you like to override eigen strong inline for some C++ compilation to reduce the compilation time? [Y/n]: Y

Eigen strong inline overridden.

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N

Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.

        --config=mkl            # Build with MKL support.

        --config=mkl_aarch64    # Build with oneDNN support for Aarch64.

        --config=monolithic     # Config for mostly static monolithic build.

        --config=ngraph         # Build with Intel nGraph support.

        --config=numa           # Build with NUMA support.

        --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.

        --config=v2             # Build TensorFlow 2.x instead of 1.x.

Preconfigured Bazel build configs to DISABLE default on features:

        --config=noaws          # Disable AWS S3 filesystem support.

        --config=nogcp          # Disable GCP support.

        --config=nohdfs         # Disable HDFS support.

        --config=nonccl         # Disable NVIDIA NCCL support.

 

 

Change the build output path:

Find the .bazelrc file in the source root and append the following lines at the end:

try-import %workspace%/.bazelrc.user

startup --output_user_root=D:/tf2/out

Then start the build:

GPU build commands:

Dll:bazel --output_user_root=D:/tf2/out2 --output_base=D:/tf2/out1 build --config=mkl --config=numa --config=monolithic --define=tensorflow_enable_mlir_generated_gpu_kernels=0 --experimental_strict_action_env=false //tensorflow:tensorflow_cc.dll

Lib:bazel --output_user_root=D:/tf2/out2 --output_base=D:/tf2/out1 build --config=mkl --config=numa --config=monolithic --define=tensorflow_enable_mlir_generated_gpu_kernels=0 --experimental_strict_action_env=false //tensorflow:tensorflow_cc_dll_import_lib

Include:bazel --output_user_root=D:/tf2/out2 --output_base=D:/tf2/out1 build --config=mkl --config=numa --config=monolithic --define=tensorflow_enable_mlir_generated_gpu_kernels=0 --experimental_strict_action_env=false //tensorflow:install_headers

CPU build commands:

bazel build //tensorflow:tensorflow_cc.dll

bazel build //tensorflow:tensorflow_cc_dll_import_lib

bazel build //tensorflow:install_headers
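After a successful build, the artifacts land under bazel-bin in the source root; for these targets I would expect roughly the following paths (verify against your own build output):

bazel-bin\tensorflow\tensorflow_cc.dll
bazel-bin\tensorflow\tensorflow_cc.lib
bazel-bin\tensorflow\include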

Troubleshooting:

1. Errors when MKL is enabled

Starting with 2.4, TensorFlow no longer downloads a prebuilt binary for the OpenMP runtime used by MKL; instead it downloads the open-source code from the LLVM project and compiles it, which causes a series of problems. I recommend installing LLVM from the official pre-built binaries (remember to tick the option that adds it to the system environment variables during installation). This serves two purposes:

(1) The ready-made libiomp5md.lib and libiomp5md.dll replace the compilation step; building LLVM's OpenMP with MSVC fails.

(2) The DLLs LLVM ships (under <LLVM install dir>\bin) can be copied to <msys install dir>\usr\bin; otherwise, later build steps that run bash from msys fail because the system Path is not propagated (for example, building the Lite-related targets reports that api-ms-win-crt-locale-l1-1-0.dll cannot be found).
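Assuming LLVM and MSYS2 are in their default install locations, the copy in point (2) can be done with a single command:

copy "C:\Program Files\LLVM\bin\*.dll" C:\msys64\usr\bin\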

Copy libiomp5md.lib into the third_party\mkl directory, and insert the following after line 75 of third_party\mkl\mkl.BUILD:

cc_import(
    name = "iomp5",
    interface_library = "libiomp5md.lib",
    system_provided = 1,
)

Then change the Windows build configuration below it to:

cc_library(
    name = "mkl_libs_windows",
    deps = [
        "iomp5",
    ],
    visibility = ["//visibility:public"],
)

Make sure libiomp5md.dll can be found via the system environment variables (i.e., its directory is on the Path).

Change line 74 of third_party\llvm_openmp\BUILD to 0 so that MSVC is no longer forced:

omp_vars_win = {
    "MSVC": 0,
}

 

In addition, the DLL cannot be found during the final link step; copy LLVM's libiomp5md.dll into the same directory as python.exe.

 

2. Unresolved TensorFlow symbols when linking

Once the build finishes, write a simple test program and try to run it; you will get missing-symbol errors. For older TF versions someone published a patch that adds the symbols; for newer versions you need to add TF_EXPORT yourself wherever it is needed, guided by the error messages.

If you only need to run inference on a pb graph, the changes described in the links above are enough; the screenshots below show the specific edits. If you need more, modify according to your own code and the reported errors.

In tensorflow-master\tensorflow\core\public\session.h:

[Screenshot in the original post: TF_EXPORT added to several declarations in session.h]

Add TF_EXPORT in those places, and also add #include "tensorflow/core/platform/macros.h" to the file.

Then add TF_EXPORT in /tensorflow/core/public/session_options.h as well:

[Screenshot in the original post: TF_EXPORT added in session_options.h]
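Since the screenshots are not reproduced here, the sketch below illustrates the kind of edit involved. It assumes the linker complained about NewSession and the SessionOptions constructor; the exact set of declarations you have to annotate depends on the unresolved symbols your own build reports.

// tensorflow/core/public/session.h (excerpt, sketch only)
#include "tensorflow/core/platform/macros.h"  // provides TF_EXPORT

// Prefix each declaration the linker reports as missing, e.g. the
// free function that creates a Session:
TF_EXPORT Status NewSession(const SessionOptions& options, Session** out_session);

// tensorflow/core/public/session_options.h (excerpt, sketch only)
#include "tensorflow/core/platform/macros.h"

// Exporting the whole struct also exports its constructor:
struct TF_EXPORT SessionOptions {
  // ... existing members unchanged ...
};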

 

 
