Building TensorFlow 2.5.0 with CUDA 11.3 on Windows (Python 3.9.5, cuDNN 8.2.0, compute capability 3.5 - 8.6, build results available for download)

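This mostly follows my earlier article, "Building TensorFlow 2.4.1 with CUDA 11.2 on Windows (Python 3.9.1, cuDNN 8.1.0, compute capability 3.5 - 8.6, build results available for download)", with changes in a few places.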

Environment setup

1. Memory requirements

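With 8 parallel jobs (the default number of jobs equals the number of CPU threads), you should have at least 10 GB of RAM; otherwise the compiler fails with an out-of-heap-space error.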
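As a rough illustration of the memory rule above, the sketch below estimates how many parallel jobs fit in a given amount of RAM. The figure of 1.25 GB per job is only an extrapolation from "10 GB for 8 jobs" and is an assumption, as is the helper name suggested_jobs. The result could be passed to bazel through its --jobs option.

```python
import os

# Assumption: memory use scales linearly, so 10 GB for 8 jobs is about 1.25 GB per job.
GB_PER_JOB = 10 / 8


def suggested_jobs(total_ram_gb, cpu_threads=None):
    """Return a job count that fits in RAM and does not exceed the CPU thread count."""
    if cpu_threads is None:
        cpu_threads = os.cpu_count() or 1
    by_memory = int(total_ram_gb // GB_PER_JOB)
    # Always allow at least one job, even on very small machines.
    return max(1, min(cpu_threads, by_memory))


print(suggested_jobs(8, cpu_threads=8))  # prints 6
```

On a machine with 8 GB of RAM and 8 threads this suggests something like --jobs=6 rather than the default of 8.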

2. Python & Pip

Python first needs a few packages: six, numpy, wheel, setuptools, keras_applications and keras_preprocessing. Open a Command Prompt with administrator rights and run:

pip install six numpy wheel setuptools
pip install Keras_applications Keras_preprocessing --no-deps

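Note that the Python path must not contain spaces: the default Windows install path C:\Program Files\Python39 causes errors during the build. If Python is installed there, use the mklink command to create a link (not a shortcut) in a directory whose path has no spaces.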
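A minimal sketch of the check described above: it detects a space in the Python directory and prints a matching mklink command to run in an elevated Command Prompt. The link location C:\Python39 and the helper names are assumptions for illustration; mklink /D creates a directory symbolic link, with the link name first and the target second.

```python
def has_space(path):
    """Return True if the path contains a space, which breaks the build."""
    return " " in path


def mklink_command(link_dir, target_dir):
    """Build the mklink command line: link name first, existing target second."""
    return 'mklink /D "{}" "{}"'.format(link_dir, target_dir)


python_dir = r"C:\Program Files\Python39"  # default install path mentioned in the text
link_dir = r"C:\Python39"                  # hypothetical space-free location

if has_space(python_dir):
    print(mklink_command(link_dir, python_dir))
```

The printed command is only a suggestion; the script itself does not create the link.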

3. CUDA

I chose CUDA 11.3 here. Download it from the CUDA website and install it; there is nothing special to note.

4. Bazel

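Next is Bazel. It is simple: a single exe that has to be reachable through the Path environment variable. I took a shortcut and put it straight into CUDA's bin directory. The version I used is 3.7.2.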

5. MSYS2

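Next, install MSYS2. The msys64\usr\bin directory also needs to be added to the Path environment variable.

After that, install a few packages with pacman. The default mirrors are extremely slow, so users in mainland China should switch to a domestic mirror.

Go to msys64\etc\pacman.d and edit every mirrorlist file: in each one, add a line above all the existing Server lines, using any one of the Tsinghua, USTC, or BUPT mirrors.

Open the msys64 shell. The official guide leaves out the zip package, so the install command is: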

pacman -S git patch unzip zip
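Before configuring the build, it can help to confirm that the tools installed so far are actually reachable through Path. The sketch below uses shutil.which for that; the tool list simply mirrors the steps above (bazel plus the four pacman packages), and the helper name missing_tools is an assumption.

```python
import shutil

# Tools the earlier steps put on Path: bazel, plus the packages installed with pacman.
REQUIRED_TOOLS = ("bazel", "git", "patch", "unzip", "zip")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current Path."""
    return [tool for tool in tools if shutil.which(tool) is None]


print("missing:", missing_tools())
```

An empty list means every tool was found; anything listed probably needs its directory added to Path.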

6. Visual Studio 2019

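Then Visual Studio. Download the VS installer and, to avoid trouble, install to the default path on drive C (this time I did not try a path outside drive C, so I do not know whether the bug where the compiler cannot be found still exists). If you do not otherwise use VS, then beyond the required components you only need the MSVC v142 - VS 2019 C++ x64/x86 build tools (any version; I chose the latest) and the Windows 10 SDK (again any version; I chose the latest).

Build

Configuring the build

Download the TensorFlow 2.5.0 source code, enter the root directory of the extracted archive, and run: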

D:\tensorflow-2.5.0>python configure.py
You have bazel 3.7.2 installed.
Please specify the location of python. [Default is C:\Python39\python.exe]:


Found possible Python library paths:
  C:\Python39\lib\site-packages
Please input the desired Python library path to use.  Default is [C:\Python39\lib\site-packages]

Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Found CUDA 11.3 in:
    D:/CUDA/lib/x64
    D:/CUDA/include
Found cuDNN 8 in:
    D:/CUDA/lib/x64
    D:/CUDA/include


Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 3.5,3.7,5.0,5.2,6.0,6.1,7.0,7.5,8.0,8.6


Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is /arch:AVX]: /arch:AVX2


Would you like to override eigen strong inline for some C++ compilation to reduce the compilation time? [Y/n]:
Eigen strong inline overridden.

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl            # Build with MKL support.
        --config=mkl_aarch64    # Build with oneDNN and Compute Library for the Arm Architecture (ACL).
        --config=monolithic     # Config for mostly static monolithic build.
        --config=numa           # Build with NUMA support.
        --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.
        --config=v2             # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
        --config=noaws          # Disable AWS S3 filesystem support.
        --config=nogcp          # Disable GCP support.
        --config=nohdfs         # Disable HDFS support.
        --config=nonccl         # Disable NVIDIA NCCL support.

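This version of TensorFlow fails when building for SM 3.5. From what I could find, it appears that TensorRT does not support a compute capability this low.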
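The configure prompt accepts capabilities written as "x.y", "compute_xy" or "sm_xy" and states a minimum of 3.5. The sketch below checks a comma-separated list against those rules before it is typed into configure.py. It is an illustration only, not the validation TensorFlow itself performs, and the function names are assumptions.

```python
import re

# Accepts "x.y", "compute_xy" or "sm_xy", the three forms named in the configure prompt.
_CAPABILITY = re.compile(r"^(?:(\d+)\.(\d)|(?:compute|sm)_(\d+)(\d))$")


def parse_capability(item):
    """Return (major, minor) for a single capability string."""
    match = _CAPABILITY.match(item.strip())
    if match is None:
        raise ValueError("unrecognized compute capability: %r" % item)
    if match.group(1) is not None:
        return int(match.group(1)), int(match.group(2))
    return int(match.group(3)), int(match.group(4))


def check_capabilities(text, minimum=(3, 5)):
    """Parse a comma-separated list and reject anything below the minimum."""
    capabilities = [parse_capability(part) for part in text.split(",")]
    too_low = [cap for cap in capabilities if cap < minimum]
    if too_low:
        raise ValueError("below minimum %s: %s" % (minimum, too_low))
    return capabilities


print(check_capabilities("3.5,3.7,5.0,5.2,6.0,6.1,7.0,7.5,8.0,8.6"))
```

Passing minimum=(3, 7) would flag 3.5 up front instead of letting the build fail later.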

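Source changes

When MKL is enabled

Starting with 2.4, TensorFlow no longer downloads a prebuilt binary of the OpenMP runtime used by MKL; it downloads the source from the LLVM project and compiles it, which causes a series of problems. I recommend downloading the installer (Pre-Built Binaries) from the official LLVM site and installing LLVM (remember to let the installer add it to the system environment variables). This serves two purposes:

  1. Use the ready-made libiomp5md.lib and libiomp5md.dll instead of compiling them, because building LLVM's OpenMP with MSVC fails.
  2. Copy the DLLs that LLVM uses (in the bin directory of the LLVM installation) into the usr\bin directory of the MSYS installation. Otherwise later build steps that run bash from MSYS fail, because that bash does not inherit the system Path (for example, building the Lite components fails because api-ms-win-crt-locale-l1-1-0.dll cannot be found).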
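The DLL copy in step 2 can be scripted. This is a minimal sketch that copies every .dll from one directory to another; the function name copy_dlls is an assumption, and the directories are whatever your LLVM and MSYS2 installations use.

```python
import shutil
from pathlib import Path


def copy_dlls(src_dir, dst_dir):
    """Copy every .dll in src_dir into dst_dir and return the copied file names."""
    copied = []
    for dll in sorted(Path(src_dir).glob("*.dll")):
        # copy2 also preserves timestamps; existing files with the same name are overwritten.
        shutil.copy2(dll, Path(dst_dir) / dll.name)
        copied.append(dll.name)
    return copied
```

With the install locations that appear in the logs further down, the call would look like copy_dlls(r"C:\LLVM\bin", r"C:\msys64\usr\bin").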

Put libiomp5md.lib in the third_party\mkl directory, and insert the following after line 75 of third_party\mkl\mkl.BUILD:

cc_import(
    name = "iomp5",
    interface_library = "libiomp5md.lib",
    system_provided = 1,
)

Then change the Windows build configuration below it to:

cc_library(
    name = "mkl_libs_windows",
    deps = [
        "iomp5"
    ],
    visibility = ["//visibility:public"],
)

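Make sure the directory containing libiomp5md.dll is listed in the system Path environment variable.

Change line 74 of third_party\llvm_openmp\BUILD to 0 so that MSVC is no longer forced: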

omp_vars_win = {
    "MSVC": 0,
}

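Without this step, the build fails with an error like the following (this log was copied from the 2.4.1 build, but 2.5.0 reports the same error):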

ERROR: D:/output_base/external/llvm_openmp/BUILD.bazel:176:10: C++ compilation of rule '@llvm_openmp//:libiomp5md.dll' failed (Exit 1): ml64.exe failed: error executing command
  cd D:/output_base/execroot/org_tensorflow
  SET CUDA_TOOLKIT_PATH=D:/CUDA
    SET INCLUDE=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\ATLMFC\include;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include;C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt;C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared;C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um;C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt;C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt
    SET LIB=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\ATLMFC\lib\x64;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\lib\x64;C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x64;C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\um\x64
    SET PATH=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\\Extensions\Microsoft\IntelliCode\CLI;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\bin\HostX64\x64;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\VC\VCPackages;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\TestWindow;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\TeamFoundation\Team Explorer;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Current\bin\Roslyn;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Team Tools\Performance Tools\x64;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Team Tools\Performance Tools;C:\Program Files (x86)\Microsoft Visual Studio\Shared\Common\VSPerfCollectionTools\vs2019\\x64;C:\Program Files (x86)\Microsoft Visual Studio\Shared\Common\VSPerfCollectionTools\vs2019\;C:\Program Files (x86)\Windows Kits\10\bin\10.0.18362.0\x64;C:\Program Files (x86)\Windows Kits\10\bin\x64;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\\MSBuild\Current\Bin;C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\Tools\;;C:\Windows\system32;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\VC\Linux\bin\ConnectionManagerExe
    SET PWD=/proc/self/cwd
    SET PYTHON_BIN_PATH=C:/Python39/python.exe
    SET PYTHON_LIB_PATH=C:/Python39/lib/site-packages
    SET RUNFILES_MANIFEST_ONLY=1
    SET TEMP=D:/tmp
    SET TF2_BEHAVIOR=1
    SET TF_CONFIGURE_IOS=0
    SET TF_CUDA_COMPUTE_CAPABILITIES=3.5,3.7,5.0,5.2,6.0,6.1,7.0,7.5,8.0,8.6
    SET TF_NEED_CUDA=1
    SET TMP=D:/tmp
  C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.27.29110/bin/HostX64/x64/ml64.exe -B external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py /nologo /DCOMPILER_MSVC /DNOMINMAX /D_WIN32_WINNT=0x0600 /D_CRT_SECURE_NO_DEPRECATE /D_CRT_SECURE_NO_WARNINGS /D_SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS /bigobj /Zm500 /J /Gy /GF /EHsc /wd4351 /wd4291 /wd4250 /wd4996 /Iexternal/llvm_openmp /Ibazel-out/x64_windows-opt/bin/external/llvm_openmp /Iexternal/bazel_tools /Ibazel-out/x64_windows-opt/bin/external/bazel_tools /Iexternal/llvm_openmp/runtime/src /Ibazel-out/x64_windows-opt/bin/external/llvm_openmp/runtime/src /Iexternal/llvm_openmp/include /Ibazel-out/x64_windows-opt/bin/external/llvm_openmp/include /showIncludes /MD /O2 /DNDEBUG /W0 /D_USE_MATH_DEFINES -DWIN32_LEAN_AND_MEAN -DNOGDI /experimental:preprocessor -DTHRUST_IGNORE_CUB_VERSION_CHECK /Domp_EXPORTS /D_M_AMD64 /DOMPT_SUPPORT=0 /D_WINDOWS /D_WINNT /D_USRDLL /Fobazel-out/x64_windows-opt/bin/external/llvm_openmp/_objs/libiomp5md.dll/z_Windows_NT-586_asm.obj /c bazel-out/x64_windows-opt/bin/external/llvm_openmp/z_Windows_NT-586_asm.S
Execution platform: @local_execution_config_platform//:platform
Microsoft (R) Macro Assembler (x64) Version 14.27.29111.0
Copyright (C) Microsoft Corporation.  All rights reserved.

MASM : warning A4018:invalid command-line option : -B
 Assembling: external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(1) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(2) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(3) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(4) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(5) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(6) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(7) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(8) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(9) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(10) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(11) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(12) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(13) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(14) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(15) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(16) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(17) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(18) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(19) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(20) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(21) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(22) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(23) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(24) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(25) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(26) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(27) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(28) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(29) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(30) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(31) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(32) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(33) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(34) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(35) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(36) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(37) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(38) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(39) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(40) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(41) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(42) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(43) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(44) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(45) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(46) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(47) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(48) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(49) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(50) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(51) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(52) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(53) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(54) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(55) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(56) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(57) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(58) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(59) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(60) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(61) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(62) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(63) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(64) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(65) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(66) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(67) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(68) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(69) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(70) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(71) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(72) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(73) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(74) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(75) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(76) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(77) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(78) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(79) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(80) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(81) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(82) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(83) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(84) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(85) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(86) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(87) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(88) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(89) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(90) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(91) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(92) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(93) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(94) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(95) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(96) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(97) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(98) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(99) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(100) : error A2044:invalid character in file
external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py(101) : fatal error A1012:error count exceeds 100; stopping assembly
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 229.166s, Critical Path: 30.31s
INFO: 966 processes: 239 internal, 727 local.
FAILED: Build did NOT complete successfully

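In addition, the DLL cannot be found at the final link stage. LLVM's libiomp5md.dll must be placed in the same directory as python.exe; otherwise the following error occurs: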

ERROR: D:/tensorflow-2.4.1/tensorflow/python/keras/api/BUILD:111:19: Executing genrule //tensorflow/python/keras/api:keras_python_api_gen failed (Exit 1): bash.exe failed: error executing command
  cd D:/output_base/execroot/org_tensorflow
  SET CUDA_TOOLKIT_PATH=D:/CUDA
    SET PATH=C:\msys64\usr\bin;C:\msys64\bin;C:\Windows;C:\Windows\System32;C:\Windows\System32\WindowsPowerShell\v1.0;C:\NASM;D:\;D:\CUDA\bin;D:\CUDA\libnvvp;C:\Python39\Scripts\;C:\Python39\;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files\dotnet\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files\NVIDIA Corporation\Nsight Compute 2020.1.2\;C:\Program Files\Microsoft VS Code\bin;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\Nsight Compute 2020.2.0\;C:\Program Files\CMake\bin;C:\LLVM\bin;C:\Git\cmd;C:\msys64\usr\bin;C:\Users\用户名\AppData\Local\Microsoft\WindowsApps;C:\Users\用户名\.dotnet\tools
    SET PYTHON_BIN_PATH=C:/Python39/python.exe
    SET PYTHON_LIB_PATH=C:/Python39/lib/site-packages
    SET RUNFILES_MANIFEST_ONLY=1
    SET TF2_BEHAVIOR=1
    SET TF_CONFIGURE_IOS=0
    SET TF_CUDA_COMPUTE_CAPABILITIES=3.5,3.7,5.0,5.2,6.0,6.1,7.0,7.5,8.0,8.6
    SET TF_NEED_CUDA=1
  C:/msys64/usr/bin/bash.exe bazel-out/x64_windows-opt/bin/tensorflow/python/keras/api/keras_python_api_gen.genrule_script.sh
Execution platform: @local_execution_config_platform//:platform
Traceback (most recent call last):
  File "\\?\D:\tmp\Bazel.runfiles_q5x32q4w\runfiles\org_tensorflow\tensorflow\python\pywrap_tensorflow.py", line 64, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: 找不到指定的模块。

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "\\?\D:\tmp\Bazel.runfiles_q5x32q4w\runfiles\org_tensorflow\tensorflow\python\tools\api\generator\create_python_api.py", line 26, in <module>
    from tensorflow.python.tools.api.generator import doc_srcs
  File "\\?\D:\tmp\Bazel.runfiles_q5x32q4w\runfiles\org_tensorflow\tensorflow\python\__init__.py", line 39, in <module>
    from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
  File "\\?\D:\tmp\Bazel.runfiles_q5x32q4w\runfiles\org_tensorflow\tensorflow\python\pywrap_tensorflow.py", line 83, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "\\?\D:\tmp\Bazel.runfiles_q5x32q4w\runfiles\org_tensorflow\tensorflow\python\pywrap_tensorflow.py", line 64, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: 找不到指定的模块。


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
ERROR: D:/tensorflow-2.4.1/tensorflow/tools/pip_package/BUILD:165:10 Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v2 failed (Exit 1): bash.exe failed: error executing command
  cd D:/output_base/execroot/org_tensorflow
  SET CUDA_TOOLKIT_PATH=D:/CUDA
    SET PATH=C:\msys64\usr\bin;C:\msys64\bin;C:\Windows;C:\Windows\System32;C:\Windows\System32\WindowsPowerShell\v1.0;C:\NASM;D:\;D:\CUDA\bin;D:\CUDA\libnvvp;C:\Python39\Scripts\;C:\Python39\;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files\dotnet\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files\NVIDIA Corporation\Nsight Compute 2020.1.2\;C:\Program Files\Microsoft VS Code\bin;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\Nsight Compute 2020.2.0\;C:\Program Files\CMake\bin;C:\LLVM\bin;C:\Git\cmd;C:\msys64\usr\bin;C:\Users\用户名\AppData\Local\Microsoft\WindowsApps;C:\Users\用户名\.dotnet\tools
    SET PYTHON_BIN_PATH=C:/Python39/python.exe
    SET PYTHON_LIB_PATH=C:/Python39/lib/site-packages
    SET RUNFILES_MANIFEST_ONLY=1
    SET TF2_BEHAVIOR=1
    SET TF_CONFIGURE_IOS=0
    SET TF_CUDA_COMPUTE_CAPABILITIES=3.5,3.7,5.0,5.2,6.0,6.1,7.0,7.5,8.0,8.6
    SET TF_NEED_CUDA=1
  C:/msys64/usr/bin/bash.exe bazel-out/x64_windows-opt/bin/tensorflow/python/keras/api/keras_python_api_gen_compat_v2.genrule_script.sh
Execution platform: @local_execution_config_platform//:platform
INFO: Elapsed time: 19969.544s, Critical Path: 4113.50s
INFO: 10487 processes: 127 internal, 10360 local.
FAILED: Build did NOT complete successfully

When Release is enabled

Modify line 559 of .bazelrc:

build:release_gpu_base --repo_env=TF_CUDA_COMPUTE_CAPABILITIES="sm_35,sm_50,sm_60,sm_70,sm_75,compute_80,compute_86"

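In other words, the only change is the added "compute_86".

Starting the build

If you need a proxy (bazel has to download many resources that are very slow over a direct connection), enter the following in the Command Prompt: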

git config --global http.proxy http://ip:port
git config --global https.proxy https://ip:port
set http_proxy=http://ip:port
set https_proxy=https://ip:port

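If drive C does not have enough space, you can point the temporary folder somewhere else (in my test, one build needed about 50 GB of temporary space):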

set TMP=D:/tmp
set TEMP=D:/tmp
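Since a build needs roughly 50 GB, a quick free-space check on the chosen drive can save a failed run several hours in. The sketch below uses shutil.disk_usage; the 50 GB threshold comes from the figure above, while the helper names are assumptions.

```python
import shutil

REQUIRED_GB = 50  # approximate space one build needs, as reported in the text


def free_gb(path):
    """Return the free space, in GiB, of the drive that holds path."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_room(path, required_gb=REQUIRED_GB):
    """Return True if the drive holding path has at least required_gb free."""
    return free_gb(path) >= required_gb


print("free here: %.1f GiB, enough: %s" % (free_gb("."), has_room(".")))
```

Replace "." with the temporary or output folder you intend to use, for example D:/tmp.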

Build the pip package:

bazel --output_user_root=D:/output_user_root --output_base=D:/output_base build --config=mkl --config=numa --config=monolithic --define=no_tensorflow_py_deps=true --//tensorflow/core/kernels/mlir_generated:enable_gpu=false --experimental_strict_action_env=false --config=release_gpu_windows //tensorflow/tools/pip_package:build_pip_package

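This single command performs the compilation and takes a very long time.

  1. The downloaded bazel cannot write to the user folder because of permission problems, so pointing it at a different folder (--output_user_root) is the more convenient option.
  2. It is also best to send the build output to a different folder (--output_base); a single build takes up about 50 GB.
  3. I previously said you could download the archive of the bazel folder from GitHub in advance and extract it to save time, that is, pass --distdir=D:/bazel. I have since found that the checksums of some packages may not match, so I recommend dropping this option and downloading online.
  4. --config=mkl enables MKL (which depends on libiomp5md.dll); leave it out to build without MKL.
  5. --experimental_strict_action_env=false makes the system environment variables available, so that the DLL can be found at link time.
  6. --config=release_gpu_windows enables the Release build. I did not enable it before and the resulting files were very large; I have not tested whether enabling it affects debugging.
  7. --//tensorflow/core/kernels/mlir_generated:enable_gpu=false turns off MLIR GPU compilation; otherwise the build fails with an error like the following (keywords: xxx_kernel_generator_kernel.o failed (Exit -1073741515): tf_to_kernel.exe failed):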
ERROR: D:/tensorflow-2.5.0/tensorflow/core/kernels/mlir_generated/BUILD:893:23: compile tensorflow/core/kernels/mlir_generated/neg_gpu_f16_f16_kernel_generator_kernel.o failed (Exit -1073741515): tf_to_kernel.exe failed: error executing command
  cd D:/output_base/execroot/org_tensorflow
bazel-out/x64_windows-opt/bin/tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel.exe --unroll_factors=4 --tile_sizes=256 --arch=compute_50,compute_52,compute_60,compute_61,compute_70,compute_75,compute_80,compute_86 --input=bazel-out/x64_windows-opt/bin/tensorflow/core/kernels/mlir_generated/neg_gpu_f16_f16.mlir --output=bazel-out/x64_windows-opt/bin/tensorflow/core/kernels/mlir_generated/neg_gpu_f16_f16_kernel_generator_kernel.o --enable_ftz=False --cpu_codegen=False
Execution platform: @local_execution_config_platform//:platform
Target //tensorflow/tools/pip_package:build_pip_package failed to build
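The negative exit code in the log above is easier to read as an unsigned 32-bit value. The small sketch below does that conversion; the function name to_ntstatus is an assumption. The result, 0xC0000135, is the Windows status code STATUS_DLL_NOT_FOUND, which suggests (this is my reading, not something the log states) that tf_to_kernel.exe could not load one of its DLLs.

```python
def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned hexadecimal status value."""
    return "0x{:08X}".format(exit_code & 0xFFFFFFFF)


print(to_ntstatus(-1073741515))  # prints 0xC0000135
```

The same conversion works for other large negative exit codes that bazel reports on Windows.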

Then build the package. This creates a tfpy folder under the source root and, on success, puts the whl package in it. The folder name is arbitrary:

bazel-bin\tensorflow\tools\pip_package\build_pip_package.exe tfpy

Building the C++ library

Run:

bazel --output_user_root=D:/output_user_root --output_base=D:/output_base build --config=mkl --config=numa --config=monolithic --//tensorflow/core/kernels/mlir_generated:enable_gpu=false --experimental_strict_action_env=false --config=release_gpu_windows //tensorflow:tensorflow_cc.dll
bazel --output_user_root=D:/output_user_root --output_base=D:/output_base build --config=mkl --config=numa --config=monolithic --//tensorflow/core/kernels/mlir_generated:enable_gpu=false --experimental_strict_action_env=false --config=release_gpu_windows //tensorflow:tensorflow_cc_dll_import_lib

This produces tensorflow_cc.dll and tensorflow_cc.lib in the bazel-bin\tensorflow directory.

Then run:

bazel --output_user_root=D:/output_user_root --output_base=D:/output_base build --config=mkl --config=numa --config=monolithic --//tensorflow/core/kernels/mlir_generated:enable_gpu=false --experimental_strict_action_env=false --config=release_gpu_windows //tensorflow:install_headers

This produces an include folder in the bazel-bin\tensorflow directory, containing the required header files.
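The three C++ commands above differ only in their final target, so the shared options can be kept in one place. The sketch below assembles the command lines as lists (suitable for subprocess.run) from exactly the flags used in this article; the names STARTUP, COMMON, CC_TARGETS and bazel_build are assumptions.

```python
# Startup options come before the "build" command, build options after it.
STARTUP = [
    "--output_user_root=D:/output_user_root",
    "--output_base=D:/output_base",
]
COMMON = [
    "--config=mkl",
    "--config=numa",
    "--config=monolithic",
    "--//tensorflow/core/kernels/mlir_generated:enable_gpu=false",
    "--experimental_strict_action_env=false",
    "--config=release_gpu_windows",
]
CC_TARGETS = [
    "//tensorflow:tensorflow_cc.dll",
    "//tensorflow:tensorflow_cc_dll_import_lib",
    "//tensorflow:install_headers",
]


def bazel_build(target, extra=()):
    """Return the bazel command line for one target as a list of arguments."""
    return ["bazel", *STARTUP, "build", *COMMON, *extra, target]


for target in CC_TARGETS:
    print(" ".join(bazel_build(target)))
```

Dropping "--config=mkl" from COMMON gives the non-MKL variant of all three commands at once.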

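Installation notes

Once the whl package is built, pip installation runs into all sorts of errors because the source pins its dependency versions very tightly (grpcio in particular: the version TensorFlow specifies cannot be compiled on Windows). So first install TensorFlow itself without its dependencies: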

pip install tensorflow-2.5.0-cp39-cp39-win_amd64.whl --no-deps

Then install the packages it depends on, for example:

pip install absl-py astunparse flatbuffers google_pasta grpcio h5py keras_preprocessing numpy opt_einsum protobuf six termcolor typing_extensions wheel wrapt gast tensorboard tensorflow_estimator

If anything is missing, a later import will say so clearly.
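Instead of waiting for an import error, the installed distributions can be checked up front. The sketch below looks up each name from the pip command above with importlib.metadata (available from Python 3.8); how name variants such as underscores and hyphens are matched is left to that module, and the helper name missing_distributions is an assumption.

```python
from importlib import metadata

# Distribution names taken from the pip command above.
DEPENDENCIES = (
    "absl-py astunparse flatbuffers google_pasta grpcio h5py keras_preprocessing "
    "numpy opt_einsum protobuf six termcolor typing_extensions wheel wrapt gast "
    "tensorboard tensorflow_estimator"
).split()


def missing_distributions(names):
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


print("not installed:", missing_distributions(DEPENDENCIES))
```

Any names it prints can be passed straight to pip install.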

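If MKL is enabled, you also need to place libomp.dll in the same directory as the system python.exe (even when using a virtual environment, it has to go into the system Python directory).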
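To find the system Python directory from inside a virtual environment, sys.base_prefix can be used: it points at the base installation, whereas sys.prefix points at the environment. The sketch below assumes the usual Windows layout, in which python.exe sits directly in that directory; the helper names are assumptions.

```python
import sys
from pathlib import Path


def system_python_dir():
    """Return the base installation directory, even when run inside a virtual environment."""
    return Path(sys.base_prefix)


def dll_present(name="libomp.dll", directory=None):
    """Return True if the named DLL exists in the given (or the system Python) directory."""
    directory = Path(directory) if directory is not None else system_python_dir()
    return (directory / name).is_file()


print(system_python_dir(), dll_present())
```

If the second value printed is False, copy libomp.dll into the directory printed first.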

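Getting the results (not tested yet; if you run into problems, leave a comment or send me a private message)

pip package: tensorflow-2.5.0-cp39-cp39-win_amd64.whl (394.34 MB)

pip package, compressed: tensorflow-2.5.0-cp39-cp39-win_amd64.rar (250.15 MB)

C++ library: tensorflow-cpp-2.5.0.rar (218.10 MB)

pip package (MKL): tensorflow-mkl-2.5.0-cp39-cp39-win_amd64.whl (393.64 MB)

pip package, compressed (MKL): tensorflow-mkl-2.5.0-cp39-cp39-win_amd64.rar (249.58 MB)

C++ library (MKL): tensorflow-cpp-mkl-2.5.0.rar (217.57 MB)

libomp.dll: libomp-tensorflow-2.5.0.rar (378.75 KB)

For the compressed pip packages, extract the archive, repack the contents as a ZIP, and change the extension to "whl"; the package then installs normally.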
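The repacking step can also be done with Python's zipfile module. This sketch assumes the RAR has already been extracted with a separate tool and that the extracted folder holds the wheel's top-level contents; the function name pack_wheel is an assumption. Write the output file outside the folder being packed so it does not end up inside itself.

```python
import zipfile
from pathlib import Path


def pack_wheel(extracted_dir, wheel_path):
    """Zip the contents of extracted_dir into wheel_path, keeping relative paths."""
    root = Path(extracted_dir)
    with zipfile.ZipFile(wheel_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for item in sorted(root.rglob("*")):
            if item.is_file():
                # Archive names are relative to the root and use forward slashes.
                archive.write(item, item.relative_to(root).as_posix())
    return wheel_path
```

The resulting file keeps the original wheel name, for example tensorflow-2.5.0-cp39-cp39-win_amd64.whl, and is installed with the pip command shown under the installation notes.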
