TensorFlow (libtensorflow) 1.15.5 Build Notes (1): Building on CentOS 7 x86 and Manually Patching Vulnerabilities

Background

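Our company's product is being deployed at a customer site and has to pass a security vulnerability scan. However, the libtensorflow JNI library it uses has many vulnerabilities, both in TensorFlow itself and in its third-party packages, so it must be upgraded, and no suitable prebuilt binaries are available.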

Part 1: Compiling libtensorflow 1.15.5

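TensorFlow 2.x and TensorFlow 1.x differ so much in their APIs that they are usually treated as two separate platforms. The static-graph models our company uses are still based on TensorFlow 1.x, so the upgrade can only target the latest 1.x release. Our production environment runs the CentOS 7 series, so the build environment should match it as closely as possible.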

Base image

TensorFlow is compiled inside Docker, so a Docker image has to be built first.
Compiling TensorFlow requires Python, so I use a centos7-python37 base image that I built myself. Its Dockerfile is:

# Base Image
FROM nvidia/cuda:10.2-cudnn7-devel-centos7

# Python build prerequisites
RUN yum -y install gcc make wget curl bzip2-devel expat-devel libffi-devel gdbm-devel xz-devel ncurses-devel readline-devel libdbi-devel sqlite-devel openssl-devel tk-devel uuid-devel xz zlib-devel \
    && yum -y clean all

# Build and install Python 3.7
COPY ./Python-3.7.12.tar.gz /root/build_temp/
RUN cd /root/build_temp \
    && tar -xvf Python-3.7.12.tar.gz \
    && cd Python-3.7.12 \
    && ./configure --prefix=/usr/local/ \
        --enable-loadable-sqlite-extensions \
        --enable-option-checking=fatal \
        --enable-shared \
        --with-system-expat \
        --with-system-ffi \
        --without-ensurepip \
    && make -j 8 \
        LDFLAGS="-Wl,--strip-all" \
        PROFILE_TASK='-m test.regrtest \
            test_array \
            test_base64 \
            test_binascii \
            test_binhex \
            test_binop \
            test_bytes \
            test_c_locale_coercion \
            test_class \
            test_cmath \
            test_codecs \
            test_compile \
            test_complex \
            test_csv \
            test_decimal \
            test_dict \
            test_float \
            test_fstring \
            test_hashlib \
            test_io \
            test_iter \
            test_json \
            test_long \
            test_math \
            test_memoryview \
            test_pickle \
            test_re \
            test_set \
            test_slice \
            test_struct \
            test_threading \
            test_time \
            test_traceback \
            test_unicode' \
    && make install \
    && ln -s /usr/local/bin/python3.7 /usr/local/bin/pyen \
    && echo "/usr/local/lib/" > /etc/ld.so.conf.d/pyen.conf \
    && ldconfig \
    && cd / && rm -rf /root/build_temp/*
ENV PYTHONIOENCODING=utf-8
# Install pip (the unversioned get-pip.py no longer supports Python 3.7, so use the 3.7-specific script)
RUN cd /root/build_temp \
    && wget -O get-pip.py https://bootstrap.pypa.io/pip/3.7/get-pip.py \
    && export PYTHONDONTWRITEBYTECODE=1 \
    && python3 get-pip.py --disable-pip-version-check --no-cache-dir --no-compile "pip==22.1" \
         -i https://pypi.tuna.tsinghua.edu.cn/simple \
    && cd / && rm -rf /root/build_temp && pip3 --version

Next, install the other basic tools the TensorFlow build needs. Bazel requires JDK 8, and building the JNI library requires a full JDK, so java-1.8.0-openjdk-devel is mandatory.

# Install JDK 1.8 and other build tools
RUN yum -y install zip unzip which swig java-1.8.0-openjdk.x86_64 java-1.8.0-openjdk-devel.x86_64 \
    && yum -y clean all

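Then git. Note that the git in the official CentOS 7 yum repository is too old and causes failures partway through the build, so git 2.x has to be installed from the Endpoint (endpoint-dev) repository.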

# Install git
RUN cd /root \
    && wget https://packages.endpointdev.com/endpoint-rpmsign-7.pub \
    && rpm --import endpoint-rpmsign-7.pub \
    && yum -y install https://packages.endpointdev.com/rhel/7/os/x86_64/endpoint-repo.x86_64.rpm \
    && yum -y install git \
    && yum -y clean all \
    && rm -f endpoint-rpmsign-7.pub

Finally, install the Python libraries the build uses:

RUN pip3 --no-cache-dir install numpy==1.18 keras_preprocessing --no-deps -i https://mirrors.aliyun.com/pypi/simple

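That completes the base image.
Some readers may ask: isn't Bazel still missing? This image is meant to build several different TensorFlow versions, each of which expects its own Bazel version, so it stops here for now.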

Building TensorFlow 1.15.5

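Next, create a container from this image. Be sure to use a clean DNS server; otherwise connections to GitHub time out, which is very troublesome.
The following steps follow the official guide, tensorflow/java/README.md.
Besides the official guide, the official Dockerfiles are also a useful reference. However, official versions are no longer tagged after 1.12, so the only options are the Dockerfiles under ci_build or the tensorflow/tensorflow:1.12.0-devel-gpu-py3 image on Docker Hub. The main steps are: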

  1. Download and install Bazel:

    TensorFlow 1.15 requires Bazel 0.24.1 or later; 0.26.1 is used here:

mkdir /bazel \
    && cd /bazel \
    && wget https://github.com/bazelbuild/bazel/releases/download/0.26.1/bazel-0.26.1-installer-linux-x86_64.sh \
    && chmod +x bazel-*.sh \
    && bash bazel-0.26.1-installer-linux-x86_64.sh \
    && cd / \
    && rm -f /bazel/bazel-0.26.1-installer-linux-x86_64.sh
  2. Clone the TensorFlow source with git:
git clone --branch=r1.15 --depth=1 https://github.com/tensorflow/tensorflow.git .

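If you can, configure a proxy for git to speed things up. git supports SOCKS5 proxies directly (replace HOST and PORT with your proxy's address):

git config --global http.proxy "socks5://HOST:PORT"
git config --global https.proxy "socks5://HOST:PORT"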
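  3. Configure the build
    Building the complete TensorFlow and its Python package requires gcc 7.3 or later, mainly because modules such as MLIR, introduced with 2.0, need full C++11 and C++14 support. The C library libtensorflow.so and the Java library libtensorflow_jni.so, however, can be built with gcc 4.8.5, the CentOS 7 default.
    Run ./configure: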

WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.26.1 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/local/bin/python3

Found possible Python library paths:
/usr/local/lib/python3.7/site-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.7/site-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: N
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: N
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: N
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: N
Clang will not be downloaded.

Do you wish to build TensorFlow with MPI support? [y/N]: N
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
-march=core2 -msse3

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=gdr # Build with GDR support.
--config=verbs # Build with libverbs support.
--config=ngraph # Build with Intel nGraph support.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v2 # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=noignite # Disable Apache Ignite support.
--config=nokafka # Disable Apache Kafka support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished

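The official Dockerfiles show that each configure option can also be supplied as an environment variable, for example (this excerpt enables CUDA):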

export TF_NEED_CUDA=1 
export TF_NEED_TENSORRT=0
export TF_CUDA_COMPUTE_CAPABILITIES=6.1,7.5
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_ROCM=0
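As a sketch of the same idea, the Python snippet below runs ./configure non-interactively by passing answers through the environment, mirroring the CPU-only session shown earlier. TF_NEED_OPENCL_SYCL, TF_NEED_ROCM and TF_NEED_CUDA appear in the excerpt above; the other variable names (PYTHON_BIN_PATH, PYTHON_LIB_PATH, TF_ENABLE_XLA, TF_DOWNLOAD_CLANG, TF_NEED_MPI, CC_OPT_FLAGS, TF_SET_ANDROID_WORKSPACE) are my assumption based on configure.py in the r1.15 branch and should be checked against that file before use.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

# Answers mirroring the interactive CPU-only ./configure session above.
# TF_NEED_OPENCL_SYCL, TF_NEED_ROCM and TF_NEED_CUDA come from the article;
# the remaining names are ASSUMPTIONS based on configure.py in r1.15.
CPU_ONLY_ANSWERS = {
    "PYTHON_BIN_PATH": "/usr/local/bin/python3",
    "PYTHON_LIB_PATH": "/usr/local/lib/python3.7/site-packages",
    "TF_ENABLE_XLA": "0",
    "TF_NEED_OPENCL_SYCL": "0",
    "TF_NEED_ROCM": "0",
    "TF_NEED_CUDA": "0",
    "TF_DOWNLOAD_CLANG": "0",
    "TF_NEED_MPI": "0",
    "CC_OPT_FLAGS": "-march=core2 -msse3",
    "TF_SET_ANDROID_WORKSPACE": "0",
}


def configure_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: the current environment) and overlay the answers."""
    env = dict(os.environ if base is None else base)
    env.update(CPU_ONLY_ANSWERS)
    return env


def run_configure(tf_src_dir: str) -> None:
    """Run ./configure in the source tree; raises CalledProcessError on failure."""
    subprocess.run(["./configure"], cwd=tf_src_dir, env=configure_env(), check=True)
```

For example, run_configure("/path/to/tensorflow") would run the script in a checkout at that (hypothetical) path.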

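One configure step, "Please specify optimization flags to use during compilation", asks for compiler optimization options.
The -march= values supported by gcc 4.8.5 differ from those of newer versions: some newer microarchitectures are not yet referred to by their architecture names. The main SIMD extensions enabled by each value are:
core2: SSE2, SSE3, SSSE3
corei7-avx: SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX
core-avx2: SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA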

To stay compatible with virtual machines created by OpenStack (whose virtual CPUs only expose SSE3), the configuration above uses -march=core2.
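To decide which -march value a given target machine can take, one option is to compare the flags in its /proc/cpuinfo with the table above. The sketch below assumes Linux flag names (SSE3 is reported as pni) and only checks the main SIMD extensions each option enables according to the GCC documentation, not every extension; it should be run on the production host rather than on the build machine. The function names are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

# gcc 4.8.5 -march values from the table above, most capable first, with the
# /proc/cpuinfo names of the main SIMD extensions each one assumes.
# Abridged: these options also enable other extensions (POPCNT, AES, BMI, ...).
MARCH_REQUIREMENTS: List[Tuple[str, Set[str]]] = [
    ("core-avx2", {"sse2", "pni", "ssse3", "sse4_1", "sse4_2", "avx", "avx2", "fma"}),
    ("corei7-avx", {"sse2", "pni", "ssse3", "sse4_1", "sse4_2", "avx"}),
    ("core2", {"sse2", "pni", "ssse3"}),  # "pni" is how Linux reports SSE3
]


def pick_march(cpu_flags: Iterable[str]) -> Optional[str]:
    """Return the most capable -march whose required flags are all present, else None."""
    flags = set(cpu_flags)
    for march, required in MARCH_REQUIREMENTS:
        if required <= flags:
            return march
    return None


def read_cpu_flags(path: str = "/proc/cpuinfo") -> Set[str]:
    """Read the flag list of the first CPU entry in /proc/cpuinfo (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

If pick_march(read_cpu_flags()) returns None, the CPU lacks at least one extension that -march=core2 assumes (for example SSSE3).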

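Finally, build libtensorflow (Java / Go / C++) with the commands below. Our workloads use neither AWS S3 nor Google Cloud Platform and run only on CPU, so --config=noaws, --config=nogcp and --config=nonccl switch off the unneeded features.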
Java

bazel build -c opt --config=opt --config=noaws --config=nogcp --config=nonccl //tensorflow/java:tensorflow //tensorflow/java:libtensorflow_jni

C++

bazel build -c opt --config=opt --config=noaws --config=nogcp --config=nonccl //tensorflow:tensorflow_cc

Go / C

bazel build -c opt --config=opt --config=noaws --config=nogcp --config=nonccl //tensorflow:libtensorflow.so
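The three build commands above differ only in their targets, so a small helper can assemble them. This is a sketch: the flavor keys ("java", "cc", "c") and the function names are hypothetical, while the flags and targets are the ones used above; ram_mb and jobs map to Bazel's --local_ram_resources and --jobs options for capping memory and parallelism.

```python
import subprocess
from typing import List, Optional

# Options shared by the three build commands above.
COMMON_FLAGS = ["-c", "opt", "--config=opt", "--config=noaws",
                "--config=nogcp", "--config=nonccl"]

# Bazel targets per library flavor (hypothetical keys, targets from the article).
TARGETS = {
    "java": ["//tensorflow/java:tensorflow", "//tensorflow/java:libtensorflow_jni"],
    "cc": ["//tensorflow:tensorflow_cc"],
    "c": ["//tensorflow:libtensorflow.so"],  # also what the Go binding links against
}


def bazel_build_cmd(flavor: str, ram_mb: Optional[int] = None,
                    jobs: Optional[int] = None) -> List[str]:
    """Assemble the `bazel build` argv, optionally capping RAM (in MB) and parallel jobs."""
    cmd = ["bazel", "build"] + COMMON_FLAGS
    if ram_mb is not None:
        cmd.append("--local_ram_resources=%d" % ram_mb)
    if jobs is not None:
        cmd.append("--jobs=%d" % jobs)
    return cmd + TARGETS[flavor]


def build(flavor: str, tf_src_dir: str, ram_mb: Optional[int] = None,
          jobs: Optional[int] = None) -> None:
    """Run the build in the TensorFlow source tree; raises on a non-zero exit."""
    subprocess.run(bazel_build_cmd(flavor, ram_mb, jobs), cwd=tf_src_dir, check=True)
```

For example, build("java", "/path/to/tensorflow", ram_mb=8192, jobs=8) would build the JNI library with the resource limits applied (the path is hypothetical).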
  • During the build, flags such as --local_ram_resources=8192 --jobs=8 can be used to limit Bazel's resource usage.

  • If building TensorFlow fails with the following error:

    fatbinary fatal : Unknown option '-bin2c-path'
    ....
    FAILED: Build did NOT complete successfully...
    

    Fix:

    In the TensorFlow source, open the build script third_party/nccl/build_defs.bzl.tpl and delete the line "--bin2c-path=%s" % bin2c.dirname,
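This edit is a one-line deletion, so it can be scripted when rebuilding from a fresh checkout. A minimal sketch, assuming the line appears in the template exactly as quoted above; the helper names are hypothetical.

```python
from pathlib import Path

# The argument line to delete from third_party/nccl/build_defs.bzl.tpl.
BIN2C_MARKER = '"--bin2c-path=%s" % bin2c.dirname,'


def drop_bin2c_path(text: str) -> str:
    """Return `text` without any line that contains the --bin2c-path argument."""
    kept = [line for line in text.splitlines(keepends=True) if BIN2C_MARKER not in line]
    return "".join(kept)


def patch_nccl_template(tf_src_dir: str) -> bool:
    """Edit the template in place; return True if a line was removed."""
    path = Path(tf_src_dir) / "third_party" / "nccl" / "build_defs.bzl.tpl"
    original = path.read_text(encoding="utf-8")
    patched = drop_bin2c_path(original)
    if patched != original:
        path.write_text(patched, encoding="utf-8")
    return patched != original
```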

Part 2: Updating third-party dependencies and manually patching vulnerabilities

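According to a security scanner's results, even TensorFlow built from the latest code on the 1.15 branch (1.15.5) still ships vulnerable versions of the third-party libraries zlib, libjpeg-turbo and ICU. Google no longer maintains the 1.x branch, so they have to be upgraded by hand. Because Bazel manages all dependencies and their builds, the configure scripts and similar files shipped with the original packages are not used; each dependency has to be reconfigured in the Bazel scripts one by one.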

Upgrading zlib

zlib 1.2.11 and earlier are affected by a high-severity vulnerability, CVE-2018-25032, so zlib must be upgraded to 1.2.12.

zlib 1.2.11 is declared in tensorflow/workspace.bzl. Change its zlib_archive entry in tensorflow/workspace.bzl to:

    tf_http_archive(
        name = "zlib_archive",
        build_file = clean_dep("//third_party:zlib.BUILD"),
        sha256 = "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9",
        strip_prefix = "zlib-1.2.12",
        system_build_file = clean_dep("//third_party/systemlibs:zlib.BUILD"),
        urls = [
            "https://storage.googleapis.com/mirror.tensorflow.org/zlib.net/zlib-1.2.12.tar.gz",
            "https://zlib.net/zlib-1.2.12.tar.gz",
        ],
    )
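Whenever a urls entry changes, the sha256 and strip_prefix attributes must be updated to match the new archive, or the download fails verification. A sketch of how to obtain both from a downloaded archive (the helper names are hypothetical): sha256_of streams the file through hashlib, and top_level_dirs lists the top-level directories inside a .tar.gz or .zip, which are the candidates for strip_prefix.

```python
import hashlib
import tarfile
import zipfile
from typing import Set


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file and return its hex SHA-256, the form the sha256 attribute uses."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def top_level_dirs(path: str) -> Set[str]:
    """Top-level names inside a .zip or (compressed) tar archive: strip_prefix candidates."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            names = zf.namelist()
    else:
        with tarfile.open(path) as tf:  # mode "r" auto-detects gzip/bz2/xz
            names = tf.getnames()
    return {name.split("/", 1)[0] for name in names if name}
```

For example, sha256_of("zlib-1.2.12.tar.gz") should equal the sha256 value in the entry above if the download is intact, and top_level_dirs on the same file should report zlib-1.2.12.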

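However, with only this change the build fails with ZLIB_VERNUM != PNG_ZLIB_VERNUM, so the zlib version that libpng expects must be updated to match.
Edit third_party/png.BUILD.
zlib encodes its version as a hexadecimal number with one hex digit per version component: in 0x12b0 the digits are 1 (major), 2 (minor), b (revision 11) and 0 (sub-revision). So change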

cmd = "sed -e 's/PNG_ZLIB_VERNUM 0/PNG_ZLIB_VERNUM 0x12b0/' $< >$@",

to

cmd = "sed -e 's/PNG_ZLIB_VERNUM 0/PNG_ZLIB_VERNUM 0x12c0/' $< >$@",
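The version arithmetic can be written out as a small Python sketch; zlib_vernum and png_vernum_sed are hypothetical helper names. Each version component is packed into one hex digit, so 1.2.11 becomes 0x12b0 and 1.2.12 becomes 0x12c0, and the second function reproduces the sed command from png.BUILD for a given version.

```python
def zlib_vernum(version: str) -> int:
    """Encode "major.minor[.revision[.sub]]" as ZLIB_VERNUM, one hex digit per component."""
    parts = [int(p) for p in version.split(".")]
    if not 2 <= len(parts) <= 4 or any(not 0 <= p <= 15 for p in parts):
        raise ValueError("each component must fit in one hex digit: %r" % version)
    parts += [0] * (4 - len(parts))  # pad missing components with zeros
    major, minor, revision, sub = parts
    return (major << 12) | (minor << 8) | (revision << 4) | sub


def png_vernum_sed(version: str) -> str:
    """Build the sed command used in third_party/png.BUILD for the given zlib version."""
    return ("sed -e 's/PNG_ZLIB_VERNUM 0/PNG_ZLIB_VERNUM 0x%04x/' $< >$@"
            % zlib_vernum(version))
```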

Upgrading libjpeg-turbo

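Edit third_party/jpeg/workspace.bzl. The original version is 2.0.0, and the upgrade should keep the first two version components unchanged (stay on 2.0.x). Fixing CVE-2020-17541 requires at least 2.0.4.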

def repo():
    third_party_http_archive(
        name = "jpeg",
        urls = [
            "https://storage.googleapis.com/mirror.tensorflow.org/github.com/libjpeg-turbo/libjpeg-turbo/archive/2.0.4.tar.gz",
            "https://github.com/libjpeg-turbo/libjpeg-turbo/archive/2.0.4.tar.gz",
        ],
        sha256 = "7777c3c19762940cff42b3ba4d7cd5c52d1671b39a79532050c85efb99079064",
        strip_prefix = "libjpeg-turbo-2.0.4",
        build_file = "//third_party/jpeg:BUILD.bazel",
        system_build_file = "//third_party/jpeg:BUILD.system",
    )

Upgrading ICU

ICU (International Components for Unicode) 66.1 and earlier are affected by CVE-2020-21913. TensorFlow's most recent official fix (in 2.5.3 and 2.6.3) upgraded ICU to 69.1, so I follow the official version.
Edit third_party/icu/workspace.bzl:

def repo():
    third_party_http_archive(
        name = "icu",
        strip_prefix = "icu-release-69-1",
        sha256 = "3144e17a612dda145aa0e4acb3caa27a5dae4e26edced64bc351c43d5004af53",
        urls = [
            "https://storage.googleapis.com/mirror.tensorflow.org/github.com/unicode-org/icu/archive/release-69-1.zip",
            "https://github.com/unicode-org/icu/archive/release-69-1.zip",
        ],
        build_file = "//third_party/icu:BUILD.bazel",
        system_build_file = "//third_party/icu:BUILD.system",
        patch_file = clean_dep("//third_party/icu:udata.patch"),
    )

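However, the new ICU code uses std::max_align_t, which the standard library of gcc 4.8.5 does not provide, so the build fails with "'max_align_t' is not a member of 'std'". std::max_align_t therefore has to be replaced with max_align_t; fortunately, only three files need changing.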
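The rewrite is purely textual, so instead of writing the hunks by hand one could apply it to an unpacked copy of icu-release-69-1 and regenerate the patch with diff -ru against a pristine copy. A sketch of that approach follows; the helper names and the workflow are assumptions, and the file list is the three files covered by the patch below. The files are read as UTF-8 because ICU sources contain non-ASCII copyright headers.

```python
import re
from pathlib import Path

# The three ICU 69.1 sources touched by the patch.
AFFECTED_FILES = [
    "icu4c/source/common/utext.cpp",
    "icu4c/source/common/uarrsort.cpp",
    "icu4c/source/tools/toolutil/toolutil.cpp",
]

# Match the qualified name only, not identifiers that merely end in "std".
_STD_MAX_ALIGN = re.compile(r"\bstd::max_align_t\b")


def drop_std_qualifier(source: str) -> str:
    """Replace std::max_align_t with the unqualified max_align_t."""
    return _STD_MAX_ALIGN.sub("max_align_t", source)


def rewrite_tree(icu_root: str) -> None:
    """Apply the rewrite in place to an unpacked copy of icu-release-69-1."""
    for rel in AFFECTED_FILES:
        path = Path(icu_root) / rel
        text = path.read_text(encoding="utf-8")
        path.write_text(drop_std_qualifier(text), encoding="utf-8")
```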

Append the following hunks to third_party/icu/udata.patch taken from TensorFlow 2.5.3:

diff -ru a/icu4c/source/common/utext.cpp b/icu4c/source/common/utext.cpp
--- a/icu4c/source/common/utext.cpp	2021-04-07 11:47:42.000000000 +0800
+++ b/icu4c/source/common/utext.cpp	2020-07-14 23:49:37.836668741 +0000
@@ -563,14 +563,14 @@
 
 //
 //  Extended form of a UText.  The purpose is to aid in computing the total size required
 //    when a provider asks for a UText to be allocated with extra storage.
 
 struct ExtendedUText {
     UText               ut;
-    std::max_align_t    extension;
+    max_align_t    extension;
 };
 
 static const UText emptyText = UTEXT_INITIALIZER;
 
 U_CAPI UText * U_EXPORT2
 utext_setup(UText *ut, int32_t extraSpace, UErrorCode *status) {
@@ -579,13 +579,13 @@
     }
 
     if (ut == NULL) {
         // We need to heap-allocate storage for the new UText
         int32_t spaceRequired = sizeof(UText);
         if (extraSpace > 0) {
-            spaceRequired = sizeof(ExtendedUText) + extraSpace - sizeof(std::max_align_t);
+            spaceRequired = sizeof(ExtendedUText) + extraSpace - sizeof(max_align_t);
         }
         ut = (UText *)uprv_malloc(spaceRequired);
         if (ut == NULL) {
             *status = U_MEMORY_ALLOCATION_ERROR;
             return NULL;
         } else {
diff -ru a/icu4c/source/common/uarrsort.cpp b/icu4c/source/common/uarrsort.cpp
--- a/icu4c/source/common/uarrsort.cpp	2021-04-07 11:47:42.000000000 +0800
+++ b/icu4c/source/common/uarrsort.cpp	2020-07-14 23:49:37.836668741 +0000
@@ -34,13 +34,13 @@
      */
     MIN_QSORT=9,
     STACK_ITEM_SIZE=200
 };
 
 static constexpr int32_t sizeInMaxAlignTs(int32_t sizeInBytes) {
-    return (sizeInBytes + sizeof(std::max_align_t) - 1) / sizeof(std::max_align_t);
+    return (sizeInBytes + sizeof(max_align_t) - 1) / sizeof(max_align_t);
 }
 
 /* UComparator convenience implementations ---------------------------------- */
 
 U_CAPI int32_t U_EXPORT2
 uprv_uint16Comparator(const void *context, const void *left, const void *right) {
@@ -138,13 +138,13 @@
 }
 
 static void
 insertionSort(char *array, int32_t length, int32_t itemSize,
               UComparator *cmp, const void *context, UErrorCode *pErrorCode) {
 
-    icu::MaybeStackArray v;
+    icu::MaybeStackArray v;
     if (sizeInMaxAlignTs(itemSize) > v.getCapacity() &&
             v.resize(sizeInMaxAlignTs(itemSize)) == nullptr) {
         *pErrorCode = U_MEMORY_ALLOCATION_ERROR;
         return;
     }
 
@@ -232,13 +232,13 @@
 }
 
 static void
 quickSort(char *array, int32_t length, int32_t itemSize,
             UComparator *cmp, const void *context, UErrorCode *pErrorCode) {
     /* allocate two intermediate item variables (x and w) */
-    icu::MaybeStackArray xw;
+    icu::MaybeStackArray xw;
     if(sizeInMaxAlignTs(itemSize)*2 > xw.getCapacity() &&
             xw.resize(sizeInMaxAlignTs(itemSize) * 2) == nullptr) {
         *pErrorCode=U_MEMORY_ALLOCATION_ERROR;
         return;
     }
 
diff -ru a/icu4c/source/tools/toolutil/toolutil.cpp b/icu4c/source/tools/toolutil/toolutil.cpp
--- a/icu4c/source/tools/toolutil/toolutil.cpp	2021-04-07 11:47:42.000000000 +0800
+++ b/icu4c/source/tools/toolutil/toolutil.cpp	2020-07-14 23:49:37.836668741 +0000
@@ -242,7 +242,7 @@
     char name[64];
     int32_t capacity, maxCapacity, size, idx;
     void *array;
-    alignas(std::max_align_t) char staticArray[1];
+    alignas(max_align_t) char staticArray[1];
 };
 
 U_CAPI UToolMemory * U_EXPORT2

Then use the resulting file to replace third_party/icu/udata.patch in the 1.15.5 source tree.

With this, following the official guide produces libtensorflow_framework.so and libtensorflow_jni.so.
