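Our company's product is being deployed at a client site and has to pass a security vulnerability scan. The libtensorflow JNI library it uses contains many vulnerabilities, both in TensorFlow itself and in its third-party packages, so it needs to be upgraded. However, no suitable prebuilt binaries are available.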
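Because the APIs of TensorFlow 2.x and TensorFlow 1.x differ so much, the two are usually treated as separate platforms. The static-graph models we use are still based on TensorFlow 1.x, so the upgrade can only target the latest TensorFlow 1.x release. Our production environment runs on the CentOS 7 series, so the build environment should match it as closely as possible.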
TensorFlow is compiled inside Docker, so the first step is to build a Docker image for it.
Compiling TensorFlow requires Python, so I started from centos7-python37, a base image I built myself. Its Dockerfile is:
# Base Image
FROM nvidia/cuda:10.2-cudnn7-devel-centos7
# Prerequisite libraries for building Python
RUN yum -y install gcc make wget curl bzip2-devel expat-devel libffi-devel gdbm-devel xz-devel ncurses-devel readline-devel libdbi-devel sqlite-devel openssl-devel tk-devel uuid-devel xz zlib-devel \
&& yum -y clean all
# Build and install Python 3.7
COPY ./Python-3.7.12.tar.gz /root/build_temp/
RUN cd /root/build_temp \
&& tar -xvf Python-3.7.12.tar.gz \
&& cd Python-3.7.12 \
&& ./configure --prefix=/usr/local/ \
--enable-loadable-sqlite-extensions \
--enable-option-checking=fatal \
--enable-shared \
--with-system-expat \
--with-system-ffi \
--without-ensurepip \
&& make -j 8 \
LDFLAGS="-Wl,--strip-all" \
PROFILE_TASK='-m test.regrtest \
test_array \
test_base64 \
test_binascii \
test_binhex \
test_binop \
test_bytes \
test_c_locale_coercion \
test_class \
test_cmath \
test_codecs \
test_compile \
test_complex \
test_csv \
test_decimal \
test_dict \
test_float \
test_fstring \
test_hashlib \
test_io \
test_iter \
test_json \
test_long \
test_math \
test_memoryview \
test_pickle \
test_re \
test_set \
test_slice \
test_struct \
test_threading \
test_time \
test_traceback \
test_unicode' \
&& make install \
&& ln -s /usr/local/bin/python3.7 /usr/local/bin/pyen \
&& echo "/usr/local/lib/" > /etc/ld.so.conf.d/pyen.conf \
&& ldconfig \
&& cd / && rm -rf /root/build_temp/*
ENV PYTHONIOENCODING=utf-8
# Install pip
RUN cd /root/build_temp \
&& wget -O get-pip.py https://bootstrap.pypa.io/get-pip.py \
&& export PYTHONDONTWRITEBYTECODE=1 \
&& python3 get-pip.py --disable-pip-version-check --no-cache-dir --no-compile "pip==22.1" \
-i https://pypi.tuna.tsinghua.edu.cn/simple \
&& cd / && rm -rf /root/build_temp && pip3 --version
Next, install the other basic tools the TensorFlow build needs. Bazel requires JDK 8, and building the JNI library requires a full JDK, so java-1.8.0-openjdk-devel is essential.
# Install JDK 1.8
RUN yum -y install zip unzip which swig java-1.8.0-openjdk.x86_64 java-1.8.0-openjdk-devel.x86_64 \
&& yum -y clean all
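Next comes git. Note that the git in the official CentOS 7 yum repositories is too old and causes failures partway through the build, so git 2.x has to be installed from the endpoint-dev repository instead.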
# Install git
RUN cd /root \
&& wget https://packages.endpointdev.com/endpoint-rpmsign-7.pub \
&& rpm --import endpoint-rpmsign-7.pub \
&& yum -y install https://packages.endpointdev.com/rhel/7/os/x86_64/endpoint-repo.x86_64.rpm \
&& yum -y install git \
&& yum -y clean all \
&& rm -f endpoint-rpmsign-7.pub
Finally, install the Python packages the build uses:
RUN pip3 --no-cache-dir install numpy==1.18 keras_preprocessing --no-deps -i https://mirrors.aliyun.com/pypi/simple
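That completes the base image.
You might ask: what about Bazel? Because this image has to support building several different TensorFlow versions, each of which needs its own Bazel version, I install only the tools above for now.
Next, create a container from this image. Be sure to give it a clean DNS server; otherwise connections to GitHub will time out, which is a real headache.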
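The remaining steps follow the official guide, tensorflow/java/README.md.
Besides the official guide, the official Dockerfiles are also worth consulting. However, the official images stopped being tagged after 1.12, so you have to look at the Dockerfiles under ci_build, or at the tensorflow/tensorflow:1.12.0-devel-gpu-py3 image on Docker Hub. The main steps are: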
Download and install Bazel.
TensorFlow 1.15 requires Bazel 0.24.1 or later; here I use 0.26.1:
mkdir /bazel \
&& cd /bazel \
&& wget https://github.com/bazelbuild/bazel/releases/download/0.26.1/bazel-0.26.1-installer-linux-x86_64.sh \
&& chmod +x bazel-*.sh \
&& bash bazel-0.26.1-installer-linux-x86_64.sh \
&& cd / \
&& rm -f /bazel/bazel-0.26.1-installer-linux-x86_64.sh
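Before going further it is worth confirming that the installed Bazel really meets that minimum. A minimal Python sketch, assuming `bazel version` prints a line of the form "Build label: X.Y.Z"; the function names are my own:

```python
"""Check that the installed Bazel meets the 0.24.1 minimum for TensorFlow 1.15."""
import re
import subprocess
from typing import Tuple

# Minimum Bazel version for TensorFlow 1.15, as stated above.
MIN_BAZEL = (0, 24, 1)


def bazel_version_output() -> str:
    """Run `bazel version` and return its stdout (requires bazel on PATH)."""
    result = subprocess.run(["bazel", "version"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


def parse_build_label(output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the 'Build label: X.Y.Z' line."""
    match = re.search(r"^Build label:\s*(\d+)\.(\d+)\.(\d+)", output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Build label' line in bazel version output")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def bazel_is_new_enough(output: str, minimum: Tuple[int, int, int] = MIN_BAZEL) -> bool:
    """True if the reported version is at least `minimum` (tuple comparison)."""
    return parse_build_label(output) >= minimum
```

Inside the container, `bazel_is_new_enough(bazel_version_output())` should return True for 0.26.1.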
Clone the r1.15 branch into the current directory:
git clone --branch=r1.15 --depth=1 https://github.com/tensorflow/tensorflow.git .
If you can, configure a proxy for git before cloning to speed things up; git supports SOCKS5 proxies directly:
git config --global http.proxy "socks5://<host>:<port>"
git config --global https.proxy "socks5://<host>:<port>"
Run the configuration script. My answer follows each prompt:
./configure
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.26.1 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/local/bin/python3
Found possible Python library paths:
/usr/local/lib/python3.7/site-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.7/site-packages]
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: N
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: N
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: N
No CUDA support will be enabled for TensorFlow.
Do you wish to download a fresh release of clang? (Experimental) [y/N]: N
Clang will not be downloaded.
Do you wish to build TensorFlow with MPI support? [y/N]: N
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: -march=core2 -msse3
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=gdr # Build with GDR support.
--config=verbs # Build with libverbs support.
--config=ngraph # Build with Intel nGraph support.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v2 # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=noignite # Disable Apache Ignite support.
--config=nokafka # Disable Apache Kafka support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
As the Dockerfiles show, each configuration option can also be supplied through an environment variable, for example:
export TF_NEED_CUDA=1
export TF_NEED_TENSORRT=0
export TF_CUDA_COMPUTE_CAPABILITIES=6.1,7.5
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_ROCM=0
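Combining such variables, the interactive session above can be replayed unattended. A minimal sketch, assuming that PYTHON_BIN_PATH, PYTHON_LIB_PATH, TF_ENABLE_XLA, TF_DOWNLOAD_CLANG, TF_NEED_MPI, CC_OPT_FLAGS and TF_SET_ANDROID_WORKSPACE (not listed above) are the names configure.py reads in r1.15; please verify them in your own checkout. A prompt without a matching variable may still wait for input.

```python
"""Replay TensorFlow 1.15's ./configure answers through environment variables."""
import os
import subprocess
from typing import Dict, Optional

# Answers matching the CPU-only session above. Names not shown in the text
# above are assumptions; verify them in configure.py before relying on them.
CPU_ONLY_ANSWERS = {
    "PYTHON_BIN_PATH": "/usr/local/bin/python3",
    "PYTHON_LIB_PATH": "/usr/local/lib/python3.7/site-packages",
    "TF_ENABLE_XLA": "0",
    "TF_NEED_OPENCL_SYCL": "0",
    "TF_NEED_ROCM": "0",
    "TF_NEED_CUDA": "0",
    "TF_DOWNLOAD_CLANG": "0",
    "TF_NEED_MPI": "0",
    "CC_OPT_FLAGS": "-march=core2 -msse3",
    "TF_SET_ANDROID_WORKSPACE": "0",
}


def configure_env(answers: Dict[str, str],
                  base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: the current environment) and layer the answers on top."""
    env = dict(os.environ if base is None else base)
    env.update(answers)
    return env


def run_configure(tf_root: str, answers: Dict[str, str] = CPU_ONLY_ANSWERS) -> None:
    """Run ./configure in the TensorFlow source root with the answers exported."""
    subprocess.run(["./configure"], cwd=tf_root, env=configure_env(answers), check=True)
```

Calling `run_configure("/path/to/tensorflow")` from the source root's parent is the intended use; the path is a placeholder.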
One configuration step, "Please specify optimization flags to use during compilation", asks for the compiler optimization flags.
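The -march= values accepted by gcc 4.8.5 differ from those in newer versions: some newer microarchitectures are not yet available under their own names (Haswell-class CPUs, for example, are targeted with core-avx2). The mapping to instruction sets is:
core2: SSE2, SSE3, SSSE3
corei7-avx: SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX
core-avx2: SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA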
For compatibility with virtual machines created by OpenStack (whose virtual CPUs only go up to SSE3), the configuration above uses -march=core2.
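To decide which of these values a target machine or VM can actually run, its /proc/cpuinfo flags can be compared against the table. A minimal sketch, assuming a Linux guest. Note that /proc/cpuinfo reports SSE3 as `pni`, and that GCC documents SSSE3 as part of core2 (and of the two later targets), so this check requires the `ssse3` flag before suggesting core2. Only the instruction sets in the table are checked; the function names are my own.

```python
"""Pick the most capable gcc 4.8.5 -march value that a (virtual) CPU supports."""
from typing import Iterable, Set

# Required /proc/cpuinfo flags per -march value, most capable first.
# SSE3 is reported as "pni"; core-avx2 also implies BMI/BMI2/F16C etc.,
# which are not checked here.
MARCH_REQUIREMENTS = [
    ("core-avx2", {"sse2", "pni", "ssse3", "sse4_1", "sse4_2", "avx", "avx2", "fma"}),
    ("corei7-avx", {"sse2", "pni", "ssse3", "sse4_1", "sse4_2", "avx"}),
    ("core2", {"sse2", "pni", "ssse3"}),
]


def read_cpu_flags(cpuinfo_path: str = "/proc/cpuinfo") -> Set[str]:
    """Return the flags listed on the first 'flags' line of cpuinfo."""
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


def pick_march(flags: Iterable[str]) -> str:
    """Return the first -march value whose required flags are all present."""
    available = set(flags)
    for march, required in MARCH_REQUIREMENTS:
        if required <= available:
            return march
    return "x86-64"  # GCC's generic 64-bit baseline (SSE2)
```

Running `pick_march(read_cpu_flags())` on each target VM before fixing CC_OPT_FLAGS shows whether core2 is safe there.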
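Finally, build libtensorflow (Java / Go / C++) with the commands below. Our workload uses neither AWS S3 nor Google Cloud Platform and runs only on CPU, so --config=noaws, --config=nogcp and --config=nonccl turn off the unneeded features.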
Java
bazel build -c opt --config=opt --config=noaws --config=nogcp --config=nonccl //tensorflow/java:tensorflow //tensorflow/java:libtensorflow_jni
C++
bazel build -c opt --config=opt --config=noaws --config=nogcp --config=nonccl //tensorflow:tensorflow_cc
go / c
bazel build -c opt --config=opt --config=noaws --config=nogcp --config=nonccl //tensorflow:libtensorflow.so
During the build, options such as --local_ram_resources=8192 --jobs=8 can be added to limit the resources Bazel uses.
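When scripting the build, the three commands above and the resource limits can be assembled in one place. A small sketch; the group keys "java", "cc" and "c" are my own labels, and defaulting --jobs to the visible CPU count is an assumption you may want to lower:

```python
"""Assemble the bazel build commands above, with resource limits added."""
import os
from typing import List, Optional

# Flags shared by the three build commands above.
COMMON_FLAGS = ["-c", "opt", "--config=opt", "--config=noaws",
                "--config=nogcp", "--config=nonccl"]

# Target groups from the commands above (keys are hypothetical labels).
TARGETS = {
    "java": ["//tensorflow/java:tensorflow", "//tensorflow/java:libtensorflow_jni"],
    "cc": ["//tensorflow:tensorflow_cc"],
    "c": ["//tensorflow:libtensorflow.so"],
}


def bazel_build_cmd(kind: str, ram_mb: int = 8192, jobs: Optional[int] = None) -> List[str]:
    """Return the argv for `bazel build` of one target group, capping RAM and parallelism."""
    if jobs is None:
        jobs = os.cpu_count() or 1  # one job per visible CPU
    limits = ["--local_ram_resources={}".format(ram_mb), "--jobs={}".format(jobs)]
    return ["bazel", "build"] + COMMON_FLAGS + limits + TARGETS[kind]
```

For example, `subprocess.run(bazel_build_cmd("java", jobs=8), cwd="/path/to/tensorflow", check=True)` reproduces the Java command with the limits mentioned above (the path is a placeholder).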
While building TensorFlow I hit this error:
fatbinary fatal : Unknown option '-bin2c-path'
....
FAILED: Build did NOT complete successfully...
Fix: in the TensorFlow source, delete the line
"--bin2c-path=%s" % bin2c.dirname,
from the build script third_party/nccl/build_defs.bzl.tpl.
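Because the container is rebuilt for every TensorFlow version, it can help to script this edit. A minimal sketch that removes every line containing the quoted argument, whatever its indentation; the function name is my own:

```python
"""Delete the bin2c-path argument line from the NCCL build template."""
from pathlib import Path

# The offending line quoted above, matched as a substring.
NEEDLE = '"--bin2c-path=%s" % bin2c.dirname,'


def drop_lines_containing(path: str, needle: str = NEEDLE) -> int:
    """Rewrite `path` without the lines containing `needle`; return how many were removed."""
    file = Path(path)
    lines = file.read_text().splitlines(keepends=True)
    kept = [line for line in lines if needle not in line]
    file.write_text("".join(kept))
    return len(lines) - len(kept)
```

Run `drop_lines_containing("third_party/nccl/build_defs.bzl.tpl")` from the source root; a return value of 0 means the line was already gone.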
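According to the results of a security scanning engine, even TensorFlow built from the latest code on the 1.15 branch (1.15.5) still contains vulnerable versions of the third-party libraries zlib, libjpeg-turbo and ICU. Google has stopped maintaining the 1.x branch, so these libraries must be upgraded by hand. Because all dependencies and the build itself are managed by Bazel, the packages' own configure scripts and the like have no effect; each dependency has to be updated in the Bazel scripts one by one.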
zlib 1.2.11 and all earlier versions are affected by the high-severity vulnerability CVE-2018-25032, so zlib must be upgraded to 1.2.12.
zlib 1.2.11 is declared in tensorflow/workspace.bzl; change its entry there to:
tf_http_archive(
    name = "zlib_archive",
    build_file = clean_dep("//third_party:zlib.BUILD"),
    sha256 = "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9",
    strip_prefix = "zlib-1.2.12",
    system_build_file = clean_dep("//third_party/systemlibs:zlib.BUILD"),
    urls = [
        "https://storage.googleapis.com/mirror.tensorflow.org/zlib.net/zlib-1.2.12.tar.gz",
        "https://zlib.net/zlib-1.2.12.tar.gz",
    ],
)
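Bazel rejects a downloaded archive whose checksum does not match `sha256`, so when switching versions it pays to hash the tarball yourself (`sha256sum` does the same from the shell). A minimal Python equivalent; the function names are my own:

```python
"""Compute and check the sha256 value that tf_http_archive expects."""
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large archives need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected: str) -> None:
    """Raise ValueError if the file's hash differs from the value in workspace.bzl."""
    actual = sha256_of(path)
    if actual != expected.lower():
        raise ValueError("{}: sha256 {} != expected {}".format(path, actual, expected))
```

For example, `verify_archive("zlib-1.2.12.tar.gz", "91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9")` after downloading the tarball into the current directory.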
However, with only this change the build fails with the error ZLIB_VERNUM != PNG_ZLIB_VERNUM, so the zlib version that libpng expects has to be updated to match.
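Edit third_party/png.BUILD. The zlib version number is written as a hexadecimal number with 4 bits per component: counting from the most significant end, the first three hex digits hold the version components 1, 2 and 11 (0xb), and the last digit is 0. For 1.2.12 the third digit becomes 0xc, so change
cmd = "sed -e 's/PNG_ZLIB_VERNUM 0/PNG_ZLIB_VERNUM 0x12b0/' $< >$@",
to
cmd = "sed -e 's/PNG_ZLIB_VERNUM 0/PNG_ZLIB_VERNUM 0x12c0/' $< >$@",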
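The packing described above can be written as a small helper, which is handy if zlib has to be bumped again. A sketch, assuming each version component fits in one hex digit (true for 1.2.11 and 1.2.12); the function names are my own:

```python
"""Compute zlib's ZLIB_VERNUM and the matching png.BUILD sed command."""


def zlib_vernum(version: str) -> int:
    """Pack "major.minor[.revision[.subrevision]]" into one hex digit per component."""
    parts = [int(p) for p in version.split(".")]
    if not 2 <= len(parts) <= 4 or any(not 0 <= p <= 0xF for p in parts):
        raise ValueError("unsupported zlib version: " + version)
    parts += [0] * (4 - len(parts))  # pad missing components with 0
    major, minor, revision, subrevision = parts
    return (major << 12) | (minor << 8) | (revision << 4) | subrevision


def png_vernum_sed(version: str) -> str:
    """The genrule cmd from third_party/png.BUILD with the version filled in."""
    return "sed -e 's/PNG_ZLIB_VERNUM 0/PNG_ZLIB_VERNUM {:#x}/' $< >$@".format(
        zlib_vernum(version))
```

`zlib_vernum("1.2.11")` gives 0x12b0 and `zlib_vernum("1.2.12")` gives 0x12c0, matching the two sed commands above.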
Edit third_party/jpeg/workspace.bzl. The original version is 2.0.0, and the upgrade should keep the first two version components unchanged; fixing CVE-2020-17541 requires at least 2.0.4.
def repo():
    third_party_http_archive(
        name = "jpeg",
        urls = [
            "https://storage.googleapis.com/mirror.tensorflow.org/github.com/libjpeg-turbo/libjpeg-turbo/archive/2.0.4.tar.gz",
            "https://github.com/libjpeg-turbo/libjpeg-turbo/archive/2.0.4.tar.gz",
        ],
        sha256 = "7777c3c19762940cff42b3ba4d7cd5c52d1671b39a79532050c85efb99079064",
        strip_prefix = "libjpeg-turbo-2.0.4",
        build_file = "//third_party/jpeg:BUILD.bazel",
        system_build_file = "//third_party/jpeg:BUILD.system",
    )
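The selection rule just described (stay on the 2.0.x series, but no older than 2.0.4) is simple to encode, which helps when repeating the exercise for a later CVE. A sketch; the defaults mirror the versions named above and the function names are my own:

```python
"""Check a libjpeg-turbo upgrade: same major.minor as today, at least the fixed release."""
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "2.0.4" into (2, 0, 4) so versions compare numerically."""
    return tuple(int(p) for p in version.split("."))


def acceptable_jpeg_upgrade(candidate: str, current: str = "2.0.0",
                            minimum: str = "2.0.4") -> bool:
    """True if `candidate` keeps the current major.minor and is at least `minimum`."""
    cand = parse_version(candidate)
    same_series = cand[:2] == parse_version(current)[:2]
    return same_series and cand >= parse_version(minimum)
```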
ICU (International Components for Unicode) 66.1 and earlier are affected by CVE-2020-21913. The latest official TensorFlow fixes (2.5.3 and 2.6.3) upgraded ICU to 69.1, so I follow the official choice.
Edit third_party/icu/workspace.bzl:
def repo():
    third_party_http_archive(
        name = "icu",
        strip_prefix = "icu-release-69-1",
        sha256 = "3144e17a612dda145aa0e4acb3caa27a5dae4e26edced64bc351c43d5004af53",
        urls = [
            "https://storage.googleapis.com/mirror.tensorflow.org/github.com/unicode-org/icu/archive/release-69-1.zip",
            "https://github.com/unicode-org/icu/archive/release-69-1.zip",
        ],
        build_file = "//third_party/icu:BUILD.bazel",
        system_build_file = "//third_party/icu:BUILD.system",
        patch_file = clean_dep("//third_party/icu:udata.patch"),
    )
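However, the newer ICU code uses std::max_align_t, which gcc 4.8.5 does not provide, so compilation fails with the error 'max_align_t' is not a member of 'std'. The fix is to change std::max_align_t to max_align_t; fortunately, only three files need changing.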
Add the following to third_party/icu/udata.patch from TensorFlow 2.5.3:
diff -ru a/icu4c/source/common/utext.cpp b/icu4c/source/common/utext.cpp
--- a/icu4c/source/common/utext.cpp 2021-04-07 11:47:42.000000000 +0800
+++ b/icu4c/source/common/utext.cpp 2020-07-14 23:49:37.836668741 +0000
@@ -563,14 +563,14 @@
//
// Extended form of a UText. The purpose is to aid in computing the total size required
// when a provider asks for a UText to be allocated with extra storage.
struct ExtendedUText {
UText ut;
- std::max_align_t extension;
+ max_align_t extension;
};
static const UText emptyText = UTEXT_INITIALIZER;
U_CAPI UText * U_EXPORT2
utext_setup(UText *ut, int32_t extraSpace, UErrorCode *status) {
@@ -579,13 +579,13 @@
}
if (ut == NULL) {
// We need to heap-allocate storage for the new UText
int32_t spaceRequired = sizeof(UText);
if (extraSpace > 0) {
- spaceRequired = sizeof(ExtendedUText) + extraSpace - sizeof(std::max_align_t);
+ spaceRequired = sizeof(ExtendedUText) + extraSpace - sizeof(max_align_t);
}
ut = (UText *)uprv_malloc(spaceRequired);
if (ut == NULL) {
*status = U_MEMORY_ALLOCATION_ERROR;
return NULL;
} else {
diff -ru a/icu4c/source/common/uarrsort.cpp b/icu4c/source/common/uarrsort.cpp
--- a/icu4c/source/common/uarrsort.cpp 2021-04-07 11:47:42.000000000 +0800
+++ b/icu4c/source/common/uarrsort.cpp 2020-07-14 23:49:37.836668741 +0000
@@ -34,13 +34,13 @@
*/
MIN_QSORT=9,
STACK_ITEM_SIZE=200
};
static constexpr int32_t sizeInMaxAlignTs(int32_t sizeInBytes) {
- return (sizeInBytes + sizeof(std::max_align_t) - 1) / sizeof(std::max_align_t);
+ return (sizeInBytes + sizeof(max_align_t) - 1) / sizeof(max_align_t);
}
/* UComparator convenience implementations ---------------------------------- */
U_CAPI int32_t U_EXPORT2
uprv_uint16Comparator(const void *context, const void *left, const void *right) {
@@ -138,13 +138,13 @@
}
static void
insertionSort(char *array, int32_t length, int32_t itemSize,
UComparator *cmp, const void *context, UErrorCode *pErrorCode) {
- icu::MaybeStackArray v;
+ icu::MaybeStackArray v;
if (sizeInMaxAlignTs(itemSize) > v.getCapacity() &&
v.resize(sizeInMaxAlignTs(itemSize)) == nullptr) {
*pErrorCode = U_MEMORY_ALLOCATION_ERROR;
return;
}
@@ -232,13 +232,13 @@
}
static void
quickSort(char *array, int32_t length, int32_t itemSize,
UComparator *cmp, const void *context, UErrorCode *pErrorCode) {
/* allocate two intermediate item variables (x and w) */
- icu::MaybeStackArray xw;
+ icu::MaybeStackArray xw;
if(sizeInMaxAlignTs(itemSize)*2 > xw.getCapacity() &&
xw.resize(sizeInMaxAlignTs(itemSize) * 2) == nullptr) {
*pErrorCode=U_MEMORY_ALLOCATION_ERROR;
return;
}
diff -ru a/icu4c/source/tools/toolutil/toolutil.cpp b/icu4c/source/tools/toolutil/toolutil.cpp
--- a/icu4c/source/tools/toolutil/toolutil.cpp 2021-04-07 11:47:42.000000000 +0800
+++ b/icu4c/source/tools/toolutil/toolutil.cpp 2020-07-14 23:49:37.836668741 +0000
@@ -242,7 +242,7 @@
char name[64];
int32_t capacity, maxCapacity, size, idx;
void *array;
- alignas(std::max_align_t) char staticArray[1];
+ alignas(max_align_t) char staticArray[1];
};
U_CAPI UToolMemory * U_EXPORT2
Then use the resulting file to replace third_party/icu/udata.patch in 1.15.5.
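For the patch to apply, the hunks above must match the ICU sources exactly, context lines and whitespace included. If they need regenerating (for example because whitespace got lost while copying), they can be produced from an extracted icu-release-69-1 tree with difflib. A sketch, assuming the three file paths from the patch headers and a plain text substitution; the function names are my own, and the output has no `diff -ru` lines, which patch does not need:

```python
"""Regenerate the std::max_align_t -> max_align_t hunks for ICU 69.1."""
import difflib
from pathlib import Path
from typing import List

OLD, NEW = "std::max_align_t", "max_align_t"

# The three files patched above, relative to the extracted ICU root.
ICU_FILES = [
    "icu4c/source/common/utext.cpp",
    "icu4c/source/common/uarrsort.cpp",
    "icu4c/source/tools/toolutil/toolutil.cpp",
]


def max_align_t_diff(icu_root: str, rel_path: str) -> str:
    """Unified diff (a/ and b/ prefixes) for one file; empty if nothing changes."""
    original = Path(icu_root, rel_path).read_text(encoding="utf-8").splitlines(keepends=True)
    patched = [line.replace(OLD, NEW) for line in original]
    return "".join(difflib.unified_diff(
        original, patched, fromfile="a/" + rel_path, tofile="b/" + rel_path, n=3))


def build_patch(icu_root: str, files: List[str] = ICU_FILES) -> str:
    """Concatenate the hunks for all files, ready to append to udata.patch."""
    return "".join(max_align_t_diff(icu_root, rel) for rel in files)
```

Appending `build_patch("icu-release-69-1")` to the 2.5.3 udata.patch yields hunks with the same a/ and b/ prefixes as those above (the directory name is the archive's strip_prefix).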
With that, following the official guide, you get libtensorflow_framework.so and libtensorflow_jni.so.