Notes on compile and link errors and how I fixed them


Problem description

Link errors when linking PCRE.lib under VS2005

1>Linking...
1>RERewrite.obj : error LNK2019: unresolved external symbol __imp__pcre_free referenced in function _TA_RE_Init
1>RERewrite.obj : error LNK2019: unresolved external symbol __imp__pcre_exec referenced in function _TA_RE_Init
1>RERewrite.obj : error LNK2019: unresolved external symbol __imp__pcre_compile referenced in function _TA_RE_Init

---------------
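Analysis and solution

The __imp__ prefix on the unresolved symbols shows that pcre.h declared these functions __declspec(dllimport). In other words, the code was compiled for the DLL version of PCRE, but the project links against the static PCRE.lib. The comment in PCRE's demo program (pcredemo.c) explains this:

/* If you want to statically link this program against a non-dll .a file, you must
define PCRE_STATIC before including pcre.h, otherwise the pcre_malloc() and
pcre_free() exported functions will be declared __declspec(dllimport), with
unwanted results. So in this environment, uncomment the following line. */

//#define PCRE_STATIC

When linking the static library, define PCRE_STATIC before including pcre.h, or add it to the project's preprocessor definitions.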


Problem description

pcre_free crashes when debugging the PCRE package under VS2005:

Unhandled exception at ...... Access violation reading location 0x00000000.

---------------
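Analysis and solution
Comment out the following macro definition in pcre.h. VPCOMPAT is a compatibility switch for Virtual Pascal. When it is defined, pcre.h declares pcre_malloc and pcre_free as ordinary functions instead of the usual function pointers.

#define VPCOMPAT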


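Problem description

On CentOS-7, building and installing pcre (downloaded directly from the official FTP site https://ftp.pcre.org/pub/pcre) fails with the following error: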

autoreconf: 'configure.ac' or 'configure.in' is required

---------------
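Analysis and solution
The versions of aclocal, m4 and related tools may not match the versions used to generate the shipped configuration files, so the configuration files need to be regenerated. Run the following command in the unpacked pcre source directory. Note that autoreconf looks for configure.ac in the current directory, so this error also appears when it is run from anywhere else.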

autoreconf -f -i

Problem description

Building TensorFlow C++ into libtensorflow_cc.a on Ubuntu 16.04.5 LTS

Issue 1:

dirname: missing operand
Try 'dirname --help' for more information.
cat: /proto_text_cc_files.txt: No such file or directory
cat: /proto_text_pb_cc_files.txt: No such file or directory
cat: /proto_text_pb_h_files.txt: No such file or directory
mkdir: cannot create directory ‘/gen’: Permission denied
cat: /tf_op_files.txt: No such file or directory
cat: /tf_pb_text_files.txt: No such file or directory
cat: /tf_proto_files.txt: No such file or directory

---------------
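Analysis and solution

The make version is too old.
The apt mirror I chose had probably not been maintained for a long time, so apt-get install make=4.2 did not upgrade it.
Instead I downloaded the latest release from http://ftp.gnu.org/gnu/make/ and installed it myself, which fixed the problem.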
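To catch this before starting a long build, a small pre-flight check can parse `make --version`. This is a minimal sketch and is not part of TensorFlow. `parse_make_version` and `make_is_new_enough` are hypothetical helper names. The (4, 2) minimum is an assumption taken from the apt-get attempt above, not a documented requirement.

```python
import re
import shutil
import subprocess


def parse_make_version(text):
    """Return (major, minor) from `make --version` output such as 'GNU Make 4.1'."""
    match = re.match(r"GNU Make (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("not GNU make output: %r" % text[:40])
    return int(match.group(1)), int(match.group(2))


def make_is_new_enough(minimum=(4, 2)):
    """True if the GNU make on PATH is at least `minimum` (an assumed threshold)."""
    make = shutil.which("make")
    if make is None:
        return False
    out = subprocess.run([make, "--version"], stdout=subprocess.PIPE,
                         universal_newlines=True).stdout
    # Tuples compare element by element, so (4, 1) < (4, 2).
    return parse_make_version(out) >= minimum
```

For example, a build script could call `make_is_new_enough()` first and stop with a message instead of failing halfway through with the confusing `dirname: missing operand` output.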

Issue 2:

downloading https://bitbucket.org/eigen/eigen/get/fd6845384b86.tar.gz
tensorflow/contrib/makefile/download_dependencies.sh: line 65: curl: command not found

---------------
Analysis and solution

download_dependencies.sh downloads the dependencies with curl, so install it:

sudo apt-get install curl

Issue 3:

In file included from tensorflow/core/lib/io/zlib_outputbuffer.cc:16:0:
./tensorflow/core/lib/io/zlib_outputbuffer.h:19:18: fatal error: zlib.h: No such file or directory

---------------
Analysis and solution

zlib.h comes from the zlib development package:

sudo apt-get install zlib1g-dev

Issue 4:

Problems a colleague ran into: cmake version too old; missing dependencies (fixed with sudo apt-get install autoconf autogen libtool); no write permission for some files.
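Issues 2 to 4 all come down to missing tools. Below is a minimal sketch of a pre-flight check. It assumes the command-to-package mapping shown, which is taken from the apt-get commands in this section, and it uses libtoolize as the probe for the libtool package. It only checks commands on PATH. Header-only packages such as zlib1g-dev and the cmake version are not covered. `missing_packages` and `REQUIRED_COMMANDS` are hypothetical names.

```python
import shutil

# Commands this build needed, mapped to the Ubuntu package providing each one.
# libtoolize is used as the probe for the libtool package.
REQUIRED_COMMANDS = {
    "curl": "curl",
    "autoconf": "autoconf",
    "autogen": "autogen",
    "libtoolize": "libtool",
}


def missing_packages(commands=REQUIRED_COMMANDS, which=shutil.which):
    """Return the sorted apt package names whose command is not on PATH.

    `which` is injectable so the function can be tested without touching PATH.
    """
    return sorted({pkg for cmd, pkg in commands.items() if which(cmd) is None})


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("sudo apt-get install " + " ".join(missing))
    else:
        print("all required commands found")
```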


Pitfalls when building TensorFlow

Environment and versions
OS: CentOS 7
CUDA: 10.0
NCCL: 2.5.6
cuDNN: 7.6
python: 3.6
bazel: 0.19.2
TensorFlow: 1.13.1

---------------
Problem description

Starting local Bazel server and connecting to it...
WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
ERROR: /home/storage12/tanxingjun/github/tensorflow-1.13.1/tensorflow/core/nccl/BUILD:19:1: no such package '@local_config_nccl//': Traceback (most recent call last):
        File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/nccl/nccl_configure.bzl", line 167
                _check_nccl_version(repository_ctx, nccl_install_path, n..., ...)
        File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/nccl/nccl_configure.bzl", line 96, in _check_nccl_version
                _find_nccl_header(repository_ctx, nccl_install_path)
        File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/nccl/nccl_configure.bzl", line 79, in _find_nccl_header
                auto_configure_fail(("Cannot find %s" % str(header_p...)))
        File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/gpus/cuda_configure.bzl", line 342, in auto_configure_fail
                fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: Cannot find /lib64/include/nccl.h
 and referenced by '//tensorflow/core/nccl:nccl_lib'
ERROR: Analysis of target '//tensorflow:libtensorflow_cc.so' failed; build aborted: no such package '@local_config_nccl//': Traceback (most recent call last):
        File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/nccl/nccl_configure.bzl", line 167
                _check_nccl_version(repository_ctx, nccl_install_path, n..., ...)
        File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/nccl/nccl_configure.bzl", line 96, in _check_nccl_version
                _find_nccl_header(repository_ctx, nccl_install_path)
        File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/nccl/nccl_configure.bzl", line 79, in _find_nccl_header
                auto_configure_fail(("Cannot find %s" % str(header_p...)))
        File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/gpus/cuda_configure.bzl", line 342, in auto_configure_fail
                fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: Cannot find /lib64/include/nccl.h

---------------
Analysis and solution

tensorflow/configure.py contains the following code:

# First check to see if NCCL is in the ldconfig.
# If its found, use that location.
if is_linux():
  ldconfig_bin = which('ldconfig') or '/sbin/ldconfig'
  nccl2_path_from_ldconfig = run_shell([ldconfig_bin, '-p'])
  nccl2_path_from_ldconfig = re.search('.*libnccl.so .* => (.*)',
                                       nccl2_path_from_ldconfig)

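In other words, the environment variables NCCL_INSTALL_PATH and NCCL_HDR_PATH that the NCCL build depends on are not taken from the user; they are set by this script. Before finding the cause I had exported them by hand, and it had no effect. So I ran the command on its own: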

$ /sbin/ldconfig -p | grep nccl
        libnccl.so.2 (libc6,x86-64) => /lib64/libnccl.so.2
        libnccl.so (libc6,x86-64) => /lib64/libnccl.so

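That is the problem: libnccl.so is installed in /usr/lib64, but ldconfig reports it under /lib64. It turned out the administrator had created the following symlinks in /: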

$ ls -lh /
lrwxrwxrwx   1 root root    7 Nov 24  2018 lib -> usr/lib
lrwxrwxrwx   1 root root    9 Nov 24  2018 lib64 -> usr/lib64

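As a result, libnccl.so is found, but the header is then looked up relative to /lib64 (hence the /lib64/include/nccl.h path in the error). nccl.h does not exist there, so the build fails.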
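The lookup can be reproduced outside configure.py. The sketch below applies the same regular expression quoted above to the `ldconfig -p` output from this machine. `nccl_lib_from_ldconfig` is a hypothetical helper name. The `fixed` sample (an /usr/lib64 entry) is an assumed example of what ldconfig reports after the fix below, not captured output.

```python
import os
import re


def nccl_lib_from_ldconfig(ldconfig_output):
    """Return the libnccl.so path found by configure.py's regex, or None."""
    # Same pattern as tensorflow/configure.py. It requires a space right after
    # "libnccl.so", so the libnccl.so.2 line is skipped.
    match = re.search('.*libnccl.so .* => (.*)', ldconfig_output)
    return match.group(1) if match else None


# `ldconfig -p | grep nccl` output from this machine (see above).
broken = (
    "        libnccl.so.2 (libc6,x86-64) => /lib64/libnccl.so.2\n"
    "        libnccl.so (libc6,x86-64) => /lib64/libnccl.so\n"
)
# Assumed output once /usr/lib64 is listed in /etc/ld.so.conf.d.
fixed = "        libnccl.so (libc6,x86-64) => /usr/lib64/libnccl.so\n"

for label, output in (("before", broken), ("after", fixed)):
    lib = nccl_lib_from_ldconfig(output)
    print(label, lib, os.path.dirname(lib))
```

With the broken output, the library directory comes out as /lib64, which is the symlinked path rather than the real /usr/lib64.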

Solution:
Add a file named nccl-2.5.6.conf to the /etc/ld.so.conf.d directory. It only needs one line:

/usr/lib64

Then run /sbin/ldconfig to rebuild the cache.

---------------
Problem description
This error occurred while building the TF static library:

ar: creating tensorflow/contrib/makefile/gen/lib/libtensorflow-core.a
tensorflow/contrib/makefile/gen/lib/libtensorflow-core.a(spectrogram.o): In function `tensorflow::Spectrogram::ProcessCoreFFT()':
spectrogram.cc:(.text+0xd8): undefined reference to `rdft'
collect2: error: ld returned 1 exit status
make[3]: *** [tensorflow/contrib/makefile/gen/bin/benchmark] Error 1
make[2]: *** [tensorflow-stamp/tensorflow_static-configure] Error 2
make[1]: *** [CMakeFiles/tensorflow_static.dir/all] Error 2
make: *** [all] Error 2

---------------
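Analysis and solution

rdft is a function from Ooura's FFT package (fft2d), which this build does not download or compile. The fix follows https://github.com/tensorflow/tensorflow/commit/f1f1d5172fe5bfeaeb2cf657ffc43ba744187bee. That commit is for TensorFlow Lite, so the paths need to be adjusted.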

Step 01.
Edit ~/tensorflow/tensorflow/contrib/makefile/download_dependencies.sh.

Around line 55:

# fix the error: spectrogram.cc:(.text+0xd8): undefined reference to `rdft'
FFT2D_URL="https://mirror.bazel.build/www.kurims.kyoto-u.ac.jp/~ooura/fft.tgz"

Around line 140:

download_and_extract "${FFT2D_URL}" "${DOWNLOADS_DIR}/fft2d"

Step 02.
Edit ~/tensorflow/tensorflow/contrib/makefile/Makefile, around line 718:

# Add in any extra files that don't fit the patterns easily
TF_CC_SRCS += tensorflow/contrib/makefile/downloads/fft2d/fftsg.

---------------
Problem description

Runtime error:

ERROR: Creating graph in session failed...Not found: Op type not registered 'BlockLSTM' in binary running on . Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
Segmentation fault


---------------
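Analysis and solution

Ops defined under tensorflow/contrib are not linked into libtensorflow_cc.so by default. Add the corresponding kernels to the deps of libtensorflow_cc.so in tensorflow/BUILD: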

tf_cc_shared_object(
    name = "libtensorflow_cc.so",
    linkopts = select({
        "//tensorflow:darwin": [
            "-Wl,-exported_symbols_list",  # This line must be directly followed by the exported_symbols.lds file
            "$(location //tensorflow:tf_exported_symbols.lds)",
        ],
        "//tensorflow:windows": [],
        "//conditions:default": [
            "-z defs",
            "-Wl,--version-script",  #  This line must be directly followed by the version_script.lds file
            "$(location //tensorflow:tf_version_script.lds)",
        ],
    }),
    visibility = ["//visibility:public"],
    deps = [
        "//tensorflow:tf_exported_symbols.lds",
        "//tensorflow:tf_version_script.lds",
        "//tensorflow/c:c_api",
        "//tensorflow/c/eager:c_api",
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/cc:scope",
        "//tensorflow/cc/profiler",
        "//tensorflow/core:tensorflow",
        "//tensorflow/contrib/rnn:lstm_ops_kernels", # added: LSTM ops (BlockLSTM)
        "//tensorflow/contrib/rnn:gru_ops_kernels",  # added: GRU ops
    ] + if_ngraph(["@ngraph_tf//:ngraph_tf"]),
)

---------------
Problem description

This error appeared when building tensorflow-1.15:

ls/build_defs/repo/git.bzl:252:18):
 - /tmp/cache_bazel/_bazel_tanxingjun/90e8e332ec99cea3d1b4a54e52a75780/external/bazel_toolchains/repositories/repositories.bzl:37:9
 - /home/storage12/tanxingjun/repository/tensorflow/github.tensorflow.1.15.3/WORKSPACE:35:1
ERROR: An error occurred during the fetch of repository 'io_bazel_rules_docker':
   Traceback (most recent call last):
        File "/tmp/cache_bazel/_bazel_tanxingjun/90e8e332ec99cea3d1b4a54e52a75780/external/bazel_tools/tools/build_defs/repo/git.bzl", line 234
                _clone_or_update(ctx)
        File "/tmp/cache_bazel/_bazel_tanxingjun/90e8e332ec99cea3d1b4a54e52a75780/external/bazel_tools/tools/build_defs/repo/git.bzl", line 74, in _clone_or_update
                fail(("error cloning %s:\n%s" % (ctx....)))
error cloning io_bazel_rules_docker:
+ cd /tmp/cache_bazel/_bazel_tanxingjun/90e8e332ec99cea3d1b4a54e52a75780/external
+ rm -rf /tmp/cache_bazel/_bazel_tanxingjun/90e8e332ec99cea3d1b4a54e52a75780/external/io_bazel_rules_docker /tmp/cache_bazel/_bazel_tanxingjun/90e8e332ec99cea3d1b4a54e52a75780/external/io_bazel_rules_docker
+ git clone '' https://github.com/bazelbuild/rules_docker.git /tmp/cache_bazel/_bazel_tanxingjun/90e8e332ec99cea3d1b4a54e52a75780/external/io_bazel_rules_docker
Too many arguments.


---------------
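Analysis and solution

https://github.com/tensorflow/tensorflow/issues/28824 gives the fix: update rules_docker. Copy the following snippet from the README.md of the latest rules_docker (https://github.com/bazelbuild/rules_docker) to the beginning of WORKSPACE. http_archive must already be loaded at that point, via load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive").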

# Download the rules_docker repository at release v0.14.3
http_archive(
    name = "io_bazel_rules_docker",
    sha256 = "6287241e033d247e9da5ff705dd6ef526bac39ae82f3d17de1b69f8cb313f9cd",
    strip_prefix = "rules_docker-0.14.3",
    urls = ["https://github.com/bazelbuild/rules_docker/releases/download/v0.14.3/rules_docker-v0.14.3.tar.gz"],
)

---------------
Problem description

/home/tanxingjun/repository/mitts-cloud/third_party/tf_infer/static/include/tensorflow/tensorflow/contrib/makefile/downloads/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:393:63: error: the value of ‘j’ is not usable in a constant expression
         if (++block_iter_state[j].count < block_iter_state[j].size) {
                                                               ^
/home/tanxingjun/repository/mitts-cloud/third_party/tf_infer/static/include/tensorflow/tensorflow/contrib/makefile/downloads/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:392:16: note: ‘int j’ is not const
       for (int j = 0; j < num_squeezed_dims; ++j) {
                ^

---------------
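Analysis and solution

This may be caused by interference from other modules. Including the TensorFlow headers before the other headers may fix it. In my case the culprit was openfst, and moving the TensorFlow includes in front of the openfst includes avoided the error.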

---------------
Problem description

Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

---------------
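Analysis and solution

Make sure the TF, CUDA and cuDNN versions form a combination listed in the official TensorFlow documentation (the tested build configurations).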


Problem description

Compiling rapidxml (version 1.13) with g++ (version 5.4.0) fails because the print_* functions cannot be found:

/home/tanxj/repository/mitts-cloud/third_party/rapidxml/rapidxml_print.hpp:115:37: error: ‘print_children’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
                 out = print_children(out, node, flags, indent);
                                     ^
/home/tanxj/repository/mitts-cloud/third_party/rapidxml/rapidxml_print.hpp:169:22: note: ‘template OutIt rapidxml::internal::print_children(OutIt, const rapidxml::xml_node*, int, int)’ declared here, later in the translation unit
         inline OutIt print_children(OutIt out, const xml_node *node, int flags, int indent)
                      ^
/home/tanxj/repository/mitts-cloud/third_party/rapidxml/rapidxml_print.hpp:120:41: error: ‘print_element_node’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
                 out = print_element_node(out, node, flags, indent);
                                         ^
/home/tanxj/repository/mitts-cloud/third_party/rapidxml/rapidxml_print.hpp:242:22: note: ‘template OutIt rapidxml::internal::print_element_node(OutIt, const rapidxml::xml_node*, int, int)’ declared here, later in the translation unit
         inline OutIt print_element_node(OutIt out, const xml_node *node, int flags, int indent)
                      ^
/home/tanxj/repository/mitts-cloud/third_party/rapidxml/rapidxml_print.hpp:125:38: error: ‘print_data_node’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
                 out = print_data_node(out, node, flags, indent);

---------------
Analysis and solution

See the "Name lookup changes" section of http://gcc.gnu.org/gcc-4.7/porting_to.html:

Name lookup changes

The C++ compiler no longer performs some extra unqualified lookups it had performed in the past, namely dependent base class scope lookups and unqualified template function lookups.

C++ programs that depended on the compiler's previous behavior may no longer compile. This can be temporarily worked around by using -fpermissive.

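Under the C++ standard, which GCC enforces since 4.7, a function called by unqualified name from a template must be declared before the template, unless argument-dependent lookup can find it. rapidxml_print.hpp defines print_children, print_element_node and friends after print_node, which calls them, so the rapidxml code has to be changed. A fix that forward-declares these functions before they are used is given on Stack Overflow: https://stackoverflow.com/questions/14113923/rapidxml-print-header-has-undefined-methods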

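You can also work around it temporarily with -fpermissive. With CMake, adding ADD_DEFINITIONS(-fpermissive) to CMakeLists.txt is enough. add_compile_options(-fpermissive) is the more idiomatic way to pass a compiler flag.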


Problem description

Linking against single-precision FFTW (fftw3f) fails with a long list of undefined references:

synthesis.cpp:(.text+0x1a96): undefined reference to `fftwf_execute'
synthesis.cpp:(.text+0x2472): undefined reference to `fftwf_execute'
synthesis.cpp:(.text+0x1e99): undefined reference to `fftwf_execute'
common.cpp:(.text+0x16f8): undefined reference to `fftwf_plan_dft_r2c_1d'
common.cpp:(.text+0x1719): undefined reference to `fftwf_destroy_plan'
common.cpp:(.text+0x17c8): undefined reference to `fftwf_plan_dft_c2r_1d'
common.cpp:(.text+0x17e9): undefined reference to `fftwf_destroy_plan'
common.cpp:(.text+0x1885): undefined reference to `fftwf_plan_dft_1d'
common.cpp:(.text+0x18a9): undefined reference to `fftwf_destroy_plan'
common.cpp:(.text+0x1964): undefined reference to `fftwf_plan_dft_r2c_1d'
common.cpp:(.text+0x197f): undefined reference to `fftwf_plan_dft_1d'
common.cpp:(.text+0x1999): undefined reference to `fftwf_destroy_plan'
common.cpp:(.text+0x19a2): undefined reference to `fftwf_destroy_plan'
common.cpp:(.text+0x1b93): undefined reference to `fftwf_execute'
common.cpp:(.text+0x1d54): undefined reference to `fftwf_execute'
myfunctions.cpp:(.text+0x1fe1): undefined reference to `fftwf_execute'
myfunctions.cpp:(.text+0x2230): undefined reference to `fftwf_execute'
myfunctions.cpp:(.text+0x25c8): undefined reference to `fftwf_execute'
collect2: error: ld returned 1 exit status

---------------
Analysis and solution

A quick search suggested linking libfftw3f.a instead of libfftw3.a. Only then did I notice that my build had not produced a libfftw3f.a at all.

The official documentation says:

You can install single and long-double precision versions of FFTW, which replace double with float and long double

and provides the corresponding configure options:

--enable-float: Produces a single-precision version of FFTW (float) instead of the default double-precision (double).
--enable-long-double: Produces a long-double precision version of FFTW (long double) instead of the default double-precision (double).

So I rebuilt the FFTW library with single precision enabled:

cd fftw-3.3.8
./configure --prefix= --enable-float --enable-sse2 --enable-avx2 --with-pic
make -j
make install

After that, libfftw3f.a is produced. When linking, replace -lfftw3 with -lfftw3f.
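Every FFTW function name encodes the precision it was built for: fftw_ for double, fftwf_ for float, and fftwl_ for long double. Because of this, the undefined symbols tell you which library to link. Below is a minimal sketch that maps linker errors like the ones above to -l flags. `fftw_link_flags` is a hypothetical helper, and it only covers the three precisions mentioned in this section.

```python
import re

# Function-name prefix -> library name for each FFTW precision.
PREFIX_TO_LIB = {
    "fftw_": "fftw3",    # default double precision
    "fftwf_": "fftw3f",  # single precision, built with --enable-float
    "fftwl_": "fftw3l",  # long double, built with --enable-long-double
}


def fftw_link_flags(linker_output):
    """Return the sorted -l flags needed for FFTW symbols reported as undefined."""
    flags = set()
    for symbol in re.findall(r"undefined reference to `(fftw\w+)'", linker_output):
        for prefix, lib in PREFIX_TO_LIB.items():
            if symbol.startswith(prefix):
                flags.add("-l" + lib)
                break
    return sorted(flags)


log = (
    "synthesis.cpp:(.text+0x1a96): undefined reference to `fftwf_execute'\n"
    "common.cpp:(.text+0x16f8): undefined reference to `fftwf_plan_dft_r2c_1d'\n"
)
print(fftw_link_flags(log))
```

Note that "fftwf_execute" does not start with "fftw_", so the prefixes never overlap and their order in the table does not matter.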


Problem description

configure fails with the following error:

configure: error: source directory already configured; run "make distclean" there first

and the build fails with the following error:

make[4]: *** No rule to make target 'n1_2.c', needed by 'all'.  Stop.

---------------
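Analysis and solution

So far both problems have only appeared with the git version (https://github.com/FFTW/fftw3). The release downloaded directly from the official site (http://www.fftw.org/download.html) probably does not have them.

The first error happens because the git version is configured directly by bootstrap.sh. Just pass your configure options as arguments to bootstrap.sh. If you don't want to build inside the source directory, edit bootstrap.sh to comment out the ./configure line, then run configure from another directory with the --enable-maintainer-mode option added.

The second error is probably caused by OCaml not being installed. Files such as n1_2.c are generated by FFTW's genfft code generator, which is written in OCaml. The bootstrap.sh output contains the line "checking for ocamlbuild... no", or you can run "whereis ocamlbuild" to check whether it is installed. On CentOS, install it with: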

sudo yum install ocaml ocaml-ocamlbuild
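A missing generator tool like ocamlbuild only shows up as a "checking for ...... no" line buried in the bootstrap.sh/configure output. Below is a minimal sketch for pulling those lines out of a saved log. `missing_from_configure_log` is a hypothetical name, and the sample log is made up except for the ocamlbuild line quoted above. Some "no" results are harmless optional features, so the list still needs a human look.

```python
import re


def missing_from_configure_log(log):
    """Return the names from autoconf 'checking for X... no' lines, in order."""
    return re.findall(r"^checking for (.+?)\.\.\. no$", log, flags=re.MULTILINE)


# Hypothetical excerpt of bootstrap.sh output.
sample = (
    "checking for gcc... gcc\n"
    "checking for ocamlbuild... no\n"
)
print(missing_from_configure_log(sample))
```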

Problem description

autoconf reports the following errors:

autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force -I m4
autoreconf: configure.ac: tracing
autoreconf: configure.ac: not using Libtool
autoreconf: running: /usr/bin/autoconf --force
configure.ac:35: error: possibly undefined macro: AC_DISABLE_SHARED
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
configure.ac:294: error: possibly undefined macro: AC_LIBTOOL_WIN32_DLL
configure.ac:295: error: possibly undefined macro: AC_PROG_LIBTOOL
autoreconf: /usr/bin/autoconf failed with exit status: 1
configure: WARNING: unrecognized options: --disable-shared, --with-pic
configure: error: cannot find install-sh, install.sh, or shtool in "." "./.." "./../.."

---------------
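Analysis and solution

AC_PROG_LIBTOOL, AC_DISABLE_SHARED and AC_LIBTOOL_WIN32_DLL are macros shipped with libtool, so autoconf reports them as undefined when libtool is not installed. Install libtool: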

sudo apt-get install libtool

Problem description

Building a CUDA program with cmake fails:

Building for TensorRT version: 6.0.1.5, library version: 6.0.1
-- The CUDA compiler identification is unknown
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc -- broken
CMake Error at /home/work/soft/cmake-3.15.4/share/cmake-3.15/Modules/CMakeTestCUDACompiler.cmake:46 (message):
  The CUDA compiler

    "/usr/local/cuda-10.0/bin/nvcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /home/storage12/tanxingjun/repository/mitts-cloud.tanxj.dev/build/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/gmake cmTC_fde7d/fast && /usr/bin/gmake -f CMakeFiles/cmTC_fde7d.dir/build.make CMakeFiles/cmTC_fde7d.dir/build
    gmake[1]: Entering directory `/home/storage12/tanxingjun/repository/mitts-cloud.tanxj.dev/build/CMakeFiles/CMakeTmp'
    Building CUDA object CMakeFiles/cmTC_fde7d.dir/main.cu.o
    /usr/local/cuda-10.0/bin/nvcc     -x cu -c /home/storage12/tanxingjun/repository/mitts-cloud.tanxj.dev/build/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_fde7d.dir/main.cu.o
    Linking CUDA executable cmTC_fde7d
    /home/work/soft/cmake-3.15.4/bin/cmake -E cmake_link_script CMakeFiles/cmTC_fde7d.dir/link.txt --verbose=1
    ""   CMakeFiles/cmTC_fde7d.dir/main.cu.o -o cmTC_fde7d
    Error running link command: No such file or directory
    gmake[1]: *** [cmTC_fde7d] Error 2
    gmake[1]: Leaving directory `/home/storage12/tanxingjun/repository/mitts-cloud.tanxj.dev/build/CMakeFiles/CMakeTmp'
    gmake: *** [cmTC_fde7d/fast] Error 2

---------------
Analysis and solution

Add the CUDA library directory to LIBRARY_PATH:

export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda/lib64
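If the build is driven from a script rather than an interactive shell, the same setting can be passed to cmake through the environment. Below is a minimal sketch. `with_cuda_library_path` is a hypothetical helper, /usr/local/cuda/lib64 is the path from the command above, and the commented-out cmake call assumes a build directory one level below the source tree.

```python
import os


def with_cuda_library_path(env=None, cuda_lib="/usr/local/cuda/lib64"):
    """Return a copy of env (default: os.environ) with cuda_lib appended to LIBRARY_PATH.

    Unlike `export LIBRARY_PATH=$LIBRARY_PATH:...`, this does not leave an
    empty entry when LIBRARY_PATH was unset, and it does not add a duplicate.
    """
    env = dict(os.environ if env is None else env)
    parts = [p for p in env.get("LIBRARY_PATH", "").split(os.pathsep) if p]
    if cuda_lib not in parts:
        parts.append(cuda_lib)
    env["LIBRARY_PATH"] = os.pathsep.join(parts)
    return env


# Example (not executed here):
# import subprocess
# subprocess.run(["cmake", ".."], env=with_cuda_library_path(), check=True)
```

The function works on a copy of the environment, so the calling process's own environment is left unchanged.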
