Problem description
Linking against PCRE.lib under VS2005 fails:
1>Linking...
1>RERewrite.obj : error LNK2019: unresolved external symbol __imp__pcre_free referenced in function _TA_RE_Init
1>RERewrite.obj : error LNK2019: unresolved external symbol __imp__pcre_exec referenced in function _TA_RE_Init
1>RERewrite.obj : error LNK2019: unresolved external symbol __imp__pcre_compile referenced in function _TA_RE_Init
---------------
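Analysis and solution
The __imp__ prefix on the missing symbols shows that pcre.h declared the functions as __declspec(dllimport) while the library being linked is static. A comment in the PCRE sources explains this:
/* If you want to statically link this program against a non-dll .a file, you must
define PCRE_STATIC before including pcre.h, otherwise the pcre_malloc() and
pcre_free() exported functions will be declared __declspec(dllimport), with
unwanted results. So in this environment, uncomment the following line. */
//#define PCRE_STATIC
So uncomment that line (or define PCRE_STATIC yourself) before pcre.h is included.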
Problem description
Debugging the PCRE package under VS2005, pcre_free crashes:
Unhandled exception at ...... Access violation reading location 0x00000000.
---------------
Analysis and solution
Comment out the following macro definition in pcre.h (it exists for Virtual Pascal):
#define VPCOMPAT
Problem description
Building and installing pcre on CentOS 7 (downloaded directly from the official FTP site, https://ftp.pcre.org/pub/pcre) fails with:
autoreconf: 'configure.ac' or 'configure.in' is required
---------------
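Analysis and solution
The versions of aclocal, m4 and related tools may not match the ones used to generate the shipped configuration files, so the configuration files need to be regenerated. Note that autoreconf prints this message when the current directory contains neither configure.ac nor configure.in, so run the following command inside the extracted pcre source directory: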
autoreconf -f -i
Problem description
Building TensorFlow C++ into libtensorflow_cc.a on Ubuntu 16.04.5 LTS
Problem 1:
dirname: missing operand
Try 'dirname --help' for more information.
cat: /proto_text_cc_files.txt: No such file or directory
cat: /proto_text_pb_cc_files.txt: No such file or directory
cat: /proto_text_pb_h_files.txt: No such file or directory
mkdir: cannot create directory ‘/gen’: Permission denied
cat: /tf_op_files.txt: No such file or directory
cat: /tf_pb_text_files.txt: No such file or directory
cat: /tf_proto_files.txt: No such file or directory
---------------
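Analysis and solution
The installed make is too old.
Possibly because the apt mirror I chose has not been maintained for a long time, apt-get install make=4.2 could not update it,
so I downloaded the latest version from http://ftp.gnu.org/gnu/make/ and installed it myself. That fixed the problem.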
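The notes do not record which make version the build actually needs, so the sketch below only shows how the installed version could be checked before starting. The helper names and the (4, 2) threshold (borrowed from the make=4.2 attempt above) are assumptions for illustration.

```python
import re
import subprocess


def parse_make_version(version_output):
    """Extract the version tuple from a 'GNU Make X.Y[.Z]' banner line."""
    match = re.search(r"GNU Make (\d+(?:\.\d+)*)", version_output)
    if match is None:
        raise ValueError("not GNU Make --version output")
    return tuple(int(part) for part in match.group(1).split("."))


def make_is_at_least(minimum, version_output=None):
    """Compare make against `minimum`, e.g. (4, 2) (a hypothetical threshold).

    When `version_output` is omitted, `make --version` is run to obtain it.
    """
    if version_output is None:
        version_output = subprocess.run(
            ["make", "--version"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        ).stdout
    return parse_make_version(version_output) >= tuple(minimum)
```

Calling make_is_at_least((4, 2)) with no second argument queries the make found on PATH; passing the banner text explicitly makes the comparison easy to try without touching the system.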
Problem 2:
downloading https://bitbucket.org/eigen/eigen/get/fd6845384b86.tar.gz
tensorflow/contrib/makefile/download_dependencies.sh: line 65: curl: command not found
---------------
Analysis and solution
sudo apt-get install curl
Problem 3:
In file included from tensorflow/core/lib/io/zlib_outputbuffer.cc:16:0:
./tensorflow/core/lib/io/zlib_outputbuffer.h:19:18: fatal error: zlib.h: No such file or directory
---------------
Analysis and solution
sudo apt-get install zlib1g-dev
Problem 4:
Problems colleagues have run into: the cmake version is too old; dependencies are missing (sudo apt-get install autoconf autogen libtool); no write permission on files.
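Several of the failures above come down to a missing command-line tool, so a quick pre-flight check may save a build cycle. The following is a minimal sketch; the function name and the tool list are illustrative assumptions. It checks libtoolize rather than libtool because, as far as I know, recent Debian/Ubuntu libtool packages ship only libtoolize. Headers such as zlib.h are not covered.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Tools mentioned in the notes above; extend the list for your own build.
    wanted = ["make", "curl", "cmake", "autoconf", "autogen", "libtoolize"]
    absent = missing_tools(wanted)
    if absent:
        print("missing:", " ".join(absent))
    else:
        print("all listed build tools were found")
```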
Environment and versions
OS: CentOS 7
CUDA: 10.0
NCCL: 2.5.6
cuDNN: 7.6
python: 3.6
bazel: 0.19.2
TensorFlow: 1.13.1
---------------
Problem description
Starting local Bazel server and connecting to it...
WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
ERROR: /home/storage12/tanxingjun/github/tensorflow-1.13.1/tensorflow/core/nccl/BUILD:19:1: no such package '@local_config_nccl//': Traceback (most recent call last):
File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/nccl/nccl_configure.bzl", line 167
_check_nccl_version(repository_ctx, nccl_install_path, n..., ...)
File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/nccl/nccl_configure.bzl", line 96, in _check_nccl_version
_find_nccl_header(repository_ctx, nccl_install_path)
File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/nccl/nccl_configure.bzl", line 79, in _find_nccl_header
auto_configure_fail(("Cannot find %s" % str(header_p...)))
File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/gpus/cuda_configure.bzl", line 342, in auto_configure_fail
fail(("\n%sCuda Configuration Error:%...)))
Cuda Configuration Error: Cannot find /lib64/include/nccl.h
and referenced by '//tensorflow/core/nccl:nccl_lib'
ERROR: Analysis of target '//tensorflow:libtensorflow_cc.so' failed; build aborted: no such package '@local_config_nccl//': Traceback (most recent call last):
File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/nccl/nccl_configure.bzl", line 167
_check_nccl_version(repository_ctx, nccl_install_path, n..., ...)
File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/nccl/nccl_configure.bzl", line 96, in _check_nccl_version
_find_nccl_header(repository_ctx, nccl_install_path)
File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/nccl/nccl_configure.bzl", line 79, in _find_nccl_header
auto_configure_fail(("Cannot find %s" % str(header_p...)))
File "/home/storage12/tanxingjun/github/tensorflow-1.13.1/third_party/gpus/cuda_configure.bzl", line 342, in auto_configure_fail
fail(("\n%sCuda Configuration Error:%...)))
Cuda Configuration Error: Cannot find /lib64/include/nccl.h
---------------
Analysis and solution
tensorflow/configure.py contains the following code:
# First check to see if NCCL is in the ldconfig.
# If its found, use that location.
if is_linux():
  ldconfig_bin = which('ldconfig') or '/sbin/ldconfig'
  nccl2_path_from_ldconfig = run_shell([ldconfig_bin, '-p'])
  nccl2_path_from_ldconfig = re.search('.*libnccl.so .* => (.*)',
                                       nccl2_path_from_ldconfig)
In other words, the NCCL_INSTALL_PATH and NCCL_HDR_PATH values that the NCCL configuration relies on are not taken from the user's environment; this script sets them (before I found the cause I had exported them by hand, and it had no effect). So I ran the lookup on its own:
$ /sbin/ldconfig -p | grep nccl
libnccl.so.2 (libc6,x86-64) => /lib64/libnccl.so.2
libnccl.so (libc6,x86-64) => /lib64/libnccl.so
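That is where the problem lies: libnccl.so is installed under /usr/lib64, but ldconfig reports it under /lib64. It turned out that the following symlinks had been set up in /: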
$ ls -lh /
lrwxrwxrwx 1 root root 7 Nov 24 2018 lib -> usr/lib
lrwxrwxrwx 1 root root 9 Nov 24 2018 lib64 -> usr/lib64
As a result, libnccl.so can be found but nccl.h cannot, hence the error.
Solution:
Add a file named nccl-2.5.6.conf under /etc/ld.so.conf.d containing just one line:
/usr/lib64
Then run /sbin/ldconfig.
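To see how the ldconfig listing turns into the path in the error message, the lookup can be replayed outside the build. This sketch applies the regular expression quoted from configure.py to the ldconfig output shown above. The two derivation steps after the match (taking the library's directory as the install path, then appending include/nccl.h) are assumptions inferred from the error text, not copied from TensorFlow.

```python
import posixpath
import re

# Sample `ldconfig -p` output copied from the notes above.
LDCONFIG_OUTPUT = (
    "\tlibnccl.so.2 (libc6,x86-64) => /lib64/libnccl.so.2\n"
    "\tlibnccl.so (libc6,x86-64) => /lib64/libnccl.so\n"
)


def nccl_lib_from_ldconfig(ldconfig_output):
    """Apply the regular expression used in configure.py; None if no match."""
    match = re.search(".*libnccl.so .* => (.*)", ldconfig_output)
    return match.group(1) if match else None


lib_path = nccl_lib_from_ldconfig(LDCONFIG_OUTPUT)
# Assumption: the install path is the directory that holds libnccl.so ...
install_path = posixpath.dirname(lib_path)
# ... and the header is expected at <install path>/include/nccl.h, which is
# what the path in the error message suggests.
header_path = posixpath.join(install_path, "include", "nccl.h")
print(lib_path, install_path, header_path)
```

Feeding the function the output of /sbin/ldconfig -p after changing the ld.so.conf.d entry shows which directory the configure step would pick up.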
---------------
Problem description
This error came up while building the TensorFlow static library:
ar: creating tensorflow/contrib/makefile/gen/lib/libtensorflow-core.a
tensorflow/contrib/makefile/gen/lib/libtensorflow-core.a(spectrogram.o): In function `tensorflow::Spectrogram::ProcessCoreFFT()':
spectrogram.cc:(.text+0xd8): undefined reference to `rdft'
collect2: error: ld returned 1 exit status
make[3]: *** [tensorflow/contrib/makefile/gen/bin/benchmark] Error 1
make[2]: *** [tensorflow-stamp/tensorflow_static-configure] Error 2
make[1]: *** [CMakeFiles/tensorflow_static.dir/all] Error 2
make: *** [all] Error 2
---------------
Analysis and solution
See https://github.com/tensorflow/tensorflow/commit/f1f1d5172fe5bfeaeb2cf657ffc43ba744187bee
That commit is for tensorflow-lite, so the paths need to be adjusted.
Step 01.
Edit ~/tensorflow/tensorflow/contrib/makefile/download_dependencies.sh so that it contains the following lines (around lines 55-56 and line 140 of the file):
# fix the error: spectrogram.cc:(.text+0xd8): undefined reference to `rdft'
FFT2D_URL="https://mirror.bazel.build/www.kurims.kyoto-u.ac.jp/~ooura/fft.tgz"
download_and_extract "${FFT2D_URL}" "${DOWNLOADS_DIR}/fft2d"
Step 02.
Edit ~/tensorflow/tensorflow/contrib/makefile/Makefile (around lines 718-719 of the file):
# Add in any extra files that don't fit the patterns easily
TF_CC_SRCS += tensorflow/contrib/makefile/downloads/fft2d/fftsg.
---------------
Problem description
Runtime error:
ERROR: Creating graph in session failed...Not found: Op type not registered 'BlockLSTM' in binary running on . Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
Segmentation fault
---------------
Analysis and solution
Ops defined under contrib are not linked into libtensorflow_cc.so by default.
Add them to the deps of libtensorflow_cc.so in tensorflow/BUILD:
tf_cc_shared_object(
    name = "libtensorflow_cc.so",
    linkopts = select({
        "//tensorflow:darwin": [
            "-Wl,-exported_symbols_list",  # This line must be directly followed by the exported_symbols.lds file
            "$(location //tensorflow:tf_exported_symbols.lds)",
        ],
        "//tensorflow:windows": [],
        "//conditions:default": [
            "-z defs",
            "-Wl,--version-script",  # This line must be directly followed by the version_script.lds file
            "$(location //tensorflow:tf_version_script.lds)",
        ],
    }),
    visibility = ["//visibility:public"],
    deps = [
        "//tensorflow:tf_exported_symbols.lds",
        "//tensorflow:tf_version_script.lds",
        "//tensorflow/c:c_api",
        "//tensorflow/c/eager:c_api",
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/cc:scope",
        "//tensorflow/cc/profiler",
        "//tensorflow/core:tensorflow",
        "//tensorflow/contrib/rnn:lstm_ops_kernels",  # added for LSTM
        "//tensorflow/contrib/rnn:gru_ops_kernels",  # added for GRU
    ] + if_ngraph(["@ngraph_tf//:ngraph_tf"]),
)
---------------
Problem description
This error came up while building tensorflow-1.15:
ls/build_defs/repo/git.bzl:252:18):
- /tmp/cache_bazel/_bazel_tanxingjun/90e8e332ec99cea3d1b4a54e52a75780/external/bazel_toolchains/repositories/repositories.bzl:37:9
- /home/storage12/tanxingjun/repository/tensorflow/github.tensorflow.1.15.3/WORKSPACE:35:1
ERROR: An error occurred during the fetch of repository 'io_bazel_rules_docker':
Traceback (most recent call last):
File "/tmp/cache_bazel/_bazel_tanxingjun/90e8e332ec99cea3d1b4a54e52a75780/external/bazel_tools/tools/build_defs/repo/git.bzl", line 234
_clone_or_update(ctx)
File "/tmp/cache_bazel/_bazel_tanxingjun/90e8e332ec99cea3d1b4a54e52a75780/external/bazel_tools/tools/build_defs/repo/git.bzl", line 74, in _clone_or_update
fail(("error cloning %s:\n%s" % (ctx....)))
error cloning io_bazel_rules_docker:
+ cd /tmp/cache_bazel/_bazel_tanxingjun/90e8e332ec99cea3d1b4a54e52a75780/external
+ rm -rf /tmp/cache_bazel/_bazel_tanxingjun/90e8e332ec99cea3d1b4a54e52a75780/external/io_bazel_rules_docker /tmp/cache_bazel/_bazel_tanxingjun/90e8e332ec99cea3d1b4a54e52a75780/external/io_bazel_rules_docker
+ git clone '' https://github.com/bazelbuild/rules_docker.git /tmp/cache_bazel/_bazel_tanxingjun/90e8e332ec99cea3d1b4a54e52a75780/external/io_bazel_rules_docker
Too many arguments.
---------------
Analysis and solution
https://github.com/tensorflow/tensorflow/issues/28824 has a fix:
update rules_docker. Copy the following snippet from the README.md of the latest https://github.com/bazelbuild/rules_docker release to the top of WORKSPACE.
# Download the rules_docker repository at release v0.14.3
http_archive(
    name = "io_bazel_rules_docker",
    sha256 = "6287241e033d247e9da5ff705dd6ef526bac39ae82f3d17de1b69f8cb313f9cd",
    strip_prefix = "rules_docker-0.14.3",
    urls = ["https://github.com/bazelbuild/rules_docker/releases/download/v0.14.3/rules_docker-v0.14.3.tar.gz"],
)
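Bazel checks the downloaded archive against the sha256 attribute. If the tarball has to be fetched by hand (for example onto a machine without direct GitHub access), its digest can be computed and compared with the value in WORKSPACE. This is a minimal sketch; the function name is an assumption, and the file path is whatever you saved the archive as.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the returned string with the sha256 value in the http_archive rule confirms that the local copy is the release the rule expects.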
---------------
Problem description
/home/tanxingjun/repository/mitts-cloud/third_party/tf_infer/static/include/tensorflow/tensorflow/contrib/makefile/downloads/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:393:63: error: the value of ‘j’ is not usable in a constant expression
if (++block_iter_state[j].count < block_iter_state[j].size) {
^
/home/tanxingjun/repository/mitts-cloud/third_party/tf_infer/static/include/tensorflow/tensorflow/contrib/makefile/downloads/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:392:16: note: ‘int j’ is not const
for (int j = 0; j < num_squeezed_dims; ++j) {
^
---------------
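Analysis and solution
Other modules may be interfering. Including the tensorflow headers before all other headers may fix it. In my case openfst was the culprit, and moving the tensorflow headers ahead of the openfst headers avoided the error.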
---------------
Problem description
Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
---------------
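Analysis and solution
Use a TF, CUDA and cuDNN version combination listed in the official documentation.
Problem description
Compiling rapidxml (version 1.13) with g++ (version 5.4.0) reports that the print_*** functions cannot be found: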
/home/tanxj/repository/mitts-cloud/third_party/rapidxml/rapidxml_print.hpp:115:37: error: ‘print_children’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
out = print_children(out, node, flags, indent);
^
/home/tanxj/repository/mitts-cloud/third_party/rapidxml/rapidxml_print.hpp:169:22: note: ‘template
inline OutIt print_children(OutIt out, const xml_node
^
/home/tanxj/repository/mitts-cloud/third_party/rapidxml/rapidxml_print.hpp:120:41: error: ‘print_element_node’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
out = print_element_node(out, node, flags, indent);
^
/home/tanxj/repository/mitts-cloud/third_party/rapidxml/rapidxml_print.hpp:242:22: note: ‘template
inline OutIt print_element_node(OutIt out, const xml_node
^
/home/tanxj/repository/mitts-cloud/third_party/rapidxml/rapidxml_print.hpp:125:38: error: ‘print_data_node’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
out = print_data_node(out, node, flags, indent);
---------------
Analysis and solution
See the "Name lookup changes" section of http://gcc.gnu.org/gcc-4.7/porting_to.html:
The C++ compiler no longer performs some extra unqualified lookups it had performed in the past, namely dependent base class scope lookups and unqualified template function lookups.
C++ programs that depended on the compiler's previous behavior may no longer compile. This can be temporarily worked around by using -fpermissive.
Under the stricter lookup, other functions called from a function template must be declared before they are used, so the rapidxml code has to be modified. Someone on Stack Overflow gives a fix: https://stackoverflow.com/questions/14113923/rapidxml-print-header-has-undefined-methods
Alternatively, -fpermissive works as a temporary workaround. With a cmake build, just add ADD_DEFINITIONS(-fpermissive) to CMakeLists.txt.
Problem description
Linking fails when using single-precision FFTW (fftw3f), with a long list of undefined references:
synthesis.cpp:(.text+0x1a96): undefined reference to `fftwf_execute'
synthesis.cpp:(.text+0x2472): undefined reference to `fftwf_execute'
synthesis.cpp:(.text+0x1e99): undefined reference to `fftwf_execute'
common.cpp:(.text+0x16f8): undefined reference to `fftwf_plan_dft_r2c_1d'
common.cpp:(.text+0x1719): undefined reference to `fftwf_destroy_plan'
common.cpp:(.text+0x17c8): undefined reference to `fftwf_plan_dft_c2r_1d'
common.cpp:(.text+0x17e9): undefined reference to `fftwf_destroy_plan'
common.cpp:(.text+0x1885): undefined reference to `fftwf_plan_dft_1d'
common.cpp:(.text+0x18a9): undefined reference to `fftwf_destroy_plan'
common.cpp:(.text+0x1964): undefined reference to `fftwf_plan_dft_r2c_1d'
common.cpp:(.text+0x197f): undefined reference to `fftwf_plan_dft_1d'
common.cpp:(.text+0x1999): undefined reference to `fftwf_destroy_plan'
common.cpp:(.text+0x19a2): undefined reference to `fftwf_destroy_plan'
common.cpp:(.text+0x1b93): undefined reference to `fftwf_execute'
common.cpp:(.text+0x1d54): undefined reference to `fftwf_execute'
myfunctions.cpp:(.text+0x1fe1): undefined reference to `fftwf_execute'
myfunctions.cpp:(.text+0x2230): undefined reference to `fftwf_execute'
myfunctions.cpp:(.text+0x25c8): undefined reference to `fftwf_execute'
collect2: error: ld returned 1 exit status
---------------
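Analysis and solution
A web search says to link libfftw3f.a rather than libfftw3.a, and only then did I notice that the build had not produced libfftw3f.a at all.
The official documentation says: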
You can install single and long-double precision versions of FFTW, which replace double with float and long double
and provides the corresponding configure options:
--enable-float: Produces a single-precision version of FFTW (float) instead of the default double-precision (double).
--enable-long-double: Produces a long-double precision version of FFTW (long double) instead of the default double-precision (double).
So I rebuilt the fftw library:
cd fftw-3.3.8
./configure --prefix= --enable-float --enable-sse2 --enable-avx2 --with-pic
make -j
make install
After that, libfftw3f.a is there. When linking, replace -lfftw3 with -lfftw3f.
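FFTW encodes the precision in the function prefix (fftw_ for double, fftwf_ for float, fftwl_ for long double), and each precision lives in its own library. The sketch below reads the required -l flags off a linker log like the one above; the function name is an assumption for illustration, and only these three precisions are handled.

```python
import re

# FFTW naming convention: function prefix -> library that defines it.
PREFIX_TO_LIB = {
    "fftw_": "fftw3",    # double precision (default build)
    "fftwf_": "fftw3f",  # single precision (--enable-float)
    "fftwl_": "fftw3l",  # long double precision (--enable-long-double)
}


def fftw_link_flags(linker_output):
    """Return sorted -l flags for the FFTW symbols the linker could not find."""
    prefixes = re.findall(
        r"undefined reference to `(fftw[fl]?_)\w+'", linker_output
    )
    return sorted({"-l" + PREFIX_TO_LIB[prefix] for prefix in prefixes})
```

For the log in this entry every missing symbol starts with fftwf_, so the function reports that only the single-precision library is needed.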
Problem description
configure fails with:
configure: error: source directory already configured; run "make distclean" there first
and the build fails with:
make[4]: *** No rule to make target 'n1_2.c', needed by 'all'. Stop.
---------------
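Analysis and solution
So far both problems only occur with the git version (https://github.com/FFTW/fftw3); the tarball from the official site (http://www.fftw.org/download.html) may not have them.
The first error occurs because the git version is configured through bootstrap.sh; just pass the configure options after it. If you don't want to build in the source directory, comment out the ./configure line in bootstrap.sh, run configure from elsewhere, and add the --enable-maintainer-mode option.
The second error probably means ocaml is not installed. The bootstrap.sh output contains the line "checking for ocamlbuild... no", or you can run "whereis ocamlbuild" to check whether it is installed. On CentOS it can be installed with: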
sudo yum install ocaml ocaml-ocamlbuild
Problem description:
autoconf reports the following errors:
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force -I m4
autoreconf: configure.ac: tracing
autoreconf: configure.ac: not using Libtool
autoreconf: running: /usr/bin/autoconf --force
configure.ac:35: error: possibly undefined macro: AC_DISABLE_SHARED
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation.
configure.ac:294: error: possibly undefined macro: AC_LIBTOOL_WIN32_DLL
configure.ac:295: error: possibly undefined macro: AC_PROG_LIBTOOL
autoreconf: /usr/bin/autoconf failed with exit status: 1
configure: WARNING: unrecognized options: --disable-shared, --with-pic
configure: error: cannot find install-sh, install.sh, or shtool in "." "./.." "./../.."
---------------
Analysis and solution:
Install libtool:
sudo apt-get install libtool
Problem description
Building a CUDA program with cmake fails:
Building for TensorRT version: 6.0.1.5, library version: 6.0.1
-- The CUDA compiler identification is unknown
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc -- broken
CMake Error at /home/work/soft/cmake-3.15.4/share/cmake-3.15/Modules/CMakeTestCUDACompiler.cmake:46 (message):
The CUDA compiler
"/usr/local/cuda-10.0/bin/nvcc"
is not able to compile a simple test program.
It fails with the following output:
Change Dir: /home/storage12/tanxingjun/repository/mitts-cloud.tanxj.dev/build/CMakeFiles/CMakeTmp
Run Build Command(s):/usr/bin/gmake cmTC_fde7d/fast && /usr/bin/gmake -f CMakeFiles/cmTC_fde7d.dir/build.make CMakeFiles/cmTC_fde7d.dir/build
gmake[1]: Entering directory `/home/storage12/tanxingjun/repository/mitts-cloud.tanxj.dev/build/CMakeFiles/CMakeTmp'
Building CUDA object CMakeFiles/cmTC_fde7d.dir/main.cu.o
/usr/local/cuda-10.0/bin/nvcc -x cu -c /home/storage12/tanxingjun/repository/mitts-cloud.tanxj.dev/build/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_fde7d.dir/main.cu.o
Linking CUDA executable cmTC_fde7d
/home/work/soft/cmake-3.15.4/bin/cmake -E cmake_link_script CMakeFiles/cmTC_fde7d.dir/link.txt --verbose=1
"" CMakeFiles/cmTC_fde7d.dir/main.cu.o -o cmTC_fde7d
Error running link command: No such file or directory
gmake[1]: *** [cmTC_fde7d] Error 2
gmake[1]: Leaving directory `/home/storage12/tanxingjun/repository/mitts-cloud.tanxj.dev/build/CMakeFiles/CMakeTmp'
gmake: *** [cmTC_fde7d/fast] Error 2
---------------
Analysis and solution
Add the CUDA library path to LIBRARY_PATH:
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda/lib64