Tensorflow2.6.0-MKL for C++

 在intel官网看到Optimize AI Applications with Intel® oneAPI Deep Neural Network...

查了一下,好像即使只要有intel,以后CPU预测速度也可以提高。所以决定试试,去年我编译的libtensorflow.so好像没有加入这一块!

试一下,看是否比去年的快。

先按下面的链接介绍安装bazel:然后一定要记得将文件/root/bin/bazel拷贝到/usr/local/bin/bazel。哦还要看你想装的tensorflow是需要哪个版本的bazel,即按1步骤后可在configure文件中看到:

Tensorflow2.6.0-MKL for C++_第1张图片

安装过程若出现如下错误:证明是现在网不好,文件下载不完全引起的,重新下载即可。

Tensorflow2.6.0-MKL for C++_第2张图片 Tensorflow2.6.0-MKL for C++_第3张图片

 安装完毕即可验证版本。如上所示。


https://www.intel.com/content/www/us/en/developer/articles/guide/optimization-for-tensorflow-installation-guide.html

https://www.intel.com/content/www/us/en/developer/articles/technical/tensorflow-optimizations-on-modern-intel-architecture.html

install bazel in https://docs.bazel.build/versions/main/install-ubuntu.html  https://github.com/bazelbuild/bazelisk/releases

1,从上面的guide中git到tensorflow(会自动到root下)并按上面的教程切换到r2.6分支,然后开始配置。
root@jumper-MS-7B47:~/tensorflow# ./configure
You have bazel 3.7.2 installed.
Please specify the location of python. [Default is /usr/local/bin/python3]: 


Found possible Python library paths:
  /usr/local/lib/python3.7/site-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python3.7/site-packages]

Do you wish to build TensorFlow with ROCm support? [y/N]: 
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: 
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: 
Clang will not be downloaded.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
	--config=mkl         	# Build with MKL support.
	--config=mkl_aarch64 	# Build with oneDNN and Compute Library for the Arm Architecture (ACL).
	--config=monolithic  	# Config for mostly static monolithic build.
	--config=numa        	# Build with NUMA support.
	--config=dynamic_kernels	# (Experimental) Build kernels into separate shared objects.
	--config=v1          	# Build with TensorFlow 1 API instead of TF 2 API.
Preconfigured Bazel build configs to DISABLE default on features:
	--config=nogcp       	# Disable GCP support.
	--config=nonccl      	# Disable NVIDIA NCCL support.
Configuration finished
看这里默认是TFv1-without contrib版本。

2,接着开始编译动态库,我选择的是v2,带mkl版本
root@jumper-MS-7B47:~/tensorflow# bazel build --config=v2 --config=mkl -c opt --copt=-march=native //tensorflow:libtensorflow_cc.so
Starting local Bazel server and connecting to it...
WARNING: The following configs were expanded more than once: [v2]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=80
INFO: Reading rc options for 'build' from /root/tensorflow/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /root/tensorflow/.bazelrc:
  'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true
INFO: Reading rc options for 'build' from /root/tensorflow/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/local/bin/python3 --action_env PYTHON_LIB_PATH=/usr/local/lib/python3.7/site-packages --python_path=/usr/local/bin/python3
INFO: Found applicable config definition build:short_logs in file /root/tensorflow/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /root/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:v2 in file /root/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:mkl in file /root/tensorflow/.bazelrc: --define=build_with_mkl=true --define=enable_mkl=true --define=tensorflow_mkldnn_contraction_kernel=0 --define=build_with_openmp=true -c opt
INFO: Found applicable config definition build:linux in file /root/tensorflow/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false
INFO: Found applicable config definition build:dynamic_kernels in file /root/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
DEBUG: /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/tf_runtime/third_party/cuda/dependencies.bzl:51:10: The following command will download NVIDIA proprietary software. By using the software you agree to comply with the terms of the license agreement that accompanies the software. If you do not agree to the terms of the license agreement, do not use the software.
DEBUG: Rule 'io_bazel_rules_docker' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1556410077 -0400"
DEBUG: Repository io_bazel_rules_docker instantiated at:
  /root/tensorflow/WORKSPACE:23:14: in 
  /root/tensorflow/tensorflow/workspace0.bzl:108:34: in workspace
  /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/bazel_toolchains/repositories/repositories.bzl:37:23: in repositories
Repository rule git_repository defined at:
  /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in 
INFO: Analyzed target //tensorflow:libtensorflow_cc.so (217 packages loaded, 19638 targets configured).
INFO: Found 1 target...
Target //tensorflow:libtensorflow_cc.so up-to-date:
  bazel-bin/tensorflow/libtensorflow_cc.so
INFO: Elapsed time: 3862.275s, Critical Path: 188.63s
INFO: 7274 processes: 337 internal, 6937 local.
INFO: Build completed successfully, 7274 total actions


3,把bazel_bin下的tensorflow拷贝到我的include中

4,测试发现:tensorflow_mklomp2.6/include/tensorflow/core/framework/graph.pb.h:10:10: 致命错误:google/protobuf/port_def.inc:没有那个文件或目录
于是回到编译文件夹那里重新编译:bazel build --config=v2 --config=mkl -c opt --copt=-march=native //tensorflow:libtensorflow_cc.so //tensorflow:install_headers
然后可以看到bazel_bin下生成了include文件,这样直接省略了人工收集include的过程

5,运行实例/tensorflow_mklomp2.6/include/tensorflow/core/framework/tensor.h:906:7: 错误:static assertion failed: std::string is no longer a scalar type, use tensorflow::tstring
  906 |       !std::is_same::value,
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/subdir.mk:21: recipe for target 'src/jinXingCnn.o' failed
make: *** [src/jinXingCnn.o] Error 1

solution:https://github.com/tensorflow/tensorflow/issues/43150 将工程中下列地方改成tensorflow::tstring

6,再次运行实例,出现:
2021-11-01 15:49:10.354330: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3407930000 Hz
2021-11-01 15:49:10.354590: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1eee5c0 executing computations on platform Host. Devices:
2021-11-01 15:49:10.354603: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): , 
/home/jumper/xrt/parallel/xrtCNNwithoutParallel/Release/xrtCNNwithoutParallel: relocation error: /home/jumper/xrt/parallel/xrtCNNwithoutParallel/Release/xrtCNNwithoutParallel: symbol _ZN10tensorflow15ReadBinaryProtoEPNS_3EnvERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPN6google8protobuf11MessageLiteE, version tensorflow not defined in file libtensorflow_cc.so.2 with link time reference


solution:
set environment variable for path of tensorflow lib为刚刚的库路径设置环境变量即可
我是直接打开终端 export LD_LIBRARY_PATH=/pathoftensorflowlib:$LD_LIBRARY_PATH

7,然后在终端下继续运行实例,发现之前不用MKL时8个核每个核跑到30%,使用MKL后8个核每个核都到了100%,但是速度比原来慢了非常多!

8,检查内存泄漏https://blog.csdn.net/gaussrieman123/article/details/106628042发现并不是内存泄漏导致的速度这么慢。
root@rootwd-Default-string:/home/jumper/xrt/parallel/xrtCNNTry/Release# valgrind --tool=memcheck --leak-check=yes ./xrtCNNTry
==3514== Memcheck, a memory error detector
==3514== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==3514== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==3514== Command: ./xrtCNNTry
==3514== 
==3514== Warning: set address range perms: large range [0x12db3000, 0x23a9c000) (defined)
==3514== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==3514==    at 0x25C405D9: syscall (syscall.S:38)
==3514==    by 0x26EAACD2: __kmp_affinity_determine_capable (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E8C019: __kmp_env_initialize(char const*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E77663: _INTERNAL_25_______src_kmp_runtime_cpp_7e558fa4::__kmp_do_serial_initialize() (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E6C514: __kmp_get_global_thread_id_reg (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E60C99: omp_set_num_threads@@VERSION (in /usr/local/lib/libiomp5.so)
==3514==    by 0x2450BCAA: void absl::lts_20210324::base_internal::CallOnceImpl(std::atomic*, absl::lts_20210324::base_internal::SchedulingMode, void (&)(int), int&&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x2450C608: tensorflow::ThreadPoolDevice::ThreadPoolDevice(tensorflow::SessionOptions const&, std::__cxx11::basic_string, std::allocator > const&, tensorflow::gtl::IntType, tensorflow::DeviceLocality const&, tensorflow::Allocator*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x2450A42C: tensorflow::ThreadPoolDeviceFactory::CreateDevices(tensorflow::SessionOptions const&, std::__cxx11::basic_string, std::allocator > const&, std::vector >, std::allocator > > >*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x24158A41: tensorflow::DeviceFactory::AddCpuDevices(tensorflow::SessionOptions const&, std::__cxx11::basic_string, std::allocator > const&, std::vector >, std::allocator > > >*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x24158B09: tensorflow::DeviceFactory::AddDevices(tensorflow::SessionOptions const&, std::__cxx11::basic_string, std::allocator > const&, std::vector >, std::allocator > > >*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x20C513EA: tensorflow::DirectSessionFactory::NewSession(tensorflow::SessionOptions const&, tensorflow::Session**) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==3514== 
2021-11-04 10:39:33.857830: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
==3514== Thread 11:
==3514== Conditional jump or move depends on uninitialised value(s)
==3514==    at 0x2448A57C: int tensorflow::PropagatorState::FrameState::ActivateNodesFastPathInternal(tensorflow::NodeItem const*, bool, tensorflow::PropagatorState::IterationState*, absl::lts_20210324::InlinedVector >*, absl::lts_20210324::InlinedVector >*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x2448C1EF: tensorflow::PropagatorState::FrameState::ActivateNodesAndAdjustOutstanding(tensorflow::NodeItem const*, bool, tensorflow::PropagatorState::IterationState*, absl::lts_20210324::InlinedVector >*, absl::lts_20210324::InlinedVector >*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x2448E9BC: tensorflow::PropagatorState::PropagateOutputs(tensorflow::PropagatorState::TaggedNode const&, absl::lts_20210324::InlinedVector >*, absl::lts_20210324::InlinedVector >*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x24482498: tensorflow::(anonymous namespace)::ExecutorState::Process(tensorflow::PropagatorState::TaggedNode, long) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x244833F7: std::_Function_handler::RunTask::ScheduleReady(absl::lts_20210324::InlinedVector >*, tensorflow::PropagatorState::TaggedNodeReadyQueue*)::{lambda()#2}>(tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady(absl::lts_20210324::InlinedVector >*, tensorflow::PropagatorState::TaggedNodeReadyQueue*)::{lambda()#2}&&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x1601E2D1: Eigen::ThreadPoolTempl::WorkerLoop(int) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601A637: std::_Function_handler)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x2490EE58: tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)
==3514== 
==3514== Conditional jump or move depends on uninitialised value(s)
==3514==    at 0x2448A265: int tensorflow::PropagatorState::FrameState::ActivateNodesFastPathInternal(tensorflow::NodeItem const*, bool, tensorflow::PropagatorState::IterationState*, absl::lts_20210324::InlinedVector >*, absl::lts_20210324::InlinedVector >*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x2448C1EF: tensorflow::PropagatorState::FrameState::ActivateNodesAndAdjustOutstanding(tensorflow::NodeItem const*, bool, tensorflow::PropagatorState::IterationState*, absl::lts_20210324::InlinedVector >*, absl::lts_20210324::InlinedVector >*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x2448E9BC: tensorflow::PropagatorState::PropagateOutputs(tensorflow::PropagatorState::TaggedNode const&, absl::lts_20210324::InlinedVector >*, absl::lts_20210324::InlinedVector >*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x24482498: tensorflow::(anonymous namespace)::ExecutorState::Process(tensorflow::PropagatorState::TaggedNode, long) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x244833F7: std::_Function_handler::RunTask::ScheduleReady(absl::lts_20210324::InlinedVector >*, tensorflow::PropagatorState::TaggedNodeReadyQueue*)::{lambda()#2}>(tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady(absl::lts_20210324::InlinedVector >*, tensorflow::PropagatorState::TaggedNodeReadyQueue*)::{lambda()#2}&&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x1601E2D1: Eigen::ThreadPoolTempl::WorkerLoop(int) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601A637: std::_Function_handler)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x2490EE58: tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)
==3514== 
VEX temporary storage exhausted.
Pool = TEMP,  start 0x38f8e668 curr 0x39418210 end 0x394531a7 (size 5000000)

vex: the `impossible' happened:
   VEX temporary storage exhausted.
Increase N_{TEMPORARY,PERMANENT}_BYTES and recompile.
vex storage: T total 10668761528 bytes allocated
vex storage: P total 640 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==3514==    at 0x38083F48: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0x38084064: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0x380842A1: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0x380842CA: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0x3809F682: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0x38145428: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0x38145494: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0x3816A8A7: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0x3814342F: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0x380A1C0B: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0x380D296B: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0x380D45CF: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0x380E3946: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0x380E3E1A: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0x3810C62D: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==3514==    by 0xDEADBEEFDEADBEEE: ???
==3514==    by 0xDEADBEEFDEADBEEE: ???
==3514==    by 0xDEADBEEFDEADBEEE: ???

sched status:
  running_tid=25

Thread 1: status = VgTs_WaitSys (lwpid 3514)
==3514==    at 0x25C405D9: syscall (syscall.S:38)
==3514==    by 0x212DB9E2: nsync::nsync_mu_semaphore_p_with_deadline(nsync::nsync_semaphore_s_*, timespec) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x212DB1C0: nsync::nsync_sem_wait_with_cancel_(nsync::waiter*, timespec, nsync::nsync_note_s_*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x212D8651: nsync::nsync_cv_wait_with_deadline_generic(nsync::nsync_cv_s_*, void*, void (*)(void*), void (*)(void*), timespec, nsync::nsync_note_s_*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x212D8B32: nsync::nsync_cv_wait_with_deadline(nsync::nsync_cv_s_*, nsync::nsync_mu_s_*, timespec, nsync::nsync_note_s_*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x24407F3A: tensorflow::Executor::Run(tensorflow::Executor::Args const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x20C57823: tensorflow::DirectSession::RunInternal(long, tensorflow::RunOptions const&, tensorflow::CallFrameInterface*, tensorflow::DirectSession::ExecutorsAndKeys*, tensorflow::RunMetadata*, tensorflow::thread::ThreadPoolOptions const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x20C5A3B7: tensorflow::DirectSession::Run(tensorflow::RunOptions const&, std::vector, std::allocator >, tensorflow::Tensor>, std::allocator, std::allocator >, tensorflow::Tensor> > > const&, std::vector, std::allocator >, std::allocator, std::allocator > > > const&, std::vector, std::allocator >, std::allocator, std::allocator > > > const&, std::vector >*, tensorflow::RunMetadata*, tensorflow::thread::ThreadPoolOptions const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x20C46CE2: tensorflow::DirectSession::Run(tensorflow::RunOptions const&, std::vector, std::allocator >, tensorflow::Tensor>, std::allocator, std::allocator >, tensorflow::Tensor> > > const&, std::vector, std::allocator >, std::allocator, std::allocator > > > const&, std::vector, std::allocator >, std::allocator, std::allocator > > > const&, std::vector >*, tensorflow::RunMetadata*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x20C571AE: tensorflow::DirectSession::Run(std::vector, std::allocator >, tensorflow::Tensor>, std::allocator, std::allocator >, tensorflow::Tensor> > > const&, std::vector, std::allocator >, std::allocator, std::allocator > > > const&, std::vector, std::allocator >, std::allocator, std::allocator > > > const&, std::vector >*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x406457: jinXingCnn::jinXingSingleInference(cv::Mat&, cv::Mat&, int&, float&) (in /home/jumper/xrt/parallel/xrtCNNTry/Release/xrtCNNTry)
==3514==    by 0x404560: main (in /home/jumper/xrt/parallel/xrtCNNTry/Release/xrtCNNTry)

Thread 2: status = VgTs_WaitSys (lwpid 3515)
==3514==    at 0x2611A360: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==3514==    by 0x253050FB: std::condition_variable::wait(std::unique_lock&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==3514==    by 0x1601DE3A: Eigen::ThreadPoolTempl::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601E550: Eigen::ThreadPoolTempl::WorkerLoop(int) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601A637: std::_Function_handler)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x2490EE58: tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 3: status = VgTs_WaitSys (lwpid 3516)
==3514==    at 0x2611A360: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==3514==    by 0x253050FB: std::condition_variable::wait(std::unique_lock&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==3514==    by 0x1601DE3A: Eigen::ThreadPoolTempl::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601E550: Eigen::ThreadPoolTempl::WorkerLoop(int) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601A637: std::_Function_handler)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x2490EE58: tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 4: status = VgTs_WaitSys (lwpid 3517)
==3514==    at 0x2611A360: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==3514==    by 0x253050FB: std::condition_variable::wait(std::unique_lock&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==3514==    by 0x1601DE3A: Eigen::ThreadPoolTempl::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601E550: Eigen::ThreadPoolTempl::WorkerLoop(int) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601A637: std::_Function_handler)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x2490EE58: tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 5: status = VgTs_WaitSys (lwpid 3518)
==3514==    at 0x2611A360: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==3514==    by 0x253050FB: std::condition_variable::wait(std::unique_lock&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==3514==    by 0x1601DE3A: Eigen::ThreadPoolTempl::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601E550: Eigen::ThreadPoolTempl::WorkerLoop(int) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601A637: std::_Function_handler)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x2490EE58: tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 6: status = VgTs_WaitSys (lwpid 3519)
==3514==    at 0x2611A360: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==3514==    by 0x253050FB: std::condition_variable::wait(std::unique_lock&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==3514==    by 0x1601DE3A: Eigen::ThreadPoolTempl::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601E550: Eigen::ThreadPoolTempl::WorkerLoop(int) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601A637: std::_Function_handler)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x2490EE58: tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 7: status = VgTs_WaitSys (lwpid 3520)
==3514==    at 0x2611A360: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==3514==    by 0x253050FB: std::condition_variable::wait(std::unique_lock&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==3514==    by 0x1601DE3A: Eigen::ThreadPoolTempl::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601E550: Eigen::ThreadPoolTempl::WorkerLoop(int) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601A637: std::_Function_handler)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x2490EE58: tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 8: status = VgTs_WaitSys (lwpid 3521)
==3514==    at 0x2611A360: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==3514==    by 0x253050FB: std::condition_variable::wait(std::unique_lock&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==3514==    by 0x1601DE3A: Eigen::ThreadPoolTempl::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601E550: Eigen::ThreadPoolTempl::WorkerLoop(int) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601A637: std::_Function_handler)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x2490EE58: tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 9: status = VgTs_WaitSys (lwpid 3522)
==3514==    at 0x2611A360: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==3514==    by 0x253050FB: std::condition_variable::wait(std::unique_lock&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==3514==    by 0x1601DE3A: Eigen::ThreadPoolTempl::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601E550: Eigen::ThreadPoolTempl::WorkerLoop(int) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601A637: std::_Function_handler)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x2490EE58: tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 10: status = VgTs_WaitSys (lwpid 3523)
==3514==    at 0x2611A360: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==3514==    by 0x253050FB: std::condition_variable::wait(std::unique_lock&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==3514==    by 0x1601DE3A: Eigen::ThreadPoolTempl::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601E550: Eigen::ThreadPoolTempl::WorkerLoop(int) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601A637: std::_Function_handler)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x2490EE58: tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 11: status = VgTs_WaitSys (lwpid 3524)
==3514==    at 0x2611AA26: pthread_cond_signal@@GLIBC_2.3.2 (pthread_cond_signal.S:87)
==3514==    by 0x26EA7ED8: __kmp_resume_64 (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E448D4: _INTERNAL_25_______src_kmp_barrier_cpp_34128d84::__kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E4622C: __kmp_fork_barrier(int, int) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E71457: __kmp_fork_call (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E62E2A: __kmp_GOMP_fork_call (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E64C07: GOMP_parallel@@VERSION (in /usr/local/lib/libiomp5.so)
==3514==    by 0x1E641C4D: dnnl::impl::cpu::x64::jit_avx2_convolution_fwd_t::execute_forward(dnnl::impl::exec_ctx_t const&) const (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1D92B9E8: dnnl::impl::cpu::x64::jit_avx2_convolution_fwd_t::execute(dnnl::impl::exec_ctx_t const&) const (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1D8C0EFB: dnnl_primitive::execute(dnnl::impl::exec_ctx_t&) const (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1D8C1453: dnnl::impl::primitive_execute(dnnl_primitive const*, dnnl::impl::exec_ctx_t&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1D8C1702: dnnl_primitive_execute (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1C5C0E31: tensorflow::MklConvFwdPrimitive::Execute(float const*, float const*, float const*, std::shared_ptr) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1C67F4C2: tensorflow::MklConvOp::Compute(tensorflow::OpKernelContext*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x24482B9F: tensorflow::(anonymous namespace)::ExecutorState::Process(tensorflow::PropagatorState::TaggedNode, long) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x24475E8D: std::_Function_handler::RunTask::*)(tensorflow::PropagatorState::TaggedNode, long)> (tensorflow::(anonymous namespace)::ExecutorState*, tensorflow::PropagatorState::TaggedNode, long)> >(std::_Bind::*)(tensorflow::PropagatorState::TaggedNode, long)> (tensorflow::(anonymous namespace)::ExecutorState*, tensorflow::PropagatorState::TaggedNode, long)>&&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x1601E2D1: Eigen::ThreadPoolTempl::WorkerLoop(int) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601A637: std::_Function_handler)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x2490EE58: tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 12: status = VgTs_WaitSys (lwpid 3525)
==3514==    at 0x2611A360: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==3514==    by 0x253050FB: std::condition_variable::wait(std::unique_lock&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==3514==    by 0x1601DE3A: Eigen::ThreadPoolTempl::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601E650: Eigen::ThreadPoolTempl::WorkerLoop(int) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x1601A637: std::_Function_handler)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_cc.so.2.6.0)
==3514==    by 0x2490EE58: tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) (in /home/jumper/workspace/tensorflow_mklomp2.6/lib/libtensorflow_framework.so.2.6.0)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 13: status = VgTs_WaitSys (lwpid 3527)
==3514==    at 0x25C405D9: syscall (syscall.S:38)
==3514==    by 0x269A3DCC: tbb::internal::rml::private_worker::run() (in /usr/local/lib/libtbb.so)
==3514==    by 0x269A3E08: tbb::internal::rml::private_worker::thread_routine(void*) (in /usr/local/lib/libtbb.so)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 14: status = VgTs_WaitSys (lwpid 3528)
==3514==    at 0x25C405D9: syscall (syscall.S:38)
==3514==    by 0x269A3DCC: tbb::internal::rml::private_worker::run() (in /usr/local/lib/libtbb.so)
==3514==    by 0x269A3E08: tbb::internal::rml::private_worker::thread_routine(void*) (in /usr/local/lib/libtbb.so)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 15: status = VgTs_WaitSys (lwpid 3529)
==3514==    at 0x25C405D9: syscall (syscall.S:38)
==3514==    by 0x269A3DCC: tbb::internal::rml::private_worker::run() (in /usr/local/lib/libtbb.so)
==3514==    by 0x269A3E08: tbb::internal::rml::private_worker::thread_routine(void*) (in /usr/local/lib/libtbb.so)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 16: status = VgTs_WaitSys (lwpid 3530)
==3514==    at 0x25C405D9: syscall (syscall.S:38)
==3514==    by 0x269A3DCC: tbb::internal::rml::private_worker::run() (in /usr/local/lib/libtbb.so)
==3514==    by 0x269A3E08: tbb::internal::rml::private_worker::thread_routine(void*) (in /usr/local/lib/libtbb.so)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 17: status = VgTs_WaitSys (lwpid 3531)
==3514==    at 0x25C405D9: syscall (syscall.S:38)
==3514==    by 0x269A3DCC: tbb::internal::rml::private_worker::run() (in /usr/local/lib/libtbb.so)
==3514==    by 0x269A3E08: tbb::internal::rml::private_worker::thread_routine(void*) (in /usr/local/lib/libtbb.so)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 18: status = VgTs_WaitSys (lwpid 3532)
==3514==    at 0x25C405D9: syscall (syscall.S:38)
==3514==    by 0x269A3DCC: tbb::internal::rml::private_worker::run() (in /usr/local/lib/libtbb.so)
==3514==    by 0x269A3E08: tbb::internal::rml::private_worker::thread_routine(void*) (in /usr/local/lib/libtbb.so)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 19: status = VgTs_WaitSys (lwpid 3533)
==3514==    at 0x25C405D9: syscall (syscall.S:38)
==3514==    by 0x269A3DCC: tbb::internal::rml::private_worker::run() (in /usr/local/lib/libtbb.so)
==3514==    by 0x269A3E08: tbb::internal::rml::private_worker::thread_routine(void*) (in /usr/local/lib/libtbb.so)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 20: status = VgTs_WaitSys (lwpid 3538)
==3514==    at 0x2611A360: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==3514==    by 0x26EA5CE8: __kmp_suspend_64 (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E44CEA: _INTERNAL_25_______src_kmp_barrier_cpp_34128d84::__kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E4622C: __kmp_fork_barrier(int, int) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E6F81F: __kmp_launch_thread (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26EA1FA3: _INTERNAL_26_______src_z_Linux_util_cpp_16f8393c::__kmp_launch_worker(void*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 21: status = VgTs_WaitSys (lwpid 3539)
==3514==    at 0x2611A360: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==3514==    by 0x26EA5CE8: __kmp_suspend_64 (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E44CEA: _INTERNAL_25_______src_kmp_barrier_cpp_34128d84::__kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E4622C: __kmp_fork_barrier(int, int) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E6F81F: __kmp_launch_thread (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26EA1FA3: _INTERNAL_26_______src_z_Linux_util_cpp_16f8393c::__kmp_launch_worker(void*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 22: status = VgTs_WaitSys (lwpid 3540)
==3514==    at 0x2611A360: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==3514==    by 0x26EA5CE8: __kmp_suspend_64 (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E44CEA: _INTERNAL_25_______src_kmp_barrier_cpp_34128d84::__kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E4622C: __kmp_fork_barrier(int, int) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E6F81F: __kmp_launch_thread (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26EA1FA3: _INTERNAL_26_______src_z_Linux_util_cpp_16f8393c::__kmp_launch_worker(void*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 23: status = VgTs_WaitSys (lwpid 3541)
==3514==    at 0x2611AA26: pthread_cond_signal@@GLIBC_2.3.2 (pthread_cond_signal.S:87)
==3514==    by 0x26EA7ED8: __kmp_resume_64 (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E448D4: _INTERNAL_25_______src_kmp_barrier_cpp_34128d84::__kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E4622C: __kmp_fork_barrier(int, int) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E6F81F: __kmp_launch_thread (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26EA1FA3: _INTERNAL_26_______src_z_Linux_util_cpp_16f8393c::__kmp_launch_worker(void*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 24: status = VgTs_WaitSys (lwpid 3542)
==3514==    at 0x2611D26D: __lll_lock_wait (lowlevellock.S:135)
==3514==    by 0x26118C50: __pthread_mutex_cond_lock (pthread_mutex_lock.c:80)
==3514==    by 0x2611A3EF: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:259)
==3514==    by 0x26EA5CE8: __kmp_suspend_64 (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E44CEA: _INTERNAL_25_______src_kmp_barrier_cpp_34128d84::__kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E4622C: __kmp_fork_barrier(int, int) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E6F81F: __kmp_launch_thread (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26EA1FA3: _INTERNAL_26_______src_z_Linux_util_cpp_16f8393c::__kmp_launch_worker(void*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 25: status = VgTs_Runnable (lwpid 3543)
==3514==    at 0x411F1AE: ???
==3514==    by 0x26EA1AC2: __kmp_invoke_microtask (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E70256: __kmp_invoke_task_func (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E6F8D4: __kmp_launch_thread (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26EA1FA3: _INTERNAL_26_______src_z_Linux_util_cpp_16f8393c::__kmp_launch_worker(void*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)

Thread 26: status = VgTs_WaitSys (lwpid 3544)
==3514==    at 0x2611A360: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:185)
==3514==    by 0x26EA5CE8: __kmp_suspend_64 (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E44CEA: _INTERNAL_25_______src_kmp_barrier_cpp_34128d84::__kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E4622C: __kmp_fork_barrier(int, int) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26E6F81F: __kmp_launch_thread (in /usr/local/lib/libiomp5.so)
==3514==    by 0x26EA1FA3: _INTERNAL_26_______src_z_Linux_util_cpp_16f8393c::__kmp_launch_worker(void*) (in /usr/local/lib/libiomp5.so)
==3514==    by 0x261146B9: start_thread (pthread_create.c:333)
==3514==    by 0x25C4651C: clone (clone.S:109)


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

root@rootwd-Default-string:/home/jumper/xrt/parallel/xrtCNNTry/Release# 


9,这里又说直接使用intel_tensorflow:https://blog.csdn.net/gaussrieman123/article/details/109031115?spm=1001.2014.3001.5501

为什么这么慢呢?编译的这个版本明明没问题了,也没博主说的内存泄漏。到底什么原因导致的?

以下显示当时编译也是没有任何问题的。

Tensorflow2.6.0-MKL for C++_第4张图片Tensorflow2.6.0-MKL for C++_第5张图片而且设置也无问题,跑起来的时候核都飞起了。

Tensorflow2.6.0-MKL for C++_第6张图片但就是时间很慢很慢:

2021-11-04 09:20:09.222941: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
image 0  contoursnumber: 1 time: 165.404 ms!
image 1  contoursnumber: 1 time: 6.88905 ms!
image 2  contoursnumber: 1 time: 3.54466 ms!
image 3  contoursnumber: 1 time: 3.56257 ms!
image 4  contoursnumber: 1 time: 8.56913 ms!
image 5  contoursnumber: 1 time: 24.2727 ms!
image 6  contoursnumber: 1 time: 11.8442 ms!
image 7  contoursnumber: 2 time: 30.6798 ms!
image 8  contoursnumber: 1 time: 18.9718 ms!
image 9  contoursnumber: 1 time: 181.086 ms!
image 10  contoursnumber: 1 time: 34.2604 ms!
image 11  contoursnumber: 4 time: 94.7321 ms!
image 12  contoursnumber: 1 time: 2.78627 ms!
image 13  contoursnumber: 2 time: 17.1375 ms!
image 14  contoursnumber: 1 time: 10.5116 ms!
image 15  contoursnumber: 3 time: 69.1845 ms!
image 16  contoursnumber: 2 time: 54.063 ms!
image 17  contoursnumber: 5 time: 70.7947 ms!
image 18  contoursnumber: 3 time: 127.469 ms!
image 19  contoursnumber: 4 time: 137.692 ms!
image 20  contoursnumber: 5 time: 39.2106 ms!
image 21  contoursnumber: 10 time: 70.2001 ms!
image 22  contoursnumber: 6 time: 58.2006 ms!
image 23  contoursnumber: 9 time: 145.273 ms!
image 24  contoursnumber: 16 time: 164.113 ms!
image 25  contoursnumber: 14 time: 140.407 ms!
image 26  contoursnumber: 19 time: 368.37 ms!
image 27  contoursnumber: 16 time: 177.934 ms!
image 28  contoursnumber: 14 time: 253.712 ms!
image 29  contoursnumber: 16 time: 68.7066 ms!
image 30  contoursnumber: 20 time: 308.638 ms!
image 31  contoursnumber: 35 time: 621.751 ms!
image 32  contoursnumber: 35 time: 294.543 ms!
image 33  contoursnumber: 33 time: 190.038 ms!
image 34  contoursnumber: 33 time: 384.437 ms!
image 35  contoursnumber: 35 time: 641.447 ms!
image 36  contoursnumber: 30 time: 117.715 ms!
image 37  contoursnumber: 43 time: 118.032 ms!
image 38  contoursnumber: 29 time: 138.433 ms!
image 39  contoursnumber: 40 time: 250.087 ms!
image 40  contoursnumber: 60 time: 543.546 ms!
image 41  contoursnumber: 59 time: 604.045 ms!
image 42  contoursnumber: 68 time: 363.982 ms!
image 43  contoursnumber: 60 time: 473.302 ms!
image 44  contoursnumber: 83 time: 786.114 ms!
image 45  contoursnumber: 71 time: 238.955 ms!
image 46  contoursnumber: 83 time: 560.018 ms!
image 47  contoursnumber: 85 time: 1055.17 ms!
image 48  contoursnumber: 77 time: 612.51 ms!
image 49  contoursnumber: 82 time: 871.528 ms!
image 50  contoursnumber: 95 time: 503.243 ms!
image 51  contoursnumber: 107 time: 789.133 ms!
image 52  contoursnumber: 101 time: 751.478 ms!
image 53  contoursnumber: 95 time: 521.219 ms!
image 54  contoursnumber: 74 time: 348.509 ms!
image 55  contoursnumber: 95 time: 700.569 ms!
image 56  contoursnumber: 86 time: 563.311 ms!
image 57  contoursnumber: 94 time: 696.34 ms!
image 58  contoursnumber: 104 time: 678.648 ms!
image 59  contoursnumber: 93 time: 689.167 ms!
image 60  contoursnumber: 128 time: 785.795 ms!
image 61  contoursnumber: 104 time: 370.197 ms!
image 62  contoursnumber: 101 time: 695.368 ms!
image 63  contoursnumber: 99 time: 386.249 ms!
image 64  contoursnumber: 103 time: 654.749 ms!
image 65  contoursnumber: 111 time: 350.074 ms!
image 66  contoursnumber: 129 time: 1702.72 ms!
image 67  contoursnumber: 109 time: 544.824 ms!
image 68  contoursnumber: 121 time: 1259.43 ms!
image 69  contoursnumber: 116 time: 634.254 ms!
image 70  contoursnumber: 112 time: 1078.5 ms!
image 71  contoursnumber: 113 time: 448.014 ms!
image 72  contoursnumber: 115 time: 578.946 ms!
image 73  contoursnumber: 104 time: 866.64 ms!
image 74  contoursnumber: 95 time: 839.735 ms!
image 75  contoursnumber: 108 time: 622.835 ms!
image 76  contoursnumber: 77 time: 498.654 ms!
image 77  contoursnumber: 115 time: 1157.03 ms!
image 78  contoursnumber: 103 time: 2103.04 ms!
image 79  contoursnumber: 91 time: 421.994 ms!
image 80  contoursnumber: 84 time: 532.462 ms!
image 81  contoursnumber: 88 time: 732.916 ms!
image 82  contoursnumber: 72 time: 553.692 ms!
image 83  contoursnumber: 74 time: 278.869 ms!
image 84  contoursnumber: 75 time: 370.129 ms!
image 85  contoursnumber: 72 time: 568.95 ms!
image 86  contoursnumber: 71 time: 935.075 ms!
image 87  contoursnumber: 58 time: 611.403 ms!
image 88  contoursnumber: 61 time: 608.242 ms!
image 89  contoursnumber: 68 time: 638.952 ms!
image 90  contoursnumber: 49 time: 679.073 ms!
image 91  contoursnumber: 52 time: 197.024 ms!
image 92  contoursnumber: 57 time: 213.327 ms!
image 93  contoursnumber: 60 time: 145.864 ms!
image 94  contoursnumber: 51 time: 178.088 ms!
image 95  contoursnumber: 20 time: 90.9068 ms!
image 96  contoursnumber: 29 time: 575.21 ms!
image 97  contoursnumber: 21 time: 142.668 ms!
image 98  contoursnumber: 10 time: 155.449 ms!
image 99  contoursnumber: 7 time: 56.3241 ms!
image 100  contoursnumber: 4 time: 112.157 ms!
image 101  contoursnumber: 3 time: 48.2224 ms!
image 102  contoursnumber: 1 time: 4.49101 ms!
image 103  contoursnumber: 1 time: 9.43159 ms!
image 104  contoursnumber: 2 time: 31.763 ms!
image 105  contoursnumber: 2 time: 259.153 ms!
image 106  contoursnumber: 1 time: 8.34897 ms!
image 107  contoursnumber: 1 time: 14.1501 ms!
image 108  contoursnumber: 1 time: 13.4157 ms!
image 109  contoursnumber: 1 time: 15.2861 ms!
image 110  contoursnumber: 1 time: 30.0232 ms!
mean time 385.456 ms
max time 2103.04 ms
root@rootwd-Default-string:/home/jumper/xrt/parallel/xrtCNNTry/Release# 

比我去年什么都没有搞的cpu版本还慢很多,下面是去年的速度:

image 0  contoursnumber: 1 time: 215.328 ms!
image 1  contoursnumber: 1 time: 3.9097 ms!
image 2  contoursnumber: 1 time: 4.38048 ms!
image 3  contoursnumber: 1 time: 4.9417 ms!
image 4  contoursnumber: 1 time: 3.99377 ms!
image 5  contoursnumber: 1 time: 4.54973 ms!
image 6  contoursnumber: 1 time: 4.42919 ms!
image 7  contoursnumber: 2 time: 8.74767 ms!
image 8  contoursnumber: 1 time: 4.49718 ms!
image 9  contoursnumber: 1 time: 4.54444 ms!
image 10  contoursnumber: 1 time: 4.26188 ms!
image 11  contoursnumber: 4 time: 14.6775 ms!
image 12  contoursnumber: 1 time: 3.85717 ms!
image 13  contoursnumber: 2 time: 7.7923 ms!
image 14  contoursnumber: 1 time: 4.02385 ms!
image 15  contoursnumber: 3 time: 11.6701 ms!
image 16  contoursnumber: 2 time: 7.56668 ms!
image 17  contoursnumber: 5 time: 17.2285 ms!
image 18  contoursnumber: 3 time: 10.9037 ms!
image 19  contoursnumber: 4 time: 15.8927 ms!
image 20  contoursnumber: 5 time: 17.9607 ms!
image 21  contoursnumber: 10 time: 34.51 ms!
image 22  contoursnumber: 6 time: 21.092 ms!
image 23  contoursnumber: 9 time: 32.4407 ms!
image 24  contoursnumber: 16 time: 55.0001 ms!
image 25  contoursnumber: 14 time: 47.8659 ms!
image 26  contoursnumber: 19 time: 67.684 ms!
image 27  contoursnumber: 16 time: 56.6169 ms!
image 28  contoursnumber: 14 time: 47.0008 ms!
image 29  contoursnumber: 16 time: 54.8117 ms!
image 30  contoursnumber: 20 time: 67.6562 ms!
image 31  contoursnumber: 35 time: 117.237 ms!
image 32  contoursnumber: 35 time: 110.993 ms!
image 33  contoursnumber: 33 time: 110.121 ms!
image 34  contoursnumber: 33 time: 106.439 ms!
image 35  contoursnumber: 35 time: 118.31 ms!
image 36  contoursnumber: 30 time: 98.951 ms!
image 37  contoursnumber: 43 time: 141.166 ms!
image 38  contoursnumber: 29 time: 95.6124 ms!
image 39  contoursnumber: 40 time: 131.273 ms!
image 40  contoursnumber: 60 time: 196.478 ms!
image 41  contoursnumber: 59 time: 198.934 ms!
image 42  contoursnumber: 68 time: 216.352 ms!
image 43  contoursnumber: 60 time: 193.67 ms!
image 44  contoursnumber: 83 time: 259.158 ms!
image 45  contoursnumber: 71 time: 236.651 ms!
image 46  contoursnumber: 83 time: 262.82 ms!
image 47  contoursnumber: 85 time: 280.913 ms!
image 48  contoursnumber: 77 time: 247.691 ms!
image 49  contoursnumber: 82 time: 264.615 ms!
image 50  contoursnumber: 95 time: 298.288 ms!
image 51  contoursnumber: 107 time: 352.973 ms!
image 52  contoursnumber: 101 time: 333.127 ms!
image 53  contoursnumber: 95 time: 304.485 ms!
image 54  contoursnumber: 74 time: 242.383 ms!
image 55  contoursnumber: 95 time: 302.55 ms!
image 56  contoursnumber: 86 time: 281.847 ms!
image 57  contoursnumber: 94 time: 314.126 ms!
image 58  contoursnumber: 104 time: 344.507 ms!
image 59  contoursnumber: 93 time: 299.361 ms!
image 60  contoursnumber: 128 time: 403.624 ms!
image 61  contoursnumber: 104 time: 338.79 ms!
image 62  contoursnumber: 101 time: 349.402 ms!
image 63  contoursnumber: 99 time: 317.887 ms!
image 64  contoursnumber: 103 time: 337.589 ms!
image 65  contoursnumber: 111 time: 350.287 ms!
image 66  contoursnumber: 129 time: 405.237 ms!
image 67  contoursnumber: 109 time: 370.085 ms!
image 68  contoursnumber: 121 time: 403.513 ms!
image 69  contoursnumber: 116 time: 376.87 ms!
image 70  contoursnumber: 112 time: 377.656 ms!
image 71  contoursnumber: 113 time: 376.028 ms!
image 72  contoursnumber: 115 time: 388.223 ms!
image 73  contoursnumber: 104 time: 337.376 ms!
image 74  contoursnumber: 95 time: 308.087 ms!
image 75  contoursnumber: 108 time: 334.705 ms!
image 76  contoursnumber: 77 time: 250.793 ms!
image 77  contoursnumber: 115 time: 378.777 ms!
image 78  contoursnumber: 103 time: 330.838 ms!
image 79  contoursnumber: 91 time: 298.235 ms!
image 80  contoursnumber: 84 time: 264.395 ms!
image 81  contoursnumber: 88 time: 303.579 ms!
image 82  contoursnumber: 72 time: 234.702 ms!
image 83  contoursnumber: 74 time: 236.552 ms!
image 84  contoursnumber: 75 time: 243.109 ms!
image 85  contoursnumber: 72 time: 232.462 ms!
image 86  contoursnumber: 71 time: 231.71 ms!
image 87  contoursnumber: 58 time: 188.411 ms!
image 88  contoursnumber: 61 time: 195.348 ms!
image 89  contoursnumber: 68 time: 237.083 ms!
image 90  contoursnumber: 49 time: 157.995 ms!
image 91  contoursnumber: 52 time: 167.288 ms!
image 92  contoursnumber: 57 time: 188.229 ms!
image 93  contoursnumber: 60 time: 193.937 ms!
image 94  contoursnumber: 51 time: 158.905 ms!
image 95  contoursnumber: 20 time: 72.3785 ms!
image 96  contoursnumber: 29 time: 93.3377 ms!
image 97  contoursnumber: 21 time: 69.2373 ms!
image 98  contoursnumber: 10 time: 36.3054 ms!
image 99  contoursnumber: 7 time: 26.6407 ms!
image 100  contoursnumber: 4 time: 15.0212 ms!
image 101  contoursnumber: 3 time: 11.5187 ms!
image 102  contoursnumber: 1 time: 4.57232 ms!
image 103  contoursnumber: 1 time: 4.56704 ms!
image 104  contoursnumber: 2 time: 7.91162 ms!
image 105  contoursnumber: 2 time: 8.47972 ms!
image 106  contoursnumber: 1 time: 4.10723 ms!
image 107  contoursnumber: 1 time: 4.14542 ms!
image 108  contoursnumber: 1 time: 4.46583 ms!
image 109  contoursnumber: 1 time: 4.3269 ms!
image 110  contoursnumber: 1 time: 4.10413 ms!

无法接受。难道不能编译v2版本?还是说真的只能编译intel-tensorflow版本?

终于找啊找啊找,找到了原因:原来是没有按intel介绍的export这些命令:于是我加上


https://cloud.tencent.com/document/product/213/55669

https://github.com/IntelAI/models/blob/master/docs/general/tensorflow/GeneralBestPractices.md

lscpu | grep "Core(s) per socket" | cut -d':' -f2 | xargs
export OMP_NUM_THREADS= # 
export KMP_AFFINITY="granularity=fine,verbose,compact,1,0"
export KMP_BLOCKTIME=1
export KMP_SETTINGS=1
export TF_NUM_INTRAOP_THREADS= # 
export TF_NUM_INTEROP_THREADS=1
export TF_ENABLE_MKL_NATIVE_FORMAT=0

然后再次运行,看到了效果啊!!!!

root@rootwd-Default-string:/home/jumper/xrt/parallel/xrtCNNTry/Release# ./xrtCNNTry 

libgomp: Invalid value for environment variable OMP_NUM_THREADS
OMP: Warning #227: OMP_NUM_THREADS: Invalid symbols found. Check the value "".

User settings:

   KMP_AFFINITY=granularity=fine,verbose,compact,1,0
   KMP_BLOCKTIME=1
   KMP_SETTINGS=1
   OMP_NUM_THREADS=

Effective settings:

   KMP_ABORT_DELAY=0
   KMP_ADAPTIVE_LOCK_PROPS='1,1024'
   KMP_ALIGN_ALLOC=64
   KMP_ALL_THREADPRIVATE=128
   KMP_ATOMIC_MODE=2
   KMP_BLOCKTIME=1
   KMP_CPUINFO_FILE: value is not defined
   KMP_DETERMINISTIC_REDUCTION=false
   KMP_DEVICE_THREAD_LIMIT=2147483647
   KMP_DISP_NUM_BUFFERS=7
   KMP_DUPLICATE_LIB_OK=false
   KMP_FORCE_REDUCTION: value is not defined
   KMP_FOREIGN_THREADS_THREADPRIVATE=true
   KMP_FORKJOIN_BARRIER='2,2'
   KMP_FORKJOIN_BARRIER_PATTERN='hyper,hyper'
   KMP_FORKJOIN_FRAMES=true
   KMP_FORKJOIN_FRAMES_MODE=3
   KMP_GTID_MODE=3
   KMP_HANDLE_SIGNALS=false
   KMP_HOT_TEAMS_MAX_LEVEL=1
   KMP_HOT_TEAMS_MODE=0
   KMP_INIT_AT_FORK=true
   KMP_INIT_WAIT=2048
   KMP_ITT_PREPARE_DELAY=0
   KMP_LIBRARY=throughput
   KMP_LOCK_KIND=queuing
   KMP_MALLOC_POOL_INCR=1M
   KMP_NEXT_WAIT=1024
   KMP_NUM_LOCKS_IN_BLOCK=1
   KMP_PLAIN_BARRIER='2,2'
   KMP_PLAIN_BARRIER_PATTERN='hyper,hyper'
   KMP_REDUCTION_BARRIER='1,1'
   KMP_REDUCTION_BARRIER_PATTERN='hyper,hyper'
   KMP_SCHEDULE='static,balanced;guided,iterative'
   KMP_SETTINGS=true
   KMP_SPIN_BACKOFF_PARAMS='4096,100'
   KMP_STACKOFFSET=64
   KMP_STACKPAD=0
   KMP_STACKSIZE=4M
   KMP_STORAGE_MAP=false
   KMP_TASKING=2
   KMP_TASKLOOP_MIN_TASKS=0
   KMP_TASK_STEALING_CONSTRAINT=1
   KMP_TEAMS_THREAD_LIMIT=8
   KMP_TOPOLOGY_METHOD=all
   KMP_USER_LEVEL_MWAIT=false
   KMP_VERSION=false
   KMP_WARNINGS=true
   OMP_CANCELLATION=false
   OMP_DEFAULT_DEVICE=0
   OMP_DISPLAY_ENV=false
   OMP_DYNAMIC=false
   OMP_MAX_ACTIVE_LEVELS=2147483647
   OMP_MAX_TASK_PRIORITY=0
   OMP_NESTED=false
   OMP_NUM_THREADS: value is not defined
   OMP_PLACES: value is not defined
   OMP_PROC_BIND='intel'
   OMP_SCHEDULE='static'
   OMP_STACKSIZE=4M
   OMP_THREAD_LIMIT=2147483647
   OMP_WAIT_POLICY=PASSIVE
   KMP_AFFINITY='verbose,warnings,respect,granularity=fine,compact,1,0'

OMP: Info #209: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #207: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #211: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
OMP: Info #247: KMP_AFFINITY: pid 3115 tid 3124 thread 0 bound to OS proc set {0}
OMP: Info #247: KMP_AFFINITY: pid 3115 tid 3135 thread 1 bound to OS proc set {1}
OMP: Info #247: KMP_AFFINITY: pid 3115 tid 3136 thread 2 bound to OS proc set {2}
OMP: Info #247: KMP_AFFINITY: pid 3115 tid 3137 thread 3 bound to OS proc set {3}
OMP: Info #247: KMP_AFFINITY: pid 3115 tid 3138 thread 4 bound to OS proc set {4}
OMP: Info #247: KMP_AFFINITY: pid 3115 tid 3139 thread 5 bound to OS proc set {5}
OMP: Info #247: KMP_AFFINITY: pid 3115 tid 3140 thread 6 bound to OS proc set {6}
OMP: Info #247: KMP_AFFINITY: pid 3115 tid 3141 thread 7 bound to OS proc set {7}
image 0  contoursnumber: 1 time: 500.349 ms!
image 1  contoursnumber: 1 time: 8.03288 ms!
image 2  contoursnumber: 1 time: 9.97024 ms!
image 3  contoursnumber: 1 time: 2.0015 ms!
image 4  contoursnumber: 1 time: 3.70273 ms!
image 5  contoursnumber: 1 time: 1.80889 ms!
image 6  contoursnumber: 1 time: 1.80585 ms!
image 7  contoursnumber: 2 time: 3.36128 ms!
image 8  contoursnumber: 1 time: 1.83141 ms!
image 9  contoursnumber: 1 time: 1.84443 ms!
image 10  contoursnumber: 1 time: 1.99455 ms!
image 11  contoursnumber: 4 time: 5.84807 ms!
image 12  contoursnumber: 1 time: 1.80329 ms!
image 13  contoursnumber: 2 time: 3.22477 ms!
image 14  contoursnumber: 1 time: 1.82 ms!
image 15  contoursnumber: 3 time: 4.60704 ms!
image 16  contoursnumber: 2 time: 3.2597 ms!
image 17  contoursnumber: 5 time: 7.34827 ms!
image 18  contoursnumber: 3 time: 4.52042 ms!
image 19  contoursnumber: 4 time: 5.95517 ms!
image 20  contoursnumber: 5 time: 7.40705 ms!
image 21  contoursnumber: 10 time: 13.9073 ms!
image 22  contoursnumber: 6 time: 8.53849 ms!
image 23  contoursnumber: 9 time: 12.786 ms!
image 24  contoursnumber: 16 time: 22.1717 ms!
image 25  contoursnumber: 14 time: 35.0056 ms!
image 26  contoursnumber: 19 time: 25.7021 ms!
image 27  contoursnumber: 16 time: 21.7129 ms!
image 28  contoursnumber: 14 time: 19.0482 ms!
image 29  contoursnumber: 16 time: 21.5907 ms!
image 30  contoursnumber: 20 time: 26.9468 ms!
image 31  contoursnumber: 35 time: 46.3722 ms!
image 32  contoursnumber: 35 time: 46.4969 ms!
image 33  contoursnumber: 33 time: 59.5481 ms!
image 34  contoursnumber: 33 time: 44.0666 ms!
image 35  contoursnumber: 35 time: 46.5968 ms!
image 36  contoursnumber: 30 time: 40.0138 ms!
image 37  contoursnumber: 43 time: 56.8861 ms!
image 38  contoursnumber: 29 time: 38.9458 ms!
image 39  contoursnumber: 40 time: 68.6904 ms!
image 40  contoursnumber: 60 time: 79.8226 ms!
image 41  contoursnumber: 59 time: 79.163 ms!
image 42  contoursnumber: 68 time: 89.7483 ms!
image 43  contoursnumber: 60 time: 79.693 ms!
image 44  contoursnumber: 83 time: 116.971 ms!
image 45  contoursnumber: 71 time: 93.7241 ms!
image 46  contoursnumber: 83 time: 109.284 ms!
image 47  contoursnumber: 85 time: 132.655 ms!
image 48  contoursnumber: 77 time: 101.51 ms!
image 49  contoursnumber: 82 time: 108.38 ms!
image 50  contoursnumber: 95 time: 124.985 ms!
image 51  contoursnumber: 107 time: 156.501 ms!
image 52  contoursnumber: 101 time: 133.039 ms!
image 53  contoursnumber: 95 time: 127.139 ms!
image 54  contoursnumber: 74 time: 125.535 ms!
image 55  contoursnumber: 95 time: 134.212 ms!
image 56  contoursnumber: 86 time: 113.548 ms!
image 57  contoursnumber: 94 time: 152.801 ms!
image 58  contoursnumber: 104 time: 144.552 ms!
image 59  contoursnumber: 93 time: 127.866 ms!
image 60  contoursnumber: 128 time: 197.675 ms!
image 61  contoursnumber: 104 time: 144.514 ms!
image 62  contoursnumber: 101 time: 141.193 ms!
image 63  contoursnumber: 99 time: 160.731 ms!
image 64  contoursnumber: 103 time: 141.909 ms!
image 65  contoursnumber: 111 time: 151.289 ms!
image 66  contoursnumber: 129 time: 203.25 ms!
image 67  contoursnumber: 109 time: 150.636 ms!
image 68  contoursnumber: 121 time: 165.343 ms!
image 69  contoursnumber: 116 time: 180.688 ms!
image 70  contoursnumber: 112 time: 155.768 ms!
image 71  contoursnumber: 113 time: 156.376 ms!
image 72  contoursnumber: 115 time: 180.892 ms!
image 73  contoursnumber: 104 time: 144.901 ms!
image 74  contoursnumber: 95 time: 133.066 ms!
image 75  contoursnumber: 108 time: 170.592 ms!
image 76  contoursnumber: 77 time: 110.238 ms!
image 77  contoursnumber: 115 time: 158.94 ms!
image 78  contoursnumber: 103 time: 163.685 ms!
image 79  contoursnumber: 91 time: 125.024 ms!
image 80  contoursnumber: 84 time: 114.479 ms!
image 81  contoursnumber: 88 time: 120.804 ms!
image 82  contoursnumber: 72 time: 125.362 ms!
image 83  contoursnumber: 74 time: 103.037 ms!
image 84  contoursnumber: 75 time: 103.719 ms!
image 85  contoursnumber: 72 time: 100.35 ms!
image 86  contoursnumber: 71 time: 119.934 ms!
image 87  contoursnumber: 58 time: 87.3322 ms!
image 88  contoursnumber: 61 time: 88.1855 ms!
image 89  contoursnumber: 68 time: 97.5548 ms!
image 90  contoursnumber: 49 time: 95.7405 ms!
image 91  contoursnumber: 52 time: 76.9936 ms!
image 92  contoursnumber: 57 time: 83.1304 ms!
image 93  contoursnumber: 60 time: 86.7529 ms!
image 94  contoursnumber: 51 time: 75.4882 ms!
image 95  contoursnumber: 20 time: 33.4222 ms!
image 96  contoursnumber: 29 time: 45.5786 ms!
image 97  contoursnumber: 21 time: 33.7253 ms!
image 98  contoursnumber: 10 time: 21.7852 ms!
image 99  contoursnumber: 7 time: 17.7007 ms!
image 100  contoursnumber: 4 time: 13.0195 ms!
image 101  contoursnumber: 3 time: 12.3081 ms!
image 102  contoursnumber: 1 time: 9.95971 ms!
image 103  contoursnumber: 1 time: 9.39383 ms!
image 104  contoursnumber: 2 time: 3.38718 ms!
image 105  contoursnumber: 2 time: 3.18975 ms!
image 106  contoursnumber: 1 time: 1.76616 ms!
image 107  contoursnumber: 1 time: 1.80187 ms!
image 108  contoursnumber: 1 time: 2.79997 ms!
image 109  contoursnumber: 1 time: 1.85328 ms!
image 110  contoursnumber: 1 time: 1.7689 ms!
mean time 75.6548 ms
max time 500.349 ms
root@rootwd-Default-string:/home/jumper/xrt/parallel/xrtCNNTry/Release# 

看吧比去年的CPU版本快了很多很多很多啊!!!开心。对了我的电脑是i7-6700。

下次在i9上试下。在Intel Core i9-9900K上试了下:

i9-9900K上的设置:copy tensorflow_mklomp2.6

0,vi /etc/ld.so.conf
/home/jumper/workspace/tensorflow_mklomp2.6/lib

1,error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory

solution:https://oneapi-src.github.io/oneDNN/dev_guide_build.html

	root@ubuntu:~/oneDNN/build# cmake ..
	-- DNNL_LIBRARY_NAME: dnnl
	-- Could NOT find Doxygen (missing:  DOXYGEN_EXECUTABLE) 
	-- Could NOT find Doxyrest (missing:  DOXYREST_EXECUTABLE) 
	-- Could NOT find PythonInterp (missing:  PYTHON_EXECUTABLE) (Required is at least version "2.7")
	-- Could NOT find Sphinx (missing:  SPHINX_EXECUTABLE) 
	-- Enabled workload: TRAINING
	-- Enabled primitives: ALL
	-- Enabled primitive CPU ISA: ALL
	-- Primitive cache is enabled
	-- Configuring done
	-- Generating done
	-- Build files have been written to: /root/oneDNN/build

	solution:apt install python

		 https://www.gxlcms.com/mysql-319320.html:
					wget http://sphinxsearch.com/files/sphinx-2.2.5-release.tar.gz
							  
					tar zxvf filename.tar.gz
                                        ./configure --prefix=/usr/local/sphinx
					checking MySQL include files... configure: error: missing include files.

					******************************************************************************
					ERROR: cannot find MySQL include files.

					Check that you do have MySQL include files installed.
					The package name is typically 'mysql-devel'.

					If include files are installed on your system, but you are still getting
					this message, you should do one of the following:

					1) either specify includes location explicitly, using --with-mysql-includes;
					2) or specify MySQL installation root location explicitly, using --with-mysql;
					3) or make sure that the path to 'mysql_config' program is listed in
					   your PATH environment variable.

					To disable MySQL support, use --without-mysql option.
					******************************************************************************
				        solution:apt install libmysqlclient-dev
					
					make && make install
	solution:apt install python-sphinx
		 apt install doxygen
	root@ubuntu:~/oneDNN/build# cmake ..
	-- DNNL_LIBRARY_NAME: dnnl
	-- Could NOT find Doxyrest (missing:  DOXYREST_EXECUTABLE) 

2,https://blog.csdn.net/ak18888/article/details/102409781

cp /opt/intel/mkldnn/lib/libmkldnn.so.1.0 /usr/local/lib
cp /opt/intel/mkldnn/lib/libmkldnn.so.1 /usr/local/lib
cp /opt/intel/mkldnn/lib/libmkldnn.so /usr/local/lib

error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory

3,https://www.cnblogs.com/lion-zheng/p/9434467.html
最终还是将另一台电脑里装的mkl-dnn-master的external下的libmklml_intel.so、libiomp5.so拷贝到/usr/local/lib下。然后没报错了

然后测试发现时间并不怎么快啊:

root@ubuntu:/home/jumper/xrt/projects# ./xrtCNNTry

libgomp: Invalid value for environment variable OMP_NUM_THREADS
OMP: Warning #230: OMP_NUM_THREADS: Invalid symbols found. Check the value "".

User settings:

   KMP_AFFINITY=granularity=fine,verbose,compact,1,0
   KMP_BLOCKTIME=1
   KMP_SETTINGS=1
   OMP_NUM_THREADS=

Effective settings:

   KMP_ABORT_DELAY=0
   KMP_ADAPTIVE_LOCK_PROPS='1,1024'
   KMP_ALIGN_ALLOC=64
   KMP_ALL_THREADPRIVATE=128
   KMP_ATOMIC_MODE=2
   KMP_BLOCKTIME=1
   KMP_CPUINFO_FILE: value is not defined
   KMP_DETERMINISTIC_REDUCTION=false
   KMP_DEVICE_THREAD_LIMIT=2147483647
   KMP_DISP_HAND_THREAD=false
   KMP_DISP_NUM_BUFFERS=7
   KMP_DUPLICATE_LIB_OK=false
   KMP_FORCE_REDUCTION: value is not defined
   KMP_FOREIGN_THREADS_THREADPRIVATE=true
   KMP_FORKJOIN_BARRIER='2,2'
   KMP_FORKJOIN_BARRIER_PATTERN='hyper,hyper'
   KMP_FORKJOIN_FRAMES=true
   KMP_FORKJOIN_FRAMES_MODE=3
   KMP_GTID_MODE=3
   KMP_HANDLE_SIGNALS=false
   KMP_HOT_TEAMS_MAX_LEVEL=1
   KMP_HOT_TEAMS_MODE=0
   KMP_INIT_AT_FORK=true
   KMP_INIT_WAIT=2048
   KMP_ITT_PREPARE_DELAY=0
   KMP_LIBRARY=throughput
   KMP_LOCK_KIND=queuing
   KMP_MALLOC_POOL_INCR=1M
   KMP_NEXT_WAIT=1024
   KMP_NUM_LOCKS_IN_BLOCK=1
   KMP_PLAIN_BARRIER='2,2'
   KMP_PLAIN_BARRIER_PATTERN='hyper,hyper'
   KMP_REDUCTION_BARRIER='1,1'
   KMP_REDUCTION_BARRIER_PATTERN='hyper,hyper'
   KMP_SCHEDULE='static,balanced;guided,iterative'
   KMP_SETTINGS=true
   KMP_SPIN_BACKOFF_PARAMS='4096,100'
   KMP_STACKOFFSET=64
   KMP_STACKPAD=0
   KMP_STACKSIZE=4M
   KMP_STORAGE_MAP=false
   KMP_TASKING=2
   KMP_TASKLOOP_MIN_TASKS=0
   KMP_TASK_STEALING_CONSTRAINT=1
   KMP_TEAMS_THREAD_LIMIT=16
   KMP_TOPOLOGY_METHOD=all
   KMP_USER_LEVEL_MWAIT=false
   KMP_VERSION=false
   KMP_WARNINGS=true
   OMP_AFFINITY_FORMAT='OMP: pid %P tid %T thread %n bound to OS proc set {%a}'
   OMP_ALLOCATOR=omp_default_mem_alloc
   OMP_CANCELLATION=false
   OMP_DEFAULT_DEVICE=0
   OMP_DISPLAY_AFFINITY=false
   OMP_DISPLAY_ENV=false
   OMP_DYNAMIC=false
   OMP_MAX_ACTIVE_LEVELS=2147483647
   OMP_MAX_TASK_PRIORITY=0
   OMP_NESTED=false
   OMP_NUM_THREADS: value is not defined
   OMP_PLACES: value is not defined
   OMP_PROC_BIND='intel'
   OMP_SCHEDULE='static'
   OMP_STACKSIZE=4M
   OMP_TARGET_OFFLOAD=DEFAULT
   OMP_THREAD_LIMIT=2147483647
   OMP_TOOL=enabled
   OMP_TOOL_LIBRARIES: value is not defined
   OMP_WAIT_POLICY=PASSIVE
   KMP_AFFINITY='verbose,warnings,respect,granularity=fine,compact,1,0'

OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-15
OMP: Info #156: KMP_AFFINITY: 16 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 8 cores/pkg x 2 threads/core (8 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 0 core 3 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 4 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 12 maps to package 0 core 4 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 5 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 13 maps to package 0 core 5 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 6 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 14 maps to package 0 core 6 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 7 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 15 maps to package 0 core 7 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3813 thread 0 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3829 thread 2 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3828 thread 1 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3830 thread 3 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3831 thread 4 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3833 thread 6 bound to OS proc set 6
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3832 thread 5 bound to OS proc set 5
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3834 thread 7 bound to OS proc set 7
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3835 thread 8 bound to OS proc set 8
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3837 thread 10 bound to OS proc set 10
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3836 thread 9 bound to OS proc set 9
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3838 thread 11 bound to OS proc set 11
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3839 thread 12 bound to OS proc set 12
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3841 thread 14 bound to OS proc set 14
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3840 thread 13 bound to OS proc set 13
OMP: Info #250: KMP_AFFINITY: pid 3796 tid 3842 thread 15 bound to OS proc set 15
image 0  contoursnumber: 1 time: 545.958 ms!
image 1  contoursnumber: 1 time: 2.75895 ms!
image 2  contoursnumber: 1 time: 2.81572 ms!
image 3  contoursnumber: 1 time: 2.60694 ms!
image 4  contoursnumber: 1 time: 2.629 ms!
image 5  contoursnumber: 1 time: 2.65074 ms!
image 6  contoursnumber: 1 time: 2.69729 ms!
image 7  contoursnumber: 2 time: 5.03756 ms!
image 8  contoursnumber: 1 time: 2.62776 ms!
image 9  contoursnumber: 1 time: 2.65478 ms!
image 10  contoursnumber: 1 time: 2.67608 ms!
image 11  contoursnumber: 4 time: 9.5571 ms!
image 12  contoursnumber: 1 time: 2.58764 ms!
image 13  contoursnumber: 2 time: 5.02788 ms!
image 14  contoursnumber: 1 time: 2.64267 ms!
image 15  contoursnumber: 3 time: 7.22725 ms!
image 16  contoursnumber: 2 time: 4.88518 ms!
image 17  contoursnumber: 5 time: 11.7531 ms!
image 18  contoursnumber: 3 time: 7.15299 ms!
image 19  contoursnumber: 4 time: 9.43651 ms!
image 20  contoursnumber: 5 time: 11.7737 ms!
image 21  contoursnumber: 10 time: 23.266 ms!
image 22  contoursnumber: 6 time: 14.0629 ms!
image 23  contoursnumber: 9 time: 20.9185 ms!
image 24  contoursnumber: 16 time: 36.8501 ms!
image 25  contoursnumber: 14 time: 32.2419 ms!
image 26  contoursnumber: 19 time: 42.6967 ms!
image 27  contoursnumber: 16 time: 36.4316 ms!
image 28  contoursnumber: 14 time: 31.6114 ms!
image 29  contoursnumber: 16 time: 36.3981 ms!
image 30  contoursnumber: 20 time: 45.1732 ms!
image 31  contoursnumber: 35 time: 79.0561 ms!
image 32  contoursnumber: 35 time: 78.977 ms!
image 33  contoursnumber: 33 time: 74.5243 ms!
image 34  contoursnumber: 33 time: 74.8152 ms!
image 35  contoursnumber: 35 time: 79.1064 ms!
image 36  contoursnumber: 30 time: 67.8377 ms!
image 37  contoursnumber: 43 time: 96.9035 ms!
image 38  contoursnumber: 29 time: 65.5133 ms!
image 39  contoursnumber: 40 time: 90.5936 ms!
image 40  contoursnumber: 60 time: 142.718 ms!
image 41  contoursnumber: 59 time: 132.777 ms!
image 42  contoursnumber: 68 time: 153.493 ms!
image 43  contoursnumber: 60 time: 135.566 ms!
image 44  contoursnumber: 83 time: 187.208 ms!
image 45  contoursnumber: 71 time: 160.202 ms!
image 46  contoursnumber: 83 time: 187.224 ms!
image 47  contoursnumber: 85 time: 192.007 ms!
image 48  contoursnumber: 77 time: 173.49 ms!
image 49  contoursnumber: 82 time: 184.844 ms!
image 50  contoursnumber: 95 time: 214.119 ms!
image 51  contoursnumber: 107 time: 241.767 ms!
image 52  contoursnumber: 101 time: 227.876 ms!
image 53  contoursnumber: 95 time: 263.539 ms!
image 54  contoursnumber: 74 time: 166.959 ms!
image 55  contoursnumber: 95 time: 214.315 ms!
image 56  contoursnumber: 86 time: 194.752 ms!
image 57  contoursnumber: 94 time: 212.681 ms!
image 58  contoursnumber: 104 time: 241.592 ms!
image 59  contoursnumber: 93 time: 210.203 ms!
image 60  contoursnumber: 128 time: 289.51 ms!
image 61  contoursnumber: 104 time: 237.612 ms!
image 62  contoursnumber: 101 time: 229.251 ms!
image 63  contoursnumber: 99 time: 224.104 ms!
image 64  contoursnumber: 103 time: 234.302 ms!
image 65  contoursnumber: 111 time: 251.253 ms!
image 66  contoursnumber: 129 time: 292.019 ms!
image 67  contoursnumber: 109 time: 246.509 ms!
image 68  contoursnumber: 121 time: 350.947 ms!
image 69  contoursnumber: 116 time: 262.582 ms!
image 70  contoursnumber: 112 time: 254.463 ms!
image 71  contoursnumber: 113 time: 385.545 ms!
image 72  contoursnumber: 115 time: 925.221 ms!
image 73  contoursnumber: 104 time: 234.835 ms!
image 74  contoursnumber: 95 time: 214.467 ms!
image 75  contoursnumber: 108 time: 244.214 ms!
image 76  contoursnumber: 77 time: 174.85 ms!
image 77  contoursnumber: 115 time: 261.896 ms!
image 78  contoursnumber: 103 time: 233.937 ms!
image 79  contoursnumber: 91 time: 207.413 ms!
image 80  contoursnumber: 84 time: 191.105 ms!
image 81  contoursnumber: 88 time: 199.925 ms!
image 82  contoursnumber: 72 time: 162.594 ms!
image 83  contoursnumber: 74 time: 168.641 ms!
image 84  contoursnumber: 75 time: 178.051 ms!
image 85  contoursnumber: 72 time: 164.123 ms!
image 86  contoursnumber: 71 time: 161.951 ms!
image 87  contoursnumber: 58 time: 132.326 ms!
image 88  contoursnumber: 61 time: 138.941 ms!
image 89  contoursnumber: 68 time: 155.192 ms!
image 90  contoursnumber: 49 time: 112.29 ms!
image 91  contoursnumber: 52 time: 118.127 ms!
image 92  contoursnumber: 57 time: 129.372 ms!
image 93  contoursnumber: 60 time: 136.79 ms!
image 94  contoursnumber: 51 time: 115.553 ms!
image 95  contoursnumber: 20 time: 46.1896 ms!
image 96  contoursnumber: 29 time: 65.7994 ms!
image 97  contoursnumber: 21 time: 47.9674 ms!
image 98  contoursnumber: 10 time: 22.9276 ms!
image 99  contoursnumber: 7 time: 16.0952 ms!
image 100  contoursnumber: 4 time: 9.60946 ms!
image 101  contoursnumber: 3 time: 7.26509 ms!
image 102  contoursnumber: 1 time: 2.59449 ms!
image 103  contoursnumber: 1 time: 2.64759 ms!
image 104  contoursnumber: 2 time: 5.07191 ms!
image 105  contoursnumber: 2 time: 5.10199 ms!
image 106  contoursnumber: 1 time: 2.58517 ms!
image 107  contoursnumber: 1 time: 2.64865 ms!
image 108  contoursnumber: 1 time: 2.71979 ms!
image 109  contoursnumber: 1 time: 2.69414 ms!
image 110  contoursnumber: 1 time: 2.72165 ms!
mean time 124.108 ms
max time 925.221 ms

然后再试试别的设置:

root@ubuntu:/home/jumper/xrt/projects# export KMP_ABORT_DELAY=0
root@ubuntu:/home/jumper/xrt/projects# export KMP_ADAPTIVE_LOCK_PROPS='1,1024'
root@ubuntu:/home/jumper/xrt/projects# export KMP_ALIGN_ALLOC=64
root@ubuntu:/home/jumper/xrt/projects# export KMP_ALL_THREADPRIVATE=128
root@ubuntu:/home/jumper/xrt/projects# export KMP_ATOMIC_MODE=2
root@ubuntu:/home/jumper/xrt/projects# export KMP_BLOCKTIME=1
root@ubuntu:/home/jumper/xrt/projects# export KMP_DETERMINISTIC_REDUCTION=false
root@ubuntu:/home/jumper/xrt/projects# export KMP_DEVICE_THREAD_LIMIT=2147483647
root@ubuntu:/home/jumper/xrt/projects# export KMP_DISP_HAND_THREAD=false
root@ubuntu:/home/jumper/xrt/projects# export KMP_DISP_NUM_BUFFERS=7
root@ubuntu:/home/jumper/xrt/projects# export KMP_DUPLICATE_LIB_OK=false
root@ubuntu:/home/jumper/xrt/projects# export KMP_FOREIGN_THREADS_THREADPRIVATE=true
root@ubuntu:/home/jumper/xrt/projects# export KMP_FORKJOIN_BARRIER='2,2'
root@ubuntu:/home/jumper/xrt/projects# export KMP_FORKJOIN_BARRIER_PATTERN='hyper,hyper'
root@ubuntu:/home/jumper/xrt/projects# export KMP_FORKJOIN_FRAMES=true
root@ubuntu:/home/jumper/xrt/projects# export KMP_FORKJOIN_FRAMES_MODE=3
root@ubuntu:/home/jumper/xrt/projects# export KMP_GTID_MODE=3
root@ubuntu:/home/jumper/xrt/projects# export KMP_HANDLE_SIGNALS=false
root@ubuntu:/home/jumper/xrt/projects# export KMP_HOT_TEAMS_MAX_LEVEL=1
root@ubuntu:/home/jumper/xrt/projects# export KMP_HOT_TEAMS_MODE=0
root@ubuntu:/home/jumper/xrt/projects# export KMP_INIT_AT_FORK=true
root@ubuntu:/home/jumper/xrt/projects# export KMP_INIT_WAIT=2048
root@ubuntu:/home/jumper/xrt/projects# export KMP_TEAMS_THREAD_LIMIT=16
root@ubuntu:/home/jumper/xrt/projects# export OMP_ALLOCATOR=omp_default_mem_alloc
root@ubuntu:/home/jumper/xrt/projects# export OMP_MAX_ACTIVE_LEVELS=2147483647
root@ubuntu:/home/jumper/xrt/projects# export OMP_MAX_TASK_PRIORITY=0
root@ubuntu:/home/jumper/xrt/projects# export OMP_NUM_THREADS=8
root@ubuntu:/home/jumper/xrt/projects# export KMP_AFFINITY='verbose,warnings,respect,granularity=fine,compact,1,0'
root@ubuntu:/home/jumper/xrt/projects# ./xrtCNNTry

User settings:

   KMP_ABORT_DELAY=0
   KMP_ADAPTIVE_LOCK_PROPS=1,1024
   KMP_AFFINITY=verbose,warnings,respect,granularity=fine,compact,1,0
   KMP_ALIGN_ALLOC=64
   KMP_ALL_THREADPRIVATE=128
   KMP_ATOMIC_MODE=2
   KMP_BLOCKTIME=1
   KMP_DETERMINISTIC_REDUCTION=false
   KMP_DEVICE_THREAD_LIMIT=2147483647
   KMP_DISP_HAND_THREAD=false
   KMP_DISP_NUM_BUFFERS=7
   KMP_DUPLICATE_LIB_OK=false
   KMP_FOREIGN_THREADS_THREADPRIVATE=true
   KMP_FORKJOIN_BARRIER=2,2
   KMP_FORKJOIN_BARRIER_PATTERN=hyper,hyper
   KMP_FORKJOIN_FRAMES=true
   KMP_FORKJOIN_FRAMES_MODE=3
   KMP_GTID_MODE=3
   KMP_HANDLE_SIGNALS=false
   KMP_HOT_TEAMS_MAX_LEVEL=1
   KMP_HOT_TEAMS_MODE=0
   KMP_INIT_AT_FORK=true
   KMP_INIT_WAIT=2048
   KMP_SETTINGS=1
   KMP_TEAMS_THREAD_LIMIT=16
   OMP_ALLOCATOR=omp_default_mem_alloc
   OMP_MAX_ACTIVE_LEVELS=2147483647
   OMP_MAX_TASK_PRIORITY=0
   OMP_NUM_THREADS=8

Effective settings:

   KMP_ABORT_DELAY=0
   KMP_ADAPTIVE_LOCK_PROPS='1,1024'
   KMP_ALIGN_ALLOC=64
   KMP_ALL_THREADPRIVATE=128
   KMP_ATOMIC_MODE=2
   KMP_BLOCKTIME=1
   KMP_CPUINFO_FILE: value is not defined
   KMP_DETERMINISTIC_REDUCTION=false
   KMP_DEVICE_THREAD_LIMIT=2147483647
   KMP_DISP_HAND_THREAD=false
   KMP_DISP_NUM_BUFFERS=7
   KMP_DUPLICATE_LIB_OK=false
   KMP_FORCE_REDUCTION: value is not defined
   KMP_FOREIGN_THREADS_THREADPRIVATE=true
   KMP_FORKJOIN_BARRIER='2,2'
   KMP_FORKJOIN_BARRIER_PATTERN='hyper,hyper'
   KMP_FORKJOIN_FRAMES=true
   KMP_FORKJOIN_FRAMES_MODE=3
   KMP_GTID_MODE=3
   KMP_HANDLE_SIGNALS=false
   KMP_HOT_TEAMS_MAX_LEVEL=1
   KMP_HOT_TEAMS_MODE=0
   KMP_INIT_AT_FORK=true
   KMP_INIT_WAIT=4096
   KMP_ITT_PREPARE_DELAY=0
   KMP_LIBRARY=throughput
   KMP_LOCK_KIND=queuing
   KMP_MALLOC_POOL_INCR=1M
   KMP_NEXT_WAIT=1024
   KMP_NUM_LOCKS_IN_BLOCK=1
   KMP_PLAIN_BARRIER='2,2'
   KMP_PLAIN_BARRIER_PATTERN='hyper,hyper'
   KMP_REDUCTION_BARRIER='1,1'
   KMP_REDUCTION_BARRIER_PATTERN='hyper,hyper'
   KMP_SCHEDULE='static,balanced;guided,iterative'
   KMP_SETTINGS=true
   KMP_SPIN_BACKOFF_PARAMS='4096,100'
   KMP_STACKOFFSET=64
   KMP_STACKPAD=0
   KMP_STACKSIZE=4M
   KMP_STORAGE_MAP=false
   KMP_TASKING=2
   KMP_TASKLOOP_MIN_TASKS=0
   KMP_TASK_STEALING_CONSTRAINT=1
   KMP_TEAMS_THREAD_LIMIT=16
   KMP_TOPOLOGY_METHOD=all
   KMP_USER_LEVEL_MWAIT=false
   KMP_VERSION=false
   KMP_WARNINGS=true
   OMP_AFFINITY_FORMAT='OMP: pid %P tid %T thread %n bound to OS proc set {%a}'
   OMP_ALLOCATOR=omp_default_mem_alloc
   OMP_CANCELLATION=false
   OMP_DEFAULT_DEVICE=0
   OMP_DISPLAY_AFFINITY=false
   OMP_DISPLAY_ENV=false
   OMP_DYNAMIC=false
   OMP_MAX_ACTIVE_LEVELS=2147483647
   OMP_MAX_TASK_PRIORITY=0
   OMP_NESTED=false
   OMP_NUM_THREADS='8'
   OMP_PLACES: value is not defined
   OMP_PROC_BIND='intel'
   OMP_SCHEDULE='static'
   OMP_STACKSIZE=4M
   OMP_TARGET_OFFLOAD=DEFAULT
   OMP_THREAD_LIMIT=2147483647
   OMP_TOOL=enabled
   OMP_TOOL_LIBRARIES: value is not defined
   OMP_WAIT_POLICY=PASSIVE
   KMP_AFFINITY='verbose,warnings,respect,granularity=fine,compact,1,0'

OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-15
OMP: Info #156: KMP_AFFINITY: 16 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 8 cores/pkg x 2 threads/core (8 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 1 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 0 core 2 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 0 core 3 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 4 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 12 maps to package 0 core 4 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 5 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 13 maps to package 0 core 5 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 6 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 14 maps to package 0 core 6 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 7 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 15 maps to package 0 core 7 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 7151 tid 7168 thread 0 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 7151 tid 7187 thread 1 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 7151 tid 7188 thread 2 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 7151 tid 7189 thread 3 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 7151 tid 7190 thread 4 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 7151 tid 7191 thread 5 bound to OS proc set 5
OMP: Info #250: KMP_AFFINITY: pid 7151 tid 7192 thread 6 bound to OS proc set 6
OMP: Info #250: KMP_AFFINITY: pid 7151 tid 7193 thread 7 bound to OS proc set 7
image 0  contoursnumber: 1 time: 540.462 ms!
image 1  contoursnumber: 1 time: 2.43491 ms!
image 2  contoursnumber: 1 time: 2.34395 ms!
image 3  contoursnumber: 1 time: 2.29904 ms!
image 4  contoursnumber: 1 time: 2.29803 ms!
image 5  contoursnumber: 1 time: 2.26987 ms!
image 6  contoursnumber: 1 time: 2.31498 ms!
image 7  contoursnumber: 2 time: 4.33249 ms!
image 8  contoursnumber: 1 time: 2.30232 ms!
image 9  contoursnumber: 1 time: 2.26087 ms!
image 10  contoursnumber: 1 time: 2.26373 ms!
image 11  contoursnumber: 4 time: 8.23919 ms!
image 12  contoursnumber: 1 time: 2.22784 ms!
image 13  contoursnumber: 2 time: 4.24408 ms!
image 14  contoursnumber: 1 time: 2.24563 ms!
image 15  contoursnumber: 3 time: 6.15239 ms!
image 16  contoursnumber: 2 time: 4.1945 ms!
image 17  contoursnumber: 5 time: 10.9899 ms!
image 18  contoursnumber: 3 time: 6.09401 ms!
image 19  contoursnumber: 4 time: 8.16248 ms!
image 20  contoursnumber: 5 time: 10.1609 ms!
image 21  contoursnumber: 10 time: 19.7754 ms!
image 22  contoursnumber: 6 time: 11.9569 ms!
image 23  contoursnumber: 9 time: 17.9706 ms!
image 24  contoursnumber: 16 time: 31.5628 ms!
image 25  contoursnumber: 14 time: 27.4714 ms!
image 26  contoursnumber: 19 time: 36.654 ms!
image 27  contoursnumber: 16 time: 30.9366 ms!
image 28  contoursnumber: 14 time: 27.241 ms!
image 29  contoursnumber: 16 time: 31.1286 ms!
image 30  contoursnumber: 20 time: 38.6733 ms!
image 31  contoursnumber: 35 time: 67.5998 ms!
image 32  contoursnumber: 35 time: 67.9385 ms!
image 33  contoursnumber: 33 time: 97.487 ms!
image 34  contoursnumber: 33 time: 63.8534 ms!
image 35  contoursnumber: 35 time: 67.7263 ms!
image 36  contoursnumber: 30 time: 57.8766 ms!
image 37  contoursnumber: 43 time: 83.2774 ms!
image 38  contoursnumber: 29 time: 56.219 ms!
image 39  contoursnumber: 40 time: 77.5566 ms!
image 40  contoursnumber: 60 time: 115.673 ms!
image 41  contoursnumber: 59 time: 114.054 ms!
image 42  contoursnumber: 68 time: 131.358 ms!
image 43  contoursnumber: 60 time: 123.464 ms!
image 44  contoursnumber: 83 time: 162.16 ms!
image 45  contoursnumber: 71 time: 139.174 ms!
image 46  contoursnumber: 83 time: 162.967 ms!
image 47  contoursnumber: 85 time: 166.877 ms!
image 48  contoursnumber: 77 time: 164.705 ms!
image 49  contoursnumber: 82 time: 158.783 ms!
image 50  contoursnumber: 95 time: 183.512 ms!
image 51  contoursnumber: 107 time: 206.621 ms!
image 52  contoursnumber: 101 time: 194.661 ms!
image 53  contoursnumber: 95 time: 183.088 ms!
image 54  contoursnumber: 74 time: 142.938 ms!
image 55  contoursnumber: 95 time: 183.331 ms!
image 56  contoursnumber: 86 time: 166.222 ms!
image 57  contoursnumber: 94 time: 181.357 ms!
image 58  contoursnumber: 104 time: 200.238 ms!
image 59  contoursnumber: 93 time: 179.249 ms!
image 60  contoursnumber: 128 time: 246.765 ms!
image 61  contoursnumber: 104 time: 200.685 ms!
image 62  contoursnumber: 101 time: 194.998 ms!
image 63  contoursnumber: 99 time: 191.143 ms!
image 64  contoursnumber: 103 time: 198.842 ms!
image 65  contoursnumber: 111 time: 214.208 ms!
image 66  contoursnumber: 129 time: 248.685 ms!
image 67  contoursnumber: 109 time: 210.33 ms!
image 68  contoursnumber: 121 time: 233.23 ms!
image 69  contoursnumber: 116 time: 224.134 ms!
image 70  contoursnumber: 112 time: 216.384 ms!
image 71  contoursnumber: 113 time: 218.136 ms!
image 72  contoursnumber: 115 time: 221.682 ms!
image 73  contoursnumber: 104 time: 200.762 ms!
image 74  contoursnumber: 95 time: 183.564 ms!
image 75  contoursnumber: 108 time: 207.921 ms!
image 76  contoursnumber: 77 time: 148.792 ms!
image 77  contoursnumber: 115 time: 221.789 ms!
image 78  contoursnumber: 103 time: 203.962 ms!
image 79  contoursnumber: 91 time: 175.547 ms!
image 80  contoursnumber: 84 time: 162.185 ms!
image 81  contoursnumber: 88 time: 169.608 ms!
image 82  contoursnumber: 72 time: 138.911 ms!
image 83  contoursnumber: 74 time: 142.687 ms!
image 84  contoursnumber: 75 time: 145.751 ms!
image 85  contoursnumber: 72 time: 139.121 ms!
image 86  contoursnumber: 71 time: 137.16 ms!
image 87  contoursnumber: 58 time: 111.89 ms!
image 88  contoursnumber: 61 time: 117.719 ms!
image 89  contoursnumber: 68 time: 131.638 ms!
image 90  contoursnumber: 49 time: 94.6617 ms!
image 91  contoursnumber: 52 time: 100.442 ms!
image 92  contoursnumber: 57 time: 110.199 ms!
image 93  contoursnumber: 60 time: 115.842 ms!
image 94  contoursnumber: 51 time: 98.3639 ms!
image 95  contoursnumber: 20 time: 38.7647 ms!
image 96  contoursnumber: 29 time: 56.1158 ms!
image 97  contoursnumber: 21 time: 40.8237 ms!
image 98  contoursnumber: 10 time: 19.4679 ms!
image 99  contoursnumber: 7 time: 13.7955 ms!
image 100  contoursnumber: 4 time: 7.94654 ms!
image 101  contoursnumber: 3 time: 6.07278 ms!
image 102  contoursnumber: 1 time: 2.28709 ms!
image 103  contoursnumber: 1 time: 2.31357 ms!
image 104  contoursnumber: 2 time: 4.27889 ms!
image 105  contoursnumber: 2 time: 4.27259 ms!
image 106  contoursnumber: 1 time: 2.32682 ms!
image 107  contoursnumber: 1 time: 2.37074 ms!
image 108  contoursnumber: 1 time: 2.33745 ms!
image 109  contoursnumber: 1 time: 2.34833 ms!
image 110  contoursnumber: 1 time: 2.31133 ms!
mean time 99.8847 ms
max time 540.462 ms
root@ubuntu:/home/jumper/xrt/projects# 

不同的设置有不同的结果。不知道怎么设置才更优,是否哪里有什么设置诀窍?还要再研究下。

对CPU超线程与推理性能的一些理解_sandmangu的专栏-CSDN博客

下一步是研究mkldnn-tbb以及 intel低精度AI等等。也许比上面的速度还快。另外看到这个作者干货|基于CPU的深度学习推理部署优化实践_weixin_34407348的博客-CSDN博客写的MKLDNN与Openvino的对比,我觉得后续可以测试下是否后者更快。另外这里也讲了使用MKLDNN设置OMP_NUM_THREADS =cpu核;KMP_BLOCKTIME = 10; KMP_AFFINITY=granularity=fine, verbose, compact,1,0设置这三个选项非常重要,而且我实际操作发现第一个选项并不是核数设置越多越快,还是多实践。

//

另外大家看下这篇 tensorflow C++ Mask RCNN图像分割,cv::dnn不能并行?openvino? - 秦时明月卫庄 - 博客园

我看了github上所有涉及M-RCNN C++ inference的实例,实在不知道问题在哪里,实例我放在这里:https://download.csdn.net/download/wd1603926823/54178267

#include "linkuang.h"
using namespace std;
using namespace cv;

linkuang::linkuang(int imgsize,float lesswaste_prob,float morewaste_prob) {
	// TODO Auto-generated constructor stub
	standard_rows=imgsize;
	standard_cols=imgsize;

	lesswastethresh=lesswaste_prob;
	morewastethresh=morewaste_prob;
	getimgsindex=0;
	showtestindex=0;

	std::string graphpath="/home/jumper/xrt/reference/cnnmodel/model.pb";

	///CNN initiation--
	tensorflow::Status status = NewSession(tensorflow::SessionOptions(), &session);
	if (!status.ok())
	{
		throw std::runtime_error("ERROR: linkuang CNN NewSession() init failed...");
	}

	tensorflow::GraphDef graphdef;
	tensorflow::Status status_load = ReadBinaryProto(tensorflow::Env::Default(), graphpath, &graphdef);
	if (!status_load.ok())
	{
		std::cout << status_load.ToString() <Create(graphdef);
	if (!status_create.ok())
	{
		std::cerr <().data();
//	Mat cnninputImg(standard_rows, standard_cols, CV_32FC1, imgdata);
//	standardinput.convertTo(cnninputImg,CV_32FC1);
//	cnninputImg=cnninputImg/255;

	auto outputMap =resized_tensor.tensor();//获取tensor指针
	for(int r=0;r(r)[c])/255;
		}
	}

	//CNN input
	std::vector > inputs;
	std::string Input1Name = "input_1_1";//"input_1_1:0";
	inputs.push_back(std::make_pair(Input1Name, resized_tensor));

	//CNN predict
	std::vector outputs;
	std::string output="output_1";//"output_1:0";

	tensorflow::Status status_run = session->Run({{Input1Name,resized_tensor}}, {output}, {}, &outputs);
	if (!status_run.ok()) {
	   std::cout <<"ERROR: RUN failed in real inference()..."<< status_run.ToString() << "\n";
	   return -1;
	}

	int flag=getOutputImg(outputs[0],outputimg);
	if(flag!=0)
	{
		std::cout <<"ERROR: RUN failed in getCnnRealLabel()..."<(r)[c]<<"\t";
		}
		cout<();        //

//	  cout<(r)[c]=255*value;
		}
		cout<();        // Tensor Shape: [batch_size, target_class_num]
	  int output_dim = probabilities.shape().dim_size(1);  // Get the target_class_num from 1st dimension

	  output_class_id=0;
	  float primerprob=tmap(0, 0);
	  if(tmap(0, 1)>primerprob)
	  {
		  primerprob=tmap(0, 1);
		  output_class_id=1;
	  }

	  output_prob=primerprob;

	  return 0;
}


linkuang::~linkuang() {
	// TODO Auto-generated destructor stub
	tensorflow::Status freestatus=session->Close();
	if (!freestatus.ok())
	{
		throw std::runtime_error("ERROR: close session...");
	}

这个写法为什么不对呢?!

你可能感兴趣的:(AI,High,Themes,AI,Basises,tensorflow,c++)