While recently reproducing a paper, the final step, roslaunch tracking_slam tb3_test.launch, kept failing with the following error:
Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR
CUDA version:
cuDNN version:
GPU driver:
OpenCV version:
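These values can be read straight off the machine. A minimal sketch, assuming the cuDNN header lives at /usr/local/cuda/include/cudnn.h (an assumption, the post does not say where cuDNN was installed); the helper name cudnn_version_from_header is hypothetical:

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCH parsed from a cuDNN header.
# cuDNN 5.x headers define CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL.
cudnn_version_from_header() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}

# On the target machine (paths are assumptions):
#   nvcc --version                          # CUDA toolkit version
#   cudnn_version_from_header /usr/local/cuda/include/cudnn.h
#   nvidia-smi                              # driver version
#   pkg-config --modversion opencv          # OpenCV version
```

Recording all four versions up front makes it easier to spot a mismatch later, such as a library built against a different CUDA release.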
First, a recap of how caffe-segnet-cudnn5.1 was installed:
git clone https://github.com/TimoSaemann/caffe-segnet-cudnn5.git
sudo cp Makefile.config.example Makefile.config
sudo gedit Makefile.config
Enable cuDNN:
Change
#USE_CUDNN := 1
to:
USE_CUDNN := 1
Set the OpenCV version:
Change
#OPENCV_VERSION := 3
to:
OPENCV_VERSION := 3
Enable the Python interface:
Change
#WITH_PYTHON_LAYER := 1
to:
WITH_PYTHON_LAYER := 1
Modify the Python path:
Change
#WITH_PYTHON_LAYER := 1
to:
WITH_PYTHON_LAYER := 1
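The Makefile.config edits above can also be applied without opening an editor. A minimal sketch, assuming each option appears as a commented-out line of the form "# NAME := value"; the helper name uncomment_make_var is hypothetical:

```shell
# Hypothetical helper: uncomment "# VAR := ..." in a Makefile-style config.
# Usage: uncomment_make_var FILE VAR
uncomment_make_var() {
    file="$1"
    var="$2"
    # Strip the leading "#" (and any spaces after it) only on the line
    # that assigns VAR; every other line is left untouched.
    sed "s/^#[[:space:]]*\(${var}[[:space:]]*:=\)/\1/" "$file" > "$file.tmp" \
        && mv "$file.tmp" "$file"
}

# Example (run inside the caffe-segnet-cudnn5 directory):
#   uncomment_make_var Makefile.config USE_CUDNN
#   uncomment_make_var Makefile.config OPENCV_VERSION
#   uncomment_make_var Makefile.config WITH_PYTHON_LAYER
```

Running it a second time on the same file is harmless, because an already uncommented line no longer matches the pattern.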
Edit the Makefile in the caffe-segnet directory:
Change:
NVCCFLAGS += -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
to:
NVCCFLAGS += -D_FORCE_INLINES -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
Change:
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5
to:
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial
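The two Makefile changes can be scripted as well. A minimal sketch, assuming the Makefile contains the two original lines exactly as quoted above; the helper name patch_caffe_makefile is hypothetical:

```shell
# Hypothetical helper: apply the two Makefile edits described above.
# Usage: patch_caffe_makefile FILE
patch_caffe_makefile() {
    file="$1"
    # 1) Add -D_FORCE_INLINES in front of -ccbin on the NVCCFLAGS line.
    # 2) Switch the HDF5 libraries to their "serial" variants.
    sed -e 's/^NVCCFLAGS +=[[:space:]]*-ccbin/NVCCFLAGS += -D_FORCE_INLINES -ccbin/' \
        -e 's/ hdf5_hl hdf5$/ hdf5_serial_hl hdf5_serial/' \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example (run inside the caffe-segnet-cudnn5 directory):
#   cp Makefile Makefile.orig
#   patch_caffe_makefile Makefile
```

Keeping a copy of the untouched Makefile makes it easy to go back if the edits turn out to be the source of later problems.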
Edit /usr/local/cuda/include/host_config.h:
Change
#error-- unsupported GNU version! gcc versions later than 5.0 are not supported!
to
//#error-- unsupported GNU version! gcc versions later than 5.0 are not supported!
Start the build by running the following in the caffe-segnet directory:
make all -j4
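All of the steps above went through, and the few small snags along the way were resolved. The tests that follow, however, failed.
Check whether the build succeeded: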
sudo ldconfig /usr/local/cuda/lib64
sudo make test -j4
sudo make runtest -j4
Error:
F0927 16:20:13.189004 4091 math_functions.cu:394] Check failed: status == CURAND_STATUS_SUCCESS (201 vs. 0) CURAND_STATUS_LAUNCH_FAILURE
*** Check failure stack trace: ***
@ 0x7f6a1abc15cd google::LogMessage::Fail()
@ 0x7f6a1abc3433 google::LogMessage::SendToLog()
@ 0x7f6a1abc115b google::LogMessage::Flush()
@ 0x7f6a1abc3e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f6a0f5da3b4 caffe::caffe_gpu_rng_uniform<>()
@ 0x7f6a0f6008f3 caffe::PoolingLayer<>::Forward_gpu()
@ 0x47a436 caffe::Layer<>::Forward()
@ 0x480092 caffe::GradientChecker<>::CheckGradientSingle()
@ 0x52a592 caffe::GPUStochasticPoolingLayerTest_TestGradient_Test<>::TestBody()
@ 0x8f9923 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x8f373a testing::Test::Run()
@ 0x8f3888 testing::TestInfo::Run()
@ 0x8f3965 testing::TestCase::Run()
@ 0x8f4b7f testing::internal::UnitTestImpl::RunAllTests()
@ 0x8f4e93 testing::UnitTest::Run()
@ 0x46f22d main
@ 0x7f6a0e775840 __libc_start_main
@ 0x476ca9 _start
@ (nil) (unknown)
Makefile:526: recipe for target ‘runtest’ failed
make: *** [runtest] Aborted (core dumped)
xx@xx-OMEN-by-HP-Laptop:~/catkin_ws/src/tracking_slam/caffe-segnet-cudnn5$ make test -j4
make: Nothing to be done for ‘test’.
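Attempt 1:
Installed the CUDA 8.0 patch from the official CUDA website, but the error persisted.
Attempt 2:
cd straight into the caffe-segnet-cudnn5.1 folder and run the following commands: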
cd ../caffe-segnet-cudnn5/
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j4
roslaunch tracking_slam tb3_test.launch
rosbag play hd3_2018-12-14-16-29-16.bag
The dataset video showed up in the small panel on the left side of rviz, but no map was built.
Error:
F0928 17:40:46.945029 6136 cudnn_conv_layer.cpp:53] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR
Attempt 3:
Uninstalled and reinstalled CUDA 8.0 and cuDNN 5.1, then, in the caffe-segnet-cudnn5.1 directory:
make clean
make all -j4
sudo ldconfig /usr/local/cuda/lib64
sudo make test -j4
sudo make runtest -j4
It still failed with Check failed: status == CURAND_STATUS_SUCCESS (201 vs. 0) CURAND_STATUS_LAUNCH_FAILURE.
Following the issue "CURAND_STATUS_SUCCESS (201 vs. 0) CURAND_STATUS_LAUNCH_FAILURE #1400":
sudo apt-get remove --auto-remove nvidia-cuda-toolkit
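The apt package nvidia-cuda-toolkit ships its own nvcc, which may not match the CUDA 8.0 install under /usr/local/cuda. A minimal sketch for checking which release the nvcc on PATH reports; the helper name nvcc_release is hypothetical, and it only parses the text that nvcc --version prints:

```shell
# Hypothetical helper: extract the release number from `nvcc --version`
# output read on stdin, e.g. "Cuda compilation tools, release 8.0, V8.0.61".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Example:
#   command -v nvcc                 # which nvcc binary is picked up
#   nvcc --version | nvcc_release   # which CUDA release it belongs to
```

If the reported release differs from the toolkit Caffe is meant to build against, the wrong nvcc is first on PATH.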
After that change, once more:
make clean
make all -j4
sudo make test -j4
sudo make runtest -j4
xx@xx-OMEN-by-HP-Laptop:~/catkin_ws/src/tracking_slam/caffe-segnet-cudnn5$ sudo make runtest -j4
.build_release/tools/caffe
.build_release/tools/caffe: error while loading shared libraries: libcudart.so.7.5: cannot open shared object file: No such file or directory
Makefile:526: recipe for target 'runtest' failed
make: *** [runtest] Error 127
It still failed. Along the way I also noticed that make test and make runtest always printed the following warnings:
/usr/bin/ld: warning: libcudart.so.7.5, needed by /usr/local/lib/libopencv_core.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libnppc.so.7.5, needed by /usr/local/lib/libopencv_core.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libnppi.so.7.5, needed by /usr/local/lib/libopencv_core.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libnpps.so.7.5, needed by /usr/local/lib/libopencv_core.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libcufft.so.7.5, needed by /usr/local/lib/libopencv_core.so, not found (try using -rpath or -rpath-link)
LD .build_release/src/caffe/test/test_upgrade_proto.o
............
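My guess is that /usr/local/lib/libopencv_core.so depends on CUDA 7.5 libraries, meaning the installed OpenCV 3.1.0 does not work with CUDA 8.0. Tomorrow I will switch OpenCV to 2.4.13 and try again.
First, uninstall OpenCV 3.1.0 following an earlier blog post, "Uninstalling and reinstalling OpenCV on Ubuntu (fixing the incompatibility between CUDA 10 and OpenCV 3.1.0)".
Then install OpenCV 2.4.13: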
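One way to test this guess is to list the shared libraries that libopencv_core.so needs but the loader cannot resolve. A minimal sketch that filters ldd output; the helper name missing_libs is hypothetical:

```shell
# Hypothetical helper: read `ldd` output on stdin and print the names of
# libraries reported as "not found".
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Example:
#   ldd /usr/local/lib/libopencv_core.so | missing_libs
```

If the list contains entries such as libcudart.so.7.5, that OpenCV build was linked against CUDA 7.5 and has to be rebuilt or replaced.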
sudo apt-get install build-essential -y
sudo apt-get install libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev -y
sudo apt-get install python-dev python-numpy libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev -y
sudo apt-get install libgtk2.0-dev -y
sudo apt-get install pkg-config -y
In ~/catkin_ws/src, download OpenCV 2.4.13 from GitHub. This is not a git repository, so use wget, and unzip the downloaded OpenCV archive in ~/catkin_ws/src:
wget https://github.com/Itseez/opencv/archive/2.4.13.zip
unzip 2.4.13.zip
cd opencv-2.4.13/
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local ..
make -j4
sudo make install
sudo gedit /etc/ld.so.conf.d/opencv.conf
There is no opencv.conf under /etc/ld.so.conf.d/, so this effectively creates a new opencv.conf file. Add:
/usr/local/lib
Save and exit.
sudo ldconfig
sudo gedit /etc/bash.bashrc
Append at the end:
PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig
export PKG_CONFIG_PATH
Save and exit.
Apply the configuration:
sudo -s
Enter the root password
source /etc/bash.bashrc
Ctrl+d # exit the root shell
sudo updatedb # update the locate database
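The two gedit steps above only append a line to a config file, so they can be done non-interactively. A minimal sketch; the helper name add_line_once is hypothetical, and writing under /etc requires a root shell:

```shell
# Hypothetical helper: append LINE to FILE unless an identical line is
# already present (creates FILE if it does not exist).
# Usage: add_line_once FILE LINE
add_line_once() {
    file="$1"
    line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example, from a root shell:
#   add_line_once /etc/ld.so.conf.d/opencv.conf /usr/local/lib
#   ldconfig
#   add_line_once /etc/bash.bashrc 'export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig'
```

Because the helper checks for an existing identical line first, repeating the setup does not pile up duplicate entries.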
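Then go into the caffe-segnet-cudnn5 directory and edit Makefile.config:
Comment out OPENCV_VERSION := 3
with a # (my reasoning: now that OpenCV 2 is back, use the default OpenCV version); otherwise make all -j4 fails with the following error: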
/usr/bin/ld: cannot find -lopencv_imgcodecs
Makefile:566: recipe for target '.build_release/lib/libcaffe.so.1.0.0-rc3' failed
make: *** [.build_release/lib/libcaffe.so.1.0.0-rc3] Error 1
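The linker error arises because Caffe's Makefile links opencv_imgcodecs when OPENCV_VERSION is 3, and OpenCV 2.4 does not provide that library. A minimal sketch that picks the Makefile.config line from an OpenCV version string; the helper name caffe_opencv_setting is hypothetical:

```shell
# Hypothetical helper: given an OpenCV version string such as "2.4.13",
# print the OPENCV_VERSION line that Makefile.config should contain.
caffe_opencv_setting() {
    major="${1%%.*}"    # keep everything before the first dot
    if [ "$major" = "3" ]; then
        echo "OPENCV_VERSION := 3"
    else
        echo "# OPENCV_VERSION := 3"
    fi
}

# Example:
#   caffe_opencv_setting "$(pkg-config --modversion opencv)"
```

Deriving the setting from the installed version avoids the mismatch whenever OpenCV is swapped again.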
Then:
make clean
make all -j4
sudo ldconfig /usr/local/cuda/lib64
sudo make test -j4
sudo make runtest -j4
roslaunch tracking_slam tb3_test.launch
F0929 13:53:44.471045 22307 cudnn_conv_layer.cpp:53] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***
[ERROR] [1632894824.504076604]: PluginlibFactory: The plugin for class 'octomap_rviz_plugin/ColorOccupancyGrid' failed to load. Error: According to the loaded plugin descriptions the class octomap_rviz_plugin/ColorOccupancyGrid with base class type rviz::Display does not exist. Declared types are rviz/Axes rviz/Camera rviz/DepthCloud rviz/Effort rviz/FluidPressure rviz/Grid rviz/GridCells rviz/Illuminance rviz/Image rviz/InteractiveMarkers rviz/LaserScan rviz/Map rviz/Marker rviz/MarkerArray rviz/Odometry rviz/Path rviz/PointCloud rviz/PointCloud2 rviz/PointStamped rviz/Polygon rviz/Pose rviz/PoseArray rviz/PoseWithCovariance rviz/Range rviz/RelativeHumidity rviz/RobotModel rviz/TF rviz/Temperature rviz/WrenchStamped rviz_plugin_tutorials/Imu
[tracking_slam_node-1] process has died [pid 22084, exit code -6, cmd /home/xx/catkin_ws/devel/lib/tracking_slam/tracking_slam_node __name:=tracking_slam_node __log:=/home/xx/.ros/log/8d07e664-20e9-11ec-8a5c-887873831b6b/tracking_slam_node-1.log].
log file: /home/xx/.ros/log/8d07e664-20e9-11ec-8a5c-887873831b6b/tracking_slam_node-1*.log
Fix: octomap_rviz_plugins
Following the issue "in ubuntu 18.04, using melodic ROS, can not install octomap_rviz_plugins #15", download rviz_plugins, place the folder in the src directory of your ROS workspace, and run catkin_make to install the plugin.
2. Once more:
roslaunch tracking_slam tb3_test.launch
It failed again:
cudnn_conv_layer.cpp:53] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***
[tracking_slam_node-2] process has died [pid 31579, exit code -6, cmd /home/xx/catkin_ws/devel/lib/tracking_slam/tracking_slam_node __name:=tracking_slam_node __log:=/home/xx/.ros/log/7afad10a-20f5-11ec-8a5c-887873831b6b/tracking_slam_node-2.log].
log file: /home/xx/.ros/log/7afad10a-20f5-11ec-8a5c-887873831b6b/tracking_slam_node-2*.log
3. Another attempt:
Uninstalled cuDNN 5.1.10, replaced it with the older cuDNN 5, rebuilt everything, and once more:
roslaunch tracking_slam tb3_test.launch
The same error again:
F0929 18:55:04.884161 16169 cudnn_conv_layer.cpp:53] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***
[ INFO] [1632912905.796436179]: Stereo is NOT SUPPORTED
[ INFO] [1632912905.796709093]: OpenGl version: 4.5 (GLSL 4.5).
0x16baca0 void QWindowPrivate::setTopLevelScreen(QScreen*, bool) ( QScreen(0x91b090) ): Attempt to set a screen on a child window.
0x16bba20 void QWindowPrivate::setTopLevelScreen(QScreen*, bool) ( QScreen(0x91b090) ): Attempt to set a screen on a child window.
0x16c9340 void QWindowPrivate::setTopLevelScreen(QScreen*, bool) ( QScreen(0x91b090) ): Attempt to set a screen on a child window.
0x16bb580 void QWindowPrivate::setTopLevelScreen(QScreen*, bool) ( QScreen(0x91b090) ): Attempt to set a screen on a child window.
[tracking_slam_node-1] process has died [pid 15938, exit code -6, cmd /home/xx/catkin_ws/devel/lib/tracking_slam/tracking_slam_node __name:=tracking_slam_node __log:=/home/xx/.ros/log/a3eb18e0-2113-11ec-8a5c-887873831b6b/tracking_slam_node-1.log].
log file: /home/xx/.ros/log/a3eb18e0-2113-11ec-8a5c-887873831b6b/tracking_slam_node-1*.log
4. Add engine: CAFFE
Following the issue "Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR #3",
add the following to conv.prototxt:
engine: CAFFE
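Setting engine: CAFFE inside a layer's convolution_param makes that layer use Caffe's own convolution instead of cuDNN. A minimal sketch that adds the line to every convolution_param block, assuming each block opens with "convolution_param {" on its own line; the helper name force_caffe_engine is hypothetical, and it does not check whether an engine line is already there:

```shell
# Hypothetical helper: print FILE with "engine: CAFFE" inserted right after
# every line that opens a convolution_param block.
# Usage: force_caffe_engine FILE > NEW_FILE
force_caffe_engine() {
    awk '
        { print }
        /convolution_param[ \t]*[{]/ { print "    engine: CAFFE" }
    ' "$1"
}

# Example:
#   force_caffe_engine conv.prototxt > conv_caffe_engine.prototxt
```

Writing to a new file keeps the original prototxt intact, so the cuDNN path can be restored by switching back.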
5. Another attempt:
Following the issue "CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR #6873":
sudo rm -rf ~/.nv/
Still an error:
F0929 21:19:53.521602 6044 io.cpp:54] Check failed: fd != -1 (-1 vs. -1) File not found: /home/xx/catkin_ws/src/tracking_slam/config/segnet/segnet_pascal.caffemodel
*** Check failure stack trace: ***
[tracking_slam_node-1] process has died [pid 5799, exit code -6, cmd /home/xx/catkin_ws/devel/lib/tracking_slam/tracking_slam_node __name:=tracking_slam_node __log:=/home/xx/.ros/log/a3eb18e0-2113-11ec-8a5c-887873831b6b/tracking_slam_node-1.log].
log file: /home/xx/.ros/log/a3eb18e0-2113-11ec-8a5c-887873831b6b/tracking_slam_node-1*.log
This time I could not find anything about this error anywhere online.
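Unlike the earlier cuDNN failures, this log line comes from a check in io.cpp and reports that segnet_pascal.caffemodel was not found at the configured path. A minimal sketch for confirming that before launching; the helper name check_model_file is hypothetical:

```shell
# Hypothetical helper: report whether a model file exists.
# Returns 0 and prints "found: PATH" if it does, 1 and "missing: PATH" if not.
check_model_file() {
    if [ -f "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Example (path taken from the log above):
#   check_model_file ~/catkin_ws/src/tracking_slam/config/segnet/segnet_pascal.caffemodel
```

If the file is reported missing, the weights have to be placed at that path, or the path in the package configuration has to point at where they actually are.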
6. Some posts online also blame insufficient GPU memory, but I checked and plenty was still free.
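Free GPU memory can be checked from the command line with nvidia-smi. A minimal sketch that computes it from the used and total figures; the helper name gpu_free_mib is hypothetical and expects one "used, total" pair per line on stdin:

```shell
# Hypothetical helper: read "used, total" pairs (in MiB) on stdin and print
# the free amount for each GPU.
gpu_free_mib() {
    awk -F', *' '{ print $2 - $1 }'
}

# Example:
#   nvidia-smi --query-gpu=memory.used,memory.total \
#              --format=csv,noheader,nounits | gpu_free_mib
```

Checking this while the node is starting up gives a better picture than checking on an idle machine, since the failure happens at load time.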
The Caffe environment configuration looks fine:
******************* Caffe Configuration Summary *******************
-- General:
-- Version : 1.0.0-rc3
-- Git : abcf30d-dirty
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- Release CXX flags : -O3 -DNDEBUG -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
-- Debug CXX flags : -g -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
-- Build type : Release
--
-- BUILD_SHARED_LIBS : ON
-- BUILD_python : ON
-- BUILD_matlab : OFF
-- BUILD_docs : ON
-- CPU_ONLY : OFF
-- USE_OPENCV : ON
-- USE_LEVELDB : ON
-- USE_LMDB : ON
-- ALLOW_LMDB_NOLOCK : OFF
--
-- Dependencies:
-- BLAS : Yes (Atlas)
-- Boost : Yes (ver. 1.58)
-- glog : Yes
-- gflags : Yes
-- protobuf : Yes (ver. 2.6.1)
-- lmdb : Yes (ver. 0.9.17)
-- LevelDB : Yes (ver. 1.18)
-- Snappy : Yes (ver. 1.1.3)
-- OpenCV : Yes (ver. 3.3.1)
-- CUDA : Yes (ver. 8.0)
--
-- NVIDIA CUDA:
-- Target GPU(s) : Auto
-- GPU arch(s) : sm_61
-- cuDNN : Yes (ver. 5.0.5)
--
-- Python:
-- Interpreter : /usr/bin/python2.7 (ver. 2.7.12)
-- Libraries : /usr/lib/x86_64-linux-gnu/libpython2.7.so (ver 2.7.12)
-- NumPy : /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.11.0)
--
-- Documentaion:
-- Doxygen : No
-- config_file :
--
-- Install:
-- Install path : /home/xx/catkin_ws/src/tracking_slam/caffe-segnet-cudnn5/build/install
--
Update: the cause was most likely that I had changed too much in Caffe's Makefile and Makefile.config. Leaving both unmodified, I simply ran:
cd ../caffe-segnet-cudnn5/
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j4
Then just fix the relevant paths according to the error messages.
Reference:
Installing caffe-segnet-cudnn5