1. Problem description
Today, running a model with faster-rcnn failed with the following error:
prepared the input data
F0531 13:41:25.938465 12409 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
@ 0x7f23e6c985cd google::LogMessage::Fail()
@ 0x7f23e6c9a433 google::LogMessage::SendToLog()
@ 0x7f23e6c9815b google::LogMessage::Flush()
@ 0x7f23e6c9ae1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f23e7d89420 caffe::SyncedMemory::to_gpu()
@ 0x7f23e7d883e9 caffe::SyncedMemory::mutable_gpu_data()
@ 0x7f23e7d8a782 caffe::Blob<>::mutable_gpu_data()
@ 0x7f23e7dc5ee8 caffe::BaseConvolutionLayer<>::forward_gpu_gemm()
@ 0x7f23e7f15616 caffe::ConvolutionLayer<>::Forward_gpu()
@ 0x7f23e7ebb7b2 caffe::Net<>::ForwardFromTo()
@ 0x7f23e7ebb906 caffe::Net<>::ForwardPrefilled()
@ 0x7f23e84be176 Detector::Detect()
@ 0x442990 FasterDetect()
@ 0x440de4 PredictContainer2()
@ 0x442dd1 imagedeal()
@ 0x442e67 main
@ 0x7f23e5989830 __libc_start_main
@ 0x4404e9 _start
@ (nil) (unknown)
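Aborted (core dumped)
2. Troubleshooting
For this error I first checked host (CPU) memory with top and found there was plenty available. I then suspected bad image data or an out-of-bounds access in the matrix code, and spent most of the day on that without finding the cause. Only after reading up on Caffe did I realize that the computation runs on the GPU (the stack trace goes through caffe::SyncedMemory::to_gpu()), so the memory that ran out is GPU memory, not host memory. The command to check GPU memory is nvidia-smi: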
john@ubun:~/Project/Serverbk$nvidia-smi
Fri May 31 13:41:32 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:65:00.0 On | N/A |
| 0% 58C P2 62W / 250W | 8413MiB / 11169MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1047 G /usr/lib/xorg/Xorg 221MiB |
| 0 1822 G compiz 61MiB |
| 0 2798 C ./Server 189MiB |
| 0 3424 C ./Imagedeal 6981MiB |
| 0 3513 C ./Server 189MiB |
| 0 3563 C ./Imagedeal 189MiB |
| 0 4186 C ./Server 189MiB |
| 0 9778 C ./Server 189MiB |
| 0 10094 G /usr/lib/firefox/firefox 2MiB |
| 0 11819 C ./Imagedeal 189MiB |
+-----------------------------------------------------------------------------+
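With this many processes on one card, it can be easier to list only the compute processes, sorted by GPU memory. The following is a minimal sketch, assuming the installed nvidia-smi supports the --query-compute-apps option; top_gpu_procs is a hypothetical helper name, not something from the original project. Note that this query covers only compute (type C) processes, so Xorg and other graphics (type G) entries do not appear.

```shell
#!/bin/sh
# Sketch: show which compute processes hold the most GPU memory.
# top_gpu_procs is a hypothetical helper; it reads CSV lines of the form
# "pid, process_name, used_memory_in_MiB" on stdin and sorts them by the
# third field, largest first.
top_gpu_procs() {
    sort -t, -k3,3nr
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-compute-apps=pid,process_name,used_memory \
               --format=csv,noheader,nounits | top_gpu_procs
fi
```

Fed the compute entries from the listing above, the first line of the result would be the ./Imagedeal process with PID 3424.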
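In the nvidia-smi output, 8413MiB of 11169MiB is in use, and 6981MiB of that is held by a leftover ./Imagedeal process (PID 3424). So I killed the process: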
john@ubun:~/Project/Serverbk$kill %
[8]+ Stopped ./Imagedeal
john@ubun:~/Project/Serverbk$
[8]+ Terminated ./Imagedeal
john@ubun:~/Project/Serverbk$nvidia-smi
Fri May 31 13:49:36 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:65:00.0 On | N/A |
| 0% 51C P0 63W / 250W | 282MiB / 11169MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1047 G /usr/lib/xorg/Xorg 214MiB |
| 0 1822 G compiz 62MiB |
| 0 10094 G /usr/lib/firefox/firefox 2MiB |
+-----------------------------------------------------------------------------+
Problem solved!
As a last step, I will write a sh script that kills any leftover background instances before the program starts.
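A minimal sketch of such a script is shown here, assuming pgrep and pkill (from procps) are installed and that the leftover processes are named Imagedeal and Server as in the nvidia-smi listing. kill_stale is a hypothetical helper name. It sends SIGTERM first and falls back to SIGKILL, because a suspended job (like the "Stopped" one above) does not act on SIGTERM until it is resumed.

```shell
#!/bin/sh
# Sketch: terminate leftover instances of the named programs before a new run.
# kill_stale is a hypothetical helper, not part of the original project.
kill_stale() {
    for name in "$@"; do
        # pgrep -x matches the exact process name; no match gives an empty list.
        pids=$(pgrep -x "$name" 2>/dev/null)
        [ -z "$pids" ] && continue
        echo "stopping $name:" $pids
        # Ask politely first (SIGTERM) so the process can exit cleanly.
        kill $pids 2>/dev/null
        # Wait up to 5 seconds for the processes to disappear.
        i=0
        while [ "$i" -lt 5 ] && pgrep -x "$name" >/dev/null 2>&1; do
            sleep 1
            i=$((i + 1))
        done
        # Force-kill anything still alive, e.g. suspended jobs.
        pkill -9 -x "$name" 2>/dev/null
    done
    return 0
}

# Intended use (left commented out so the sketch has no side effects):
# kill_stale Imagedeal Server
# ./Imagedeal
```

Matching by exact name keeps the script from touching unrelated processes, but it will also kill an instance that is still doing useful work, so it only suits a machine where one run at a time is expected.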