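Serious caveat: with the change described below, training no longer runs on the GPU; it runs on the CPU!
Recently I was training a model with yolov3, using the same configuration and environment as before, yet the following error appeared and I could not figure out why. I read a lot of blog posts and spent a long time on it...
It was only today, while searching for a similar error, that I came across a fix proposed in the comment section of a blog post, and the problem that had troubled me for so long finally went away!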
E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "E:/Project/keras-yolo3-person&vehicle&aeroplane/train.py", line 190, in <module>
_main()
File "E:/Project/keras-yolo3-person&vehicle&aeroplane/train.py", line 84, in _main
callbacks=[logging, checkpoint, reduce_lr, early_stopping])
File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\keras\engine\training_generator.py", line 217, in fit_generator
class_weight=class_weight)
File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\keras\engine\training.py", line 1217, in train_on_batch
outputs = self.train_function(ins)
File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\keras\backend\tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\keras\backend\tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
run_metadata_ptr)
File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=86528, n=32, k=64
[[{{node conv2d_3/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/conv2d_3/convolution_grad/Conv2DBackpropInput"], data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](leaky_re_lu_2/LeakyRelu, conv2d_3/kernel/read)]]
[[{{node yolo_loss/while_2/strided_slice_1/stack_1/_4337}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_11657_yolo_loss/while_2/strided_slice_1/stack_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_2/strided_slice_1/stack_2/_4125)]]
Add the following code:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '/gpu:0'
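Caveat: after this change, training no longer uses the GPU; it runs on the CPU. CUDA_VISIBLE_DEVICES expects device indices such as 0, and '/gpu:0' is not one, so most likely no GPU is visible to TensorFlow any more and the error disappears only because the GPU code path is never reached.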
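If the goal is to keep training on the GPU, the variable can be set to a numeric index instead. The following is a minimal sketch, assuming a single GPU at index 0. `set_visible_gpus` is a hypothetical helper written for this post, not part of TensorFlow or Keras, and it does not by itself fix the cuBLAS error.

```python
import os


def set_visible_gpus(indices):
    """Expose only the given GPU indices to CUDA (hypothetical helper).

    CUDA_VISIBLE_DEVICES expects a comma-separated list of device indices
    such as "0" or "0,1", not a TensorFlow device name like "/gpu:0".
    """
    for i in indices:
        if not isinstance(i, int) or i < 0:
            raise ValueError(f"GPU index must be a non-negative integer, got {i!r}")
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Assumption: the training GPU is device 0.
# This has to run before TensorFlow/Keras is imported to take effect.
set_visible_gpus([0])
```

This only restores GPU visibility. If the cuBLAS error comes back, it is often reported to be related to GPU memory, for example another process still holding the GPU, which is worth checking separately.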