[tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed] Solution

Serious caveat: with this fix, training no longer runs on the GPU; it runs on the CPU instead!


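I was recently training a model with yolov3, using the same configuration and environment as before, yet the error below appeared and I could not work out why. I read a lot of blog posts and spent a long time looking into it.

Only today, while searching for a similar error, did I come across a suggestion in the comment section of a blog post, and the problem that had bothered me for so long finally went away.

The error: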

E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "E:/Project/keras-yolo3-person&vehicle&aeroplane/train.py", line 190, in <module>
    _main()
  File "E:/Project/keras-yolo3-person&vehicle&aeroplane/train.py", line 84, in _main
    callbacks=[logging, checkpoint, reduce_lr, early_stopping])
  File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\keras\engine\training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\keras\engine\training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\keras\backend\tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\keras\backend\tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
    run_metadata_ptr)
  File "D:\ProgramData\Anaconda3\envs\keras-yolo3-cp36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=86528, n=32, k=64
	 [[{{node conv2d_3/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/conv2d_3/convolution_grad/Conv2DBackpropInput"], data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](leaky_re_lu_2/LeakyRelu, conv2d_3/kernel/read)]]
	 [[{{node yolo_loss/while_2/strided_slice_1/stack_1/_4337}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_11657_yolo_loss/while_2/strided_slice_1/stack_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_2/strided_slice_1/stack_2/_4125)]]

Solution

Add the following code:

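import os
# '/gpu:0' is a TensorFlow device name, not a CUDA device index, so CUDA exposes
# no GPU at all; set this before TensorFlow initializes CUDA for it to take effect.
os.environ['CUDA_VISIBLE_DEVICES'] = '/gpu:0'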

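Caveat: training no longer uses the GPU; it now runs on the CPU. CUDA_VISIBLE_DEVICES expects a comma-separated list of device indices (or GPU UUIDs), and '/gpu:0' is neither, so every GPU is hidden from TensorFlow. The error disappears because the failing cuBLAS call is no longer made, not because the GPU problem itself has been fixed.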
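To make the CPU fallback easier to see, here is a minimal sketch of how a CUDA_VISIBLE_DEVICES value is interpreted. The helper `visible_gpu_indices` and its `device_count` parameter are hypothetical names made up for illustration; they are not part of CUDA or TensorFlow. The sketch only models integer indices (GPU UUID syntax is ignored) and assumes the documented CUDA rule that only devices listed before the first invalid entry are visible.

```python
import re


def visible_gpu_indices(value, device_count):
    """Approximate which GPU indices CUDA would expose for a given
    CUDA_VISIBLE_DEVICES string (hypothetical helper, integer indices only)."""
    visible = []
    for token in value.split(","):
        token = token.strip()
        # An entry is valid only if it is a plain non-negative integer
        # that refers to an existing device.
        if not re.fullmatch(r"[0-9]+", token) or int(token) >= device_count:
            break  # the first invalid entry hides itself and everything after it
        visible.append(int(token))
    return visible


print(visible_gpu_indices("/gpu:0", device_count=1))  # [] : no GPU visible, so CPU is used
print(visible_gpu_indices("0", device_count=1))       # [0] : the first GPU stays visible
```

Under these assumptions, the value '/gpu:0' yields an empty list, which matches the behaviour described above: TensorFlow finds no GPU and silently falls back to the CPU. A value of '0' would keep the first GPU visible, although that on its own may not make the original cuBLAS error go away.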
