cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

1. After Baidu turned up nothing, I browsed the PyTorch forums and found that plenty of other people had run into this error. The fix they suggested was:

sudo rm -rf ~/.nv
    This deletes NVIDIA's per-user cache directory (including the CUDA ComputeCache). Some people reported that it worked; others said "It doesn't work for me..."

2. I then found a method that actually solved the problem for me (especially relevant for a shared lab server with several GPUs in one machine):

    GPUs are scarce and users are many, so each person typically takes just one GPU. The TensorFlow-era habit was:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # expose only GPU 0 to this process
    (Note: the variable is CUDA_VISIBLE_DEVICES; the widely copied CUDA_ENABLE_DEVICES is a typo and does nothing.) In a PyTorch script this environment variable only takes effect if it is set before the first CUDA call, so the more direct in-code way is:

import torch
torch.cuda.set_device(0)  # make GPU 0 the default CUDA device
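For reference, the now-common PyTorch idiom is to build a device object once and move the model and tensors onto it explicitly; a minimal runnable sketch (the tiny nn.Linear model here is just a stand-in for your own module):

import torch
import torch.nn as nn

# Fall back to the CPU when no GPU is present
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 2).to(device)    # any nn.Module moves the same way
x = torch.randn(8, 4, device=device)  # allocate the input on the same device
print(model(x).device)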

To automate this on a busy shared server, the following script uses pynvml to find GPUs whose memory usage is below 50% and masks the process to them:

import os
import pynvml
import torch

pynvml.nvmlInit()


def usegpu(need_gpu_count=1):
    """Return the ids of up to `need_gpu_count` GPUs with <50% memory in use."""
    nouse = []
    for index in range(pynvml.nvmlDeviceGetCount()):
        # `index` is the physical GPU id as seen by NVML
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
        used = meminfo.used / meminfo.total
        if used < 0.5:
            nouse.append(index)
    if not nouse:
        return []
    chosen = nouse[:need_gpu_count] if len(nouse) >= need_gpu_count else nouse
    # Must be set before the first CUDA call, otherwise it has no effect
    os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(map(str, chosen))
    # After masking, the visible devices are renumbered from 0,
    # so the first chosen GPU is now cuda:0
    torch.cuda.set_device(0)
    print("use gpu: " + ','.join(map(str, chosen)))
    return chosen


if __name__ == '__main__':

    gpus = usegpu(need_gpu_count=2)

    if gpus:
        print("use gpu ok")
    else:
        print("no GPU is available")
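One pitfall worth spelling out: once CUDA_VISIBLE_DEVICES is set, CUDA renumbers the visible devices starting from 0, so PyTorch addresses the chosen GPUs as cuda:0, cuda:1, ... regardless of their physical ids. A quick sketch, assuming physical GPUs 2 and 3 happen to be the idle ones:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"  # must happen before CUDA is initialized
import torch

print(torch.cuda.device_count())  # 2 -- only the masked-in devices are visible
torch.cuda.set_device(0)          # physical GPU 2, now renumbered as cuda:0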
