I had been training on Colaboratory for a while when one day it suddenly failed with this error:
CUDA status Error: file: ./src/blas_kernels.cu : () : line: 859 : build time: Dec 1 2021 - 08:09:47
CUDA Error: no kernel image is available for execution on the device
CUDA Error: no kernel image is available for execution on the device: File exists
darknet: ./src/utils.c:331: error: Assertion `0' failed.
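I was stuck on this for a long time before finding a fix.
Reference: https://github.com/pjreddie/darknet/issues/1523
The GPU that Colaboratory assigns is random, and different GPUs need different settings in the Makefile. The error means darknet was compiled for a GPU architecture other than the one currently assigned.
Run the following: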
# This cell selects the correct architecture flags for the GPU you were assigned.
# If your GPU is not found, look through the list below, find the matching
# GPU, and add it to the archTypes dictionary.
# Tesla V100
# ARCH= -gencode arch=compute_70,code=[sm_70,compute_70]
# Tesla K80
# ARCH= -gencode arch=compute_37,code=sm_37
# GeForce RTX 2080 Ti, RTX 2080, RTX 2070, Quadro RTX 8000, Quadro RTX 6000, Quadro RTX 5000, Tesla T4, XNOR Tensor Cores
# ARCH= -gencode arch=compute_75,code=[sm_75,compute_75]
# Jetson XAVIER
# ARCH= -gencode arch=compute_72,code=[sm_72,compute_72]
# GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030, Titan Xp, Tesla P40, Tesla P4
# ARCH= -gencode arch=compute_61,code=sm_61
# GP100/Tesla P100 - DGX-1
# ARCH= -gencode arch=compute_60,code=sm_60
# For Jetson TX1, Tegra X1, DRIVE CX, DRIVE PX - uncomment:
# ARCH= -gencode arch=compute_53,code=[sm_53,compute_53]
# For Jetson Tx2 or Drive-PX2 uncomment:
# ARCH= -gencode arch=compute_62,code=[sm_62,compute_62]
import os

# Ask nvidia-smi for the name of the GPU assigned to this session
os.environ['GPU_TYPE'] = str(os.popen('nvidia-smi --query-gpu=name --format=csv,noheader').read())

def getGPUArch(argument):
    try:
        argument = argument.strip()
        # All Colab GPUs
        archTypes = {
            "Tesla V100-SXM2-16GB": "-gencode arch=compute_70,code=[sm_70,compute_70]",
            "Tesla K80": "-gencode arch=compute_37,code=sm_37",
            "Tesla T4": "-gencode arch=compute_75,code=[sm_75,compute_75]",
            "Tesla P40": "-gencode arch=compute_61,code=sm_61",
            "Tesla P4": "-gencode arch=compute_61,code=sm_61",
            "Tesla P100-PCIE-16GB": "-gencode arch=compute_60,code=sm_60"
        }
        return archTypes[argument]
    except KeyError:
        return "GPU must be added to GPU Commands"

os.environ['ARCH_VALUE'] = getGPUArch(os.environ['GPU_TYPE'])
print("GPU Type: " + os.environ['GPU_TYPE'])
print("ARCH Value: " + os.environ['ARCH_VALUE'])
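If the assigned GPU is not in the archTypes dictionary, the flag can also be built from the GPU's compute capability, since every entry in the list above follows the same pattern. The following is only a minimal sketch: gencode_flag and its embed_ptx parameter are hypothetical names that are not part of darknet, and it assumes you have already looked up the compute capability (for example "7.5" for a Tesla T4) in NVIDIA's published table.

```python
def gencode_flag(compute_cap: str, embed_ptx: bool = True) -> str:
    """Build an nvcc -gencode flag from a compute capability such as "7.5".

    Hypothetical helper for illustration; not part of darknet.
    """
    major, minor = compute_cap.strip().split(".")
    cc = f"{int(major)}{int(minor)}"  # "7.5" -> "75"
    if embed_ptx:
        # Same form as the V100 / T4 entries: native code plus PTX
        return f"-gencode arch=compute_{cc},code=[sm_{cc},compute_{cc}]"
    # Same form as the K80 / P100 entries: native code only
    return f"-gencode arch=compute_{cc},code=sm_{cc}"


print(gencode_flag("7.5"))
print(gencode_flag("6.0", embed_ptx=False))
```

The result can then be added to archTypes under the exact name that nvidia-smi reports for that GPU.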
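This prints the GPU in use and the value to set. Put the printed value at the ARCH entry in the Makefile, then recompile darknet.
If anything is unclear, compare your original Makefile with the one given in the link above.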
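The Makefile edit can also be scripted instead of done by hand. The following is a rough sketch, not the method from the linked issue: set_makefile_arch is a hypothetical helper, and it assumes the Makefile has one active (uncommented) ARCH= assignment, possibly continued over several lines with trailing backslashes. It works on the Makefile text, so reading and writing the actual file is left to you.

```python
import re


def set_makefile_arch(makefile_text: str, arch_value: str) -> str:
    """Replace the first active ARCH= assignment with a single ARCH= line.

    Hypothetical helper for illustration; not part of darknet.
    """
    # Refuse the "GPU must be added ..." message that getGPUArch returns
    # for an unknown GPU, so it never ends up in the Makefile.
    if not arch_value.startswith("-gencode"):
        raise ValueError("arch_value does not look like a -gencode flag")

    lines = makefile_text.split("\n")
    out = []
    i = 0
    replaced = False
    while i < len(lines):
        if not replaced and re.match(r"^ARCH\s*=", lines[i]):
            # Skip the backslash-continued lines of the old assignment
            while lines[i].rstrip().endswith("\\") and i + 1 < len(lines):
                i += 1
            out.append("ARCH= " + arch_value)
            replaced = True
        else:
            out.append(lines[i])
        i += 1
    if not replaced:
        raise ValueError("no active ARCH= line found in the Makefile text")
    return "\n".join(out)


# In-memory demonstration with a shortened, made-up Makefile fragment
sample = "\n".join([
    "GPU=1",
    "ARCH= -gencode arch=compute_30,code=sm_30 \\",
    "      -gencode arch=compute_52,code=[sm_52,compute_52]",
    "VPATH=./src/",
])
print(set_makefile_arch(sample, "-gencode arch=compute_75,code=[sm_75,compute_75]"))
```

In Colab you would pass the contents of darknet's Makefile together with os.environ['ARCH_VALUE'], write the result back, and then rebuild. Commented-out ARCH lines are left untouched, so the per-GPU hints in the Makefile stay available.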