While running a YOLOv1 project, training failed with RuntimeError: CUDA error: unknown error.
The full error output is as follows:
root@bcc5071417cf:/home/zhou/pytorch/yolo_1_pytorch# python train.py
epoch = 0
Traceback (most recent call last):
File "train.py", line 75, in <module>
train()
File "train.py", line 71, in train
train_step(epochs, model, train_loader, test_loader, optimizer, classes, device=device)
File "train.py", line 23, in train_step
loss_dict = model(img, gt_info)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhou/pytorch/yolo_1_pytorch/yolo/yolov1.py", line 103, in forward
output = self.local_layer(output)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhou/pytorch/yolo_1_pytorch/yolo/darknet.py", line 65, in forward
out = self.conv(x)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 447, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
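The last line of the message recommends CUDA_LAUNCH_BLOCKING=1, which makes kernel launches synchronous so that the traceback points at the call that actually failed. The simplest way is to set it on the command line (CUDA_LAUNCH_BLOCKING=1 python train.py). As a minimal sketch, it can also be set from Python, assuming the code runs before CUDA is initialized; the helper name enable_blocking_launches is hypothetical and is not part of the original project.

```python
import os


def enable_blocking_launches(env=os.environ):
    """Hypothetical helper: request synchronous CUDA kernel launches.

    The variable is read when CUDA is initialized, so the safe choice is to
    call this at the very top of train.py, before `import torch`.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


# Set the variable for the current process.
enable_blocking_launches()
```

Synchronous launches slow training down, so this setting is best kept for debugging runs only.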
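Reducing the batch size from batch_size=16 to 2 resolved the problem. This suggests that the GPU was running out of memory at the larger batch size, even though the message only reports an unknown error. The change in the training configuration: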
train_cfg['batch_size'] = 2
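Rather than guessing a working value, the search for a batch size that fits can be automated. The following is only a sketch under stated assumptions: run_with_smaller_batches and fake_step are hypothetical names, and fake_step stands in for one real training step. Note that after some CUDA errors the process may be left in an unusable state, in which case retrying in the same process will not help and the batch size has to be lowered by hand before rerunning, as above.

```python
def run_with_smaller_batches(step, batch_size, min_batch_size=1):
    """Call step(batch_size); on a CUDA RuntimeError, halve the batch size and retry.

    Returns (batch_size_used, result_of_step). Errors that do not mention
    CUDA, or that still occur at min_batch_size, are re-raised unchanged.
    """
    while True:
        try:
            return batch_size, step(batch_size)
        except RuntimeError as err:
            if "CUDA" not in str(err) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


def fake_step(batch_size):
    # Stand-in for one training step: pretends the GPU fails above batch size 2.
    if batch_size > 2:
        raise RuntimeError("CUDA error: unknown error")
    return "ok"


used, result = run_with_smaller_batches(fake_step, 16)
print(used, result)  # prints "2 ok"
```

In a real project, step would build the data loader with the given batch size and run one forward and backward pass; the value returned in used could then be written back to train_cfg['batch_size'].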