A collection of YOLOv5 training errors on Ubuntu

Problem 1:
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Solution:
Following the error message, open line 621 of
/home/xx/anaconda3/envs/deepshare/lib/python3.7/site-packages/torch/tensor.py
(or click the file path in the traceback to jump there directly).

    def __array__(self, dtype=None):
        if dtype is None:
            return self.numpy()
        else:
            return self.numpy().astype(dtype, copy=False)

In this method, change

return self.numpy()

to:

return self.cpu().detach().numpy()

Run the training command again; training now proceeds normally and the problem is solved.

Reference: https://blog.csdn.net/qq_44703886/article/details/117231542
The same fix applies to the following traceback (here the file is torch/_tensor.py, line 643):

Traceback (most recent call last):
  File "train.py", line 456, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 314, in train
    results, maps, times = test.test(opt.data,
  File "/media/xx/新加卷/yolov5-master-ubuntu/test.py", line 193, in test
    plot_images(img, output_to_target(output, width, height), paths, str(f), names)  # predictions
  File "/media/xx/新加卷/yolov5-master-ubuntu/utils/general.py", line 942, in output_to_target
    return np.array(targets)
  File "/home/xx/anaconda3/envs/yolov5/lib/python3.8/site-packages/torch/_tensor.py", line 643, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
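
Editing a file inside site-packages changes PyTorch for every project in that environment and is undone by a reinstall. A less invasive alternative is to convert the tensor in the calling code before it reaches NumPy. The sketch below is a minimal illustration, not YOLOv5's own code: `to_numpy` is a hypothetical helper name, and it relies on duck-typing (checking for `detach` and `cpu`) so that it also accepts plain lists.

```python
import numpy as np


def to_numpy(value):
    """Convert a tensor (CPU or CUDA) or any array-like to a NumPy array.

    Hypothetical helper for illustration; not part of YOLOv5 or PyTorch.
    """
    # torch.Tensor exposes detach() and cpu(); checking for them keeps this
    # sketch usable on plain lists and NumPy arrays as well.
    if hasattr(value, "detach") and hasattr(value, "cpu"):
        # detach() drops the autograd graph, cpu() copies the data to host
        # memory, and numpy() then exposes it as an ndarray.
        value = value.detach().cpu().numpy()
    return np.asarray(value)
```

A tensor `t` that lives on cuda:0 would then be passed through `to_numpy(t)` before it is handed to plotting or NumPy code. Where exactly to apply it depends on the YOLOv5 version in use.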

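Problem 2:
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.
Solution:
At line 145 of the file that defines _initialize_biases (models/yolo.py in the YOLOv5 repository), wrap the in-place bias updates in with torch.no_grad():
The modified function looks like this: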

def _initialize_biases(self, cf=None):  # initialize biases into Detect(), cf is class frequency
    # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
    m = self.model[-1]  # Detect() module
    for mi, s in zip(m.m, m.stride):  # from
        b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
        with torch.no_grad():  # in-place edits on a view of a leaf tensor must not be tracked by autograd
            b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
            b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls
        mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)

Reference: https://blog.51cto.com/u_15194128/2795983

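Problem 3: the GPU and the installed PyTorch build are reported as incompatible
Example: a GPU such as the A6000 has a higher compute capability than the installed torch build supports.
Solution: upgrade torch to a version that supports the GPU's compute capability.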
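
To confirm this kind of mismatch before upgrading, one can compare the GPU's compute capability with the architectures the installed build was compiled for. The following is a rough sketch: `arch_supported` and `report` are hypothetical helper names, and the matching rule (same major version, build minor version not above the device's) is a simplifying assumption rather than PyTorch's exact check. It relies on the PyTorch calls `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`.

```python
import re


def arch_supported(capability, arch_list):
    """Return True if some compiled sm_XY architecture can serve the device.

    capability: (major, minor) tuple, e.g. (8, 6).
    arch_list:  strings such as ["sm_70", "sm_80", "compute_80"].
    Assumption: a binary built for sm_XY runs on devices with the same
    major version X and a minor version of at least Y.
    """
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)(\d)[a-z]?", arch)
        if match is None:
            continue  # skip PTX-only entries such as "compute_80"
        arch_major, arch_minor = int(match.group(1)), int(match.group(2))
        if arch_major == major and arch_minor <= minor:
            return True
    return False


def report():
    """Print what the installed torch build supports on the first GPU."""
    import torch  # imported here so arch_supported works without torch

    if not torch.cuda.is_available():
        print("CUDA is not available to this torch build")
        return
    capability = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
    print("device capability:", capability)
    print("compiled architectures:", archs)
    print("supported:", arch_supported(capability, archs))
```

Calling `report()` on the training machine prints the versions and the verdict. An A6000 reports capability (8, 6), so a build whose architecture list stops at sm_75 would be flagged as unsupported.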

Problem 4: RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

Traceback (most recent call last):
  File "train.py", line 586, in <module>
    main(opt)
  File "train.py", line 485, in main
    train(opt.hyp, opt, device)
  File "train.py", line 315, in train
    scaler.scale(loss).backward()
  File "/home/image522/anaconda3/envs/yolov5_v5.0/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/image522/anaconda3/envs/yolov5_v5.0/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

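Solution: reduce the batch size (for example, via the --batch-size argument of train.py). This error commonly appears when the GPU does not have enough free memory for the requested batch.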
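
When it is unclear which batch size fits, the value can be halved until one training step succeeds. The sketch below assumes a caller-supplied function `try_step(batch_size)` (hypothetical) that runs a single forward/backward pass and raises RuntimeError when it fails; `find_batch_size` and `fake_step` are illustrative names and not part of YOLOv5.

```python
def find_batch_size(try_step, start, minimum=1):
    """Halve the batch size until try_step(batch_size) stops raising.

    try_step is a hypothetical callable that runs one training step and
    raises RuntimeError (e.g. the cuDNN error above) when it does not fit.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except RuntimeError as err:
            print(f"batch size {batch_size} failed: {err}")
            batch_size //= 2  # try again with half the batch
    raise RuntimeError(f"no batch size between {minimum} and {start} worked")


def fake_step(batch_size):
    """Stand-in for a real training step: pretends only 8 or fewer fit."""
    if batch_size > 8:
        raise RuntimeError("Unable to find a valid cuDNN algorithm to run convolution")


print(find_batch_size(fake_step, 64))  # tries 64, 32, 16, then succeeds at 8
```

In real use, `fake_step` would be replaced by a function that builds a batch of the given size and runs it through the model once; the value found can then be passed to train.py.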
