Training YOLOv7 on your own dataset: pitfalls I ran into (updated from time to time)

1. _pickle.UnpicklingError: STACK_GLOBAL requires str

Fix: link
I had copied the data.yaml from yolo5 straight into yolo7, but the dataset path still held two cache files, train.cache and val.cache. Deleting these two files fixed the error.
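The cleanup above can be scripted. This is a minimal sketch, assuming the stale caches sit somewhere under the dataset folder that data.yaml points to; `remove_label_caches` is a hypothetical helper name, not part of yolov7. The caches are presumably regenerated on the next training run, so deleting them should be safe.

```python
from pathlib import Path


def remove_label_caches(dataset_root):
    """Delete every *.cache file under dataset_root and return the removed paths."""
    removed = []
    for cache_file in sorted(Path(dataset_root).rglob("*.cache")):
        if cache_file.is_file():
            cache_file.unlink()
            removed.append(cache_file)
    return removed
```

Calling `remove_label_caches("path/to/dataset")` (the path is a placeholder) returns the list of files it deleted, which is handy for confirming that train.cache and val.cache are really gone.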
2. RuntimeError: CUDA out of memory. Tried to allocate 2.96 GiB (GPU 0; 11.76 GiB total capacity; 1.31 GiB already allocated; 2.96 GiB free; 7.25 GiB reserved in total by PyTorch)

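Fix: https://blog.csdn.net/qq_37555071/article/details/108346569
nvidia-smi
taskkill -PID 7392 -F (a Windows command; my machine does not have taskkill)
kill <PID>
Then change the training command to use a batch size of 8 and 1 worker (--batch-size 8 --workers 1).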
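To find which process is holding GPU memory without reading the nvidia-smi table by eye, something like the following may help. It is a sketch, assuming `nvidia-smi` is on the PATH and supports the `--query-compute-apps` CSV query; `parse_compute_apps` and `list_gpu_processes` are hypothetical helper names.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' CSV rows into a list of (pid, used_mib) tuples."""
    apps = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed rows
        pid = int(parts[0])
        # used_memory can be a placeholder such as "[N/A]" on some setups
        used_mib = int(parts[1]) if parts[1].isdigit() else None
        apps.append((pid, used_mib))
    return apps


def list_gpu_processes():
    """Ask nvidia-smi for the compute processes currently using the GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_apps(result.stdout)
```

Each returned PID can then be passed to `kill` once you have confirmed it is a leftover training process and not something you still need.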

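3. After switching to yolor I got errors because common.py was missing several classes that the model referenced but never defined. I found them on GitHub and pasted them in; the code is below. The missing pieces were mainly these three classes: BottleneckCSPF, SPPCSP and BottleneckCSP2.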

# TODO(@lsy): add the BottleneckCSPF module
class BottleneckCSPF(nn.Module):
    # CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super(BottleneckCSPF, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = nn.Conv2d(c1, c_, 1, 1, bias=False)
        #self.cv3 = nn.Conv2d(c_, c_, 1, 1, bias=False)
        self.cv4 = Conv(2 * c_, c2, 1, 1)
        self.bn = nn.BatchNorm2d(2 * c_)  # applied to cat(y1, y2) in forward()
        self.act = nn.SiLU()
        self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)])

    def forward(self, x):
        y1 = self.m(self.cv1(x))
        y2 = self.cv2(x)
        return self.cv4(self.act(self.bn(torch.cat((y1, y2), dim=1))))
# TODO(@lsy): add the SPPCSP module
class SPPCSP(nn.Module):
    # CSP SPP https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
        super(SPPCSP, self).__init__()
        c_ = int(2 * c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = nn.Conv2d(c1, c_, 1, 1, bias=False)
        self.cv3 = Conv(c_, c_, 3, 1)
        self.cv4 = Conv(c_, c_, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
        self.cv5 = Conv(4 * c_, c_, 1, 1)
        self.cv6 = Conv(c_, c_, 3, 1)
        self.bn = nn.BatchNorm2d(2 * c_)
        self.act = nn.SiLU()
        self.cv7 = Conv(2 * c_, c2, 1, 1)

    def forward(self, x):
        x1 = self.cv4(self.cv3(self.cv1(x)))
        y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
        y2 = self.cv2(x)
        return self.cv7(self.act(self.bn(torch.cat((y1, y2), dim=1))))
# TODO(@lsy): add the BottleneckCSP2 module
class BottleneckCSP2(nn.Module):
    # CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super(BottleneckCSP2, self).__init__()
        c_ = int(c2)  # hidden channels

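4. When I wanted to swap in the yolor model, I only changed the yaml to yolor-d6.yaml, the file that stores the model architecture. The pretrained weights were still yolov7.pt, partly because I forgot to change them and partly because I had no concept of pretrained weights at all. After 3 epochs the mAP turned out to be a tiny number like 3e-5. One expert in the group chat said this is normal and that it improves after a few more epochs. Another explained that Python prints floats smaller than 1e-4 in scientific notation by default, and that to get plain decimal output you have to format the value as a string yourself. To be safe, I still replaced the pretrained weights with the yolor file.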
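The point about scientific notation is easy to check. A minimal sketch, where `format_metric` is a hypothetical helper and 3e-5 is only an illustrative value like the one mentioned above:

```python
def format_metric(value, digits=6):
    """Return value as a fixed-point decimal string instead of scientific notation."""
    return f"{value:.{digits}f}"


map_value = 3e-5  # illustrative value, not a real training result

print(map_value)                 # 3e-05
print(format_metric(map_value))  # 0.000030
print(0.0001)                    # 0.0001 (not below 1e-4, so printed as a plain decimal)
```

So the 3e-5 in the log is just how the float is printed; it still means the mAP is about 0.00003.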
5. TensorBoard: "localhost refused to connect."

tensorboard --logdir /usr/project/yolov7/runs/train/yolov79 --port=8008
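A "refused to connect" message usually means nothing is listening on the address and port the browser is trying. Before opening the page, a quick check like the one below can tell whether TensorBoard is actually reachable. This is a sketch; `is_port_open` is a hypothetical helper, and port 8008 is simply the one used in the command above.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0
```

For example, `is_port_open("127.0.0.1", 8008)` run on the machine where the browser is should return True once TensorBoard is up. If training runs on a remote server, localhost in the local browser is a different machine, so the check would need the server's address instead.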
