Problems encountered during PyTorch training

1. RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 6.14 GiB already allocated; 12.05 MiB free; 6.27 GiB reserved in total by PyTorch)

In my case, I added with torch.no_grad() around the network's forward pass in the training code, as follows:

        with torch.no_grad():
            y1 = net(images)
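
For reference, the two numbers in the error message ("already allocated" vs. "reserved in total by PyTorch") can also be inspected at runtime; a minimal sketch, assuming CUDA device 0:

        import torch

        # Memory occupied by live tensors on device 0
        print(torch.cuda.memory_allocated(0) / 1024 ** 2, "MiB allocated")
        # Memory held by PyTorch's caching allocator (always >= allocated)
        print(torch.cuda.memory_reserved(0) / 1024 ** 2, "MiB reserved")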

with torch.no_grad() disables gradient tracking in autograd.
model.eval() changes the forward() behaviour of the module it is called on:
e.g., it disables dropout and has batch norm use its running (population) statistics instead of per-batch statistics.

The difference between model.eval() and with torch.no_grad()

When running validation in PyTorch, model.eval() is used to switch to evaluation mode. In this mode:

it mainly tells the dropout and batchnorm layers to switch between train and val behaviour.
In train mode, the dropout layer randomly drops activation units with the configured probability p, while the batchnorm layer keeps computing and updating the running mean and var of the data.
In val mode, the dropout layer lets all activation units pass through, while the batchnorm layer stops computing and updating mean and var and instead uses the values already learned during training.
This mode does not change how gradients are computed in any layer: gradients are computed and stored exactly as in training mode; it simply means no backpropagation is performed.
with torch.no_grad(), on the other hand, disables the autograd engine in order to speed things up and save GPU memory: it stops gradient computation, saving compute and memory, but it does not affect the behaviour of the dropout and batchnorm layers. A typical validation loop combines both, as sketched below.
Source: https://blog.csdn.net/songyu0120/article/details/103884586
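
A minimal sketch of how the two are typically combined in a validation loop (val_loader is an assumed DataLoader name, and criterion is used here as a generic two-argument loss, unlike the three-argument criterion in the snippets below):

        net.eval()                                  # dropout off, batchnorm uses running stats
        with torch.no_grad():                       # autograd stops recording: less memory, faster
            for images, labels in val_loader:       # val_loader is an assumed DataLoader
                outputs = net(images)
                val_loss = criterion(outputs, labels)
        net.train()                                 # restore train behaviour before the next epoch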

2. RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

This problem occurs because we want gradients for this tensor, but requires_grad defaults to False, i.e. no gradient is computed for it. In the code below, the loss has no grad_fn because the forward pass was run under torch.no_grad() (see problem 1), so autograd recorded nothing. The workaround used here: loss.requires_grad = True

        with torch.no_grad():
            y1 = net(images)
        loss_c, loss_s, loss = criterion(y1, labels, training_mask)
        # Backward
        loss.requires_grad = True
        optimizer.zero_grad()
        loss.backward()

Reference: https://blog.csdn.net/CVAIDL/article/details/106363761?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase
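
As a quick sanity check, whether a tensor is attached to the autograd graph can be read off its requires_grad and grad_fn attributes; a minimal standalone sketch (the tensors here are made up for illustration):

        import torch

        x = torch.randn(3)                    # requires_grad defaults to False
        with torch.no_grad():
            y = (x * 2).sum()
        print(y.requires_grad, y.grad_fn)     # False None -> y.backward() raises this error

        w = torch.randn(3, requires_grad=True)
        z = (w * 2).sum()
        print(z.requires_grad, z.grad_fn)     # True <SumBackward0 ...> -> z.backward() works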

3. train.py:182: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.

targets = [Variable(ann.cuda(), volatile=True) for ann in targets]

Change it to:

            with torch.no_grad():
                targets = [Variable(ann.cuda()) for ann in targets]
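
Note that since PyTorch 0.4 Variable is deprecated and torch.Tensor itself carries the autograd state, so the wrapper can be dropped entirely; an equivalent sketch under that assumption:

        with torch.no_grad():
            targets = [ann.cuda() for ann in targets]   # Variable wrapper no longer needed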
