PyTorch 1.6 severe performance issue: slow operations on non-contiguous tensors

I upgraded PyTorch on a remote machine from 1.3.1 to 1.6 and, to my surprise, everything became painfully slow. Unbelievable.
It took me quite a while of digging to find out that non-contiguous tensors were to blame.
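In case anyone hits something similar: the built-in autograd profiler is one way to see which kernels blow up after an upgrade. A minimal, self-contained sketch with a toy layer (API as of PyTorch 1.6; use_cuda=True also times the GPU kernels):

import torch
import torch.nn as nn

net = nn.Conv2d(3, 128, 3, 1, 1).to('cuda:0')
x = torch.randn(10, 256, 256, 3, device='cuda:0').permute(0, 3, 1, 2)  # non-contiguous input
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    net(x)
print(prof.key_averages().table(sort_by='cuda_time_total', row_limit=10))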

I filed an issue; we'll see whether it gets fixed.

Issue: https://github.com/pytorch/pytorch/issues/44943

Below is my test code. The speed gap here may not look dramatic; on my machine the difference is only about 2 s.
But in my actual training code and model, running with vs. without contiguous() differs by more than 10x. What a trap.
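For background, permute() only rewrites the tensor's strides; it never moves data, so the result reports itself as non-contiguous until you copy it with contiguous(). A minimal standalone sketch:

import torch

x = torch.randn(10, 256, 256, 3)      # NHWC data, as it typically comes from NumPy/OpenCV
y = x.permute(0, 3, 1, 2)             # NCHW view: only the strides change
print(y.is_contiguous())              # False: memory is still laid out as NHWC
z = y.contiguous()                    # actually copies the data into NCHW order
print(z.is_contiguous())              # True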

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import time

device = 'cuda:0'

torch.set_grad_enabled(False)  # inference only: skip autograd bookkeeping

class Net2(nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.layer1 = nn.Conv2d(3, 128, 3, 1, 1)
        # 20 independent 3x3 conv layers, so convolution cost dominates the timing
        self.layer2 = nn.Sequential(*[nn.Conv2d(128, 128, 3, 1, 1) for _ in range(20)])

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        return x

net = Net2().to(device)

def calc_time(use_contiguous=True, ep=10):
    t1 = time.time()

    batch_im = np.random.uniform(size=[10, 256, 256, 3]).astype(np.float32)

    for i in range(ep):
        # permute NHWC -> NCHW: this only swaps strides, so the result is non-contiguous
        batch_tensors = torch.tensor(batch_im, dtype=torch.float32, device=device).permute(0, 3, 1, 2)
        if use_contiguous:
            batch_tensors = batch_tensors.contiguous()
        out = net(batch_tensors)
        out = out.softmax(dim=1)
        out = F.interpolate(out, size=[512, 512], mode='area')
        out.cpu().numpy()  # copy back to host; this also synchronizes CUDA, so the timing is valid

    t2 = time.time()
    print(t2-t1)

# warm-up run (pays the one-time CUDA init / cuDNN autotune cost)
calc_time(use_contiguous=False, ep=1)
# timed runs
print('calc time')
calc_time(use_contiguous=True)
calc_time(use_contiguous=False)
calc_time(use_contiguous=True)
calc_time(use_contiguous=False)
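A side note on methodology: out.cpu().numpy() already forces a device synchronization every iteration, so the wall-clock numbers above are meaningful. If you adapt this benchmark, though, the safer pattern is to synchronize explicitly around the timer. A small sketch (timed_run is my own helper name, not part of the script above):

import time
import torch

def timed_run(fn):
    torch.cuda.synchronize()  # drain any GPU work queued before the measurement
    t1 = time.time()
    fn()
    torch.cuda.synchronize()  # wait for the kernels launched by fn to finish
    return time.time() - t1

# usage: compare one forward pass on contiguous vs. non-contiguous input
x_nc = torch.randn(10, 256, 256, 3, device=device).permute(0, 3, 1, 2)
print(timed_run(lambda: net(x_nc.contiguous())))
print(timed_run(lambda: net(x_nc)))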
