PyTorch Learning Notes -- nn.ParameterList() Appearing Empty in Multi-GPU Parallel Training

Contents

1--Preface

2--Error Code

3--Solution


1--Preface

        Recently, while reproducing a paper, I found that its open-source code uses nn.DataParallel() for multi-GPU parallel training together with nn.ParameterList() to build a list of parameters. This combination hits a PyTorch bug: the nn.ParameterList() shows up empty during the forward pass, with warnings like the following:

① UserWarning: nn.ParameterList is being used with DataParallel but this is not supported. This list will appear empty for the models replicated on each GPU except the original one.

② warnings.warn("nn.ParameterList is being used with DataParallel but this is not "

③ UserWarning: Setting attributes on ParameterList is not supported.

④ warnings.warn("Setting attributes on ParameterList is not supported.")

        For a more detailed description of the problem, see the issue on the official PyTorch GitHub: nn.Parameter{List,Dict} not copied to gpus in forward pass when nn.DataParallel is used #36035
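        Below is a minimal sketch that reproduces the warning; the Toy module and tensor shapes are illustrative, not taken from the paper's code:

import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        # parameters held in a ParameterList -- the construct that triggers the bug
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.ones(3)) for _ in range(2)]
        )

    def forward(self, x):
        # on every replica except GPU 0, self.weights is empty,
        # so this loop silently does nothing
        for w in self.weights:
            x = x * w
        return x

model = nn.DataParallel(Toy().cuda())  # emits the UserWarning with >= 2 GPUs
out = model(torch.ones(4, 3).cuda())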

2--Error Code

        The code that triggers the error is as follows:

...

self.st_gcn_networks = nn.ModuleList((
    st_gcn(in_channels, 64, kernel_size, 1, residual=False, **kwargs0),
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 128, kernel_size, 2, **kwargs),
    st_gcn(128, 128, kernel_size, 1, **kwargs),
    st_gcn(128, 128, kernel_size, 1, **kwargs),
    st_gcn(128, 256, kernel_size, 2, **kwargs),
    st_gcn(256, 256, kernel_size, 1, **kwargs),
    st_gcn(256, 256, kernel_size, 1, **kwargs),
))

...

# one learnable edge-importance mask per ST-GCN block
self.edge_importance = nn.ParameterList([
    nn.Parameter(torch.ones(self.A.size()))
    for _ in self.st_gcn_networks
])

        In the forward pass, the following loop is skipped outright, because self.edge_importance, built with nn.ParameterList(), is empty on the GPU replicas:

for gcn, importance in zip(self.st_gcn_networks, self.edge_importance):
    x, _ = gcn(x, self.A * importance)
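        Note that nothing raises an exception here: zip() stops at the shortest iterable, so zipping the ModuleList against an empty ParameterList simply produces zero iterations:

# zip() stops at the shortest iterable, so pairing the 10-element
# ModuleList with an empty ParameterList yields zero iterations:
list(zip([1, 2, 3], []))  # -> []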

3--Solution

        ① As noted in the Preface, the problem is caused by multi-GPU parallel training, so one simple workaround is to train on a single GPU; the downside is that a single GPU may run out of memory.

        ② My own fix: store the parameters as individual module attributes rather than in an nn.ParameterList() (admittedly a clumsy approach).

Fixed code:

...

self.st_gcn_networks = nn.ModuleList((
    st_gcn(in_channels, 64, kernel_size, 1, residual=False, **kwargs0),
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 128, kernel_size, 2, **kwargs),
    st_gcn(128, 128, kernel_size, 1, **kwargs),
    st_gcn(128, 128, kernel_size, 1, **kwargs),
    st_gcn(128, 256, kernel_size, 2, **kwargs),
    st_gcn(256, 256, kernel_size, 1, **kwargs),
    st_gcn(256, 256, kernel_size, 1, **kwargs),
))

...

# register each edge-importance mask as a plain module attribute,
# which nn.Module (and thus DataParallel) handles correctly
self.edge_importance0 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance1 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance2 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance3 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance4 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance5 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance6 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance7 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance8 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance9 = nn.Parameter(torch.ones(self.A.size()))

        And in the forward pass:

x, _ = self.st_gcn_networks[0](x, self.A * self.edge_importance0)
x, _ = self.st_gcn_networks[1](x, self.A * self.edge_importance1)
x, _ = self.st_gcn_networks[2](x, self.A * self.edge_importance2)
x, _ = self.st_gcn_networks[3](x, self.A * self.edge_importance3)
x, _ = self.st_gcn_networks[4](x, self.A * self.edge_importance4)
x, _ = self.st_gcn_networks[5](x, self.A * self.edge_importance5)
x, _ = self.st_gcn_networks[6](x, self.A * self.edge_importance6)
x, _ = self.st_gcn_networks[7](x, self.A * self.edge_importance7)
x, _ = self.st_gcn_networks[8](x, self.A * self.edge_importance8)
x, _ = self.st_gcn_networks[9](x, self.A * self.edge_importance9)
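        For what it's worth, the same workaround can be written more compactly with setattr()/getattr(). This is only a sketch of an equivalent formulation, registering each nn.Parameter as a direct module attribute exactly like the hand-written version above:

# __init__: register one mask per block as a direct attribute;
# nn.Module.__setattr__ records each nn.Parameter automatically
for i in range(len(self.st_gcn_networks)):
    setattr(self, f'edge_importance{i}',
            nn.Parameter(torch.ones(self.A.size())))

# forward: look the attributes back up by name
for i, gcn in enumerate(self.st_gcn_networks):
    x, _ = gcn(x, self.A * getattr(self, f'edge_importance{i}'))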

        ③ Other solutions: register the parameters held in the nn.ParameterList() directly on the model, as shown in the sketch after the links below:

ParameterList() and nn.DataParallel

PyTorch programs: from single GPU to multiple GPUs
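        A sketch of that registration approach (the helper list edge_importance_names and the edge_importance_{i} names are illustrative): each parameter is registered on the module with register_parameter(), so DataParallel replicates it like any ordinary module parameter, and forward retrieves it by name:

# __init__: explicitly register each mask on the module and keep
# only its name for later lookup
self.edge_importance_names = []
for i, _ in enumerate(self.st_gcn_networks):
    name = f'edge_importance_{i}'
    self.register_parameter(name, nn.Parameter(torch.ones(self.A.size())))
    self.edge_importance_names.append(name)

# forward: fetch each registered parameter by name
for gcn, name in zip(self.st_gcn_networks, self.edge_importance_names):
    x, _ = gcn(x, self.A * getattr(self, name))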
