Contents
1--Introduction
2--Error code
3--Solutions
I was recently reproducing a paper whose open-source code uses nn.DataParallel() for multi-GPU training together with nn.ParameterList() to hold a list of parameters. This combination triggers a PyTorch bug: the nn.ParameterList() appears empty during the forward pass. The warnings are:
① UserWarning: nn.ParameterList is being used with DataParallel but this is not supported. This list will appear empty for the models replicated on each GPU except the original one.
② UserWarning: Setting attributes on ParameterList is not supported.
A more detailed description can be found in the official PyTorch GitHub issue: nn.Parameter{List,Dict} not copied to gpus in forward pass when nn.DataParallel is used #36035
The code that triggers the error:
...
self.st_gcn_networks = nn.ModuleList((
    st_gcn(in_channels, 64, kernel_size, 1, residual=False, **kwargs0),
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 128, kernel_size, 2, **kwargs),
    st_gcn(128, 128, kernel_size, 1, **kwargs),
    st_gcn(128, 128, kernel_size, 1, **kwargs),
    st_gcn(128, 256, kernel_size, 2, **kwargs),
    st_gcn(256, 256, kernel_size, 1, **kwargs),
    st_gcn(256, 256, kernel_size, 1, **kwargs),
))
...
self.edge_importance = nn.ParameterList([
    nn.Parameter(torch.ones(self.A.size()))
    for _ in self.st_gcn_networks
])
In the forward pass, the following loop is silently skipped, because self.edge_importance, built with nn.ParameterList(), is empty on the replicated models:
for gcn, importance in zip(self.st_gcn_networks, self.edge_importance):
    x, _ = gcn(x, self.A * importance)
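The loop skips silently because zip() stops at the shortest input: pairing the populated ModuleList with an empty ParameterList yields zero iterations and no error. A minimal Python illustration (no PyTorch needed; the names are stand-ins):

```python
# zip() stops at the shortest iterable, so pairing the layer list with an
# empty parameter list produces no iterations at all -- the loop body
# never runs and no exception is raised.
networks = ["gcn0", "gcn1", "gcn2"]   # stands in for self.st_gcn_networks
importances = []                      # the ParameterList as seen on a replica GPU

pairs = list(zip(networks, importances))
print(pairs)  # → []
```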
① As noted in the introduction, the problem is caused by multi-GPU parallel training. The simplest workaround is to train on a single GPU, but that may run out of GPU memory;
② My workaround: do not store the parameters in an nn.ParameterList() at all, and register each one as its own attribute instead (admittedly a brute-force approach);
Corrected code:
...
self.st_gcn_networks = nn.ModuleList((
    st_gcn(in_channels, 64, kernel_size, 1, residual=False, **kwargs0),
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 64, kernel_size, 1, **kwargs),
    st_gcn(64, 128, kernel_size, 2, **kwargs),
    st_gcn(128, 128, kernel_size, 1, **kwargs),
    st_gcn(128, 128, kernel_size, 1, **kwargs),
    st_gcn(128, 256, kernel_size, 2, **kwargs),
    st_gcn(256, 256, kernel_size, 1, **kwargs),
    st_gcn(256, 256, kernel_size, 1, **kwargs),
))
...
self.edge_importance0 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance1 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance2 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance3 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance4 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance5 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance6 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance7 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance8 = nn.Parameter(torch.ones(self.A.size()))
self.edge_importance9 = nn.Parameter(torch.ones(self.A.size()))
In the forward pass:
x, _ = self.st_gcn_networks[0](x, self.A * self.edge_importance0)
x, _ = self.st_gcn_networks[1](x, self.A * self.edge_importance1)
x, _ = self.st_gcn_networks[2](x, self.A * self.edge_importance2)
x, _ = self.st_gcn_networks[3](x, self.A * self.edge_importance3)
x, _ = self.st_gcn_networks[4](x, self.A * self.edge_importance4)
x, _ = self.st_gcn_networks[5](x, self.A * self.edge_importance5)
x, _ = self.st_gcn_networks[6](x, self.A * self.edge_importance6)
x, _ = self.st_gcn_networks[7](x, self.A * self.edge_importance7)
x, _ = self.st_gcn_networks[8](x, self.A * self.edge_importance8)
x, _ = self.st_gcn_networks[9](x, self.A * self.edge_importance9)
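If you adopt this one-attribute-per-layer workaround, the ten unrolled lines can also be collapsed into a loop with getattr(). A self-contained sketch (the attribute naming follows the edge_importance{i} pattern above; st_gcn itself is replaced here by a toy element-wise multiply just so the snippet runs on its own):

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self, n_layers=10, a_size=(3, 4, 4)):
        super().__init__()
        # Assigning an nn.Parameter to a plain attribute registers it on the
        # module, so DataParallel replicates it -- unlike the entries of an
        # nn.ParameterList.
        for i in range(n_layers):
            setattr(self, f'edge_importance{i}', nn.Parameter(torch.ones(a_size)))
        self.n_layers = n_layers

    def forward(self, A):
        x = A
        for i in range(self.n_layers):
            # getattr() replaces the ten hand-written self.edge_importance{i} lines
            x = x * getattr(self, f'edge_importance{i}')
        return x

model = ToyModel()
out = model(torch.ones(3, 4, 4))
print(out.shape)  # torch.Size([3, 4, 4])
```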
③ Other solutions: register the parameters held in the nn.ParameterList() directly on the model; for details see:
ParameterList() and nn.DataParallel
PyTorch: from a single GPU to multiple GPUs
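The registration approach described in the posts above can be sketched roughly as follows (the layer count, parameter names, and tensor size here are illustrative, not taken from the original code):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, n_layers=3, a_size=(2, 4, 4)):
        super().__init__()
        # Register each importance tensor directly on the module instead of
        # storing it in an nn.ParameterList; parameters registered this way
        # appear in named_parameters() and are replicated by DataParallel.
        for i in range(n_layers):
            self.register_parameter(f'edge_importance{i}',
                                    nn.Parameter(torch.ones(a_size)))

net = Net()
print([name for name, _ in net.named_parameters()])
# ['edge_importance0', 'edge_importance1', 'edge_importance2']
```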