Hardware Environment(Ascend/GPU/CPU): GPU
Software Environment:
– MindSpore version (source or binary): 1.6.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):
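The training script builds a single-operator Conv2d network that computes a 2D convolution over the input tensor. The script is as follows: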
01 class Net(nn.Cell):
02     def __init__(self, in_channels, out_channels, kernel_size):
03         super(Net, self).__init__()
04         self.in_channels = in_channels
05         self.out_channels = out_channels
06         self.kernel_size = kernel_size
07         self.conv2d = nn.Conv2d(self.in_channels, self.out_channels, self.kernel_size)
08
09     def construct(self, x):
10         result = self.conv2d(x)
11         return result
12
13 net = Net(in_channels=1, out_channels=240, kernel_size=4)
14 x = Tensor(np.ones([3, 3, 1024, 640]), mindspore.float32)
15 out = net(x)
16 print('out', out.shape)
The error message is as follows:
Traceback (most recent call last):
File "demo.py", line 15, in <module>
out = net(x)
…
RuntimeError: mindspore/core/ops/conv2d.cc:185 Conv2dInferShape] For 'Conv2D', 'C_in' of input 'x' shape divide by parameter 'group' should be equal to 'C_in' of input 'weight' shape: 1, but got 'C_in' of input 'x' shape: 3, and 'group': 1
The function call stack (See file 'rank_0/om/analyze_fail.dat' for more details):
# 0 In file demo.py(10)
result = self.conv2d (x)
Cause analysis
In MindSpore 1.6, the function call stack points to line 10 of the script, result = self.conv2d(x), where the Conv2d operator is called on the input tensor inside construct.
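Next, look at the error message. The RuntimeError states: *'C_in' of input 'x' shape divide by parameter 'group' should be equal to 'C_in' of input 'weight' shape: 1, but got 'C_in' of input 'x' shape: 3, and 'group': 1*. In other words, the C_in of the input x shape divided by group must equal the C_in of the weight shape, that is, x_shape[C_in] / group == w_shape[C_in]. Here w_shape[C_in] is 1, while x_shape[C_in] / group is 3. w_shape[C_in] is the size of the weight's channel dimension, which (with group = 1) is the in_channels value you passed, so check whether in_channels was set to 1 when nn.Conv2d was initialized. The official documentation describes C_in and in_channels in nearly the same terms.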
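Checking the code confirms that in_channels on line 13 does not equal the C_in value of the input on line 14. Setting in_channels to the C_in value of the data resolves the problem.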
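The channel rule quoted from the error message can be sketched in plain Python. This is only an illustration of the rule, not MindSpore's actual shape-inference code: the helper name check_conv2d_channels is hypothetical, and it assumes an NCHW input and a weight whose channel dimension is in_channels // group.

```python
def check_conv2d_channels(x_shape, in_channels, group=1):
    """Hypothetical helper mirroring the rule in the error message.

    Assumes NCHW layout, x_shape = (N, C_in, H, W), and that the weight's
    channel dimension equals in_channels // group.
    """
    x_c_in = x_shape[1]
    w_c_in = in_channels // group
    if x_c_in // group != w_c_in:
        raise ValueError(
            f"C_in of x ({x_c_in}) / group ({group}) must equal "
            f"C_in of weight ({w_c_in})"
        )
    return True


# The failing configuration from the script: the data has 3 channels
# but the network was built with in_channels=1.
try:
    check_conv2d_channels((3, 3, 1024, 640), in_channels=1)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

# The corrected configuration: in_channels matches the data's channel dimension.
fixed_ok = check_conv2d_channels((3, 3, 1024, 640), in_channels=3)
```

Running a check like this before building the network surfaces the mismatch at the line that defines in_channels rather than at the first forward call.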
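Based on the cause identified above, the fix is straightforward. Change line 13 so that in_channels matches the channel dimension of the input:
13 net = Net(in_channels=3, out_channels=240, kernel_size=4)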
The script now runs successfully and prints the following output:
out: (3, 240, 1024, 640)
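The output shape can be checked by hand. The sketch below assumes the nn.Conv2d defaults that the script leaves untouched (stride 1, dilation 1, pad_mode 'same'), under which each spatial size is ceil(input / stride). The helper name conv2d_out_shape is hypothetical, and the 'valid' branch is included only for comparison.

```python
import math


def conv2d_out_shape(x_shape, out_channels, kernel_size,
                     stride=1, dilation=1, pad_mode="same"):
    """Hypothetical helper: expected NCHW output shape of a 2D convolution."""
    n, _, h, w = x_shape

    def out_len(size):
        if pad_mode == "same":
            # Input is padded so the output covers every stride position.
            return math.ceil(size / stride)
        if pad_mode == "valid":
            # No padding: only positions where the full kernel fits.
            effective_k = dilation * (kernel_size - 1) + 1
            return (size - effective_k) // stride + 1
        raise ValueError(f"unsupported pad_mode: {pad_mode}")

    return (n, out_channels, out_len(h), out_len(w))


# Same configuration as the corrected script.
same_shape = conv2d_out_shape((3, 3, 1024, 640), out_channels=240, kernel_size=4)
# For comparison: without padding, the spatial dimensions shrink.
valid_shape = conv2d_out_shape((3, 3, 1024, 640), out_channels=240,
                               kernel_size=4, pad_mode="valid")
```

With 'same' padding the spatial dimensions 1024 and 640 are preserved and only the channel dimension changes from 3 to 240, which matches the printed shape.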
Steps for locating the problem:
1. Find the user code lines related to the error: net = Net(in_channels=1, out_channels=240, kernel_size=4) and x = Tensor(np.ones([3, 3, 1024, 640]), mindspore.float32);
2. Use the keywords in the error log to narrow the scope of the analysis: 'C_in' of input 'x' shape divide by parameter 'group' should be equal to 'C_in' of input 'weight' shape: 1, but got 'C_in' of input 'x' shape: 3, and 'group': 1;
3. Pay close attention to whether variables are defined and initialized correctly.
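Step 2 can be made concrete by pulling the numbers out of the error text. The following is a small sketch using Python's re module on the message shown above; the function name extract_channel_info and the dictionary keys are hypothetical, and the patterns are tied to this exact wording, which may differ in other MindSpore versions.

```python
import re

# Error text copied from the RuntimeError above.
ERROR_TEXT = (
    "For 'Conv2D', 'C_in' of input 'x' shape divide by parameter 'group' "
    "should be equal to 'C_in' of input 'weight' shape: 1, but got 'C_in' "
    "of input 'x' shape: 3, and 'group': 1"
)


def extract_channel_info(message):
    """Hypothetical helper: extract the channel values reported in the message."""
    patterns = {
        "weight_c_in": r"'C_in' of input 'weight' shape: (\d+)",
        "x_c_in": r"'C_in' of input 'x' shape: (\d+)",
        "group": r"'group': (\d+)",
    }
    info = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        # Leave the value as None if the wording does not match.
        info[key] = int(match.group(1)) if match else None
    return info


info = extract_channel_info(ERROR_TEXT)
# The weight expects 1 input channel, but the data supplies 3 / 1 = 3.
channels_match = info["x_c_in"] // info["group"] == info["weight_c_in"]
```

Comparing the extracted values against the arguments on lines 13 and 14 of the script shows directly that in_channels=1 is the value that disagrees with the data.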