Hardware Environment(Ascend/GPU/CPU): GPU
Software Environment:
– MindSpore version (source or binary): 1.6.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):
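The training script builds a single-operator Abs network: it applies Sub to the two input tensors and then computes Abs on the result. The script is as follows: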
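import numpy as np
import mindspore
from mindspore import Tensor, nn, ops


class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.abs = ops.Abs()

    def construct(self, x1, x2):
        # x1 - x2 invokes the Sub operator, which must broadcast both shapes
        output = self.abs(x1 - x2)
        return output


net = Net()
# Shapes (2, 5) and (3, 5) differ on the first axis and cannot broadcast
x1 = Tensor(np.ones((2, 5), dtype=np.float32), mindspore.float32)
x2 = Tensor(np.ones((3, 5), dtype=np.float32), mindspore.float32)
out = net(x1, x2)
print('out', out.shape)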
The error message is as follows:
The function call stack (See file 'rank_0/om/analyze_fail.dat' for more details):
# 0 In file demo.py(7)
output = self.abs(x1 - x2)
^
Traceback (most recent call last):
File "demo.py", line 13, in <module>
out = net(x1,x2)
ValueError: For 'Sub', x.shape and y.shape are supposed to broadcast, where broadcast means that x.shape[i] = 1 or -1 or y.shape[i] = 1 or -1 or x.shape[i] = y.shape[i], but now x.shape and y.shape can not broadcast, got i: -2, x.shape: [2, 5], y.shape: [3, 5].
Cause analysis
In MindSpore 1.6, the Tensors are used inside construct, at the statement output = self.abs(x1 - x2) that the function call stack points to.
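Next, look at the error message. The ValueError reads: For 'Sub', x.shape and y.shape are supposed to broadcast, where broadcast means that x.shape[i] = 1 or -1 or y.shape[i] = 1 or -1 or x.shape[i] = y.shape[i]. In other words, the two operands of Sub (the x1 - x2 expression whose result is passed to Abs) cannot be broadcast. Broadcasting requires that on every axis i either x.shape[i] is 1 or -1, or y.shape[i] is 1 or -1, or x.shape[i] equals y.shape[i].
The rest of the message gives the details: but now x.shape and y.shape can not broadcast, got i: -2, x.shape: [2, 5], y.shape: [3, 5]. On axis -2 (the first dimension) x has size 2 and y has size 3. The sizes differ and neither is 1, and that is the cause of the error. The official documentation states this input constraint for broadcasting operators: the input Tensors must have identical shapes or shapes that can be broadcast to a common shape. Many other two-input operators also broadcast their inputs, so the same check applies to them.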
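The rule quoted in the error message can be sketched in plain Python. This is a minimal illustration and not MindSpore's actual implementation: `check_broadcast` is a hypothetical helper, and the -1 (dynamic dimension) case is left out because the shapes in this post are static.

```python
def check_broadcast(x_shape, y_shape):
    """Return the broadcast shape, or raise ValueError naming the failing axis.

    Hypothetical helper, not part of MindSpore. Handles static shapes only.
    """
    result = []
    rank = max(len(x_shape), len(y_shape))
    # Walk both shapes from the last axis backwards; i is a negative axis index.
    for i in range(-1, -rank - 1, -1):
        # A missing leading axis behaves like an axis of size 1.
        x_dim = x_shape[i] if -i <= len(x_shape) else 1
        y_dim = y_shape[i] if -i <= len(y_shape) else 1
        if x_dim != y_dim and x_dim != 1 and y_dim != 1:
            raise ValueError(
                f"can not broadcast, got i: {i}, "
                f"x.shape: {list(x_shape)}, y.shape: {list(y_shape)}"
            )
        # A size-1 axis stretches to match the other operand.
        result.append(y_dim if x_dim == 1 else x_dim)
    return tuple(reversed(result))


print(check_broadcast((5,), (3, 5)))    # missing leading axis treated as 1
print(check_broadcast((1, 5), (3, 5)))  # size-1 axis stretches to 3
try:
    check_broadcast((2, 5), (3, 5))     # 2 vs 3 on axis -2, neither is 1
except ValueError as err:
    print(err)
```

The failing case reports axis -2, the same index that appears in the MindSpore message.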
After the input shapes are changed so that they can broadcast, the script runs successfully and prints:
out (3, 5)
Example 2:
import numpy as np
import mindspore
from mindspore import Tensor, nn, ops


class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.abs = ops.Abs()

    def construct(self, x1, x2):
        output = self.abs(x1 - x2)
        return output


net = Net()
# x1 has shape (5,); its missing leading axis is treated as size 1
x1 = Tensor(np.ones((5,), dtype=np.float32), mindspore.float32)
x2 = Tensor(np.ones((3, 5), dtype=np.float32), mindspore.float32)
out = net(x1, x2)
print('out', out.shape)
This runs successfully and prints:
out (3, 5)
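For static shapes, NumPy follows the same broadcasting convention, so candidate shapes can be tried quickly without building a network. The sketch below assumes only that NumPy is installed. It illustrates the rule and does not replace running the MindSpore script; the list of shapes is chosen for illustration.

```python
import numpy as np

x2 = np.ones((3, 5), dtype=np.float32)

# Maps each candidate x1 shape to the output shape, or None if broadcasting fails.
results = {}
# (2, 5) is the shape from the failing script; the others are compatible with (3, 5).
for x1_shape in [(3, 5), (1, 5), (5,), (2, 5)]:
    x1 = np.ones(x1_shape, dtype=np.float32)
    try:
        results[x1_shape] = np.abs(x1 - x2).shape
    except ValueError:
        # NumPy raises ValueError when the operands cannot be broadcast together.
        results[x1_shape] = None

for shape, out_shape in results.items():
    print(shape, "->", out_shape if out_shape is not None else "cannot broadcast")
```

Only (2, 5) is rejected, because its first axis is neither 1 nor equal to 3.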
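Steps for locating the error:
1. Find the line of user code that raised the error: output = self.abs(x1 - x2);
2. Use the keywords in the error log to narrow down the problem: x.shape: [2, 5], y.shape: [3, 5];
3. Pay particular attention to whether the variables are defined and initialized correctly.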
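Step 2 can be partly automated by pulling the axis and the two shapes out of the error text. The sketch below is one possible approach: `parse_shapes` is a hypothetical helper, and the regular expression assumes the message format shown in this post, which may differ in other MindSpore versions.

```python
import ast
import re

# Tail of the ValueError message from this post.
ERROR_TAIL = "got i: -2, x.shape: [2, 5], y.shape: [3, 5]."


def parse_shapes(message):
    """Return (axis, x_shape, y_shape) parsed from the message, or None.

    Hypothetical helper; assumes the 'got i: ..., x.shape: ..., y.shape: ...' layout.
    """
    match = re.search(
        r"got i: (-?\d+), x\.shape: (\[[^\]]*\]), y\.shape: (\[[^\]]*\])",
        message,
    )
    if match is None:
        return None
    axis = int(match.group(1))
    # literal_eval safely turns the text "[2, 5]" into the list [2, 5].
    x_shape = ast.literal_eval(match.group(2))
    y_shape = ast.literal_eval(match.group(3))
    return axis, x_shape, y_shape


axis, x_shape, y_shape = parse_shapes(ERROR_TAIL)
print(f"axis {axis}: x has size {x_shape[axis]}, y has size {y_shape[axis]}")
```

With the mismatching axis identified, step 3 comes down to checking where x1 and x2 are created and correcting the shape that is wrong.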