MindSpore error: ValueError: For 'xxx', the shapes of x and y cannot be broadcast

1 Error description

1.1 System environment

Hardware Environment(Ascend/GPU/CPU): GPU
Software Environment:
– MindSpore version (source or binary): 1.6.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

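1.2 Basic information

1.2.1 Script

The training script builds a single-operator Abs network: it applies Sub to the two input tensors and then computes Abs of the result. The script is as follows: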

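import numpy as np
import mindspore
from mindspore import Tensor, nn, ops


class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.abs = ops.Abs()

    def construct(self, x1, x2):
        # x1 - x2 dispatches to the Sub operator, which requires broadcastable shapes
        output = self.abs(x1 - x2)
        return output


net = Net()
# Shapes (2, 5) and (3, 5) differ in the first dimension and neither size is 1
x1 = Tensor(np.ones((2, 5), dtype=np.float32), mindspore.float32)
x2 = Tensor(np.ones((3, 5), dtype=np.float32), mindspore.float32)
out = net(x1, x2)  # raises ValueError
print('out:', out.shape)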

2 Error

The error message is as follows:

The function call stack (See file 'rank_0/om/analyze_fail.dat' for more details):
# 0 In file demo.py(7)
    output = self.abs(x1 - x2)
             ^
Traceback (most recent call last):
 File "demo.py", line 13, in <module>
  out = net(x1,x2)
ValueError: For 'Sub', x.shape and y.shape are supposed to broadcast, where broadcast means that x.shape<i> = 1 or -1 or y.shape<i> = 1 or -1 or x.shape<i> = y.shape<i>, but now x.shape and y.shape can not broadcast, got i: -2, x.shape: [2, 5], y.shape: [3, 5].

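Cause analysis

In this MindSpore 1.6 script, the tensors x1 and x2 are created outside construct and used inside construct, in the expression x1 - x2.

Now look at the error message. The ValueError reads: For 'Sub', x.shape and y.shape are supposed to broadcast, where broadcast means that x.shape<i> = 1 or -1 or y.shape<i> = 1 or -1 or x.shape<i> = y.shape<i>. It means that the two operands of Sub cannot be broadcast (the operator named is Sub, not Abs, because x1 - x2 is evaluated before Abs runs). Here i is a dimension index: for every dimension, broadcasting requires that one of the two sizes is 1 (or -1), or that the two sizes are equal. The message continues: but now x.shape and y.shape can not broadcast, got i: -2, x.shape: [2, 5], y.shape: [3, 5]. Index -2 counts from the end, so it refers to the first dimension of these two-dimensional shapes, where the sizes are 2 and 3: they differ and neither is 1. That is the cause of the problem.

The official documentation restricts the inputs of such operators: the shapes of the input tensors must be identical or broadcastable. A number of other two-input operators also rely on broadcasting, so the same caveat applies to them.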
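The per-dimension rule quoted in the error can be sketched in plain Python. This is only an illustration of the rule, not MindSpore's implementation: the function name broadcast_shape is hypothetical, shapes are compared from the last dimension backwards, a missing leading dimension is treated as 1, and the dynamic-shape value -1 is ignored for simplicity.

```python
def broadcast_shape(x_shape, y_shape):
    """Return the broadcast result shape, or raise ValueError if incompatible.

    Hypothetical helper for illustration; it ignores dynamic (-1) dimensions.
    """
    result = []
    # Walk both shapes from the last dimension backwards (right-aligned).
    for i in range(1, max(len(x_shape), len(y_shape)) + 1):
        # A dimension missing on the left is treated as size 1.
        x_dim = x_shape[-i] if i <= len(x_shape) else 1
        y_dim = y_shape[-i] if i <= len(y_shape) else 1
        if x_dim == y_dim or y_dim == 1:
            result.append(x_dim)
        elif x_dim == 1:
            result.append(y_dim)
        else:
            raise ValueError(
                f"cannot broadcast at i: {-i}, "
                f"x.shape: {list(x_shape)}, y.shape: {list(y_shape)}"
            )
    return tuple(reversed(result))


print(broadcast_shape((5,), (3, 5)))    # (3, 5)
print(broadcast_shape((1, 5), (3, 5)))  # (3, 5)
try:
    broadcast_shape((2, 5), (3, 5))
except ValueError as err:
    print(err)  # reports i: -2, the same index as in the MindSpore log
```

With the shapes from the failing script, the check stops at index -2, where the sizes 2 and 3 differ and neither is 1.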

3 Solution

Based on the cause identified above, the fix is straightforward:
Example 1:
[Image: modified script for Example 1]

Execution now succeeds, with the following output:

out: (3, 5)

Example 2:

import numpy as np
import mindspore
from mindspore import Tensor, nn, ops


class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.abs = ops.Abs()

    def construct(self, x1, x2):
        output = self.abs(x1 - x2)
        return output


net = Net()
# Shape (5,) broadcasts against (3, 5): the missing leading dimension is treated as 1
x1 = Tensor(np.ones((5,), dtype=np.float32), mindspore.float32)
x2 = Tensor(np.ones((3, 5), dtype=np.float32), mindspore.float32)
out = net(x1, x2)
print('out:', out.shape)

Execution now succeeds, with the following output:

out: (3, 5)

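4 Summary

Steps for locating the problem:

1. Find the line of user code that raised the error: output = self.abs(x1 - x2);

2. Use the keywords in the error log to narrow down the scope of the analysis: x.shape: [2, 5], y.shape: [3, 5];

3. Pay close attention to whether the variables are defined and initialized correctly.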
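Steps 2 and 3 can also be done as a quick check before the network is called. The sketch below is a plain-Python helper with a hypothetical name, incompatible_axes. It lists every dimension at which two shapes cannot broadcast, using the same negative index that appears in the log (got i: -2). It assumes static shapes and works on ordinary tuples such as the ones returned by x1.shape.

```python
def incompatible_axes(x_shape, y_shape):
    """Return [(negative_index, x_dim, y_dim)] for axes that cannot broadcast.

    Hypothetical debugging helper; assumes static shapes (no -1 dimensions).
    """
    problems = []
    # Only overlapping trailing axes can conflict; missing axes count as size 1.
    for i in range(1, min(len(x_shape), len(y_shape)) + 1):
        x_dim, y_dim = x_shape[-i], y_shape[-i]
        if x_dim != y_dim and 1 not in (x_dim, y_dim):
            problems.append((-i, x_dim, y_dim))
    return problems


print(incompatible_axes((2, 5), (3, 5)))  # [(-2, 2, 3)]
print(incompatible_axes((5,), (3, 5)))    # []
```

An empty list means the shapes are compatible. A non-empty list points directly at the variable definitions that need to be corrected.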

5 References

5.1 The broadcast method
