1 Error Description
1.1 System Environment
Hardware Environment(Ascend/GPU/CPU): Ascend
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):
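1.2 Basic Information
1.2.1 Script
The training script builds a simple operator network that applies Add to two input tensors and then calls TensorSummary on the result. The script is as follows: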
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor, ops


class SummaryNet(nn.Cell):
    def __init__(self):
        super(SummaryNet, self).__init__()
        self.summary = ops.TensorSummary()
        self.add = ops.Add()

    def construct(self, x, y):
        x = self.add(x, y)
        name = "x"
        # x.sum() reduces the Add result to a rank-0 (scalar) tensor
        self.summary(name, x.sum())
        return x


x = Tensor(np.array([1, 2, 3]).astype(np.float32))
y = Tensor(np.array([4, 5, 6]).astype(np.float32))
summary_net = SummaryNet()(x, y)
print("out: ", summary_net)
1.2.2 Error
The error message is as follows:
Traceback (most recent call last):
File "C:/Users/l30026544/PycharmProjects/q2_map/new/173735.py", line 22, in <module>
summary_net = SummaryNet()(x, y)
File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\nn\cell.py", line 586, in __call__
out = self.compile_and_run(*args)
File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\nn\cell.py", line 964, in compile_and_run
self.compile(*inputs)
File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\nn\cell.py", line 937, in compile
_cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\common\api.py", line 1006, in compile
result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
ValueError: mindspore\core\utils\check_convert_utils.cc:397 CheckInteger] For primitive[TensorSummary], the v rank must be greater than or equal to 1, but got 0.
WARNING: Logging before InitGoogleLogging() is written to STDERR
[CRITICAL] CORE(6472,1,?):2022-6-17 15:47:53 [mindspore\core\utils\check_convert_utils.cc:397] CheckInteger] For primitive[TensorSummary], the v rank must be greater than or equal to 1, but got 0.
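Cause Analysis
The ValueError in the log reads: For primitive[TensorSummary], the v rank must be greater than or equal to 1, but got 0. In other words, the rank of the argument v passed to TensorSummary must be at least 1, but a rank of 0 was received. So the first thing to check is the rank of the value handed to TensorSummary. In construct, the call self.summary(name, x.sum()) passes x.sum(), which reduces the result of Add to a scalar (a rank-0 tensor), and that triggers the error. The official documentation restricts the input of TensorSummary: the rank of the input Tensor must be greater than or equal to 1. To collect scalar data, use the ScalarSummary operator instead.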
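The rank argument above can be checked outside MindSpore. The following is a minimal sketch that uses NumPy arrays as stand-ins for the two input tensors, on the assumption that NumPy's ndim corresponds to the tensor rank that TensorSummary checks. The helper suggest_summary_op is hypothetical and exists only for this illustration; it is not a MindSpore API.

```python
import numpy as np

# NumPy stand-ins for the two input tensors from the script.
x = np.array([1, 2, 3], dtype=np.float32)
y = np.array([4, 5, 6], dtype=np.float32)

added = x + y          # element-wise add keeps the shape (3,)
total = added.sum()    # a full reduction collapses every axis

print(added.shape, added.ndim)   # (3,) 1
print(total.shape, total.ndim)   # () 0


def suggest_summary_op(ndim):
    """Hypothetical helper: name the summary operator that fits a given rank."""
    return "ScalarSummary" if ndim == 0 else "TensorSummary"


print(suggest_summary_op(total.ndim))  # ScalarSummary
print(suggest_summary_op(added.ndim))  # TensorSummary
```

The result of the add has rank 1 and would satisfy TensorSummary, while the reduced value has rank 0 and is the kind of data ScalarSummary is meant for.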
2 Solution
Based on the cause identified above, replace TensorSummary with ScalarSummary:
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor, ops


class SummaryNet(nn.Cell):
    def __init__(self):
        super(SummaryNet, self).__init__()
        # ScalarSummary is the operator intended for scalar data
        self.summary = ops.ScalarSummary()
        self.add = ops.Add()

    def construct(self, x, y):
        x = self.add(x, y)
        name = "x"
        self.summary(name, x.sum())
        return x


x = Tensor(np.array([1, 2, 3]).astype(np.float32))
y = Tensor(np.array([4, 5, 6]).astype(np.float32))
summary_net = SummaryNet()(x, y)
print("out: ", summary_net)
The script now runs successfully and prints:
out: [5. 7. 9.]
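3 Summary
Steps for locating this kind of error:
1. Find the line of user code that raised the error: summary_net = SummaryNet()(x, y).
2. Use the keywords in the error log to narrow down the scope of the analysis: For primitive[TensorSummary], the v rank must be greater than or equal to 1, but got 0.
3. Pay particular attention to whether variables are defined and initialized correctly.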
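Step 2 can be sketched in a few lines of Python. The example below pulls the operator name, the argument name, and the required and actual rank out of the message quoted in the traceback. The regular expression is an illustration written for this one message and relies only on the standard library; other MindSpore checks may word their errors differently.

```python
import re

# Message text copied from the ValueError in the traceback.
message = ("For primitive[TensorSummary], the v rank must be greater than "
           "or equal to 1, but got 0.")

# Illustrative pattern for this specific wording (an assumption, not a
# format that MindSpore guarantees for all of its error messages).
pattern = re.compile(
    r"For primitive\[(?P<primitive>\w+)\], the (?P<argument>\w+) rank must be "
    r"greater than or equal to (?P<required>\d+), but got (?P<actual>\d+)"
)

match = pattern.search(message)
info = match.groupdict() if match else {}
print(info)
```

The extracted fields point directly at what to inspect in the script: the operator TensorSummary, its argument v, and the gap between the required rank and the rank that was actually passed.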
4 References
4.1 TensorSummary operator API