Hardware Environment(Ascend/GPU/CPU): GPU
Software Environment:
– MindSpore version (source or binary): 1.6.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):
The training script builds a single-operator Concat network that concatenates tensors along a given axis. The script is as follows:
import numpy as np
import mindspore
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor, context

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.concat = ops.Concat()

    def construct(self, x):
        # Build a tuple holding 1000 references to x, then concatenate them.
        n = 1000
        input_x = ()
        for i in range(n):
            input_x += (x,)
        output = self.concat(input_x)
        return output

net = Net()
input_x1 = Tensor(np.random.rand(1, 4, 16, 16), mindspore.float32)
output = net(input_x1)
print(f"Output: {output.shape}")
Error report
The error message is as follows:
Traceback (most recent call last):
File "demo.py", line 17, in <module>
output = net(input_x1)
File " /lib/python3.7/site-packages/mindspore/nn/cell.py", line 542, in __call__
out = self.compile_and_run(*args)
File " lib/python3.7/site-packages/mindspore/nn/cell.py", line 872, in compile_and_run
self.compile(*inputs)
File "/lib/python3.7/site-packages/mindspore/nn/cell.py", line 857, in compile
_cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
File "/lib/python3.7/site-packages/mindspore/common/api.py", line 712, in compile
result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
RuntimeError: mindspore/ccsrc/pipeline/jit/static_analysis/evaluator.cc:100 EnterStackFrame] Exceed function call depth limit 1000, (function call depth: 1001, simulate call depth: 999).
It's always happened with complex construction of code or infinite recursion or loop.
Please check the code if it's has the infinite recursion or call 'context.set_context(max_call_depth=value)' to adjust this value.
If max_call_depth is set larger, the system max stack depth should be set larger too to avoid stack overflow.
For more details, please refer to the FAQ at https://www.mindspore.cn.
The function call stack (See file 'demo/rank_0/om/analyze_fail.dat' for more details):
# 0 In file demo.py(10)
for i in range(n):
# 1 In file /lib/python3.7/site-packages/mindspore/_extends/parse/standard_method.py(1441)
return it.__ms_hasnext__()
# 2 In file /lib/python3.7/site-packages/mindspore/_extends/parse/standard_method.py(1860)
return len(xs) > 0
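Cause analysis
In MindSpore 1.6, the script builds the input tuple inside construct with a for loop that runs 1000 times (the `for i in range(n)` line). This is also the line that the function call stack in the error report points to.
Now look at the error message. The RuntimeError says: Exceed function call depth limit 1000, (function call depth: 1001, simulate call depth: 999). In other words, the function call depth reached 1001 and exceeded the limit, because the maximum call depth defaults to 1000. The message continues: Please check the code if it's has the infinite recursion or call 'context.set_context(max_call_depth=value)' to adjust this value. That leaves two options: check the code for infinite recursion, or call context.set_context(max_call_depth=value) to raise the default depth limit. The message also warns that if max_call_depth is set larger, the system maximum stack depth should be set larger too, to avoid a stack overflow.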
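To make the depth arithmetic concrete, here is a toy model in plain Python. It is not MindSpore's evaluator; it only assumes (hypothetically) one frame for the enclosing function plus one nested frame per loop iteration, which is enough to show why 1000 iterations can land at depth 1001. The names `simulate_analysis` and `CallDepthExceeded` are made up for this illustration.

```python
# Toy model only: this is NOT how MindSpore implements its depth check.
class CallDepthExceeded(RuntimeError):
    """Raised when the simulated call depth passes the configured limit."""


def simulate_analysis(loop_iterations, max_call_depth=1000):
    """Return the deepest simulated frame, or raise if the limit is exceeded.

    Assumption: one frame for the enclosing function, plus one nested
    frame per loop iteration.
    """
    depth = 1  # hypothetical frame for the enclosing function
    for _ in range(loop_iterations):
        depth += 1  # each iteration nests one level deeper in this model
        if depth > max_call_depth:
            raise CallDepthExceeded(
                f"Exceed function call depth limit {max_call_depth}, "
                f"(function call depth: {depth})"
            )
    return depth


try:
    simulate_analysis(1000)  # default limit of 1000
except CallDepthExceeded as err:
    print(err)

# With a larger limit the same 1000 iterations fit.
print(simulate_analysis(1000, max_call_depth=20000))
```

In this model, raising the limit lets the loop through but does not reduce the depth itself, which is why the real error message also mentions the system stack size.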
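Based on the cause identified above, the fix is straightforward:
import numpy as np
import mindspore
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor, context

# Raise the call depth limit from the default 1000 to 20000.
context.set_context(mode=context.GRAPH_MODE, max_call_depth=20000, device_target="Ascend")

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.concat = ops.Concat()

    def construct(self, x):
        # Build a tuple holding 1000 references to x, then concatenate them.
        n = 1000
        input_x = ()
        for i in range(n):
            input_x += (x,)
        output = self.concat(input_x)
        return output

net = Net()
input_x1 = Tensor(np.random.rand(1, 196, 80, 38), mindspore.float32)
output = net(input_x1)
print(f"Output: {output.shape}")
The script now runs successfully and prints:
Output: (1000, 196, 80, 38)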
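The printed shape can be sanity-checked without running the network. `ops.Concat()` concatenates along axis 0 by default, so 1000 inputs of shape (1, 196, 80, 38) should give 1000 along the first dimension. The helper below is a plain-Python sketch of that shape rule; `concat_shape` is a hypothetical name, not a MindSpore function.

```python
def concat_shape(shapes, axis=0):
    """Shape obtained by concatenating tensors of the given shapes along `axis`."""
    first = tuple(shapes[0])
    for shape in shapes[1:]:
        # Every dimension except `axis` must match across inputs.
        if len(shape) != len(first) or any(
            a != b for i, (a, b) in enumerate(zip(first, shape)) if i != axis
        ):
            raise ValueError(f"incompatible shapes: {first} vs {tuple(shape)}")
    # Sizes along `axis` add up; all other dimensions are unchanged.
    total = sum(shape[axis] for shape in shapes)
    return first[:axis] + (total,) + first[axis + 1:]


print(concat_shape([(1, 196, 80, 38)] * 1000))  # (1000, 196, 80, 38)
```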
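Steps for locating the error:
1. Find the line of user code that raised the error: output = net(input_x1);
2. Use the keywords in the error log to narrow down the problem: Exceed function call depth limit 1000, (function call depth: 1001, simulate call depth: 999);
3. Change the default setting as the error message suggests: call 'context.set_context(max_call_depth=value)' to adjust this value;
4. Pay close attention to whether variables are defined and initialized correctly.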
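Step 2 above can be partly automated when the same error shows up across many logs. The sketch below uses only Python's standard `re` module to pull the limit and the observed depths out of the message text quoted in this post; `parse_depth_error` is a hypothetical helper, and the pattern assumes the message wording stays exactly as shown here.

```python
import re

# Pattern matches the wording of the error message quoted in this post.
DEPTH_ERROR = re.compile(
    r"Exceed function call depth limit (\d+), "
    r"\(function call depth: (\d+), simulate call depth: (\d+)\)"
)


def parse_depth_error(message):
    """Return the limit and depths found in the message, or None if absent."""
    match = DEPTH_ERROR.search(message)
    if match is None:
        return None
    limit, depth, simulated = (int(g) for g in match.groups())
    return {"limit": limit, "depth": depth, "simulated": simulated}


log_line = (
    "RuntimeError: mindspore/ccsrc/pipeline/jit/static_analysis/evaluator.cc:100 "
    "EnterStackFrame] Exceed function call depth limit 1000, "
    "(function call depth: 1001, simulate call depth: 999)."
)
info = parse_depth_error(log_line)
print(info)
```

If the helper returns a result, the reported limit tells you which max_call_depth value was in effect when the compile failed.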