1 Error description
1.1 System environment
Environment(Ascend/GPU/CPU): GPU-GTX3090(24G)
Software Environment:
– MindSpore version (source or binary): 1.7.0
– Python version (e.g., Python 3.7.5): 3.8.13
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
– CUDA version : 11.0
1.2 Basic information
1.2.1 Script
This code is part of a ConvLSTM port from PyTorch to MindSpore. The line below is where the error is raised:
loss = train_network(data, label)
1.2.2 Error message
Some personal information has been masked.
[WARNING] ME(124028:139969934345984,MainProcess):2022-07-23-20:21:12.940.089 [mindspore/run_check/_check_version.py:140] MindSpore version 1.7.0 and cuda version 11.0.221 does not match, please refer to the installation guide for version matching information: https://www.mindspore.cn/install
[CRITICAL] ANALYZER(124028,7f4d4a374700,python):2022-07-23-20:21:21.559.937 [mindspore/ccsrc/frontend/operator/composite/multitype_funcgraph.cc:160] GenerateFromTypes] The 'sub' operation does not support the type [kMetaTypeNone, Tensor[Float32]].
The supported types of overload function sub is: [Tensor, List], [Tensor, Tuple], [List, Tensor], [Tuple, Tensor], [Tensor, Number], [Number, Tensor], [Tensor, Tensor], [Number, Number].
Traceback (most recent call last):
File "main.py", line 194, in <module>
train()
File "main.py", line 142, in train
loss = train_network(data, label)
File "/home/xxxlab/anaconda2/envs/mindspore/lib/python3.8/site-packages/mindspore/nn/cell.py", line 586, in __call__
out = self.compile_and_run(*args)
File "/home/xxxlab/anaconda2/envs/mindspore/lib/python3.8/site-packages/mindspore/nn/cell.py", line 964, in compile_and_run
self.compile(*inputs)
File "/home/xxxlab/anaconda2/envs/mindspore/lib/python3.8/site-packages/mindspore/nn/cell.py", line 937, in compile
_cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
File "/home/xxxlab/anaconda2/envs/mindspore/lib/python3.8/site-packages/mindspore/common/api.py", line 1006, in compile
result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
RuntimeError: mindspore/ccsrc/frontend/operator/composite/multitype_funcgraph.cc:160 GenerateFromTypes] The 'sub' operation does not support the type [kMetaTypeNone, Tensor[Float32]].
The supported types of overload function sub is: [Tensor, List], [Tensor, Tuple], [List, Tensor], [Tuple, Tensor], [Tensor, Number], [Number, Tensor], [Tensor, Tensor], [Number, Number].
The function call stack (See file '/home/xxxlab/zrj/mindspore/ConvLSTM-PyTorch/conv/rank_0/om/analyze_fail.dat' for more details):
0 In file /home/xxxlab/anaconda2/envs/mindspore/lib/python3.8/site-packages/mindspore/nn/wrap/cell_wrapper.py(373)
loss = self.network(*inputs)
^
1 In file /home/xxxlab/anaconda2/envs/mindspore/lib/python3.8/site-packages/mindspore/nn/wrap/cell_wrapper.py(112)
return self._loss_fn(out, label)
^
2 In file /home/xxxlab/anaconda2/envs/mindspore/lib/python3.8/site-packages/mindspore/nn/loss/loss.py(313)
x = F.square(logits - labels)
2 Cause analysis and solution
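The traceback points straight at MindSpore's own loss code (nn/loss/loss.py), which puzzled me at first: I can't modify MindSpore's source, and what kind of type is kMetaTypeNone anyway? It is how the graph compiler reports a Python None: in the failing expression logits - labels, the first operand (the network output) was None while the second was a Float32 tensor. Later, after reading an article on the topic, I learned that MindSpore has a static graph mode and a dynamic graph mode. The default in 1.7.0 is static graph mode (GRAPH_MODE), which means everything about the model has to be determined up front, when the graph is compiled; otherwise the static graph cannot be built.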
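To make the [kMetaTypeNone, Tensor[Float32]] signature concrete, here is a plain-Python analogue, not MindSpore code. The post does not show the network itself, so the cause used here (a code path with no return value) is purely hypothetical, and the names forward and squared_error are made up for illustration. The point is only that a None on the left-hand side of logits - labels is what the error describes, and that checking the output before the loss gives a much clearer message.

```python
def forward(frames, hidden=None):
    """Toy forward pass (hypothetical): only returns a value when `hidden` is given."""
    if hidden is not None:
        return [f + hidden for f in frames]
    # No return on this path, so the caller receives None.


def squared_error(logits, labels):
    """Has the same shape as `F.square(logits - labels)` in the traceback."""
    if logits is None:
        # Fail early with a readable message instead of a type error deep inside.
        raise TypeError("logits is None: the network returned nothing on this path")
    return [(a - b) ** 2 for a, b in zip(logits, labels)]


labels = [1.0, 2.0]

# Healthy path: the network returns values, the loss is computed.
ok = squared_error(forward([1.0, 2.0], hidden=0.5), labels)

# Broken path: the network returns None and the loss receives [None, labels].
try:
    squared_error(forward([1.0, 2.0]), labels)
except TypeError as err:
    message = str(err)
```

In a real script, the equivalent check would be to inspect what the network returns before it reaches the loss function.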
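For the difference between static and dynamic graphs, see the official MindSpore documentation. As I understand it, static graph mode builds the computation graph of the whole model once, at the start. That graph can then be reused instead of being rebuilt on every call, which speeds up execution. The obvious drawback is that the model becomes much less flexible.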
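The "build once, reuse afterwards" idea can be sketched in plain Python. This is a toy model under my own assumptions, not how MindSpore is implemented: ToyGraphExecutor and build_mean are hypothetical names, and the "graph" is just a cached closure keyed by input length.

```python
from typing import Callable, Dict, List, Tuple

Shape = Tuple[int, ...]
Plan = Callable[[List[float]], float]


class ToyGraphExecutor:
    """Toy stand-in for a static-graph executor: builds one plan per input shape."""

    def __init__(self, build: Callable[[Shape], Plan]):
        self._build = build               # the expensive "compile" step
        self._cache: Dict[Shape, Plan] = {}
        self.compile_count = 0

    def __call__(self, values: List[float]) -> float:
        shape = (len(values),)
        if shape not in self._cache:      # first call with this shape: build the plan
            self._cache[shape] = self._build(shape)
            self.compile_count += 1
        return self._cache[shape](values)  # later calls reuse the cached plan


def build_mean(shape: Shape) -> Plan:
    n = shape[0]                          # fixed at build time, like a static shape
    return lambda values: sum(values) / n


run = ToyGraphExecutor(build_mean)
run([1.0, 2.0, 3.0])   # builds a plan for length 3
run([4.0, 5.0, 6.0])   # same length: reuses the plan, no rebuild
run([1.0, 2.0])        # new length: a second build is needed
```

Reuse is cheap as long as the inputs look the same. A model whose structure changes with its input keeps falling outside what was fixed at build time, which is the flexibility cost described above.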
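My model, however, has to adapt itself to its input. After I patched this particular error, the same kMetaTypeNone error kept appearing in other places, for example in the mul operation. Fixing them one by one only treats the symptoms: as long as the model itself stays the same, the problem cannot go away.
After reading the official MindSpore documentation I found that MindSpore supports dynamic graphs as well. The original framework, PyTorch, runs as a dynamic graph, so all I needed to do was switch MindSpore to dynamic graph mode by adding the following code: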
from mindspore import context
context.set_context(mode=context.PYNATIVE_MODE)
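A slightly more defensive variant is sketched below. It assumes the call is made early in the script, before the network and training wrapper are created, and it reads the mode back with context.get_context to confirm the switch took effect. The helper name enable_pynative_mode is hypothetical, and the ImportError guard is only there so the snippet also runs on a machine without MindSpore.

```python
def enable_pynative_mode() -> bool:
    """Switch MindSpore to dynamic-graph mode; return False if MindSpore is missing."""
    try:
        from mindspore import context
    except ImportError:
        return False
    context.set_context(mode=context.PYNATIVE_MODE)
    # Read the setting back to confirm the switch took effect.
    return context.get_context("mode") == context.PYNATIVE_MODE


# Call this before building the network and the training wrapper.
pynative_enabled = enable_pynative_mode()
```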
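3 Summary
Read the official MindSpore documentation, get a solid understanding of how each framework works and how the frameworks differ, and make good use of the community.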