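Common MindSpore issues fall into three main categories: data loading and processing, network construction and training, and distributed parallel configuration. Many typical problem cases have been collected across different scenarios, and they are listed below.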
1. Dataset loading and augmentation error cases
MindSpore dataset loading: py-spy, a handy debugging tool
[MindData] How to efficiently convert your own data into a MindRecord dataset without running out of memory
[MindData] How to reduce high memory usage when processing data with Dataset
How to handle multiprocessing errors during dataset loading
MindRecord: Chinese-character path issue on Windows, Unexpected error. Failed to open file
MindRecord: dataset error on Windows, Invalid file, DB file can not match
MindSpore dataset loading: GeneratorDataset hangs or freezes
GeneratorDataset error: Unexpected error. Invalid data type
GeneratorDataset data processing error: The pointer[cnode] is null
MindSpore dataset error: [The data pipeline is not a tree]
MindSpore data augmentation error: Use Decode() for encoded data or ToPIL()
MindSpore custom data augmentation error: args should be Numpy narray.Got class tuple
MindSpore dataset loading: GeneratorDataset features and common issues
MindSpore dataset loading: [too many open files] error
MindSpore data augmentation: [ Invalid with type] error
MindSpore dataset format: [MindRecord File could not open successfully]
MindSpore dataset loading: [ has no attribute 'child_sampler']
MindSpore dataset loading: ['DictIterator' has no attribute 'get_next'] error
MindSpore dataset loading: [IndexError: list index out of range] error
MindSpore error: Unexpected error. Inconsistent batch ..
MindSpore data augmentation: process exits automatically due to insufficient memory
MindSpore error: Invalid data, Page size.
MindSpore error: parse() missing 1 required positional.
MindSpore error: Unable to data from Generator..
MindSpore error: GeneratorDataset's num_workers=8, this value is ...
MindSpore error: Exception thrown from PyFunc.
MindSpore error: Unexpected error. Invalid data.
Wrong image type causes execution error: TypeError: img should be PIL image or NumPy array. Got .
Wrong channel order causes matplotlib.image.imsave to fail: raise ValueError("Third dimension must be 3 or 4")
Saving a Tensor with cv2.imwrite causes a type error: cv2.error: OpenCV(4.6.0) :-1: error: (-5:Bad argument) in function 'imwrite'
Wrong image type when saving with cv2 causes execution error: cv2.error: OpenCV(4.6.0) :-1: error: (-5:Bad argument) in function 'imwrite' - img is not a numpy array
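The channel-order case (matplotlib.image.imsave failing with "Third dimension must be 3 or 4") usually means the image is in CHW layout, for example after an HWC2CHW transform. matplotlib expects HWC layout (height, width, channels). Below is a minimal sketch, assuming the image is held as a nested Python list indexed [channel][row][column]. With a NumPy array the equivalent is `img.transpose(1, 2, 0)`. The helpers `chw_to_hwc` and `shape_of` are hypothetical names used only for illustration.

```python
def chw_to_hwc(img):
    """Convert a nested-list image from [C][H][W] layout to [H][W][C].

    Equivalent to NumPy's img.transpose(1, 2, 0) on a 3-D array.
    """
    channels = len(img)
    height = len(img[0])
    width = len(img[0][0])
    # For each pixel position (h, w), gather that pixel's value from every channel.
    return [
        [[img[c][h][w] for c in range(channels)] for w in range(width)]
        for h in range(height)
    ]


def shape_of(img):
    """Return the shape of a rectangular three-level nested list."""
    return (len(img), len(img[0]), len(img[0][0]))


# A 3-channel image with 2 rows and 4 columns, in CHW layout.
# Channel c stores c * 10 + (row * 4 + column) at each pixel.
chw = [
    [[0, 1, 2, 3], [4, 5, 6, 7]],          # channel 0
    [[10, 11, 12, 13], [14, 15, 16, 17]],  # channel 1
    [[20, 21, 22, 23], [24, 25, 26, 27]],  # channel 2
]
hwc = chw_to_hwc(chw)
print(shape_of(chw), "->", shape_of(hwc))  # (3, 2, 4) -> (2, 4, 3)
print(hwc[0][1])  # all channels of the pixel at row 0, column 1: [1, 11, 21]
```

After the conversion the last dimension is the channel count (3), which is the layout imsave accepts.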
2. Network construction and training error cases
Resolving the MindSpore error: Exceed function call depth limit 1000
MindSpore error: 'Tensor' has no attribute 'data'
MindSpore error: 'NoneType' has no attribute 'name'
MindSpore error: the dimension of logits must be equal to 2
MindSpore error: For 'Pad',all elements of paddings must >=0
MindSpore error: For primitive[BinaryCrossEntropy], the x shape must be equal to
MindSpore error: For 'Tensor',the type of input_data should be one of 'Tensor', 'ndarray' ...
MindSpore error: Data type conversion of 'Parameter' is not supported
MindSpore error: ScalarAdd does not support the bool type
MindSpore error: the ReduceSum operator does not support inputs with 8 or more dimensions
MindSpore error: Unsupported parameter type for python primitive..
MindSpore error: Primitive ScatterAdd's bprop not defined
MindSpore error: For 'Conv2d', C_in of the x shape divided by group must equal C_in of the weight
MindSpore error: For 'Reshape', the shape of 'input_x' is ...,but product
MindSpore error: For 'Sub', x.shape and y.shape are supposed to broadcast
MindSpore error: For 'MatMul', the input dimensions must be equal,
MindSpore error: Cann't select valid kernel[Default/Pow-op]
MindSpore error: rank in NLLLoss should be int and must be
MindSpore error: operator[StridedSlice] input(kNumberTypeUInt16) output(kNumberTypeUInt16) is not supported.
MindSpore error: Select GPU kernel op * fail! Incompatible data type
MindSpore error: For 'MirrorPad', paddings must be a Tensor with
MindSpore error: For primitive[TensorSummary], the v rank must be greater than 0
MindSpore error: For 'CellList', each cell should be subclass of Cell
MindSpore error: task_fail_info or current_graph_ is nullptr
MindSpore error: `padding_idx` in Embedding is out of range
MindSpore error: seed2 in `StandardNormal` should be int and >=0
MindSpore error: ReduceMean does not support inputs with 8 or more dimensions
MindSpore error: output_shape shape element [2] must be positive integer or SHP_ANY
MindSpore error: For 'TopK', the type of 'x' should be ...
MindSpore error: Minimum inputs size 0 does not match...
MindSpore error: `half_pixel_centers`=True only support in Ascend
MindSpore error: For 'AvgPool', every dimension of the output shape must be greater than zero
MindSpore error: The function `construct` need xx positional argument ...
MindSpore error: Operator[ ] input( ) output( ) is not support
MindSpore error: sth should be initialized as a Parameter type in ...
MindSpore error: "operation does not support the type kMetaTypeNone"
MindSpore error: When '...' by using Fallback feature, an error occurred
MindSpore error: Net parameters weight shape xxx i
MindSpore: insufficient GPU compute capability causes incorrect computation results
MindSpore error: failed to set the device, SetDevice failed
MindSpore error: Total stream number xxx exceeds the limit of ...
MindSpore error: [LoadTask] Distribute Task Failed
MindSpore error: stream allocation fails during GPU training [cudaStreamCreate failed]
MindSpore error: memory allocation of size 0 [The memory alloc size is 0]
MindSpore error: Please input the correct checkpoint
MindSpore warning: "xxx parameters in the net are not
MindSpore error: For 'context.set_context', package type xxx support 'devic
MindSpore error in PyNative mode: The pointer[top_cell_] is null
MindSpore error: Malloc device memory failed
MindSpore error: Please input the correct checkpoint
MindSpore error when passing a sens value for differentiation: For 'MatMul', the input dimensions
Model loading fails because the ckpt file is missing: please check whether the 'ckpt_file_name' is correct
Tensor shape mismatch causes execution error: ValueError: For 'Sub', x.shape and y.shape are supposed to broadcast
A custom loss that does not inherit from nn.Cell causes execution error: ParseStatement Unsupported statement 'Try'.
Wrong parameter type causes optimizer error: TypeError: For 'Optimizer', the argument parameters must be Iterable type
Mismatch between the function's parameters and the arguments passed causes ops.GradOperation error: The parameters number of the function is 2, but the number of provided arguments is 1.
Incorrect handling of returned values causes execution error: AttributeError: 'tuple' object has no attribute 'asnumpy'
Wrong number of input dimensions causes model input error: For primitive Conv2D, the x shape size must be equal to 4, but got 3.
Error: The value parameter,it's name 'xxxx' already exsts. please set a unique name for the parameter .
A misnamed construct method causes loss-function error: The 'sub' operation does not support the type TensorFloat32, None.
RuntimeError:The 'sub' operation does not support the type TensorFloat32, None.
A badly placed comment causes error: There are incorrect indentations in definition or comment of function: 'Net.construct'.
Static graph execution hangs: For MakeTuple, the inputs should not be empty..node:xxxxxxxxxxxx
MindSpore error: module() takes at most 2 arguments (3 given)
Error after split: For 'Mul', x.shape and y.shape are supposed to broadcast
The operation does not support the type kMetaTypeNone, Tesor...
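Several cases above are shape-broadcasting failures, such as the "For 'Sub'" and "For 'Mul'" errors saying "x.shape and y.shape are supposed to broadcast". Below is a minimal sketch, assuming MindSpore elementwise operators follow NumPy-style broadcasting. Under those rules, shapes are aligned on their trailing dimensions, and each dimension pair must be equal or contain a 1. `broadcast_shape` is a hypothetical helper, not a MindSpore API. You can use it to check operand shapes in plain Python before calling the operator.

```python
def broadcast_shape(shape_x, shape_y):
    """Return the broadcast shape of two shapes, or raise ValueError.

    NumPy-style rules: align shapes on their trailing dimensions; two
    dimensions are compatible if they are equal or either one is 1.
    """
    # Pad the shorter shape with leading 1s so both have the same rank.
    rank = max(len(shape_x), len(shape_y))
    padded_x = (1,) * (rank - len(shape_x)) + tuple(shape_x)
    padded_y = (1,) * (rank - len(shape_y)) + tuple(shape_y)
    result = []
    for dim_x, dim_y in zip(padded_x, padded_y):
        if dim_x == dim_y or dim_y == 1:
            result.append(dim_x)
        elif dim_x == 1:
            result.append(dim_y)
        else:
            raise ValueError(
                f"shapes {tuple(shape_x)} and {tuple(shape_y)} cannot broadcast: "
                f"dimension {dim_x} vs {dim_y}"
            )
    return tuple(result)


print(broadcast_shape((32, 1, 5), (4, 5)))  # (32, 4, 5)
try:
    broadcast_shape((32, 3), (32, 4))  # trailing dims 3 and 4 are incompatible
except ValueError as err:
    print(err)
```

If the check fails, reshape or expand one operand (for example, insert a size-1 axis) until every aligned dimension pair matches or contains a 1.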
3. Distributed parallel error cases
Error: the value of strategy must be the power of 2, but get xx
Error: PyNative Only support STAND_ALONE, DATA_PARALLEL...
Error: Parallel mode dose not support
Error: davinci_model : load task fail, return ret
Error: connect returned Connection timed out
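The first distributed case is the error "the value of strategy must be the power of 2". The check below is a minimal sketch that catches it before launching a job. It assumes a sharding strategy is written as one tuple of split counts per operator input, such as `((2, 1), (1, 4))`. It also assumes every split count must be a power of 2, as the error message states. `is_power_of_two` and `invalid_strategy_values` are hypothetical helpers, not MindSpore APIs.

```python
def is_power_of_two(n):
    """True if n is a positive integer power of two (1, 2, 4, 8, ...)."""
    # A power of two has exactly one bit set, so n & (n - 1) clears it to 0.
    return isinstance(n, int) and n > 0 and (n & (n - 1)) == 0


def invalid_strategy_values(strategy):
    """Return (input_index, axis, value) for every split count that is not a power of 2.

    Assumes `strategy` holds one tuple of split counts per operator input,
    e.g. ((2, 1), (1, 4)) for a two-input operator.
    """
    problems = []
    for input_index, input_strategy in enumerate(strategy):
        for axis, value in enumerate(input_strategy):
            if not is_power_of_two(value):
                problems.append((input_index, axis, value))
    return problems


print(invalid_strategy_values(((2, 1), (1, 4))))  # []
print(invalid_strategy_values(((3, 1), (1, 6))))  # [(0, 0, 3), (1, 1, 6)]
```

The "xx" reported in the error corresponds to a value this check would flag.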