[MindSpore Two-Day Training Camp, Session 5 Notes] A Summary of Three Assignments: the First Two and the Last

Reposted from: https://bbs.huaweicloud.com/forum/thread-106733-1-1.html

Author: skywalk

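I was delighted to take part in the fifth two-day training camp. I learned a great deal, and much of it will take time to digest. I started out ambitious, aiming to complete all 11 assignments and win the mystery grand prize. I was simply curious what the prize was, and hoped fate would indulge my curiosity.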

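Unfortunately, things did not go as hoped. The assignment window for this camp clashed with the Ascend model contest and, mostly because of my own limited ability, I was so buried in model tuning that I had no time for the assignments. When I finally had time over the last few days, my MindSpore environment broke, so I still could not finish them. This morning I got up a little after 5 a.m. to do assignment 11. Now that the MindSpore environment is working again, I have uploaded the code screenshots for assignments 1 and 2, which I finished last month, so I can just about count three assignments as done.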

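One thing I noticed is that most of the assignments cannot be done on a Mac laptop, which quietly makes them harder to complete. The good news is that I can now scrape together two MindSpore environments, and while writing this post it occurred to me that Colab should be able to run MindSpore too; I will try that later. It goes to show that studying more and writing up what you learn not only makes learning more efficient but also opens up new ideas!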

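The first two assignments follow the same approach: train the model, save a ckpt checkpoint file, then write a Python script that converts the checkpoint to the MindIR format and run the conversion. It looks simple, but actually doing it takes some wrestling.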

Reference: AI框架中图层IR的分析 (An analysis of graph-level IR in AI frameworks)

https://zhuanlan.zhihu.com/p/263420069

Assignment 1: LeNet

To get ready to run LeNet, first prepare the MNIST dataset by decompressing it:

model_user14@34ffafe8-aacb-4c1f-87ed-966c50fed78d:~/ms/mindspore/model_zoo/official/cv/lenet/data$ gzip -d train-*

model_user14@34ffafe8-aacb-4c1f-87ed-966c50fed78d:~/ms/mindspore/model_zoo/official/cv/lenet/data$ ls

''$'\200\353'              ''$'\374''ϐ993'                  P                           train-images-idx3-ubyte

''$'\375\374\374\261\226'  ''$'\030\376\376''='             t10k-images-idx3-ubyte.gz   train-labels-idx1-ubyte

''$'\367\360\305\027'      '$_'$'\233\355\375\375\262'')'   t10k-labels-idx1-ubyte.gz

model_user14@34ffafe8-aacb-4c1f-87ed-966c50fed78d:~/ms/mindspore/model_zoo/official/cv/lenet/data$ gzip -d t10k-*

model_user14@34ffafe8-aacb-4c1f-87ed-966c50fed78d:~/ms/mindspore/model_zoo/official/cv/lenet/data$ 
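The same decompression step can also be done from Python. This is a minimal sketch, assuming the MNIST `.gz` files all sit in one directory; the helper name `gunzip_all` is hypothetical and not part of the LeNet scripts.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(data_dir):
    """Decompress every *.gz file in data_dir next to the original.

    Returns the list of output paths. Unlike `gzip -d`, the .gz files
    are left in place.
    """
    outputs = []
    for gz_path in sorted(Path(data_dir).glob("*.gz")):
        out_path = gz_path.with_suffix("")  # strip the trailing .gz
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        outputs.append(out_path)
    return outputs
```

Calling `gunzip_all("data")` from the lenet directory would then play the role of the two `gzip -d` commands above.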

Run training: python train.py --data_path Data

Error: the path is not accessible or does not exist.

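It turned out the path had the wrong letter case (the directory is data, not Data), so I fixed the path and ran it again.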
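A case mismatch like this can be caught before launching a long job. The sketch below is only an illustration: `resolve_case` is a hypothetical helper that looks for a directory entry matching the requested name exactly, and falls back to a case-insensitive match.

```python
import os


def resolve_case(parent, name):
    """Return the entry in `parent` that matches `name`.

    An exact match wins; otherwise the first case-insensitive match is
    returned, and None if nothing matches at all.
    """
    entries = os.listdir(parent)
    if name in entries:
        return name
    for entry in entries:
        if entry.lower() == name.lower():
            return entry
    return None
```

With the layout above, `resolve_case(".", "Data")` would return `data`, which points straight at the typo.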

The next error came from the conversion script:

model_user14@34ffafe8-aacb-4c1f-87ed-966c50fed78d:~/ms/mindspore/model_zoo/official/cv/lenet$ python lenet2mindir.py 

[WARNING] ME(34535:281473740427280,MainProcess):2021-01-25-00:54:32.303.511 [mindspore/_check_version.py:207] MindSpore version 1.1.0 and "te" wheel package version 1.0 does not match, reference to the match info on: https://www.mindspore.cn/install

MindSpore version 1.1.0 and "topi" wheel package version 0.6.0 does not match, reference to the match info on: https://www.mindspore.cn/install

Traceback (most recent call last):

  File "lenet2mindir.py", line 17, in <module>

    load_checkpoint("ckpt/checkpoint_lenet-10_1875.ckpt", net=lenetwork())

  File "/usr/local/lib/python3.7/site-packages/mindspore/nn/cell.py", line 360, in __call__

    output = self.construct(*cast_inputs, **kwargs)

TypeError: construct() missing 1 required positional argument: 'x'

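It turned out I had made a mistake when copying the code. The correct call is:

load_checkpoint("ckpt/checkpoint_lenet-10_1875.ckpt", net=lenetwork)

but I had written:

load_checkpoint("ckpt/checkpoint_lenet-10_1875.ckpt", net=lenetwork())

The fix is to remove the parentheses, so that the network object itself is passed rather than the result of calling it.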
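The traceback makes sense once you remember that calling a network object runs its forward pass. Here is a minimal sketch in plain Python; `TinyCell` is a hypothetical stand-in that only imitates how `__call__` forwards to `construct` (as the traceback shows) and is not MindSpore code.

```python
class TinyCell:
    """Hypothetical stand-in for a network class: calling an instance
    forwards its arguments to construct()."""

    def __call__(self, *inputs):
        return self.construct(*inputs)

    def construct(self, x):
        return x * 2


net = TinyCell()

# Passing the object itself is fine: nothing is executed yet.
same_object = net

# Adding parentheses calls construct() with no input, which fails.
try:
    net()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message reports the missing positional argument `x`, the same kind of failure as in the traceback above.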

Another error:

Traceback (most recent call last):

  File "lenet2mindir.py", line 19, in <module>

    export(lenetwork, Tensor(input), file_name="lenet", file_format="MINDIR")

  File "/usr/local/lib/python3.7/site-packages/mindspore/train/serialization.py", line 537, in export

    _export(net, file_name, file_format, *inputs)

  File "/usr/local/lib/python3.7/site-packages/mindspore/train/serialization.py", line 575, in _export

    graph_id, _ = _executor.compile(net, *inputs, phase=phase_name, do_convert=False)

  File "/usr/local/lib/python3.7/site-packages/mindspore/common/api.py", line 502, in compile

    result = self._executor.compile(obj, args_list, phase, use_vm)

  File "/usr/local/lib/python3.7/site-packages/mindspore/ops/primitive.py", line 388, in __infer__

    out[track] = fn(*(x[track] for x in args))

  File "/usr/local/lib/python3.7/site-packages/mindspore/ops/operations/nn_ops.py", line 1208, in infer_shape

    Rel.EQ, self.name)

  File "/usr/local/lib/python3.7/site-packages/mindspore/_checkparam.py", line 206, in check

    raise excp_cls(f'{msg_prefix} `{arg_name}` should be {rel_str}, but got {arg_value}.')

ValueError: For 'Conv2D' the `x_shape[1] / group` should be == w_shape[1]: 1, but got 3.

model_user14@34ffafe8-aacb-4c1f-87ed-966c50fed78d:~/ms/mindspore/model_zoo/official/cv/lenet$ 

Change the channel dimension of the input shape from 3 to 1: size=[32, 1, 224, 224]

A new error:

Traceback (most recent call last):

  File "lenet2mindir.py", line 19, in <module>

    export(lenetwork, Tensor(input), file_name="lenet", file_format="MINDIR")

  File "/usr/local/lib/python3.7/site-packages/mindspore/train/serialization.py", line 537, in export

    _export(net, file_name, file_format, *inputs)

  File "/usr/local/lib/python3.7/site-packages/mindspore/train/serialization.py", line 575, in _export

    graph_id, _ = _executor.compile(net, *inputs, phase=phase_name, do_convert=False)

  File "/usr/local/lib/python3.7/site-packages/mindspore/common/api.py", line 502, in compile

    result = self._executor.compile(obj, args_list, phase, use_vm)

  File "/usr/local/lib/python3.7/site-packages/mindspore/ops/primitive.py", line 388, in __infer__

    out[track] = fn(*(x[track] for x in args))

  File "/usr/local/lib/python3.7/site-packages/mindspore/ops/operations/math_ops.py", line 753, in infer_shape

    + f', x2 shape {x2}(transpose_b={self.transpose_b}).')

ValueError: For 'MatMul' evaluator shapes of inputs can not do this operator, got 44944 and 400, with x1 shape [32, 44944](transpose_a=False), x2 shape [120, 400](transpose_b=True).

Change the size again, this time to 32x32 images:

input = np.random.uniform(0.0, 1.0, size=[1, 1, 32, 32]).astype(np.float32)

It finally passed.
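The numbers in the MatMul error can be reproduced with a little arithmetic. The sketch below assumes the standard LeNet-5 layout: two 5x5 convolutions without padding, each followed by 2x2 pooling with stride 2, ending in 16 channels that feed a dense layer expecting 400 inputs (which is what the `[120, 400]` weight shape in the error suggests). The helper name `lenet_flatten_size` is hypothetical.

```python
def lenet_flatten_size(height, width, kernel=5, pool=2, out_channels=16):
    """Flattened feature count after conv -> pool -> conv -> pool."""
    for _ in range(2):
        # A convolution without padding shrinks each side by kernel - 1.
        height, width = height - kernel + 1, width - kernel + 1
        # 2x2 pooling with stride 2 halves each side (floor division).
        height, width = height // pool, width // pool
    return out_channels * height * width


print(lenet_flatten_size(224, 224))  # 44944, the x1 width in the MatMul error
print(lenet_flatten_size(32, 32))    # 400, what the first dense layer expects
```

Under these assumptions, a 224x224 input flattens to 16 x 53 x 53 = 44944 features, while a 32x32 input flattens to 16 x 5 x 5 = 400, which is why only the 32x32 input gets through.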

The final code:

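import numpy as np

from src.lenet import LeNet5

from mindspore import Tensor, export, load_checkpoint, load_param_into_net

# Define the network (10 output classes) and load the trained weights into it

lenetwork = LeNet5(10)

load_checkpoint("ckpt/checkpoint_lenet-10_1875.ckpt", net=lenetwork)

# Dummy input that fixes the input shape for export: [batch, channel, height, width]

input = np.random.uniform(0.0, 1.0, size=[16, 1, 32, 32]).astype(np.float32)

# Export the network in MindIR format

export(lenetwork, Tensor(input), file_name="lenet", file_format="MINDIR")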

Assignment 2: ResNet50

The steps for ResNet50 are basically the same as for LeNet:

I put the CIFAR-10 dataset in the /dataset/cifar-10-batches-bin directory.

python train.py --net=$1 --dataset=$2 --dataset_path=$PATH1

Launch it with this command: python train.py --net="resnet50" --dataset="cifar10" --dataset_path="dataset/cifar-10-batches-bin"

Looking at src/config.py, the epoch setting is:

"epoch_size": 90

When training on an NPU, one epoch took about 30 seconds, so the full 90 epochs take about 45 minutes.
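The estimate is simple arithmetic and easy to redo for other settings. A tiny sketch, where the 30-second figure is the rough per-epoch time observed above rather than a measured benchmark:

```python
epoch_size = 90          # "epoch_size" from src/config.py
seconds_per_epoch = 30   # rough figure observed on the NPU

total_minutes = epoch_size * seconds_per_epoch / 60
print(total_minutes)  # 45.0
```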

Compared with LeNet, this takes much longer, so an NPU or GPU is needed to finish it in a reasonable time.

Assignment 11

The last assignment is completed in the Huawei Cloud simulator:

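Assignment: complete the Huawei Cloud visual debugging and tuning lab (because of environment limits, the lab does not include hands-on use of the debugger) and submit a screenshot of the lab report in which every step is marked as completed.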

Assignment link: https://lab.huaweicloud.com/testdetail.html?testId=464

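Just follow the prompts step by step and you can finish it. I found Huawei Cloud's MindSpore visualization excellent; it can be a big help when debugging a network!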

Since time is limited, I will stop here for now.

Learning never ends; let's move forward together with Huawei!
