TensorFlow Network Model Migration and Training
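NPUs are the direction in which AI compute is heading, but most training and online-inference scripts today are still written for GPUs. Because NPU and GPU architectures differ, GPU-based training and online-inference scripts cannot run directly on an NPU; they must first be converted into scripts that support the NPU.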
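The Ascend 910 AI processor is a dedicated neural-network processor for artificial intelligence (AI) that Huawei released in 2019. Its compute reaches 256T, and the latest model reaches 310T, reportedly twice the compute of mainstream processors in the industry; it can be paired with the MindSpore training framework. Most training scripts in the industry are currently developed with the TensorFlow Python API and run on CPU/GPU/TPU by default. To let them train on the Ascend 910 AI processor and improve training performance, the training scripts have to be migrated and adapted.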
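The Ascend platform provides a TensorFlow 1.15 network migration tool. It migrates training scripts developed with the TensorFlow Python API so that they can train on Ascend AI processors, with the aim of reaching the best training accuracy and performance. The tool targets native TensorFlow training scripts: AI algorithm engineers use it to analyze how well the native TensorFlow Python API and Horovod Python API calls are supported on Ascend AI processors, and to convert native TensorFlow training scripts automatically into scripts that Ascend AI processors support. For APIs that cannot be migrated automatically, refer to the migration report that the tool produces and adapt the training script accordingly.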
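There are currently two ways to migrate a training script developed with the TensorFlow Python API to Ascend AI processors: automatic migration and manual migration. Manual migration is more involved, so automatic migration is the recommended first choice.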
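Download the source code
This walkthrough uses the TensorFlow version of YOLOv3 as the example; download the tensorflow-yolov3 source.
The author's repository: https://gitee.com/lljyoyo1995/tensorflow-yolov3
# Script to be migrated
https://gitee.com/lljyoyo1995/tensorflow-yolov3.git
# Script after migration
https://gitee.com/lljyoyo1995/tensorflow-yolov3_npu.git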
Prepare the VOC dataset
Download the dataset
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
Extract the archives and arrange the dataset as follows
VOC                  # path to the VOC dataset
├── test
│   └── VOCdevkit
│       └── VOC2007  (from VOCtest_06-Nov-2007.tar)
└── train
    └── VOCdevkit
        ├── VOC2007  (from VOCtrainval_06-Nov-2007.tar)
        └── VOC2012  (from VOCtrainval_11-May-2012.tar)
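The layout above can be produced by extracting each archive into the matching train/ or test/ directory. The following is a minimal sketch, assuming the three archives sit in one download directory and that each unpacks with a top-level VOCdevkit/ folder; the function name extract_voc and the ARCHIVE_TARGETS table are illustrative and not part of the repository.

```python
import tarfile
from pathlib import Path

# Archive name -> subdirectory under the VOC root (mirrors the layout above).
ARCHIVE_TARGETS = {
    "VOCtest_06-Nov-2007.tar": "test",
    "VOCtrainval_06-Nov-2007.tar": "train",
    "VOCtrainval_11-May-2012.tar": "train",
}


def extract_voc(download_dir, voc_root):
    """Extract every archive that is present into its train/ or test/ folder."""
    download_dir, voc_root = Path(download_dir), Path(voc_root)
    extracted = []
    for name, split in ARCHIVE_TARGETS.items():
        archive = download_dir / name
        if not archive.is_file():
            continue  # skip archives that have not been downloaded
        target = voc_root / split
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive) as tar:
            tar.extractall(target)
        extracted.append(name)
    return extracted
```

Calling extract_voc(".", "/PATH/TO/VOC") after the wget commands would then give the directory that voc_annotation.py expects as --data_path.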
Generate the txt annotation files
Generate the voc_train.txt and voc_test.txt annotation files.
python scripts/voc_annotation.py --data_path /PATH/TO/VOC
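To make the annotation step more concrete, here is a hedged sketch of how one VOC XML annotation could be turned into a single text line. It is not the code of scripts/voc_annotation.py; it assumes a line format of the image path followed by space-separated boxes written as xmin,ymin,xmax,ymax,class_id, and the function name voc_xml_to_line and the class_names argument are hypothetical.

```python
import xml.etree.ElementTree as ET


def voc_xml_to_line(xml_text, image_path, class_names):
    """Build 'image_path xmin,ymin,xmax,ymax,class_id ...' from one VOC XML."""
    root = ET.fromstring(xml_text)
    parts = [image_path]
    for obj in root.iter("object"):
        name = obj.find("name").text.strip()
        if name not in class_names:
            continue  # ignore classes that are not being trained
        box = obj.find("bndbox")
        coords = [box.find(tag).text.strip()
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        parts.append(",".join(coords + [str(class_names.index(name))]))
    return " ".join(parts)
```

Checking a few generated lines against the format in the repository's README before training is a cheap way to catch path or class-index mistakes.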
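Get the code running on a CPU or GPU first.
Upload the code to an Ascend server; the author uploaded it to the ModelArts platform for training. The migration tool is located in the following directory: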
/home/ma-user/Ascend/tfplugin/latest/python/site-packages/npu_bridge/convert_tf2npu
[ma-user@notebook-87136e07-6a9a-4138-beec-742972f7b62f convert_tf2npu]$ ll
total 372
drwxr-x--- 2 ma-user ma-group 4096 Mar 4 18:20 __pycache__
-rw-rw-r-- 1 ma-user ma-group 37897 Mar 4 18:20 ast_impl.py
-rw-rw-r-- 1 ma-user ma-group 2954 Mar 4 18:20 conver.py
-rw-rw-r-- 1 ma-user ma-group 5020 Mar 4 18:20 conver_by_ast.py
-rw-rw-r-- 1 ma-user ma-group 9965 Mar 4 18:20 file_op.py
-rw-rw-r-- 1 ma-user ma-group 5925 Mar 4 18:20 main.py # Linux environment
-rw-rw-r-- 1 ma-user ma-group 8769 Mar 4 18:20 main_win.py # Windows environment
drwxr-x--- 2 ma-user ma-group 4096 Mar 4 18:20 mappings
-rw-rw-r-- 1 ma-user ma-group 255453 Mar 4 18:20 tf1.15_api_support_list.xlsx
-rw-rw-r-- 1 ma-user ma-group 4619 Mar 4 18:20 tf_func_def.py
-rw-rw-r-- 1 ma-user ma-group 6203 Mar 4 18:20 util.py
-rw-rw-r-- 1 ma-user ma-group 1299 Mar 4 18:20 util_global.py
-rw-rw-r-- 1 ma-user ma-group 5870 Mar 4 18:20 visit_by_ast.py
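Run the migration command (-i: directory of the original scripts, -o: directory for the migrated scripts, -r: directory for the migration report)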
cd /home/ma-user/Ascend/tfplugin/latest/python/site-packages/npu_bridge/convert_tf2npu
python3 main.py -i /root/models/official/resnet
cd /usr/local/Ascend/tfplugin/latest/python/site-packages/npu_bridge/convert_tf2npu
python main.py -i /PATH/TO/tensorflow-yolov3 -o /PATH/TO -r /PATH/TO
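The tool appears to visit every file under the -i directory, including .git, .idea and __pycache__ content, so it can be convenient to run it on a trimmed copy of the project. The following is a minimal sketch of that idea; the helper name stage_for_migration and the list of skipped patterns are assumptions, not something the migration tool requires.

```python
import shutil
from pathlib import Path

# Patterns assumed to carry no training code; adjust to the project at hand.
SKIP = shutil.ignore_patterns(".git", ".idea", "__pycache__", "*.pyc", ".DS_Store")


def stage_for_migration(src, dst):
    """Copy src to a new directory dst, leaving out VCS, IDE and cache files."""
    return Path(shutil.copytree(src, dst, ignore=SKIP))
```

The staged directory can then be passed to main.py through -i in place of the original checkout.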
[ma-user convert_tf2npu]$python main.py -i /home/ma-user/work/MyDocuments/tensorflow-yolov3 -o /home/ma-user/work/MyDocuments -r /home/ma-user/work/MyDocuments
Begin conver, input file: /home/ma-user/work/MyDocuments/tensorflow-yolov3
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/image_demo.py
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/image_demo.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/from_darknet_weights_to_ckpt.py
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/from_darknet_weights_to_ckpt.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/LICENSE
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/evaluate.py
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/evaluate.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/video_demo.py
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/video_demo.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/convert_weight.py
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/convert_weight.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/freeze_graph.py
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/freeze_graph.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/README.md
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/LICENSE.fuck
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.gitignore
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/from_darknet_weights_to_pb.py
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/from_darknet_weights_to_pb.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/train.py
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/train.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/yolov3_coco.pb
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/checkpoint/yolov3_coco_demo.ckpt.index
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/checkpoint/checkpoint
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/checkpoint/yolov3_coco.ckpt.index
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/checkpoint/yolov3_coco_demo.ckpt.meta
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/checkpoint/yolov3_coco_demo.ckpt.data-00000-of-00001
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/checkpoint/yolov3_coco.ckpt.meta
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/checkpoint/.DS_Store
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/checkpoint/yolov3_coco.ckpt.data-00000-of-00001
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/main.py
No Tensorflow module is imported in script main.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/main.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/__init__.py
No Tensorflow module is imported in script __init__.py.
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/remove_space.py
No Tensorflow module is imported in script remove_space.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/remove_space.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/remove_delimiter_char.py
No Tensorflow module is imported in script remove_delimiter_char.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/remove_delimiter_char.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/find_class.py
No Tensorflow module is imported in script find_class.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/find_class.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/remove_class.py
No Tensorflow module is imported in script remove_class.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/remove_class.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/result.txt
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/convert_pred_yolo.py
No Tensorflow module is imported in script convert_pred_yolo.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/convert_pred_yolo.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/intersect-gt-and-pred.py
No Tensorflow module is imported in script intersect-gt-and-pred.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/intersect-gt-and-pred.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/convert_keras-yolo3.py
No Tensorflow module is imported in script convert_keras-yolo3.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/convert_keras-yolo3.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/README.md
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/class_list.txt
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/convert_gt_xml.py
No Tensorflow module is imported in script convert_gt_xml.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/convert_gt_xml.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/rename_class.py
No Tensorflow module is imported in script rename_class.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/rename_class.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/convert_pred_darkflow_json.py
No Tensorflow module is imported in script convert_pred_darkflow_json.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/convert_pred_darkflow_json.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/convert_gt_yolo.py
No Tensorflow module is imported in script convert_gt_yolo.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/mAP/extra/convert_gt_yolo.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/scripts/show_bboxes.py
No Tensorflow module is imported in script show_bboxes.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/scripts/show_bboxes.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/scripts/voc_annotation.py
No Tensorflow module is imported in script voc_annotation.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/scripts/voc_annotation.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/data/dataset/voc_train.txt
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/data/dataset/voc_test.txt
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/data/classes/coco.names
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/data/classes/voc.names
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/data/log/events.out.tfevents.1657273471.LAPTOP-4DTD5D42
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/data/anchors/basline_anchors.txt
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/data/anchors/coco_anchors.txt
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/index
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/description
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/packed-refs
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/HEAD
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/config
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/refs/heads/master
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/refs/remotes/origin/HEAD
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/info/exclude
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/hooks/pre-receive.sample
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/hooks/pre-push.sample
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/hooks/commit-msg.sample
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/hooks/pre-rebase.sample
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/hooks/pre-commit.sample
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/hooks/post-update.sample
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/hooks/pre-merge-commit.sample
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/hooks/push-to-checkout.sample
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/hooks/pre-applypatch.sample
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/hooks/update.sample
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/hooks/prepare-commit-msg.sample
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/hooks/fsmonitor-watchman.sample
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/hooks/applypatch-msg.sample
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/objects/pack/pack-e9fd50bffc7fd46014db51ca22d5e478d1f7c993.pack
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/objects/pack/pack-e9fd50bffc7fd46014db51ca22d5e478d1f7c993.idx
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/logs/HEAD
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/logs/refs/heads/master
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.git/logs/refs/remotes/origin/HEAD
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.idea/tensorflow-yolov3.iml
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.idea/workspace.xml
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.idea/.gitignore
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.idea/misc.xml
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.idea/modules.xml
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.idea/vcs.xml
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/docs/requirements.txt
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/docs/Box-Clustering.ipynb
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/docs/images/.jpg
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/docs/images/road.mp4
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/docs/images/road.jpeg
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/docs/images/611_result.jpg
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/.github/FUNDING.yml
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/dataset.py
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/dataset.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/common.py
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/common.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/yolov3.py
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/yolov3.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/__init__.py
No Tensorflow module is imported in script __init__.py.
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/config.py
No Tensorflow module is imported in script config.py.
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/config.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/utils.py
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/utils.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/backbone.py
Finish conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/backbone.py
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/__pycache__/common.cpython-36.pyc
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/__pycache__/__init__.cpython-36.pyc
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/__pycache__/yolov3.cpython-36.pyc
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/__pycache__/dataset.cpython-36.pyc
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/__pycache__/utils.cpython-36.pyc
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/__pycache__/config.cpython-36.pyc
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/core/__pycache__/backbone.cpython-36.pyc
1.In brief: Total API: 200, in which Support: 187, API support after migration: 4, Network training support after migration: 9, Not support but no impact on migration: 0, Not support or recommended: 0, Compatible: 0, Deprecated: 0, Analysing: 0
2.After eliminate duplicate: Total API: 63, in which Support: 60, API support after migration: 2, Network training support after migration: 1, Not support but no impact on migration: 0, Not support or recommended: 0, Compatible: 0, Deprecated: 0, Analysing: 0
Finish conver, output file: /home/ma-user/work/MyDocuments; report file: /home/ma-user/work/MyDocuments/report_npu_20220712170053
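The two summary lines at the end of the log are easy to check by script. The sketch below parses one summary line into a dictionary and verifies that the per-category counts add up to the reported total; the function names parse_summary and categories_add_up are illustrative, and the regular expression assumes the wording shown in the log above.

```python
import re


def parse_summary(line):
    """Return {category: count} for one summary line of the migration log."""
    pairs = re.findall(r"([A-Z][A-Za-z ]*): (\d+)", line)
    return {name: int(count) for name, count in pairs}


def categories_add_up(summary):
    """True when all categories other than 'Total API' sum to the total."""
    rest = sum(v for k, v in summary.items() if k != "Total API")
    return rest == summary["Total API"]
```

For the run shown here, both the raw counts (200 APIs) and the de-duplicated counts (63 APIs) pass this check.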
tensorflow-yolov3 # path of the original scripts that were migrated
report_npu_20220712164226 # path of the generated migration report
tensorflow-yolov3_npu_20220712164226 # path of the migrated scripts
Set the environment variables
# Ascend-cann-nnae, the software package deployed on Ascend devices
source /usr/local/Ascend/nnae/set_env.sh
# tfplugin package dependency
source /usr/local/Ascend/tfplugin/set_env.sh
Start the training process
python train.py
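Before starting a long training run it can help to confirm that the environment is actually set up. The sketch below assumes that sourcing the two set_env.sh scripts makes the npu_bridge package importable in the current shell; the helper names module_available and launch_training are hypothetical.

```python
import importlib.util
import subprocess
import sys


def module_available(name):
    """True if Python can locate the top-level module without importing it."""
    return importlib.util.find_spec(name) is not None


def launch_training(script="train.py"):
    """Run the training script only when the NPU plugin package is visible."""
    if not module_available("npu_bridge"):
        raise RuntimeError("npu_bridge not found; source the set_env.sh scripts first")
    return subprocess.run([sys.executable, script], check=True)
```

The check has to run in the same shell session in which the set_env.sh scripts were sourced, because the variables they export do not carry over to other sessions.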
Begin conver file: /home/ma-user/work/MyDocuments/tensorflow-yolov3/from_darknet_weights_to_ckpt.py
SyntaxError('invalid syntax', ('', 5, 45, "darknet_weights = ''\n"))
ERROR:There is a format problem in the script, please check the python code specification or whether it is converted to a linux file through 'dos2unix'
Solution:
Fix the offending code and run the migration again.
Other problems of this kind can be solved in the same way.
darknet_weights = ''
change to
darknet_weights = ''
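Errors like this can be found before running the migration by parsing every script with Python's own parser. The file names in the tool directory (ast_impl.py, conver_by_ast.py) suggest that it relies on the ast module as well, although that is an assumption. A minimal sketch, with the hypothetical helper name find_syntax_errors:

```python
import ast
from pathlib import Path


def find_syntax_errors(project_dir):
    """Return [(path, line, message)] for every .py file that fails to parse."""
    problems = []
    for path in sorted(Path(project_dir).rglob("*.py")):
        try:
            # Passing bytes lets the parser handle the file's encoding itself.
            ast.parse(path.read_bytes(), filename=str(path))
        except SyntaxError as err:
            problems.append((str(path), err.lineno, err.msg))
    return problems
```

An empty result means every script at least parses; anything listed should be fixed before the directory is handed to main.py.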
Training an AlexNet model on Huawei ModelArts
tensorflow.python.framework.errors_impl.InternalError: The input shape of GeOp5_0 is dynamic, please ensure that npu option[dynamic_input] is set correctly, for more details please refer to the migration guide.
[[{{node GeOp5_0}}]]
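Solution:
Add the following to the migrated train.py and pass config to the session that runs the training
import tensorflow as tf

config = tf.ConfigProto(allow_soft_placement=True)
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"
# Tell the NPU optimizer that the graph has dynamically shaped inputs.
custom_op.parameter_map["dynamic_input"].b = True
# Execution mode used for the dynamic inputs.
custom_op.parameter_map["dynamic_graph_execute_mode"].s = tf.compat.as_bytes("lazy_recompile")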