ACL TF 1.x Training Script Migration

Experiment Content and Objectives
The Ascend 910 AI Processor (hereinafter referred to as the NPU) is a dedicated neural network processor for artificial intelligence (AI) released by Huawei in 2019. It delivers up to 256 TFLOPS of compute, with the latest model reaching up to 310 TFLOPS, roughly twice the compute power of mainstream chips in the industry. Most training scripts today are developed with TensorFlow's Python APIs and run on CPU/GPU/TPU by default. To let them train on the Ascend 910 AI Processor's abundant compute and improve training performance, the training scripts need some simple migration and adaptation work. The Ascend 910 AI Processor currently supports migrating TensorFlow 1.15 training scripts written against three APIs: Estimator, sess.run, and Keras. Using a sess.run handwritten-digit classification network as an example, this experiment shows how to migrate a TensorFlow 1.15 training script so that it can train on the NPU.
Configuring the Runtime Environment
This environment is allocated a CPU flavor by default. Since this experiment requires an NPU, click [Switch Flavor] in the upper-right corner of the page and switch the resource to the target flavor Ascend: 1*Ascend 910.

The resource flavor before and after the switch is shown in the figure below.

Downloading the Training Dataset
Run the following commands to download the dataset:

!wget -N -P /home/ma-user/work/Data https://modelarts-train-ae.obs.cn-north-4.myhuaweicloud.com/train/Data/t10k-images.idx3-ubyte
!wget -N -P /home/ma-user/work/Data https://modelarts-train-ae.obs.cn-north-4.myhuaweicloud.com/train/Data/t10k-labels.idx1-ubyte
!wget -N -P /home/ma-user/work/Data https://modelarts-train-ae.obs.cn-north-4.myhuaweicloud.com/train/Data/train-labels.idx1-ubyte
!wget -N -P /home/ma-user/work/Data https://modelarts-train-ae.obs.cn-north-4.myhuaweicloud.com/train/Data/train-images.idx3-ubyte
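
The leading ! runs these commands in the notebook's shell. If wget is not available in your environment, the same files can be fetched with a short pure-Python sketch that uses only the standard library (the URLs and target directory are the same as above):

import os
import urllib.request

# Download the four MNIST IDX files into /home/ma-user/work/Data (skipping files that already exist)
base_url = "https://modelarts-train-ae.obs.cn-north-4.myhuaweicloud.com/train/Data/"
file_names = ["t10k-images.idx3-ubyte", "t10k-labels.idx1-ubyte",
              "train-labels.idx1-ubyte", "train-images.idx3-ubyte"]
data_dir = "/home/ma-user/work/Data"
os.makedirs(data_dir, exist_ok=True)
for name in file_names:
    target = os.path.join(data_dir, name)
    if not os.path.exists(target):
        urllib.request.urlretrieve(base_url + name, target)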

Importing Libraries
To train a TensorFlow-based training script on the Ascend 910 AI Processor, the TensorFlow framework adapter plugin (TF Adapter) is required. TF Adapter provides user-facing Python APIs adapted to the TensorFlow framework and connects the CANN software stack with TensorFlow. Therefore, before training, add from npu_bridge.npu_init import * to the training code to import the required libraries.

import tensorflow as tf
import numpy as np
import struct
import os
import time 
from npu_bridge.npu_init import *

Processing the MNIST Dataset
This part of the code generally requires no modification.

# Load the image set
def load_image_set(filename):
    print("load image set", filename)
    binfile = open(filename, 'rb')  # Open the binary file
    buffers = binfile.read()
    head = struct.unpack_from('>IIII', buffers, 0)  # Read the first four integers of the header, returns a tuple
    offset = struct.calcsize('>IIII')  # Locate where the image data starts
    image_num = head[1]  # Number of images
    width = head[2]
    height = head[3]
    bits = image_num * width * height
    bits_string = '>' + str(bits) + 'B'  # fmt string, e.g. '>47040000B'
    imgs = struct.unpack_from(bits_string, buffers, offset)  # Read the pixel data, returns a tuple
    binfile.close()
    imgs = np.reshape(imgs, [image_num, width * height])  # Reshape into a [60000, 784] array
    print("load imgs finished")
    return imgs, head

# Load the label set
def load_label_set(filename):
    print("load label set", filename)
    binfile = open(filename, 'rb')  # Open the binary file
    buffers = binfile.read()
    head = struct.unpack_from('>II', buffers, 0)  # Read the first two integers of the label file header
    label_num = head[1]
    offset = struct.calcsize('>II')  # Locate where the label data starts
    num_string = '>' + str(label_num) + 'B'  # fmt string, e.g. '>60000B'
    labels = struct.unpack_from(num_string, buffers, offset)  # Read the label data
    binfile.close()
    labels = np.reshape(labels, [label_num])
    print("load label finished")
    return labels, head

# Manual one-hot encoding
def encode_one_hot(labels):
    num = labels.shape[0]
    res = np.zeros((num, 10))
    for i in range(num):
        res[i, labels[i]] = 1  # labels[i] is a digit 0-9; set that column to 1 (one-hot encoding)
    return res
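
As a quick illustration (not part of the original script), each label becomes a 10-element row with a single 1 in the column given by the digit:

# Sanity check for encode_one_hot: three labels -> a 3 x 10 one-hot matrix
sample_labels = np.array([3, 0, 9])
print(encode_one_hot(sample_labels))
# Row 0 has its 1 in column 3, row 1 in column 0, row 2 in column 9.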

train_image = '/home/ma-user/work/Data/train-images.idx3-ubyte'
train_label = '/home/ma-user/work/Data/train-labels.idx1-ubyte'
test_image = '/home/ma-user/work/Data/t10k-images.idx3-ubyte'
test_label = '/home/ma-user/work/Data/t10k-labels.idx1-ubyte'
imgs, data_head = load_image_set(train_image)

# The labels are 60000 plain digits and need to be converted to one-hot encoding
labels, labels_head = load_label_set(train_label)
test_images, test_images_head = load_image_set(test_image)
test_labels, test_labels_head = load_label_set(test_label)

Building the Model / Computing the Loss / Updating Gradients
This part of the code generally requires no modification.

# Hyperparameters
learning_rate = 0.01
training_epochs = 10
batch_size = 100  # mini-batch size
display_step = 2  # print once every 2 epochs

# tf graph input
x = tf.placeholder(tf.float32, [None, 784])  # 28 * 28 = 784
y = tf.placeholder(tf.float32, [None, 10])  # 0-9 ==> 10 classes

# Model parameters
W = tf.Variable(tf.zeros([784, 10]))  # tf.truncated_normal()
b = tf.Variable(tf.zeros([10]))

# Build the model
prediction = tf.nn.softmax(tf.matmul(x, W) + b)
loss = tf.reduce_mean(-tf.reduce_sum(y * tf.log(tf.clip_by_value(prediction,1e-8,1.0)), reduction_indices=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
init = tf.global_variables_initializer()
res = encode_one_hot(labels)
print("res", res)
total_batches = int(data_head[1] / batch_size)
print("total_batches:", total_batches)

Creating a Session and Running Training
Before creating the session, add the following configuration: create a config and add a custom_op to it:

config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"  # required name that enables the NPU optimizer
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF  # turn off TensorFlow's remapping pass, which must be disabled on the NPU

Pass the created config to tf.Session as the session config so that training executes on the NPU; the sess.run code itself needs no changes.

# Training
def train():
    with tf.Session(config=config) as sess:
        sess.run(init)
        for epoch in range(training_epochs):
            start_time = time.time()
            avg_loss = 0.
            total_batches = int(data_head[1] / batch_size)  # data_head[1] is the number of images

            for i in range(total_batches):
                batch_xs = imgs[i * batch_size: (i + 1) * batch_size, 0:784]
                batch_ys = res[i * batch_size: (i + 1) * batch_size, 0:10]

                _, l = sess.run([optimizer, loss], feed_dict={x: batch_xs, y: batch_ys})

                # Accumulate the average loss
                avg_loss += l / total_batches
            end_time = time.time()
            if epoch % display_step == 0:
                print("Epoch:", '%04d' % epoch, "loss=", "{:.9f}".format(avg_loss), "time=", "{:.3f}".format(end_time - start_time))

        print("Optimization Done!")

        correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

        print("Accuracy:", accuracy.eval({x: test_images, y: encode_one_hot(test_labels)}))

train()

Output:

Epoch: 0000 loss= 6.713961569 time= 23.566
Epoch: 0002 loss= 5.096043814 time= 15.108
Epoch: 0004 loss= 5.052241913 time= 14.907
Epoch: 0006 loss= 4.953291978 time= 15.157
Epoch: 0008 loss= 4.819878323 time= 14.910
Optimization Done!
Accuracy: 0.74480003

The MNIST model now reaches an accuracy of roughly 74%. Besides the basic configuration above, additional session configuration options let you enable capabilities on the NPU such as mixed computing, Profiling data collection, and sinking training iterations to the device. To learn more, visit the Ascend community (https://www.hiascend.com) and read the related documentation.
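
These capabilities are usually switched on through the custom_op's parameter_map before the session is created. The snippet below is only an illustrative sketch: the key names and option strings (use_off_line, mix_compile_mode, profiling_mode, profiling_options, iterations_per_loop) are assumptions based on the TF Adapter documentation and can differ between CANN versions, so verify them against the official Ascend documentation before using them.

# Illustrative sketch only -- confirm keys and values against the Ascend docs for your CANN version
custom_op.parameter_map["use_off_line"].b = True       # execute operators on the NPU
custom_op.parameter_map["mix_compile_mode"].b = True   # mixed computing: unsupported ops fall back to the host
custom_op.parameter_map["profiling_mode"].b = True     # enable Profiling data collection
custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes(
    '{"output":"/tmp/profiling","training_trace":"on"}')  # option string format varies by version
custom_op.parameter_map["iterations_per_loop"].i = 10  # sink multiple training iterations to the device per sess.run call

For sess.run scripts, iteration sinking generally also requires wrapping the training op with the corresponding npu_bridge utility; the official documentation describes the complete procedure.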
