TF 1.x Training Script Migration
Lab Content and Objectives
The Ascend 910 AI Processor (hereinafter referred to as the NPU) is a neural network processor dedicated to artificial intelligence (AI) that Huawei released in 2019. It delivers up to 256 TFLOPS of compute, and the latest model reaches 310 TFLOPS, roughly twice the compute of mainstream chips in the industry. Most training scripts in the industry today are developed with TensorFlow's Python API and run on CPU/GPU/TPU by default. To let them harness the compute of the Ascend 910 AI Processor and improve training performance, the training scripts require a small amount of migration and adaptation work. The Ascend 910 AI Processor currently supports migrating TensorFlow 1.15 training scripts developed with three APIs: Estimator, sess.run, and Keras. Using a sess.run handwritten-digit classification network as an example, this lab shows how to migrate a TensorFlow 1.15 training script so that it can train on the NPU.
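For reference, Estimator-based scripts are migrated along a similar line: the TF Adapter provides NPU counterparts of RunConfig and Estimator. The snippet below is only a minimal sketch based on the publicly documented NPURunConfig/NPUEstimator interfaces (verify the exact parameters against the TF Adapter documentation for your CANN version); this lab itself follows the sess.run path.
# Minimal sketch of the Estimator migration path (not used in this lab).
# NPURunConfig/NPUEstimator come from the TF Adapter; check the parameter
# list against the documentation for your CANN version.
import tensorflow as tf
from npu_bridge.npu_init import *

def model_fn(features, labels, mode):
    # The same softmax-regression model used later in this lab, expressed as a model_fn.
    logits = tf.layers.dense(features["x"], 10)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

npu_config = NPURunConfig(model_dir="/tmp/mnist_estimator", save_checkpoints_steps=100)
estimator = NPUEstimator(model_fn=model_fn, config=npu_config)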
Runtime Environment Configuration
By default this environment is allocated a CPU flavor, but this lab requires an NPU. Click [Switch Flavor] in the upper-right corner of the page and switch the resource to the Ascend: 1*Ascend 910 flavor.
The resource flavor before and after switching is shown in the figure below.
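After switching, you can optionally confirm from within the notebook that an NPU device is visible. The command below assumes the npu-smi tool is available in this image, which is typical for Ascend environments; it is not required for the lab.
!npu-smi info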
Download the Training Dataset
Run the following commands to download the dataset:
!wget -N -P /home/ma-user/work/Data https://modelarts-train-ae.obs.cn-north-4.myhuaweicloud.com/train/Data/t10k-images.idx3-ubyte
!wget -N -P /home/ma-user/work/Data https://modelarts-train-ae.obs.cn-north-4.myhuaweicloud.com/train/Data/t10k-labels.idx1-ubyte
!wget -N -P /home/ma-user/work/Data https://modelarts-train-ae.obs.cn-north-4.myhuaweicloud.com/train/Data/train-labels.idx1-ubyte
!wget -N -P /home/ma-user/work/Data https://modelarts-train-ae.obs.cn-north-4.myhuaweicloud.com/train/Data/train-images.idx3-ubyte
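Optionally, list the downloaded files to confirm the four MNIST files are in place:
!ls -lh /home/ma-user/work/Data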
Import Libraries
To train a TensorFlow-based training script on the Ascend 910 AI Processor, you need the TensorFlow framework adapter plugin (TF Adapter). The TF Adapter provides user-facing Python APIs adapted to the TensorFlow framework and connects the CANN software stack to TensorFlow. Therefore, before training, add from npu_bridge.npu_init import * to the training code to import the required library.
import tensorflow as tf
import numpy as np
import struct
import os
import time
from npu_bridge.npu_init import *
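As a quick optional check, confirm that the environment provides TensorFlow 1.15 (the version supported by this migration path) and that the adapter import above succeeded without errors:
# Expect a 1.15.x version string on the Ascend image used in this lab
print(tf.__version__)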
Process the MNIST Dataset
This part of the code generally requires no modification.
# Load the image set
def load_image_set(filename):
    print("load image set", filename)
    binfile = open(filename, 'rb')  # open the binary file
    buffers = binfile.read()
    head = struct.unpack_from('>IIII', buffers, 0)  # read the first four integers, returned as a tuple
    offset = struct.calcsize('>IIII')  # locate the start of the image data
    image_num = head[1]  # number of images
    width = head[2]
    height = head[3]
    bits = image_num * width * height
    bits_string = '>' + str(bits) + 'B'  # fmt string, e.g. '>47040000B'
    imgs = struct.unpack_from(bits_string, buffers, offset)  # read the image data, returned as a tuple
    binfile.close()
    imgs = np.reshape(imgs, [image_num, width * height])  # reshape into a [60000, 784] array
    print("load imgs finished")
    return imgs, head

# Load the label set
def load_label_set(filename):
    print("load label set", filename)
    binfile = open(filename, 'rb')  # open the binary file
    buffers = binfile.read()
    head = struct.unpack_from('>II', buffers, 0)  # read the first two integers of the label file
    label_num = head[1]
    offset = struct.calcsize('>II')  # locate the start of the label data
    num_string = '>' + str(label_num) + 'B'  # fmt string, e.g. '>60000B'
    labels = struct.unpack_from(num_string, buffers, offset)  # read the label data
    binfile.close()
    labels = np.reshape(labels, [label_num])
    print("load label finished")
    return labels, head

# Manual one-hot encoding
def encode_one_hot(labels):
    num = labels.shape[0]
    res = np.zeros((num, 10))
    for i in range(num):
        res[i, labels[i]] = 1  # labels[i] is a digit 0-9; set the corresponding column to 1 (one-hot encoding)
    return res

train_image = '/home/ma-user/work/Data/train-images.idx3-ubyte'
train_label = '/home/ma-user/work/Data/train-labels.idx1-ubyte'
test_image = '/home/ma-user/work/Data/t10k-images.idx3-ubyte'
test_label = '/home/ma-user/work/Data/t10k-labels.idx1-ubyte'

imgs, data_head = load_image_set(train_image)
# labels holds 60000 digits and needs to be converted to one-hot encoding
labels, labels_head = load_label_set(train_label)
test_images, test_images_head = load_image_set(test_image)
test_labels, test_labels_head = load_label_set(test_label)
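As an optional sanity check, print the shapes of the parsed arrays; with the standard MNIST split you should see 60,000 training and 10,000 test samples:
# Expected: (60000, 784) (60000,) (10000, 784) (10000,)
print(imgs.shape, labels.shape, test_images.shape, test_labels.shape)

# For reference, np.eye(10)[labels] builds the same (num, 10) one-hot matrix
# as encode_one_hot above in a single vectorized step.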
Build the Model / Compute the Loss / Update Gradients
This part of the code generally requires no modification.
# Hyperparameters
learning_rate = 0.01
training_epochs = 10
batch_size = 100   # mini-batch size
display_step = 2   # print once every 2 epochs

# tf graph input
x = tf.placeholder(tf.float32, [None, 784])  # 28 * 28 = 784
y = tf.placeholder(tf.float32, [None, 10])   # digits 0-9 ==> 10 classes

# Model parameters
W = tf.Variable(tf.zeros([784, 10]))  # alternatively tf.truncated_normal()
b = tf.Variable(tf.zeros([10]))

# Build the model
prediction = tf.nn.softmax(tf.matmul(x, W) + b)
loss = tf.reduce_mean(-tf.reduce_sum(y * tf.log(tf.clip_by_value(prediction, 1e-8, 1.0)), reduction_indices=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
init = tf.global_variables_initializer()

res = encode_one_hot(labels)
print("res", res)
total_batchs = int(data_head[1] / batch_size)
print("total_batchs:", total_batchs)
Create a Session and Run Training
Before creating the session, add the following configuration: create a config and add a custom_op to it:
config = tf.ConfigProto()
custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
custom_op.name = "NpuOptimizer"  # hand the graph to the NPU optimizer provided by TF Adapter
config.graph_options.rewrite_options.remapping = RewriterConfig.OFF  # disable TensorFlow's remapping optimization when using NpuOptimizer
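Beyond these two required lines, the TF Adapter exposes further switches through custom_op.parameter_map. The lines below are examples drawn from the Ascend documentation and are not needed for this lab; treat the exact option names as something to verify against your CANN / TF Adapter version.
# Optional NPU settings (verify option names against your CANN / TF Adapter version)
custom_op.parameter_map["use_off_line"].b = True  # run training on the Ascend AI Processor
custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("allow_mix_precision")  # enable automatic mixed precision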
Pass the config created above to tf.Session as the session config so that training runs on the NPU; the sess.run code itself needs no changes.
# Training
def train():
    with tf.Session(config=config) as sess:
        sess.run(init)
        for epoch in range(training_epochs):
            start_time = time.time()
            avg_loss = 0.
            total_batchs = int(data_head[1] / batch_size)  # data_head[1] is the number of images
            for i in range(total_batchs):
                batch_xs = imgs[i * batch_size: (i + 1) * batch_size, 0:784]
                batch_ys = res[i * batch_size: (i + 1) * batch_size, 0:10]
                _, l = sess.run([optimizer, loss], feed_dict={x: batch_xs, y: batch_ys})
                # accumulate the average loss
                avg_loss += l / total_batchs
            end_time = time.time()
            if epoch % display_step == 0:
                print("Epoch:", '%04d' % (epoch), "loss=", "{:.9f}".format(avg_loss), "time=", "{:.3f}".format(end_time - start_time))
        print("Optimization Done!")

        correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        print("Accuracy:", accuracy.eval({x: test_images, y: encode_one_hot(test_labels)}))

train()
Output:
Epoch: 0000 loss= 6.713961569 time= 23.566
Epoch: 0002 loss= 5.096043814 time= 15.108
Epoch: 0004 loss= 5.052241913 time= 14.907
Epoch: 0006 loss= 4.953291978 time= 15.157
Epoch: 0008 loss= 4.819878323 time= 14.910
Optimization Done!
Accuracy: 0.74480003
The MNIST model now reaches an accuracy of about 74%. Beyond the basic configuration above, when creating the session you can also enable capabilities such as mixed computing, Profiling performance data collection, and training iteration offloading (loop sinking) on the NPU through the related configuration options. To learn more, visit the Ascend community (https://www.hiascend.com) and read the related documentation.
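As a pointer, these capabilities are switched on through the same custom_op.parameter_map used earlier. The fragment below is only an illustrative sketch based on the public Ascend documentation; option names and values should be verified against the documentation for your CANN version.
# Illustrative only -- verify against the TF Adapter documentation for your CANN version
custom_op.parameter_map["profiling_mode"].b = True  # collect Profiling performance data
custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes(
    '{"output":"/tmp/profiling","task_trace":"on"}')  # where to write the data and what to trace
custom_op.parameter_map["iterations_per_loop"].i = 10  # sink multiple training iterations per sess.run call
# iterations_per_loop takes effect together with the set_iteration_per_loop utility described in the docs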