For the course source, click here
(Credit: https://aistudio.baidu.com/aistudio/education/lessonvideo/530426)
- `Place` specifies the device to execute on; for a GPU instance: `place = fluid.CUDAPlace(0)`
- `CompiledProgram.with_data_parallel` builds a training graph that runs on multiple cards in parallel
- `CompiledProgram.with_inference_optimize` builds an optimized inference graph
- `Executor` runs it: `Executor.run(compiled_program)`
import paddle
from paddle import fluid
from functools import reduce  # reduce is not a builtin in Python 3

# Add input variables to the program to accept data inputs.
images = fluid.layers.data(name='pixel', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
# Construct the model with the input variables.
conv_pool_1 = fluid.nets.simple_img_conv_pool(
    input=images, filter_size=5, num_filters=20,
    pool_size=2, pool_stride=2, act='relu'
)
conv_pool_2 = fluid.nets.simple_img_conv_pool(
    input=conv_pool_1, filter_size=5, num_filters=50,
    pool_size=2, pool_stride=2, act='relu'
)
SIZE = 10
input_shape = conv_pool_2.shape
param_shape = [reduce(lambda a, b: a * b, input_shape[1:], 1)] + [SIZE]
scale = (2.0 / (param_shape[0] ** 2 * SIZE)) ** 0.5
predict = fluid.layers.fc(
    input=conv_pool_2, size=SIZE, act='softmax',
    param_attr=fluid.param_attr.ParamAttr(
        initializer=fluid.initializer.NormalInitializer(
            loc=0.0, scale=scale
        )
    )
)
# Calculate the cost and use an optimizer to minimize it.
cost = fluid.layers.cross_entropy(input=predict, label=label)
avg_cost = fluid.layers.mean(x=cost)
opt = fluid.optimizer.AdamOptimizer(
    learning_rate=0.001, beta1=0.9, beta2=0.999
)
opt.minimize(avg_cost)
# Read data in batches.
reader = paddle.dataset.mnist.train()
batched_reader = paddle.batch(reader, batch_size=32)
# Create the executor. Use fluid.CPUPlace() if no GPU is available.
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
feeder = fluid.DataFeeder(feed_list=[images, label], place=place)
# Initialize the model.
exe.run(fluid.default_startup_program())
# Feed data and run the train program; the cost should go down.
for i, data in zip(range(10), batched_reader()):
    loss = exe.run(feed=feeder.feed(data), fetch_list=[avg_cost])
    print(loss)
Step 1: a user-defined data reader
A reader for the MNIST training dataset can be defined as follows.
A reader is a Python generator that returns one sample at a time via yield; once the whole dataset has been traversed, the reader exits.
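In miniature, that reader contract looks like this (a toy reader over an in-memory list; `list_reader` is an illustrative name, not a Paddle API):

```python
def list_reader(samples):
    # Returns a reader: a no-argument callable that yields samples one by one.
    def __reader__():
        for s in samples:
            yield s
    return __reader__

reader = list_reader([('img0', 0), ('img1', 1)])
collected = list(reader())
```

Calling `reader()` again produces a fresh generator, which is why training loops can iterate the same reader once per epoch.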
import numpy as np
def mnist_reader(image_file, label_file):
    def __reader__():
        with open(image_file, 'rb') as f:
            f.seek(16)  # skip the IDX header
            images = np.reshape(np.fromfile(f, dtype='uint8'), [-1, 28, 28])
        images = images / 255.0 * 2.0 - 1.0  # grayscale range [-1.0, 1.0]
        with open(label_file, 'rb') as f:
            f.seek(8)  # skip the IDX header
            labels = np.fromfile(f, dtype='uint8')  # label range [0, 9]
        for idx in range(len(labels)):
            yield images[idx, :], labels[idx]
    return __reader__
# Reader for training
train_reader = mnist_reader('train-images.idx3-ubyte', 'train-labels.idx1-ubyte')
Paddle also provides an official MNIST reader API that automatically downloads and reads the MNIST training dataset:
import paddle
train_reader = paddle.dataset.mnist.train()
Data augmentation logic can be defined inside a reader, for example:
import paddle
import paddle.fluid as fluid
import numpy as np
# Random horizontal flip
def random_flipped_reader(reader):
    def __reader__():
        for image, label in reader():
            if np.random.randint(2) == 1:  # flip with probability 0.5
                image = np.fliplr(image)
            yield image, label
    return __reader__

# Random additive noise
def random_noised_reader(reader):
    def __reader__():
        for image, label in reader():
            image += np.random.normal(0, 0.01, size=image.shape)
            yield image, label
    return __reader__

train_reader = paddle.dataset.mnist.train()
train_reader = random_noised_reader(random_flipped_reader(train_reader))
Randomly shuffle the data returned by the reader:
shuffled_reader = paddle.reader.shuffle(train_reader, buf_size=64)
Call paddle.batch to group samples into batches of training data:
batch_reader = paddle.batch(shuffled_reader, batch_size=32)
Step 2: feed training data via the feed method or the py_reader API
The feed method
The reader data is first converted by a DataFeeder into Tensor-format data that Paddle recognizes, and then passed to the executor for training.
# Step 1: prepare the data layers
# Define the data type, shape, etc. of each data layer
images = fluid.layers.data(name='image', shape=[28, 28], dtype='float32')
labels = fluid.layers.data(name='label', shape=[1], dtype='int64')
# Step 2: define the DataFeeder object
# GPU training
data_feeder = fluid.DataFeeder(feed_list=[images, labels], place=fluid.CUDAPlace(0))
# CPU training
data_feeder = fluid.DataFeeder(feed_list=[images, labels], place=fluid.CPUPlace())
# Step 3: train the network
exe.run(fluid.default_startup_program())
for epoch_id in range(epoch_num):
    for batch_data in batch_reader():
        # DataFeeder.feed() converts the user-defined batch_reader data
        # into Tensor-format data that Paddle recognizes
        exe.run(feed=data_feeder.feed(batch_data), ...)
Feeding training data via the py_reader API
# Step 1: define the py_reader object
py_reader = fluid.layers.py_reader(
    capacity=8,  # capacity of the data queue
    shapes=([28, 28], [1]),
    dtypes=('float32', 'int64')
)
# Step 2: call read_file to read data from py_reader.
# The return values of read_file correspond to the data layers of the feed method
images, labels = fluid.layers.read_file(py_reader)
# Step 3: pass the user-defined reader into py_reader
py_reader.decorate_paddle_reader(batch_reader)
# Step 4: train the network
exe.run(fluid.default_startup_program())
for epoch_id in range(epoch_num):
    py_reader.start()
    while True:
        try:
            exe.run(...)
        except fluid.core.EOFException:
            # At the end of each epoch the C++ backend raises an EOF exception;
            # catch it and call py_reader.reset() to reset py_reader before
            # starting the next epoch of training
            py_reader.reset()
            break
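The epoch pattern above can be imitated in plain Python (a stand-in `FakePyReader` and `EOFException`; the real py_reader feeds batches from a background thread and raises `fluid.core.EOFException`):

```python
class EOFException(Exception):
    pass

class FakePyReader:
    """Minimal stand-in: serves one epoch of data, then raises EOF."""
    def __init__(self, data):
        self.data = data
        self.it = None

    def start(self):
        self.it = iter(self.data)

    def next_batch(self):
        try:
            return next(self.it)
        except StopIteration:
            raise EOFException()

    def reset(self):
        self.it = None

reader = FakePyReader([1, 2, 3])
seen = []
for epoch_id in range(2):
    reader.start()
    while True:
        try:
            seen.append(reader.next_batch())
        except EOFException:
            reader.reset()  # reset before the next epoch, as in the loop above
            break
```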
Summary
Introduction to neural network models
A neural network model consists of two parts: the network structure and the model parameters. Training a neural network essentially means using an optimization algorithm (e.g. stochastic gradient descent) to find a set of model parameters that makes the loss on the training and test sets as small as possible and the accuracy as high as possible.
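As a toy instance of this idea, a few steps of gradient descent on a one-parameter least-squares problem (plain Python for illustration, not Paddle code):

```python
# Fit y = w * x to data generated with w = 3 by minimizing the squared loss.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]
w = 0.0
lr = 0.05
for _ in range(200):
    # dL/dw for L = sum((w*x - y)^2)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
    w -= lr * grad
```

After enough steps, `w` converges to the minimizer of the loss, here 3.0; the Fluid programs below automate exactly this loop over much larger parameter sets.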
Configuring the model structure
import paddle
import paddle.fluid as fluid
# Define Input for model
image = fluid.layers.data(name='pixel', shape=[1,28,28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
# Define the model
conv1 = fluid.layers.conv2d(input=image, filter_size=5, num_filters=20)
relu1 = fluid.layers.relu(conv1)
pool1 = fluid.layers.pool2d(input=relu1, pool_size=2, pool_stride=2)
conv2 = fluid.layers.conv2d(input=pool1, filter_size=5, num_filters=50)
relu2 = fluid.layers.relu(conv2)
pool2 = fluid.layers.pool2d(input=relu2, pool_size=2, pool_stride=2)
predict = fluid.layers.fc(input=pool2, size=10, act='softmax')
# Get the loss
cost = fluid.layers.cross_entropy(input=predict, label=label)
Note: Fluid provides many loss functions, such as
cross_entropy, linear_chain_crf, bpr_loss, edit_distance, warpctc, dice_loss, mean_iou, log_loss, huber_loss,
and so on. Different tasks may call for different loss functions.
For cross_entropy, the per-sample losses are then averaged into a scalar objective:
avg_cost = fluid.layers.mean(x=cost)
Note: Fluid provides many optimization algorithms, e.g. SGD, Momentum, Adagrad, Adam, Adamax, DecayedAdagrad, Ftrl, Adadelta, RMSProp, LarsMomentum, etc.
opt = fluid.optimizer.AdamOptimizer()
$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\,\frac{\partial L_t}{\partial W_t}$$
$$v_t = \beta_2 v_{t-1} + (1-\beta_2)\left(\frac{\partial L_t}{\partial W_t}\right)^2$$
$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}$$
$$W \leftarrow W - \eta\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\varepsilon}$$
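The Adam update can be written out directly in a few lines (a plain-Python sketch of the update rule, not Fluid's implementation; `adam_step` is an illustrative name):

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Biased first- and second-moment estimates.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected estimates (t is the 1-based step count).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = 1.0, 0.0, 0.0
w, m, v = adam_step(w, grad=10.0, m=m, v=v, t=1)
```

On the first step the bias correction cancels the moment decay, so the step size is almost exactly the learning rate regardless of the gradient's magnitude.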
# Add operations (backward operators) to minimize avg_cost
opt.minimize(avg_cost)
Once the model structure is configured, you obtain two fluid.Programs.
Unless otherwise specified, the global fluid.Programs are used by default: fluid.default_startup_program() and fluid.default_main_program().
Initializing model parameters
After the model structure is configured, the parameter-initialization ops are written into fluid.default_startup_program(). Parameters must be initialized before training, and the initialization needs to run only once:
place = fluid.CPUPlace() # fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
To initialize the parameters on a GPU, simply set place to fluid.CUDAPlace(0).
The initialized parameters are stored in fluid.global_scope(), from which they can be fetched by parameter name.
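Conceptually, the scope is a name-to-variable map (a plain-dict sketch; Fluid's real Scope is implemented in C++, and the parameter name 'fc_0.w_0' here is only an example):

```python
class Scope:
    """Toy scope: maps variable names to their values."""
    def __init__(self):
        self._vars = {}

    def var(self, name, value):
        # Create (or overwrite) a variable under this name.
        self._vars[name] = value
        return value

    def find_var(self, name):
        # Look a variable up by name; None if absent.
        return self._vars.get(name)

global_scope = Scope()
global_scope.var('fc_0.w_0', [[0.1, 0.2]])  # written during startup
weight = global_scope.find_var('fc_0.w_0')  # fetched by parameter name
```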
Training the model
Single machine, single card
train_reader = paddle.batch(
    paddle.dataset.mnist.train(), batch_size=128
)
for epoch_id in range(5):
    for batch_id, data in enumerate(train_reader()):
        img_data = np.array([x[0].reshape([1, 28, 28]) for x in data]).astype('float32')
        y_data = np.array([x[1] for x in data]).reshape([len(img_data), 1]).astype('int64')
        loss, acc = exe.run(
            fluid.default_main_program(),
            feed={'pixel': img_data, 'label': y_data},
            fetch_list=[avg_loss, batch_acc]
        )
        print(
            "epoch = %d, batch = %d, Loss = %f, Accuracy = %f"
            % (epoch_id, batch_id, loss, acc)
        )
Single machine, multiple cards
export CUDA_VISIBLE_DEVICES=0,1,2,3
from paddle.fluid import compiler
compiled_program = compiler.CompiledProgram(fluid.default_main_program())
compiled_program = compiled_program.with_data_parallel(loss_name=avg_cost.name)
When training, simply replace fluid.default_main_program() with compiled_program:
for epoch_id in range(5):
    for batch_id, data in enumerate(train_reader()):
        img_data = np.array([x[0].reshape([1, 28, 28]) for x in data]).astype('float32')
        y_data = np.array([x[1] for x in data]).reshape([len(img_data), 1]).astype('int64')
        loss, acc = exe.run(
            compiled_program,
            feed={'pixel': img_data, 'label': y_data},
            fetch_list=[avg_loss, batch_acc]
        )
        print(
            "epoch = %d, batch = %d, Loss = %f, Accuracy = %f"
            % (epoch_id, batch_id, loss, acc)
        )
CPU multi-threading
For CPU training, how do you specify how many threads the model trains with?
Set the environment variable: export CPU_NUM=4
If multiple models are needed, how do you switch between Programs?
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
# Define Program 1
main_program_1 = fluid.Program()
startup_program_1 = fluid.Program()
with fluid.program_guard(main_program_1, startup_program_1):
    in_data_1, label_1, loss_1 = model1()
exe.run(startup_program_1)
for batch_id, data in enumerate(train_reader1()):
    img_data, y_data = ...
    loss = exe.run(
        main_program_1,
        feed={in_data_1.name: img_data, label_1.name: y_data},
        fetch_list=[loss_1]
    )
    print("batch = %d, Loss = %s" % (batch_id, loss))
# Define Program 2
main_program_2 = fluid.Program()
startup_program_2 = fluid.Program()
with fluid.program_guard(main_program_2, startup_program_2):
    in_data_2, label_2, loss_2 = model2()
exe.run(startup_program_2)
for batch_id, data in enumerate(train_reader2()):
    img_data, y_data = ...
    loss = exe.run(
        main_program_2,
        feed={in_data_2.name: img_data, label_2.name: y_data},
        fetch_list=[loss_2]
    )
    print("batch = %d, Loss = %s" % (batch_id, loss))
How do multiple Programs share parameters?
Paddle distinguishes variables by name, and variable names are generated automatically from counters in the unique_name module: each generated name bumps its counter by one. fluid.unique_name.guard() resets the counters in the unique_name module, guaranteeing that networks configured under successive fluid.unique_name.guard() calls get identical variable names, which is what makes parameter sharing work.
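The counter-based naming scheme can be sketched as follows (`make_name` and `name_guard` are illustrative stand-ins, not Paddle's `unique_name` API):

```python
from contextlib import contextmanager

_counters = {}

def make_name(prefix):
    # Each prefix gets its own counter: fc_0, fc_1, ...
    idx = _counters.get(prefix, 0)
    _counters[prefix] = idx + 1
    return "%s_%d" % (prefix, idx)

@contextmanager
def name_guard():
    # Reset the counters so a re-configured network reuses the same names.
    global _counters
    saved, _counters = _counters, {}
    try:
        yield
    finally:
        _counters = saved

with name_guard():
    first = [make_name('fc'), make_name('fc')]
with name_guard():
    second = [make_name('fc'), make_name('fc')]
```

Because both guarded blocks start from a fresh counter, the two networks produce identical variable names and therefore resolve to the same parameters.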
There are multiple pserver processes and multiple trainer processes.
Distributed training in Collective (NCCL2) mode starts no pserver processes; each trainer process keeps a complete copy of the model parameters. After computing its gradients, each trainer communicates with the others to reduce the gradient data across all nodes, and every node then applies the parameter update locally.
NCCL2 mode currently supports synchronous training. It is better suited to large models that need synchronous training on GPUs; if the hardware supports RDMA and GPU Direct, it can achieve very high distributed-training performance.
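The gradient all-reduce step amounts to: sum every trainer's gradient element-wise, then hand the sum back to every trainer (a naive sketch; NCCL implements this with ring algorithms over the GPU interconnect):

```python
def allreduce(grads_per_trainer):
    # Sum element-wise across trainers, then give every trainer the result.
    total = [sum(g) for g in zip(*grads_per_trainer)]
    return [list(total) for _ in grads_per_trainer]

# Two trainers, each holding a 3-element gradient.
grads = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
reduced = allreduce(grads)
```

After the reduce, every trainer holds the same summed gradient, so applying the same optimizer step locally keeps all parameter copies identical.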
Notes on Collective (NCCL2) mode
export NCCL_SOCKET_IFNAME=eth2
Comparison of the two distributed modes
Feature | parameter server | Collective (NCCL2) |
---|---|---|
Supported devices | CPU/GPU | GPU only |
Speed | fast | faster |
Model parallelism | supported | not supported |
Synchronous training | supported | supported |
Asynchronous training | supported | not supported |
Fault-tolerant training | supported | not supported |
How to choose among the modes
Mode | Supported devices | Characteristics |
---|---|---|
parameter server, synchronous | CPU/GPU | Most general; supports all scenarios; relatively the slowest |
parameter server, asynchronous | CPU/GPU | Faster than synchronous mode; suits large-scale (many-server) training; model convergence may need re-tuning |
Collective (NCCL2), synchronous | GPU | Suits many GPU cards and relatively complex models; improves inter-GPU communication speed; synchronous training only |
Configuration interfaces and parameters for distributed jobs
Single-machine training
# 1. Define the single-machine network
def model():
    y = fluid.layers.data(name='y', shape=[1], dtype='float32')
    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y_predict = fluid.layers.fc(input=x, size=1, act=None)
    loss = fluid.layers.square_error_cost(input=y_predict, label=y)
    avg_loss = fluid.layers.mean(loss)
    opt = fluid.optimizer.SGD(learning_rate=0.001)
    opt.minimize(avg_loss)

# 2. Define the training loop
def train_loop():
    train_reader = paddle.batch(
        paddle.reader.shuffle(
            paddle.dataset.uci_housing.train(), buf_size=500
        ),
        batch_size=BATCH_SIZE
    )

# Single-machine training
# 1. Define the network
model()
# 2. Run the training loop
train_loop()
Distributed training
# 1. Define the single-machine network
model()
# 2. Configure the distributed program transpiler
t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id=trainer_id,
    pservers=pserver_endpoints,
    trainers=trainers
)
# 3. Start a parameter server according to the role
if training_role == "PSERVER":
    pserver_prog = t.get_pserver_program(current_endpoint)
    startup_prog = t.get_startup_program(current_endpoint, pserver_prog)
    exe.run(startup_prog)
    exe.run(pserver_prog)
# 4. Start a trainer according to the role
elif training_role == "TRAINER":
    trainer_prog = t.get_trainer_program()
    exe.run(fluid.default_startup_program())
    train_loop()
Configuration parameters for parameter server mode
Reference: https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/transpiler_cn/DistributeTranspiler_cn.html
DistributeTranspiler quickly converts a program that runs on a single machine into one that can run distributed. On each server node, pass the transpiler the arguments for that node to obtain the Program the current node needs to execute.
import paddle.fluid as fluid

role = "PSERVER"
trainer_id = 0  # get actual trainer id from cluster
pserver_endpoints = "192.168.1.1:6170,192.168.1.2:6170"
current_endpoint = "192.168.1.1:6170"  # get actual current endpoint
trainers = 4
t = fluid.DistributeTranspiler()
t.transpile(trainer_id, trainers=trainers, pservers=pserver_endpoints)
if role == "PSERVER":
    pserver_prog = t.get_pserver_program(current_endpoint)
    pserver_startup = t.get_startup_program(
        current_endpoint, pserver_prog
    )
    exe.run(pserver_startup)
    exe.run(pserver_prog)
elif role == "TRAINER":
    train_loop(t.get_trainer_program())
Configuration parameters for Collective (NCCL2) mode
In NCCL2-mode distributed training there is no parameter server role; the trainers communicate with one another directly. Notes for use:
import paddle.fluid as fluid

trainer_id = 0  # get actual trainer id from cluster
trainers = "192.168.1.1:6170,192.168.1.2:6170"
current_endpoint = "192.168.1.1:6170"
config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id, trainers=trainers, current_endpoint=current_endpoint)
exe = fluid.ParallelExecutor(
    use_cuda=True,
    loss_name='loss',
    num_trainers=len(trainers.split(',')),
    trainer_id=trainer_id
)
Summary
Model variables that are saved and loaded
In Fluid, saving and loading cover long-lived variables such as the model parameters, the model structure, and intermediate optimizer state; gradients and other temporary variables are currently not saved or loaded.
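Which variables get saved can be pictured as filtering on a persistable flag (a pure-Python sketch; the variable names are only examples, and the real logic lives in fluid.io):

```python
class Var:
    def __init__(self, name, persistable):
        self.name = name
        self.persistable = persistable

program_vars = [
    Var('fc_0.w_0', persistable=True),        # model parameter
    Var('adam_moment1_0', persistable=True),  # optimizer state
    Var('fc_0.tmp_0', persistable=False),     # temporary activation
    Var('fc_0.w_0@GRAD', persistable=False),  # gradient
]

# Saving keeps only the long-lived (persistable) variables.
saved = [v.name for v in program_vars if v.persistable]
```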
Variable types in Fluid
Long-lived variables are marked by setting the persistable attribute of fluid.Variable() to True; currently all model parameters are long-lived variables.
General steps for inference from Python
Saving the inference model
batch_id = 0
for pass_id in range(PASS_NUM):
    train_py_reader.start()
    try:
        while True:
            loss_value, = train_exe.run(fetch_list=[avg_cost.name])
            if batch_id % 10 == 0:
                loss_value = numpy.mean(loss_value)
                print("batch={}, loss={}, sample num={}".format(
                    batch_id, loss_value, BATCH_SIZE * cpu_num
                ))
            batch_id += 1
    except fluid.core.EOFException:
        train_py_reader.reset()
    model_dir = 'output/models'
    # After each epoch, call save_inference_model once to save the model
    # and parameters needed for inference
    fluid.io.save_inference_model(
        model_dir, [], [predict, auc_var, cur_auc_var], exe
    )
Loading the inference model for prediction
inference_scope = fluid.core.Scope()
with fluid.scope_guard(inference_scope):
    # load_inference_model loads the saved model and its parameters into the program
    [inference_program, feed_target_names, fetch_targets] = (
        fluid.io.load_inference_model(save_dirname, exe)
    )
    test_reader = paddle.batch(paddle.dataset.uci_housing.test(), batch_size=20)
    results = exe.run(
        inference_program,
        feed={feed_target_names[0]: numpy.array(test_feat)},
        fetch_list=fetch_targets
    )
    print("infer results: ", results[0])
General steps for single-machine incremental training and training resumption
Saving the model
batch_id = 0
for pass_id in range(PASS_NUM):
    train_py_reader.start()
    try:
        while True:
            loss_value, = train_exe.run(fetch_list=[avg_cost.name])
            if batch_id % 10 == 0:
                loss_value = numpy.mean(loss_value)
                print("batch={}, loss={}, sample num={}".format(
                    batch_id, loss_value, BATCH_SIZE * cpu_num
                ))
            batch_id += 1
    except fluid.core.EOFException:
        train_py_reader.reset()
    model_dir = 'output/models'
    # After each epoch, call save_persistables once to save the parameters
    # needed for inference or for resuming training
    fluid.io.save_persistables(
        exe, model_dir, main_program
    )
Loading model variables
exe.run(start_program)
# After the startup program has run, call load_persistables to load the saved parameters
print("WARNING: model dir: {} exists, loading it".format(load_persistable_dir))
fluid.io.load_persistables(
    exe, load_persistable_dir, main_program=start_program
)
pe = fluid.ParallelExecutor(...)
try:
    while True:
        cost_val, label_val = pe.run(fetch_list=[avg_cost.name, label.name])
except fluid.core.EOFException:
    reader.reset()
fluid.io.save_persistables(
    exe, "...", main_program=main_program
)
General steps for distributed incremental training and training resumption
Differences from the single-machine case
Saving the model
batch_id = 0
for pass_id in range(PASS_NUM):
    train_py_reader.start()
    try:
        while True:
            loss_value, = train_exe.run(fetch_list=[avg_cost.name])
            if batch_id % 10 == 0:
                loss_value = numpy.mean(loss_value)
                print("batch={}, loss={}, sample num={}".format(
                    batch_id, loss_value, BATCH_SIZE * cpu_num
                ))
            batch_id += 1
    except fluid.core.EOFException:
        train_py_reader.reset()
    model_dir = 'output/models'
    # After each epoch, call save_persistables once to save the parameters
    # needed for inference or for resuming training.
    # For multi-machine training, trainer 0 is recommended to save the parameters.
    if trainer_id == 0:
        fluid.io.save_persistables(
            exe, model_dir, main_program
        )
Loading model variables
t = fluid.DistributeTranspiler(config=config)
t.transpile(
    trainer_id, pservers=pserver_endpoints,
    trainers=trainers, sync_mode=True, current_endpoint=current_endpoint
)
if training_role == "PSERVER":
    exe = fluid.Executor(fluid.CPUPlace())
    prog = t.get_pserver_program(current_endpoint)
    startup = t.get_startup_program(current_endpoint, pserver_program=prog)
    exe.run(startup)
    # In multi-machine training, parameters must be loaded on the PServer side,
    # and only after the startup program has run
    fluid.io.load_persistables(exe, "...", prog)
    exe.run(prog)
elif training_role == "TRAINER":
    train_prog = t.get_trainer_program()
    train_loop(args, train_prog, word2vec_reader, py_reader, loss, trainer_id)
Using VisualDL
For installation details, see the official site:
pip install --upgrade visualdl
visualdl --logdir=scratch_log --port=8080
http://127.0.0.1:8080
A simple Python example:
cat test.py
import random
from visualdl import LogWriter

logdir = "./tmp"
logger = LogWriter(logdir, sync_cycle=10000)
# mark the components with the 'train' label
with logger.mode('train'):
    # create a scalar component called 'scalar0'
    scalar0 = logger.scalar('scalar0')
# add some records while the DL model runs
for step in range(100):
    scalar0.add_record(step, random.random())
visualdl --logdir=tmp --port=8080
Visit http://127.0.0.1:8080 to view the results.
VisualDL visualizes losses and evaluation metrics
Option 1: load_persistables (requires defining the network)
def load_image(file):
    pass

# 1. Define the network
img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
pred = lenet_5(img)
test_program = fluid.default_main_program().clone(for_test=True)
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# 2. Load the model
fluid.io.load_persistables(
    exe, 'mnist_model', main_program=test_program
)
# 3. Load an image
im = load_image('test.png')
# 4. Run
results, = exe.run(
    test_program, feed={'img': im}, fetch_list=[pred]
)
num = np.argmax(results)
prob = results[0][num]
print("Inference result prob: {}, number {}".format(prob, num))
Option 2: load_inference_model (no network definition needed)
def load_image(file):
    pass

# 1. Load the model
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
model_path = "mnist_save_model"
infer_program, feeds, fetches = fluid.io.load_inference_model(
    model_path, exe, model_filename='model', params_filename='params'
)
# 2. Load an image
im = load_image('test.png')
# 3. Run
results, = exe.run(
    infer_program, feed={feeds[0]: im}, fetch_list=fetches
)
num = np.argmax(results)
prob = results[0][num]
print("Inference result prob: {}, number {}".format(prob, num))
The Paddle server-side inference library targets high-performance online serving deployment.
Download a prebuilt inference library and link against it; see here for details
Building the inference library yourself
Name | Suggested value | Description |
---|---|---|
CMAKE_BUILD_TYPE | Release | Build mode; Release for production, Debug for debugging |
ON_INFER | ON | Whether to enable inference-only optimizations |
WITH_GPU | OFF | Whether to enable GPU support; CPU-only by default |
WITH_MKL | ON | Whether to enable MKL |
PADDLE_ROOT=/path/of/capi
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
mkdir build
cd build
cmake -DFLUID_INFERENCE_INSTALL_DIR=$PADDLE_ROOT \
      -DCMAKE_BUILD_TYPE=Release \
      -DWITH_FLUID_ONLY=ON \
      -DWITH_SWIG_PY=OFF \
      -DWITH_PYTHON=OFF \
      -DWITH_MKL=OFF \
      -DWITH_GPU=OFF \
      -DON_INFER=ON \
...
make
make inference_lib_dist
The build output includes:
version.txt
GIT COMMIT ID: ...
WITH_MKL: ON
WITH_MKLDNN: ON
WITH_GPU: ON
CUDA version: 8.0
CUDNN version: v5
C++ inference API / inference workflow
C++ inference API
Case study: handwritten digit recognition
#include "paddle_inference_api.h"

// Create a config and adjust the relevant settings
paddle::NativeConfig config;
config.model_dir = "xxx";
config.use_gpu = false;
// Create a native PaddlePredictor
auto predictor = paddle::CreatePaddlePredictor(config);
// Create the input tensor
float data[784];
Random(data, 784);  // fake input
paddle::PaddleTensor tensor;
tensor.shape = std::vector<int>({1, 28 * 28});
tensor.data.Reset(data, sizeof(data));
tensor.dtype = paddle::PaddleDType::FLOAT32;
// Create the output tensors; the input tensor's memory can be reused
std::vector<paddle::PaddleTensor> outputs;
// Run inference
CHECK(predictor->Run({tensor}, &outputs));
// Read the outputs ...
Task | Models |
---|---|
Image classification | AlexNet, VGG, GoogleNet, DPN, Inception-v4, MobileNet, ResNet, SE-ResNeXt |
Object detection | Fast R-CNN, Faster R-CNN, Mask R-CNN, SSD, YOLOv3 |
Face detection | PyramidBox |
Semantic segmentation | ICNet, DeepLab V3+, Mask R-CNN |
Keypoint detection | OpenPose |
Video classification | TSN, NeXtVLAD, StNet, LSTM, Attention Clusters |
OCR | CRNN-CTC, Seq2Seq + Attention |
Face recognition | Deep Metric Learning |
Image generation | CycleGAN, ConditionalGAN, DCGAN |
Lexical analysis | Bi-GRU-CRF |
Semantic representation | ELMo, BERT |
Semantic matching | DAM, CDSSM, DecAtt, InferSent, SSE |
Sentiment analysis | BOW, CNN, BiLSTM |
Language modeling | GRU, LSTM |
Machine translation | Seq2Seq, Transformer |
Reading comprehension | BiDAF |
git clone https://github.com/PaddlePaddle/book.git --depth=1
train_reader = paddle.batch(
    paddle.reader.shuffle(paddle.dataset.mnist.train(), buf_size=500),
    batch_size=BATCH_SIZE
)
cd recognize_digits
python train.py
train(
    nn_type=nn_type,
    use_cuda=use_cuda,
    save_dirname=save_dirname,
    model_filename=model_filename,
    params_filename=params_filename
)
infer(
    use_cuda=use_cuda,
    save_dirname=save_dirname,
    model_filename=model_filename,
    params_filename=params_filename
)
- Download the model zoo code
```bash
git clone https://github.com/PaddlePaddle/models.git --depth=1
```
- Prepare the training data
- Run the following commands
```bash
cd fluid/PaddleNLP/neural_machine_translation/transformer
bash gen_data.sh
```
- Fetches the WMT'16 English-German machine-translation data
- To use a custom dataset:
  - Prepare the raw training data
  - Process it following the steps in the gen_data.sh script
- Train the translation model
```bash
python -u train.py \
--src_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
--trg_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
--special_token '' '' '' \
--train_file_pattern gen_data/wmt16_ende_data_bpe/train.tok.clean.bpe.32000.en-de \
--token_delimiter ' ' \
--use_token_batch True \
--batch_size 4096 \
--sort_type pool \
--pool_size 200000
```
- Translate the test set
```bash
python -u infer.py \
--src_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
--trg_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
--special_token '' '' '' \
--test_file_pattern gen_data/wmt16_ende_data_bpe/newstest2016.tok.bpe.32000.en-de \
--token_delimiter ' ' \
--batch_size 32 \
model_path trained_models/iter_100000.infer.model \
n_head 16 \
d_model 1024 \
d_inner_hid 4096 \
prepostprocess_dropout 0.3 \
beam_size 5 \
max_out_len 255
```
perl gen_data/mosesdecoder/scripts/generic/multi-bleu.perl gen_data/wmt16_ende_data/newstest2014.tok.de < predict.tok.txt