Paddle 2 supports dynamic graphs. A dynamic graph means the structure of the neural network can change as it executes ("define-by-run"), much as a Python variable can be assigned without declaring its type first. The benefit is flexibility: the dataflow can be rerouted according to conditions observed at runtime. The drawback is slower execution, the same trade-off Python itself makes. See Reference 1 and Reference 2.
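As a minimal sketch of what "define-by-run" means in practice (an illustrative snippet, not part of the original example): every statement below executes eagerly, so ordinary Python control flow can branch on values that only exist at runtime.

import paddle

a = paddle.to_tensor([1.0, 2.0])
b = a * 2                     # executes immediately; there is no separate graph-compilation step
if float(b.sum()) > 3.0:      # branch on a value computed at runtime
    b = b + 1
print(b.numpy())              # [3. 5.]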
A fuller, trainable example:
import paddle
import paddle.nn.functional as F
import numpy as np

class MyModel(paddle.nn.Layer):
    def __init__(self, input_size, hidden_size):
        super(MyModel, self).__init__()
        self.linear1 = paddle.nn.Linear(input_size, hidden_size)
        self.linear2 = paddle.nn.Linear(hidden_size, hidden_size)
        self.linear3 = paddle.nn.Linear(hidden_size, 1)

    def forward(self, inputs):
        x = self.linear1(inputs)
        x = F.relu(x)
        # The second hidden layer is executed only half the time:
        # the branch is decided anew on every forward pass.
        if paddle.rand([1]) > 0.5:
            x = self.linear2(x)
            x = F.relu(x)
        x = self.linear3(x)
        return x

total_data, batch_size, input_size, hidden_size = 1000, 64, 128, 256

x_data = np.random.randn(total_data, input_size).astype(np.float32)
y_data = np.random.randn(total_data, 1).astype(np.float32)

model = MyModel(input_size, hidden_size)
paddle.summary(model, (input_size,))  # input_size must be a shape tuple; note the trailing comma

loss_fn = paddle.nn.MSELoss(reduction='mean')
optimizer = paddle.optimizer.SGD(learning_rate=0.01,
                                 parameters=model.parameters())

for t in range(200 * (total_data // batch_size)):
    # Sample a random mini-batch without replacement.
    idx = np.random.choice(total_data, batch_size, replace=False)
    x = paddle.to_tensor(x_data[idx, :])
    y = paddle.to_tensor(y_data[idx, :])

    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    if t % 200 == 0:
        print(t, loss.numpy())

    loss.backward()
    optimizer.step()
    optimizer.clear_grad()
In the code above, the branch through linear2 (the second hidden layer) is taken with 50% probability, so each forward pass may trace a different structure; a quick empirical check of that figure follows the two summaries below. One run of paddle.summary might report:
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #
===========================================================================
   Linear-19          [[128]]               [256]              33,024
   Linear-20          [[256]]               [256]              65,792
   Linear-21          [[256]]                [1]                 257
===========================================================================
Total params: 99,073
Trainable params: 99,073
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.38
Estimated Total Size (MB): 0.38
---------------------------------------------------------------------------
Running it again might instead give:
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #
===========================================================================
   Linear-25          [[128]]               [256]              33,024
   Linear-27          [[256]]                [1]                 257
===========================================================================
Total params: 33,281
Trainable params: 33,281
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.13
Estimated Total Size (MB): 0.13
---------------------------------------------------------------------------
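Since the branch condition is paddle.rand([1]) > 0.5 and paddle.rand samples uniformly from [0, 1), the 50% figure is easy to sanity-check empirically (a quick sketch, not part of the original notebook):

import paddle

trials = 10000
hits = sum(int(float(paddle.rand([1])) > 0.5) for _ in range(trials))
print(hits / trials)   # close to 0.5: linear2 runs on about half of all forward passes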
So when is the network structure actually pinned down? When the model is defined, while it is being trained, or at inference time?
Append paddle.summary(model, (input_size,)) at the end of the training code to inspect the network after training:
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #
===========================================================================
   Linear-34          [[128]]               [256]              33,024
   Linear-35          [[256]]               [256]              65,792
   Linear-36          [[256]]                [1]                 257
===========================================================================
Total params: 99,073
Trainable params: 99,073
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.38
Estimated Total Size (MB): 0.38
---------------------------------------------------------------------------
Next, run inference with the trained model:
x = np.random.randn(input_size).astype(np.float32)
x = paddle.to_tensor(x)               # a single sample of shape [128]
y_infer = model(x)
print(y_infer)
paddle.summary(model, (input_size,))
The result again comes in two variants:
Tensor(shape=[1], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
       [0.94865662])
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #
===========================================================================
   Linear-34          [[128]]               [256]              33,024
   Linear-36          [[256]]                [1]                 257
===========================================================================
Total params: 33,281
Trainable params: 33,281
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.13
Estimated Total Size (MB): 0.13
---------------------------------------------------------------------------
{'total_params': 33281, 'trainable_params': 33281}
(The trailing dict is the return value of paddle.summary, echoed by the notebook.)
or:
Tensor(shape=[1], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
       [-0.08627059])
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #
===========================================================================
   Linear-34          [[128]]               [256]              33,024
   Linear-35          [[256]]               [256]              65,792
   Linear-36          [[256]]                [1]                 257
===========================================================================
Total params: 99,073
Trainable params: 99,073
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.38
Estimated Total Size (MB): 0.38
---------------------------------------------------------------------------
This shows that the model structure is dynamic at definition time, dynamic during training, and still dynamic at inference time: paddle.summary traces a single forward pass, so it reports whichever branch that particular pass happens to take. The code above can be run in a notebook on Baidu AI Studio.
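One point worth spelling out (an added check, assuming the same notebook session as above): all three Linear layers' parameters are created the moment __init__ runs and persist no matter which branch a given forward pass takes; only the traced structure varies. Summing over model.parameters() therefore always yields the full count:

import numpy as np

# Every parameter created in __init__ exists regardless of which
# branch forward() took on any particular pass:
total = sum(int(np.prod(p.shape)) for p in model.parameters())
print(total)   # 99073 = 33,024 (linear1) + 65,792 (linear2) + 257 (linear3)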