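The functional API is a more flexible way to build models than the Sequential API: it can handle models with non-linear topology, shared layers, and multiple inputs or outputs.
A deep learning model is usually a directed acyclic graph (DAG) of layers, so the functional API is a way to build graphs of layers.
Consider the following model: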
(input: 784-dimensional vectors)
↧
[Dense (64 units, relu activation)]
↧
[Dense (64 units, relu activation)]
↧
[Dense (10 units, no activation)]
↧
(output: logits of a probability distribution over 10 classes)
This is a simple graph with three layers. Next, we build this model with the functional API.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# 1. Create the input node
inputs = keras.Input(shape=(784,))  # each input sample is a 784-dimensional vector
# For image data the shape could be (32, 32, 3) instead:
# image_inputs = keras.Input(shape=(32, 32, 3))
print("inputs shape:",inputs.shape)
print("inputs dtype:",inputs.dtype)
inputs shape: (None, 784)
inputs dtype: <dtype: 'float32'>
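# Calling a layer on the `inputs` object creates a new node in the graph of layers.
# The "layer call" is like passing the input into the dense layer and getting x out.
dense = layers.Dense(64, activation="relu")
x = dense(inputs)
# In the same way, more layers can be added to the graph
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(10)(x)
# Create a model by specifying the inputs and outputs of the graph
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
# Print the model summary
model.summary()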
Model: "mnist_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 784)] 0
_________________________________________________________________
dense (Dense) (None, 64) 50240
_________________________________________________________________
dense_1 (Dense) (None, 64) 4160
_________________________________________________________________
dense_2 (Dense) (None, 10) 650
=================================================================
Total params: 55,050
Trainable params: 55,050
Non-trainable params: 0
_________________________________________________________________
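# Plot the model as a graph (requires the pydot package and Graphviz)
import pydot
keras.utils.plot_model(model, "my_first_model.png", show_shapes=True)
# Training, evaluation, and inference
# 1. Prepare the data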
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255
# 2. Build the model
inputs = keras.Input(shape=(784,))
x = layers.Dense(64, activation="relu")(inputs)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(10)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
model.summary()
# 3. Compile the model
model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.RMSprop(),
    metrics=["accuracy"],
)
# 4. Train the model
history = model.fit(x_train, y_train, batch_size=64, epochs=2, validation_split=0.2)
# 5. Evaluate the model
test_scores = model.evaluate(x_test, y_test, verbose=2)
print("Test loss:", test_scores[0])
print("Test accuracy:", test_scores[1])
# 6. Save the model
# The saved model includes the architecture, the weights, the training
# configuration, and the optimizer together with its state
model.save("path_to_my_model")
del model
# Reload the saved model
model = keras.models.load_model("path_to_my_model")
Model: "mnist_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 784)] 0
_________________________________________________________________
dense_3 (Dense) (None, 64) 50240
_________________________________________________________________
dense_5 (Dense) (None, 64) 4160
_________________________________________________________________
dense_6 (Dense) (None, 10) 650
=================================================================
Total params: 55,050
Trainable params: 55,050
Non-trainable params: 0
_________________________________________________________________
Train on 48000 samples, validate on 12000 samples
Epoch 1/2
48000/48000 [==============================] - 3s 66us/sample - loss: 0.3455 - accuracy: 0.9015 - val_loss: 0.2020 - val_accuracy: 0.9412
Epoch 2/2
48000/48000 [==============================] - 3s 53us/sample - loss: 0.1630 - accuracy: 0.9512 - val_loss: 0.1417 - val_accuracy: 0.9577
10000/10000 - 1s - loss: 0.1402 - accuracy: 0.9567
Test loss: 0.1401731805846095
Test accuracy: 0.9567
WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\TF2.1\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: path_to_my_model\assets
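From the previous section we know that in the functional API a model is created by specifying its inputs and outputs in a graph of layers.
This means that a single graph of layers can be used to generate multiple models.
In the example below, the same stack of layers is used to instantiate two models:
(1) an encoder model that turns an input image into a 16-dimensional vector,
(2) an end-to-end autoencoder model used for training.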
# encoder layers
encoder_input = keras.Input(shape = (28,28,1),name = "img")
x = layers.Conv2D(16,3,activation = "relu")(encoder_input)
x = layers.Conv2D(32,3,activation = "relu")(x)
x = layers.MaxPool2D(3)(x)
x = layers.Conv2D(32,3,activation = "relu")(x)
x = layers.Conv2D(16,3,activation = "relu")(x)
encoder_output = layers.GlobalAveragePooling2D()(x)
# encoder_model
encoder = keras.Model(encoder_input,encoder_output,name = "encoder")
# decoder layers
x = layers.Reshape((4,4,1))(encoder_output)
x = layers.Conv2DTranspose(16,3,activation = "relu")(x)
x = layers.Conv2DTranspose(32,3,activation = "relu")(x)
x = layers.UpSampling2D(3)(x)
x = layers.Conv2DTranspose(16,3,activation = "relu")(x)
decoder_output = layers.Conv2DTranspose(1,3,activation = "relu")(x)
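# The decoder mirrors the encoder, so the output shape matches the input shape (28, 28, 1).
# Conv2DTranspose reverses the shape change of Conv2D, and UpSampling2D reverses that of
# MaxPooling2D; neither is an exact mathematical inverse.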
# autoencoder model
autoencoder = keras.Model(encoder_input,decoder_output,name = "autoencoder")
autoencoder.summary()
# The autoencoder model is built from the encoder's input and the decoder's output,
# so the encoder layers are used both by the encoder model and by the autoencoder model.
Model: "autoencoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
img (InputLayer) [(None, 28, 28, 1)] 0
_________________________________________________________________
conv2d (Conv2D) (None, 26, 26, 16) 160
_________________________________________________________________
conv2d_1 (Conv2D) (None, 24, 24, 32) 4640
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 8, 8, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 6, 6, 32) 9248
_________________________________________________________________
conv2d_3 (Conv2D) (None, 4, 4, 16) 4624
_________________________________________________________________
global_average_pooling2d (Gl (None, 16) 0
_________________________________________________________________
reshape (Reshape) (None, 4, 4, 1) 0
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 6, 6, 16) 160
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 8, 8, 32) 4640
_________________________________________________________________
up_sampling2d (UpSampling2D) (None, 24, 24, 32) 0
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 26, 26, 16) 4624
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 28, 28, 1) 145
=================================================================
Total params: 28,241
Trainable params: 28,241
Non-trainable params: 0
_________________________________________________________________
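The point that one graph of layers can back several models can be checked directly. The following is a minimal sketch on a hypothetical miniature encoder/autoencoder pair (the layer names, sizes, and random data are assumptions made for illustration, not part of the example above). It shows that both models hold the same layer object, so training one also updates the other.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical miniature version of the encoder/autoencoder pair above.
inputs = keras.Input(shape=(8,), name="features")
code = layers.Dense(3, activation="tanh", name="bottleneck")(inputs)
reconstruction = layers.Dense(8, name="reconstruction")(code)

small_encoder = keras.Model(inputs, code, name="small_encoder")
small_autoencoder = keras.Model(inputs, reconstruction, name="small_autoencoder")

# Both models hold the very same layer object, not a copy of it.
same_layer = (
    small_encoder.get_layer("bottleneck")
    is small_autoencoder.get_layer("bottleneck")
)
print("same layer object:", same_layer)

# Training the autoencoder therefore also changes the encoder's weights.
kernel_before = small_encoder.get_layer("bottleneck").get_weights()[0].copy()
data = np.random.random((64, 8)).astype("float32")
small_autoencoder.compile(optimizer="sgd", loss="mse")
small_autoencoder.fit(data, data, epochs=1, batch_size=16, verbose=0)
kernel_after = small_encoder.get_layer("bottleneck").get_weights()[0]
weights_changed = not np.array_equal(kernel_before, kernel_after)
print("encoder weights changed:", weights_changed)
```

This is why an encoder extracted from a trained autoencoder can be used for inference right away, without copying any weights.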
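Any model can be called like a layer. Calling a model reuses not only its architecture but also its weights.
In this part we build the autoencoder by calling models: the encoder and decoder models are chained to obtain the autoencoder model.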
# encoder_model
encoder_input = keras.Input(shape=(28, 28, 1), name="original_img")
x = layers.Conv2D(16, 3, activation="relu")(encoder_input)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Conv2D(16, 3, activation="relu")(x)
encoder_output = layers.GlobalMaxPooling2D()(x)
encoder = keras.Model(encoder_input, encoder_output, name="encoder")
encoder.summary()
# decoder_model
decoder_input = keras.Input(shape=(16,), name="encoded_img")
x = layers.Reshape((4, 4, 1))(decoder_input)
x = layers.Conv2DTranspose(16, 3, activation="relu")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu")(x)
x = layers.UpSampling2D(3)(x)
x = layers.Conv2DTranspose(16, 3, activation="relu")(x)
decoder_output = layers.Conv2DTranspose(1, 3, activation="relu")(x)
decoder = keras.Model(decoder_input, decoder_output, name="decoder")
decoder.summary()
# autoencoder_model
autoencoder_input = keras.Input(shape=(28, 28, 1), name="img")
# Call the encoder
encoded_img = encoder(autoencoder_input)
# Call the decoder
decoded_img = decoder(encoded_img)
# Chain the encoder and decoder through the input and the output
autoencoder = keras.Model(autoencoder_input, decoded_img, name="autoencoder")
autoencoder.summary()
Model: "encoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
original_img (InputLayer) [(None, 28, 28, 1)] 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 26, 26, 16) 160
_________________________________________________________________
conv2d_5 (Conv2D) (None, 24, 24, 32) 4640
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 32) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 6, 6, 32) 9248
_________________________________________________________________
conv2d_7 (Conv2D) (None, 4, 4, 16) 4624
_________________________________________________________________
global_max_pooling2d (Global (None, 16) 0
=================================================================
Total params: 18,672
Trainable params: 18,672
Non-trainable params: 0
_________________________________________________________________
Model: "decoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
encoded_img (InputLayer) [(None, 16)] 0
_________________________________________________________________
reshape_1 (Reshape) (None, 4, 4, 1) 0
_________________________________________________________________
conv2d_transpose_4 (Conv2DTr (None, 6, 6, 16) 160
_________________________________________________________________
conv2d_transpose_5 (Conv2DTr (None, 8, 8, 32) 4640
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 24, 24, 32) 0
_________________________________________________________________
conv2d_transpose_6 (Conv2DTr (None, 26, 26, 16) 4624
_________________________________________________________________
conv2d_transpose_7 (Conv2DTr (None, 28, 28, 1) 145
=================================================================
Total params: 9,569
Trainable params: 9,569
Non-trainable params: 0
_________________________________________________________________
Model: "autoencoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
img (InputLayer) [(None, 28, 28, 1)] 0
_________________________________________________________________
encoder (Model) (None, 16) 18672
_________________________________________________________________
decoder (Model) (None, 28, 28, 1) 9569
=================================================================
Total params: 28,241
Trainable params: 28,241
Non-trainable params: 0
_________________________________________________________________
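To make the weight reuse concrete, here is a small sketch with hypothetical dense stand-ins for the encoder and decoder (the `tiny_*` names and sizes are assumptions for illustration). It compares the nested model's output with the result of calling the two sub-models one after the other.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical dense stand-ins for the convolutional encoder and decoder.
enc_in = keras.Input(shape=(6,))
tiny_encoder = keras.Model(enc_in, layers.Dense(2)(enc_in), name="tiny_encoder")
dec_in = keras.Input(shape=(2,))
tiny_decoder = keras.Model(dec_in, layers.Dense(6)(dec_in), name="tiny_decoder")

# Nest both models inside a third one by calling them like layers.
ae_in = keras.Input(shape=(6,))
tiny_autoencoder = keras.Model(
    ae_in, tiny_decoder(tiny_encoder(ae_in)), name="tiny_autoencoder"
)

batch = tf.constant(np.random.random((4, 6)).astype("float32"))
chained = np.asarray(tiny_decoder(tiny_encoder(batch)))  # sub-models by hand
nested = np.asarray(tiny_autoencoder(batch))             # nested model
outputs_match = np.allclose(chained, nested, atol=1e-6)
print("outputs match:", outputs_match)
print("parameters:", tiny_autoencoder.count_params())
```

The parameter count of the nested model is the sum of the two sub-models, in the same way that 18,672 + 9,569 = 28,241 in the summaries above.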
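As the example shows, models can be nested: a model can contain sub-models. A common use case for nesting is ensembling. The following example combines several models into a single model that averages their predictions: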
def get_model():
    inputs = keras.Input(shape=(128,))
    outputs = layers.Dense(1)(inputs)
    return keras.Model(inputs, outputs)
model1 = get_model()
model2 = get_model()
model3 = get_model()
inputs = keras.Input(shape = (128,))
y1 = model1(inputs)
y2 = model2(inputs)
y3 = model3(inputs)
outputs = layers.average([y1,y2,y3])
ensemble_model = keras.Model(inputs=inputs,outputs=outputs)
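As a quick sanity check of the ensembling idea, the sketch below rebuilds a comparable three-member ensemble (the helper name `make_member` and the random batch are assumptions for illustration). It compares the ensemble's output with the mean of the members' individual predictions.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


def make_member():
    # Hypothetical helper: one small, independently initialized model.
    member_in = keras.Input(shape=(128,))
    return keras.Model(member_in, layers.Dense(1)(member_in))


members = [make_member() for _ in range(3)]

ens_in = keras.Input(shape=(128,))
ens_out = layers.average([member(ens_in) for member in members])
ensemble = keras.Model(ens_in, ens_out)

batch = tf.constant(np.random.random((5, 128)).astype("float32"))
manual_mean = np.mean([np.asarray(member(batch)) for member in members], axis=0)
ensemble_pred = np.asarray(ensemble(batch))
predictions_agree = np.allclose(manual_mean, ensemble_pred, atol=1e-5)
print("ensemble equals mean of members:", predictions_agree)
```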
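The functional API makes it easy to handle multiple inputs and outputs, which the Sequential API cannot do.
For example, suppose we want to build a system that ranks support tickets by priority and routes them to the right department.
The model has three inputs: the ticket title (text), the ticket body (text), and the tags attached to the ticket (a binary vector of size num_tags).
It has two outputs: a priority score (a single unit) and a department prediction (one unit per department).
The model is built as follows: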
num_tags = 12 # Number of unique issue tags
num_words = 10000 # Size of vocabulary obtained when preprocessing text data
num_departments = 4 # Number of departments for predictions
title_input = keras.Input(shape=(None,), name="title") # Variable-length sequence of ints
body_input = keras.Input(shape=(None,), name="body") # Variable-length sequence of ints
tags_input = keras.Input(shape=(num_tags,), name="tags") # Binary vectors of size `num_tags`
# Embed each word in the title into a 64-dimensional vector
title_features = layers.Embedding(num_words, 64)(title_input)
# Embed each word in the text into a 64-dimensional vector
body_features = layers.Embedding(num_words, 64)(body_input)
# Reduce sequence of embedded words in the title into a single 128-dimensional vector
title_features = layers.LSTM(128)(title_features)
# Reduce sequence of embedded words in the body into a single 32-dimensional vector
body_features = layers.LSTM(32)(body_features)
# Merge all available features into a single large vector via concatenation
x = layers.concatenate([title_features, body_features, tags_input])
# Stick a logistic regression for priority prediction on top of the features
priority_pred = layers.Dense(1, name="priority")(x)
# Stick a department classifier on top of the features
department_pred = layers.Dense(num_departments, name="department")(x)
# Instantiate an end-to-end model predicting both priority and department
model = keras.Model(
    inputs=[title_input, body_input, tags_input],
    outputs=[priority_pred, department_pred],
)
keras.utils.plot_model(model, "multi_input_and_output_model.png", show_shapes=True)
When compiling the model, you can assign a different loss to each output, and you can also give each loss a different weight:
# Pass the losses in the same order as the outputs
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[
        keras.losses.BinaryCrossentropy(from_logits=True),
        keras.losses.CategoricalCrossentropy(from_logits=True),
    ],
    loss_weights=[1.0, 0.2],
)
# Or, since the output layers have names, pass dictionaries keyed by name
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={
        "priority": keras.losses.BinaryCrossentropy(from_logits=True),
        "department": keras.losses.CategoricalCrossentropy(from_logits=True),
    },
    loss_weights={"priority": 1.0, "department": 0.2},
)
# Train the model by passing NumPy arrays of inputs and targets
# Dummy input data
title_data = np.random.randint(num_words, size=(1280, 10))
body_data = np.random.randint(num_words, size=(1280, 100))
tags_data = np.random.randint(2, size=(1280, num_tags)).astype("float32")
# Dummy target data
priority_targets = np.random.random(size=(1280, 1))
dept_targets = np.random.randint(2, size=(1280, num_departments))
model.fit(
    {"title": title_data, "body": body_data, "tags": tags_data},
    {"priority": priority_targets, "department": dept_targets},
    epochs=2,
    batch_size=32,
)
Train on 1280 samples
Epoch 1/2
1280/1280 [==============================] - 4s 3ms/sample - loss: 1.2890 - priority_loss: 0.7065 - department_loss: 2.9127
Epoch 2/2
1280/1280 [==============================] - 1s 516us/sample - loss: 1.2874 - priority_loss: 0.6992 - department_loss: 2.9409
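Besides models with multiple inputs and outputs, the functional API also makes it easy to handle non-linear connectivity topologies, that is, models whose layers are not connected sequentially. The Sequential API cannot handle these.
A common use case is residual connections. As an illustration, we build a toy ResNet for the CIFAR10 dataset: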
inputs = keras.Input(shape=(32, 32, 3), name="img")
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.Conv2D(64, 3, activation="relu")(x)
block_1_output = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(block_1_output)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
block_2_output = layers.add([x, block_1_output])
x = layers.Conv2D(64, 3, activation="relu", padding="same")(block_2_output)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
block_3_output = layers.add([x, block_2_output])
x = layers.Conv2D(64, 3, activation="relu")(block_3_output)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(256, activation="relu")(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(10)(x)
model = keras.Model(inputs, outputs, name="toy_resnet")
model.summary()
Model: "toy_resnet"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
img (InputLayer) [(None, 32, 32, 3)] 0
__________________________________________________________________________________________________
conv2d_8 (Conv2D) (None, 30, 30, 32) 896 img[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, 28, 28, 64) 18496 conv2d_8[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 9, 9, 64) 0 conv2d_9[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, 9, 9, 64) 36928 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D) (None, 9, 9, 64) 36928 conv2d_10[0][0]
__________________________________________________________________________________________________
add (Add) (None, 9, 9, 64) 0 conv2d_11[0][0]
max_pooling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_12 (Conv2D) (None, 9, 9, 64) 36928 add[0][0]
__________________________________________________________________________________________________
conv2d_13 (Conv2D) (None, 9, 9, 64) 36928 conv2d_12[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 9, 9, 64) 0 conv2d_13[0][0]
add[0][0]
__________________________________________________________________________________________________
conv2d_14 (Conv2D) (None, 7, 7, 64) 36928 add_1[0][0]
__________________________________________________________________________________________________
global_average_pooling2d_1 (Glo (None, 64) 0 conv2d_14[0][0]
__________________________________________________________________________________________________
dense_10 (Dense) (None, 256) 16640 global_average_pooling2d_1[0][0]
__________________________________________________________________________________________________
dropout (Dropout) (None, 256) 0 dense_10[0][0]
__________________________________________________________________________________________________
dense_11 (Dense) (None, 10) 2570 dropout[0][0]
==================================================================================================
Total params: 223,242
Trainable params: 223,242
Non-trainable params: 0
__________________________________________________________________________________________________
keras.utils.plot_model(model, "mini_resnet.png", show_shapes=True)
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=["acc"],
)
# We restrict the data to the first 1000 samples so as to limit execution time
# on Colab. Try to train on the entire dataset until convergence!
model.fit(x_train[:1000], y_train[:1000], batch_size=64, epochs=1, validation_split=0.2)
Train on 800 samples, validate on 200 samples
800/800 [==============================] - 7s 9ms/sample - loss: 2.3203 - acc: 0.0938 - val_loss: 2.2957 - val_acc: 0.1150
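Another good use of the functional API is models that use shared layers. Shared layers are layer instances that are reused several times in the same model; they learn features that correspond to multiple paths in the graph of layers.
Shared layers are often used to encode inputs from similar spaces. They allow information to be shared across the different inputs and make it possible to train the model on less data. If a given word is seen in one of the inputs, that benefits the processing of every input that passes through the shared layer.
To share a layer in the functional API, call the same layer instance multiple times. For example, here is an Embedding layer shared across two different text inputs: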
# Embedding for 1000 unique words mapped to 128-dimensional vectors
shared_embedding = layers.Embedding(1000, 128)
# Variable-length sequence of integers
text_input_a = keras.Input(shape=(None,), dtype="int32")
# Variable-length sequence of integers
text_input_b = keras.Input(shape=(None,), dtype="int32")
# Reuse the same layer to encode both inputs
encoded_input_a = shared_embedding(text_input_a)
encoded_input_b = shared_embedding(text_input_b)
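A short sketch can confirm what sharing means in practice. It wraps a shared embedding in a hypothetical two-input model (the name `two_tower` and the token ids are assumptions for illustration) and checks that there is a single weight matrix and that the same tokens get the same vectors on both paths.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# One Embedding instance, called on two different inputs.
shared = layers.Embedding(1000, 128)
in_a = keras.Input(shape=(None,), dtype="int32")
in_b = keras.Input(shape=(None,), dtype="int32")
two_tower = keras.Model([in_a, in_b], [shared(in_a), shared(in_b)])

# The model owns a single 1000 x 128 embedding matrix, not two.
total_params = two_tower.count_params()
print("parameters:", total_params)

# The same token ids yield identical vectors on both paths.
tokens = tf.constant([[1, 2, 3]], dtype="int32")
out_a, out_b = two_tower([tokens, tokens])
same_vectors = np.allclose(np.asarray(out_a), np.asarray(out_b))
print("same vectors on both paths:", same_vectors)
```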
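Because the graph of layers is a static data structure, it can be accessed and inspected. This is how functional models can be plotted as images.
It also means that the activations of intermediate layers (the "nodes" in the graph) can be accessed and reused elsewhere, which is very useful for tasks such as feature extraction.
Let's look at an example. Below is a VGG19 model with weights pretrained on ImageNet: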
vgg19 = tf.keras.applications.VGG19()
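# Get the intermediate activations of the model by querying the graph data structure
features_list = [layer.output for layer in vgg19.layers]
# Use these features to create a new feature-extraction model that returns the
# values of the intermediate layer activations
feat_extraction_model = keras.Model(inputs=vgg19.input, outputs=features_list)
img = np.random.random((1, 224, 224, 3)).astype("float32")
extracted_features = feat_extraction_model(img)
# This is useful for tasks such as neural style transfer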
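The same pattern works on any functional model, not only VGG19. The sketch below uses a hypothetical small dense model (layer names and sizes are assumptions for illustration) so that it runs without downloading pretrained weights, and it picks intermediate layers by name.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical small model standing in for a large pretrained network.
base_in = keras.Input(shape=(16,), name="base_input")
h1 = layers.Dense(8, activation="relu", name="hidden_1")(base_in)
h2 = layers.Dense(4, activation="relu", name="hidden_2")(h1)
base_out = layers.Dense(2, name="head")(h2)
base = keras.Model(base_in, base_out, name="base")

# Build a second model that exposes the chosen intermediate activations.
feature_names = ["hidden_1", "hidden_2"]
extractor = keras.Model(
    inputs=base.input,
    outputs=[base.get_layer(name).output for name in feature_names],
)

features = extractor(tf.zeros((3, 16)))
feature_shapes = [tuple(f.shape) for f in features]
print(feature_shapes)
```

The extractor shares its layers with `base`, so no weights are copied.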
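tf.keras includes a wide range of built-in layers, for example:
Convolutional layers: Conv1D, Conv2D, Conv3D, Conv2DTranspose
Pooling layers: MaxPooling1D, MaxPooling2D, MaxPooling3D, AveragePooling1D
RNN layers: GRU, LSTM, ConvLSTM2D
BatchNormalization, Dropout, Embedding, and so on
If these do not cover what you need, you can extend the API by writing your own layers. Every layer subclasses the Layer class and implements the following methods:
build method: creates the layer's weights (this is a convention; weights can also be created in __init__);
call method: the forward pass, which specifies the computation done by the layer.
The following is a basic implementation of tf.keras.layers.Dense: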
class CustomDense(layers.Layer):
    def __init__(self, units=32):
        super(CustomDense, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,),
            initializer="random_normal",
            trainable=True,
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    # To support serialization, define a get_config method that returns
    # the constructor arguments of the layer instance
    def get_config(self):
        return {"units": self.units}
inputs = keras.Input((4,))
outputs = CustomDense(10)(inputs)
model = keras.Model(inputs,outputs)
config = model.get_config()
import pprint
pprint.pprint(config)
new_model = keras.Model.from_config(config,custom_objects={"CustomDense":CustomDense})
{'input_layers': [['input_10', 0, 0]],
'layers': [{'class_name': 'InputLayer',
'config': {'batch_input_shape': (None, 4),
'dtype': 'float32',
'name': 'input_10',
'ragged': False,
'sparse': False},
'inbound_nodes': [],
'name': 'input_10'},
{'class_name': 'CustomDense',
'config': {'units': 10},
'inbound_nodes': [[['input_10', 0, 0, {}]]],
'name': 'custom_dense'}],
'name': 'model_6',
'output_layers': [['custom_dense', 0, 0]]}
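The config above only records `units`. A common variation, sketched here under the assumption that the layer should also keep base settings such as its name, is to accept `**kwargs` in the constructor and merge the base `Layer` config into `get_config`. The class name `ConfigurableDense` is hypothetical and not part of the example above.

```python
import tensorflow as tf
from tensorflow.keras import layers


class ConfigurableDense(layers.Layer):
    """Hypothetical variant of CustomDense that round-trips its base config."""

    def __init__(self, units=32, **kwargs):
        # **kwargs lets base-class settings such as `name` pass through.
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="zeros", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    def get_config(self):
        # Start from the base Layer config and add the constructor arguments.
        config = super().get_config()
        config.update({"units": self.units})
        return config


layer = ConfigurableDense(10, name="my_dense")
clone = ConfigurableDense.from_config(layer.get_config())
output_shape = tuple(clone(tf.zeros((2, 4))).shape)
print(clone.name, clone.units, output_shape)
```

Note that `from_config` recreates the architecture only; the clone starts with freshly initialized weights.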
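When should you use the Keras functional API to create a new model, and when should you subclass the Model class directly? In general, the functional API is higher-level, easier, and safer to use, and it has a number of features that subclassed models do not support.
However, model subclassing gives more flexibility when building models that are not easily expressed as directed acyclic graphs of layers. For example, a Tree-RNN cannot be implemented with the functional API; you would have to subclass Model directly.
1. Less verbose
There is no super(MyClass, self).__init__(...), no def call(self, ...):, and so on.
Functional API: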
inputs = keras.Input(shape=(32,))
x = layers.Dense(64, activation='relu')(inputs)
outputs = layers.Dense(10)(x)
mlp = keras.Model(inputs, outputs)
Subclassing:
class MLP(keras.Model):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        self.dense_1 = layers.Dense(64, activation='relu')
        self.dense_2 = layers.Dense(10)

    def call(self, inputs):
        x = self.dense_1(inputs)
        return self.dense_2(x)
# Instantiate the model.
mlp = MLP()
# Necessary to create the model's state.
# The model doesn't have a state until it's called at least once.
_ = mlp(tf.zeros((1, 32)))
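2. Model validation while the connectivity graph is defined
In the functional API, the input specification (shape and dtype) is created in advance (using Input). Every time a layer is called, it checks that the specification passed to it matches its assumptions, and it raises a helpful error message if not.
This guarantees that any model you can build with the functional API will run. All debugging, other than convergence-related debugging, happens statically during model construction rather than at execution time. This is similar to type checking in a compiler.
3. A functional model is plottable and inspectable
You can plot the model as a graph, and you can easily access intermediate nodes in this graph. For example, to extract and reuse the activations of intermediate layers (as seen in a previous example):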
features_list = [layer.output for layer in vgg19.layers]
feat_extraction_model = keras.Model(inputs=vgg19.input, outputs=features_list)
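4. A functional model can be serialized or cloned
Because a functional model is a data structure rather than a piece of code, it is safely serializable and can be saved as a single file that lets you recreate the exact same model without access to any of the original code.
To serialize a subclassed model, the implementer has to specify a get_config() and from_config() method at the model level.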
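The construction-time validation described in point 2 can be seen with a deliberately mismatched pair of layers. This is a minimal sketch; the exact wording of the error message depends on the installed Keras version, so none is shown here.

```python
from tensorflow import keras
from tensorflow.keras import layers

flat_inputs = keras.Input(shape=(10,))  # symbolic 2D tensor: (batch, 10)

error_message = None
try:
    # Conv2D expects 4D input (batch, height, width, channels), so this call
    # fails while the graph is being defined, before any data is seen.
    layers.Conv2D(8, 3)(flat_inputs)
except ValueError as err:
    error_message = str(err)

print("caught at construction time:", error_message is not None)
```

No model was compiled or run here: the error comes from the layer call itself, which is what makes this comparable to a compile-time type check.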
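No support for dynamic architectures: the functional API treats models as DAGs of layers. This is true for most deep learning architectures, but not all; for example, recursive networks or Tree RNNs do not follow this assumption and cannot be implemented in the functional API.
Choosing between the functional API and model subclassing is not a binary decision that restricts you to one category of models. All models in the tf.keras API can interact with each other, whether they are Sequential models, functional models, or subclassed models written from scratch.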
units = 32
timesteps = 10
input_dim = 5
# Define a Functional model
inputs = keras.Input((None, units))
x = layers.GlobalAveragePooling1D()(inputs)
outputs = layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
# Subclassed layer that uses the functional model defined above
class CustomRNN(layers.Layer):
    def __init__(self):
        super(CustomRNN, self).__init__()
        self.units = units
        self.projection_1 = layers.Dense(units=units, activation="tanh")
        self.projection_2 = layers.Dense(units=units, activation="tanh")
        # Our previously-defined Functional model
        self.classifier = model

    def call(self, inputs):
        outputs = []
        state = tf.zeros(shape=(inputs.shape[0], self.units))
        for t in range(inputs.shape[1]):
            x = inputs[:, t, :]
            h = self.projection_1(x)
            y = h + self.projection_2(state)
            state = y
            outputs.append(y)
        features = tf.stack(outputs, axis=1)
        print(features.shape)
        return self.classifier(features)
rnn_model = CustomRNN()
_ = rnn_model(tf.zeros((1, timesteps, input_dim)))
(1, 10, 32)
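You can use any subclassed layer or model in the functional API as long as it implements a call method that follows one of the following patterns:
call(self, inputs, **kwargs) - where inputs is a tensor or a nested structure of tensors (for example, a list of tensors), and **kwargs are non-tensor arguments (non-inputs).
call(self, inputs, training=None, **kwargs) - where training is a boolean indicating whether the layer should behave in training mode or in inference mode.
call(self, inputs, mask=None, **kwargs) - where mask is a boolean mask tensor (useful for RNNs, for instance).
call(self, inputs, training=None, mask=None, **kwargs) - of course, you can have both masking and training-specific behavior at the same time.
In addition, if you implement the get_config method on your custom layer or model, the functional models you create will still be serializable and cloneable.
Here is a quick example of a custom RNN, written from scratch, being used in a functional model: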
units = 32
timesteps = 10
input_dim = 5
batch_size = 16
class CustomRNN(layers.Layer):
    def __init__(self):
        super(CustomRNN, self).__init__()
        self.units = units
        self.projection_1 = layers.Dense(units=units, activation="tanh")
        self.projection_2 = layers.Dense(units=units, activation="tanh")
        self.classifier = layers.Dense(1)

    def call(self, inputs):
        outputs = []
        state = tf.zeros(shape=(inputs.shape[0], self.units))
        for t in range(inputs.shape[1]):
            x = inputs[:, t, :]
            h = self.projection_1(x)
            y = h + self.projection_2(state)
            state = y
            outputs.append(y)
        features = tf.stack(outputs, axis=1)
        return self.classifier(features)
# Note that you specify a static batch size for the inputs with the `batch_shape`
# arg, because the inner computation of `CustomRNN` requires a static batch size
# (when you create the `state` zeros tensor).
inputs = keras.Input(batch_shape=(batch_size, timesteps, input_dim))
x = layers.Conv1D(32, 3)(inputs)
outputs = CustomRNN()(x)
model = keras.Model(inputs, outputs)
rnn_model = CustomRNN()
_ = rnn_model(tf.zeros((1, 10, 5)))
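As a further illustration of the call patterns listed earlier, the sketch below uses the `training` argument in a custom layer inside a functional model. The layer `DoubleInTraining` is a hypothetical toy, chosen only because its two modes are easy to tell apart.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


class DoubleInTraining(layers.Layer):
    """Hypothetical toy layer: doubles its input in training mode only."""

    def call(self, inputs, training=None):
        if training:
            return inputs * 2.0
        return inputs


toy_in = keras.Input(shape=(3,))
toy_model = keras.Model(toy_in, DoubleInTraining()(toy_in))

ones = tf.ones((2, 3))
# The functional model forwards its own `training` flag to the layer.
train_out = np.asarray(toy_model(ones, training=True))
infer_out = np.asarray(toy_model(ones, training=False))
print(train_out[0], infer_out[0])
```

Built-in layers such as Dropout and BatchNormalization rely on the same mechanism to behave differently during fit and predict.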
References:
[1] https://tensorflow.google.cn/guide/keras/functional?hl=zh_cn
[2] https://tensorflow.google.cn/guide/keras/functional