A machine learning model can usually be expressed as a composition and stacking of fairly simple layers; here we use tf.keras in TensorFlow to build such models.
import tensorflow as tf

class MyDenseLayer(tf.keras.layers.Layer):  # custom layer
    def __init__(self, num_outputs):
        super(MyDenseLayer, self).__init__()
        self.num_outputs = num_outputs

    def build(self, input_shape):
        self.kernel = self.add_weight("kernel",
                                      shape=[int(input_shape[-1]),
                                             self.num_outputs])

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel)
Creating variables in __init__ requires their shapes to be specified explicitly, whereas creating them in build() lets the layer derive the shapes from the input it will operate on. The constructor __init__ is a special instance method that the Python interpreter calls automatically whenever an instance of the class is created, so num_outputs has to be fixed when we instantiate the custom layer class:
layer = MyDenseLayer(10)
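By contrast, if the variable were created in __init__, the input dimension would have to be supplied up front; a minimal sketch (the class name FixedDenseLayer and the num_inputs argument are illustrative, not part of the original):
    class FixedDenseLayer(tf.keras.layers.Layer):
        def __init__(self, num_inputs, num_outputs):
            super(FixedDenseLayer, self).__init__()
            # the full kernel shape must be known here, before any data is seen
            self.kernel = self.add_weight("kernel", shape=[num_inputs, num_outputs])

        def call(self, inputs):
            return tf.matmul(inputs, self.kernel)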
After creating the custom layer object layer, define a [10, 5] all-ones matrix as the input; calling the layer on this input is what triggers build():
input = tf.ones([10, 5])
test = layer(input) # Calling the layer `.builds` it.
As the code shows, build() receives an input_shape argument, which is then passed on to add_weight(). The signature of add_weight is:
def add_weight(self,
               name,              # String, the name for the weight variable.
               shape,             # The shape tuple of the weight.
               dtype=None,        # The dtype of the weight.
               initializer=None,  # An Initializer instance (callable).
               regularizer=None,  # An optional Regularizer instance.
               trainable=True,    # A boolean, whether the weight should
                                  # be trained via backprop or not (assuming
                                  # that the layer itself is also trainable).
               constraint=None):  # An optional Constraint instance.
    # Adds a weight variable to the layer.
    # Returns: the created weight variable.
add_weight() is a method inherited from Layer that adds a weight variable to the layer. The trainable argument controls whether the weight is trainable; if trainable==True, self._trainable_weights.append(weight) is executed. The build() above only passes the name and the shape.
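For illustration, the other arguments can be passed as well; the variant below is a hypothetical sketch (not part of the original example) that sets an explicit initializer and adds a trainable bias through a second add_weight call:
    class MyDenseLayerWithBias(tf.keras.layers.Layer):
        def __init__(self, num_outputs):
            super(MyDenseLayerWithBias, self).__init__()
            self.num_outputs = num_outputs

        def build(self, input_shape):
            self.kernel = self.add_weight("kernel",
                                          shape=[int(input_shape[-1]), self.num_outputs],
                                          initializer=tf.keras.initializers.GlorotUniform(),
                                          trainable=True)  # appended to _trainable_weights
            self.bias = self.add_weight("bias",
                                        shape=[self.num_outputs],
                                        initializer="zeros",
                                        trainable=True)

        def call(self, inputs):
            return tf.matmul(inputs, self.kernel) + self.bias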
In a matrix product, the number of columns of the first matrix must equal the number of rows of the second, e.g. (n, k) * (k, m) gives (n, m), so input_shape[-1] supplies the row count of the kernel and num_outputs its column count.
print(layer.kernel)
<tf.Variable 'my_dense_layer_13/kernel:0' shape=(5, 10) dtype=float32, numpy=
array([[-0.16188782, -0.21436638, -0.23779735, -0.5654684 , 0.3930928 ,
-0.4126814 , 0.33268565, 0.18108195, -0.48489177, -0.08284235],
[-0.02564168, 0.549053 , 0.42339212, 0.3485728 , -0.0736267 ,
0.5685448 , 0.27400726, 0.59572273, -0.4207679 , 0.19071192],
[-0.41549557, -0.15215197, -0.07686222, -0.16538733, 0.1426844 ,
0.26849395, 0.03620464, 0.07866323, 0.32265216, -0.15471825],
[-0.01316679, -0.44710308, 0.2655834 , 0.21193522, 0.5465316 ,
-0.1434204 , -0.35253885, -0.43908924, -0.5106529 , 0.2494039 ],
[ 0.52400905, -0.5664908 , 0.37424153, 0.507786 , 0.5197385 ,
-0.00330818, 0.03005803, -0.62411946, 0.2804129 , -0.2383785 ]],
dtype=float32)>
Finally, calling the layer returns the matrix product of the input and the kernel.
test = layer(input)
test
<tf.Tensor: shape=(10, 10), dtype=float32, numpy=
array([[-0.09218282, -0.8310592 , 0.74855745, 0.3374383 , 1.5284207 ,
0.27762878, 0.32041672, -0.20774078, -0.8132475 , -0.03582329],
[-0.09218282, -0.8310592 , 0.74855745, 0.3374383 , 1.5284207 ,
0.27762878, 0.32041672, -0.20774078, -0.8132475 , -0.03582329],
[-0.09218282, -0.8310592 , 0.74855745, 0.3374383 , 1.5284207 ,
0.27762878, 0.32041672, -0.20774078, -0.8132475 , -0.03582329],
[-0.09218282, -0.8310592 , 0.74855745, 0.3374383 , 1.5284207 ,
0.27762878, 0.32041672, -0.20774078, -0.8132475 , -0.03582329],
[-0.09218282, -0.8310592 , 0.74855745, 0.3374383 , 1.5284207 ,
0.27762878, 0.32041672, -0.20774078, -0.8132475 , -0.03582329],
[-0.09218282, -0.8310592 , 0.74855745, 0.3374383 , 1.5284207 ,
0.27762878, 0.32041672, -0.20774078, -0.8132475 , -0.03582329],
[-0.09218282, -0.8310592 , 0.74855745, 0.3374383 , 1.5284207 ,
0.27762878, 0.32041672, -0.20774078, -0.8132475 , -0.03582329],
[-0.09218282, -0.8310592 , 0.74855745, 0.3374383 , 1.5284207 ,
0.27762878, 0.32041672, -0.20774078, -0.8132475 , -0.03582329],
[-0.09218282, -0.8310592 , 0.74855745, 0.3374383 , 1.5284207 ,
0.27762878, 0.32041672, -0.20774078, -0.8132475 , -0.03582329],
[-0.09218282, -0.8310592 , 0.74855745, 0.3374383 , 1.5284207 ,
0.27762878, 0.32041672, -0.20774078, -0.8132475 , -0.03582329]],
dtype=float32)>
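As a quick sanity check (a small sketch, not from the original), the same result can be reproduced by multiplying the input by the kernel directly, and the layer's trainable variables can be listed:
    manual = tf.matmul(input, layer.kernel)       # the same computation call() performs
    print(tf.reduce_all(tf.equal(manual, test)))  # tf.Tensor(True, shape=(), dtype=bool)
    print([v.name for v in layer.trainable_variables])  # only the kernel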
Defining your own network model mainly means defining each layer it needs. Several layers can be combined into a block, just as every residual block in ResNet is a combination of convolution and batch normalization; layers and blocks can be nested and composed with each other. For example, the ResNet identity block defined in the official documentation looks like this:
class ResnetIdentityBlock(tf.keras.Model):
    def __init__(self, kernel_size, filters):
        super(ResnetIdentityBlock, self).__init__(name='')
        filters1, filters2, filters3 = filters

        self.conv2a = tf.keras.layers.Conv2D(filters1, (1, 1))
        self.bn2a = tf.keras.layers.BatchNormalization()

        self.conv2b = tf.keras.layers.Conv2D(filters2, kernel_size, padding='same')
        self.bn2b = tf.keras.layers.BatchNormalization()

        self.conv2c = tf.keras.layers.Conv2D(filters3, (1, 1))
        self.bn2c = tf.keras.layers.BatchNormalization()

    def call(self, input_tensor, training=False):
        x = self.conv2a(input_tensor)
        x = self.bn2a(x, training=training)
        x = tf.nn.relu(x)

        x = self.conv2b(x)
        x = self.bn2b(x, training=training)
        x = tf.nn.relu(x)

        x = self.conv2c(x)
        x = self.bn2c(x, training=training)

        x += input_tensor
        return tf.nn.relu(x)
Inheriting from keras.Model gives the block Model.fit, Model.evaluate, and Model.save (see Custom Keras layers and models for details).
block = ResnetIdentityBlock(1, [1, 2, 3])
The arguments passed in are the kernel size (which affects the spatial size of the output feature map) and the list of filter counts (which determines the number of output channels of each convolution).
print(block.layers)
[<Conv2D ...>, <BatchNormalization ...>, <Conv2D ...>, <BatchNormalization ...>, <Conv2D ...>, <BatchNormalization ...>]
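Since the block is a keras.Model, the inherited training API can be used on it directly; the following is only a rough sketch with made-up random data (dummy_x, dummy_y and the checkpoint path are illustrative):
    block.compile(optimizer='adam', loss='mse')
    dummy_x = tf.random.normal([8, 2, 3, 3])   # batch of 8 small 2x3 "images" with 3 channels
    dummy_y = tf.random.normal([8, 2, 3, 3])   # targets with the same shape as the block output
    block.fit(dummy_x, dummy_y, epochs=1)
    block.save_weights('./resnet_block_ckpt')  # illustrative path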
input = tf.random.normal([1, 2, 3, 3],mean=2,stddev=0.5)
resnet = block(input)
Inspecting input and resnet, they looked the same, which seemed odd; shouldn't the tensor change after passing through the residual block? It turned out that training defaults to False in the call signature; after changing it to True, the output does change.
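A small check (sketch) makes the flag's effect visible: with training=True, BatchNormalization normalizes with the statistics of the current batch, while with training=False it uses its (here still untrained) moving averages, so the two outputs differ:
    out_infer = block(input, training=False)
    out_train = block(input, training=True)
    print(tf.reduce_max(tf.abs(out_train - out_infer)))  # non-zero difference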
Similarly, tf.keras.Sequential achieves the same thing with much less code, since the layers are simply called one after another. Unless you need to design a more complex model of your own, you can usually just stack the layers with Sequential. The code below implements the same structure as the custom class above:
my_seq = tf.keras.Sequential([tf.keras.layers.Conv2D(1, (1, 1),
                                                     input_shape=(None, None, 3)),
                              tf.keras.layers.BatchNormalization(),
                              tf.keras.layers.Conv2D(2, 1, padding='same'),
                              tf.keras.layers.BatchNormalization(),
                              tf.keras.layers.Conv2D(3, (1, 1)),
                              tf.keras.layers.BatchNormalization()])
my_seq(tf.zeros([1, 2, 3, 3]))
Calling either of the following statements gives the same result (apart from the layer names):
block.summary()
my_seq.summary()
Model: "sequential_13"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_54 (Conv2D)           (None, None, None, 1)     4
_________________________________________________________________
batch_normalization_54 (Batc (None, None, None, 1)     4
_________________________________________________________________
conv2d_55 (Conv2D)           (None, None, None, 2)     4
_________________________________________________________________
batch_normalization_55 (Batc (None, None, None, 2)     8
_________________________________________________________________
conv2d_56 (Conv2D)           (None, None, None, 3)     9
_________________________________________________________________
batch_normalization_56 (Batc (None, None, None, 3)     12
=================================================================
Total params: 41
Trainable params: 29
Non-trainable params: 12
_________________________________________________________________