I recently worked on a font-recognition competition and project that involved image matching. Triplet Loss was the first approach that came to mind, and it produced good results. The last time I used Triplet Loss was back in 2018 when reproducing FaceNet, but that was with TensorFlow v1.
This post organizes and summarizes the algorithm and code. Since OCR and object detection have already been used to crop out the individual characters, the code here only needs to identify the font of a single character. The code is adapted from the official Keras example, with the code and logic modified for easier use.
With the corresponding TensorFlow environment configured, the code in this post is verified to work.
The idea behind Triplet Loss is straightforward. As shown in the figure above, the Anchor and the Positive belong to the same font, while the Anchor and the Negative belong to different fonts. The goal of Triplet Loss is to pull the Anchor and the Positive closer together and push the Anchor and the Negative farther apart. Its formula is:
$$L(A,P,N)=\max\Big(\lVert f(A)-f(P)\rVert^2-\lVert f(A)-f(N)\rVert^2+\mathrm{margin},\ 0\Big)$$
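As a quick sanity check, the formula can be sketched directly in NumPy. This is a minimal illustration, not the training code: `f_a`, `f_p`, and `f_n` stand for the embedding vectors f(A), f(P), and f(N).

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, margin=0.5):
    # Squared L2 distances between anchor-positive and anchor-negative embeddings
    ap = np.sum((f_a - f_p) ** 2, axis=-1)
    an = np.sum((f_a - f_n) ** 2, axis=-1)
    # Hinge: only penalize triplets where the positive is not closer
    # than the negative by at least `margin`
    return np.maximum(ap - an + margin, 0.0)

f_a = np.array([[0.0, 0.0]])
f_p = np.array([[0.1, 0.0]])   # close to the anchor
f_n = np.array([[2.0, 0.0]])   # far from the anchor
print(triplet_loss(f_a, f_p, f_n))  # → [0.] : this triplet already satisfies the margin
```

When the triplet is violated (the negative closer than the positive), the loss becomes positive, which is what drives the embeddings apart during training.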
tensorflow-gpu==2.6.2
keras==2.6.0 (installing TensorFlow automatically installs the matching Keras version)
Three NumPy arrays need to be prepared: `anchor_arr`, `positive_arr`, and `negative_arr`. All three have shape $(N, W, H, C)$. This post uses $100\times100$ RGB images, so the input shape is $(N, 100, 100, 3)$. Download link: data (the upload is configured to be downloadable without points; if you cannot download it for free, send me a private message).
If you just want to get the model running end to end, you can generate random data:
import numpy as np

anchor_arr = np.random.randint(0, 255, size=(200, 100, 100, 3), dtype='u1')
positive_arr = np.random.randint(0, 255, size=(200, 100, 100, 3), dtype='u1')
negative_arr = np.random.randint(0, 255, size=(200, 100, 100, 3), dtype='u1')
To be updated…
import numpy as np
import tensorflow as tf

# Input image size
target_shape = (100, 100)

def load_data(anchor_path="anchor_arr.npy",
              positive_path="positive_arr.npy",
              negative_path="negative_arr.npy"):
    """
    anchor_path, positive_path, negative_path: paths to the training data
    """
    anchor_arr = np.load(anchor_path)
    positive_arr = np.load(positive_path)
    negative_arr = np.load(negative_path)
    image_count = len(anchor_arr)
    dataset = tf.data.Dataset.zip((tf.data.Dataset.from_tensor_slices(anchor_arr),
                                   tf.data.Dataset.from_tensor_slices(positive_arr),
                                   tf.data.Dataset.from_tensor_slices(negative_arr)))
    print("Dataset.shuffle...")
    dataset = dataset.shuffle(buffer_size=1024)
    print("split dataset...")
    # 80/20 train/validation split
    train_dataset = dataset.take(round(image_count * 0.8))
    val_dataset = dataset.skip(round(image_count * 0.8))
    b_size = 128
    train_dataset = train_dataset.batch(b_size, drop_remainder=False)
    train_dataset = train_dataset.prefetch(8)
    val_dataset = val_dataset.batch(b_size, drop_remainder=False)
    val_dataset = val_dataset.prefetch(8)
    return train_dataset, val_dataset

train_dataset, val_dataset = load_data()
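To confirm the pipeline yields what the network expects, you can build the same kind of zipped dataset from small random arrays and inspect one batch. This is a quick standalone check, not part of the training flow:

```python
import numpy as np
import tensorflow as tf

a = np.random.randint(0, 255, size=(10, 100, 100, 3), dtype='u1')
p = np.random.randint(0, 255, size=(10, 100, 100, 3), dtype='u1')
n = np.random.randint(0, 255, size=(10, 100, 100, 3), dtype='u1')

# Same structure as load_data(): a dataset of (anchor, positive, negative) triplets
ds = tf.data.Dataset.zip((tf.data.Dataset.from_tensor_slices(a),
                          tf.data.Dataset.from_tensor_slices(p),
                          tf.data.Dataset.from_tensor_slices(n))).batch(4)

anchor_b, positive_b, negative_b = next(iter(ds))
print(anchor_b.shape)  # (4, 100, 100, 3)
```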
The embedding produced in this step is the final embedding representation of the image. This post uses ResNet50 pretrained weights; download link for the weights file: resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import resnet

def base_model():
    base_cnn = resnet.ResNet50(weights="./matching_model/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5",
                               input_shape=target_shape + (3,), include_top=False)
    flatten = layers.Flatten()(base_cnn.output)
    dense1 = layers.Dense(256, activation="relu")(flatten)
    dense1 = layers.BatchNormalization()(dense1)
    dense2 = layers.Dense(256, activation="relu")(dense1)
    dense2 = layers.BatchNormalization()(dense2)
    output = layers.Dense(256)(dense2)
    embedding = Model(base_cnn.input, output, name="Embedding")
    # Freeze everything before conv5_block1_out; fine-tune only the last ResNet block
    trainable = False
    for layer in base_cnn.layers:
        if layer.name == "conv5_block1_out":
            trainable = True
        layer.trainable = trainable
    return embedding
embedding = base_model()
DistanceLayer subclasses Keras `layers.Layer` and takes the embeddings as its inputs. DistanceLayer adds no extra parameters on top of the embedding model, which can be verified with `siamese_network.summary()`.
class DistanceLayer(layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def call(self, anchor, positive, negative):
        # Squared L2 distances: anchor-positive and anchor-negative
        ap_distance = tf.reduce_sum(tf.square(anchor - positive), -1)
        an_distance = tf.reduce_sum(tf.square(anchor - negative), -1)
        return (ap_distance, an_distance)
anchor_input = layers.Input(name="anchor", shape=target_shape + (3,))
positive_input = layers.Input(name="positive", shape=target_shape + (3,))
negative_input = layers.Input(name="negative", shape=target_shape + (3,))

distances = DistanceLayer()(
    embedding(resnet.preprocess_input(anchor_input)),
    embedding(resnet.preprocess_input(positive_input)),
    embedding(resnet.preprocess_input(negative_input)),
)

siamese_network = Model(
    inputs=[anchor_input, positive_input, negative_input], outputs=distances
)
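To see that the network's two outputs are per-sample scalar distances, you can run a structurally identical network on random data. This sketch swaps in a tiny stand-in embedding (pooling plus one Dense layer) for the ResNet50 backbone, so it runs without the pretrained weights file:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

target_shape = (100, 100)

# Tiny stand-in embedding instead of the ResNet50-based model
inp = layers.Input(shape=target_shape + (3,))
x = layers.GlobalAveragePooling2D()(inp)
tiny_embedding = Model(inp, layers.Dense(8)(x))

class DistanceLayer(layers.Layer):
    def call(self, anchor, positive, negative):
        ap = tf.reduce_sum(tf.square(anchor - positive), -1)
        an = tf.reduce_sum(tf.square(anchor - negative), -1)
        return (ap, an)

a_in = layers.Input(shape=target_shape + (3,))
p_in = layers.Input(shape=target_shape + (3,))
n_in = layers.Input(shape=target_shape + (3,))
net = Model([a_in, p_in, n_in],
            DistanceLayer()(tiny_embedding(a_in),
                            tiny_embedding(p_in),
                            tiny_embedding(n_in)))

batch = [np.random.rand(4, 100, 100, 3).astype("float32") for _ in range(3)]
ap_d, an_d = net(batch)
print(ap_d.shape, an_d.shape)  # one distance per sample: (4,) (4,)
```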
Since Triplet Loss has no standard y_true and y_pred, a custom model is required. See the comments for details.
from tensorflow.keras import metrics

class SiameseModel(Model):
    def __init__(self, siamese_network, margin=0.5):
        super(SiameseModel, self).__init__()
        self.siamese_network = siamese_network
        self.margin = margin
        self.loss_tracker = metrics.Mean(name="loss")

    def call(self, inputs):
        return self.siamese_network(inputs)

    def train_step(self, data):
        # Create a GradientTape to record how the loss is computed from the
        # trainable variables (weights and trainable parameters)
        with tf.GradientTape() as tape:
            loss = self._compute_loss(data)
        # Compute the gradients of the loss w.r.t. the trainable variables
        gradients = tape.gradient(loss, self.siamese_network.trainable_weights)
        # Apply the gradients with the configured optimizer
        self.optimizer.apply_gradients(
            zip(gradients, self.siamese_network.trainable_weights)
        )
        # Update and return the training loss metric
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}

    def test_step(self, data):
        loss = self._compute_loss(data)
        # Update and return the validation loss metric
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}

    def _compute_loss(self, data):
        # Distances between the anchor and the positive/negative samples
        ap_distance, an_distance = self.siamese_network(data)
        # Turn the distances into the triplet loss
        loss = tf.maximum(ap_distance - an_distance + self.margin, 0.0)
        return loss

    @property
    def metrics(self):
        # List the metrics here so that `reset_states()` is called automatically
        return [self.loss_tracker]
from tensorflow.keras import optimizers

siamese_model = SiameseModel(siamese_network)
# Use the Adam optimizer
siamese_model.compile(optimizer=optimizers.Adam(0.0001))
# Train
siamese_model.fit(train_dataset, epochs=5, validation_data=val_dataset)
Output:
Epoch 1/5
187/187 [==============================] - 26s 140ms/step - loss: 0.0579
Epoch 2/5
187/187 [==============================] - 26s 138ms/step - loss: 0.0489
Epoch 3/5
187/187 [==============================] - 26s 139ms/step - loss: 0.0448
Epoch 4/5
187/187 [==============================] - 26s 138ms/step - loss: 0.0421
Epoch 5/5
187/187 [==============================] - 26s 140ms/step - loss: 0.0365
CPU times: user 3min 19s, sys: 51 s, total: 4min 10s
Wall time: 3min 35s
You can train for more epochs to get better results.
We only need to save the model that generates the embedding, and we save the entire model (not just the weights).
# Save in the SavedModel (assets folder) format
embedding.save('matching_model/embedding')
# Or save in h5 format
embedding.save("embedding.h5")
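At inference time, two character crops are compared by the distance between their embeddings. The sketch below uses a tiny stand-in model so it runs without the saved weights; in practice you would load the trained model with `tf.keras.models.load_model("matching_model/embedding")`, and the threshold value here is illustrative and should be tuned on validation data:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Stand-in for the trained embedding model; in practice:
# embedding = tf.keras.models.load_model("matching_model/embedding")
inp = layers.Input(shape=(100, 100, 3))
x = layers.GlobalAveragePooling2D()(inp)
embedding = Model(inp, layers.Dense(8)(x))

def font_distance(img_a, img_b):
    """Squared L2 distance between the embeddings of two character crops.

    Note: during training, resnet.preprocess_input was applied outside the
    embedding model, so real crops should be preprocessed the same way
    before being embedded here.
    """
    emb = embedding(np.stack([img_a, img_b]).astype("float32"))
    return float(tf.reduce_sum(tf.square(emb[0] - emb[1])))

img1 = np.random.rand(100, 100, 3)
img2 = np.random.rand(100, 100, 3)
d = font_distance(img1, img2)
# A small distance suggests the same font; 0.5 is only a placeholder threshold
print("same font" if d < 0.5 else "different font")
```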
To be updated…