

In the modern deep learning era, neural networks are good at almost every task, but they rely on large amounts of data to perform well. For certain problems, such as face recognition and signature verification, we can't always count on getting more data. To solve these kinds of tasks, we can use a neural network architecture called Siamese networks.

A Siamese network can make good predictions from only a few images per class. This ability to learn from very little data has made Siamese networks increasingly popular in recent years. In this article, we will explore what they are and how to build a signature verification system with PyTorch using a Siamese network.

What are Siamese Networks?

Siamese network used in SigNet

A Siamese neural network is a class of neural network architectures that contain two or more identical subnetworks. "Identical" here means that they have the same configuration with the same parameters and weights, and parameter updates are mirrored across the subnetworks. The network measures the similarity of its inputs by comparing their feature vectors, which makes it useful in applications such as face recognition and signature verification.

Traditionally, a neural network learns to predict multiple classes. This poses a problem when we need to add classes to, or remove classes from, the data: we have to update the neural network and retrain it on the whole dataset. Deep neural networks also need a large volume of data to train on. Siamese neural networks (SNNs), on the other hand, learn a similarity function. Thus, we can train one to tell whether two images are the same (which is what we will do here). This lets us classify new classes of data without training the network again.
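
To make the last point concrete, here is a minimal sketch of how a trained similarity model can recognize a class it was never trained on. You store a few reference embeddings per class and assign a query to the class with the nearest one. This is not the article's code. The function name `classify_by_similarity` is hypothetical, and the toy 2-D vectors stand in for real network outputs.

```python
import torch


def classify_by_similarity(query, gallery):
    """Return (label, distance) of the gallery class closest to `query`.

    query:   1-D tensor, the embedding of a new image.
    gallery: dict mapping a class label to a 2-D tensor of reference
             embeddings (one row per reference image of that class).
    """
    best_label, best_dist = None, float("inf")
    for label, refs in gallery.items():
        # Euclidean distance from the query to every reference of this class
        dist = torch.linalg.norm(refs - query, dim=1).min().item()
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Toy 2-D vectors standing in for the trained network's embeddings
gallery = {
    "alice": torch.tensor([[0.0, 0.0], [0.1, 0.0]]),
    "bob": torch.tensor([[5.0, 5.0]]),
}
print(classify_by_similarity(torch.tensor([0.2, 0.1]), gallery))

# A new class only needs reference embeddings; the network is not retrained
gallery["carol"] = torch.tensor([[-4.0, 3.0]])
print(classify_by_similarity(torch.tensor([-3.9, 3.1]), gallery))
```

Adding "carol" only changes the gallery dictionary. The embedding network itself stays untouched.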

Pros and Cons of Siamese Networks:

The main advantages of Siamese networks are:


  • More robust to class imbalance: with the aid of one-shot learning, a few images per class are sufficient for a Siamese network to recognize images of that class in the future.


  • Nice to ensemble with the best classifier: since its learning mechanism is somewhat different from classification, simply averaging it with a classifier can do much better than averaging two correlated supervised models (e.g., GBM and random forest classifiers).


  • Learning from semantic similarity: Siamese networks focus on learning embeddings (in the deeper layers) that place the same classes/concepts close together. Hence, they can learn semantic similarity.

The downsides of Siamese networks are:


  • Needs more training time than normal networks: Siamese networks learn from pairs, and the number of pairs grows quadratically with the number of samples (if we want to see all the available information). As a result, they are slower to train than the normal classification type of learning (pointwise learning).


  • Doesn't output probabilities: since training involves pairwise learning, the network doesn't output prediction probabilities. It outputs only the distance from each class.
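
To see why the pair count becomes the bottleneck, the short sketch below compares two quantities. The first is the number of samples per epoch for pointwise classification. The second is the number of unordered image pairs, n(n − 1)/2. It assumes every possible pair is used. In practice pairs are usually sampled, so treat the pairwise figure as an upper bound.

```python
from itertools import combinations


def samples_per_epoch(n_images):
    """Return (pointwise, pairwise) sample counts for n_images training images."""
    pointwise = n_images                        # classification: one sample per image
    pairwise = n_images * (n_images - 1) // 2   # every unordered pair of distinct images
    return pointwise, pairwise


for n in (10, 100, 1_000):
    print(n, samples_per_epoch(n))

# Check the closed form against explicit enumeration of all pairs
assert len(list(combinations(range(100), 2))) == samples_per_epoch(100)[1]
```

For 1,000 images, that is 499,500 possible pairs versus 1,000 pointwise samples.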


Loss functions used in Siamese Networks:

Contrastive Loss, image created by the author

Since training Siamese networks involves pairwise learning, the usual cross-entropy loss cannot be used in this case. Two loss functions are mainly used to train Siamese networks: triplet loss and contrastive loss.


Triplet Loss: a loss function in which a baseline (anchor) input is compared to a positive (truthy) input and a negative (falsy) input. The distance from the anchor input to the positive input is minimized, and the distance from the anchor input to the negative input is maximized.

$$L = \max\left(\lVert f_a - f_p \rVert_2^2 - \lVert f_a - f_n \rVert_2^2 + \alpha,\ 0\right)$$

In the above equation, alpha (α) is a margin term used to "stretch" the distance difference between similar and dissimilar pairs in the triplet. The terms f_a, f_p, f_n are the feature embeddings of the anchor, positive, and negative images.


During training, an image triplet (anchor image, negative image, positive image) is fed into the model as a single sample. The idea behind this is that the distance between the anchor and positive images should be smaller than the distance between the anchor and negative images.
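
A minimal PyTorch sketch of the triplet loss max(‖f_a − f_p‖² − ‖f_a − f_n‖² + α, 0), averaged over a batch, follows. The margin `alpha=0.2` is an assumed value for illustration. PyTorch also ships `torch.nn.TripletMarginLoss`, but it uses the non-squared p-norm distance rather than the squared distance used here.

```python
import torch
import torch.nn.functional as F


def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Batch-averaged triplet loss with squared Euclidean distances.

    f_a, f_p, f_n: (batch, dim) embeddings of anchor, positive, negative images.
    alpha: margin by which the negative must be farther than the positive (assumed value).
    """
    d_pos = (f_a - f_p).pow(2).sum(dim=1)  # ||f_a - f_p||^2 per triplet
    d_neg = (f_a - f_n).pow(2).sum(dim=1)  # ||f_a - f_n||^2 per triplet
    return F.relu(d_pos - d_neg + alpha).mean()  # relu(x) == max(x, 0)


f_a = torch.tensor([[0.0, 0.0], [0.0, 0.0]])
f_p = torch.tensor([[1.0, 0.0], [1.0, 0.0]])
f_n = torch.tensor([[2.0, 0.0], [1.0, 0.0]])  # 2nd negative is as close as the positive
print(triplet_loss(f_a, f_p, f_n).item())  # ≈ 0.1: only the 2nd triplet violates the margin
```

The first triplet already satisfies the margin, so it contributes zero loss. Only the "hard" second triplet produces a gradient.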

Contrastive Loss: a popular loss function that is widely used today. It is a distance-based loss, as opposed to more conventional error-prediction losses. It is used to learn embeddings in which two similar points have a low Euclidean distance and two dissimilar points have a large Euclidean distance.

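$$L(W, Y, X_1, X_2) = (1 - Y)\,\frac{1}{2}\,(D_W)^2 + Y\,\frac{1}{2}\,\left\{\max(0,\ m - D_W)\right\}^2$$

Here Y = 0 when the pair is similar and Y = 1 when it is dissimilar. The margin m > 0 limits the dissimilar term: a dissimilar pair adds to the loss only while its distance is smaller than m.

And we define D_W, which is just the Euclidean distance between the network outputs for the two images, as:

$$D_W(X_1, X_2) = \lVert G_W(X_1) - G_W(X_2) \rVert_2$$

G_W is the output of our network (with weights W) for one image.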


Signature verification with Siamese Networks:

Siamese Network for Signature Verification, image created by the author

Since Siamese networks are mostly used in verification systems such as face recognition and signature verification, let's implement a signature verification system using a Siamese neural network in PyTorch.


Dataset and Preprocessing the Dataset:

Signatures in the ICDAR dataset, image created by the author

We are going to use the ICDAR 2011 dataset, which consists of genuine and forged signatures of Dutch users. The dataset is split into train and test folders. Inside each folder, every user has a folder of genuine signatures and a folder of forged signatures. The labels of the dataset are available as CSV files. You can download the dataset from here.

To feed this raw data into our neural network, we have to turn all the images into tensors and attach the labels from the CSV files to the images. To do this, we can write a custom dataset class in PyTorch.
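
A minimal sketch of such a dataset class follows. It assumes each CSV row holds `image1,image2,label` with no header, with image paths relative to a root folder and an integer label (0 or 1). The actual layout of the ICDAR CSV files may differ. The names `SignaturePairDataset`, `read_pairs_csv`, and `load_grayscale` are hypothetical.

```python
import csv
import os

import torch
from PIL import Image
from torch.utils.data import Dataset


def read_pairs_csv(csv_path):
    """Read (image1, image2, label) rows from a header-less CSV (assumed layout)."""
    with open(csv_path, newline="") as f:
        return [(row[0], row[1], int(row[2])) for row in csv.reader(f) if row]


def load_grayscale(path):
    # Signatures carry no useful colour information, so use 1-channel "L" mode
    return Image.open(path).convert("L")


class SignaturePairDataset(Dataset):
    """Yields (image1, image2, label), with label a float tensor of shape (1,)."""

    def __init__(self, pairs, root_dir, transform=None, loader=load_grayscale):
        self.pairs = pairs            # list of (relative_path1, relative_path2, int_label)
        self.root_dir = root_dir
        self.transform = transform    # e.g. resize + ToTensor
        self.loader = loader          # injectable so the class is easy to test

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, index):
        path1, path2, label = self.pairs[index]
        img1 = self.loader(os.path.join(self.root_dir, path1))
        img2 = self.loader(os.path.join(self.root_dir, path2))
        if self.transform is not None:
            img1 = self.transform(img1)
            img2 = self.transform(img2)
        return img1, img2, torch.tensor([label], dtype=torch.float32)
```

A training set would then be built from `read_pairs_csv(...)` for the train CSV plus the train folder. The file names are not given in the article.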


After preprocessing, we load the dataset in PyTorch using the DataLoader class. We use torchvision transforms to resize the images to 105 × 105 pixels (height and width) to keep the computation manageable.


Neural Network Architecture:

Now let's create the neural network in PyTorch. We will use an architecture similar to the one described in the SigNet paper.
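
A sketch of one SigNet-style branch, following the layer sizes described below, looks like this. The following details are assumptions chosen to make the shapes work for a 1 × 105 × 105 input: the paddings, the 3 × 3 max-pooling with stride 2, the ReLU placement, the local response normalization hyperparameters, and the dropout rates (0.3 and 0.5). The class name `SigNetEmbedding` is hypothetical.

```python
import torch
import torch.nn as nn


class SigNetEmbedding(nn.Module):
    """SigNet-style branch mapping a 1 x 105 x 105 signature to a 128-d vector."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 96, kernel_size=11, stride=1),          # 105 -> 95
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(5, alpha=1e-4, beta=0.75, k=2),
            nn.MaxPool2d(3, stride=2),                          # 95 -> 47
            nn.Conv2d(96, 256, kernel_size=5, padding=2),       # 47 -> 47
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(5, alpha=1e-4, beta=0.75, k=2),
            nn.MaxPool2d(3, stride=2),                          # 47 -> 23
            nn.Dropout(p=0.3),
            # conv3 and conv4 follow each other with no pooling or normalization
            nn.Conv2d(256, 384, kernel_size=3, padding=1),      # 23 -> 23
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),      # 23 -> 23
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),                          # 23 -> 11
            nn.Dropout(p=0.3),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 11 * 11, 1024),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(1024, 128),                               # 128-d embedding
        )

    def forward(self, x):
        return self.fc(self.features(x))


net = SigNetEmbedding()
emb = net(torch.randn(2, 1, 105, 105))
print(emb.shape)  # torch.Size([2, 128])
```

Only one branch is defined. For a pair, the same module is simply called on both images.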


Our network is built as follows. The first convolutional layer filters the 105 × 105 input signature image with 96 kernels of size 11 × 11 with a stride of 1 pixel. The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 × 5. The third and fourth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third layer has 384 kernels of size 3 × 3 connected to the (normalized, pooled, and dropped-out) output of the second convolutional layer. The fourth convolutional layer has 256 kernels of size 3 × 3. This leads the network to learn fewer low-level features for smaller receptive fields and more features for higher-level, more abstract patterns. The first fully connected layer has 1024 neurons, and the second has 128 neurons. This means the final learned feature vector from each side of SigNet has 128 dimensions. So where is the other network?

Since the weights are constrained to be identical for both networks, we use one model and feed it the two images in succession. We then calculate the loss value using the outputs for both images and backpropagate. This saves a lot of memory, because only one copy of the weights is stored, and it also improves computational efficiency.
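
A minimal sketch of this weight-sharing trick uses a tiny `nn.Linear` layer as a stand-in for the embedding network. Both images pass through the same module. A single `backward()` then accumulates the gradients of both branches into the one shared set of weights. The squared-distance loss is a toy choice for the demonstration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(4, 2)  # hypothetical stand-in for the embedding network

x1 = torch.randn(3, 4)  # first image of each of 3 pairs (flattened)
x2 = torch.randn(3, 4)  # second image of each pair

# Feed both images through the SAME module in succession: one set of weights
emb1 = net(x1)
emb2 = net(x2)

# Toy pairwise loss: mean squared distance between the two embeddings
loss = (emb1 - emb2).pow(2).sum(dim=1).mean()
loss.backward()  # gradients from both branches accumulate in the shared parameters

n_params = sum(p.numel() for p in net.parameters())
print(n_params)       # 10: 4*2 weights + 2 biases, stored once for both branches
print(net.bias.grad)  # zero: the bias cancels in emb1 - emb2, so the branch gradients cancel
```

The zero bias gradient shows the two branches really write into the same parameter. Their contributions are summed, and here they exactly cancel.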

Loss Function:

For this task, we will use contrastive loss, which learns embeddings in which two similar points have a low Euclidean distance and two dissimilar points have a large Euclidean distance. In PyTorch, it can be implemented as a custom loss module.
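
Below is a minimal PyTorch sketch of contrastive loss in the formulation of Hadsell et al. (2006): L = (1 − Y)·½·D_W² + Y·½·max(0, m − D_W)². In this formula, D_W is the Euclidean distance between the two embeddings. Two details are assumptions: the label convention (Y = 0 for a similar pair, Y = 1 for a dissimilar pair) and the default margin m = 2.0. If the dataset's labels mean the opposite, swap the two terms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveLoss(nn.Module):
    """Contrastive loss; label 0 = similar pair, 1 = dissimilar pair (assumed)."""

    def __init__(self, margin=2.0):
        super().__init__()
        self.margin = margin

    def forward(self, emb1, emb2, label):
        # D_W: Euclidean distance per pair (pairwise_distance adds a tiny eps for stability)
        dist = F.pairwise_distance(emb1, emb2)
        label = label.view(-1)  # (batch, 1) -> (batch,)
        similar_term = (1 - label) * dist.pow(2)
        dissimilar_term = label * torch.clamp(self.margin - dist, min=0.0).pow(2)
        return 0.5 * (similar_term + dissimilar_term).mean()


criterion = ContrastiveLoss(margin=2.0)
emb1 = torch.tensor([[0.0, 0.0], [0.0, 0.0]])
emb2 = torch.tensor([[0.5, 0.0], [0.5, 0.0]])
label = torch.tensor([[0.0], [1.0]])  # first pair similar, second dissimilar
print(criterion(emb1, emb2, label).item())  # ≈ 0.625 = 0.5 * mean(0.25, 2.25)
```

At the same distance of 0.5, the dissimilar pair is penalized far more (2.25 vs 0.25). The network is pushed to move it out past the margin.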


Training the Network:

The training process of a Siamese network is as follows:


  • Initialize the network, loss function, and optimizer (we will use Adam for this project).

  • Pass the first image of the image pair through the network.

  • Pass the second image of the image pair through the network.

  • Calculate the loss using the outputs from the first and second images.

  • Backpropagate the loss to calculate the gradients of our model.

  • Update the weights using the optimizer.

  • Save the model.
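
The steps above can be sketched as a generic training function, numbered to match the list. The learning rate `lr=1e-3` and the save path are assumptions. The toy run uses random 4 × 4 "images", random labels, and a tiny linear model, so its loss values are not meaningful. It only shows the mechanics.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader


def train(model, criterion, loader, epochs=20, lr=1e-3, save_path="siamese.pt"):
    """Train a Siamese model on (img1, img2, label) batches; return mean loss per epoch."""
    # 1. network and loss are passed in; the optimizer is Adam
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    model.train()
    for _ in range(epochs):
        total = 0.0
        for img1, img2, label in loader:
            optimizer.zero_grad()
            emb1 = model(img1)                   # 2. first image of each pair
            emb2 = model(img2)                   # 3. second image, same weights
            loss = criterion(emb1, emb2, label)  # 4. loss from both outputs
            loss.backward()                      # 5. backpropagate
            optimizer.step()                     # 6. update the weights
            total += loss.item()
        history.append(total / len(loader))
    torch.save(model.state_dict(), save_path)    # 7. save the model
    return history


def contrastive(emb1, emb2, label, margin=2.0):
    # Label 0 = similar pair, 1 = dissimilar pair (assumed convention)
    d = F.pairwise_distance(emb1, emb2)
    label = label.view(-1)
    return 0.5 * ((1 - label) * d.pow(2) + label * F.relu(margin - d).pow(2)).mean()


# Toy run: random 4x4 "images" and a tiny linear model stand in for the real ones
torch.manual_seed(0)
pairs = [(torch.randn(1, 4, 4), torch.randn(1, 4, 4), torch.randint(0, 2, (1,)).float())
         for _ in range(64)]
loader = DataLoader(pairs, batch_size=16, shuffle=True)
model = nn.Sequential(nn.Flatten(), nn.Linear(16, 8))
save_path = os.path.join(tempfile.gettempdir(), "toy_siamese.pt")
history = train(model, contrastive, loader, epochs=5, save_path=save_path)
print(history)
```

For the real run, the model would be the SigNet-style network and the loader the signature DataLoader, with `epochs=20` as in the article.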


The model was trained for 20 epochs on Google Colab, which took an hour. The graph of the loss over time is shown below.

Graph of loss over time

Testing the model:

Now let's test our signature verification system on the test dataset:


  • Load the test dataset using PyTorch's DataLoader class.

  • Pass each image pair through the network, keeping its label.

  • Compute the Euclidean distance between the two embeddings.

  • Print the prediction based on the Euclidean distance.
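
These test steps can be sketched as follows. Two details are hypothetical: the decision threshold of 1.0, and the convention that label 1 marks a forged (dissimilar) pair. In practice the threshold would be tuned on held-out pairs. The toy check uses `nn.Flatten()` as a stand-in model, so the "embedding distance" is simply the pixel distance.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader


@torch.no_grad()
def evaluate(model, loader, threshold=1.0):
    """Return (distance, prediction, label) for every pair in `loader`.

    threshold: hypothetical cut-off; pairs farther apart are called forged.
    Assumes label 1 marks a forged (dissimilar) pair and 0 a genuine one.
    """
    model.eval()  # disable dropout for inference
    results = []
    for img1, img2, label in loader:
        dist = F.pairwise_distance(model(img1), model(img2))  # Euclidean distance
        for d, y in zip(dist.tolist(), label.view(-1).tolist()):
            prediction = "Forged" if d > threshold else "Genuine"
            results.append((d, prediction, int(y)))
    return results


# Toy check: nn.Flatten() stands in for the trained network,
# so the "embedding distance" is just the raw pixel distance
a = torch.zeros(1, 2, 2)
pairs = [(a, a.clone(), torch.tensor([0.0])), (a, a + 1.0, torch.tensor([1.0]))]
for d, pred, y in evaluate(nn.Flatten(), DataLoader(pairs, batch_size=2)):
    print(f"distance={d:.3f}  predicted={pred}  actual={'Forged' if y else 'Genuine'}")
```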


The predictions were as follows,



Conclusion:

In this article, we discussed how Siamese networks differ from normal deep learning networks and implemented a signature verification system using a Siamese network. You can find the entire code here.


Source: https://towardsdatascience.com/a-friendly-introduction-to-siamese-networks-85ab17522942

