The previous post in this series (Deep-learning-based image matching – [patch based matching 3] – learning the MNIST Siamese example) walked through Caffe's MNIST Siamese model and trained it successfully, so you should have a feel for the workflow by now. Our goal now returns to MatchNet: the authors did not release source code, but we can implement it ourselves.
Of course, many practical questions remain, such as preparing the dataset. We will solve them as we build the model: keep hold of the main thread and knock the issues down one by one.
1. Implementing the MatchNet network structure
In the first post of this series we covered MatchNet's architecture; here we implement it in Caffe. Two .prototxt files are needed: one for the model and one for the solver. Choosing the solver's parameters is probably the trickier part.
Let's start by writing the train.prototxt file.
name: "mnist_matchnet_train_test"
## data input
layer {
name: "pair_data"
type: "Data"
top: "pair_data"
top: "sim"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/siamese/mnist_siamese_train_leveldb"
batch_size: 64
}
}
layer {
name: "pair_data"
type: "Data"
top: "pair_data"
top: "sim"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/siamese/mnist_siamese_test_leveldb"
batch_size: 100
}
}
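The scale value 0.00390625 in transform_param is simply 1/256: it maps the 8-bit pixel values [0, 255] into [0, 1), which keeps activations in a well-behaved range. A one-line check:

```python
# The Data layer's scale multiplies every raw pixel value.
scale = 0.00390625
assert scale == 1 / 256          # i.e. 2**-8
print(255 * scale)  # 0.99609375 -- 8-bit pixels end up in [0, 1)
```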
layer {
name: "slice_pair"
type: "Slice"
bottom: "pair_data"
top: "data"
top: "data_p"
slice_param {
slice_dim: 1
slice_point: 1
}
}
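To see what the Slice layer does here, this is the equivalent NumPy operation (a sketch only; the real work happens inside Caffe). Each LevelDB entry stores the two patches of a pair stacked on the channel axis, so a batch arrives as one (N, 2, H, W) blob:

```python
import numpy as np

# A batch of image pairs as produced by the Data layer: (N, 2, H, W).
pair_data = np.random.rand(64, 2, 28, 28).astype(np.float32)

# Slice with slice_dim: 1, slice_point: 1 cuts the channel axis at
# index 1, yielding one (N, 1, H, W) blob for each branch.
data, data_p = np.split(pair_data, [1], axis=1)
print(data.shape, data_p.shape)  # (64, 1, 28, 28) (64, 1, 28, 28)
```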
########### convolutional layers ########
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param { ## weight and bias settings
name: "conv1_w"
lr_mult: 1
}
param {
name: "conv1_b"
lr_mult: 2
}
convolution_param { ## output channels, kernel, stride
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
name: "conv2_w"
lr_mult: 1
}
param {
name: "conv2_b"
lr_mult: 2
}
convolution_param {
num_output: 50
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
name: "conv3_w"
lr_mult: 1
}
param {
name: "conv3_b"
lr_mult: 2
}
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1 ## keeps the 3x3 convs from shrinking the small MNIST feature maps (MatchNet pads these layers too)
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
name: "conv4_w"
lr_mult: 1
}
param {
name: "conv4_b"
lr_mult: 2
}
convolution_param {
num_output: 96
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
name: "conv5_w"
lr_mult: 1
}
param {
name: "conv5_b"
lr_mult: 2
}
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv5"
top: "pool3"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
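Before moving on, it is worth checking the feature-map sizes down this tower for the 28x28 MNIST input, using Caffe's output-size formula and assuming pad 1 on the three 3x3 convolutions (without padding, conv4 would receive a 2x2 map, into which a 3x3 kernel cannot fit):

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Caffe's formula: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

size = 28                           # MNIST input
size = conv_out(size, 5)            # conv1 -> 24
size = conv_out(size, 2, stride=2)  # pool1 -> 12
size = conv_out(size, 5)            # conv2 -> 8
size = conv_out(size, 2, stride=2)  # pool2 -> 4
size = conv_out(size, 3, pad=1)     # conv3 -> 4
size = conv_out(size, 3, pad=1)     # conv4 -> 4
size = conv_out(size, 3, pad=1)     # conv5 -> 4
size = conv_out(size, 2, stride=2)  # pool3 -> 2
print(size)  # 2, so ip1 sees a 64 x 2 x 2 input
```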
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool3"
top: "ip1"
param {
name: "ip1_w"
lr_mult: 1
}
param {
name: "ip1_b"
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
name: "ip2_w"
lr_mult: 1
}
param {
name: "ip2_b"
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "feat"
type: "InnerProduct"
bottom: "ip2"
top: "feat"
param {
name: "feat_w"
lr_mult: 1
}
param {
name: "feat_b"
lr_mult: 2
}
inner_product_param {
num_output: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
####################### second branch (weights tied to the first via the shared param names) ######################
layer {
name: "conv1_p"
type: "Convolution"
bottom: "data_p"
top: "conv1_p"
param {
name: "conv1_w"
lr_mult: 1
}
param {
name: "conv1_b"
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool1_p"
type: "Pooling"
bottom: "conv1_p"
top: "pool1_p"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2_p"
type: "Convolution"
bottom: "pool1_p"
top: "conv2_p"
param {
name: "conv2_w"
lr_mult: 1
}
param {
name: "conv2_b"
lr_mult: 2
}
convolution_param {
num_output: 50
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool2_p"
type: "Pooling"
bottom: "conv2_p"
top: "pool2_p"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv3_p"
type: "Convolution"
bottom: "pool2_p"
top: "conv3_p"
param {
name: "conv3_w"
lr_mult: 1
}
param {
name: "conv3_b"
lr_mult: 2
}
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "conv4_p"
type: "Convolution"
bottom: "conv3_p"
top: "conv4_p"
param {
name: "conv4_w"
lr_mult: 1
}
param {
name: "conv4_b"
lr_mult: 2
}
convolution_param {
num_output: 96
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "conv5_p"
type: "Convolution"
bottom: "conv4_p"
top: "conv5_p"
param {
name: "conv5_w"
lr_mult: 1
}
param {
name: "conv5_b"
lr_mult: 2
}
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool3_p"
type: "Pooling"
bottom: "conv5_p"
top: "pool3_p"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "ip1_p"
type: "InnerProduct"
bottom: "pool3_p"
top: "ip1_p"
param {
name: "ip1_w"
lr_mult: 1
}
param {
name: "ip1_b"
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1_p"
type: "ReLU"
bottom: "ip1_p"
top: "ip1_p"
}
layer {
name: "ip2_p"
type: "InnerProduct"
bottom: "ip1_p"
top: "ip2_p"
param {
name: "ip2_w"
lr_mult: 1
}
param {
name: "ip2_b"
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "feat_p"
type: "InnerProduct"
bottom: "ip2_p"
top: "feat_p"
param {
name: "feat_w"
lr_mult: 1
}
param {
name: "feat_b"
lr_mult: 2
}
inner_product_param {
num_output: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "loss"
type: "ContrastiveLoss"
bottom: "feat"
bottom: "feat_p"
bottom: "sim"
top: "loss"
contrastive_loss_param {
margin: 1
}
}
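The loss is still the ContrastiveLoss from the Siamese example. A NumPy sketch of what it computes, matching Caffe's formulation averaged over the batch:

```python
import numpy as np

def contrastive_loss(feat, feat_p, sim, margin=1.0):
    # Euclidean distance between the two branch embeddings.
    d = np.linalg.norm(feat - feat_p, axis=1)
    # Matching pairs (sim = 1) are pulled together; non-matching
    # pairs (sim = 0) are pushed at least `margin` apart.
    per_pair = sim * d**2 + (1 - sim) * np.maximum(margin - d, 0.0)**2
    return per_pair.mean() / 2.0

feat   = np.array([[0.0, 0.0], [0.0, 0.0]])
feat_p = np.array([[0.0, 0.0], [0.5, 0.0]])
sim    = np.array([1.0, 0.0])   # first pair matches, second does not
print(contrastive_loss(feat, feat_p, sim))  # 0.0625
```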
2. Solver parameter settings
The network above applies MatchNet to the MNIST Siamese data: compared with the original example, we added three convolutional layers and one max-pooling layer. Now let's set the solver parameters.
# The train/test net protocol buffer definition
net: "examples/siamese/mnist_matchnet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0000
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 50000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/siamese/mnist_siamese"
# solver mode: CPU or GPU
solver_mode: GPU
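The "inv" learning-rate policy decays the rate smoothly with the iteration count. A small sketch of Caffe's formula with the values above:

```python
base_lr, gamma, power = 0.01, 0.0001, 0.75

def inv_lr(iteration):
    # Caffe's "inv" policy: lr = base_lr * (1 + gamma * iter) ^ (-power)
    return base_lr * (1.0 + gamma * iteration) ** (-power)

print(inv_lr(0))      # 0.01 at the start
print(inv_lr(50000))  # roughly 0.0026 by the end of training
```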
Finally, a small shell script launches training:
#!/usr/bin/env sh
set -e
TOOLS=./build/tools
$TOOLS/caffe train --solver=examples/siamese/mnist_match_solver.prototxt $@
As we can see, this network structure can reach higher accuracy. It may have overfit during training, though: the training loss is very small, while the test results are not quite as good.
Note that this network does not change the loss function; it keeps the corresponding part of the Siamese example. Replacing it is the next piece of work.
——————————————————————————
Your support helps me keep this blog updated and write more posts you won't find elsewhere.