Freezing layer weights when fine-tuning in Caffe

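Conclusion: when ResNet-50 is fine-tuned with no param blocks in its layer definitions, every layer is still updated. To keep a layer from being fine-tuned, add param { lr_mult: 0 } to that layer.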

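Pitfall: in train.sh, do not put comment lines between the backslash-continued option lines (--weights and so on) that follow the train command. The shell ends the command at the comment, so the -- option on the line after the comment is never passed to caffe.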

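If accuracy fails to improve, the cause may be that the fc layer has no initializer (or no learning rate). Give its weights w a xavier filler and its bias b a constant filler with value: 0, and add param { lr_mult: 1 } and the like.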

Taking ResNet-50 as an example, modify two conv layers:

layer {
  bottom: "data"
  top: "conv1"
  name: "conv1"
  type: "Convolution"
  param {  # changed here: a single param block freezes the weights w, while the bias b is still fine-tuned
    lr_mult: 0
  }
  convolution_param {
    num_output: 64
    kernel_size: 7
    pad: 3
    stride: 2
    weight_filler {
      type: "msra"
    }
    bias_term: true
  }
}

layer {
  bottom: "res2b_branch2a"
  top: "res2b_branch2b"
  name: "res2b_branch2b"
  type: "Convolution"
  param {  # changed here: lr_mult is 0, so this layer does not change during fine-tuning
    lr_mult: 0
    #decay_mult: 0
  }
  convolution_param {
    num_output: 64
    kernel_size: 3
    pad: 1
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_term: false
  }
}
layer {
  bottom: "pool5"
  top: "fc1000-f"
  name: "fc1000-f"
  type: "InnerProduct"
  # param {
  #   lr_mult: 1
  #   decay_mult: 1
  # }
  # param {
  #   lr_mult: 2
  #   decay_mult: 1
  # }
  inner_product_param {
    num_output: 1150
    weight_filler {
      type: "xavier"
      #value: 0
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

If the fc layer is not initialized, its weights update slowly, which may also hurt accuracy.
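To make the initialization concrete, here is a minimal Python sketch of what a xavier weight filler does. It assumes Caffe's default xavier behaviour of drawing uniformly from [-s, s] with s = sqrt(3 / fan_in). The function name xavier_fill, the toy num_output of 4, and the fan-in of 2048 (the pool5 output size in ResNet-50) are illustrative assumptions, not part of the original setup. For contrast, Caffe's default filler type is, as far as I know, constant with value 0, so a layer with no weight_filler would start from all-zero weights.

```python
import math
import random


def xavier_fill(num_output, fan_in, seed=0):
    """Return a num_output x fan_in weight matrix filled xavier-style.

    Each value is drawn uniformly from [-scale, scale],
    where scale = sqrt(3 / fan_in).
    """
    rng = random.Random(seed)
    scale = math.sqrt(3.0 / fan_in)
    return [[rng.uniform(-scale, scale) for _ in range(fan_in)]
            for _ in range(num_output)]


# Toy sizes: 4 outputs instead of 1150, to keep the example small.
weights = xavier_fill(num_output=4, fan_in=2048)
bound = math.sqrt(3.0 / 2048)
print("bound   : %.5f" % bound)
print("max |w| : %.5f" % max(abs(w) for row in weights for w in row))
```

The values are small but nonzero and differ per unit, which is what lets the new fc layer start learning right away.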

Fine-tune from ResNet-50-model.caffemodel. The fc layer must be renamed and its num_output changed.
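Why the rename matters: Caffe matches the weights stored in a .caffemodel to layers by name. A layer whose name is absent from the file keeps the values produced by its fillers, and a layer with a matching name but a different shape makes loading fail. The sketch below imitates that rule in plain Python. load_by_name and the two toy layer tables are hypothetical stand-ins for illustration, not Caffe's API.

```python
def load_by_name(net_layers, pretrained):
    """Imitate name-based weight loading.

    net_layers: dict of layer name -> weight shape in the new network.
    pretrained: dict of layer name -> weight shape stored in the model file.
    Returns a dict of layer name -> where its initial weights come from.
    """
    source = {}
    for name, shape in net_layers.items():
        if name not in pretrained:
            # Unknown name: nothing to copy, the filler values stay.
            source[name] = "filler"
        elif pretrained[name] != shape:
            # Same name but different shape: loading cannot proceed.
            raise ValueError("shape mismatch for layer %r" % name)
        else:
            source[name] = "pretrained"
    return source


# Toy tables with only two layers each.
pretrained = {"conv1": (64, 3, 7, 7), "fc1000": (1000, 2048)}
finetune_net = {"conv1": (64, 3, 7, 7), "fc1000-f": (1150, 2048)}
print(load_by_name(finetune_net, pretrained))
```

Keeping the old name fc1000 while changing num_output to 1150 would hit the shape-mismatch branch, which is why both the name and num_output are changed together.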

# Fine-tune from the pretrained ResNet-50 weights.
GLOG_log_dir=$MODEL $CAFFE train \
--solver /Users/eclipsycn/Documents/Stamp_Proj/Stamp294_Resnet50/prototxt/resnet50_solver.prototxt \
--weights /Users/eclipsycn/Documents/Stamp_Proj/Stamp294_Resnet50/prototxt/ResNet-50-model.caffemodel

Fine-tuning run

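Pitfall: I tested GPU fine-tuning on Ubuntu 16.04 and spent a whole night getting results that contradicted the conclusion above. The cause turned out to be that the command in train_finetune.sh must not have comments mixed into it. For example: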

GLOG_log_dir=$MODEL $CAFFE train \
--gpu $gpu \
--solver /home/sy/Stamp1150/prototxt/resnet50_solver.prototxt \
#--snapshot /home/sy/Stamp1150/snapshot/resnet50__iter_501085.solverstate
--weights /home/sy/Stamp1150/resnet_pretrainedmodel/ResNet-50-model.caffemodel

This does not work. Move the comment to the end so that the lines of the actual command are adjacent:

GLOG_log_dir=$MODEL $CAFFE train \
--gpu $gpu \
--solver /home/sy/Stamp1150/prototxt/resnet50_solver.prototxt \
--weights /home/sy/Stamp1150/resnet_pretrainedmodel/ResNet-50-model.caffemodel
#--snapshot /home/sy/Stamp1150/snapshot/resnet50__iter_501085.solverstate

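This pitfall applies to shell scripts in general. The model trained with the second script really is fine-tuned from ResNet-50-model.caffemodel, whereas the script with the comment in the middle never loads the pretrained model and trains from freshly initialized weights.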
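The shell behaviour behind this pitfall can be imitated with a few lines of Python. This is a deliberately simplified model of how a POSIX shell groups lines (it ignores quoting entirely), and split_shell_commands, the binary name caffe, and the shortened file names are hypothetical placeholders for illustration.

```python
def split_shell_commands(script):
    """Approximate how a shell groups physical lines into commands.

    1. A backslash directly before a newline joins the two lines.
    2. A word starting with '#' begins a comment that runs to the
       end of the (joined) line.
    Quoting is not handled; this is only a rough model.
    """
    joined = script.replace("\\\n", "")
    commands = []
    for line in joined.split("\n"):
        words = []
        for word in line.split():
            if word.startswith("#"):
                break  # the rest of this logical line is a comment
            words.append(word)
        if words:
            commands.append(words)
    return commands


bad = ("caffe train \\\n"
       "--gpu 0 \\\n"
       "--solver solver.prototxt \\\n"
       "#--snapshot iter.solverstate\n"
       "--weights model.caffemodel\n")

good = ("caffe train \\\n"
        "--gpu 0 \\\n"
        "--solver solver.prototxt \\\n"
        "--weights model.caffemodel\n"
        "#--snapshot iter.solverstate\n")

for name, script in (("bad", bad), ("good", good)):
    print(name)
    for cmd in split_shell_commands(script):
        print("  command:", " ".join(cmd))
```

In the bad script the comment is glued onto the end of the train command, so --weights ends up as a separate command of its own and caffe never sees it. In the good script all options stay on one logical line.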

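Next, compare the caffemodel saved after 24 training iterations with ResNet-50-model.caffemodel.
(Note: the original ResNet-50 deploy file classifies into 1000 classes, while num_output in the fine-tuning train.prototxt is user-defined, so the two models cannot be tested with the same deploy file.)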

The test code is as follows:

# coding: utf-8
from __future__ import print_function  # same output on Python 2 and 3
import sys

caffe_root = '/Users/eclipsycn/Documents/caffe/'
sys.path.append(caffe_root + 'python')
import caffe

# Original pretrained model, loaded with its 1000-class deploy file.
caffemodel1 = '/Users/eclipsycn/Documents/Stamp_Proj/Stamp294_Resnet50/prototxt/ResNet-50-model.caffemodel'
deploy1 = '/Users/eclipsycn/Documents/Stamp_Proj/Stamp294_Resnet50/prototxt/ResNet_50_deploy_fc1000.prototxt'
net1 = caffe.Net(deploy1, caffemodel1, caffe.TEST)

# Snapshot after 24 fine-tuning iterations, loaded with its own deploy file.
caffemodel2 = '/Users/eclipsycn/Documents/Stamp_Proj/Stamp294_Resnet50/snapshot/resnet__iter_24.caffemodel'
deploy2 = '/Users/eclipsycn/Documents/Stamp_Proj/Stamp294_Resnet50/prototxt/ResNet_50_deploy.prototxt'
net2 = caffe.Net(deploy2, caffemodel2, caffe.TEST)

# conv1: params[...][0] holds the weights, params[...][1] holds the bias.
print('net1-conv1-w', net1.params['conv1'][0].data[0][0][0])
print('net2-conv1-w', net2.params['conv1'][0].data[0][0][0])
print('net1-conv1-b == net2-conv1-b', net1.params['conv1'][1].data == net2.params['conv1'][1].data)

# res2b_branch2b has bias_term: false, so it only has weights.
print('net1-res2b_branch2b', net1.params['res2b_branch2b'][0].data[0][0][0])
print('net2-res2b_branch2b', net2.params['res2b_branch2b'][0].data[0][0][0])

Results:

net1-conv1-w [ 0.02825263  0.01818723  0.01588493  0.00242005 -0.05301533 -0.04735583
  0.01854295]
net2-conv1-w [ 0.02825263  0.01818723  0.01588493  0.00242005 -0.05301533 -0.04735583
  0.01854295]
net1-conv1-b == net2-conv1-b [False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False]
net1-res2b_branch2b [ 0.01220033  0.01853037  0.00676085]
net2-res2b_branch2b [ 0.01220033  0.01853037  0.00676085]

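The results show that setting lr_mult = 0 freezes the corresponding parameters during fine-tuning. When a layer has a bias term, the number of param blocks decides what is frozen: the first block applies to the weights w and the second to the bias b, and a blob with no param block of its own keeps the default lr_mult of 1 and is still updated.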
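The mapping from param blocks to blobs can be written out as a small Python sketch. It assumes that the rate applied to each blob is the solver's base_lr multiplied by that blob's lr_mult, with a default lr_mult of 1 when no param block is given. effective_lr and the base_lr value of 0.001 are hypothetical, chosen only for illustration.

```python
DEFAULT_LR_MULT = 1.0  # used when a blob has no param block of its own


def effective_lr(base_lr, lr_mults, num_blobs):
    """Return the learning rate applied to each blob of one layer.

    lr_mults: lr_mult values of the layer's param blocks, in order
              (first block -> weights, second block -> bias).
    num_blobs: 2 for weights plus bias, 1 when bias_term is false.
    """
    rates = []
    for i in range(num_blobs):
        mult = lr_mults[i] if i < len(lr_mults) else DEFAULT_LR_MULT
        rates.append(base_lr * mult)
    return rates


base_lr = 0.001  # hypothetical solver setting

# conv1: one param block with lr_mult 0, weights and bias present.
print("conv1          :", effective_lr(base_lr, [0], num_blobs=2))
# res2b_branch2b: one param block with lr_mult 0, no bias.
print("res2b_branch2b :", effective_lr(base_lr, [0], num_blobs=1))
# fc1000-f: param blocks commented out, so both blobs use the default.
print("fc1000-f       :", effective_lr(base_lr, [], num_blobs=2))
```

Under this model, freezing both the weights and the bias of conv1 takes two param blocks, each with lr_mult: 0.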
