Series index: Caffe-to-PyTorch Model Conversion Tutorial Series: Overview
The key step in converting a Caffe model to another framework is extracting the network's parameters from the .caffemodel/.caffemodel.h5 file. This post introduces two ways to extract them.
This method requires building Caffe together with its pycaffe module. Build tutorial:
Honestly, building Caffe on Windows is genuinely hard; it took me a long time to get it working.
This method applies to models for which both the .prototxt and the .caffemodel/.caffemodel.h5 are available.
# coding=utf8
from __future__ import absolute_import, division, print_function

import caffe

# prototxt file defining the network structure
MODEL_FILE = 'SfSNet_deploy.prototxt'
# pretrained Caffe weights
PRETRAIN_FILE = 'SfSNet.caffemodel.h5'

if __name__ == '__main__':
    # load the network
    net = caffe.Net(MODEL_FILE, PRETRAIN_FILE, caffe.TEST)
    print('*' * 80)
    # iterate over every layer that has parameters
    for param_name in net.params.keys():
        # this layer's parameter blobs
        layer_params = net.params[param_name]
        if len(layer_params) == 1:
            # One blob only: a deconvolution layer. In the whole of SfSNet,
            # only the deconvolution layers have a single weight blob.
            weight = layer_params[0].data
            print('%s:\n\t%s (weight)' % (param_name, weight.shape))
        elif len(layer_params) == 2:
            # Two blobs: a convolution or fully connected layer,
            # both of which have a weight and a bias.
            weight = layer_params[0].data  # weight
            bias = layer_params[1].data  # bias
            print('%s:\n\t%s (weight)' % (param_name, weight.shape))
            print('\t%s (bias)' % str(bias.shape))
        elif len(layer_params) == 3:
            # Three blobs: a BatchNorm layer, which stores
            # running_mean, running_var and a scale factor.
            running_mean = layer_params[0].data  # running_mean
            running_var = layer_params[1].data  # running_var
            print('%s:\n\t%s (running_var)' % (param_name, running_var.shape))
            print('\t%s (running_mean)' % str(running_mean.shape))
        else:
            # if this fires, go back and check your own model
            raise RuntimeError('a layer has more than 3 parameter blobs; do not miss it!\n')
The code is nothing more than a simple loop; the hard part is knowing how many parameter blobs each layer has. In the code I classify layers directly by len(layer_params); in practice I worked out each layer's len(layer_params) by printing param_name together with len(layer_params). I hope readers can generalize from this: if your model contains other layer types, you will need to determine their parameter counts yourself (by writing probing code and searching the web).
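Before committing to the if/elif branches above, it helps to dump every layer's blob count and shapes in a single pass. A minimal inspection sketch, assuming net has been loaded as in the code above:

for param_name in net.params.keys():
    # collect the shape of every parameter blob in this layer
    shapes = [blob.data.shape for blob in net.params[param_name]]
    print(param_name, len(shapes), shapes)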
The idea for saving the network parameters is very simple: put the extracted parameters into a dict and serialize it with pickle; on the PyTorch side, read it back with pickle as well (look up pickle yourself if you are unsure what it does). Straight to the code:
# coding=utf8
from __future__ import absolute_import, division, print_function

import pickle as pkl

import caffe

# prototxt file
MODEL_FILE = 'SfSNet_deploy.prototxt'
# pretrained Caffe weights
PRETRAIN_FILE = 'SfSNet.caffemodel.h5'

if __name__ == '__main__':
    # load the network
    net = caffe.Net(MODEL_FILE, PRETRAIN_FILE, caffe.TEST)
    print('*' * 80)
    # dict mapping each layer name to its weights
    name_weights = {}
    # also record every layer's parameter info in a text file
    keys = open('keys.txt', 'w')
    keys.write('generated by SfSNet-Caffe/convert_to_pkl.py\n\n')
    # iterate over every layer that has parameters
    for param_name in net.params.keys():
        name_weights[param_name] = {}
        # this layer's parameter blobs
        layer_params = net.params[param_name]
        if len(layer_params) == 1:
            # one blob only: a deconvolution layer (in the whole of SfSNet,
            # only the deconvolution layers have a single weight blob)
            weight = layer_params[0].data
            name_weights[param_name]['weight'] = weight
            print('%s:\n\t%s (weight)' % (param_name, weight.shape))
            keys.write('%s:\n\t%s (weight)\n' % (param_name, weight.shape))
        elif len(layer_params) == 2:
            # two blobs: a convolution or fully connected layer,
            # both of which have a weight and a bias
            weight = layer_params[0].data  # weight
            name_weights[param_name]['weight'] = weight
            bias = layer_params[1].data  # bias
            name_weights[param_name]['bias'] = bias
            print('%s:\n\t%s (weight)' % (param_name, weight.shape))
            print('\t%s (bias)' % str(bias.shape))
            keys.write('%s:\n\t%s (weight)\n' % (param_name, weight.shape))
            keys.write('\t%s (bias)\n' % str(bias.shape))
        elif len(layer_params) == 3:
            # three blobs: a BatchNorm layer storing running_mean,
            # running_var and a scale factor; Caffe keeps mean/var
            # multiplied by that factor, so divide it back out
            running_mean = layer_params[0].data  # running_mean
            name_weights[param_name]['running_mean'] = running_mean / layer_params[2].data
            running_var = layer_params[1].data  # running_var
            name_weights[param_name]['running_var'] = running_var / layer_params[2].data
            print('%s:\n\t%s (running_var)' % (param_name, running_var.shape))
            print('\t%s (running_mean)' % str(running_mean.shape))
            keys.write('%s:\n\t%s (running_var)\n' % (param_name, running_var.shape))
            keys.write('\t%s (running_mean)\n' % str(running_mean.shape))
        else:
            # if this fires, go back and check your own model
            raise RuntimeError('a layer has more than 3 parameter blobs; do not miss it!\n')
    keys.close()
    # serialize name_weights
    with open('weights.pkl', 'wb') as f:
        pkl.dump(name_weights, f, protocol=2)
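On the PyTorch side, reading the dict back is a single pickle.load call. A quick sanity-check sketch; 'conv1' here is just an example layer name taken from the output shown later:

# coding=utf8
import pickle as pkl

with open('weights.pkl', 'rb') as f:
    name_weights = pkl.load(f)

# list every extracted layer name
print(sorted(name_weights.keys()))
# inspect one layer ('conv1' is an assumed example name)
print(name_weights['conv1']['weight'].shape)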
Supplementary note:
The weight file of the model I was dealing with ends in .caffemodel.h5, i.e. it is an HDF5 file, so it can also be read with h5py. Reference: "Python library h5py: reading h5 files".
# coding=utf8
from __future__ import absolute_import, division, print_function

import h5py

if __name__ == '__main__':
    f = h5py.File('SfSNet.caffemodel.h5', 'r')
    for group_name in f.keys():
        # top-level group
        group = f[group_name]
        for sub_group_name in group.keys():
            # second-level group: its members are this layer's
            # parameter datasets
            dataset = f[group_name + '/' + sub_group_name]
            # iterate over every dataset under this sub-group
            for dset in dataset.keys():
                # read the dataset's data
                sub_dataset = f[group_name + '/' + sub_group_name + '/' + dset]
                data = sub_dataset[()]
                print(sub_dataset.name, data.shape)
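If you only want a quick overview of the whole hierarchy, h5py can also walk it for you. A one-off sketch using visititems:

# coding=utf8
from __future__ import print_function

import h5py

with h5py.File('SfSNet.caffemodel.h5', 'r') as f:
    # print every group/dataset path; datasets also get their shape
    f.visititems(lambda name, obj: print(name, getattr(obj, 'shape', '')))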
This route needs no Caffe build at all, and the code is simple, so straight to it:
# coding=utf8
from __future__ import absolute_import, division, print_function

import h5py
import pickle as pkl

if __name__ == '__main__':
    f = h5py.File('SfSNet.caffemodel.h5', 'r')
    # dict mapping each layer name to its weights;
    # create it once, before the loops, so layers are never overwritten
    name_weights = {}
    for group_name in f.keys():
        # top-level group
        group = f[group_name]
        for sub_group_name in group.keys():
            if sub_group_name not in name_weights.keys():
                name_weights[sub_group_name] = {}
            # second-level group: experimentation shows each of these
            # groups holds the parameter datasets of exactly one layer
            dataset = f[group_name + '/' + sub_group_name]
            if len(dataset.keys()) == 1:
                # one dataset only: a deconvolution layer (in the whole of
                # SfSNet, only the deconvolution layers have a single weight)
                weight = dataset['0'][()]
                name_weights[sub_group_name]['weight'] = weight
                print('%s:\n\t%s (weight)' % (sub_group_name, weight.shape))
            elif len(dataset.keys()) == 2:
                # two datasets: a convolution or fully connected layer,
                # both of which have a weight and a bias
                weight = dataset['0'][()]  # weight
                name_weights[sub_group_name]['weight'] = weight
                bias = dataset['1'][()]  # bias
                name_weights[sub_group_name]['bias'] = bias
                print('%s:\n\t%s (weight)' % (sub_group_name, weight.shape))
                print('\t%s (bias)' % str(bias.shape))
            elif len(dataset.keys()) == 3:
                # three datasets: a BatchNorm layer storing running_mean,
                # running_var and a scale factor; divide the factor out
                running_mean = dataset['0'][()]  # running_mean
                name_weights[sub_group_name]['running_mean'] = running_mean / dataset['2'][()]
                running_var = dataset['1'][()]  # running_var
                name_weights[sub_group_name]['running_var'] = running_var / dataset['2'][()]
                print('%s:\n\t%s (running_var)' % (sub_group_name, running_var.shape))
                print('\t%s (running_mean)' % str(running_mean.shape))
            elif len(dataset.keys()) == 0:
                # layer without parameters
                continue
            else:
                # if this fires, go back and check your own model
                raise RuntimeError('a layer has more than 3 parameter datasets; do not miss it!\n')
    f.close()
    # serialize name_weights; write through a differently named handle
    # so the h5py file handle f is not shadowed
    with open('weights1.pkl', 'wb') as wf:
        pkl.dump(name_weights, wf, protocol=2)
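Since the two methods should extract identical parameters, it is worth cross-checking weights.pkl against weights1.pkl. A minimal sketch, assuming both files were produced by the scripts above:

# coding=utf8
import pickle as pkl

import numpy as np

with open('weights.pkl', 'rb') as f1, open('weights1.pkl', 'rb') as f2:
    w1 = pkl.load(f1)
    w2 = pkl.load(f2)

# every layer and every parameter array should match
for name in w1:
    for key in w1[name]:
        assert np.allclose(w1[name][key], w2[name][key]), (name, key)
print('both extraction methods agree')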
Supplementary note (come back to this after reading the code): the number of parameters in each layer can be read off from dataset.keys().
Know yourself and know your enemy, and you will never be defeated. To load Caffe weights into a PyTorch model, the first thing to confirm is that the two networks have the same structure. So let's look at the converted model's layers and their parameter sizes. First download model.py, then create model-test.py with the following code:
# coding=utf-8
from __future__ import absolute_import, division, print_function

from src.models.model import SfSNet

if __name__ == '__main__':
    net = SfSNet()
    net.eval()
    index = 0
    # print every learnable parameter's name and shape
    for name, param in list(net.named_parameters()):
        print(str(index) + ':', name, param.size())
        index += 1
Run it and you get output like the following (truncated to save space):
0: conv1.weight (64, 3, 7, 7)
1: conv1.bias (64,)
2: bn1.weight (64,)
3: bn1.bias (64,)
...
10: n_res1.bn.weight (128,)
11: n_res1.bn.bias (128,)
12: n_res1.conv.weight (128, 128, 3, 3)
13: n_res1.conv.bias (128,)
14: n_res1.bnr.weight (128,)
15: n_res1.bnr.bias (128,)
16: n_res1.convr.weight (128, 128, 3, 3)
17: n_res1.convr.bias (128,)
...
50: nbn6r.weight (128,)
51: nbn6r.bias (128,)
52: nup6.weight (128, 1, 4, 4)
53: nconv6.weight (128, 128, 1, 1)
54: nconv6.bias (128,)
...
120: fc_light.weight (27, 128)
121: fc_light.bias (27,)
A quick analysis of the output above: conv1.weight has shape (64, 3, 7, 7), i.e. 64 output channels, 3 input channels and a 7x7 kernel, which matches the definition
self.conv1 = nn.Conv2d(3, 64, 7, 1, 3)
For how parameters get loaded, see: "Saving and Loading Models in PyTorch". That article mentions nn.Module.load_state_dict, which is exactly the function PyTorch uses to load parameters into a model. Its prototype is:
load_state_dict(state_dict, strict=True)
state_dict is a dictionary holding the network's parameters. For example, if conv1's weight is the array arr1, then state_dict['conv1.weight'] == arr1; if conv1's bias is arr2, then state_dict['conv1.bias'] == arr2; and so on. state_dict must contain the parameters of every layer in the PyTorch model, no more and no fewer.
So, to load the parameters we extracted from the .caffemodel, we just need to build a state_dict that contains every layer's parameters.
2.1 Setting conv1's weight and bias
First load name_weights from weights.pkl, then initialize an empty dict state_dict:
import pickle as pkl

from torch import from_numpy

with open('weights.pkl', 'rb') as wp:
    name_weights = pkl.load(wp)
state_dict = {}
Then:
state_dict['conv1.weight'] = from_numpy(name_weights['conv1']['weight'])
state_dict['conv1.bias'] = from_numpy(name_weights['conv1']['bias'])
'conv1.weight' and 'conv1.bias' are parameter names printed in step 1; the values stored under these two keys will be loaded into conv1's weight and bias respectively. torch.from_numpy builds a Tensor from a numpy array. name_weights holds the extracted weights: if a layer is named 'conv1', its parameters live in name_weights['conv1'], which is itself a dict.
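It is worth double-checking that an extracted array's shape matches what the PyTorch layer expects before loading. A small check, assuming net is the SfSNet instance from the earlier test script:

# Caffe side: shape of the extracted numpy array
print(name_weights['conv1']['weight'].shape)  # (64, 3, 7, 7)
# PyTorch side: shape the layer expects
print(net.conv1.weight.size())  # torch.Size([64, 3, 7, 7])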
2.2 Setting bn1's parameters
state_dict['bn1.running_var'] = from_numpy(name_weights['bn1']['running_var'])
state_dict['bn1.running_mean'] = from_numpy(name_weights['bn1']['running_mean'])
state_dict['bn1.weight'] = torch.ones_like(state_dict['bn1.running_var'])
state_dict['bn1.bias'] = torch.zeros_like(state_dict['bn1.running_var'])
bn1 is a BatchNorm2d layer, whose prototype is:
BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
Keep the affine parameter at its default True, and track_running_stats must also stay True. Since the Caffe model carries no learned scale/shift for this layer, weight is set to all ones and bias to all zeros, which makes the affine transform an identity.
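One caveat to check on your own PyTorch version: newer releases add a num_batches_tracked buffer to BatchNorm's state dict, which a hand-built state_dict will not contain. A quick way to see exactly which keys your version expects:

import torch.nn as nn

bn = nn.BatchNorm2d(64)
# on recent versions this prints
# ['weight', 'bias', 'running_mean', 'running_var', 'num_batches_tracked']
print(list(bn.state_dict().keys()))

If num_batches_tracked shows up, either add it to your state_dict (a scalar zero tensor works) or call load_state_dict(state_dict, strict=False).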
2.3 Setting the residual blocks' parameters
SfSNet contains repeated structures, so I refactored them into a residual block. To avoid writing a lot of duplicated code, the parameter-setting logic for a residual block is wrapped into functions that can be reused:
def _set(layer, key):
    # set a conv/fc layer's weight and bias
    state_dict[layer + '.weight'] = from_numpy(name_weights[key]['weight'])
    state_dict[layer + '.bias'] = from_numpy(name_weights[key]['bias'])

def _set_bn(layer, key):
    # set a BatchNorm layer's statistics; weight/bias become an identity transform
    state_dict[layer + '.running_var'] = from_numpy(name_weights[key]['running_var'])
    state_dict[layer + '.running_mean'] = from_numpy(name_weights[key]['running_mean'])
    state_dict[layer + '.weight'] = torch.ones_like(state_dict[layer + '.running_var'])
    state_dict[layer + '.bias'] = torch.zeros_like(state_dict[layer + '.running_var'])

def _set_res(layer, n_or_a, index):
    # set one residual block: bn -> conv -> bnr -> convr
    _set_bn(layer + '.bn', n_or_a + 'bn' + str(index))
    _set(layer + '.conv', n_or_a + 'conv' + str(index))
    _set_bn(layer + '.bnr', n_or_a + 'bn' + str(index) + 'r')
    _set(layer + '.convr', n_or_a + 'conv' + str(index) + 'r')
_set_res is the function that sets one residual block. Its first parameter, layer, is the name of the residual block on the PyTorch side; the second, n_or_a, selects the prefix ('n' or 'a') of the corresponding Caffe layer names; index is the block's number.
So the code that sets the parameters of the first residual blocks, n_res1/a_res1, is:
_set_res('n_res1', 'n', 1)
...
_set_res('a_res1', 'a', 1)
2.4 Setting nup6's parameters
state_dict['nup6.weight'] = from_numpy(name_weights['nup6']['weight'])
nup6 is a deconvolution layer; in SfSNet its bias term is disabled, so it has only one set of parameters, weight.
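For reference, the shape (128, 1, 4, 4) printed for nup6.weight is the classic layout of a grouped transposed convolution used for upsampling. A sketch of what the definition might look like; the stride and padding values here are assumptions, not taken from model.py:

import torch.nn as nn

# assumed definition: with groups=128 the weight shape becomes
# (in_channels, out_channels // groups, kH, kW) = (128, 1, 4, 4)
nup6 = nn.ConvTranspose2d(128, 128, kernel_size=4, stride=2, padding=1,
                          groups=128, bias=False)
print(nup6.weight.size())  # torch.Size([128, 1, 4, 4])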
2.5 Wrapping it up
I will not walk through the remaining layers one by one. Once all parameters have been set, call nn.Module.load_state_dict to load them:
net.load_state_dict(state_dict)
Since building state_dict is a process that depends heavily on the specific model, I wrapped it into a function: load_weights_from_pkl. The complete code is in model.py.
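End to end, usage then looks roughly like this; the exact load_weights_from_pkl signature is an assumption, so check model.py for the real one:

# coding=utf-8
from src.models.model import SfSNet, load_weights_from_pkl

if __name__ == '__main__':
    net = SfSNet()
    net.eval()
    # assumed signature: builds the state_dict from weights.pkl and
    # calls net.load_state_dict internally
    load_weights_from_pkl(net, 'weights.pkl')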