Merging BN Layers into CONV Layers in Caffe (merge_bn)

I wrote this post half a year ago and only just noticed it was never published. Fortunately the Markdown file was still on my machine, so I am re-posting it; too many blogs online just copy each other back and forth, and the code quality is never guaranteed.
Today I need to fold BN layers in PyTorch as well; I will write that script and upload it to CSDN later.

Principle

Merging BN means folding Caffe's BatchNorm layer and Scale layer into the weights of the preceding Convolution layer, reducing three layers to one. This applies only when deploying a trained model; it cannot be used during training.

Merging requires that the BN layer sit directly after the convolution, and that the merged convolution layer set `bias_term: true` in its `convolution_param`; otherwise there is nowhere to store the folded bias.
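As a sketch, the pattern to be folded typically looks like the following prototxt fragment (layer names are illustrative; the `name/bn` and `name/scale` naming convention matches the script below). After merging, only the Convolution layer remains:

```protobuf
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 32
    kernel_size: 3
    bias_term: true   # required so the folded bias has somewhere to go
  }
}
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param { bias_term: true }
}
```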

Let $X$ denote the input to each layer. The convolution layer then computes
$$WX + b$$
The BN stage performs two operations, normalization followed by an affine rescaling (the Scale layer):
$$\frac{X - mean}{\sqrt{var}}$$
$$\gamma X + \beta$$

Composing the three expressions gives:
$$\gamma \cdot \frac{(W_{old}X + b_{old}) - mean}{\sqrt{var}} + \beta$$

Expanding, we obtain the folded parameters:
$$W_{new} = \frac{\gamma}{\sqrt{var}} \, W_{old}$$
$$b_{new} = \frac{\gamma}{\sqrt{var}} (b_{old} - mean) + \beta$$

so the merged convolution is simply
$$W_{new}X + b_{new}$$
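The folding formulas can be checked numerically. The sketch below (pure NumPy; all names are illustrative) uses a 1×1 convolution so that the convolution reduces to a matrix product, applies conv → BN → scale step by step, and compares the result against the single merged convolution:

```python
import numpy as np

rng = np.random.default_rng(0)
C_in, C_out, eps = 4, 3, 1e-5

# random conv parameters, BN statistics, and Scale parameters
W = rng.standard_normal((C_out, C_in))   # 1x1 conv == matrix multiply
b = rng.standard_normal(C_out)
mean = rng.standard_normal(C_out)
var = rng.random(C_out) + 0.5            # keep variances positive
gamma = rng.standard_normal(C_out)
beta = rng.standard_normal(C_out)

x = rng.standard_normal(C_in)

# reference: run the three layers separately
y = W @ x + b
y = (y - mean) / np.sqrt(var + eps)      # BatchNorm
y = gamma * y + beta                     # Scale

# folded: W_new = gamma/sqrt(var+eps) * W_old,
#         b_new = gamma/sqrt(var+eps) * (b_old - mean) + beta
t = gamma / np.sqrt(var + eps)
W_new = t[:, None] * W
b_new = t * (b - mean) + beta
y_merged = W_new @ x + b_new

print(np.allclose(y, y_merged))  # True
```

The per-output-channel factor `t` multiplies each row of the weight matrix, which is exactly what the `np.reshape(tmp, (..., 1, 1, 1)) * w` broadcast does for 4-D conv kernels in the script below.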

Code

The Python script for converting a Caffe model is as follows:

import caffe
import os
import numpy as np
import google.protobuf as pb
import google.protobuf.text_format


# project root
ROOT = '/home/zym/tensorrt/mobilenet'

# choose your source model and destination model
WEIGHT = os.path.join(ROOT, 'mobilenet.caffemodel')
MODEL = os.path.join(ROOT, 'mobilenet.prototxt')
DEPLOY_MODEL = os.path.join(ROOT, 'mobilenet_deploy.prototxt')

# set network using caffe api
caffe.set_mode_gpu()
net = caffe.Net(MODEL, WEIGHT, caffe.TRAIN)
dst_net = caffe.Net(DEPLOY_MODEL, caffe.TEST)
with open(MODEL) as f:
    model = caffe.proto.caffe_pb2.NetParameter()
    pb.text_format.Parse(f.read(), model)

# walk through the source model, folding each Conv + BN + Scale triple
for i, layer in enumerate(model.layer):
    if layer.type == 'Convolution':
        # extract weight and bias of the Convolution layer
        name = layer.name
        if 'fc' in name:
            # the final fc layer (implemented as a Convolution) has no BN
            # after it, so copy its parameters verbatim and stop
            dst_net.params[name][0].data[...] = net.params[name][0].data
            dst_net.params[name][1].data[...] = net.params[name][1].data
            break
        w = net.params[name][0].data
        num_output = w.shape[0]  # number of output channels
        try:
            b = net.params[name][1].data
        except IndexError:
            # a conv trained with bias_term: false has no bias blob
            b = np.zeros(num_output)

        # extract mean and var from the BN layer; blob 2 holds the
        # moving-average scale factor that Caffe divides the stats by
        bn = name + '/bn'
        scalef = net.params[bn][2].data[0]
        scalef = 1. / scalef if scalef != 0 else 0.
        mean = net.params[bn][0].data * scalef
        var = net.params[bn][1].data * scalef

        # extract gamma and beta from the Scale layer
        scale = name + '/scale'
        gamma = net.params[scale][0].data
        beta = net.params[scale][1].data

        # fold BN/Scale into the conv parameters
        # (the 1e-5 epsilon matches Caffe's BatchNorm default)
        tmp = gamma / np.sqrt(var + 1e-5)
        w = np.reshape(tmp, (num_output, 1, 1, 1)) * w
        b = tmp * (b - mean) + beta

        # store the folded weight and bias in the destination net
        dst_net.params[name][0].data[...] = w
        dst_net.params[name][1].data[...] = b

dst_net.save('mobilenet_deploy.caffemodel')

# test the merged network on a sample image
img = caffe.io.load_image('/home/zym/imagenet_test.JPEG')

# standard MobileNet preprocessing: HWC -> CHW, RGB -> BGR,
# [0,1] -> [0,255], BGR mean subtraction, then 0.017 scaling
transformer = caffe.io.Transformer({'data': dst_net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_channel_swap('data', (2, 1, 0))
transformer.set_raw_scale('data', 255)
transformer.set_mean('data', np.array([103.939, 116.779, 123.68]))
transformer.set_input_scale('data', 0.017)

# get merged network output
img = transformer.preprocess('data', img)
dst_net.blobs['data'].data[...] = img
out = dst_net.forward()['prob']
print(np.argmax(out.flatten()))
