Batch Normalization Folding

References: http://machinethink.net/blog/object-detection-with-yolo/#converting-to-metal

https://github.com/hollance/Forge/blob/master/Examples/YOLO/yolo2metal.py

https://blog.csdn.net/grllery/article/details/90383417

Why Batch Normalization

The idea behind “batch norm” is that neural network layers work best when the data is clean. Ideally, the input to a layer has an average value of 0 and not too much variance. This should sound familiar to anyone who’s done any machine learning because we often use a technique called “feature scaling” or “whitening” on our input data to achieve this.

Batch normalization does a similar kind of feature scaling for the data in between layers. This technique really helps neural networks perform better because it stops the data from deteriorating as it flows through the network.

Why Batch Normalization Folding

Batch normalization is important when training a deep network, but it turns out we can get rid of it at inference time, which is a good thing: not having to do the batch norm calculations makes our app faster.

How Batch Normalization Folding Works

A batch normalization layer is usually placed after a convolutional (conv) or fully connected (dense) layer. For an input x, the output of the batchnorm layer is:

bn_y = gamma * (x - mean) / sqrt(variance) + beta
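
For reference, here is a minimal NumPy sketch of this inference-time formula (frameworks actually add a small epsilon inside the square root for numerical stability; it is omitted here to match the formula above):

import numpy as np

def batchnorm_inference(x, gamma, beta, mean, variance):
    # bn_y = gamma * (x - mean) / sqrt(variance) + beta
    return gamma * (x - mean) / np.sqrt(variance) + beta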

For an input x to the conv layer, the conv layer's output is:

conv_y = conv(x, weight)
       = x[i]*w[0] + x[i+1]*w[1] + x[i+2]*w[2] + … + x[i+k]*w[k] + b
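
As a sketch, the per-position sum above can be written directly in NumPy for a 1-D input x and kernel w (cross-correlation form, as CNN frameworks compute it; conv1d is just an illustrative helper):

import numpy as np

def conv1d(x, w, b):
    # conv_y[i] = x[i]*w[0] + x[i+1]*w[1] + ... + b  (valid positions only)
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) + b for i in range(len(x) - k + 1)])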

Define the conv layer's new weight and bias as:

new_weight = gamma * weight / sqrt(variance)
new_b = gamma * (b - mean) / sqrt(variance) + beta
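
For a conv layer, gamma, beta, mean and variance are per output channel, so in practice this update is a broadcast over the output-channel axis. A minimal sketch, assuming NumPy arrays and the Keras weight layout (kh, kw, in_c, out_c):

import numpy as np

def fold_params(weight, b, gamma, beta, mean, variance):
    # weight: (kh, kw, in_c, out_c); b, gamma, beta, mean, variance: (out_c,)
    scale = gamma / np.sqrt(variance)
    new_weight = weight * scale          # broadcasts over the last (out_c) axis
    new_b = scale * (b - mean) + beta
    return new_weight, new_b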

Then, for the same input x, the output of the conv layer (using new_weight and new_b) is:

conv(x, new_weight)
= x[i]*w[0]*gamma/sqrt(variance) + x[i+1]*w[1]*gamma/sqrt(variance) + … + x[i+k]*w[k]*gamma/sqrt(variance) + gamma*(b - mean)/sqrt(variance) + beta
= conv(x, weight) * gamma/sqrt(variance) - gamma * mean/sqrt(variance) + beta
= gamma * (conv(x, weight) - mean) / sqrt(variance) + beta

This is exactly the output of the batchnorm layer when its input is the conv layer's output conv(x, weight). In other words, once weight and b are updated in this way, the batchnorm layer can be merged into the conv layer, as the numerical check below illustrates.
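
A quick numerical check of the derivation, using the throwaway 1-D conv from the sketch above (all values here are made up for illustration): applying the folded weight and bias should give the same result as conv followed by batchnorm.

import numpy as np

def conv1d(x, w, b):
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) + b for i in range(len(x) - k + 1)])

x = np.random.randn(16)
w = np.random.randn(3)
b = 0.5
gamma, beta, mean, variance = 1.2, -0.3, 0.1, 0.8

# conv followed by batchnorm
bn_of_conv = gamma * (conv1d(x, w, b) - mean) / np.sqrt(variance) + beta

# conv with folded weight and bias
new_w = gamma * w / np.sqrt(variance)
new_b = gamma * (b - mean) / np.sqrt(variance) + beta
folded = conv1d(x, new_w, new_b)

print(np.allclose(bn_of_conv, folded))  # True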

Code example:
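
Below is a minimal sketch of the folding step for a Keras/TensorFlow model, in the spirit of the yolo2metal.py script referenced above. It assumes each Conv2D layer to be folded has use_bias=True and is immediately followed by its BatchNormalization layer (with scale and center enabled); the model path and variable names are hypothetical. Note that Keras divides by sqrt(variance + epsilon), so epsilon is included here. The folded weights would then be loaded into a copy of the model built without the BatchNormalization layers.

import numpy as np
from tensorflow import keras

def fold_conv_bn(conv_layer, bn_layer):
    """Return folded (weight, bias) for a Conv2D layer followed by BatchNormalization."""
    weight, b = conv_layer.get_weights()                   # assumes use_bias=True
    gamma, beta, mean, variance = bn_layer.get_weights()   # assumes scale=True, center=True
    scale = gamma / np.sqrt(variance + bn_layer.epsilon)   # Keras uses variance + epsilon
    return weight * scale, scale * (b - mean) + beta

model = keras.models.load_model("model_with_bn.h5")        # hypothetical path

# Collect folded weights for every Conv2D immediately followed by BatchNormalization.
folded = {}
for prev, curr in zip(model.layers, model.layers[1:]):
    if isinstance(prev, keras.layers.Conv2D) and isinstance(curr, keras.layers.BatchNormalization):
        folded[prev.name] = fold_conv_bn(prev, curr)

# folded[name] can now be applied with set_weights() on the matching conv layer
# of a BN-free copy of the model.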
