Local Response Normalization (LRN)

This concept was introduced in the AlexNet paper.
The local response normalization algorithm was inspired by real neurons and, as the authors said, “bears some resemblance to the local contrast normalization”. What the two have in common is that both introduce competition among neuron outputs; the difference is that LRN does not subtract the mean, and the competition happens among the outputs of adjacent kernels in the same layer.
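The formula for LRN is as follows:

$$b(i, x, y) = \frac{a(i, x, y)}{\left(k + \alpha \sum_{j=\max(0,\, i - n/2)}^{\min(N-1,\, i + n/2)} a(j, x, y)^2\right)^{\beta}}$$
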
a(i, x, y) is the output of the i-th conv. kernel (after ReLU) at position (x, y) in the feature map.
b(i, x, y) is the output of local response normalization, which is also the input to the next layer.
N is the total number of conv. kernels in the layer.
n is the number of adjacent conv. kernels included in the sum; the choice is up to you. In the article they choose n = 5.
k, α, β are hyper-parameters; in the article they choose k = 2, α = 10^-4 (that is, 1e-4), β = 0.75.
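
The definitions above can be turned into a few lines of NumPy. This is only a minimal sketch, not the AlexNet implementation: the function name `lrn_forward` and the (N, H, W) array layout are assumptions made for illustration, and the window is clipped at the first and last kernel, which gives the same sum as padding with zero-valued kernels.

```python
import numpy as np

def lrn_forward(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Apply LRN across the kernel axis (hypothetical helper).

    a: array of shape (N, H, W) holding the ReLU outputs a(i, x, y).
    Returns b(i, x, y) with the same shape.
    """
    N = a.shape[0]
    half = n // 2                      # integer division, 5 // 2 = 2
    squared = a ** 2
    b = np.empty_like(a, dtype=float)
    for i in range(N):
        lo = max(0, i - half)          # clip the window at the first kernel
        hi = min(N - 1, i + half)      # clip the window at the last kernel
        window_sum = squared[lo:hi + 1].sum(axis=0)
        b[i] = a[i] / (k + alpha * window_sum) ** beta
    return b

# Toy input: 8 kernels, 4x4 feature maps, non-negative as after ReLU
rng = np.random.default_rng(0)
a = np.maximum(rng.normal(size=(8, 4, 4)), 0.0)
b = lrn_forward(a)
print(b.shape)
```

The explicit loop over kernels keeps the code close to the formula; a vectorized version would be faster but harder to compare term by term.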

Flowchart of Local Response Normalization

I drew the above figure to illustrate how LRN works in a neural network. A few notes:

* This graph presumes that the i-th kernel is not at the edge of the kernel space. If i is the first, second, second-to-last, or last kernel, one or two additional zero-padded conv. kernels are required.
* In the article n is 5; n/2 is taken as integer division, so 5/2 = 2.
* “Summation of the squares of output of ReLU” means: square each ReLU output, then add the 5 squared values together. This is the summation term of the formula.
* I presume the input feature map is padded as needed so that the output feature maps have the same size as the input feature map, in case you care. This padding may not be strictly necessary.
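
The edge case in the first note can be checked numerically. The sketch below, with hypothetical helper names and made-up squared outputs, compares two ways of forming the summation term at a single (x, y) position: clipping the window at the ends of the kernel range, and padding the kernel axis with n/2 zero-valued kernels on each side. It assumes n is odd.

```python
import numpy as np

def window_sum_clipped(sq, i, n):
    # Sum over kernels max(0, i - n//2) .. min(N - 1, i + n//2)
    half = n // 2
    lo, hi = max(0, i - half), min(sq.shape[0] - 1, i + half)
    return sq[lo:hi + 1].sum()

def window_sum_padded(sq, i, n):
    # Add n//2 zero-valued kernels on each side, then take a full window of n
    half = n // 2
    padded = np.concatenate([np.zeros(half), sq, np.zeros(half)])
    return padded[i:i + n].sum()

# Squared ReLU outputs of 6 kernels at one (x, y) position (made-up values)
sq = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])
sums = [float(window_sum_padded(sq, i, 5)) for i in range(len(sq))]
for i in range(len(sq)):
    assert np.isclose(window_sum_clipped(sq, i, 5), sums[i])
print(sums)  # [14.0, 30.0, 55.0, 90.0, 86.0, 77.0]
```

Both variants give the same sums, so zero padding is just one way to handle the kernels near the edge.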

Now that we know what LRN is, another question arises: what does the output of LRN look like?
Because LRN happens after ReLU, the inputs are all no less than 0. The following graph tries to give you an intuitive understanding of the output of LRN; you still need to use your imagination, though.

Note that the x axis represents the sum of the squared ReLU outputs, ranging from 0 to 1000, and the y axis represents b(i, x, y) divided by a(i, x, y). The hyper-parameters are set to the defaults from the article.
So the actual value of b(i, x, y) is the y-axis value multiplied by a(i, x, y). Imagine two different inputs a(i, x, y) passing through this function: because the slope at the beginning is very steep, a small difference among the inputs is significantly enlarged. This is where the competition happens.
The figure was generated by the following Python code:

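import numpy as np
import matplotlib.pyplot as plt

# Hyper-parameters from the article: k = 2, alpha = 1e-4, beta = 0.75
K, ALPHA, BETA = 2, 1e-4, 0.75

def lrn(x):
    # x is already the sum of the squared ReLU outputs, so it is not squared again
    y = 1 / (K + ALPHA * x) ** BETA
    return y

sum_sq = np.arange(0, 1000, 0.2)
output = lrn(sum_sq)
plt.plot(sum_sq, output)
plt.xlabel('sum(x^2)')
plt.ylabel('1 / (k + a * sum(x^2))^b')
plt.show()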
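
To put numbers on the competition idea, here is a small sketch that passes the same activation through the scale factor b / a twice: once with silent neighbors and once with strongly active neighbors. The helper name `lrn_scale` and the activation values are made up for illustration; the hyper-parameters are the ones quoted from the article.

```python
def lrn_scale(sum_sq, k=2.0, alpha=1e-4, beta=0.75):
    # Scale factor b / a for a given sum of squared ReLU outputs
    return 1.0 / (k + alpha * sum_sq) ** beta

a = 10.0  # activation of the kernel being normalized (made-up value)

# Case 1: the four neighboring kernels are silent, so only a**2 enters the sum
quiet = lrn_scale(a ** 2)

# Case 2: the four neighboring kernels each output 50.0
busy = lrn_scale(a ** 2 + 4 * 50.0 ** 2)

b_quiet = a * quiet
b_busy = a * busy
print(b_quiet, b_busy)
```

The same activation comes out smaller when its neighbors are strongly active, which is the competition among adjacent kernels in concrete terms.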
