Notes for Deep Learning Lessons of Prof. Hung-yi Lee (4)

1. Tips for DNN

In this lesson, Prof. Lee taught us some tips for training deep neural networks, which include:

  1. Adaptive Learning Rate
  2. New Activation Function
  3. Dropout
  4. Regularization
  5. Early Stopping

[Figure 1]

1.1 Adaptive Learning Rate

Adaptive learning rate has already been introduced in my previous post, Notes for Deep Learning Lessons of Prof. Hung-yi Lee (2).

1.2 New Activation Function

The reason why we need a new activation function, rather than the previous sigmoid function, can be explained by the following figure. Because the sigmoid function maps a large range of input values into a small output range, the influence of the input layer becomes weaker and weaker as it propagates forward. Judging from the perspective of back-propagation, the gradient reaching the input layer becomes so small that we can hardly train the network. This is the vanishing gradient problem.
[Figure 2]
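A minimal NumPy sketch of my own (not from the lecture) makes this concrete: the derivative of the sigmoid is at most 0.25, and back-propagation multiplies one such factor per layer, so the gradient that finally reaches the input layer shrinks roughly exponentially with depth.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never larger than 0.25

# Hypothetical pre-activation values, one per layer of a 10-layer network.
pre_activations = np.random.randn(10)

grad = 1.0
for layer, z in enumerate(pre_activations, start=1):
    grad *= sigmoid_grad(z)  # one sigmoid-derivative factor per layer
    print(f"gradient factor after layer {layer}: {grad:.6f}")
```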
To solve this problem, the new activation function introduced in this lesson is ReLU. For positive inputs (the first quadrant), the activation function does not change the value of its input, which alleviates the vanishing gradient problem. For negative inputs the output is zero, which resembles how our brain works: most neurons in the brain are not excited, and they fire only when the stimulation exceeds a certain threshold.

[Figure 3]
The states of the different neurons are shown in the following figure. Active neurons do not change the value of their input. Inactive neurons can be regarded as if they did not exist.
[Figure 4]
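In code, ReLU is just a thresholding operation. A small NumPy sketch (my own illustration) of the forward value and its gradient shows the two states directly: active neurons pass the input through with gradient 1, inactive neurons output 0 with gradient 0.

```python
import numpy as np

def relu(x):
    # Positive inputs pass through unchanged; negative inputs become zero.
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 where the neuron is active, 0 where it is inactive.
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```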

So, the structure of the neural network can be shown as:
[Figure 5]
ReLU also has some other versions. Some people think that, when the input is less than zero, the gradient should not be exactly zero but a very small number, so Leaky ReLU (on the left) was proposed. Others think the negative-side slope should be a learnable parameter, so Parametric ReLU (on the right) was developed.
[Figure 6]
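A rough sketch of the two variants (my own code; the slope 0.01 for Leaky ReLU is just a commonly used default, not a value fixed in the lecture): they differ only in whether the negative-side slope is a fixed constant or a learnable parameter.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Fixed small negative-side slope instead of exactly zero.
    return np.where(x > 0, x, alpha * x)

def parametric_relu(x, alpha):
    # Same form, but alpha is learned together with the other weights.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 1.0])
print(leaky_relu(x))             # [-0.02  -0.005  1.   ]
print(parametric_relu(x, 0.25))  # [-0.5   -0.125  1.   ]
```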
ReLU actually gives us a piecewise linear activation function, but with only one fixed structure. Can we find an activation function that provides different linear structures for different input values? The answer is yes: Maxout can do this.
The following figure shows that Maxout can do the same thing as ReLU.
[Figure 7]
The following figures show how Maxout provides different linear structures within a single activation function.
[Figure 8]
[Figure 9]
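A minimal sketch of my own, assuming groups of k = 2 linear pieces per unit: a Maxout unit outputs the maximum over its k linear functions of the input, and when one piece is fixed to zero it reduces exactly to ReLU.

```python
import numpy as np

def maxout(x, W, b):
    # W has shape (k, in_dim), b has shape (k,): k linear pieces per unit.
    # The unit outputs the maximum of the k linear functions of x.
    return np.max(W @ x + b, axis=0)

x = np.array([1.5, -0.7])

# With one piece equal to w.x + b and the other fixed to 0,
# the Maxout unit behaves exactly like ReLU(w.x + b).
W = np.array([[0.8, -0.3],
              [0.0,  0.0]])
b = np.array([0.1, 0.0])
print(maxout(x, W, b))  # max(w.x + b, 0) = 1.51
```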

1.3 Early Stopping

Early stopping is an effective way to deal with overfitting. We stop training the neural network before it reaches the minimum of the training-set loss, typically at the point where the loss on a validation set stops decreasing.
[Figure 10]
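A sketch of the usual recipe (my own illustration with a toy, made-up validation-loss curve): keep the parameters that achieved the best validation loss and stop once it has not improved for a fixed number of epochs (the "patience").

```python
# Toy validation-loss curve standing in for a real training run
# (purely illustrative values, not from the lecture).
val_losses = [0.90, 0.70, 0.55, 0.48, 0.47, 0.49, 0.52, 0.56, 0.60]

best_val_loss = float("inf")
best_epoch = None
patience, epochs_without_improvement = 3, 0

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val_loss:
        best_val_loss, best_epoch = val_loss, epoch
        epochs_without_improvement = 0   # here we would also save the weights
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                        # stop before the training loss hits its minimum

print(f"stop at epoch {epoch}, restore weights from epoch {best_epoch}")
```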

1.4 Regularization

Regularization means adding a penalty term on the parameters (for example, the L2 norm of the weights) to the loss function in order to avoid overfitting.
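For example, with L2 regularization the total loss becomes the original loss plus λ times the sum of squared weights. A minimal NumPy sketch of my own, with an arbitrarily chosen λ:

```python
import numpy as np

def l2_regularized_loss(data_loss, weights, lam=1e-4):
    # Total loss = original loss + lambda * sum of squared weights.
    penalty = sum(np.sum(w ** 2) for w in weights)
    return data_loss + lam * penalty

weights = [np.random.randn(3, 4), np.random.randn(4, 2)]
print(l2_regularized_loss(data_loss=0.35, weights=weights))
```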

1.5 Dropout

Dropout is also a method to avoid overfitting. Dropout means that we randomly drop some of the neurons each time we update the network during training.
[Figure 11]
When we use the network on the testing set, we should not drop any part of the network, and we should multiply all weights by (1-p)% if the dropout rate was p% during training.
[Figure 12]
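A minimal NumPy sketch (my own illustration) of both phases. During training each neuron is dropped independently with probability p; at test time nothing is dropped and the outputs are scaled by (1 - p) instead, which is equivalent to multiplying the weights by (1 - p) as stated above.

```python
import numpy as np

p = 0.5  # dropout rate used during training (illustrative value)

def dropout_train(activations, p):
    # Each neuron is dropped independently with probability p.
    mask = (np.random.rand(*activations.shape) > p).astype(float)
    return activations * mask

def dropout_test(activations, p):
    # No neuron is dropped; outputs are scaled by (1 - p) instead.
    return activations * (1.0 - p)

a = np.array([0.8, -0.3, 1.2, 0.5])
print(dropout_train(a, p))  # some entries randomly zeroed
print(dropout_test(a, p))   # all entries scaled by 0.5
```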
Dropout can be seen as a kind of ensemble method, just like Random Forest or XGBoost. The reason why we can regard dropout as an ensemble method can be explained by the following figures.
[Figure 13]
[Figure 14]
