Introduction of Deep Learning

1. three steps for deep learning

  1. define a set of functions (a neural network)
  2. evaluate the goodness of a function
  3. pick the best function

2. neural network

2.1 neuron

(figure: a single neuron)

A neuron computes a weighted sum of its inputs plus a bias, then passes the result through an activation function: a = σ(Σ_i w_i x_i + b).

2.2 sigmoid function

(figure: the sigmoid function, σ(z) = 1/(1 + e^(−z)))

The range of the sigmoid function is (0, 1). It is commonly used as the activation function of neurons in a neural network, mapping any real-valued input into the interval (0, 1).
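
As a minimal sketch (not from the original notes; the variable names are illustrative), a single neuron with a sigmoid activation can be written in a few lines of NumPy:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """A single neuron: weighted sum of the inputs plus a bias,
    passed through the sigmoid activation."""
    return sigmoid(np.dot(w, x) + b)

x = np.array([1.0, -2.0])   # inputs
w = np.array([0.5, 0.5])    # weights
b = 0.1                     # bias
print(neuron(x, w, b))      # a value strictly between 0 and 1
```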

2.3 neural network

  1. Different connections lead to different network structures.
  2. Each neuron can have different values of weights and biases. The weights and biases together are the network parameters θ.

2.4 fully connected feedforward network

(figure: a fully connected feedforward network with an input layer, several hidden layers, and an output layer)

Deep means many hidden layers.
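
A hedged sketch of the forward pass of such a network (the layer sizes and parameter values below are arbitrary, chosen only for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, params):
    """Fully connected feedforward pass: each layer computes
    sigmoid(W @ a + b). The list of (W, b) pairs is the parameter set θ."""
    a = x
    for W, b in params:
        a = sigmoid(W @ a + b)
    return a

# An illustrative 2-3-2 network with random parameters θ.
rng = np.random.default_rng(0)
params = [
    (rng.standard_normal((3, 2)), rng.standard_normal(3)),  # hidden layer
    (rng.standard_normal((2, 3)), rng.standard_normal(2)),  # output layer
]
print(feedforward(np.array([1.0, -1.0]), params))
```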

2.5 output layer (optional)

ordinary layer: each output neuron simply applies the activation function to its input, y_i = σ(z_i).

softmax layer: the outputs are exponentiated and then normalized, y_i = e^(z_i) / Σ_j e^(z_j).

In general, the output of the network can be any value, which may not be easy to interpret.
If the output layer uses softmax, the outputs can be interpreted as probabilities (see the sketch after this list), because they satisfy:
  • 0 < y_i < 1
  • Σ_i y_i = 1
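
A small numeric sketch of softmax (the max-subtraction is a standard numerical-stability trick, not something the notes mention):

```python
import numpy as np

def softmax(z):
    """Softmax: exponentiate each input, then normalize to sum to 1."""
    e = np.exp(z - np.max(z))  # subtracting the max avoids overflow
    return e / e.sum()

y = softmax(np.array([3.0, 1.0, -2.0]))
print(y)        # every y_i lies in (0, 1)
print(y.sum())  # 1.0
```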

2.6 FAQ

  • How many layers? How many neurons for each layer?
    Trial and error, plus intuition.
  • Can the structure be determined automatically?

3. goodness of function

3.1 training data

  • input data
  • their labels

3.2 learning target

Given the input data, output the corresponding label.

3.3 loss

Loss can be the distance between the network output and the target.
A good function should make the loss over all training examples as small as possible.
total loss: L = Σ_i l_i, where l_i is the loss on the i-th training example.
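
As a sketch of the total loss L = Σ_i l_i, using squared distance as the per-example loss l_i (the notes only say "distance"; cross-entropy is the more common choice with softmax outputs):

```python
import numpy as np

def example_loss(y, t):
    """Per-example loss l_i: squared distance between output y and target t."""
    return np.sum((y - t) ** 2)

def total_loss(outputs, targets):
    """Total loss L: the sum of the per-example losses l_i."""
    return sum(example_loss(y, t) for y, t in zip(outputs, targets))

outputs = [np.array([0.8, 0.2]), np.array([0.3, 0.7])]  # network outputs
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # their labels
print(total_loss(outputs, targets))
```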

4. pick the best function

How to pick the best function (find network parameters θ that minimize total loss L)?

  1. Enumerate all possible values (infeasible in practice)
  2. Gradient Descent

4.1 Gradient Descent

(figure: gradient descent along the loss curve)
  1. Pick an initial value for w (randomly, or with RBM pre-training).
  2. Compute ∂L/∂w: if it is negative, increase w; if it is positive, decrease w.
    The update is w ← w − η ∂L/∂w, where η is called the "learning rate".
  3. Repeat step 2 until ∂L/∂w is approximately zero, i.e., until the update becomes very small (see the sketch below).
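
A minimal one-parameter sketch of this loop (the toy loss L(w) = (w − 3)², with ∂L/∂w = 2(w − 3), is purely illustrative):

```python
def gradient_descent(grad_L, w0, eta=0.1, tol=1e-6, max_steps=10000):
    """Minimize L by repeatedly stepping against its gradient.

    grad_L: a function returning dL/dw at w
    eta:    the learning rate
    Stops once the gradient (and hence the update) is very small.
    """
    w = w0
    for _ in range(max_steps):
        g = grad_L(w)
        if abs(g) < tol:    # gradient ~ 0: stop
            break
        w = w - eta * g     # w <- w - eta * dL/dw
    return w

# Toy loss L(w) = (w - 3)^2; the minimum is at w = 3.
print(gradient_descent(lambda w: 2 * (w - 3), w0=0.0))
```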

4.2 avoid local minima

Gradient descent never guarantees reaching the global minimum. There are some tips that can help you escape local minima, but still with no guarantee (see the sketch below).
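
The notes do not name the tips; one commonly used example (an assumption on my part, not from the notes) is momentum, where the update accumulates a velocity that can carry w past shallow local minima:

```python
def gradient_descent_momentum(grad_L, w0, eta=0.1, beta=0.9, steps=1000):
    """Gradient descent with momentum (illustrative): the velocity v keeps a
    decayed memory of past updates, which can roll w through shallow local
    minima, still with no guarantee of reaching the global minimum."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = beta * v - eta * grad_L(w)  # new velocity: decayed old + new step
        w = w + v
    return w

print(gradient_descent_momentum(lambda w: 2 * (w - 3), w0=0.0))  # ~3.0
```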

5. Why Deep?

  1. Deep is better: more parameters, better performance. (That is, better performance can be achieved by going deep.)
  2. Universality Theorem: any continuous function f : R^N → R^M can be realized by a network with one hidden layer, given enough hidden neurons. (That is, the same functions could also be realized by going fat.)

  3. In terms of error rate, thin + tall is better than fat + short.
  4. Going deep enables modularization, which requires less training data. For example, to classify a group of animals: classifying directly by species requires a certain amount of training data for each of the many species, whereas classifying layer by layer in the order class → order → family → genus → species needs less training data.
