Hung-yi Lee Machine Learning (2017): Notes 2, Deep Learning


Deep Learning (DL)

Treat each "logistic regression" unit as a "neuron"; many neurons connected together form a neural network.
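A minimal sketch of one such neuron, assuming NumPy. The input, weights, and bias below are hypothetical values chosen only for illustration; they are not from the notes.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """One logistic-regression unit: weighted sum of the inputs, then sigmoid."""
    return sigmoid(np.dot(w, x) + b)

# Hypothetical input, weights, and bias (illustration only).
x = np.array([1.0, -1.0])
w = np.array([1.0, -2.0])
b = 1.0

out = neuron(x, w, b)  # z = 1*1 + (-2)*(-1) + 1 = 4, so out = sigmoid(4)
print(out)
```

A network is then just many of these units, with the outputs of one layer used as the inputs of the next.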

1958: Perceptron (linear model)
1969: Perceptron has limitations
1980s: Multi-layer perceptron
[Not significantly different from today's Deep Neural Networks (DNN)]
[Usually more than 3 hidden layers is not helpful]
1989: 1 hidden layer is "good enough". Why deep?
[Breakthrough: it got a new name, "deep learning".]
2006: RBM (Restricted Boltzmann Machine) initialization, Geoffrey E. Hinton
2011: Started to become popular in speech recognition
2012: Won the ILSVRC image competition

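sigmoid function -> activation function
A fully connected feedforward network is a function (input vector in, output vector out).
Matrix operation: each layer multiplies the previous layer's output by a weight matrix, adds a bias vector, and applies the activation function.
You need to decide the network structure so that your function set contains a good function (number of layers, number of neurons per layer, and which activation function to use).
Special structure: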

  • Convolutional Neural Network (CNN)
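The fully connected feedforward network described above can be sketched as a chain of matrix operations. This assumes NumPy and a sigmoid activation in every layer; the 2-2-2 layer sizes and all weight values are hypothetical, picked only so the arithmetic is easy to follow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, layers):
    """Apply a = sigmoid(W @ a + b) once per layer, starting from a = x."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

# Hypothetical network: 2 inputs, 2 hidden neurons, 2 outputs.
layers = [
    (np.array([[1.0, -2.0], [-1.0, 1.0]]), np.array([1.0, 0.0])),
    (np.array([[2.0, -1.0], [-2.0, -1.0]]), np.array([0.0, 0.0])),
]

x = np.array([1.0, -1.0])
hidden = feedforward(x, layers[:1])  # W1 @ x + b1 = [4, -2], then sigmoid
y = feedforward(x, layers)           # the whole network as one function
print(hidden, y)
```

Choosing the network structure means choosing the number of `(W, b)` pairs, their shapes, and the activation function; the weight values themselves are what training finds.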

Chain Rule
Forward pass:
Backward pass:
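A minimal sketch of how the chain rule splits into a forward pass and a backward pass, assuming NumPy, one sigmoid hidden layer, a linear output layer, and a squared-error loss. These choices, the layer sizes, and the random values are assumptions made for illustration and are not stated in the notes. The forward pass caches each layer's input and activation; the backward pass propagates the derivative of the loss with respect to each z from the output back toward the input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(x, t, W1, b1, W2, b2):
    # Forward pass: compute and keep every z and a.
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2                 # linear output layer (assumption)
    loss = 0.5 * np.sum((z2 - t) ** 2)

    # Backward pass: dLoss/dz for each layer, output first.
    dz2 = z2 - t
    dW2 = np.outer(dz2, a1)           # dz/dW is the layer's input (from the forward pass)
    db2 = dz2
    da1 = W2.T @ dz2                  # push the error back through W2
    dz1 = da1 * a1 * (1.0 - a1)       # sigmoid'(z) = a * (1 - a)
    dW1 = np.outer(dz1, x)
    db1 = dz1
    return loss, (dW1, db1, dW2, db2)

# Hypothetical sizes and random values, used only to exercise the code.
rng = np.random.default_rng(0)
x, t = rng.normal(size=3), rng.normal(size=2)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

loss, (dW1, db1, dW2, db2) = loss_and_grads(x, t, W1, b1, W2, b2)

# Sanity check: compare one analytic gradient entry with a central difference.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss_and_grads(x, t, W1_plus, b1, W2, b2)[0]
           - loss_and_grads(x, t, W1_minus, b1, W2, b2)[0]) / (2 * eps)
print(dW1[0, 0], numeric)
```

The finite-difference comparison at the end is a common way to check a hand-written backward pass before trusting it.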

Using Keras is like building with blocks.

Tips for DNN
More layers do not necessarily give better results on the training data, so first make sure you get good results on the training data.
Do not always blame overfitting.
First work out whether the poor results are on the training data or on the testing data.
Good results on training data? No:
1. New activation function
Vanishing gradient problem
ReLU (Rectified Linear Unit), ReLU variants (Leaky ReLU, Parametric ReLU, ELU),
Maxout [learnable activation function; ReLU is a special case of Maxout]
2. Adaptive learning rate
Adam (RMSProp + Momentum)
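Both remedies above can be sketched in a few lines, assuming NumPy. The first part shows why sigmoid gradients shrink (the derivative never exceeds 0.25), then ReLU, Leaky ReLU, and a Maxout unit that reduces to ReLU when one of its two linear pieces is fixed at zero. The second part is one Adam update written as momentum plus RMSProp-style scaling with bias correction. The hyperparameter defaults, the toy objective f(theta) = theta^2, and all numeric values are assumptions for illustration.

```python
import numpy as np

def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum is 0.25, reached at z = 0."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

def maxout(x, W, b):
    """Maxout unit: the max over k linear pieces. W is (k, d), b is (k,)."""
    return np.max(W @ x + b, axis=0)

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # RMSProp: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# ReLU as a special case of Maxout: two pieces, the second fixed at zero.
w, b0 = np.array([1.0, -2.0]), 0.5            # hypothetical weights and bias
W = np.stack([w, np.zeros_like(w)])
b = np.array([b0, 0.0])
x_neg = np.array([0.3, 0.8])                  # w.x + b0 = -0.8, so ReLU gives 0
x_pos = np.array([2.0, 0.1])                  # w.x + b0 = 2.3, so ReLU gives 2.3

# Adam on the toy objective f(theta) = theta**2, whose gradient is 2 * theta.
theta, m, v = 5.0, 0.0, 0.0
history = [theta]
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t, lr=0.05)
    history.append(theta)
print(history[0], history[1], history[-1])
```

Note that the very first Adam step moves by almost exactly `lr`, whatever the size of the gradient, because the bias-corrected ratio `m_hat / sqrt(v_hat)` is then the sign of the gradient.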
Good results on testing data? No:
1. Early stopping
3. Dropout [Dropout is a kind of ensemble]
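A minimal dropout sketch, assuming NumPy. It uses the "inverted dropout" formulation, which rescales the surviving activations during training so that nothing needs to change at test time; this is a swapped-in variant chosen for brevity, not necessarily the formulation used in the lecture. The drop probability and array size are hypothetical.

```python
import numpy as np

def dropout(a, p_drop, rng, training=True):
    """Inverted dropout: zero each activation with probability p_drop while training."""
    if not training:
        return a                              # test time: use the full network unchanged
    mask = rng.random(a.shape) >= p_drop      # True for the units that are kept
    return a * mask / (1.0 - p_drop)          # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
a = np.ones(100_000)                          # hypothetical activations

train_out = dropout(a, 0.5, rng)              # a different random sub-network on each call
test_out = dropout(a, 0.5, rng, training=False)
print(train_out[:8], train_out.mean())
```

Each training call samples a different thinned network, and the test-time network behaves like an average over them, which is the sense in which dropout is a kind of ensemble.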

CNN for computer vision
The network architecture can be designed.
A CNN is a simplified version of a fully connected network (fewer parameters).
[3 reasons to use a CNN for images:
1. A neuron does not have to see the whole image to discover a pattern. Connecting to a small region takes fewer parameters.
2. The same patterns appear in different regions.
3. Subsampling the pixels will not change the object.]
1, 2 -> Convolution
3 -> Max Pooling
Input, Convolution (layer), Max Pooling (layer), Convolution (layer), Max Pooling (layer), ..., Convolution (layer), Max Pooling (layer), Flatten, Fully Connected Feedforward network, Output.
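To make the pipeline above concrete, the sketch below traces tensor shapes and counts convolution parameters through two Convolution + Max Pooling stages followed by Flatten. It is plain Python and assumes 3×3 filters with stride 1 and no padding, and 2×2 pooling with stride 2. The 28×28×1 input and the filter counts (25, then 50) are hypothetical values chosen for illustration.

```python
def conv_out(size, kernel=3, stride=1):
    """Output width or height of a convolution without padding."""
    return (size - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output width or height of max pooling."""
    return (size - window) // stride + 1

def trace(h, w, c, conv_filters):
    """Return the shape after every layer and the total number of conv parameters."""
    shapes = [(h, w, c)]
    params = 0
    for n_filters in conv_filters:
        # Each 3x3 filter spans all input channels and has one bias.
        params += (3 * 3 * c + 1) * n_filters
        h, w, c = conv_out(h), conv_out(w), n_filters
        shapes.append((h, w, c))              # after Convolution
        h, w = pool_out(h), pool_out(w)
        shapes.append((h, w, c))              # after Max Pooling
    shapes.append((h * w * c,))               # after Flatten
    return shapes, params

shapes, params = trace(28, 28, 1, [25, 50])
for s in shapes:
    print(s)
print("conv parameters:", params)

# For comparison: a fully connected layer from the 28*28 input to an output
# as large as the first feature map would need this many weights.
dense_weights = (28 * 28) * (26 * 26 * 25)
print("equivalent dense weights:", dense_weights)
```

The gap between the two counts is the "fewer parameters" point: each filter looks at a small region only, and the same filter is reused at every position.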
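Property 1: translation invariance
Property 2: spatial hierarchies of patterns
CNN: convolution operation
[Size, values;
Feature map]
Convolution vs. fully connected
Note: each filter is a channel.
CNN: max pooling operation
Note: convolution typically uses a 3×3 window with stride 1; max pooling typically uses a 2×2 window with stride 2.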
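The two operations above can be sketched directly in NumPy: a 3×3 filter slid with stride 1 produces a feature map, and 2×2 max pooling with stride 2 then subsamples it. The loops are written for readability, not speed. The 6×6 binary image and the diagonal-detecting filter are hypothetical values chosen for illustration.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image without padding (what CNN layers call convolution)."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # inner product of patch and filter
    return out

def max_pool(fmap, window=2, stride=2):
    """Keep the largest value in each window."""
    oh = (fmap.shape[0] - window) // stride + 1
    ow = (fmap.shape[1] - window) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fmap[i * stride:i * stride + window,
                             j * stride:j * stride + window].max()
    return out

# Hypothetical 6x6 image and a filter that responds to a top-left to bottom-right diagonal.
image = np.array([
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
], dtype=float)
kernel = np.array([
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
], dtype=float)

feature_map = conv2d(image, kernel)   # 4x4; the value 3 marks a full diagonal match
pooled = max_pool(feature_map)        # 2x2
print(feature_map)
print(pooled)
```

The same filter gives the value 3 at the top-left and at the bottom-left of the feature map, which illustrates one filter detecting the same pattern in different regions. With several filters, each one yields its own feature map, and stacking them gives the channels of the next layer's input.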
Analyzing what the CNN (filters) learned:
1. First convolution layer [typical-looking filters on the trained first layer]; how about higher layers? [Which images make a specific neuron activate]
2. What does the CNN learn? [Degree of activation of the k-th filter]


  • Train a small model from scratch
  • Use a pretrained network for feature extraction
  • Fine-tune a pretrained network

RNN for text and sequences


