Deep Learning (3): The Handwritten Digit Recognition Problem

  • 1. Problem Categorization
  • 2. Dataset
  • 3. Image
  • 4. Input and Output
  • 5. Regression VS Classification
  • 6. Computation Graph
  • 7. Two Problems
  • 8. Particularly
  • 9. How to Train the Model? → Loss
  • 10. Summary
  • 11. Deep Learning?
  • 12. Classification Procedure
  • 13. We need TensorFlow
  • 14. Next

1. Problem Categorization

Discrete Prediction

  • $y = w * x + b$
  • [up, left, down, right]
  • [dog, cat, whale, bird, …]
    Handwritten digit recognition is a discrete-value prediction problem.

2. Dataset

  • MNIST
    • 7000 images per category
    • train/test split: 60k vs 10k

3. Image

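  • [28, 28, 1]
    An image consists of 28 rows × 28 columns, 784 pixels in total. Each pixel holds a gray level in [0, 255]; in the convention used here, 0 is pure white and 255 is pure black. The trailing 1 is the channel dimension: every pixel carries a single value, its gray level.
  • → [784]
    Flatten the 28×28 data into one dimension: append the second row of pixels to the end of the first row, then do the same for the remaining 26 rows. The image becomes a 1-D array of 784 elements.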
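
The flattening step can be sketched in plain Python. This is a minimal illustration rather than code from the course: the pixel values in `image` are made up, and `flatten` is a hypothetical helper name.

```python
# Hypothetical 28x28 grayscale image: each pixel value is row * 28 + col,
# reduced modulo 256 so it stays inside the [0, 255] range.
ROWS, COLS = 28, 28
image = [[(r * COLS + c) % 256 for c in range(COLS)] for r in range(ROWS)]

def flatten(img):
    """Concatenate the rows of a 2-D image, top to bottom, into one 1-D list."""
    return [pixel for row in img for pixel in row]

flat = flatten(image)
print(len(flat))  # 784
```

After flattening, the pixel at row r and column c sits at position r * 28 + c, so the second row starts at index 28.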

4. Input and Output

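(1) Input
$x: [b, 784]$
The input has shape [b, 784], where b is the number of images and 784 is the number of pixels in each image.
(2) Encoding

  • dog=0, cat=1, fish=2, …
    Drawback: the result is ambiguous. A predicted value of 1.5, for example, lies halfway between cat and fish and leads to a wrong decision.
  • dog = [1, 0, 0, …], where the first entry is the probability that the sample is "dog", the second is the probability that it is "cat", and so on; the probabilities sum to 1.
    cat = [0, 1, 0, …]
    fish = [0, 0, 1, …]

This encoding is called one-hot encoding.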
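
One-hot encoding is easy to write by hand. The sketch below assumes the class indices from the list above (dog=0, cat=1, fish=2); `one_hot` is a hypothetical helper, not a library function.

```python
def one_hot(label, num_classes):
    """Return a list of length num_classes with a 1 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    vec = [0] * num_classes
    vec[label] = 1
    return vec

# Class indices follow the text: dog=0, cat=1, fish=2.
classes = ["dog", "cat", "fish"]
for index, name in enumerate(classes):
    print(name, one_hot(index, len(classes)))
```

Each vector contains exactly one 1, so its entries sum to 1 and can be read as a probability distribution that puts all its mass on the true class.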

5. Regression VS Classification

(1) Model

  • $y = w * x + b$
  • $y \in \mathbb{R}^d$

(2) Output

  • $out = X @ W + b$
  • $out: [0.1, 0.8, 0.02, 0.08]$

(3) Prediction

  • $pred = argmax(out)$
    • $pred: 1$
    • $label: 2$
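
The prediction step above can be reproduced with the numbers from the text. `argmax` here is a small hand-written helper, assumed for illustration.

```python
def argmax(values):
    """Index of the largest element (the first one if there are ties)."""
    return max(range(len(values)), key=lambda i: values[i])

out = [0.1, 0.8, 0.02, 0.08]   # model output from the text
label = 2                       # ground-truth class from the text

pred = argmax(out)
print(pred)           # 1
print(pred == label)  # False: this sample is misclassified
```

The largest entry, 0.8, is at index 1, so the model predicts class 1 while the true label is 2.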

6. Computation Graph

  • $out = X @ W + b$
  • $X: [b, 784]$
  • $W: [784, 10]$
  • $b: [10]$
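
To see how these shapes fit together, here is a plain-Python sketch of `X @ W + b`. The helpers `matmul`, `add_bias` and `shape`, the batch size of 4, and the random values are all assumptions made for the example.

```python
import random

def matmul(a, b):
    """Multiply an [n, k] matrix by a [k, m] matrix (both are lists of rows)."""
    cols = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def add_bias(m, bias):
    """Add a length-m bias vector to every row of an [n, m] matrix."""
    return [[v + bv for v, bv in zip(row, bias)] for row in m]

def shape(m):
    """[rows, columns] of a list-of-rows matrix."""
    return [len(m), len(m[0])]

random.seed(0)
batch = 4  # hypothetical batch size b
X = [[random.random() for _ in range(784)] for _ in range(batch)]
W = [[random.gauss(0.0, 0.01) for _ in range(10)] for _ in range(784)]
b = [0.0] * 10

out = add_bias(matmul(X, W), b)
print(shape(X), shape(W), len(b), shape(out))  # [4, 784] [784, 10] 10 [4, 10]
```

The inner dimension 784 cancels in the product, and the bias of length 10 is added to each of the b rows, giving one 10-element output per image.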

7. Two Problems

(1) It’s Linear!

  • $out = X @ W + b$
  • →
  • $out = f(X @ W + b)$

  • $out = relu(X @ W + b)$

(2) It’s too simple!

  • $out = relu(X @ W + b)$
  • →
  • $h_1 = relu(X @ W_1 + b_1)$
  • $h_2 = relu(h_1 @ W_2 + b_2)$
  • $out = relu(h_2 @ W_3 + b_3)$
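
For reference, ReLU itself is a one-liner: it keeps positive values and replaces negative ones with 0. The sketch below uses made-up pre-activation values.

```python
def relu(values):
    """Element-wise ReLU: negative entries become 0, the rest pass through."""
    return [max(0.0, v) for v in values]

pre_activation = [-2.0, -0.5, 0.0, 1.5, 3.0]  # hypothetical values of X @ W + b
print(relu(pre_activation))  # [0.0, 0.0, 0.0, 1.5, 3.0]
```

Without a nonlinearity between them, stacked layers of the form X @ W + b would compose into a single linear map, which is why the activation is applied after every layer.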

8. Particularly

(1) $X: [v_1, v_2, \dots, v_{784}]$

  • $X: [1, 784]$

(2) $h_1 = relu(X @ W_1 + b_1)$

  • $W_1: [784, 512]$
    → $[1, 784] @ [784, 512] + [512] = [1, 512] + [512] = [1, 512]$
  • $b_1: [1, 512]$

(3) $h_2 = relu(h_1 @ W_2 + b_2)$

  • $W_2: [512, 256]$
    → $[1, 512] @ [512, 256] + [256] = [1, 256] + [256] = [1, 256]$
  • $b_2: [1, 256]$

(4) $out = relu(h_2 @ W_3 + b_3)$

  • $W_3: [256, 10]$
    → $[1, 256] @ [256, 10] + [10] = [1, 10] + [10] = [1, 10]$
  • $b_3: [1, 10]$
    From the computation above, this network is effectively a dimensionality-reduction process: the image goes from $[1, 784]$ down to $[1, 512]$, then to $[1, 256]$, and finally to $[1, 10]$.
    → $[0, 0, 0.01, 0.1, 0.8, 0, \dots]$
    In this output the largest probability, 0.8, is the fifth entry, i.e. index 4. With the labels 0~9 mapped to indices 0~9, as in the one-hot examples of Section 9, the predicted digit is "4".
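
The shape trace above can be checked with a small forward pass in plain Python. This is only a sketch: the weights are random rather than trained, the input is a made-up vector instead of a real MNIST image, and `dense_relu` and `init_layer` are hypothetical helper names.

```python
import random

def dense_relu(x, w, b):
    """relu(x @ w + b) for one row vector x (length k), w of shape [k, m], b of length m."""
    return [max(0.0, sum(xi * wi for xi, wi in zip(x, col)) + bj)
            for col, bj in zip(zip(*w), b)]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an arbitrary choice for illustration)."""
    w = [[rng.gauss(0.0, 0.05) for _ in range(n_out)] for _ in range(n_in)]
    return w, [0.0] * n_out

rng = random.Random(0)
W1, b1 = init_layer(784, 512, rng)
W2, b2 = init_layer(512, 256, rng)
W3, b3 = init_layer(256, 10, rng)

x = [rng.random() for _ in range(784)]  # one flattened image, i.e. shape [1, 784]
h1 = dense_relu(x, W1, b1)
h2 = dense_relu(h1, W2, b2)
out = dense_relu(h2, W3, b3)

print(len(x), len(h1), len(h2), len(out))  # 784 512 256 10
pred = max(range(len(out)), key=lambda i: out[i])
print(pred)  # index of the largest output
```

With untrained weights the printed prediction carries no meaning; the point is only that the widths 784, 512, 256 and 10 chain together as described.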

9. How to Train the Model? → Loss

  • $out: [1, 10]$
    →
  • Y/label: 0~9
    • e.g. 1 → [0,1,0,0,0,0,0,0,0,0]
    • e.g. 3 → [0,0,0,1,0,0,0,0,0,0]

→

  • Euclidean Distance: $out \to Label$
    • MSE, i.e. $\sum (y - out)^2$
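
As a small worked example of this loss, the sketch below compares one made-up output vector against two one-hot labels. `one_hot` and `squared_error` are hypothetical helpers; the function computes the sum of squared differences exactly as in the formula above, and dividing by the number of entries would give the mean.

```python
def one_hot(label, num_classes=10):
    """One-hot vector of floats with a 1.0 at index `label`."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def squared_error(y, out):
    """Sum of squared differences, matching the formula sum((y - out)^2)."""
    return sum((yi - oi) ** 2 for yi, oi in zip(y, out))

out = [0.0, 0.1, 0.0, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical output
good = squared_error(one_hot(3), out)   # label 3 agrees with the output
bad = squared_error(one_hot(1), out)    # label 1 disagrees with the output
print(good < bad)  # True
```

The loss is small (about 0.06) when the output concentrates on the labelled digit and large (about 1.46) when it does not, which is what makes it usable as a training signal.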

10. Summary

  • $out = relu\{relu\{relu[X @ W_1 + b_1] @ W_2 + b_2\} @ W_3 + b_3\}$
  • $pred = argmax(out)$
  • $loss = MSE(out, label)$
  • minimize $loss$
    • $[W_1', b_1', W_2', b_2', W_3', b_3']$

11. Deep Learning?

  • We have not seen it.
  • But we have already mastered it.
  • We will show you it’s (almost) Deep Learning!

12. Classification Procedure

  • Step1. Compute $h_1, h_2, out$
  • Step2. Compute $Loss$
  • Step3. Compute gradient and update $[W_1', b_1', W_2', b_2', W_3', b_3']$
  • Step4. Loop
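
The four steps can be sketched end to end on a toy problem. To keep the gradient short enough to derive by hand, this example makes several simplifying assumptions that are not from the course: a single linear layer without relu, synthetic 4-feature data instead of MNIST, a squared-error loss, and a hand-picked learning rate. For $loss = \sum_j (out_j - y_j)^2$ the gradient with respect to $out_j$ is $2(out_j - y_j)$, which gives the updates used below.

```python
import random

def forward(x, W, b):
    """out = x @ W + b for one input row x."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]

def loss_fn(out, y):
    """Sum of squared differences between output and one-hot label."""
    return sum((o - t) ** 2 for o, t in zip(out, y))

# Toy data (hypothetical): 2 classes, 4 features; class k lights up features 2k and 2k+1.
rng = random.Random(0)
data = []
for _ in range(40):
    k = rng.randrange(2)
    x = [rng.random() * 0.1 for _ in range(4)]
    x[2 * k] += 1.0
    x[2 * k + 1] += 1.0
    y = [1.0 if j == k else 0.0 for j in range(2)]  # one-hot label
    data.append((x, y))

W = [[0.0, 0.0] for _ in range(4)]
b = [0.0, 0.0]
lr = 0.05  # learning rate, chosen by hand

def total_loss():
    return sum(loss_fn(forward(x, W, b), y) for x, y in data)

initial = total_loss()
for epoch in range(50):                      # Step 4: loop
    for x, y in data:
        out = forward(x, W, b)               # Step 1: compute the output
        # Steps 2-3: gradient of the squared-error loss with respect to out
        grad_out = [2.0 * (o - t) for o, t in zip(out, y)]
        for j, g in enumerate(grad_out):     # Step 3: gradient-descent update
            for i, xi in enumerate(x):
                W[i][j] -= lr * xi * g
            b[j] -= lr * g
final = total_loss()
print(final < initial)  # True
```

In the three-layer network the same loop applies, but the gradients for $W_1, b_1, W_2, b_2$ have to be propagated back through each relu layer, which is tedious to do by hand.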

13. We need TensorFlow

  • The amount of data is huge;
  • TensorFlow computes and processes it faster.

14. Next

  • Step1. have fun on MNIST classification
  • Step2. and we learn TensorFlow
  • Step3. and we implement Step1. by ourselves!

References:
[1] Long Liangqu (龙良曲): 《深度学习与TensorFlow2入门实战》
