cs231n lecture5 CNN

  • CNN Notes
      • Convolution Layer
      • Pooling Layer
      • Fully Connected Layer(FC layer)
  • Useful Notes
      • Preprocessing
      • Weight Initialization
      • Regularization
      • Loss
          • classification
          • Attribute classification
          • regression
      • Summary
      • Later
  • Todos

CNN Notes

  • detection: predict a bounding box for each object
  • segmentation: label the image pixel by pixel

Convolution Layer

  • convolve the filter with the image (dot products)
  • the filter always extends through the full depth of the input volume
  • first stretch the filter into a vector (5x5x3 -> 1x75), then take dot products
  • in practice the filter is laid on top of the image, an elementwise multiply-and-sum is computed, and the result becomes the value at the center location
  • this is not the convolution from signal processing (the filter is not flipped)
  • a set of N filters produces N activation maps
  • deeper input volumes require deeper filters (filter depth always matches input depth)

eg:
32x32x3 -> 28x28x6 (six feature maps)
CONV->ReLU->CONV->ReLU->POOLING->CONV->ReLU->CONV->ReLU->POOLING->CONV->ReLU->CONV->ReLU->POOLING->FULL CONNECT

  • ConvNet is a sequence of Convolution Layers, interspersed with activation functions

    output spatial size: (N − F) / stride + 1

  • common: zero-pad the border; with padding P the output size becomes (N − F + 2P) / stride + 1

  • parameter count: each filter has F x F x depth weights plus 1 bias term (see the sketch below)
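
A small plain-Python sketch (function names are illustrative) that checks the output-size formula and the parameter count on the 32x32x3 example above:

```python
def conv_output_size(N, F, stride=1, pad=0):
    """Spatial output size of a conv layer: (N - F + 2*pad) / stride + 1."""
    assert (N - F + 2 * pad) % stride == 0, "filter does not tile the input evenly"
    return (N - F + 2 * pad) // stride + 1

def conv_param_count(F, depth_in, num_filters):
    """Each filter has F*F*depth_in weights plus one bias term."""
    return num_filters * (F * F * depth_in + 1)

# the 32x32x3 -> 28x28x6 example above: six 5x5x3 filters, stride 1, no padding
print(conv_output_size(32, 5, stride=1, pad=0))  # 28
print(conv_param_count(5, 3, 6))                 # 456 = 6 * (5*5*3 + 1)
```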

Pooling Layer

  • makes the representations smaller and more manageable
  • gives some invariance over a given region
  • downsamples spatially; does not operate over the depth dimension
  • MAX POOLING (see the sketch below)
    • commonly the pooling windows do not overlap (e.g. 2x2 with stride 2)
    • generally works better than average pooling
    • zero-padding is not commonly used
  • a strided convolution can also be used for downsampling instead of pooling
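
A minimal numpy sketch of 2x2 max pooling with stride 2 (names are illustrative, assuming an (H, W, C) input):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling over spatial windows; depth (C) is left untouched."""
    H, W, C = x.shape
    H_out = (H - size) // stride + 1
    W_out = (W - size) // stride + 1
    out = np.zeros((H_out, W_out, C))
    for i in range(H_out):
        for j in range(W_out):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size, :]
            out[i, j, :] = window.max(axis=(0, 1))   # max over the spatial window
    return out

x = np.random.randn(4, 4, 3)
print(max_pool(x).shape)  # (2, 2, 3): spatial size halved, depth unchanged
```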

Fully Connected Layer(FC layer)

typical arch:
[(CONV-RELU)*N - POOL?]*M - (FC-RELU)*K, SOFTMAX
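
To make the template concrete, here is a tiny illustrative Python helper (not part of the lecture) that expands it for given N, M, K:

```python
def convnet_pattern(N=2, M=3, K=2, pool=True):
    """Expand [(CONV-RELU)*N - POOL?]*M - (FC-RELU)*K - SOFTMAX into layer names."""
    layers = []
    for _ in range(M):
        layers += ["CONV", "RELU"] * N
        if pool:
            layers.append("POOL")
    layers += ["FC", "RELU"] * K
    layers.append("SOFTMAX")
    return layers

print(convnet_pattern(N=2, M=1, K=1))
# ['CONV', 'RELU', 'CONV', 'RELU', 'POOL', 'FC', 'RELU', 'SOFTMAX']
```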

Useful Notes

  • see the three CS231n neural network notes

Preprocessing

  • Mean subtraction: X -= np.mean(X, axis=0)
  • Normalization: X /= np.std(X, axis=0)
  • PCA: project onto the top principal components to save space and time (see the numpy sketch below)
  • Whitening: after PCA, divide each dimension by its eigenvalue to normalize the scale
  • any preprocessing statistics (e.g. the data mean) must only be computed on the training data, and then applied to the validation / test data.
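
A numpy sketch of these preprocessing steps, loosely following the CS231n notes (the data matrix X here is a random placeholder; statistics would come from the training split):

```python
import numpy as np

X = np.random.randn(500, 100)        # placeholder training data (examples x features)

# mean subtraction and normalization (statistics computed on training data only)
mean = np.mean(X, axis=0)
std = np.std(X, axis=0)
X = (X - mean) / std

# PCA: rotate into the eigenbasis of the covariance matrix and optionally keep
# only the top components to save space and time
cov = np.dot(X.T, X) / X.shape[0]
U, S, V = np.linalg.svd(cov)
Xrot_reduced = np.dot(X, U[:, :50])  # keep the top 50 components

# whitening: scale every dimension of the rotated data by 1/sqrt(eigenvalue);
# the small constant avoids division by zero
Xwhite = np.dot(X, U) / np.sqrt(S + 1e-5)
```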

Weight Initialization

  • small random numbers: W = 0.01 * np.random.randn(D, H)
  • calibrating the variances with 1/sqrt(n) (see the sketch below)
  • Batch Normalization:
    • prevents vanishing gradients
    • speeds up training
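
A numpy sketch of the initialization schemes mentioned above (layer sizes are illustrative):

```python
import numpy as np

n_in, n_out = 512, 256                     # fan-in and layer width, illustrative

# small random numbers (fine for shallow nets; activations shrink with depth)
W_small = 0.01 * np.random.randn(n_in, n_out)

# calibrate the variance with 1/sqrt(fan_in)
W_calibrated = np.random.randn(n_in, n_out) / np.sqrt(n_in)

# recommended for ReLU units: standard deviation sqrt(2/n) (see Summary below)
W_relu = np.random.randn(n_in, n_out) * np.sqrt(2.0 / n_in)

# biases are commonly initialized to zero
b = np.zeros(n_out)
```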

Regularization

  • L2 regularization
  • Max norm constraints
  • Dropout
  • in practice:
    • use a single, global L2 regularization strength
    • combine with dropout (p = 0.5); see the sketch below
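
A minimal sketch of inverted dropout with p = 0.5 (function name is illustrative); scaling by 1/p at training time means nothing changes at test time:

```python
import numpy as np

p = 0.5  # probability of keeping a unit

def dropout_forward(h, train=True):
    """Inverted dropout: drop units with probability 1-p and rescale by 1/p."""
    if not train:
        return h                                  # test time: identity
    mask = (np.random.rand(*h.shape) < p) / p     # dropped units become 0
    return h * mask

h = np.maximum(0, np.random.randn(10, 20))        # some hidden activations
print(h.mean(), dropout_forward(h).mean())        # roughly equal in expectation
```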

Loss

classification
  • hinge loss
  • cross-entropy loss (both sketched below)
  • with a very large number of classes: hierarchical softmax
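
For reference, a numpy sketch of both classification losses for a single example (class scores and labels are illustrative):

```python
import numpy as np

def cross_entropy_loss(scores, y):
    """Softmax cross-entropy loss; scores are unnormalized, y is the true class index."""
    scores = scores - np.max(scores)                  # shift for numerical stability
    probs = np.exp(scores) / np.sum(np.exp(scores))
    return -np.log(probs[y])

def hinge_loss(scores, y, delta=1.0):
    """Multiclass SVM (hinge) loss with margin delta."""
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0                                    # the correct class contributes nothing
    return np.sum(margins)

scores = np.array([3.2, 5.1, -1.7])
print(cross_entropy_loss(scores, y=0))   # ~2.04
print(hinge_loss(scores, y=0))           # 2.9
```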
Attribute classification

build a binary classifier for every single attribute independently
$$L_i = \sum_j \max(0, 1 - y_{ij} f_j)$$

  • $y_{ij}$ is either +1 or −1 depending on whether the i-th example is labeled with the j-th attribute

or train a logistic regression classifier for every attribute independently
$$P(y=1 \mid x; w, b) = \frac{1}{1 + e^{-(w^T x + b)}} = \sigma(w^T x + b)$$
$$L_i = \sum_j y_{ij} \log(\sigma(f_j)) + (1 - y_{ij}) \log(1 - \sigma(f_j))$$

  • the gradient is $\partial L_i / \partial f_j = y_{ij} - \sigma(f_j)$ (see the sketch below)
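
A numpy sketch of this attribute loss and its gradient as written above (0/1 labels; the scores f are illustrative). Note that the expression is a log-likelihood, so in practice its negative is minimized:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attribute_loss_and_grad(f, y):
    """L_i = sum_j y_ij*log(sigma(f_j)) + (1 - y_ij)*log(1 - sigma(f_j)),
    with gradient dL_i/df_j = y_ij - sigma(f_j)."""
    s = sigmoid(f)
    L = np.sum(y * np.log(s) + (1 - y) * np.log(1 - s))
    dL_df = y - s
    return L, dL_df

f = np.array([2.0, -1.0, 0.5])   # scores for three attributes
y = np.array([1.0, 0.0, 1.0])    # 0/1 labels for the same attributes
print(attribute_loss_and_grad(f, y))
```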
regression

$$L_i = \| f - y_i \|_2^2$$

  • less stable and harder to optimize than softmax
  • when possible, prefer discretizing the output and using a softmax loss

Summary

  • preprocess the data to have zero mean, and normalize its scale to [-1, 1] along each feature
  • initialize W from a Gaussian distribution with standard deviation sqrt(2/n), where n is the number of inputs to the neuron
  • L2 regularization and dropout
  • batch normalization

Later

notes3

Todos

  1. reading batch normalization
  2. reading notes3
