2017 CS231n Assignment1 Softmax

Launching Jupyter Notebook (Linux Ubuntu)

micheal@Computer:~$ cd assignment1
micheal@Computer:~/assignment1$ source .env/bin/activate
(.env) micheal@Computer:~/assignment1$ jupyter notebook
# Softmax exercise

*Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the [assignments page](http://vision.stanford.edu/teaching/cs231n/assignments.html) on the course website.*

This exercise is analogous to the SVM exercise. You will:
# Tasks to complete in this assignment
- implement a fully-vectorized **loss function** for the Softmax classifier
- implement the fully-vectorized expression for its **analytic gradient**
- **check your implementation** with numerical gradient
- use a validation set to **tune the learning rate and regularization** strength
- **optimize** the loss function with **SGD**
- **visualize** the final learned weights

In[1] Preparation

from __future__ import print_function
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

#from __future__ import print_function must come first, before the other imports

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
#apply some plotting defaults: figure size, interpolation method, and colormap

In[2] Loading Data & Preprocessing

#1 Load the data and labels, then print the data shapes.

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the linear classifier. These are the same steps as we used for the
    SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'CIFAR10' # if the files cannot be read from the default path, you can download the dataset locally instead
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)    
Print the shapes of the raw data we just loaded:
    print('Train data shape: ', X_train.shape)
    print('Train labels shape: ', y_train.shape)
    print('Test data shape: ', X_test.shape)
    print('Test labels shape: ', y_test.shape)
    Train data shape:  (50000, 32, 32, 3)
    Train labels shape:  (50000,)
    Test data shape:  (10000, 32, 32, 3)
    Test labels shape:  (10000,)

#2 Subsample the data

From the 50000 training images, take 1000 as the validation set X_val, y_val:
    # subsample the data
    # range(49000,49000 + 1000)
    # list(range(49000,49000 + 1000)) = [49000, 49001, ... , 49999]
    mask = list(range(num_training, num_training + num_validation))
    # take the last 1000 of the original 50000 training images as the validation set
    X_val = X_train[mask]
    # take the labels of those last 1000 training images as the validation labels
    y_val = y_train[mask]
Update the training set X_train, y_train:
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
From the 10000 test images, take the first num_test = 1000 as the test set:
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]
From the num_training = 49000 training images, randomly pick num_dev = 500 as the dev set.
np.random.choice(49000, 500, replace=False) # draws 500 distinct (replace=False) indices from 0, 1, 2, ..., 48999
    mask = np.random.choice(num_training, num_dev, replace=False)
    X_dev = X_train[mask]
    y_dev = y_train[mask]

#3 Reshape the data

Flatten each (32, 32, 3) image into a single row of 3072 values; feel free to print the shapes here to check. A toy example of the -1 reshape idiom follows the output below.
    # Preprocessing: reshape the image data into rows
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_val = np.reshape(X_val, (X_val.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))
    X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
    print('Train data shape: ', X_train.shape)
    print('Test data shape: ', X_test.shape)
    print('dev data shape: ', X_dev.shape)
    
    Train data shape:  (49000, 3072)
    Test data shape:  (1000, 3072)
    dev data shape:  (500, 3072)
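
As mentioned above, the -1 in np.reshape lets NumPy infer the flattened length (32 × 32 × 3 = 3072) on its own. A tiny standalone sketch of the idiom (the shapes here are made up, not the CIFAR-10 ones):

toy = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)   # pretend: 2 images of 4x4 pixels, 3 channels
flat = np.reshape(toy, (toy.shape[0], -1))           # -1 is inferred as 4*4*3 = 48
print(toy.shape, '->', flat.shape)                   # (2, 4, 4, 3) -> (2, 48)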

#4 Normalize the data

We only center the data; as noted in lecture, for images we usually do not also divide by the standard deviation.

The mean image is computed per pixel and per channel (RGB), averaged over all num_training = 49000 training images. A tiny broadcasting sketch follows the output below.

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis = 0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    X_dev -= mean_image
    print('mean_image shape: ', mean_image.shape)
    print('Validation data shape: ', X_val.shape)
    print('Train data shape: ', X_train.shape)
    print('Test data shape: ', X_test.shape)
    print('dev data shape: ', X_dev.shape)
    
    mean_image shape:  (3072,)
    Validation data shape:  (1000, 3072)
    Train data shape:  (49000, 3072)
    Test data shape:  (1000, 3072)
    dev data shape:  (500, 3072)
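The centering above works through NumPy broadcasting: mean_image has shape (3072,) and is subtracted from every row of each (N, 3072) matrix. A minimal sketch with made-up numbers:

toy_X = np.array([[1., 2., 3.],
                  [3., 4., 5.]])      # pretend (N, D) = (2, 3)
toy_mean = np.mean(toy_X, axis=0)     # per-column mean, shape (3,): [2. 3. 4.]
toy_X -= toy_mean                     # broadcast against every row
print(toy_X)                          # [[-1. -1. -1.] [ 1.  1.  1.]]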
Append a bias dimension of ones (1, 1, ..., 1) so the bias term can be folded into W.
np.hstack and np.vstack are two ways to concatenate arrays; for details see [np.vstack()和np.hstack()的用法](https://blog.csdn.net/m0_37393514/article/details/79538748). A small example follows the function below.
    # add bias dimension and transform into columns
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
    
    return X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev
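
As referenced above, a quick illustration of np.hstack versus np.vstack on toy arrays (nothing here is assignment data):

a = np.array([[1, 2],
              [3, 4]])
b = np.ones((2, 1))
print(np.hstack([a, b]))   # glue columns side by side -> shape (2, 3), like appending the bias column
print(np.vstack([a, a]))   # stack rows on top of each other -> shape (4, 2)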

#5 打印查看结果

# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
print('dev data shape: ', X_dev.shape)
print('dev labels shape: ', y_dev.shape)

Output:

Train data shape:  (49000, 3073)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3073)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3073)
Test labels shape:  (1000,)
dev data shape:  (500, 3073)
dev labels shape:  (500,)

Softmax Classifier

Your code for this section will all be written inside cs231n/classifiers/softmax.py.

In[3] Computing the Loss Function with Nested Loops

  • First use nested loops (the most basic approach)
  • to compute the loss function.
  • Open the file cs231n/classifiers/softmax.py and implement the softmax_loss_naive function.
# First implement the naive softmax loss function with nested loops.
# Open the file cs231n/classifiers/softmax.py and implement the
# softmax_loss_naive function.

from cs231n.classifiers.softmax import softmax_loss_naive
import time

# Generate a random softmax weight matrix and use it to compute the loss.
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As a rough sanity check, our loss should be something close to -log(0.1).
print('loss: %f' % loss)
print('sanity check: %f' % (-np.log(0.1)))
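
Why -log(0.1)? With W initialized to tiny random values, all 10 class scores are nearly equal, so the softmax assigns roughly uniform probability 1/10 to each class and the per-sample loss is about
$$L_i \approx -\ln\frac{1}{10} = \ln 10 \approx 2.3026$$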
  • The naive Softmax loss: the softmax_loss_naive function
  #############################################################################
  # TODO: Compute the softmax loss and its gradient using explicit loops.     #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################
  data_loss = 0.0
  (N, D) = X.shape
  C = W.shape[1]
  scores = X.dot(W)                # (N, C); the score of sample i for class j is scores[i, j]
  dW = np.zeros((D, C))
  for i in range(N):
      # shift the scores so the max is 0: the softmax is unchanged, but np.exp cannot overflow
      scores_i = scores[i, :] - np.max(scores[i, :])
      sum_ij = np.sum(np.exp(scores_i))
      probs = lambda t: np.exp(scores_i[t]) / sum_ij     # softmax probability of class t
      data_loss += -np.log(probs(y[i]))                  # cross-entropy loss of sample i
      for k in range(C):
          probs_k = probs(k)
          # dL_i/dW[:,k] = (p_k - 1{k == y[i]}) * X[i]
          dW[:, k] += (probs_k - (k == y[i])) * X[i]
  data_loss /= N
  loss = data_loss + reg * np.sum(W * W)
  dW /= N
  dW += 2 * reg * W                # gradient of reg * sum(W*W) is 2 * reg * W

Note that the score of sample X[i] for class j is simply X.dot(W)[i, j].
Syntax worth noting:

  • probs = lambda t : np.exp(scores_i[t]) / sum_ij
  • (probs_k - (k == y[i]))

The derivation of the loss function and of its gradient is as follows:

  • Loss function
    Looking at existing answers online, they all guard against numerical overflow (hence the max subtraction); a short check of this appears after the formula below.

    • cs231n作业:assignment1 - softmax
    • 笔记:CS231n+assignment1(作业一)
    • cs231n作业:assignment1 - softmax 直上云霄

$$\texttt{loss} = \texttt{data\_loss} + \texttt{Reg} = \frac{1}{N}\sum_{i=0}^{N-1}L_i + \lambda\,\texttt{Reg(W)}$$
where
$$L_i = -\ln\!\left(\frac{e^{\,s_{y_i}-\max_j s_j}}{\sum_j e^{\,s_j-\max_j s_j}}\right)$$
with $s_j = \texttt{scores\_i[j]} = \texttt{X[i]}\cdot\texttt{W[:,j]}$.
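
As promised above, here is why the max subtraction matters: np.exp overflows for large scores, while the shifted scores give the same probabilities without overflow. A standalone toy check (these scores are made up):

scores_i = np.array([1000.0, 2000.0, 3000.0])
print(np.exp(scores_i) / np.sum(np.exp(scores_i)))    # np.exp overflows to inf, result is [nan nan nan]
shifted = scores_i - np.max(scores_i)                 # [-2000. -1000. 0.]
print(np.exp(shifted) / np.sum(np.exp(shifted)))      # [0. 0. 1.], no overflow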

  • Gradient
    By the chain rule, for each column W[:,j]:
    $$\frac{\partial\,\texttt{loss}}{\partial\,\texttt{W[:,j]}}
    = \frac{\partial\,\texttt{loss}}{\partial\,\texttt{data\_loss}}\cdot\frac{\partial\,\texttt{data\_loss}}{\partial\,\texttt{W[:,j]}}
    + \frac{\partial\,\texttt{loss}}{\partial\,\texttt{Reg(W)}}\cdot\frac{\partial\,\texttt{Reg(W)}}{\partial\,\texttt{W[:,j]}}
    = \frac{1}{N}\sum_{i=0}^{N-1}\frac{\partial L_i}{\partial\,\texttt{W[:,j]}} + \lambda\,\frac{\partial\,\texttt{Reg(W)}}{\partial\,\texttt{W[:,j]}}$$
    since $\dfrac{\partial\,\texttt{loss}}{\partial\,\texttt{data\_loss}} = \dfrac{1}{N}$. Write $p_k = \texttt{probs(k)} = \dfrac{e^{s_k}}{\sum_t e^{s_t}}$ with $s_k = \texttt{X[i]}\cdot\texttt{W[:,k]}$, so $L_i = -\ln p_{y_i}$ and
    $$\frac{\partial L_i}{\partial\,p_{y_i}} = -\frac{1}{p_{y_i}},\qquad
    \frac{\partial s_k}{\partial\,\texttt{W[:,j]}} = \begin{cases}\texttt{X[i]} & k = j\\ 0 & k \neq j\end{cases}$$
    Differentiating the softmax gives
    $$\frac{\partial\,p_{y_i}}{\partial\,\texttt{W[:,j]}} =
    \begin{cases} p_{y_i}\,(1 - p_{y_i})\,\texttt{X[i]} & j = y_i\\[2pt] -\,p_{y_i}\,p_j\,\texttt{X[i]} & j \neq y_i\end{cases}$$
    Combining with the factor $-1/p_{y_i}$:
    $$\frac{\partial L_i}{\partial\,\texttt{W[:,j]}} = \bigl(p_j - \mathbb{1}[\,j = y_i\,]\bigr)\,\texttt{X[i]}$$
    which is exactly the update dW[:, k] += (probs_k - (k == y[i])) * X[i] in the code above.
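
To double-check both the derivation and the loop implementation, the analytic gradient can be compared against a centered-difference numerical gradient at a few random coordinates of W. The helper below is only an illustrative sketch written for this post (the assignment's own cs231n/gradient_check.py provides a grad_check_sparse function that the later notebook cells use for the same purpose):

def numeric_grad_check(f, W, analytic_grad, num_checks=5, h=1e-5):
    # Compare (f(W+h) - f(W-h)) / (2h) with the analytic gradient at random entries of W.
    for _ in range(num_checks):
        ix = tuple(np.random.randint(d) for d in W.shape)
        old = W[ix]
        W[ix] = old + h
        f_plus = f(W)
        W[ix] = old - h
        f_minus = f(W)
        W[ix] = old                                  # restore the perturbed entry
        num = (f_plus - f_minus) / (2 * h)           # centered difference
        ana = analytic_grad[ix]
        rel_err = abs(num - ana) / max(abs(num) + abs(ana), 1e-12)
        print('numerical: %f analytic: %f, relative error: %e' % (num, ana, rel_err))

loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)
numeric_grad_check(lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0], W, grad)

Relative errors on the order of 1e-7 or smaller suggest the analytic gradient matches the formula derived above.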
