micheal@Computer:~$ cd assignment1
micheal@Computer:~/assignment1$ source .env/bin/activate
(.env) micheal@Computer:~/assignment1$ jupyter notebook
# Softmax exercise
*Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the [assignments page](http://vision.stanford.edu/teaching/cs231n/assignments.html) on the course website.*
This exercise is analogous to the SVM exercise. You will:
# Tasks to complete in this assignment
- implement a fully-vectorized **loss function** for the Softmax classifier
- implement the fully-vectorized expression for its **analytic gradient**
- **check your implementation** with numerical gradient
- use a validation set to **tune the learning rate and regularization** strength
- **optimize** the loss function with **SGD**
- **visualize** the final learned weights
from __future__ import print_function
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt
# from __future__ import print_function must come before the other imports (on the first line of the cell)
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
# The settings above configure plotting defaults: figure size, interpolation mode, and colormap
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):
"""
Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
it for the linear classifier. These are the same steps as we used for the
SVM, but condensed to a single function.
"""
# Load the raw CIFAR-10 data
    cifar10_dir = 'CIFAR10' # if the files cannot be read from the default path, download the dataset locally and point this path at it
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
    # Output:
    #   Train data shape: (50000, 32, 32, 3)
    #   Train labels shape: (50000,)
    #   Test data shape: (10000, 32, 32, 3)
    #   Test labels shape: (10000,)
# subsample the data
# range(49000,49000 + 1000)
# list(range(49000,49000 + 1000)) = [49000, 49001, ... , 49999]
mask = list(range(num_training, num_training + num_validation))
    # take the last num_validation = 1000 of the original 50000 training images as the validation set
X_val = X_train[mask]
    # and take their labels as the validation labels
y_val = y_train[mask]
mask = list(range(num_training))
X_train = X_train[mask]
y_train = y_train[mask]
mask = list(range(num_test))
X_test = X_test[mask]
y_test = y_test[mask]
    # np.random.choice(49000, 500, replace=False) draws 500 distinct (replace=False) indices from 0, 1, 2, ..., 48999
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
print('Train data shape: ', X_train.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)
    # Output:
    #   Train data shape: (49000, 3072)
    #   Test data shape: (1000, 3072)
    #   dev data shape: (500, 3072)
The mean image is computed per pixel and per color channel (RGB), averaged over all num_training = 49000 training images.
# Normalize the data: subtract the mean image
mean_image = np.mean(X_train, axis = 0)
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image
print('mean_image shape: ', mean_image.shape)
print('Validation data shape: ', X_val.shape)
print('Train data shape: ', X_train.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)
    # Output:
    #   mean_image shape: (3072,)
    #   Validation data shape: (1000, 3072)
    #   Train data shape: (49000, 3072)
    #   Test data shape: (1000, 3072)
    #   dev data shape: (500, 3072)
np.hstack and np.vstack are two ways of concatenating arrays; for details see [np.vstack() and np.hstack() usage](https://blog.csdn.net/m0_37393514/article/details/79538748). A tiny standalone example is shown right after the function below.
# add bias dimension and transform into columns
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
return X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev
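To make the bias trick above concrete, here is a minimal standalone illustration of np.hstack and np.vstack (not part of the assignment; the toy array is made up):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])            # pretend: two flattened "images", two features each
bias = np.ones((A.shape[0], 1))       # one extra column of ones, one per example

# np.hstack concatenates along columns, so every example gains a trailing 1;
# the bias can then be absorbed into W as one extra row instead of a separate b.
print(np.hstack([A, bias]))
# [[1. 2. 1.]
#  [3. 4. 1.]]

# np.vstack, by contrast, stacks along rows (more examples, same features):
print(np.vstack([A, A]).shape)        # (4, 2)
```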
# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
print('dev data shape: ', X_dev.shape)
print('dev labels shape: ', y_dev.shape)
```output
Train data shape: (49000, 3073)
Train labels shape: (49000,)
Validation data shape: (1000, 3073)
Validation labels shape: (1000,)
Test data shape: (1000, 3073)
Test labels shape: (1000,)
dev data shape: (500, 3073)
dev labels shape: (500,)
```
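As an optional sanity check on the mean-subtraction step (not part of this notebook), the mean image can be visualized. Inside get_CIFAR10_data() the variable mean_image has 3072 entries (the bias column is added afterwards), so these lines would go right after mean_image is computed there:

```python
# Visualize the per-pixel mean of the training set; it should look like a
# blurry, brownish "average" CIFAR-10 image.
plt.figure(figsize=(4, 4))
plt.imshow(mean_image.reshape((32, 32, 3)).astype('uint8'))
plt.show()
```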
Your code for this section will all be written inside cs231n/classifiers/softmax.py.
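For reference, the interface the next cell relies on looks like the sketch below; the signature and shapes are inferred from the call softmax_loss_naive(W, X_dev, y_dev, 0.0) further down, not copied from the starter file:

```python
def softmax_loss_naive(W, X, y, reg):
    """
    Softmax loss, naive implementation with explicit loops.

    Inputs:
    - W: array of weights, shape (D, C)
    - X: array of data, shape (N, D), one example per row
    - y: array of labels, shape (N,), values in 0..C-1
    - reg: regularization strength (float)

    Returns a tuple of:
    - loss: a single float
    - dW: gradient with respect to W, an array of the same shape as W
    """
```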
# First implement the naive softmax loss function with nested loops.
# Open the file cs231n/classifiers/softmax.py and implement the
# softmax_loss_naive function.
from cs231n.classifiers.softmax import softmax_loss_naive
import time
# Generate a random softmax weight matrix and use it to compute the loss.
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)
# As a rough sanity check, our loss should be something close to -log(0.1).
print('loss: %f' % loss)
print('sanity check: %f' % (-np.log(0.1)))
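Why should the loss be close to -log(0.1)? W is drawn from a zero-mean Gaussian scaled by 1e-4, so all scores are near zero and the softmax over the 10 CIFAR-10 classes is nearly uniform; the cross-entropy of a uniform prediction is -log(1/10) ≈ 2.3026, which is what the sanity-check line above prints.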
#############################################################################
# TODO: Compute the softmax loss and its gradient using explicit loops. #
# Store the loss in loss and the gradient in dW. If you are not careful #
# here, it is easy to run into numeric instability. Don't forget the #
# regularization! #
#############################################################################
data_loss = 0.0
N, D = X.shape                      # N examples, each a row of dimension D
C = W.shape[1]                      # number of classes
scores = X.dot(W)                   # scores[i, j]: score of example i for class j
dW = np.zeros((D, C))
for i in range(N):
    scores_i = scores[i, :]
    scores_i -= np.max(scores_i)    # shift by the max score for numeric stability
    sum_ij = np.sum(np.exp(scores_i))
    probs = lambda t: np.exp(scores_i[t]) / sum_ij   # softmax probability of class t
    data_loss += -np.log(probs(y[i]))                # cross-entropy term for example i
    for k in range(C):
        probs_k = probs(k)
        # gradient of L_i w.r.t. W[:, k] is (p_k - 1{k == y[i]}) * X[i]
        dW[:, k] += (probs_k - (k == y[i])) * X[i]
data_loss /= N
loss = data_loss + reg * np.sum(W * W)
dW /= N
dW += 2 * reg * W   # gradient of the reg * sum(W*W) term; reg * W alone would not match the loss above
One thing to note here is that the score of `X[i]` for class `j` is `X.dot(W)[i, j]`, which is why the loop processes `scores` one row at a time.

Two pieces of syntax are worth noting:

`probs = lambda t : np.exp(scores_i[t]) / sum_ij`

`(probs_k - (k == y[i]))`

The lambda returns the softmax probability of class `t` for the current example, and the boolean `(k == y[i])` is implicitly converted to 0 or 1, which plays the role of the indicator in the gradient formula derived below.
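The natural next step (and the next cell in the official notebook) is to compare the analytic gradient against a numerical one. A sketch, assuming the grad_check_sparse helper that ships with the assignment in cs231n/gradient_check.py:

```python
from cs231n.gradient_check import grad_check_sparse

# Re-evaluate the loss/gradient without regularization, then spot-check a few
# randomly chosen entries of dW against a numerical estimate; the printed
# relative errors should be around 1e-7 or smaller.
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)

# It is also worth repeating the check with regularization turned on:
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 5e1)
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)
```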
The loss being minimized is

$$\mathtt{loss} = \mathtt{data\_loss} + \mathtt{Reg} = \frac{1}{N}\sum_{i=0}^{N-1} L_i + \lambda\,\texttt{Reg(W)}$$

where, with the maximum score subtracted for numeric stability,

$$L_i = -\ln\!\left(\frac{e^{\,s_{y_i}-\max_j s_j}}{\sum_j e^{\,s_j-\max_j s_j}}\right)$$

Writing $p_j = \texttt{probs(j)} = e^{s_j} / \sum_k e^{s_k}$ for the probabilities of example $i$:

$$\frac{\partial L_i}{\partial p_{y_i}} = \frac{\partial\,(-\ln p_{y_i})}{\partial p_{y_i}} = -\frac{1}{p_{y_i}}$$

Since the scores are linear in the weights, $s_j = \texttt{X[i]}\cdot\texttt{W[:,j]}$,

$$\frac{\partial s_j}{\partial \texttt{W[:,k]}} = \begin{cases}\texttt{X[i]} & k = j\\ 0 & k \neq j\end{cases}$$

Applying the quotient rule to $p_{y_i} = e^{s_{y_i}} / \sum_k e^{s_k}$:

$$\frac{\partial p_{y_i}}{\partial \texttt{W[:,j]}} = \frac{e^{s_{y_i}}\,\mathbb{1}[j = y_i]}{\sum_k e^{s_k}}\,\texttt{X[i]} - \frac{e^{s_{y_i}}\,e^{s_j}}{\left(\sum_k e^{s_k}\right)^2}\,\texttt{X[i]} = p_{y_i}\left(\mathbb{1}[j = y_i] - p_j\right)\texttt{X[i]}$$

Combining the two derivatives gives the expression used in the code:

$$\frac{\partial L_i}{\partial \texttt{W[:,j]}} = -\frac{1}{p_{y_i}}\cdot p_{y_i}\left(\mathbb{1}[j = y_i] - p_j\right)\texttt{X[i]} = \left(p_j - \mathbb{1}[j = y_i]\right)\texttt{X[i]}$$

which is exactly `(probs_k - (k == y[i])) * X[i]` with `k = j`.
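The derivation above translates directly into the fully-vectorized version that the assignment also asks for (softmax_loss_vectorized in the same file). What follows is a sketch consistent with the naive code above, not the official solution:

```python
import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss and gradient with no explicit loops.
    W: (D, C) weights, X: (N, D) data, y: (N,) labels in 0..C-1, reg: float.
    """
    N = X.shape[0]

    scores = X.dot(W)                                   # (N, C)
    scores -= np.max(scores, axis=1, keepdims=True)     # shift each row for numeric stability
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)   # (N, C) softmax

    # data loss: average of -log(probability assigned to the correct class)
    data_loss = -np.sum(np.log(probs[np.arange(N), y])) / N
    loss = data_loss + reg * np.sum(W * W)

    # gradient: dscores[i, j] = probs[i, j] - 1{j == y[i]}, then dW = X^T * dscores / N
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1
    dW = X.T.dot(dscores) / N
    dW += 2 * reg * W                                   # matches the reg * sum(W*W) term above

    return loss, dW
```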