In the previous section, we used the high-level APIs of a deep learning framework to implement linear regression concisely. We now turn to classification problems: instead of asking "how much", we want to know "which one".
Classification usually refers to two subtly different problems: (1) we care only about a sample's "hard" category assignment; (2) we want "soft" assignments, i.e. the probability of each category. In practice the boundary between the two blurs: even when we only need hard assignments, we still use models that output soft categories.
One-hot encoding is a simple way to represent categorical data. For three categories:
$y \in \{(1,0,0),\ (0,1,0),\ (0,0,1)\}$
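As a minimal sketch, `torch.nn.functional.one_hot` produces exactly these vectors from integer class indices:

import torch
import torch.nn.functional as F

# Three class indices -> the three one-hot rows shown above
y = torch.tensor([0, 1, 2])
print(F.one_hot(y, num_classes=3))
# tensor([[1, 0, 0],
#         [0, 1, 0],
#         [0, 0, 1]])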
(Network-architecture section skipped.) The model maps the inputs to one output per class via an affine transformation:
$\mathbf{o} = \mathbf{W}\mathbf{x} + \mathbf{b}$
Clearly, a fully connected layer has many parameters: any fully connected layer with $d$ inputs and $q$ outputs costs $O(dq)$ parameters, which is expensive. Fortunately, there are ways to reduce this cost to $O(\frac{dq}{n})$, where the hyperparameter $n$ can be chosen flexibly.
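To make the $O(dq)$ count concrete, here is a quick check with nn.Linear, using illustrative sizes $d = 784$ and $q = 10$ (a flattened 28×28 image and 10 classes):

import torch
from torch import nn

# A fully connected layer with d = 784 inputs and q = 10 outputs
layer = nn.Linear(784, 10)
# The weight matrix has d*q entries and the bias has q, i.e. O(dq) overall
print(sum(p.numel() for p in layer.parameters()))  # 7850 = 784*10 + 10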
To interpret the outputs as probabilities, two conditions are required: (1) every value is nonnegative, and (2) the values sum to 1.
The softmax function satisfies both:
$\hat{\mathbf{y}} = \text{softmax}(\mathbf{o})$, where $\hat{y}_j = \cfrac{\exp(o_j)}{\sum_k \exp(o_k)}$
Since the transformation is monotonic, we can still compare the predicted probabilities of the classes and rank them just as we would rank the raw outputs.
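A from-scratch sketch of the definition above (note that production implementations subtract $\max(\mathbf{o})$ before exponentiating to avoid overflow; this minimal version omits that):

import torch

def softmax(X):
    """Row-wise softmax: exponentiate, then normalize each row to sum to 1."""
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)
    return X_exp / partition  # broadcasting divides each row by its sum

o = torch.tensor([[1.0, 2.0, 3.0]])
print(softmax(o))           # positive entries summing to 1
print(softmax(o).argmax())  # same ranking as o itself: index 2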
The outputs of softmax regression are still determined by an affine transformation of the input features, so softmax regression remains a linear model.
To speed up computation, we typically vectorize the calculation over minibatches of data.
Suppose we read a minibatch of samples $\mathbf{X}$ with feature dimension $d$ and batch size $n$, and that the output has $q$ classes; then $\mathbf{X} \in \mathbb{R}^{n \times d}$, $\mathbf{W} \in \mathbb{R}^{d \times q}$, $\mathbf{b} \in \mathbb{R}^{1 \times q}$, and
$\mathbf{O} = \mathbf{X}\mathbf{W} + \mathbf{b}$
$\hat{\mathbf{Y}} = \text{softmax}(\mathbf{O})$
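A quick shape check of the minibatch form, with illustrative sizes:

import torch

n, d, q = 4, 5, 3                 # batch size, input features, classes
X = torch.randn(n, d)             # minibatch of features, shape (n, d)
W = torch.randn(d, q)             # weights, shape (d, q)
b = torch.randn(q)                # bias, broadcast over the batch
O = X @ W + b                     # logits, shape (n, q)
Y_hat = torch.softmax(O, dim=1)   # softmax over the class dimension
print(O.shape, Y_hat.sum(dim=1))  # each row of Y_hat sums to 1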
As before, we use maximum likelihood estimation, i.e. we minimize the negative log-likelihood:
$l(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{j=1}^{q} y_j \log \hat{y}_j$
$l(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{j=1}^{q} y_j \log \cfrac{\exp(o_j)}{\sum_k \exp(o_k)}$
$= \sum_{j=1}^{q} y_j \log \sum_{k=1}^{q} \exp(o_k) - \sum_{j=1}^{q} y_j o_j$
$= \log \sum_{k=1}^{q} \exp(o_k) - \sum_{j=1}^{q} y_j o_j$ (using $\sum_{j} y_j = 1$, since $\mathbf{y}$ is one-hot)
Taking the derivative with respect to $o_j$ gives
$\partial_{o_j} l(\mathbf{y}, \hat{\mathbf{y}}) = \cfrac{\exp(o_j)}{\sum_{k=1}^{q} \exp(o_k)} - y_j = \text{softmax}(\mathbf{o})_j - y_j$
We denote this loss by $l$; it is called the cross-entropy loss.
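A minimal sketch of this loss, together with an autograd check of the derivative derived above:

import torch

def cross_entropy(y_hat, y):
    """Negative log of the predicted probability of the true class."""
    return -torch.log(y_hat[range(len(y_hat)), y])

y_hat = torch.tensor([[0.1, 0.3, 0.6]])
print(cross_entropy(y_hat, torch.tensor([2])))  # tensor([0.5108]) = -log(0.6)

# Verify that the gradient w.r.t. o equals softmax(o) - y
o = torch.randn(3, requires_grad=True)
y = torch.tensor([1.0, 0.0, 0.0])                  # one-hot label for class 0
l = torch.log(torch.exp(o).sum()) - (y * o).sum()  # the simplified form above
l.backward()
print(torch.allclose(o.grad, torch.softmax(o, dim=0).detach() - y))  # True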
(The book follows with more background on information theory; skipped here.)
Accuracy equals the ratio of the number of correct predictions to the total number of predictions.
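As a sketch, accuracy can be computed like this (the helper first converts rows of predicted probabilities into class indices):

import torch

def accuracy(y_hat, y):
    """Fraction of predictions that match the labels."""
    if y_hat.ndim > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(dim=1)  # probability rows -> predicted class index
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.sum()) / len(y)

y = torch.tensor([0, 2])
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
print(accuracy(y_hat, y))  # 0.5: first prediction is wrong, second is right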
%matplotlib inline
import torch
import torchvision
from torch.utils import data
from torchvision import transforms
from d2l import torch as d2l  # the book's helper package

d2l.use_svg_display()  # display figures as SVG for sharper rendering
We can download the Fashion-MNIST dataset and read it into memory using torchvision's built-in functions.
trans = transforms.ToTensor()  # convert PIL images to float32 tensors in [0, 1]
should_download = False        # data is already stored locally; set True to download
mnist_train = torchvision.datasets.FashionMNIST(
    root="D:/yuansWork/code/python/jupyterFile/data", train=True,
    transform=trans, download=should_download)
mnist_test = torchvision.datasets.FashionMNIST(
    root="D:/yuansWork/code/python/jupyterFile/data", train=False,
    transform=trans, download=should_download)
len(mnist_train),len(mnist_test)
(60000, 10000)
Each input image is 28 pixels in both height and width. The dataset consists of grayscale images, so the channel count is 1. For simplicity, we write the shape of an image with height $h$ pixels and width $w$ pixels as $(h, w)$.
a = mnist_train[0]  # each sample is an (image, label) pair
print(len(a))
b = a[0]            # the image tensor
label = a[1]        # the integer class label
print(b.size())
print(label)
2
torch.Size([1, 28, 28])
9
def get_fashion_mnist_labels(labels):
    """Map numeric labels to the Fashion-MNIST text labels."""
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]
def show_images(imgs, num_rows, num_cols, titles=None, scale=1.5):
    """Plot a grid of images (tensors or PIL images)."""
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = d2l.plt.subplots(num_rows, num_cols, figsize=figsize)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        if torch.is_tensor(img):
            ax.imshow(img.numpy())  # tensors must be converted for matplotlib
        else:
            ax.imshow(img)          # PIL images can be drawn directly
        ax.axes.get_xaxis().set_visible(False)  # hide axis ticks
        ax.axes.get_yaxis().set_visible(False)
        if titles:
            ax.set_title(titles[i])
    return axes
X, y = next(iter(data.DataLoader(mnist_train, batch_size=18)))
show_images(X.reshape(18, 28, 28), 2, 9, titles=get_fashion_mnist_labels(y));
With the built-in data iterator, we can randomly shuffle all samples in each epoch and thus read minibatches without bias.
batch_size = 256

def get_dataloader_workers():
    """Use 4 worker processes to read the data."""
    return 4

train_iter = data.DataLoader(mnist_train, batch_size, shuffle=True,
                             num_workers=get_dataloader_workers())
timer = d2l.Timer()
for X, y in train_iter:
    continue
print(f'{timer.stop():.2f}sec')
5.16sec
As the example shows, constructing the iterator itself takes almost no time; only actually reading the data is costly.
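To separate the two costs, one could time the construction and the read loop independently (a small sketch reusing d2l.Timer and the objects above):

timer = d2l.Timer()
loader = data.DataLoader(mnist_train, batch_size, shuffle=True,
                         num_workers=get_dataloader_workers())
print(f'{timer.stop():.4f}sec to construct')   # essentially instant
timer.start()
for X, y in loader:
    continue
print(f'{timer.stop():.2f}sec to read one epoch')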
def load_data_fashion_mnist(batch_size, resize=None):
    """Load Fashion-MNIST and return the train and test DataLoaders."""
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))  # resize before converting to tensor
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(
        root="D:/yuansWork/code/python/jupyterFile/data", train=True,
        transform=trans, download=should_download)
    mnist_test = torchvision.datasets.FashionMNIST(
        root="D:/yuansWork/code/python/jupyterFile/data", train=False,
        transform=trans, download=should_download)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True,
                            num_workers=get_dataloader_workers()),
            data.DataLoader(mnist_test, batch_size, shuffle=False,
                            num_workers=get_dataloader_workers()))
train_iter, test_iter = load_data_fashion_mnist(32, resize=64)
for X, y in train_iter:
    print(X.shape, X.dtype, y.shape, y.dtype)
    break
torch.Size([32, 1, 64, 64]) torch.float32 torch.Size([32]) torch.int64
When batch_size is too small, reading performance does suffer: the read time for one pass grew from about 4 s to 22 s.
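To reproduce that comparison, one can time a full read pass at a few batch sizes (a sketch; the exact numbers depend on hardware and the number of workers):

for bs in (1, 32, 256):
    train_iter, _ = load_data_fashion_mnist(bs)
    timer = d2l.Timer()
    for X, y in train_iter:
        continue
    print(f'batch_size={bs}: {timer.stop():.2f}sec')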