CNN结构
CNN图层
VGG-16
CNN由处理视觉信息的层组成。CNN首先接收输入图像,然后将其传递通过这些层。有几种不同类型的层:最常用的层:卷积,池化和完全连接的层。
首先,让我们来看看完整的CNN架构; 下面是一个名为VGG-16的网络,它经过培训可识别各种图像类别。它接收图像作为输入,并输出该图像的预测类。
在PyTorch定义图层
对于卷积神经网络,由一些列简单的层组成:
- 卷积层
- 最大池化层
- 全连接(线性)层
要在PyTorch定义神经网络,创建并命名一个新的神经网络类,在函数init中定义网络层。
注意:在训练期间,PyTorch将能够通过跟踪网络的前馈行为并使用autograd来计算网络中权重的更新来执行反向传播。
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
def __init__(self, n_classes):
super(Net, self).__init__()
# 1 input image channel (grayscale), 32 output channels/feature maps
# 5x5 square convolution kernel
self.conv1 = nn.Conv2d(1, 32, 5)
# maxpool layer
# pool with kernel_size=2, stride=2
self.pool = nn.MaxPool2d(2, 2)
# fully-connected layer
# 32*4 input size to account for the downsampled image size after pooling
# num_classes outputs (for n_classes of image data)
self.fc1 = nn.Linear(32*4, n_classes)
# define the feedforward behavior
def forward(self, x):
# one conv/relu + pool layers
x = self.pool(F.relu(self.conv1(x)))
# prep for linear layer by flattening the feature maps into feature vectors
x = x.view(x.size(0), -1)
# linear layer
x = F.relu(self.fc1(x))
# final output
return x
# instantiate and print your Net
n_classes = 20 # example number of classes
net = Net(n_classes)
print(net)
可视化四个滤波器输出
mport cv2
import matplotlib.pyplot as plt
%matplotlib inline
# TODO: Feel free to try out your own images here by changing img_path
# to a file path to another image on your computer!
img_path = 'images/udacity_sdc.png'
# load color image
bgr_img = cv2.imread(img_path)
# convert to grayscale
gray_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2GRAY)
# normalize, rescale entries to lie in [0,1]
gray_img = gray_img.astype("float32")/255
# plot image
plt.imshow(gray_img, cmap='gray')
plt.show()
定义滤波器
import numpy as np
## TODO: Feel free to modify the numbers here, to try out another filter!
filter_vals = np.array([[-1, -1, 1, 1], [-1, -1, 1, 1], [-1, -1, 1, 1], [-1, -1, 1, 1]])
print('Filter shape: ', filter_vals.shape)
# define four filters
filter_1 = filter_vals
filter_2 = -filter_1
filter_3 = filter_1.T
filter_4 = -filter_3
filters = np.array([filter_1, filter_2, filter_3, filter_4])
# For an example, print out the values of filter 1
print('Filter 1: \n', filter_1)
### do not modify the code below this line ###
# visualize all four filters
fig = plt.figure(figsize=(10, 5))
for i in range(4):
ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
ax.imshow(filters[i], cmap='gray')
ax.set_title('Filter %s' % str(i+1))
width, height = filters[i].shape
for x in range(width):
for y in range(height):
ax.annotate(str(filters[i][x][y]), xy=(y,x),
horizontalalignment='center',
verticalalignment='center',
color='white' if filters[i][x][y]<0 else 'black')
初始化单个卷积层,使其包含创建的所有过滤器
import torch
import torch.nn as nn
import torch.nn.functional as F
# define a neural network with a single convolutional layer with four filters
class Net(nn.Module):
def __init__(self, weight):
super(Net, self).__init__()
# initializes the weights of the convolutional layer to be the weights of the 4 defined filters
k_height, k_width = weight.shape[2:]
# assumes there are 4 grayscale filters
self.conv = nn.Conv2d(1, 4, kernel_size=(k_height, k_width), bias=False)
self.conv.weight = torch.nn.Parameter(weight)
def forward(self, x):
# calculates the output of a convolutional layer
# pre- and post-activation
conv_x = self.conv(x)
activated_x = F.relu(conv_x)
# returns both layers
return conv_x, activated_x
# instantiate the model and set the weights
weight = torch.from_numpy(filters).unsqueeze(1).type(torch.FloatTensor)
model = Net(weight)
# print out the layer in the network
print(model)
可视化每个过滤器的输出¶
首先,我们将定义一个辅助函数viz_layer,它接受一个特定的图层和多个过滤器(可选参数),并在图像通过后显示该图层的输出。
# helper function for visualizing the output of a given layer
# default number of filters is 4
def viz_layer(layer, n_filters= 4):
fig = plt.figure(figsize=(20, 20))
for i in range(n_filters):
ax = fig.add_subplot(1, n_filters, i+1, xticks=[], yticks=[])
# grab layer outputs
ax.imshow(np.squeeze(layer[0,i].data.numpy()), cmap='gray')
ax.set_title('Output %s' % str(i+1))
# plot original image
plt.imshow(gray_img, cmap='gray')
# visualize all filters
fig = plt.figure(figsize=(12, 6))
fig.subplots_adjust(left=0, right=1.5, bottom=0.8, top=1, hspace=0.05, wspace=0.05)
for i in range(4):
ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
ax.imshow(filters[i], cmap='gray')
ax.set_title('Filter %s' % str(i+1))
# convert the image into an input Tensor
gray_img_tensor = torch.from_numpy(gray_img).unsqueeze(0).unsqueeze(1)
# get the convolutional layer (pre and post activation)
conv_layer, activated_layer = model(gray_img_tensor)
# visualize the output of a conv layer
viz_layer(conv_layer)
# visualize the output of an activated conv layer
viz_layer(activated_layer)
池化层
在几个卷积层(+ ReLu)之后,在VGG-16网络中,会有一个最大化层。
- 池化层会获取图像(通常是过滤后的图像)并输出该图像的缩小版本
- 池化层会降低输入的维度
- 最大池化层会查看输入图像中的区域(如下图所示的4x4像素区域),并选择在新的缩小区域中保留该区域中的最大像素值。
-
Maxpooling是CNN中最常见的池化层类型,但也有其他类型,如平均池化。
全连接层
在一系列卷积和池化层之后的完全连接的层。注意它们的扁平形状
完全连接层的工作是将它看到的输入连接到所需的输出形式。通常,这意味着将图像特征矩阵转换为尺寸为1xC的特征向量,其中C是类的数量。例如,假设我们将图像分类为十个类,您可以为完全连接的图层提供一组[池化,激活]特征图作为输入,并告诉它使用这些特征的组合(将它们相乘,添加它们,将它们组合起来等)输出10项长特征向量。该向量将来自特征映射的信息压缩为单个特征向量。
softmax
网络中看到的最后一层是SOFTMAX功能。softmax函数可以将任何值向量作为输入,并返回相同长度的向量,其值均在范围(0,1)中,并且这些值一起将加起来为1
Train in PyTorch
一旦你加载了一个训练数据集,接下来你的工作将是定义一个CNN网并训练它来对这组图像进行分类。
Loss and Optimizer
要训练一个模型,你需要通过选择一个损失函数和优化器来定义它是如何训练的。这些函数决定了模型在其训练时如何更新其参数,并且可以影响模型收敛的速度。
对于这样的分类问题,通常使用交叉熵损失,这可以在如下代码中定义:criterion = nn.CrossEntropyLoss()。PyTorch还包括一些标准随机优化,如随机梯度下降和亚当。鼓励你尝试不同的优化器,看看你的模型是如何响应这些选择的。
通常,我们通过训练数据集来训练多个时期或周期的任何网络。 下面是训练函数在迭代训练数据集时执行的步骤:
- 准备所有输入图像和标签数据以进行培训
- 通过网络传递输入(正向传递)
- 计算损失(预测类与正确标签的距离)
- 将梯度传播回网络参数(向后传递)
- 更新权重(参数更新)
CNN for Classification
在 _ init_ 定义层
conv / pool层可以像这样定义(在init中):
# 1 input image channel (for grayscale images), 32 output channels/feature maps, 3x3 square convolution kernel
self.conv1 = nn.Conv2d(1, 32, 3)
# 通道1,输出32,卷积核3*3
# maxpool that uses a square window of kernel_size=2, stride=2
self.pool = nn.MaxPool2d(2, 2)
# our basic libraries
import torch
import torchvision
# data loading and transforming
from torchvision.datasets import FashionMNIST
from torch.utils.data import DataLoader
from torchvision import transforms
# The output of torchvision datasets are PILImage images of range [0, 1].
# We transform them to Tensors for input into a CNN
## Define a transform to read the data in as a tensor
data_transform = transforms.ToTensor()
# choose the training and test datasets
train_data = FashionMNIST(root='./data', train=True,
download=True, transform=data_transform)
test_data = FashionMNIST(root='./data', train=False,
download=True, transform=data_transform)
# Print out some stats about the training and test data
print('Train data, number of images: ', len(train_data))
print('Test data, number of images: ', len(test_data))
# prepare data loaders, set the batch_size
## TODO: you can try changing the batch_size to be larger or smaller
## when you get to training your network, see how batch_size affects the loss
batch_size = 20
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)
# specify the image classes
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
该单元格遍历训练数据集,使用dataiter.next()加载随机批次的图像/标签数据。 然后,它在2 x batch_size / 2网格中绘制一批图像和标签。
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = dataiter.next()
images = images.numpy()
# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(batch_size):
ax = fig.add_subplot(2, batch_size/2, idx+1, xticks=[], yticks=[])
ax.imshow(np.squeeze(images[idx]), cmap='gray')
ax.set_title(classes[labels[idx]])
对于卷积神经网络,我们将使用一系列简单的层:
- 卷积层
- 最大池化层数
- 完全连接(线性)层
flattening
回想一下,要从卷积/池化层的输出移动到线性层,必须先将提取的特征展平为矢量。 如果您使用了深度学习库Keras,您可能已经看过Flatten()完成此操作,而在PyTorch中,您可以使用x = x.view(x.size(0), - 1)展平输入x。
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
# 1 input image channel (grayscale), 10 output channels/feature maps
# 3x3 square convolution kernel
## output size = (W-F)/S +1 = (28-3)/1 +1 = 26
# the output Tensor for one image, will have the dimensions: (10, 26, 26)
# after one pool layer, this becomes (10, 13, 13)
self.conv1 = nn.Conv2d(1, 10, 3)
# maxpool layer
# pool with kernel_size=2, stride=2
self.pool = nn.MaxPool2d(2, 2)
# second conv layer: 10 inputs, 20 outputs, 3x3 conv
## output size = (W-F)/S +1 = (13-3)/1 +1 = 11
# the output tensor will have dimensions: (20, 11, 11)
# after another pool layer this becomes (20, 5, 5); 5.5 is rounded down
self.conv2 = nn.Conv2d(10, 20, 3)
# 20 outputs * the 5*5 filtered/pooled map size
# 10 output channels (for the 10 classes)
self.fc1 = nn.Linear(20*5*5, 10)
# define the feedforward behavior
def forward(self, x):
# two conv/relu + pool layers
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
# prep for linear layer
# flatten the inputs into a vector
x = x.view(x.size(0), -1)
# one linear layer
x = F.relu(self.fc1(x))
# a softmax layer to convert the 10 outputs into a distribution of class scores
x = F.log_softmax(x, dim=1)
# final output
return x
# instantiate and print your Net
net = Net()
print(net)
TODO: Specify the loss function and optimizer
import torch.optim as optim
## TODO: specify loss function
# cross entropy loss combines softmax and nn.NLLLoss() in one single class.
criterion = nn.NLLLoss()
## TODO: specify optimizer
# stochastic gradient descent with a small learning rate
optimizer = optim.SGD(net.parameters(), lr=0.001)
A note on accuracy
before training
correct = 0
total = 0
# Iterate through test dataset
for images, labels in test_loader:
# forward pass to get outputs
# the outputs are a series of class scores
outputs = net(images)
# get the predicted class from the maximum value in the output-list of class scores
_, predicted = torch.max(outputs.data, 1)
# count up total number of correct labels
# for which the predicted and true labels are equal
total += labels.size(0)
correct += (predicted == labels).sum()
# calculate the accuracy
# to convert `correct` from a Tensor into a scalar, use .item()
accuracy = 100.0 * correct.item() / total
# print it out!
print('Accuracy before training: ', accuracy)
Here are the steps that this training function performs as it iterates over the training dataset:
- Zero's the gradients to prepare for a forward pass
- Passes the input through the network (forward pass)
- Computes the loss (how far is the predicted classes are from the correct labels)
- Propagates gradients back into the network’s parameters (backward pass)
- Updates the weights (parameter update)
- Prints out the calculated loss
def train(n_epochs):
loss_over_time = [] # to track the loss as the network trains
for epoch in range(n_epochs): # loop over the dataset multiple times
running_loss = 0.0
for batch_i, data in enumerate(train_loader):
# get the input images and their corresponding labels
inputs, labels = data
# zero the parameter (weight) gradients
optimizer.zero_grad()
# forward pass to get outputs
outputs = net(inputs)
# calculate the loss
loss = criterion(outputs, labels)
# backward pass to calculate the parameter gradients
loss.backward()
# update the parameters
optimizer.step()
# print loss statistics
# to convert loss into a scalar and add it to running_loss, we use .item()
running_loss += loss.item()
if batch_i % 1000 == 999: # print every 1000 batches
avg_loss = running_loss/1000
# record and print the avg loss over the 1000 batches
loss_over_time.append(avg_loss)
print('Epoch: {}, Batch: {}, Avg. Loss: {}'.format(epoch + 1, batch_i+1, avg_loss))
running_loss = 0.0
print('Finished Training')
return loss_over_time
# define the number of epochs to train for
n_epochs = 30 # start small to see if your model works, initially
# call train and record the loss over time
training_loss = train(n_epochs)
Visualizing the loss
# visualize the loss as the network trained
plt.plot(training_loss)
plt.xlabel('1000\'s of batches')
plt.ylabel('loss')
plt.ylim(0, 2.5) # consistent scale
plt.show()
Test
# initialize tensor and lists to monitor test loss and accuracy
test_loss = torch.zeros(1)
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
# set the module to evaluation mode
net.eval()
for batch_i, data in enumerate(test_loader):
# get the input images and their corresponding labels
inputs, labels = data
# forward pass to get outputs
outputs = net(inputs)
# calculate the loss
loss = criterion(outputs, labels)
# update average test loss
test_loss = test_loss + ((torch.ones(1) / (batch_i + 1)) * (loss.data - test_loss))
# get the predicted class from the maximum value in the output-list of class scores
_, predicted = torch.max(outputs.data, 1)
# compare predictions to true label
# this creates a `correct` Tensor that holds the number of correctly classified images in a batch
correct = np.squeeze(predicted.eq(labels.data.view_as(predicted)))
# calculate test accuracy for *each* object class
# we get the scalar value of correct items for a class, by calling `correct[i].item()`
for i in range(batch_size):
label = labels.data[i]
class_correct[label] += correct[i].item()
class_total[label] += 1
print('Test Loss: {:.6f}\n'.format(test_loss.numpy()[0]))
for i in range(10):
if class_total[i] > 0:
print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
classes[i], 100 * class_correct[i] / class_total[i],
np.sum(class_correct[i]), np.sum(class_total[i])))
else:
print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))
print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
100. * np.sum(class_correct) / np.sum(class_total),
np.sum(class_correct), np.sum(class_total)))
保存模型
# Saving the model
model_dir = 'saved_models/'
model_name = 'fashion_net_simple.pt'
# after training, save your model parameters in the dir 'saved_models'
# when you're ready, un-comment the line below
torch.save(net.state_dict(), model_dir+model_name)
可视化feature map
Load the data
# our basic libraries
import torch
import torchvision
# data loading and transforming
from torchvision.datasets import FashionMNIST
from torch.utils.data import DataLoader
from torchvision import transforms
# The output of torchvision datasets are PILImage images of range [0, 1].
# We transform them to Tensors for input into a CNN
## Define a transform to read the data in as a tensor
data_transform = transforms.ToTensor()
test_data = FashionMNIST(root='./data', train=False,
download=True, transform=data_transform)
# Print out some stats about the test data
print('Test data, number of images: ', len(test_data))
# prepare data loaders, set the batch_size
## TODO: you can try changing the batch_size to be larger or smaller
## when you get to training your network, see how batch_size affects the loss
batch_size = 20
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)
# specify the image classes
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# obtain one batch of training images
dataiter = iter(test_loader)
images, labels = dataiter.next()
images = images.numpy()
# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(batch_size):
ax = fig.add_subplot(2, batch_size/2, idx+1, xticks=[], yticks=[])
ax.imshow(np.squeeze(images[idx]), cmap='gray')
ax.set_title(classes[labels[idx]])
Define the network architecture
The various layers that make up any neural network are documented, here. For a convolutional neural network, we'll use a simple series of layers:
- Convolutional layers
- Maxpooling layers
- Fully-connected (linear) layers
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
# 1 input image channel (grayscale), 10 output channels/feature maps
# 3x3 square convolution kernel
## output size = (W-F)/S +1 = (28-3)/1 +1 = 26
# the output Tensor for one image, will have the dimensions: (10, 26, 26)
# after one pool layer, this becomes (10, 13, 13)
self.conv1 = nn.Conv2d(1, 10, 3)
# maxpool layer
# pool with kernel_size=2, stride=2
self.pool = nn.MaxPool2d(2, 2)
# second conv layer: 10 inputs, 20 outputs, 3x3 conv
## output size = (W-F)/S +1 = (13-3)/1 +1 = 11
# the output tensor will have dimensions: (20, 11, 11)
# after another pool layer this becomes (20, 5, 5); 5.5 is rounded down
self.conv2 = nn.Conv2d(10, 20, 3)
# 20 outputs * the 5*5 filtered/pooled map size
self.fc1 = nn.Linear(20*5*5, 50)
# dropout with p=0.4
self.fc1_drop = nn.Dropout(p=0.4)
# finally, create 10 output channels (for the 10 classes)
self.fc2 = nn.Linear(50, 10)
# define the feedforward behavior
def forward(self, x):
# two conv/relu + pool layers
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
# prep for linear layer
# this line of code is the equivalent of Flatten in Keras
x = x.view(x.size(0), -1)
# two linear layers with dropout in between
x = F.relu(self.fc1(x))
x = self.fc1_drop(x)
x = self.fc2(x)
# final output
return x
# instantiate your Net
net = Net()
# load the net parameters by name
net.load_state_dict(torch.load('saved_models/fashion_net_ex.pt'))
print(net)
import torch.optim as optim
## TODO: specify loss function
# using cross entropy whcih combines softmax and NLL loss
criterion = nn.CrossEntropyLoss()
## TODO: specify optimizer
# stochastic gradient descent with a small learning rate AND some momentum
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
# Calculate accuracy before training
correct = 0
total = 0
# Iterate through test dataset
for images, labels in test_loader:
# forward pass to get outputs
# the outputs are a series of class scores
outputs = net(images)
# get the predicted class from the maximum value in the output-list of class scores
_, predicted = torch.max(outputs.data, 1)
# count up total number of correct labels
# for which the predicted and true labels are equal
total += labels.size(0)
correct += (predicted == labels).sum()
# calculate the accuracy
# to convert `correct` from a Tensor into a scalar, use .item()
accuracy = 100.0 * correct.item() / total
# print it out!
print('Accuracy before training: ', accuracy)
Train
def train(n_epochs):
loss_over_time = [] # to track the loss as the network trains
for epoch in range(n_epochs): # loop over the dataset multiple times
running_loss = 0.0
for batch_i, data in enumerate(train_loader):
# get the input images and their corresponding labels
inputs, labels = data
# zero the parameter (weight) gradients
optimizer.zero_grad()
# forward pass to get outputs
outputs = net(inputs)
# calculate the loss
loss = criterion(outputs, labels)
# backward pass to calculate the parameter gradients
loss.backward()
# update the parameters
optimizer.step()
# print loss statistics
# to convert loss into a scalar and add it to running_loss, we use .item()
running_loss += loss.item()
if batch_i % 1000 == 999: # print every 1000 batches
avg_loss = running_loss/1000
# record and print the avg loss over the 1000 batches
loss_over_time.append(avg_loss)
print('Epoch: {}, Batch: {}, Avg. Loss: {}'.format(epoch + 1, batch_i+1, avg_loss))
running_loss = 0.0
print('Finished Training')
return loss_over_time
# define the number of epochs to train for
n_epochs = 30 # start small to see if your model works, initially
# call train
training_loss = train(n_epochs)
可视化LOSS
# visualize the loss as the network trained
plt.plot(training_loss)
plt.xlabel('1000\'s of batches')
plt.ylabel('loss')
plt.ylim(0, 2.5) # consistent scale
plt.show()
Test
# initialize tensor and lists to monitor test loss and accuracy
test_loss = torch.zeros(1)
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
# set the module to evaluation mode
net.eval()
for batch_i, data in enumerate(test_loader):
# get the input images and their corresponding labels
inputs, labels = data
# forward pass to get outputs
outputs = net(inputs)
# calculate the loss
loss = criterion(outputs, labels)
# update average test loss
test_loss = test_loss + ((torch.ones(1) / (batch_i + 1)) * (loss.data - test_loss))
# get the predicted class from the maximum value in the output-list of class scores
_, predicted = torch.max(outputs.data, 1)
# compare predictions to true label
# this creates a `correct` Tensor that holds the number of correctly classified images in a batch
correct = np.squeeze(predicted.eq(labels.data.view_as(predicted)))
# calculate test accuracy for *each* object class
# we get the scalar value of correct items for a class, by calling `correct[i].item()`
for i in range(batch_size):
label = labels.data[i]
class_correct[label] += correct[i].item()
class_total[label] += 1
print('Test Loss: {:.6f}\n'.format(test_loss.numpy()[0]))
for i in range(10):
if class_total[i] > 0:
print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
classes[i], 100 * class_correct[i] / class_total[i],
np.sum(class_correct[i]), np.sum(class_total[i])))
else:
print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))
print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
100. * np.sum(class_correct) / np.sum(class_total),
np.sum(class_correct), np.sum(class_total)))
# Saving the model
model_dir = 'saved_models/'
model_name = 'fashion_net_ex.pt'
# after training, save your model parameters in the dir 'saved_models'
# when you're ready, un-comment the line below
torch.save(net.state_dict(), model_dir+model_name)