VGG Image Classification Practical
From the introduction to the VGG tutorial, we can clearly take away the following about CNNs:
Convolutional neural networks are an important class of deep learning networks, applicable to many computer vision problems. In particular, deep CNNs consist of several processing layers, each involving linear as well as non-linear operators, which are learned jointly, in an end-to-end manner, to solve a particular task. These methods are now the dominant approach for extracting features from audiovisual and textual data.
This practical explores the basics of learning (deep) CNNs.
The Python version of the VGG practical runs on Anaconda and can be run in Jupyter; the bundled README explains clearly how to set up the environment.
%load_ext autoreload
%autoreload 2
%matplotlib inline
The convolution part should be easy to follow for anyone with the basics, so I will only summarize it briefly; it is not difficult material.

The forward propagation formula is a composition of layers:

$$f(\mathbf{x}) = f_L(\dots f_2(f_1(\mathbf{x}; \mathbf{w}_1); \mathbf{w}_2) \dots ; \mathbf{w}_L)$$

Here $\mathbf{x}$ is the data (your original image or similar input) and the $\mathbf{w}_l$ are the parameters, which are learned from the data; plugging the output of this formula into a loss yields the objective function.
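To make this composition concrete, here is a minimal sketch of my own (not code from the practical) that chains two convolution + ReLU layers in exactly the nested form above; the layer choices and shapes are illustrative assumptions:

import torch
import torch.nn.functional as F

def forward(x, weights):
    # f(x) = f_L(... f_2(f_1(x; w_1); w_2) ...; w_L)
    for w in weights:
        x = F.relu(F.conv2d(x, w, padding=1))
    return x

x = torch.randn(1, 3, 32, 32)        # a toy input in place of a real image
weights = [torch.randn(8, 3, 3, 3),  # w_1
           torch.randn(8, 8, 3, 3)]  # w_2
print(forward(x, weights).shape)     # torch.Size([1, 8, 32, 32])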
Code:
import lab
from matplotlib import pyplot as plt
# Read an image as a PyTorch tensor
x = lab.imread('data/peppers.png')
# Visualize the input x
plt.figure(1, figsize=(12,12))
lab.imarraysc(x)
# Show the shape of the tensor
print(f"The image tensor shape is {list(x.shape)}")
# Show the data type of the tensor
print(f"The image tensor type is {x.dtype}")
Next, we create a bank of 10 filters, each of dimension 3×5×5, and initialize their coefficients randomly:
import torch
# Create a bank of linear filters
w = torch.randn(10,3,5,5)
# Visualize the filters
plt.figure(1, figsize=(12,12))
lab.imarraysc(w, spacing=1) ;
The next step is to apply the filters to the image. This uses the conv2d function from torch.nn.functional:
import torch.nn.functional as F
# Apply the convolution operator
y = F.conv2d(x, w)
# Visualize the convolution result
print(f"The output shape is {y.shape}")
We can now visualize the output y of the convolution. To do this, use the provided lab.imarraysc function to display an image for each feature channel in y:
# Visualize the output y, one channel per image
fig = plt.figure(figsize=(15, 10))
lab.imarraysc(lab.t2im(y)) ;
The filters preserve the resolution of the input feature map. However, it is often useful to downsample the output. This can be achieved by using the stride option in conv2d:
# Try again, downsampling the output
y_ds = F.conv2d(x, w, stride=16)
plt.figure(figsize=(15, 10))
lab.imarraysc(lab.t2im(y_ds)) ;
Increasing the stride (the step size) lowers the output resolution, which reduces later computation (the output-size arithmetic is sketched after the padding example below).
As you should have noticed in the questions above, applying a filter to an image or feature map interacts with the boundaries, making the output map smaller by an amount proportional to the size of the filter. If this is undesirable, the input array can be padded with zeros by using the padding option:
# Try again, padding the input
y_ds = F.conv2d(x, w, padding=4)
plt.figure(figsize=(15, 10))
lab.imarraysc(lab.t2im(y_ds)) ;
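For reference, conv2d's output size follows the standard formula ⌊(H + 2·padding − kernel) / stride⌋ + 1. A small sketch of my own to verify it against the strided and padded calls above:

# Expected conv2d output size: floor((H + 2*pad - k) / stride) + 1
def conv_out(size, k, stride=1, pad=0):
    return (size + 2 * pad - k) // stride + 1

H, W = x.shape[2], x.shape[3]
print(conv_out(H, 5, stride=16), conv_out(W, 5, stride=16))  # matches the stride=16 output above
print(conv_out(H, 5, pad=4), conv_out(W, 5, pad=4))          # padding=4 makes the output larger than the input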
Now let us set a filter by hand (this particular one is the Laplacian operator):
# Initialize a filter
w = torch.Tensor([
    [0,  1, 0],
    [1, -4, 1],
    [0,  1, 0]
])[None, None, :, :].expand(1, 3, 3, 3)
print(f"The shape of the filter w is {list(w.shape)}")
# Apply convolution
y_lap = F.conv2d(x, w) ;
# Show the output
plt.figure(1,figsize=(12,12))
lab.imsc(y_lap[0])
# Show the output absolute value
plt.figure(2,figsize=(12,12))
lab.imsc(abs(y_lap)[0]) ;
A CNN is obtained by composing several different functions. In addition to the linear filters shown in the previous part, there are several non-linear operators as well.
The simplest non-linearity is obtained by following a linear filter with a non-linear activation function, applied identically to each component of a feature map (i.e. point-wise). The simplest such function is the Rectified Linear Unit (ReLU):
$$y_{ijk} = \max\{0, x_{ijk}\}$$
This function is implemented by relu; let's try it out:
# Initialize a filter
w = torch.Tensor([1, 0, -1])[None,None,:].expand(1,3,3,3)
# Convolution
y = F.conv2d(x, w) ;
# ReLU
z = F.relu(y) ;
plt.figure(1,figsize=(12,12))
lab.imsc(y[0])
plt.figure(2,figsize=(12,12))
lab.imsc(z[0]) ;
Another important non-linear operator is max pooling, which computes the maximum over a small window of each feature channel:

$$y_{ijk} = \max\{\, x_{i'j'k} : i \le i' < i + p,\; j \le j' < j + p \,\}$$

An intuitive picture of pooling: take the pixels inside a prescribed window and summarize them with a single number according to some rule; that number can be the maximum of the window or its average. Here it is the maximum.
Max pooling is implemented by the max_pool2d function:
y = F.max_pool2d(x, 15)
plt.figure(1, figsize=(12,12))
lab.imarraysc(lab.t2im(y)) ;
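To connect the code back to the definition, here is a hand-rolled sketch of my own that gathers each p×p window with Tensor.unfold and takes its maximum; it should agree with max_pool2d:

p = 15
H = x.shape[2] // p * p            # crop so height and width are multiples of p
W = x.shape[3] // p * p
patches = x[:, :, :H, :W].unfold(2, p, p).unfold(3, p, p)  # N x C x H/p x W/p x p x p
y_manual = patches.amax(dim=(-2, -1))                      # max over each window
print(torch.allclose(y_manual, F.max_pool2d(x, p)))        # expected: True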
Another operator is local response normalization (LRN), which normalizes each feature value by the energy of the feature channels in a group $G(k)$ around it:

$$y_{ijk} = \frac{x_{ijk}}{\left(\kappa + \alpha \sum_{t \in G(k)} x_{ijt}^2\right)^{\beta}}$$

(Note that PyTorch's implementation scales $\alpha$ by the group size $n$, so $\alpha = 5$ with size 5 below corresponds to a unit coefficient.) You can adjust the values of $\kappa$, $\alpha$, and $\beta$ in the code below.
# LRN with some specially-chosen parameters
y_nrm = F.local_response_norm(x, 5, alpha=5, beta=0.5, k=0)
plt.figure(1,figsize=(12,12))
lab.imarraysc(lab.t2im(y_nrm)) ;
Normalizing the data means scaling down large values and scaling up small ones so that everything lands on a comparable scale and no single value has an outsized influence.
The L2 norm of a vector is the square root of the sum of the squares of its elements; setting κ = 0, α = 5, β = 1/2 (with group size 5, so α/n = 1) yields exactly L2 normalization across channels.
The default settings here are therefore the L2 normalization parameters: as you can see below, without changing the parameters the two implementations differ by 0.0%, i.e. they are essentially identical.
import math
# Another implementation of the same
y_nrm_alt = x / torch.sqrt((x**2).sum(1, keepdim=True))  # keepdim makes the broadcast over channels explicit
plt.figure(1,figsize=(12,12))
lab.imarraysc(lab.t2im(y_nrm_alt))
# Check that they indeed match
def compare(x, y):
    with torch.no_grad():
        a2 = torch.mean((x - y)**2)
        b2 = torch.mean(x**2)
        c2 = torch.mean(y**2)
        return 200 * math.sqrt(a2.item()) / math.sqrt(b2.item() + c2.item())
print(f"The difference between y_nrm and y_nrm_alt is {compare(y_nrm,y_nrm_alt):.1f}%")
I strongly advise beginners not to learn neural-network fundamentals from this VGG practical: the explanations here are terse and not suited to a first encounter with the concepts. Consider Hung-yi Lee's machine learning course, Andrew Ng's machine learning course, or CS231n instead. The VGG practical emphasizes the overall pipeline; fundamentals are not its focus.
Formulas are hard to typeset on the blog, so they are normally shown as images; the formula in question is the empirical loss of a CNN, which averages a per-sample loss $\ell$ over the $n$ training examples:

$$L(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} \ell(\mathbf{c}_i, f(\mathbf{x}_i; \mathbf{w}))$$
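As a concrete, made-up instance of this formula: taking cross-entropy as the per-sample loss ℓ, the empirical loss is simply the mean over the batch, which is what PyTorch's cross_entropy computes by default (the tensors below are stand-ins, not data from the practical):

import torch
import torch.nn.functional as F

n = 8
scores = torch.randn(n, 26)           # stand-in for f(x_i; w): class scores per sample
labels = torch.randint(0, 26, (n,))   # stand-in for the ground-truth classes c_i
L = F.cross_entropy(scores, labels)   # (1/n) * sum_i loss(c_i, f(x_i; w))
print(L.item())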
Earlier we learned the gradient descent algorithm. Reading this part, I am reminded of a saying: truly wise people do not complicate a problem, they simplify it. For the same problem, the simpler the solution, the better the method.
Here we first compute y, then call ready-made backward functions to compute dx and dw:
y = F.conv2d(x,w) # forward mode (get output y)
p = torch.randn(y.shape) # get a random tensor with the same size as y
# Directly call backward functions for demonstration
dx = torch.nn.grad.conv2d_input(x.shape, w, p)
dw = torch.nn.grad.conv2d_weight(x, w.shape, p)
print(f"The shape of x is {list(x.shape)} and that of dx is {list(dx.shape)}")
print(f"The shape of w is {list(x.shape)} and that of dw is {list(dx.shape)}")
print(f"The shape of y is {list(y.shape)} and that of p is {list(p.shape)}")
x.requires_grad_(True)
w.requires_grad_(True)
if x.grad is not None:
    x.grad.zero_()
if w.grad is not None:
    w.grad.zero_()
y = F.conv2d(x,w)
y.backward(p)
dx_ = x.grad
dw_ = w.grad
print(f"The difference between dx and dx_ is {compare(dx,dx_):.1f}%")
print(f"The difference between dw and dw_ is {compare(dw,dw_):.1f}%")
# Read an example image
x = lab.imread('data/peppers.png')
x.requires_grad_(True)
# Create a bank of linear filters
w = torch.randn(10,3,5,5)
w.requires_grad_(True)
# Forward
y = F.conv2d(x, w)
# Set the derivative dzdy to a random value
dzdy = torch.randn(y.shape)
# Backward
y.backward(dzdy)
dzdx = x.grad
dzdw = w.grad
print(f"Size of dzdx = {list(dzdx.shape)}")
print(f"Size of dzdw = {list(dzdw.shape)}")
# Check the derivative numerically
with torch.no_grad():
    delta = torch.randn(x.shape)
    step = 0.0001
    xp = x + step * delta
    yp = F.conv2d(xp, w)
    dzdx_numerical = torch.sum(dzdy * (yp - y) / step)
    dzdx_analytical = torch.sum(dzdx * delta)
    err = compare(dzdx_numerical, dzdx_analytical)
print(f"dzdx_numerical: {dzdx_numerical}")
print(f"dzdx_analytical: {dzdx_analytical}")
print(f"numerical vs analytical rel. error: {err:.3f}%")
Here the numerical and analytical gradients are not exactly identical, but nothing needs fixing: conv2d is linear in x, so the finite difference is exact in exact arithmetic, and the small residual comes from float32 rounding (for a non-linear operator there would also be a truncation error proportional to the step size). Similar checks appear later, so I will not repeat them.
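If you want an automated version of this comparison, torch.autograd.gradcheck performs it for you; it needs double precision to keep the rounding error below its tolerances, so this sketch of mine uses small random double-precision inputs instead of the image:

x64 = torch.randn(1, 3, 8, 8, dtype=torch.double, requires_grad=True)
w64 = torch.randn(2, 3, 3, 3, dtype=torch.double, requires_grad=True)
# gradcheck compares autograd gradients against finite differences
print(torch.autograd.gradcheck(F.conv2d, (x64, w64)))  # True if they match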
In this part we will train a very simple CNN consisting of two layers: a convolutional layer followed by an average-pooling layer.
The convolution contains a single 3×3 square filter, so that the bias b is a scalar and the input image x has a single channel:
def tinycnn(x, w, b):
    pad1 = (w.shape[2] - 1) // 2
    rho2 = 3
    pad2 = (rho2 - 1) // 2
    x = F.conv2d(x, w, b, padding=pad1)
    x = F.avg_pool2d(x, rho2, padding=pad2, stride=1)
    return x
# Load a training image and convert to gray-scale
im0 = lab.imread('data/dots.jpg')
im0 = im0.mean(dim=1)[None,:,:,:]
# Compute the location of black blobs in the image
pos, neg, indices = lab.extract_black_blobs(im0)
pos = pos / pos.sum()
neg = neg / neg.sum()
# Display the training data
plt.figure(1, figsize=(8,8))
lab.imsc(im0[0])
plt.plot(indices[1], indices[0],'go', markersize=8, mfc='none')
plt.figure(2, figsize=(8,8))
lab.imsc(pos[0])
plt.figure(3, figsize=(8,8))
lab.imsc(neg[0]) ;
Preprocess the image by subtracting its mean:
# Preprocess the image by subtracting its mean
im = im0 - im0.mean()
im = lab.imsmooth(im, 3)
import torch
num_iterations = 501
rate = 10
momentum = 0.9
shrinkage = 0.01
plot_period = 200
with torch.no_grad():
    w = torch.randn(1,1,3,3)
    w = w - w.mean()
    b = torch.Tensor(1)
    b.zero_()

E = []
w.requires_grad_(True)
b.requires_grad_(True)
w_momentum = torch.zeros(w.shape)
b_momentum = torch.zeros(b.shape)

for t in range(num_iterations):
    # Evaluate the CNN and the loss
    y = tinycnn(im, w, b)
    z = (pos * (1 - y).relu() + neg * y.relu()).sum()

    # Track energy
    E.append(z.item() + 0.5 * shrinkage * (w**2).sum().item())

    # Backpropagation
    z.backward()

    # Gradient descent
    with torch.no_grad():
        w_momentum = momentum * w_momentum + (1 - momentum) * (w.grad + shrinkage * w)
        b_momentum = momentum * b_momentum + (1 - momentum) * b.grad
        w -= rate * w_momentum
        b -= 0.1 * rate * b_momentum
        w.grad.zero_()
        b.grad.zero_()

    # Plotting
    if t % plot_period == 0:
        plt.figure(1, figsize=(12,4))
        plt.clf()
        fig = plt.gcf()
        ax1 = fig.add_subplot(1, 3, 1)
        plt.plot(E)
        ax2 = fig.add_subplot(1, 3, 2)
        lab.imsc(w.detach()[0])
        ax3 = fig.add_subplot(1, 3, 3)
        lab.imsc(y.detach()[0])
Finally, consider the regularisation effect of shrinking:
Task: Restore the learning rate and momentum as given originally. Then increase the shrinkage factor tenfold and a hundred-fold.
What is the effect on the convergence speed?
What is the effect on the final value of the total objective function and of the average loss part of it?
Convergence becomes faster, but the optimization of the total objective and of its average-loss part suffers, and the final result gets worse.
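A side note: the hand-written shrinkage term above (w.grad + shrinkage * w) plays the same role as the weight_decay option of torch.optim.SGD (L2 regularization). Below is a rough sketch of the same training step with the built-in optimizer; it is not numerically identical to the manual loop (PyTorch's momentum update has no (1 − momentum) dampening by default, and weight_decay here also touches the bias), but it shows where shrinkage fits:

import torch.optim as optim

w2 = torch.randn(1, 1, 3, 3, requires_grad=True)
b2 = torch.zeros(1, requires_grad=True)
opt = optim.SGD([w2, b2], lr=rate, momentum=momentum, weight_decay=shrinkage)
for t in range(num_iterations):
    opt.zero_grad()
    y = tinycnn(im, w2, b2)
    z = (pos * (1 - y).relu() + neg * y.relu()).sum()
    z.backward()
    opt.step()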
As usual, we first prepare the data.
Load the data: an imdb structure containing images of the characters a, b, ..., z rendered using 29,094 fonts downloaded from the Google Fonts project.
# Load data
imdb = torch.load('data/charsdb.pth')
print(f"imdb['images'] has shape {list(imdb['images'].shape)}")
print(f"imdb['labels'] has shape {list(imdb['labels'].shape)}")
print(f"imdb['sets'] has shape {list(imdb['sets'].shape)}")
# Plot the training data for 'a'
plt.figure(1,figsize=(15,15))
plt.clf()
plt.title('Training data')
sel = (imdb['sets'] == 0) & (imdb['labels'] == 0)
lab.imarraysc(imdb['images'][sel,:,:,:])
# Plot the validation data for 'a'
plt.figure(2,figsize=(12,12))
plt.clf()
plt.title('Validation data')
sel = (imdb['sets'] == 1) & (imdb['labels'] == 0)
lab.imarraysc(imdb['images'][sel,:,:,:]) ;
import torch.nn as nn
def new_model():
    return nn.Sequential(
        nn.Conv2d(1, 20, 5),
        nn.MaxPool2d(2, stride=2),
        nn.Conv2d(20, 50, 5),
        nn.MaxPool2d(2, stride=2),
        nn.Conv2d(50, 500, 4),
        nn.ReLU(),
        nn.Conv2d(500, 26, 2),
    )
model = new_model()
Task: By inspecting the code above, get a sense of the architecture that will be trained. How many layers are there? How big are the filters?
Answer: four convolutional layers, two max-pooling layers, and one ReLU non-linearity; the convolutional filter sizes are 5, 5, 4, and 2.
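One quick way to check this answer is to push a dummy input through the layers one at a time and print the shapes. The 1×1×32×32 input below assumes the 32×32 character images of this dataset (indeed, the architecture reduces to a 1×1 output exactly at a 32×32 input):

probe = torch.zeros(1, 1, 32, 32)
for layer in model:
    probe = layer(probe)
    print(f"{layer.__class__.__name__:12s} -> {list(probe.shape)}")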
The network defines a sequence of layers. The convolutional layers implicitly contain parameter tensors (filter weights and biases), pre-initialized with random numbers. This means that, although the output will be random, we can already apply the network to characters, as follows:
# Evaluate the network on three images
y = model(imdb['images'].narrow(0,0,3))
print(f"The size of the network output is {list(y.shape)}")
# Preserves only the first two dimensions
y = y.reshape(*y.shape[:2])
print(f"The size of the network output is now {list(y.shape)}")
# Evaluate the cross-entropy loss on the network output assuming that 'a', 'b', 'c' are the ground-truth chars
loss = nn.CrossEntropyLoss()
c = torch.LongTensor([ord('a'), ord('b'), ord('c')]) - ord('a')
z = loss(y, c)
print(f"The loss value is {z.item():.3f}")
We are now ready to train the CNN. For this we will use the lab.train_model() function provided with the practical. It is defined in the accompanying lab Python module, which you can open in an editor, and is declared as def train_model(model, imdb, batch_size=100, num_epochs=15, use_gpu=False, use_jitter=False). By default, training uses mini-batches of 100 elements, runs for 15 epochs (passes through the data), does not use the GPU, and uses a learning rate of 0.001. Before training starts, the average image value is subtracted, as before; we keep it around for later use.
# Remove average intensity from input images
im_mean = imdb['images'].mean()
imdb['images'].sub_(im_mean) ;
The code in the original files is not compatible with current Python:
lab.py calls time.clock() at lines 233, 265, and 296. That function is a pre-3.8 idiom, deprecated in Python 3.3 and removed in 3.8; under Python 3.8+ use time.perf_counter() instead.
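If you prefer not to edit lab.py by hand, a small compatibility shim of my own also works: alias the removed time.clock to time.perf_counter before lab's timing code runs.

import time
if not hasattr(time, 'clock'):
    time.clock = time.perf_counter  # restore the API removed in Python 3.8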
# Initialize the model
model = new_model()
# Run SGD to train the model
model = lab.train_model(model, imdb, num_epochs=15, use_gpu=True)
Is the training taking too long? If you have access to a GPU, you can use it to train the model for the full 15 epochs. Otherwise, restart the CPU training with 15 epochs and wait until it finishes.
Task: Run the learning code and examine the plots that are produced. As training completes answer the following questions:
How many images per second can you process? (Look at the rate of output on the screen)
There are two sets of curves: energy and prediction error. What do you think is the difference? What is the “loss” and the “accuracy”?
Some curves are labelled “training” and some other “validation”. Should they be equal? Which one should be lower than the other?
The initial setting of only two training epochs is indeed too few, and validation accuracy stays low; raising it to 10 and then 15 epochs keeps improving accuracy on the validation set. The error keeps decreasing and the accuracy keeps increasing; validation accuracy is generally lower than training accuracy.
Once training is finished, the model object contains the trained parameters.
# Visualize the filters in the first layer
plt.figure(1, figsize=(12,12))
plt.title('filters in the first layer')
lab.imarraysc(model[0].weight, spacing=1) ;
We now apply the model to a whole sequence of characters. This is the image data/sentence-lato.png:
# Load a pre-trained model. Do this in a pinch, otherwise train your own.
#model = new_model()
#model.load_state_dict(torch.load('data/charscnn.pth'))
#model.load_state_dict(torch.load('data/charscnn_jitter.pth'))
# I am missing the charscnn_jitter.pth file, so I cannot run this part
# Load sentence
im = lab.imread('data/sentence-lato.png')
im.sub_(im_mean)
# Apply the CNN to the larger image
y = model(im)
# Show the string
chars = lab.decode_predicted_string(y)
print(f"Predicted string '{''.join(chars)}'")
I am missing this file, so I skipped this step.
# Visualize the predicted string
plt.figure(1, figsize=(12,8))
lab.plot_predicted_string(im, y) ;
Tasks: inspect the output of the lab.plot_predicted_string() function and answer the following:
Is the quality of the recognition any good?
Does this match your expectation given the recognition rate in your validation set (as reported by lab.train_model() during training)?
Answer: the recognition quality is good and matches the expectation set by the validation accuracy.
# Show the file
!ls -lh data/alexnet.pth
# Import the model
# pip install --no-deps torchvision installs a torchvision matching the existing torch
# a plain pip install torchvision (or pip3 install torchvision) fails here with an error
import torchvision
alexnet = torchvision.models.alexnet(pretrained=False)
alexnet.load_state_dict(torch.load('data/alexnet.pth'))
# Display the model structure
print(alexnet)
from PIL import Image
# Obtain and preprocess an image
im = Image.open('data/peppers.png')
preprocess = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])
])
im_normalized = preprocess(im)[None,:]
print(f"The shape of AlexNet input is {list(im_normalized.shape)}")
# Put the model in evaluation mode
alexnet.eval()
# Run the CNN
y = alexnet(im_normalized)
print(f"The shape of AlexNet output is {list(y.shape)}")
import json
# Get the best class index and score
best, bestk = y.max(dim=1)
# Get the corresponding class name
with open('data/imnet_classes.json') as f:
    classes = json.load(f)
name = classes[str(bestk.item())][1]
# Plot the results
plt.figure(1, figsize=(8,8))
lab.imsc(im_normalized[0])
plt.title(f"{name} ({bestk.item()}), score {best.item():.3f}") ;
There are many more experiments below; I am short on time for now and will update when I can.
If you have questions, I would be happy to discuss them.