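I'm a university student, and I put this article together from information I collected on CSDN. If anything in it is wrong, I'd be grateful for corrections. Much of it is my own reading of points taken from other authors' articles. There are a lot of experts out there who explain these questions in detail; what follows is my personal summary. It covers a bit of everything!
A CNN has three main kinds of layers: convolutional layers, pooling layers, and fully connected layers.
A convolutional layer extracts features from the input image. The small weight matrix it slides across the image is called a filter, also known as a kernel.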
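Keras has a class called Sequential. It is usually used like this: model = Sequential()
Sequential is a sequential model, a linear stack of layers to which you add network layers one after another. Only the first layer needs its input shape specified; for the remaining layers the model infers the intermediate shapes automatically. The class lives in the keras.models module: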
from keras.models import Sequential
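To make the "only the first layer needs an input size" idea concrete, here is a minimal sketch in plain Python. It is not the Keras implementation: `LayerSpec` and `TinySequential` are hypothetical names invented for illustration, and they only track sizes without holding any weights.

```python
class LayerSpec:
    """Hypothetical stand-in for a layer: it records sizes and holds no weights."""

    def __init__(self, units, input_dim=None):
        self.units = units          # size of this layer's output
        self.input_dim = input_dim  # size of this layer's input (may be inferred)


class TinySequential:
    """Hypothetical mini sequential model (an illustration, not the Keras class)."""

    def __init__(self):
        self.layers = []

    def add(self, layer):
        if not self.layers:
            # The first layer has nothing before it, so its input size must be given.
            if layer.input_dim is None:
                raise ValueError("the first layer must specify input_dim")
        else:
            # Every later layer takes its input size from the previous layer's output.
            layer.input_dim = self.layers[-1].units
        self.layers.append(layer)

    def summary(self):
        # One (input size, output size) pair per layer, in the order they were added.
        return [(layer.input_dim, layer.units) for layer in self.layers]


model = TinySequential()
model.add(LayerSpec(32, input_dim=500))  # first layer: input size given explicitly
model.add(LayerSpec(10))                 # input size 32 is inferred
print(model.summary())  # [(500, 32), (32, 10)]
```

The real Sequential class does much more (building weights, compiling, training), but the chaining of sizes works on the same principle.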
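Once the model exists, you add its first layer. For images this is typically a convolutional layer (a Dense layer, described next, is another option and is not required):
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), padding='same', activation='relu'))
If you write this instead:
model.add(Dense(32, input_shape=(500,)))
then the first layer is a Dense layer, which implements a fully connected layer of a neural network.
First, what is a fully connected layer? It computes a weighted sum of all of its inputs (plus a bias) for each output neuron, so every neuron in one layer is connected to every neuron in the next. Combined with an activation function, its outputs are used for classification.
Dense is also part of Keras:
from keras.layers import Dropout, Dense, Flatten
For a detailed explanation, see the article by Better Bench, 【Python-Keras】keras.layers.Dense层的解析与使用.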
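The weighted-sum description above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not how Keras implements Dense; the function names `dense` and `relu` and the small numbers are assumptions chosen for the example.

```python
def dense(inputs, weights, biases):
    """Fully connected layer: output[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def relu(values):
    """ReLU activation: negative values become 0, positive values pass through."""
    return [max(0.0, v) for v in values]


x = [1.0, 2.0, 3.0]   # 3 input features
W = [[0.5, -1.0],     # weights[i][j] connects input i to output j
     [0.0, 1.0],
     [1.0, -1.0]]
b = [0.5, 0.0]        # one bias per output neuron

print(relu(dense(x, W, b)))  # [4.0, 0.0]
```

Every input contributes to every output, which is exactly what "fully connected" means; a layer like Dense(32, input_shape=(500,)) does the same thing with a 500 x 32 weight matrix.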
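Next comes a layer called Dropout. Dropout is a form of regularization that can effectively reduce overfitting.
What is overfitting?
Overfitting is when we fit a model to a given dataset, for example by linear regression, and the fitted curve matches that dataset well but does not fit other datasets at all. For example, take a high-degree function and fit it using only the data from one small segment: the fitted curve will be completely wrong on the other segments. That is overfitting.
Causes of overfitting: the dataset is too small, as in the example above.
Another possible cause is that the model is more complex than the problem. Say we want to count the vehicles in a picture, but the model fixates on the rear-view mirror, the sunroof, and the license plate, so that when the computer sees a car without those features it decides it is not a car.
A third cause is too much noise. In the same example, a few trucks showing up in the data would count as noise.
So we need ways to eliminate overfitting.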
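Ways to eliminate overfitting:
First, of course, increase the size of the dataset. If we had data covering the whole function, the fit would no longer match only one small segment.
Second, look at the characteristics of the dataset first and don't set up so many branches. If something has the outline of a car and a person sitting in the driver's seat, count it as a car and don't consider too many special cases.
Third is regularization, whose core idea is a penalty. Take a high-degree polynomial in which every term has its own coefficient, written θ with a subscript. Suppose that after training we find a cubic already fits the dataset fairly well, but our model's highest degree is above three, and the resulting high-degree function performs poorly on other datasets. We then regularize: we add to the loss function a penalty on the coefficients of the terms above degree three, defined as the square of those θ values.
The weight on this penalty, commonly written λ, is called the regularization parameter. The larger it is, the stronger the penalty, but bigger is not always better: when it is too large, the parameters are pushed so close to zero that the fitted function flattens into a nearly constant line, which causes underfitting.
Another method is dropout.
For details on regularization, see https://blog.csdn.net/qq_53430308/article/details/122503184
That article covers three kinds of regularization. The most commonly used are L1 and L2 regularization. The L0 norm expresses a sparsity constraint most naturally, but it is hard to optimize. The L1 norm is the best convex approximation of the L0 norm and is easier to solve (for example with proximal gradient descent), so it yields fairly sparse solutions. L2 regularization is smooth and convenient to compute.
See also https://blog.csdn.net/weixin_42109859/article/details/102967608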
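As a rough sketch of the penalty idea, the snippet below adds an L1 or L2 term to a mean squared error loss. The function names, the use of MSE as the base loss, and the sample numbers are all assumptions for illustration; `lam` plays the role of the regularization parameter.

```python
def mse(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def l1_penalty(theta):
    """L1 norm: sum of absolute values, which tends to push coefficients to exactly 0."""
    return sum(abs(t) for t in theta)


def l2_penalty(theta):
    """Squared L2 norm: sum of squares, which shrinks all coefficients smoothly."""
    return sum(t * t for t in theta)


def regularized_loss(predictions, targets, theta, lam, penalty=l2_penalty):
    """Data loss plus lam times the penalty on the coefficients."""
    return mse(predictions, targets) + lam * penalty(theta)


theta = [1.0, -2.0, 0.5]   # hypothetical model coefficients
preds = [1.0, 2.0]
targets = [1.0, 4.0]

print(regularized_loss(preds, targets, theta, lam=0.5))                      # 4.625
print(regularized_loss(preds, targets, theta, lam=0.5, penalty=l1_penalty))  # 3.75
```

With `lam=0` the loss is just the data term; raising `lam` makes large coefficients more expensive, which is the trade-off between overfitting and underfitting described above.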
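Here is dropout in more detail.
As the name suggests, dropout throws something away. What it drops are some of the nodes in the neural network, chosen at random during training. Why drop nodes? There are several benefits. In short, it makes the model generalize better because it cannot rely too heavily on particular features. In the car example above, it is like being forced to ignore the license plate some of the time. It also reduces the dependence between individual neurons. Note that dropout is only applied during training; at prediction time all nodes are used.
For details see: https://blog.csdn.net/program_developer/article/details/80737724?spm=1001.2014.3001.5506
A related concept is robustness. Put simply, robustness is how sturdy a model is, that is, how well it resists outside interference.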
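A minimal sketch of the idea in plain Python follows. It assumes the common "inverted dropout" variant, in which the surviving activations are scaled up during training so that nothing needs rescaling at prediction time; the function name and parameters are illustrative, not a library API.

```python
import random


def dropout(activations, rate, training=True, rng=None):
    """Randomly zero a fraction `rate` of the activations (inverted dropout)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At prediction time every node is kept unchanged.
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Keep each activation with probability `keep`, scaling survivors by 1 / keep
    # so the expected value of each output matches the input.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


layer_output = [0.2, 1.5, 0.7, 0.9]
print(dropout(layer_output, rate=0.5, rng=random.Random(0)))  # some entries zeroed, the rest doubled
print(dropout(layer_output, rate=0.5, training=False))        # unchanged at prediction time
```

Because a different random subset of nodes is dropped on every training step, no single node can become indispensable, which is where the generalization benefit comes from.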
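Next is the Flatten layer.
Flatten unrolls a multi-dimensional array. An array that was 3×2×2 can become a vector of 12 values (1×12) after flattening, or 3×4 if the first dimension is the batch dimension and is kept.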
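Both cases can be sketched with nested Python lists. The helper names below are made up for illustration; in a real framework the same thing is done on tensors.

```python
def flatten(nested):
    """Unroll an arbitrarily nested list into one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def flatten_keep_first(batch):
    """Flatten every sample separately, keeping the first (batch) dimension."""
    return [flatten(sample) for sample in batch]


tensor = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8]],
          [[9, 10], [11, 12]]]   # shape 3 x 2 x 2

print(flatten(tensor))             # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(flatten_keep_first(tensor))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
```

The second form is what a network needs between its convolutional and fully connected layers: one flat feature vector per image in the batch.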
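Next, Conv2D.
from keras.layers import Conv2D, MaxPooling2D
Conv2D is the convolution operation. The convolution kernel mentioned earlier is essential to it, and so is the input, which is also a matrix. The operation slides the kernel over the input, multiplies the overlapping values element by element, and sums them, producing a new matrix that captures how the two relate. Stacking several of these operations extracts progressively more useful features, which makes the image easier to process.
Below is a piece of code from GitHub (the PyTorch "Training a Classifier" tutorial), which I will walk through.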
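The sliding-window arithmetic can be written out directly. This sketch assumes a single-channel input, stride 1, and no padding, and it computes what deep learning libraries call convolution (strictly speaking cross-correlation, since the kernel is not flipped). The function name and sample values are illustrative.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [[1, 2, 3, 0],
         [4, 5, 6, 0],
         [7, 8, 9, 0],
         [0, 0, 0, 0]]
kernel = [[1, 0],
          [0, -1]]   # responds to differences along the diagonal

for row in conv2d(image, kernel):
    print(row)
# [-4, -4, 3]
# [-4, -4, 6]
# [7, 8, 9]
```

A 4×4 input with a 2×2 kernel gives a 3×3 output, since each side shrinks by kernel size minus one. A real Conv2D layer learns the kernel values during training and applies many kernels across several input channels.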
# -*- coding: utf-8 -*-
"""
Training a Classifier
=====================
This is it. You have seen how to define neural networks, compute loss and make
updates to the weights of the network.
Now you might be thinking,
What about data?
----------------
Generally, when you have to deal with image, text, audio or video data,
you can use standard python packages that load data into a numpy array.
Then you can convert this array into a ``torch.*Tensor``.
- For images, packages such as Pillow, OpenCV are useful
- For audio, packages such as scipy and librosa
- For text, either raw Python or Cython based loading, or NLTK and
SpaCy are useful
Specifically for vision, we have created a package called
``torchvision``, that has data loaders for common datasets such as
ImageNet, CIFAR10, MNIST, etc. and data transformers for images, viz.,
``torchvision.datasets`` and ``torch.utils.data.DataLoader``.
This provides a huge convenience and avoids writing boilerplate code.
For this tutorial, we will use the CIFAR10 dataset.
It has the classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’,
‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. The images in CIFAR-10 are of
size 3x32x32, i.e. 3-channel color images of 32x32 pixels in size.
.. figure:: /_static/img/cifar10.png
:alt: cifar10
cifar10
Training an image classifier
----------------------------
We will do the following steps in order:
1. Load and normalize the CIFAR10 training and test datasets using
``torchvision``
2. Define a Convolutional Neural Network
3. Define a loss function
4. Train the network on the training data
5. Test the network on the test data
1. Load and normalize CIFAR10
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Using ``torchvision``, it’s extremely easy to load CIFAR10.
"""
import torch
import torchvision
import torchvision.transforms as transforms
########################################################################
# The output of torchvision datasets are PILImage images of range [0, 1].
# We transform them to Tensors of normalized range [-1, 1].
########################################################################
# .. note::
# If running on Windows and you get a BrokenPipeError, try setting
# the num_worker of torch.utils.data.DataLoader() to 0.
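transform = transforms.Compose(
    [transforms.ToTensor(),  # convert a PIL image or ndarray to a tensor
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# Compose chains several torchvision.transforms operations into a single callable.
'''ToTensor() rescales pixel values from the range 0-255 to [0, 1],
and the Normalize() that follows maps [0, 1] to [-1, 1].
Specifically, for each channel Normalize computes:
image = (image - mean) / std
where mean and std are given as (0.5, 0.5, 0.5) and (0.5, 0.5, 0.5). The minimum value 0 becomes (0 - 0.5) / 0.5 = -1, and the maximum value 1 becomes (1 - 0.5) / 0.5 = 1.
'''
batch_size = 4  # number of images in each mini-batch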
trainset = torchvision.datasets.CIFAR10(root='./data1', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data1', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)
# shuffle=True reshuffles the images every epoch, so each batch is drawn in random order.
classes = ('plane', 'car', 'bird', 'cat',
'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
########################################################################
# Let us show some of the training images, for fun.
import matplotlib.pyplot as plt
import numpy as np
# functions to show an image
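def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    # The tensor is laid out as (channels, height, width) but matplotlib expects
    # (height, width, channels), so the axes are reordered before display.
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
# This guard is needed on Windows when num_workers > 0, because worker processes
# re-import this file. On Windows the later loops over the DataLoaders need the
# same protection (or set num_workers=0).
if __name__ == '__main__':
    # get some random training images
    dataiter = iter(trainloader)
    # iter() is Python's built-in function that returns an iterator; next() below only works on an iterator.
    images, labels = next(dataiter)
    # images and labels are both tensors; labels holds each image's class index into classes.
    # show images
    # make_grid tiles several images into one; with no padding argument it uses its default of 2 pixels between images.
    imshow(torchvision.utils.make_grid(images))
    # print labels
    print(' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))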
########################################################################
# 2. Define a Convolutional Neural Network
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Copy the neural network from the Neural Networks section before and modify it to
# take 3-channel images (instead of 1-channel images as it was defined).
import torch.nn as nn
import torch.nn.functional as F
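class Net(nn.Module):  # define the neural network
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # convolution: 3 input channels, 6 output channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)  # max pooling with a 2x2 window and stride 2; halves height and width
        self.conv2 = nn.Conv2d(6, 16, 5)  # convolution: 6 input channels, 16 output channels, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # fully connected: matrix multiplication plus a bias, 400 -> 120
        self.fc2 = nn.Linear(120, 84)  # 120 -> 84; the 120 outputs of the previous layer are the inputs here
        self.fc3 = nn.Linear(84, 10)  # 84 -> 10; the final output has one score per class
    def forward(self, x):  # applies the layers defined above in order
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()  # create an instance of the network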
########################################################################
# 3. Define a Loss function and optimizer
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Let's use a Classification Cross-Entropy loss and SGD with momentum.
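import torch.optim as optim  # optimizers
criterion = nn.CrossEntropyLoss()  # cross-entropy loss for classification
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
# SGD optimizer with learning rate 0.001 and momentum 0.9; both affect how quickly training descends toward a minimum.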
########################################################################
# 4. Train the network
# ^^^^^^^^^^^^^^^^^^^^
#
# This is when things start to get interesting.
# We simply have to loop over our data iterator, and feed the inputs to the
# network and optimize.
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()  # reset the gradients left over from the previous step
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()  # backpropagation: computes the gradient of every trainable parameter (grad is no longer None)
        optimizer.step()  # update every parameter from its gradient, which changes the weights
        # The lines above are the standard pattern for one training step.
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0
print('Finished Training')
########################################################################
# Let's quickly save our trained model:
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)  # save the trained parameters so they can be reused later
########################################################################
# See `here `_
# for more details on saving PyTorch models.
#
# 5. Test the network on the test data
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# We have trained the network for 2 passes over the training dataset.
# But we need to check if the network has learnt anything at all.
#
# We will check this by predicting the class label that the neural network
# outputs, and checking it against the ground-truth. If the prediction is
# correct, we add the sample to the list of correct predictions.
#
# Okay, first step. Let us display an image from the test set to get familiar.
dataiter = iter(testloader)
images, labels = next(dataiter)
# print images
imshow(torchvision.utils.make_grid(images))  # display the images
print('GroundTruth: ', ' '.join(f'{classes[labels[j]]:5s}' for j in range(4)))
########################################################################
# Next, let's load back in our saved model (note: saving and re-loading the model
# wasn't necessary here, we only did it to illustrate how to do so):
net = Net()
net.load_state_dict(torch.load(PATH))  # load the parameters saved after training
########################################################################
# Okay, now let us see what the neural network thinks these examples above are:
outputs = net(images)
########################################################################
# The outputs are energies for the 10 classes.
# The higher the energy for a class, the more the network
# thinks that the image is of the particular class.
# So, let's get the index of the highest energy:
_, predicted = torch.max(outputs, 1)
# When torch.max is given a dimension (here 1), it returns two tensors: the maximum values along that dimension, followed by the indices of those maxima.
print('Predicted: ', ' '.join(f'{classes[predicted[j]]:5s}'
for j in range(4)))
########################################################################
# The results seem pretty good.
#
# Let us look at how the network performs on the whole dataset.
correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs.data, 1)  # the index of the largest score is the predicted position in classes
        total += labels.size(0)  # size(0) is the length of the first dimension of labels, i.e. the number of samples in the batch
        correct += (predicted == labels).sum().item()
print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')
########################################################################
# That looks way better than chance, which is 10% accuracy (randomly picking
# a class out of 10 classes).
# Seems like the network learnt something.
#
# Hmmm, what are the classes that performed well, and the classes that did
# not perform well:
# prepare to count predictions for each class
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}
# again no gradients needed
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1
# print accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')
########################################################################
# Okay, so what next?
#
# How do we run these neural networks on the GPU?
#
# Training on GPU
# ----------------
# Just like how you transfer a Tensor onto the GPU, you transfer the neural
# net onto the GPU.
#
# Let's first define our device as the first visible cuda device if we have
# CUDA available:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# Assuming that we are on a CUDA machine, this should print a CUDA device:
print(device)
########################################################################
# The rest of this section assumes that ``device`` is a CUDA device.
#
# Then these methods will recursively go over all modules and convert their
# parameters and buffers to CUDA tensors:
#
# .. code:: python
#
# net.to(device)
#
#
# Remember that you will have to send the inputs and targets at every step
# to the GPU too:
#
# .. code:: python
#
#         inputs, labels = data[0].to(device), data[1].to(device)
#
# Why don't I notice MASSIVE speedup compared to CPU? Because your network
# is really small.
#
# **Exercise:** Try increasing the width of your network (argument 2 of
# the first ``nn.Conv2d``, and argument 1 of the second ``nn.Conv2d`` –
# they need to be the same number), see what kind of speedup you get.
#
# **Goals achieved**:
#
# - Understanding PyTorch's Tensor library and neural networks at a high level.
# - Train a small neural network to classify images
#
# Training on multiple GPUs
# -------------------------
# If you want to see even more MASSIVE speedup using all of your GPUs,
# please check out :doc:`data_parallel_tutorial`.
#
# Where do I go next?
# -------------------
#
# - :doc:`Train neural nets to play video games `
# - `Train a state-of-the-art ResNet network on imagenet`_
# - `Train a face generator using Generative Adversarial Networks`_
# - `Train a word-level language model using Recurrent LSTM networks`_
# - `More examples`_
# - `More tutorials`_
# - `Discuss PyTorch on the Forums`_
# - `Chat with other users on Slack`_
#
# .. _Train a state-of-the-art ResNet network on imagenet: https://github.com/pytorch/examples/tree/master/imagenet
# .. _Train a face generator using Generative Adversarial Networks: https://github.com/pytorch/examples/tree/master/dcgan
# .. _Train a word-level language model using Recurrent LSTM networks: https://github.com/pytorch/examples/tree/master/word_language_model
# .. _More examples: https://github.com/pytorch/examples
# .. _More tutorials: https://github.com/pytorch/tutorials
# .. _Discuss PyTorch on the Forums: https://discuss.pytorch.org/
# .. _Chat with other users on Slack: https://pytorch.slack.com/messages/beginner/
# %%%%%%INVISIBLE_CODE_BLOCK%%%%%%
del dataiter
# %%%%%%INVISIBLE_CODE_BLOCK%%%%%%
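There are still many function details in this code that I have not covered. I will write about them in other blog posts.
This code does have a prerequisite: the torch-related libraries must be installed. Installing them directly through PyCharm did not work for me, so I recommend installing them with Anaconda.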
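One detail worth working through is where the 16 * 5 * 5 in the first nn.Linear layer comes from. The sketch below traces the image size through the network using the usual output-size formula for convolution and pooling; the helper names are made up for illustration, and it assumes stride 1 and no padding for the convolutions, which are the nn.Conv2d defaults.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution: (size + 2*padding - kernel) // stride + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output width/height of a pooling layer with the given window and stride."""
    return (size - kernel) // stride + 1


def trace_net(size=32):
    """Follow one side of a CIFAR-10 image (32 pixels) through the layers of Net."""
    steps = []
    size = conv_out(size, 5)
    steps.append(("conv1", 6, size))    # 6 channels, 28 x 28
    size = pool_out(size, 2, 2)
    steps.append(("pool", 6, size))     # 6 channels, 14 x 14
    size = conv_out(size, 5)
    steps.append(("conv2", 16, size))   # 16 channels, 10 x 10
    size = pool_out(size, 2, 2)
    steps.append(("pool", 16, size))    # 16 channels, 5 x 5
    return steps


steps = trace_net()
for name, channels, size in steps:
    print(f"{name}: {channels} x {size} x {size}")

channels, size = steps[-1][1], steps[-1][2]
print("flattened features:", channels * size * size)  # 400
```

After the second pooling step each image is 16 channels of 5×5 values, so flattening gives 16 * 5 * 5 = 400 features, which is exactly the input size of fc1.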
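Before running the script, it can help to check which of the required packages the current Python interpreter can actually see. The following is a small sketch using only the standard library; the function name and the default package list (taken from the imports in the script above) are my own choices.

```python
import importlib.util


def missing_packages(names=("torch", "torchvision", "matplotlib", "numpy")):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If PyCharm reports a missing package even after installing it with Anaconda, it is worth checking that the project interpreter in PyCharm points at the same Anaconda environment.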