CNN Image Classification: Image Classification Using a CNN

PyTorch on CIFAR10

In my previous posts, we covered:

  1. Deep Learning: Artificial Neural Network (ANN)

  2. Tensors: Basics of PyTorch Programming

  3. Linear Regression with PyTorch

  4. Image Classification with PyTorch: Logistic Regression

  5. Training Deep Neural Networks on a GPU with PyTorch

Now let us try to classify images using convolutions.

2D Convolutions: The Operation

[Figure; source credited in the original post]

The idea behind convolution is the use of image kernels. A kernel is a small matrix (usually 3×3) that is slid over an image to apply an effect such as sharpening or blurring. Kernels are also used in machine learning for ‘feature extraction’, a technique for determining the most important portions of an image.

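
The sketch below makes the kernel idea concrete. It is my own illustration, not from the original post. It slides a 3×3 box-blur kernel over a tiny 5×5 single-channel image using PyTorch's `torch.nn.functional.conv2d`. Like most deep-learning libraries, PyTorch actually computes a cross-correlation, meaning the kernel is not flipped. For a symmetric kernel like this one the result is identical. The toy image values are an assumption, chosen so the output is easy to check.

```python
import torch
import torch.nn.functional as F

# A toy 5x5 single-channel "image" with values 0..24, shaped (batch, channels, H, W)
image = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)

# A 3x3 box-blur kernel: each output pixel is the mean of a 3x3 neighbourhood
blur = torch.full((1, 1, 3, 3), 1.0 / 9.0)

# Slide the kernel over the image (no padding, stride 1)
out = F.conv2d(image, blur)
print(out.shape)
print(out)
```

The input values increase linearly, so each blurred value equals the centre pixel of its 3×3 neighbourhood. The output also shrinks from 5×5 to 3×3.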

Common Techniques Used in CNNs: Padding and Striding

[Figure; source credited in the original post]

Padding: In the animation above, notice that during the sliding process the edges essentially get “trimmed off”, converting a 5×5 feature matrix into a 3×3 one.

Padding solves this: we pad the edges with extra, “fake” pixels (usually of value 0, hence the oft-used term “zero padding”). The original edge pixels can then sit at the center of the sliding kernel, while the kernel extends into the fake pixels beyond the edge. This produces an output the same size as the input.

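
One way to see the effect of zero padding is to convolve the same toy 5×5 image with and without `padding=1`. This is an illustrative sketch and the values are arbitrary. It also pads explicitly with `torch.nn.functional.pad` and then convolves without padding, which gives the same result as passing `padding=1`.

```python
import torch
import torch.nn.functional as F

image = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)  # toy 5x5 input
kernel = torch.full((1, 1, 3, 3), 1.0 / 9.0)                        # 3x3 box blur

no_pad = F.conv2d(image, kernel)               # edges trimmed: 5x5 -> 3x3
zero_pad = F.conv2d(image, kernel, padding=1)  # one ring of zeros: 5x5 -> 5x5

# Equivalent: add the "fake" zero pixels ourselves, then convolve without padding
padded = F.pad(image, (1, 1, 1, 1), value=0.0)  # (left, right, top, bottom)
manual = F.conv2d(padded, kernel)

print(no_pad.shape, zero_pad.shape, padded.shape)
print(torch.allclose(manual, zero_pad))
```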

Striding: Often when running a convolution layer, you want an output smaller than the input. This is commonplace in convolutional neural networks, where the spatial dimensions shrink as the number of channels increases. One way of accomplishing this is a pooling layer, e.g. taking the average or max of every 2×2 grid to halve each spatial dimension. Yet another way to do it is to use a stride:

[Figure; source credited in the original post]

The idea of the stride is to skip some of the kernel's slide positions. A stride of 1 means picking slides one pixel apart, i.e. every single slide, which is a standard convolution. A stride of 2 means picking slides 2 pixels apart, skipping every other slide and downsizing by roughly a factor of 2. A stride of 3 means skipping 2 slides out of every 3, downsizing by roughly a factor of 3, and so on.

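
Kernel size k, padding p and stride s combine in a standard formula for a spatial dimension of size n: out = floor((n + 2p - k) / s) + 1. The sketch below checks this formula against `nn.Conv2d` on a 32×32 input, the CIFAR-10 image size. The helper `conv_output_size` is hypothetical, written only for this example.

```python
import torch
import torch.nn as nn

def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension (dilation 1): floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel_size) // stride + 1

x = torch.randn(1, 1, 32, 32)  # one single-channel 32x32 input
results = []
for stride in (1, 2, 3):
    conv = nn.Conv2d(1, 1, kernel_size=3, stride=stride, padding=1)
    actual = conv(x).shape[-1]                 # width of the output feature map
    predicted = conv_output_size(32, 3, stride, 1)
    results.append((stride, actual, predicted))
    print(f"stride={stride}: output {actual}x{actual}, formula {predicted}")
```

With padding 1, a stride of 1 preserves the 32×32 size, a stride of 2 gives 16×16, and a stride of 3 gives 11×11. This matches the "roughly a factor of 2 / 3" description above.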

The Multi-Channel (RGB) Image Version

Most of the time we deal with RGB images. In practice, most input images have 3 channels, and the number of channels typically increases the deeper you go into a network.

[Figure; image credit: Andre Mouton]

Each filter is actually a collection of kernels. There is one kernel for every input channel to the layer, and each kernel is unique. Hence, for an RGB image we have 3 input channels, and 3 kernels that together make up one filter.

Each of the kernels of the filter “slides” over their respective input channels, producing a processed version of each. Some kernels may have stronger weights than others, to give more emphasis to certain input channels than others.

[Figure; source credited in the original post]

The per-channel processed versions are then summed together to form one channel. Each kernel of a filter produces one processed version of its own input channel, and the filter as a whole produces one overall output channel.

[Figure; source credited in the original post]

Finally, there is the bias term. Each output filter has one bias term, which is added to the summed output channel to produce the final output channel.

[Figure; source credited in the original post]
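
The filter, kernel and bias description above can be checked numerically. The weights in this sketch are random and the sizes are chosen only for illustration. An `nn.Conv2d(3, 2, ...)` layer holds 2 filters, each made of 3 kernels (one per RGB channel) plus one bias. The code rebuilds output channel 0 by hand: it convolves each input channel with its own kernel, sums the three results, and adds the bias.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 8, 8)                       # one RGB image, 8x8 pixels
conv = nn.Conv2d(3, 2, kernel_size=3, padding=1)  # 2 filters, each made of 3 kernels

print(conv.weight.shape)  # (filters, kernels per filter, kernel height, kernel width)
print(conv.bias.shape)    # one bias per filter

# Rebuild output channel f by hand
f = 0
per_channel = [
    F.conv2d(x[:, c:c + 1], conv.weight[f:f + 1, c:c + 1], padding=1)  # kernel c slides over channel c
    for c in range(3)
]
manual = sum(per_channel) + conv.bias[f]  # sum the 3 processed channels, then add the bias

print(torch.allclose(manual, conv(x)[:, f:f + 1], atol=1e-5))
```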

To dive deeper into CNNs, please read Irhum Shafkat's excellent article, Intuitively Understanding Convolutions for Deep Learning.

Let's get into coding a CNN with PyTorch.

Step 1: Import the necessary libraries and explore the dataset

We import the necessary libraries: pandas, numpy, matplotlib, torch and torchvision. A basic EDA shows that the CIFAR-10 dataset contains 10 classes of images, with 50,000 training images and 10,000 test images. Each image is a tensor of shape [3 x 32 x 32], i.e. 3 RGB channels of 32 x 32 pixels.

Step 2: Prepare the data for training

We use a training set, a validation set and a test set. Why do we need all three?

  • Training set: used to train the model, i.e. to compute the loss and adjust the weights.
  • Validation set: used to evaluate the model while tuning hyperparameters and to pick the best model during training. We use 10% of the training data as the validation set.
  • Test set: used to compare different models and report the final accuracy.

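
Here is a minimal sketch of the split described above. In the original post the data comes from `torchvision.datasets.CIFAR10` with a `ToTensor()` transform. Here a hypothetical `FakeCifar` dataset of random 3×32×32 tensors stands in for it, so the example runs without downloading anything. The code holds out 10% of the 50,000 training images for validation with `torch.utils.data.random_split`. It then wraps both parts in `DataLoader`s using the batch size of 128 from this post. The doubled validation batch size is my own choice.

```python
import torch
from torch.utils.data import Dataset, DataLoader, random_split

class FakeCifar(Dataset):
    """Hypothetical stand-in for the CIFAR-10 training set: 50,000 random 3x32x32 images."""
    def __len__(self):
        return 50000

    def __getitem__(self, idx):
        g = torch.Generator().manual_seed(idx)               # reproducible "image" per index
        return torch.rand(3, 32, 32, generator=g), idx % 10  # (image, label in 0..9)

dataset = FakeCifar()
val_size = len(dataset) // 10          # 10% for validation
train_size = len(dataset) - val_size   # the rest for training
train_ds, val_ds = random_split(dataset, [train_size, val_size],
                                generator=torch.Generator().manual_seed(42))

batch_size = 128
train_dl = DataLoader(train_ds, batch_size, shuffle=True)
val_dl = DataLoader(val_ds, batch_size * 2)

images, labels = next(iter(train_dl))
print(len(train_ds), len(val_ds), images.shape, labels.shape)
```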

These two steps are exactly the same as in the previous logistic regression post.

Step 3: Define the CNN model

In the model below, the first Conv2d layer transforms a 3-channel image into a 32-channel feature map, and each MaxPool2d layer halves the height and width. The feature map gets smaller as we add more layers, until we are left with a small feature map (256 x 4 x 4) that can be flattened into a vector. We then add some fully connected layers at the end to get a vector of size 10 for each image.

[Figure; source credited in the original post]

```python
import torch.nn as nn

# ImageClassificationBase (training_step, validation_step, validation_epoch_end,
# epoch_end) is the helper base class defined in the previous post.
class Cifar10CnnModel(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # output: 64 x 16 x 16

            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # output: 128 x 8 x 8

            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # output: 256 x 4 x 4

            nn.Flatten(),        # 256 * 4 * 4 = 4096 features
            nn.Linear(256*4*4, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 10))  # 10 raw scores (logits), one per class

    def forward(self, xb):
        return self.network(xb)
```
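
The sketch below checks the shape comments in the model above. It pushes a dummy batch through a simplified version of the same architecture. The simplifications are mine: one convolution per stage instead of two, and a single linear layer. Each `MaxPool2d(2, 2)` halves the height and width, so 32×32 becomes 16×16, then 8×8, then 4×4. That is why the first linear layer expects 256 * 4 * 4 inputs.

```python
import torch
import torch.nn as nn

# Simplified stand-in for Cifar10CnnModel: one conv per stage, one linear layer
stages = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),     # 32 x 16 x 16
    nn.Conv2d(32, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),   # 128 x 8 x 8
    nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),  # 256 x 4 x 4
    nn.Flatten(),                                                                  # 4096 features
    nn.Linear(256 * 4 * 4, 10),                                                    # 10 logits
)

x = torch.randn(2, 3, 32, 32)  # a dummy batch of two CIFAR-10-sized images
shapes = []
for layer in stages:
    x = layer(x)
    shapes.append(tuple(x.shape))
    print(f"{type(layer).__name__:<10} {tuple(x.shape)}")
```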

We’d like these outputs to represent probabilities. For that, the elements of each output row must lie between 0 and 1 and add up to 1, which is clearly not the case here. To convert the output rows into probabilities, we use the softmax function. After applying softmax, the 10 outputs for each image can be interpreted as probabilities for the 10 target classes. The class with the highest probability is chosen as the label the model predicts for the input image.

[Figure; source credited in the original post]
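
The post relies on an `ImageClassificationBase` class from the previous post, whose code is not repeated here. The sketch below is my reconstruction of what such a base class could look like. Its method names match the calls made by `evaluate` and `fit` in Step 5, but treat the method bodies as assumptions. `F.cross_entropy` applies log-softmax internally, so the model returns raw scores (logits). Softmax is only needed when you want explicit probabilities, as in the last few lines.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def accuracy(outputs, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # raw scores (logits)
        return F.cross_entropy(out, labels)  # log-softmax + negative log-likelihood

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        return {'val_loss': loss.detach(), 'val_acc': accuracy(out, labels)}

    def validation_epoch_end(self, outputs):
        epoch_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        epoch_acc = torch.stack([x['val_acc'] for x in outputs]).mean()
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['train_loss'], result['val_loss'], result['val_acc']))

# Softmax turns each row of logits into probabilities that sum to 1
logits = torch.tensor([[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]])
probs = F.softmax(logits, dim=1)
print(probs, probs.sum(dim=1))
print(accuracy(logits, torch.tensor([0, 2])))  # first row correct, second row wrong
```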

Step 4: Train on a GPU, if available

To use a GPU seamlessly when one is available, we define a couple of helper functions (get_default_device and to_device). We also define a helper class, DeviceDataLoader. Together they move our model and data to the GPU as required. These are the same helpers used in the previous posts.

```python
import torch

def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)
```

We now wrap our data loaders in DeviceDataLoader to transfer batches of data to the GPU automatically (if available). We also use to_device to move our model to the GPU (if available).

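```python
device = get_default_device()
train_dl = DeviceDataLoader(train_dl, device)
val_dl = DeviceDataLoader(val_dl, device)
model = to_device(Cifar10CnnModel(), device)  # instantiate the model and move it to the device
```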

Step 5: Train the model

Next, we define an evaluate function, which performs the validation phase, and a fit function, which performs the entire training process.

```python
import torch

@torch.no_grad()
def evaluate(model, val_loader):
    """Run the validation phase and aggregate loss/accuracy over all batches"""
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training phase
        model.train()
        train_losses = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss.detach())  # detach so the autograd graph is not kept alive
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        model.epoch_end(epoch, result)
        history.append(result)
    return history
```

Before training, let us instantiate the model and see how it performs on the validation set with the initial set of parameters.

The initial accuracy is around 10%, which is what one might expect from a randomly initialized model.

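
Why around 10%? A randomly initialised network produces nearly equal scores for all 10 classes, so softmax gives each class a probability of about 1/10. Guessing among 10 classes is right about 10% of the time. The cross-entropy loss also starts near -ln(1/10) = ln 10 ≈ 2.30. The sketch below idealises this with perfectly uniform logits. That is an assumption: real initial logits are only approximately uniform.

```python
import math
import torch
import torch.nn.functional as F

logits = torch.zeros(128, 10)           # perfectly uniform scores for a batch of 128 images
labels = torch.randint(0, 10, (128,))   # arbitrary true labels

probs = F.softmax(logits, dim=1)        # every class gets probability 1/10
loss = F.cross_entropy(logits, labels)  # -log(1/10) = ln 10 for every image

print(probs[0])
print(loss.item(), math.log(10))
```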

We use the following hyperparameters: a learning rate of 0.001 (typically set between 0.0001 and 0.1), 10 epochs, the Adam optimizer, and a batch size of 128. You can change and experiment with these values to reach a higher accuracy in less time.

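
This sketch shows how these hyperparameters plug into the `fit` function from Step 5. Inside `fit`, the optimizer is built as `opt_func(model.parameters(), lr)`, and `torch.optim.Adam` accepts the learning rate as its second positional argument. A hypothetical `nn.Linear` stand-in replaces `Cifar10CnnModel` so the snippet runs on its own. The commented-out line shows the actual training call, assuming `model`, `train_dl` and `val_dl` from the earlier steps.

```python
import torch
import torch.nn as nn

# Hyperparameters used in this post
num_epochs = 10
opt_func = torch.optim.Adam
lr = 0.001
batch_size = 128  # used when building the DataLoaders

model = nn.Linear(3 * 32 * 32, 10)  # hypothetical stand-in for Cifar10CnnModel

# fit() creates its optimizer exactly like this:
optimizer = opt_func(model.parameters(), lr)
print(type(optimizer).__name__, optimizer.param_groups[0]['lr'])

# With the real model and loaders from the earlier steps, training would be:
# history = fit(num_epochs, lr, model, train_dl, val_dl, opt_func)
```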

[Figure: output of the fit function during training]

Next, we evaluate the accuracy against the number of epochs. Our model reaches an accuracy of around 75%. Judging by the graph, it seems unlikely to exceed 80% even after training for a long time. This suggests that we might need a more powerful model to capture the relationship between the images and the labels more accurately. We can get one by adding more convolutional layers to our model, increasing the number of channels in each convolutional layer, or using regularization techniques.

[Figure: accuracy vs. number of epochs]
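
One way to act on this is to build the network from convolution blocks that include batch normalization and dropout, two regularization techniques. This is sketched under my own assumptions, not taken from the post. The hypothetical `conv_block` helper returns Conv2d, BatchNorm2d and ReLU, optionally followed by max-pooling and dropout. Stacking more of these blocks, or giving them more channels, yields a more powerful model.

```python
import torch
import torch.nn as nn

def conv_block(in_channels, out_channels, pool=False, dropout=0.0):
    """Hypothetical helper: Conv2d -> BatchNorm2d -> ReLU, then optional MaxPool2d and Dropout."""
    layers = [
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),  # normalises each channel using batch statistics
        nn.ReLU(inplace=True),
    ]
    if pool:
        layers.append(nn.MaxPool2d(2))  # halves height and width
    if dropout > 0:
        layers.append(nn.Dropout(dropout))  # zeroes random activations, only in train mode
    return nn.Sequential(*layers)

features = nn.Sequential(
    conv_block(3, 64),
    conv_block(64, 128, pool=True, dropout=0.25),
)

x = torch.randn(8, 3, 32, 32)
features.train()
out_train = features(x)  # BatchNorm uses batch statistics, Dropout is active
features.eval()
out_eval = features(x)   # BatchNorm uses running statistics, Dropout is disabled
print(out_train.shape, out_eval.shape)
```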

Next, we plot the losses against the number of epochs. Initially, both the training and validation losses seem to decrease over time. However, if you train the model for long enough, you will notice that the training loss continues to decrease. Meanwhile, the validation loss stops decreasing, and even starts to increase after a certain point!

This phenomenon is called overfitting, and it is the number one reason why many machine learning models give rather poor results on real-world data. In an attempt to minimize the loss, the model starts to learn patterns that are unique to the training data. Sometimes it even memorizes specific training examples.

Following are some common strategies for avoiding overfitting:

  • Gathering and generating more training data, or adding noise to it

  • Using regularization techniques like batch normalization & dropout

  • Stopping the model’s training early, when the validation loss starts to increase

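
Early stopping, the last strategy above, can be sketched as a small helper. It scans the per-epoch validation losses and reports where training should have stopped. The function name, the `patience` and `min_delta` parameters, and the loss values are all hypothetical. In practice you would also save the model's `state_dict` at the best epoch so you can restore it.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once the validation loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss = float('inf')
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1  # never triggered: train to the end

# Hypothetical validation losses: improving at first, then rising (overfitting)
example_losses = [1.60, 1.25, 1.05, 0.92, 0.88, 0.90, 0.95, 1.01, 1.10]
print(early_stopping_epoch(example_losses, patience=3))
```

Here the best model comes from epoch 4, and training stops at epoch 7 after three epochs without improvement.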

In the next post, I will explain how to reach an accuracy of more than 90% with minor changes to our model.

Step 6: Saving and loading the model

Since we’ve trained our model for a long time and achieved a reasonable accuracy, it would be a good idea to save the weights of the model to disk, so that we can reuse the model later and avoid retraining from scratch.

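
```python
torch.save(model.state_dict(), 'cifar10-cnn.pth')  # save the learned weights only

model2 = to_device(Cifar10CnnModel(), device)       # fresh model with the same architecture
model2.load_state_dict(torch.load('cifar10-cnn.pth', map_location=device))

# test_loader wraps the 10,000-image test set in a DeviceDataLoader (see the full notebook)
evaluate(model2, test_loader)
```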
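
The save/load cycle can be verified on a small scale. This sketch uses a hypothetical miniature model in place of `Cifar10CnnModel` and writes the file to a temporary directory. It checks that a freshly constructed model produces the same outputs once the saved `state_dict` is loaded. Passing `map_location` to `torch.load` lets weights saved on a GPU be loaded on a CPU-only machine.

```python
import os
import tempfile
import torch
import torch.nn as nn

def make_model():
    """Hypothetical small stand-in for Cifar10CnnModel."""
    return nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
                         nn.Flatten(), nn.Linear(8 * 32 * 32, 10))

torch.manual_seed(0)
model = make_model()
path = os.path.join(tempfile.mkdtemp(), 'cifar10-cnn.pth')
torch.save(model.state_dict(), path)  # weights only, not the class definition

model2 = make_model()                 # new instance with different random weights
model2.load_state_dict(torch.load(path, map_location='cpu'))

x = torch.randn(4, 3, 32, 32)
model.eval()
model2.eval()
with torch.no_grad():
    out1, out2 = model(x), model2(x)
print(torch.allclose(out1, out2))
```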

The complete code is available in the notebook on GitHub. Stay connected with me on LinkedIn.

Credits & references:

  1. https://jovian.ml/aakashns/05-cifar10-cnn

  2. https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1

  3. https://sgugger.github.io/convolution-in-depth.html

  4. https://pytorch.org/docs/stable/tensors.html

Original article: https://medium.com/swlh/image-classification-with-cnn-4f2a501faadb
