Learn AI Today
This is the 3rd story in the Learn AI Today series! These stories, or at least the first few, are based on a series of Jupyter notebooks I’ve created while learning PyTorch and Deep Learning. I hope you find them as useful as I did!
If you haven’t already, make sure to check out the previous story!
What you will learn in this story:
- Potatoes Are Not All the Same
- Using Kaggle Datasets
- How Convolutional Neural Networks Work
- Using fastai2 to Make Your Life Easier
1. Kaggle Datasets
Kaggle Datasets page is a good place to start if you want to find a public dataset. There are almost 50 thousand datasets on Kaggle, a number growing every day as users create and upload new datasets to share with the world.
After I had the idea of creating a potato classifier for this lesson, I quickly found this dataset, which contains 4 classes of potatoes as well as many other fruits and vegetables.
Image samples from the Fruits 360 dataset.
2. Convolutional Neural Networks (CNNs)
Convolutional Neural Networks are the building blocks of computer vision. These networks usually combine several layers of kernel convolution operations and downscaling.
The animation below is a great visualization of the kernel convolution operation. The kernel, a small matrix (usually 3x3), moves over the entire image. Instead of calling it an image, let’s refer to it as the input feature map, to be more general.
Convolution example from the Theano documentation.
At each step, the values of the 3x3 kernel matrix are multiplied elementwise by the corresponding values of the input feature map (the blue matrix in the animation above), and the sum of those 9 products is the value of the output, resulting in the green matrix in the animation. The numbers in the kernel are parameters of the model to be learned. That way the model can learn to identify spatial patterns, which are the basis of computer vision. By stacking multiple layers and gradually downscaling the images, the patterns learned by each convolutional layer become more and more complex. To get a deeper intuition of CNNs, I recommend this story by Irhum Shafkat.
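To make the sliding-window arithmetic concrete, here is a minimal NumPy sketch of a single-channel convolution with a 3x3 kernel, no padding and stride 1. The function name conv2d_single and the input and kernel values are illustrative choices, not the values from the animation. Like PyTorch’s nn.Conv2d, it does not flip the kernel (strictly speaking this is a cross-correlation, which is what deep learning libraries call convolution).

```python
import numpy as np

def conv2d_single(feature_map: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `feature_map` (no padding, stride 1).

    At each position the overlapping values are multiplied elementwise
    and the products are summed into one output value.
    """
    kh, kw = kernel.shape
    out_h = feature_map.shape[0] - kh + 1
    out_w = feature_map.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i:i + kh, j:j + kw]  # 3x3 patch under the kernel
            output[i, j] = np.sum(window * kernel)    # sum of the 9 products
    return output

# Illustrative 5x5 input (a ramp from 0 to 24) and 3x3 kernel
x = np.arange(25, dtype=float).reshape(5, 5)
k = np.array([[0, 1, 2],
              [2, 2, 0],
              [0, 1, 2]], dtype=float)

out = conv2d_single(x, k)
print(out.tolist())  # [[62.0, 72.0, 82.0], [112.0, 122.0, 132.0], [162.0, 172.0, 182.0]]
```

Because the input is a ramp, each step to the right adds 10 (the sum of the kernel values) and each step down adds 50, which makes the output easy to check by hand. In a real CNN the kernel values are not hand-picked like this; they are learned.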
The idea of CNNs has been around since the 80s, but it started to gain momentum in 2012, when the winners of the ImageNet competition used this approach and ‘crushed’ the competition. Their paper describing the solution has the following abstract:
“We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.”
A top-5 error rate of 15.3%, compared to 26.2% for the second-best entry, was a huge breakthrough. Fast forward to today and the current best top-5 accuracy is 98.7% (an error rate of 1.3%).
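Since these results are reported as top-1 and top-5 error rates, here is a small NumPy sketch of how a top-k error can be computed from a model’s scores: a prediction counts as correct if the true class is among the k highest-scoring classes, and the top-k error is simply 1 minus the top-k accuracy. The function top_k_error and the toy scores are hypothetical, for illustration only.

```python
import numpy as np

def top_k_error(logits: np.ndarray, targets: np.ndarray, k: int) -> float:
    """Fraction of samples whose true class is NOT among the k highest scores."""
    # Column indices of the k largest scores in each row
    top_k = np.argsort(logits, axis=1)[:, -k:]
    # A sample is a "hit" if its true class appears anywhere in its top k
    hits = np.any(top_k == targets[:, None], axis=1)
    return float(1.0 - hits.mean())

# Toy scores for 3 images and 6 classes (illustrative values)
logits = np.array([
    [0.1, 0.5, 0.2, 0.9, 0.0, 0.3],  # true class 3 has the highest score
    [0.8, 0.1, 0.7, 0.2, 0.6, 0.5],  # true class 2 is only the 2nd best guess
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.1],  # true class 5 has the lowest score
])
targets = np.array([3, 2, 5])

print(top_k_error(logits, targets, k=1))  # 2 of 3 wrong
print(top_k_error(logits, targets, k=5))  # 1 of 3 wrong
```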
Let’s now code a very simple CNN with just two convolutional layers and use it to create a potato classifier!
The first convolutional layer, nn.Conv2d, has 3 input channels and 32 output channels, with a kernel size of 3x3. The 3 input channels correspond to the RGB channels of the image; the number of output channels is just a design choice. The second convolutional layer has 32 input channels, to match the number of output channels of the previous layer, and 64 output channels.
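As a quick worked example of how many learnable numbers those kernels hold, here is the parameter count of the two layers. It assumes the PyTorch default bias=True (one bias per output channel); the helper name conv2d_param_count is hypothetical.

```python
def conv2d_param_count(in_channels: int, out_channels: int,
                       kernel_size: int, bias: bool = True) -> int:
    """Number of learnable parameters in a square-kernel 2D convolution layer."""
    # One kernel_size x kernel_size kernel for every (input, output) channel pair
    weights = in_channels * out_channels * kernel_size * kernel_size
    # Plus one bias value per output channel
    return weights + (out_channels if bias else 0)

conv1 = conv2d_param_count(3, 32, 3)    # 3 * 32 * 9 + 32
conv2 = conv2d_param_count(32, 64, 3)   # 32 * 64 * 9 + 64
print(conv1, conv2)  # 896 18496
```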
Notice that in the forward method, after each convolutional layer I apply F.max_pool2d and F.relu. The max-pooling operation simply downscales the image by selecting the maximum value of each 2x2 block of pixels, so the resulting image is half the size in each dimension. ReLU is a non-linear activation function, as I mentioned in lesson 1 of this series.
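Here is a minimal NumPy sketch of both operations on a single 4x4 feature map, for intuition only (in the model, F.max_pool2d and F.relu do this on whole batches of tensors). The helper names and values are illustrative.

```python
import numpy as np

def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """Keep the maximum of each non-overlapping 2x2 block (kernel 2, stride 2)."""
    h, w = feature_map.shape
    # Drop a trailing odd row/column, as F.max_pool2d does by default
    fm = feature_map[: h - h % 2, : w - w % 2]
    # Axis 1 and axis 3 index the position inside each 2x2 block
    blocks = fm.reshape(fm.shape[0] // 2, 2, fm.shape[1] // 2, 2)
    return blocks.max(axis=(1, 3))

def relu(x: np.ndarray) -> np.ndarray:
    """Replace negative values by zero."""
    return np.maximum(x, 0)

fm = np.array([[ 1., -2.,  3.,  0.],
               [-4.,  5., -6.,  2.],
               [ 7., -8., -1., -2.],
               [ 0.,  2., -3., -5.]])

pooled = max_pool_2x2(fm)   # 4x4 -> 2x2: [[5, 3], [7, -1]]
activated = relu(pooled)    # negatives become 0: [[5, 3], [7, 0]]
print(activated)
```

Because ReLU never decreases its input’s ordering, applying it before or after max pooling gives the same result, so the order of the two calls doesn’t change the output.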
After two convolutions, each followed by max pooling with a size of 2, the resulting feature map is 1/4 of the original image size in each dimension. I will be working with 64x64 images, so this results in a 16x16 feature map. I could add more of these convolutional layers, but once the feature map is already quite small, the usual next step is to use average pooling to reduce the feature map to 1x1 by simply computing the average. Notice that, as we have 64 channels, the resulting tensor will have a shape of (batch-size, 64, 1, 1), which is then reshaped to (batch-size, 64) before applying the final linear layer.
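The sizes above can be checked with the standard output-size formula for convolution and pooling layers, out = floor((n + 2p - k) / s) + 1. The 16x16 figure implies that the 3x3 convolutions use padding=1, so that only the pooling layers shrink the image; that padding value is an assumption, as it isn’t stated here. Without padding, the same arithmetic gives 14x14 instead.

```python
def conv2d_out_size(size: int, kernel_size: int = 3,
                    padding: int = 1, stride: int = 1) -> int:
    """Output height/width of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel_size) // stride + 1

def max_pool_out_size(size: int, kernel_size: int = 2) -> int:
    """Max pooling with stride equal to its kernel size: same formula, no padding."""
    return conv2d_out_size(size, kernel_size=kernel_size, padding=0, stride=kernel_size)

size = 64  # input images are 64x64
for block in range(2):
    size = conv2d_out_size(size)    # 3x3 conv with padding=1 keeps the size
    size = max_pool_out_size(size)  # 2x2 max pooling halves it
    print(f"after block {block + 1}: {size}x{size}")
# after block 1: 32x32
# after block 2: 16x16
# Average pooling then reduces each of the 64 channels from 16x16 to 1x1.
```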
The final linear layer has an input size of 64 and an output size equal to the number of classes to predict. In this case, that is 4 types of potatoes.
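Putting the pieces together, here is a minimal PyTorch sketch of a model that matches the description in this section. It is a reconstruction for illustration, not the notebook’s exact code: padding=1 and the use of nn.AdaptiveAvgPool2d(1) for the average pooling step are assumptions, and the class name BasicCNN follows the name used later in the story.

```python
import torch
from torch import nn
import torch.nn.functional as F

class BasicCNN(nn.Module):
    """Two conv layers + average pooling + a linear classifier (a sketch)."""

    def __init__(self, num_classes: int) -> None:
        super().__init__()
        # 3 input channels (RGB) -> 32 feature maps; padding=1 keeps the spatial size
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        # 32 input channels to match conv1's output -> 64 feature maps
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        # Average each of the 64 feature maps down to a single value (1x1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # 64 pooled features -> one score (logit) per class
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.relu(F.max_pool2d(self.conv1(x), 2))  # (bs, 32, 32, 32) for 64x64 input
        x = F.relu(F.max_pool2d(self.conv2(x), 2))  # (bs, 64, 16, 16)
        x = self.pool(x)                            # (bs, 64, 1, 1)
        x = torch.flatten(x, 1)                     # (bs, 64)
        return self.fc(x)                           # (bs, num_classes)

model = BasicCNN(num_classes=4)
images = torch.randn(8, 3, 64, 64)  # a fake batch of 8 RGB 64x64 images
logits = model(images)
print(logits.shape)  # torch.Size([8, 4])
```

The forward pass returns raw scores (logits) without a softmax, since nn.CrossEntropyLoss applies log-softmax internally.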
Note: A good way to understand how everything works is to use the Python debugger. You can import pdb and include pdb.set_trace() right in the forward method. Then you can step through the code and check the shapes at each layer, which gives you better intuition and helps you debug problems.
3. Using fastai2 to Make Your Life Easier
It isn’t worth wasting your time coding every step of the deep learning pipeline when there are tools that can make your life easier. That’s why in this story I’ll use the fastai2 library to do most of the work. Nevertheless, I will use the basic CNN model defined in the previous section. Note that fastai2 is built on PyTorch and makes it easy to customize every step, which makes it useful for beginners as well as advanced deep learning practitioners and researchers.
The entire deep learning pipeline in fastai2, using the BasicCNN defined in the previous section, takes just 12 lines of code! You can find the notebook with all the code for this lesson here.
Lines 1-6: The fastai DataBlock is defined. I covered fastai DataBlocks in this story and in this one. The ImageBlock and CategoryBlock indicate that the dataloaders will have inputs of image type and targets of categorical type.
Lines 2 and 3: get_x and get_y are the arguments where you pass the functions that process the inputs and the targets. In this case, I will be reading from a pandas dataframe the columns ‘file’ (with the path to each image file) and ‘id’ (with the type of potato).
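Here is a minimal pandas sketch of the kind of dataframe described above and of what get_x and get_y do with each row. The file paths and folder names are hypothetical placeholders; the sketch assumes that each image sits in a folder named after its class, so the label can be read from the parent folder name. fastai also provides ColReader for reading a column like this, so hand-written functions are not required.

```python
from pathlib import PurePosixPath
import pandas as pd

# Hypothetical image paths: each image sits in a folder named after its class
paths = [
    "Training/Potato Red/0_100.jpg",
    "Training/Potato Sweet/1_100.jpg",
    "Training/Potato White/2_100.jpg",
]

# One row per image: 'file' holds the path, 'id' holds the label (parent folder name)
train_df = pd.DataFrame({
    "file": paths,
    "id": [PurePosixPath(p).parent.name for p in paths],
})

# Conceptually, get_x and get_y receive one row and return the input / the target
def get_x(row: pd.Series) -> str:
    return row["file"]

def get_y(row: pd.Series) -> str:
    return row["id"]

first = train_df.iloc[0]
print(get_x(first), "->", get_y(first))
```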
Line 4: splitter is the argument where you specify how to split the data into training and validation sets. Here I used RandomSplitter, which by default randomly selects 20% of the data to create the validation set.
Line 5: A transformation is added to resize the images to 64x64.
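The idea behind RandomSplitter can be pictured with a plain-Python sketch: shuffle the item indices, then hold out the first 20% for validation. This is a simplified stand-in written for illustration (random_split is a hypothetical name), not fastai’s code; the seed argument makes the split reproducible, and fastai’s RandomSplitter also accepts a seed.

```python
import random
from typing import List, Optional, Tuple

def random_split(n_items: int, valid_pct: float = 0.2,
                 seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Return (train_indices, valid_indices) with valid_pct of the items held out."""
    rng = random.Random(seed)       # a fixed seed makes the split reproducible
    idxs = list(range(n_items))
    rng.shuffle(idxs)               # random order of all item indices
    cut = int(valid_pct * n_items)  # e.g. 20% of 1,000 images -> 200
    return idxs[cut:], idxs[:cut]

train_idx, valid_idx = random_split(1000, seed=42)
print(len(train_idx), len(valid_idx))  # 800 200
```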
Line 6: Normalization and image augmentations are included. Notice that I’m using the default augmentations. One nice thing about fastai is that most of the time you can use the defaults and they work. This is very good for learning, because you don’t need to understand all the details before you start doing interesting work.
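Normalization itself is simple arithmetic: subtract a per-channel mean and divide by a per-channel standard deviation. The sketch below assumes the commonly used ImageNet statistics and pixel values already scaled to [0, 1]. Which statistics the notebook uses isn’t specified here, so treat those numbers as an assumption.

```python
import numpy as np

# Assumed statistics: the widely used ImageNet per-channel mean and std (RGB, pixels in [0, 1])
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def normalize(batch: np.ndarray) -> np.ndarray:
    """Normalize a batch shaped (batch, channels, height, width), channel by channel."""
    # Reshape the stats to (1, 3, 1, 1) so they broadcast over batch, height and width
    mean = MEAN.reshape(1, 3, 1, 1)
    std = STD.reshape(1, 3, 1, 1)
    return (batch - mean) / std

batch = np.full((2, 3, 64, 64), 0.5)  # a fake batch of two mid-grey 64x64 images
out = normalize(batch)
print(out[0, :, 0, 0])  # one normalized value per channel
```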
Line 8: The dataloaders object is created. (train_df is the dataframe with the file and id columns; check the full code here.)
Line 9: An instance of the BasicCNN model is created with 4 classes (notice that dls.c gives the number of classes automatically).
Line 10: The fastai Learner object is defined. This is where you specify the model, loss function, optimizer and validation metrics. The loss function I will use is nn.CrossEntropyLoss, which, as covered in the previous lesson, is the first choice for classification problems with more than 2 categories.
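To see what nn.CrossEntropyLoss computes, here is a NumPy sketch of the same formula: a log-softmax over the raw model outputs, followed by the negative log-probability of the true class, averaged over the batch (PyTorch’s default ‘mean’ reduction). It is meant as an illustration, not a replacement for the PyTorch loss, and the example scores are made up.

```python
import numpy as np

def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-entropy from raw scores, like nn.CrossEntropyLoss with 'mean' reduction."""
    # Numerically stable log-softmax: subtract each row's max before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability of the true class for each sample, averaged over the batch
    picked = log_probs[np.arange(len(targets)), targets]
    return float(-picked.mean())

# 2 images, 4 potato classes: the first prediction is confident and right, the second unsure
logits = np.array([[4.0, 0.5, 0.2, 0.1],
                   [0.3, 0.2, 0.4, 0.1]])
targets = np.array([0, 2])
print(cross_entropy(logits, targets))
```

With all-zero scores the model is maximally unsure and the loss equals log(4) ≈ 1.386 for 4 classes; a confident correct prediction drives it towards 0.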
Line 12: The model is trained for 30 epochs using a one-cycle learning rate schedule (the learning rate quickly increases up to lr_max and then gradually decreases) and a weight decay of 0.01.
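The shape of a one-cycle schedule can be sketched in plain Python. The defaults below (start at lr_max/25, warm up over the first 25% of training, then anneal with a cosine down to lr_max/1e5) reflect fastai’s fit_one_cycle defaults for div, pct_start and div_final as far as is known here, so double-check them against the fastai docs. fastai also schedules momentum in the opposite direction, which this sketch leaves out; one_cycle_lr and the step counts are hypothetical.

```python
import math

def cos_interp(start: float, end: float, pos: float) -> float:
    """Cosine interpolation: returns `start` at pos=0 and `end` at pos=1."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(step: int, total_steps: int, lr_max: float, div: float = 25.0,
                 div_final: float = 1e5, pct_start: float = 0.25) -> float:
    """Learning rate at a given step of a one-cycle schedule (warm-up, then annealing)."""
    pos = step / max(total_steps - 1, 1)  # training progress in [0, 1]
    if pos < pct_start:
        # Phase 1: rise from lr_max / div up to lr_max
        return cos_interp(lr_max / div, lr_max, pos / pct_start)
    # Phase 2: decay from lr_max down to lr_max / div_final
    return cos_interp(lr_max, lr_max / div_final, (pos - pct_start) / (1 - pct_start))

# e.g. 30 epochs x 10 batches per epoch = 300 optimizer steps (illustrative numbers)
lrs = [one_cycle_lr(step, 300, lr_max=1e-2) for step in range(300)]
print(f"{lrs[0]:.1e} -> {max(lrs):.1e} -> {lrs[-1]:.1e}")  # start, peak and final rates
```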
After training for 30 epochs, I got a validation accuracy of 100% with this simple CNN model! This is what the training and validation losses look like as training progresses:
Training and validation loss evolution during training. Image by the author.
And that’s it! If you followed along with the code, you can now distinguish between 4 types of potatoes very accurately. And most importantly, nothing in this example is specific to potatoes! You can apply a similar approach to virtually anything you want to classify!
Homework
I can show you a thousand examples, but you will learn the most if you run one or two experiments yourself! The complete code for this story is available in this notebook.
- As in the previous lesson, try playing with the learning rate, number of epochs, weight decay and size of the model.
- Instead of the BasicCNN model, try using a ResNet34 pretrained on ImageNet (take a look at fastai’s cnn_learner). How do the results compare? You can try larger image sizes and enable the GPU in the Kaggle kernel to make training faster! (Kaggle gives you 30h/week of GPU usage for free.)
- Now train the model using all the fruits and vegetables in the dataset and take a look at the results. The dataset also includes a test set that you can use to further test the trained model!
And as always, if your experiments produce interesting notebooks with nice animations, go ahead and share them on GitHub or Kaggle, or write a Medium story!
Final remarks
This ends the third story in the Learn AI Today series!
Please consider joining my mailing list at this link so that you won’t miss any of my upcoming stories!
I will also be listing the new stories at learn-ai-today.com, the page I created for this learning journey, and at this GitHub repository!
And in case you missed it before, here is the link to the Kaggle notebook with the code for this story!
Feel free to give me some feedback in the comments. What did you find most useful or what could be explained better? Let me know!
You can read more about my Deep Learning journey in the following stories!
Thanks for reading! Have a great day!
Source: https://towardsdatascience.com/learn-ai-today-03-potato-classification-using-convolutional-neural-networks-4481222f2806