Image Segmentation and Object Detection
This article was originally written by Jakub Cieślik and posted on the Neptune blog.
I’ve been working with object detection and image segmentation problems for many years. An important realization I made is that people don’t put the same amount of effort and emphasis on data exploration and results analysis as they would normally in any other non-image machine learning project.
Why is it so?
I believe there are two major reasons for it:
People don't understand object detection and image segmentation models in depth and treat them as black boxes; in that case they don't even know what to look at or what the assumptions are.
It can be quite tedious from a technical point of view as we don’t have good image data exploration tools.
In my opinion image datasets are not really an exception: understanding how to adjust the system to match our data is a critical step to success.
In this article I will share with you how I approach data exploration for image segmentation and object detection problems. Specifically:
- Why you should care about image and object dimensions,
- Why small objects can be problematic for many deep learning architectures,
- Why tackling class imbalances can be quite hard,
- Why a good visualization is worth a thousand metrics,
- The pitfalls of data augmentation.
The need for data exploration for image segmentation and object detection
Data exploration is key to a lot of machine learning processes. That said, when it comes to object detection and image segmentation datasets there is no straightforward way to systematically do data exploration.
There are multiple things that distinguish working with regular image datasets from object and segmentation ones:
- The label is strongly bound to the image. Suddenly you have to be careful about whatever you do to your images, as it can break the image-label mapping.
- Usually many more labels per image.
- Many more hyperparameters to tune (especially if you train on your custom datasets).
This makes evaluation, results exploration and error analysis much harder. You will also find that choosing a single performance measure for your system can be quite tricky — in that case manual exploration might still be a critical step.
Data Quality and Common Problems
The first thing you should do when working on any machine learning problem (image segmentation, object detection included) is assessing quality and understanding your data.
Common data problems when training Object Detection and Image Segmentation models include:
- Image dimensions and aspect ratios (especially dealing with extreme values)
- Labels composition — imbalances, bounding box sizes, aspect ratios (for instance a lot of small objects)
- Data preparation not suitable for your dataset.
- Modelling approach not aligned with the data.
Those will be especially important if you train on custom datasets that are significantly different from typical benchmark datasets such as COCO. In the next chapters, I will show you how to spot the problems I mentioned and how to address them.
General Data Quality
This one is simple and rather obvious; this step would also be the same for all image problems, not just object detection or image segmentation. What we need to do here is:
- get the general feel of a dataset and inspect it visually,
- make sure it's not corrupt and does not contain any obvious artifacts (for instance black-only images),
- make sure that all the files are readable — you don't want to find that out in the middle of your training.
My tip here is to visualize as many pictures as possible. There are multiple ways of doing this. Depending on the size of the datasets some might be more suitable than the others.
- Plot them in a jupyter notebook using matplotlib.
- Use dedicated tooling like google facets to explore image data (https://pair-code.github.io/facets/).
- Use HTML rendering to visualize and explore in a notebook.
I'm a huge fan of the last option; it works great in jupyter notebooks (even for thousands of pictures at the same time!). Try doing that with matplotlib. There is even more: you can install a hover-zoom extension that will allow you to zoom in on individual pictures and inspect them in high resolution.
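If you want to try the HTML-rendering approach, a minimal sketch of the idea (not the exact tooling from the original post; the directory path is just a placeholder) could look like this:

from pathlib import Path
from IPython.display import HTML, display

def show_thumbnails(image_dir, pattern="*.jpg", max_images=1000, width=128):
    # Build one HTML string full of <img> tags and render it in a single notebook cell.
    # Paths must be reachable from the notebook server (relative paths work in Jupyter).
    paths = sorted(Path(image_dir).glob(pattern))[:max_images]
    tags = "".join(
        f'<img src="{p}" style="width:{width}px; margin:2px;" title="{p.name}">'
        for p in paths
    )
    display(HTML(f'<div style="display:flex; flex-wrap:wrap;">{tags}</div>'))

# show_thumbnails("coco_data/images", max_images=2000)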
Image sizes and aspect ratios
In the real world, datasets are unlikely to contain images of the same sizes and aspect ratios. Inspecting basic datasets statistics such as aspect ratios, image widths and heights will help you make important decisions:
- Can you, and should you, do destructive resizing? (Destructive means resizing that changes the aspect ratio.)
- For non-destructive resizing, what should your desired output resolution and amount of padding be?
- Deep learning models might have hyperparameters you have to tune depending on the above (for instance anchor sizes and ratios), or they might even have strong requirements when it comes to minimum input image size.
Good resources about anchors.
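A quick way to get these image-level statistics out of a COCO-style annotation file is sketched below (the file name is just an example; it assumes pycocotools, pandas and matplotlib are installed):

import pandas as pd
from pycocotools.coco import COCO

coco = COCO("coco_data/ground_truth_annotations.json")
imgs = pd.DataFrame(coco.loadImgs(coco.getImgIds()))    # each record has "width" and "height"
imgs["aspect_ratio"] = imgs["width"] / imgs["height"]
print(imgs[["width", "height", "aspect_ratio"]].describe())
imgs["aspect_ratio"].hist(bins=50)                      # spot bimodality and extreme values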
A special case would be if your dataset consists of images that are really big (4K+), which is not that unusual in satellite imagery or some medical modalities. For most cutting edge models in 2020, you will not be able to fit even a single 4K image per (server grade) GPU due to memory constraints. In that case, you need to figure out what realistically will be useful for your DL algorithms.
Two approaches that I saw are:
- Training your model on image patches (randomly selected during training or extracted before training)
- Resizing the entire dataset to avoid doing this every time you load your data.
In general I would expect most datasets to fall into one of 3 categories.
Uniformly distributed, where most of the images have the same dimensions — here the only decision you will have to make is how much to resize (if at all). This will mainly depend on object areas, sizes and aspect ratios.
Slightly bimodal distribution, but most of the images are in the aspect ratio range of (0.7 … 1.5), similar to the COCO dataset. I believe other "natural-looking" datasets would follow a similar distribution — for those types of datasets you should be fine going with a non-destructive resize -> pad approach. Padding will be necessary, but to a degree that is manageable and will not blow up the size of the dataset too much.
Dataset with a lot of extreme values (very wide images mixed with very narrow ones) — this case is much trickier, and there are more advanced techniques to avoid excessive padding. You might consider sampling batches of images based on the aspect ratio. Remember that this can introduce a bias to your sampling process, so make sure it's acceptable or weak enough not to matter.
The mmdetection framework supports this out of the box by implementing a GroupSampler that samples based on aspect ratios.
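A rough sketch of the idea (this is a simplified illustration, not mmdetection's actual GroupSampler): a batch sampler that keeps "wide" and "tall" images in separate batches, so padding within a batch stays small.

import random
from torch.utils.data import Sampler

class AspectRatioBatchSampler(Sampler):
    """Yield batches of dataset indices whose images share an aspect-ratio bucket."""

    def __init__(self, aspect_ratios, batch_size):
        # aspect_ratios: width / height for every image in the dataset, in index order
        self.batch_size = batch_size
        self.buckets = {"tall": [], "wide": []}
        for idx, ar in enumerate(aspect_ratios):
            self.buckets["tall" if ar < 1.0 else "wide"].append(idx)

    def __iter__(self):
        batches = []
        for indices in self.buckets.values():
            random.shuffle(indices)
            batches += [indices[i:i + self.batch_size]
                        for i in range(0, len(indices), self.batch_size)]
        random.shuffle(batches)            # mix wide and tall batches during the epoch
        yield from batches

    def __len__(self):
        return sum(-(-len(v) // self.batch_size) for v in self.buckets.values())

# loader = torch.utils.data.DataLoader(dataset, batch_sampler=AspectRatioBatchSampler(aspect_ratios, 8))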
Fig. 3. Example image (resized and padded) with an extreme aspect ratio from the COCO dataset (source: neptune.ai).
Label (objects) sizes and dimensions
Here we start looking at our targets (labels). Particularly we are interested in knowing how the sizes and aspect ratios are distributed.
Why is this important?
Depending on your modelling approach, most of the frameworks will have design limitations. As I mentioned earlier, those models are designed to perform well on benchmark datasets. If for whatever reason your data is different, training them might be impossible. Let's have a look at a default config for RetinaNet from detectron2:
ANCHOR_GENERATOR:
  SIZES: !!python/object/apply:eval ["[[x, x * 2**(1.0/3), x * 2**(2.0/3)] for x in [32, 64, 128, 256, 512]]"]
What you can see there is that, for different feature maps, the anchors we generate will have a certain size range:
- for instance, if your dataset contains only really big objects — it might be possible to simplify the model a lot,
- on the other side, let's assume you have small images with small objects (for instance 10x10 px); given this config it can happen that you will not be able to train the model (see the quick check below).
The most important things to consider when it comes to box or mask dimensions are:
- Aspect ratios
- Size (area)
The tail of this distribution (fig. 3) is quite long. There will be instances with extreme aspect ratios. Depending on the use case and dataset, it might or might not be fine to ignore them; this should be further inspected.
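These per-annotation statistics are easy to pull out of a COCO-style annotation file; a sketch (the file name is just an example):

import pandas as pd
from pycocotools.coco import COCO

coco = COCO("coco_data/ground_truth_annotations.json")
rows = []
for ann in coco.loadAnns(coco.getAnnIds()):
    x, y, w, h = ann["bbox"]      # COCO boxes are [top-left-x, top-left-y, width, height]
    rows.append({
        "category": coco.loadCats(ann["category_id"])[0]["name"],
        "area": w * h,
        "aspect_ratio": w / h if h > 0 else float("nan"),
    })
boxes = pd.DataFrame(rows)
print(boxes["aspect_ratio"].describe())                          # long tail of extreme ratios
print(boxes.groupby("category")["area"].mean().sort_values())    # mean box area per class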
Fig. 5. Mean area of bounding box per category (source: neptune.ai).
This is especially true for anchor-based models (most object detection / image segmentation models), where there is a step of matching ground truth labels with predefined anchor boxes (a.k.a. prior boxes).
Remember that you control how those prior boxes are generated with hyperparameters like the number of boxes, their aspect ratio, and size. Not surprisingly you need to make sure those settings are aligned with your dataset distributions and expectations.
An important thing to keep in mind is that labels will be transformed together with the image. So if you are making an image smaller during a preprocessing step the absolute size of the ROI’s will also shrink.
If you feel that object size might be an issue in your problem and you don't want to enlarge the images too much (for instance to keep the desired performance or memory footprint), you can try to solve it with a crop -> resize approach. Keep in mind that this can be quite tricky (you need to handle what happens if you cut through a bounding box or segmentation mask).
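One way to handle the "cut through a box" case is Albumentations' min_visibility option, which drops boxes whose visible area after the crop falls below a threshold. A sketch (the crop size, output size and threshold are arbitrary and need tuning for your data):

import albumentations as A

crop_resize = A.Compose(
    [
        A.RandomCrop(height=512, width=512),
        A.Resize(height=500, width=600),
    ],
    bbox_params=A.BboxParams(
        format="coco",             # [x_min, y_min, width, height]
        min_visibility=0.3,        # drop boxes that lost more than 70% of their area
        label_fields=["category_ids"],
    ),
)
# out = crop_resize(image=image, bboxes=bboxes, category_ids=category_ids)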
Big objects, on the other hand, are usually not problematic from a modelling perspective (although you still have to make sure they will be matched with anchors). The problem with them is more indirect: essentially, the more big objects a class has, the more likely it is that it will be underrepresented in the dataset. Most of the time the average area of objects in a given class will be inversely proportional to the (label) count.
Partially labeled data
When creating and labeling an image detection dataset, missing annotations are potentially a huge issue. The worst scenario is when you have false negatives already in your ground truth. So essentially you did not annotate objects even though they are present in the dataset.
In most of the modeling approaches, everything that was not labeled or did not match with an anchor is considered background. This means that it will generate conflicting signals that will hurt the learning process a LOT.
This is also a reason why you can't really mix datasets with non-overlapping classes and train one model (there are some ways to mix datasets though — for instance by soft labeling one dataset with a model trained on another one).
Fig. 7. Shows the problem of mixing datasets — notice for example that on the right image a person is not labeled. One way to solve this problem is to soft label the dataset with a model trained on the other one (source: OMNIA Faster R-CNN: Detection in the wild through dataset merging and soft distillation).
Imbalances
Class imbalances can be a bit of a problem when it comes to object detection. Normally in image classification for example, one can easily oversample or downsample the dataset and control each class contribution to the loss.
Fig. 8. Object counts per class (source: neptune.ai).
You can imagine this is more challenging when you have co-occurring classes in an object detection dataset, since you can't really drop some of the labels (because you would send mixed signals as to what the background is).
In that case you end up having the same problem as shown in the partially labeled data paragraph. Once you start resampling on an image level you have to be aware of the fact that multiple classes will be upsampled at the same time.
Note:
You may want to try other solutions like:
- Adding weights to the loss (making the contributions of some boxes or pixels higher; a small sketch follows this list)
- Preprocessing your data differently: for example you could do some custom cropping that rebalances the dataset on the object level
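As a rough illustration of the first idea, class weights inversely proportional to the per-class counts can be passed to the classification loss (the counts below are made up, and how the loss is wired into a detector depends on the framework you use):

import torch
import torch.nn as nn

class_counts = torch.tensor([12000., 800., 150.])        # hypothetical per-class label counts
class_weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=class_weights)    # rare classes contribute more to the loss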
Understanding augmentation and preprocessing sequences
Preprocessing and data augmentation is an integral part of any computer vision system. If you do it well you can gain a lot but if you screw up it can really cost you.
Data augmentation is by far the most important and widely used regularization technique (in image segmentation / object detection ).
Applying it to object detection and segmentation problems is more challenging than in simple image classification because some transformations (like rotation, or crop) need to be applied not only to the source image but also to the target (masks or bounding boxes). Common transformations that require a target transform include:
- Affine transformations,
- Cropping,
- Distortions,
- Scaling,
- Rotations,
- and many more.
It is crucial to do data exploration on batches of augmented images and targets to avoid costly mistakes (dropping bounding boxes, etc).
Note:
Basic augmentations are a part of deep learning frameworks like PyTorch or Tensorflow but if you need more advanced functionalities you need to use one of the augmentation libraries available in the python ecosystem. My recommendations are:
- Albumentations (I'll use it in this post)
- Imgaug
- Augmentor
The minimal preprocessing setup
Whenever I'm building a new system I want to keep it very basic on the preprocessing and augmentation level to minimize the risk of introducing bugs early on. The basic principles I would recommend you to follow are:
- Disable augmentation
- Avoid destructive resizing
- Always inspect the outputs visually
Let's continue our COCO example. From the previous steps we know that the majority of our images have:
- an aspect ratio (width / height) of about 1.5,
- an average width of about 600 px and an average height of about 500 px (avg_width = 600, avg_height = 500).
Setting the averages as our basic preprocessing resize values seems to be a reasonable thing to do (unless there is a strong requirement on the model side to have bigger pictures). For instance, a resnet50 backbone model has a minimum size requirement of 32×32 (this is related to the number of downsampling layers).
In Albumentations the basic setup implementation will look something like this:
- LongestMaxSize(avg_height) — this will rescale the image based on the longest side, preserving the aspect ratio
- PadIfNeeded(avg_height, avg_width, border_mode=0, value=0) — constant (zero) padding; border_mode=0 is cv2.BORDER_CONSTANT
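Put together, a sketch of the setup (assuming the COCO bbox format and the average dimensions measured above) could look like this:

import albumentations as A

avg_height, avg_width = 500, 600

basic_preprocessing = A.Compose(
    [
        A.LongestMaxSize(max_size=avg_height),                          # non-destructive resize
        A.PadIfNeeded(avg_height, avg_width, border_mode=0, value=0),   # constant zero padding
    ],
    bbox_params=A.BboxParams(format="coco", label_fields=["category_ids"]),
)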
As you can see in figures 10 and 11, the preprocessing results in an image of 500×600 with reasonable 0-padding for both pictures.
When you use padding there are many options for how you can fill the empty space. In the basic setup I suggest that you go with the default constant 0 value.
When you experiment with more advanced methods like reflection padding, always explore your augmentations visually. Remember that you are running the risk of introducing false negatives, especially in object detection problems (reflecting an object without having a label for it).
Fig. 10. Notice how reflection-padding creates false negative errors in our annotations. The cat's reflection (top of the picture) has no label! (source: neptune.ai).
Augmentation — Rotations
Rotations are powerful and useful augmentations, but they should be used with caution. Have a look at fig. 11 below, which was generated using a Rotate(45) -> Resize -> Pad pipeline.
Fig. 11. Rotations can be harmful to your bounding box labels (source: neptune.ai).
The problem is that if we use standard bounding boxes (without an angle parameter), covering a rotated object can be less efficient (the box-area to object-area ratio will increase). This happens during rotation augmentations and it can harm the data. Notice that we have also introduced false positive labels in the top left corner. This is because we crop-rotated the image.
My recommendation is:
- You might want to give up on those if you have a lot of objects with aspect ratios far from one.
- Another thing you can consider is using 90, 180, 270 degree non-cropping rotations for your problem, if they make sense (they will not destroy any bounding boxes; see the sketch below).
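In Albumentations, the non-cropping right-angle rotations are available as RandomRotate90; a minimal sketch:

import albumentations as A

safe_rotations = A.Compose(
    [A.RandomRotate90(p=0.5)],    # rotates by 0/90/180/270 degrees, keeps all pixels and boxes
    bbox_params=A.BboxParams(format="coco", label_fields=["category_ids"]),
)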
Augmentations — Key takeaways
As you see, spatial transforms can be quite tricky and a lot of unexpected things can happen (especially for object detection problems).
So if you decide to use those spatial augmentations make sure to do some data exploration and visually inspect your data.
Note:
Do you really need spatial augmentations? I believe that in many scenarios you will not need them; as usual, keep things simple and gradually add complexity.
From my experience, a good starting point (without spatial transforms) for natural-looking datasets (similar to COCO) is the following pipeline:
import albumentations as A

transforms = [
    A.LongestMaxSize(max_size=500),                                # non-destructive resize
    A.HorizontalFlip(p=0.5),
    A.PadIfNeeded(500, 600, border_mode=0, value=0),               # constant zero padding
    A.JpegCompression(quality_lower=70, quality_upper=100, p=1),   # mild photometric noise
    A.RandomBrightnessContrast(0.3, 0.3),
    A.Cutout(max_h_size=32, max_w_size=32, p=1),                   # occlusion-style regularization
]
Of course things like max_size or cutout sizes are arbitrary and have to be adjusted.
Best Practice:
One thing I did not mention yet that I feel is pretty important: always iterate through the whole dataset (together with your preprocessing and augmentation pipeline) at least once.
%%timeit -n 1 -r 1
for b in data_loader:
    pass
Two lines of code that will save you a lot of time. First of all, you will understand what the overhead of the data loading is and if you see a clear performance bottleneck you might consider fixing it right away. More importantly, you will catch potential issues with:
- corrupted files,
- labels that can't be transformed, etc.
- anything fishy that can interrupt training down the line.
Results understanding
Inspecting model results and performing error analysis can be a tricky process for these types of problems. A single metric rarely tells you the whole story, and even if you do have one, interpreting it can be a relatively hard task.
Let's have a look at the official COCO challenge and how the evaluation process looks there (all the results I will be showing are for a Mask R-CNN model with a resnet50 backbone).
Fig. 13. COCO evaluation output (source: neptune.ai).
It returns the AP and AR for various groups of observations partitioned by IoU (Intersection over Union of predictions and ground truth) and area. So even the official COCO evaluation is not just one metric, and there is a good reason for it.
Lets focus on the IoU=0.50:0.95 notation.
What this means is the following: AP and AR are calculated as the average of precisions and recalls computed for different IoU settings (from 0.5 to 0.95 with a 0.05 step). What we gain here is a more robust evaluation process; in such a case a model will score high if it's pretty good at both localizing and classifying.
Of course, your problem and dataset might be different. Maybe you need an extremely accurate detector; in that case, choosing AP at a single strict IoU threshold might be a good idea.
The downside (of the coco eval tool) is that by default all the values are averaged over all the classes and all images. This might be fine in a competition-like setup where we want to evaluate the models on all the classes, but in real-life situations where you train models on custom datasets (often with fewer classes) you really want to know how your model performs on a per-class basis. Looking at per-class metrics is extremely valuable, as it might give you important insights:
- help you compose a new dataset better
- make better decisions when it comes to data augmentation, data sampling etc.
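If you use pycocotools, per-class AP can be pulled out of the precision array that COCOeval fills during accumulate(); a sketch (the indexing follows pycocotools' [IoU, recall, class, area, max-dets] layout):

import numpy as np

def per_class_ap(coco_eval, coco_gt):
    precision = coco_eval.eval["precision"]        # filled by coco_eval.accumulate()
    results = {}
    for k, cat_id in enumerate(coco_eval.params.catIds):
        name = coco_gt.loadCats(cat_id)[0]["name"]
        p = precision[:, :, k, 0, -1]               # all IoU thresholds, area="all", max 100 dets
        results[name] = float(np.mean(p[p > -1])) if (p > -1).any() else float("nan")
    return results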
Figure 14 gives you a lot of useful information; there are a few things you might consider:
- Add more data to low-performing classes.
- For classes that score well, maybe you can consider downsampling them to speed up the training and maybe help with the performance of other, less frequent classes.
- Spot any obvious correlations, for instance classes with small objects performing poorly.
Visualizing results
Ok, so if looking at single metrics is not enough what should you do?
I would definitely suggest spending some time on manual results exploration, with the combination of hard metrics from the previous analysis — visualizations will help you get the big picture.
Since exploring predictions of image detection and image segmentation models can get quite messy I would suggest you do it step by step. On the gif below I show how this can be done using the coco inspector tool.
On the gif we can see how all the important information is visualized:
- Red masks — predictions
- Orange masks — overlap of predictions and ground truth masks
- Green masks — ground truth
- Dashed bounding boxes — false positives (predictions without a match)
- Orange boxes — true positives
- Green boxes — ground truth
Results understanding — per image scores
By looking at the hard metrics and inspecting images visually we most likely have a pretty good idea of what's going on. But looking at results of random images (or images grouped by class) is likely not an optimal way of doing this. If you want to really dive in and spot the edge cases of your model, I suggest calculating per-image metrics (for instance AP or recall).
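One simple (if slow) way to get per-image AP with pycocotools is to run COCOeval once per image id; a sketch, with the file paths being placeholders:

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("coco_data/ground_truth_annotations.json")
coco_dt = coco_gt.loadRes("coco_data/predictions.json")

per_image_ap = {}
for img_id in coco_gt.getImgIds():
    ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
    ev.params.imgIds = [img_id]
    ev.evaluate()
    ev.accumulate()
    ev.summarize()                               # also prints a summary table per image
    per_image_ap[img_id] = ev.stats[0]           # AP @ IoU=0.50:0.95 for this single image

worst_images = sorted(per_image_ap, key=per_image_ap.get)[:20]   # inspect these first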
Below is an example of an image I found by doing exactly that.
Fig. 16. Image with a very low AP score (source: neptune.ai).
In the example above (Fig. 16) we can see two false positive stop sign predictions — from that we can deduce that our model understands what a stop sign is, but not what other traffic signs are.
Perhaps we can add new classes to our dataset or use our “stop sign detector” to label other traffic signs and then create a new “traffic sign” label to overcome this problem.
Fig. 17. Example of an image with a good score > 0.5 AP (source: neptune.ai).
Sometimes we will also learn that our model is doing better than it would seem from the scores alone. That's also useful information: for instance, in the example above our model detected a keyboard on the laptop, but this is actually not labeled in the original dataset.
COCO format
The way a COCO dataset is organized can be a bit intimidating at first.
It consists of a set of dictionaries mapping from one to another. It’s also intended to be used together with the pycocotools / cocotools library that builds a rather confusing API on top of the dataset metadata file.
Nonetheless, the coco dataset (and the coco format) became a standard way of organizing object detection and image segmentation datasets.
In COCO we follow the xywh convention for bounding box encodings, or as I like to call it, tlwh (top-left-width-height); that way you cannot confuse it with, for instance, cwh (center-point, w, h). Mask labels (segmentations) are run-length encoded (RLE).
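For illustration, a single (made-up) COCO-style annotation entry looks roughly like this, written out as a Python dict:

annotation = {
    "id": 1,
    "image_id": 139,
    "category_id": 18,                       # e.g. "dog" in the COCO category list
    "bbox": [258.0, 41.0, 100.0, 150.0],     # [top-left-x, top-left-y, width, height]
    "area": 15000.0,
    "iscrowd": 0,
    # polygon points here; crowd regions and many prediction files use RLE dicts instead
    "segmentation": [[258.0, 41.0, 358.0, 41.0, 358.0, 191.0, 258.0, 191.0]],
}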
There are still very important advantages of having a widely adopted standard:
- Labeling tools and services export and import COCO-like datasets.
- The evaluation and scoring code (used for the COCO competition) is pretty well optimized and battle tested.
- Multiple open source datasets follow it.
In the previous paragraph, I used the COCO eval functionality, which is another benefit of following the COCO standard. To take advantage of that, you need to format your predictions in the same way your COCO dataset is constructed; then calculating metrics is as simple as calling COCOeval(gt_dataset, pred_dataset).
COCO dataset explorer
In order to streamline the process of data and results exploration (especially for object detection) I wrote a tool that operates on COCO datasets.
Essentially you provide it with the ground truth dataset and the predictions dataset (optionally) and it will do the rest for you:
- Calculate most of the metrics I presented in this post
- Easily visualize the dataset's ground truths and predictions
- Inspect COCO metrics and per-class AP metrics
- Inspect per-image scores
To use COCO dataset explorer tool you need to:
- Clone the project repository:
git clone https://github.com/i008/COCO-dataset-explorer.git
- Download example data I used for the examples or use your own data in the COCO format:
Example COCO format dataset with predictions.
If you downloaded the example data you will need to extract it.
tar -xvf coco_data.tar
You should have the following directory structure:
COCO-dataset-explorer
|coco_data
|  |images
|  |  |000000000139.jpg
|  |  |000000000285.jpg
|  |  |000000000632.jpg
|  |  |...
|  |ground_truth_annotations.json
|  |predictions.json
|coco_explorer.py
|Dockerfile
|environment.yml
|...
- Set up the environment with all the dependencies:
conda env update;
conda activate cocoexplorer
- Run the streamlit app, specifying the ground truth and prediction files in the COCO format and the image directory:
streamlit run coco_explorer.py -- \
--coco_train coco_data/ground_truth_annotations.json \
--coco_predictions coco_data/predictions.json \
--images_path coco_data/images/
Note: You can also run this with docker:
sudo docker run -p 8501:8501 -it -v "$(pwd)"/coco_data:/coco_data i008/coco_explorer \
streamlit run coco_explorer.py -- \
--coco_train /coco_data/ground_truth_annotations.json \
--coco_predictions /coco_data/predictions.json \
--images_path /coco_data/images/
- Explore the dataset in the browser. By default, it will run on http://localhost:8501/
Final words
I hope that with this post I convinced you that data exploration in object detection and image segmentation is as important as in any other branch of machine learning.
I’m confident that the effort we make at this stage of the project pays off in the long run.
The knowledge you gather allows you to make better-informed modeling decisions, avoid multiple training pitfalls, and gives you more confidence in the training process and in the predictions your model produces.
This article was originally written by Jakub Cieślik and posted on the Neptune blog. You can find more in-depth articles for machine learning practitioners there.
Translated from: https://medium.com/neptune-ai/how-to-do-data-exploration-for-image-segmentation-and-object-detection-things-i-had-to-learn-the-148ed34e8895