CVPR 2019: Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression

Abstract
Intersection over Union (IoU) is the most popular evaluation metric used in the object detection benchmarks. However, there is a gap between optimizing the commonly used distance losses for regressing the parameters of a bounding box and maximizing this metric value. The optimal objective for a metric is the metric itself. In the case of axis-aligned 2D bounding boxes, it can be shown that IoU can be directly used as a regression loss. However, IoU has a plateau making it infeasible to optimize in the case of non-overlapping bounding boxes. In this paper, we address the weaknesses of IoU by introducing a generalized version as both a new loss and a new metric. By incorporating this generalized IoU (GIoU) as a loss into state-of-the-art object detection frameworks, we show a consistent improvement in their performance using both the standard, IoU-based, and new, GIoU-based, performance measures on popular object detection benchmarks such as PASCAL VOC and MS COCO.

1. Introduction
Bounding box regression is one of the most fundamental components in many 2D/3D computer vision tasks. Tasks such as object localization, multiple object detection, object tracking and instance level segmentation rely on accurate bounding box regression. The dominant trend for improving performance of applications utilizing deep neural networks is to propose either a better architecture backbone [15, 13] or a better strategy to extract reliable local features [6]. However, one opportunity for improvement that is widely ignored is the replacement of the surrogate regression losses, such as the $\ell_1$- and $\ell_2$-norms, with a metric loss calculated based on Intersection over Union (IoU).

IoU, also known as the Jaccard index, is the most commonly used metric for comparing the similarity between two arbitrary shapes. IoU encodes the shape properties of the objects under comparison, e.g. the widths, heights and locations of two bounding boxes, into the region property and then calculates a normalized measure that focuses on their areas (or volumes). This property makes IoU invariant to the scale of the problem under consideration. Due to this appealing property, all performance measures used to evaluate segmentation [2, 1, 25, 14], object detection [14, 4], and tracking [11, 10] rely on this metric.

However, it can be shown that there is not a strong correlation between minimizing the commonly used losses, e.g. $\ell_n$-norms, defined on the parametric representation of two bounding boxes in 2D/3D and improving their IoU values. For example, consider the simple 2D scenario in Fig. 1 (a), where the predicted bounding box (black rectangle) and the ground truth box (green rectangle) are represented by their top-left and bottom-right corners, i.e. $(x_1, y_1, x_2, y_2)$. For simplicity, let's assume that the distance, e.g. $\ell_2$-norm, between one of the corners of the two boxes is fixed. Therefore any predicted bounding box whose second corner lies on a circle with a fixed radius centered on the second corner of the green rectangle (shown by a gray dashed line circle) will have exactly the same $\ell_2$-norm distance from the ground truth box; however their IoU values can be significantly different (Fig. 1 (a)). The same argument can be extended to any other representation and loss, e.g. Fig. 1 (b). It is intuitive that a good local optimum for these types of objectives may not necessarily be a local optimum for IoU. Moreover, in contrast to IoU, $\ell_n$-norm objectives defined based on the aforementioned parametric representations are not invariant to the scale of the problem. To this end, several pairs of bounding boxes with the same level of overlap, but different scales due to e.g. perspective, will have different objective values. In addition, some representations may suffer from lack of regularization between the different types of parameters used for the representation. For example, in the center and size representation, $(x_c, y_c)$ is defined on the location space while $(w, h)$ belongs to the size space. Complexity increases as more parameters are incorporated, e.g. rotation, or when adding more dimensions to the problem. To alleviate some of the aforementioned problems, state-of-the-art object detectors introduce the concept of an anchor box [22] as a hypothetically good initial guess. They also define a non-linear representation [19, 5] to naively compensate for the scale changes. Even with these handcrafted changes, there is still a gap between optimizing the regression losses and IoU values.
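To make this concrete, the following small numeric sketch (not from the original paper; the boxes are hypothetical) reproduces the Fig. 1 (a) setup: one corner of the prediction coincides with the ground truth, while the other corner is placed on a circle of radius 2 around the corresponding ground-truth corner. The $\ell_2$ corner distance is identical for all three predictions, yet their IoU values differ noticeably.

```python
import math

def iou_xyxy(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

gt = (0.0, 0.0, 4.0, 4.0)            # hypothetical ground-truth box
r = 2.0                              # fixed l2 distance of the free corner
offsets = [(r, 0.0), (r / 2 ** 0.5, r / 2 ** 0.5), (-r / 2 ** 0.5, -r / 2 ** 0.5)]
for dx, dy in offsets:
    pred = (0.0, 0.0, 4.0 + dx, 4.0 + dy)   # first corner shared with the ground truth
    print(f"l2 = {math.hypot(dx, dy):.2f}, IoU = {iou_xyxy(gt, pred):.3f}")
# Identical l2 corner distance (2.00) but IoU ≈ 0.667, 0.546 and 0.418.
```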

In this paper, we explore the calculation of IoU between two axis-aligned rectangles, or generally two axis-aligned n-orthotopes, which has a straightforward analytical solution and, in contrast to the prevailing belief, IoU in this case can be backpropagated [24], i.e. it can be directly used as the objective function to optimize. It is therefore preferable to use IoU as the objective function for 2D object detection tasks. Given the choice between optimizing a metric itself vs. a surrogate loss function, the optimal choice is the metric itself. However, IoU as both a metric and a loss has two major issues: (i) if two objects do not overlap, the IoU value will be zero and will not reflect how far the two shapes are from each other. In this case of non-overlapping objects, if IoU is used as a loss, its gradient will be zero and cannot be optimized; (ii) IoU cannot properly distinguish between different alignments of two objects. More precisely, IoU for two objects overlapping in several different orientations with the same intersection level will be exactly equal (Fig. 2). Therefore, the value of the IoU function does not reflect how overlap between two objects occurs. We will further elaborate on this issue in the paper.

In this paper, we will address these two weaknesses of IoU by extending the concept to non-overlapping cases. We ensure this generalization (a) follows the same definition as IoU, i.e. encoding the shape properties of the compared objects into the region property; (b) maintains the scale invariant property of IoU; and (c) ensures a strong correlation with IoU in the case of overlapping objects. We introduce this generalized version of IoU, named GIoU, as a new metric for comparing any two arbitrary convex shapes. We also provide an analytical solution for calculating GIoU between two axis-aligned rectangles, allowing it to be used as a loss in this case. Incorporating GIoU loss into state-of-the-art object detection algorithms, we consistently improve their performance on popular object detection benchmarks such as PASCAL VOC [4] and MS COCO [14] using both the standard, i.e. IoU-based [4, 14], and the new, GIoU-based, performance measures.
The main contribution of the paper is summarized as follows:
• We introduce this generalized version of IoU, as a new metric for comparing any two arbitrary shapes.
• We provide an analytical solution for using GIoU as a loss between two axis-aligned rectangles or generally n-orthotopes.
• We incorporate GIoU loss into the most popular object detection algorithms such as Faster R-CNN, Mask R-CNN and YOLO v3, and show their performance improvement on standard object detection benchmarks.

2. Related Work
Object detection accuracy measures: Intersection over Union (IoU) is the de facto evaluation metric used in object detection. It is used to determine true positives and false positives in a set of predictions. When using IoU as an evaluation metric, an accuracy threshold must be chosen. For instance in the PASCAL VOC challenge [4], the widely reported detection accuracy measure, i.e. mean Average Precision (mAP), is calculated based on a fixed IoU threshold, i.e. 0.5. However, an arbitrary choice of the IoU threshold does not fully reflect the localization performance of different methods. Any localization accuracy higher than the threshold is treated equally. In order to make this performance measure less sensitive to the choice of IoU threshold, the MS COCO Benchmark challenge [14] averages mAP across multiple IoU thresholds.

Bounding box representations and losses: In 2D object detection, learning bounding box parameters is crucial. Various bounding box representations and losses have been proposed in the literature. Redmon et al. in YOLO v1 [19] propose a direct regression on the bounding box parameters with a small tweak to predict the square root of the bounding box size to remedy scale sensitivity. Girshick et al. [5] in R-CNN parameterize the bounding box representation by predicting location and size offsets from a prior bounding box calculated using a selective search algorithm [23]. To alleviate scale sensitivity of the representation, the bounding box size offsets are defined in log-space. Then, an $\ell_2$-norm objective, also known as MSE loss, is used as the objective to optimize. Later, in Fast R-CNN [7], Girshick proposes the $\ell_1$-smooth loss to make the learning more robust against outliers. Ren et al. [22] propose the use of a set of dense prior bounding boxes, known as anchor boxes, followed by a regression to small variations on bounding box locations and sizes. However, this makes training the bounding box scores more difficult due to significant class imbalance between positive and negative samples. To mitigate this problem, the authors later introduce focal loss [13], which is orthogonal to the main focus of our paper.
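For reference, the offset parameterization referred to here, defined relative to a prior (proposal or anchor) box with center $(x_a, y_a)$ and size $(w_a, h_a)$, is the standard one used in [5, 22]:

$$t_x = \frac{x - x_a}{w_a}, \quad t_y = \frac{y - y_a}{h_a}, \quad t_w = \log\frac{w}{w_a}, \quad t_h = \log\frac{h}{h_a}$$

The size offsets $t_w$ and $t_h$ live in log-space, which is exactly the scale compensation mentioned above.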
Most popular object detectors [20, 21, 3, 12, 13, 16] utilize some combination of the bounding box representations and losses mentioned above. These considerable efforts have yielded significant improvement in object detection. We show there may be some opportunity for further improvement in localization with the use of GIoU, as their bounding box regression losses are not directly representative of the core evaluation metric, i.e. IoU.

Optimizing IoU using an approximate or a surrogate function: In the semantic segmentation task, there have been some efforts to optimize IoU using either an approximate function [18] or a surrogate loss [17]. Similarly, for the object detection task, recent works [8, 24] have attempted to directly or indirectly incorporate IoU to better perform bounding box regression. However, they suffer from either an approximation or a plateau which exist in optimizing IoU in non-overlapping cases. In this paper we address the weakness of IoU by introducing a generalized version of IoU, which is directly incorporated as a loss for the object detection problem.

3. Generalized Intersection over Union
IoU for comparing the similarity between two arbitrary shapes (volumes) $A, B \subseteq S \in \mathbb{R}^n$ is defined as:
$$IoU = \frac{|A \cap B|}{|A \cup B|}$$
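As a quick illustration (not part of the paper), this definition can be evaluated directly for simple polygons; the sketch below uses the third-party `shapely` package and two hypothetical rectangles.

```python
from shapely.geometry import Polygon

def iou(a: Polygon, b: Polygon) -> float:
    """IoU = |A ∩ B| / |A ∪ B| for two arbitrary polygons."""
    union = a.union(b).area
    return a.intersection(b).area / union if union > 0 else 0.0

box_a = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
box_b = Polygon([(1, 1), (3, 1), (3, 3), (1, 3)])
print(iou(box_a, box_b))  # 1 / 7 ≈ 0.143
```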

Two appealing features, which make this similarity measure popular for evaluating many 2D/3D computer vision tasks, are as follows:
• IoU as a distance, e.g. $L_{IoU} = 1 - IoU$, is a metric (by mathematical definition) [9]. It means $L_{IoU}$ fulfills all properties of a metric such as non-negativity, identity of indiscernibles, symmetry and triangle inequality.
• IoU is invariant to the scale of the problem. This means that the similarity between two arbitrary shapes A and B is independent from the scale of their space S (the proof is provided in supp. material).

However, IoU has two weaknesses:
• If |A∩B| = 0, IoU(A,B) = 0. In this case, IoU does not reflect if two shapes are in vicinity of each other or very far from each other.
• IoU value for different alignments of two shapes is identical as long as the volume (area) of their intersection in each case is equal. Therefore, IoU does not reflect how overlap between two objects occurs (Fig. 2).

To address these issues, we propose a general extension to IoU, namely Generalized Intersection over Union (GIoU). For two arbitrary convex shapes (volumes) $A, B \subseteq S \in \mathbb{R}^n$, we first find the smallest convex shape $C \subseteq S \in \mathbb{R}^n$ enclosing both A and B. For comparing two specific types of geometric shapes, C can be from the same type. For example, for two arbitrary ellipsoids, C could be the smallest ellipsoid enclosing them. Then we calculate the ratio between the volume (area) occupied by C excluding A and B, and the total volume (area) occupied by C. This represents a normalized measure that focuses on the empty volume (area) between A and B. Finally, GIoU is attained by subtracting this ratio from the IoU value. The calculation of GIoU is summarized in Alg. 1.
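Written out, the construction above amounts to:

$$GIoU = IoU - \frac{|C \setminus (A \cup B)|}{|C|} = \frac{|A \cap B|}{|A \cup B|} - \frac{|C \setminus (A \cup B)|}{|C|}$$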

GIoU as a new metric has the following properties:
1. Similar to IoU, GIoU as a distance, e.g. $L_{GIoU} = 1 - GIoU$, holds all the properties of a metric such as non-negativity, identity of indiscernibles, symmetry and triangle inequality.
2. Similar to IoU, GIoU is invariant to the scale of the problem.
3. GIoU is always a lower bound for IoU, i.e. $\forall A, B \subseteq S,\ GIoU(A, B) \leq IoU(A, B)$, and this lower bound becomes tighter when A and B have a stronger shape similarity and proximity, i.e. $\lim_{A \to B} GIoU(A, B) = IoU(A, B)$.
4. $\forall A, B \subseteq S,\ 0 \leq IoU(A, B) \leq 1$, but GIoU has a symmetric range, i.e. $\forall A, B \subseteq S,\ -1 \leq GIoU(A, B) \leq 1$.
   I) Similar to IoU, the value 1 occurs only when the two objects coincide perfectly.
   II) The GIoU value converges to -1 when the ratio between the area occupied by the union $|A \cup B|$ and the area of the smallest enclosing convex shape $|C|$ tends to zero.
5. In contrast to IoU, which only focuses on the overlapping region, GIoU also attends to the non-overlapping area. When A and B are not well aligned with respect to each other, the empty space between them inside the enclosing shape C increases (Fig. 2). Therefore, the value of GIoU better reflects how the overlap between two objects occurs.

The reason we care about the last property is that a metric that reflects changes in orientation between two shapes allows differentiation between results that would otherwise be identical.
In summary, this generalization keeps the major properties of IoU while rectifying its weaknesses. Therefore, GIoU can be a proper substitute for IoU in all performance measures used in 2D/3D computer vision tasks. In this paper, we only focus on 2D object detection, where we can easily derive an analytical solution for GIoU to apply it as both metric and loss. The extension to non-axis-aligned 3D cases is left as future work.

3.1 GIoU as a Loss for Bounding Box Regression

So far, we introduced GIoU as a metric for any two arbitrary shapes. However, as is the case with IoU, there is no analytical solution for calculating the intersection between two arbitrary shapes and/or for finding the smallest enclosing convex object for them.
Fortunately, for the 2D object detection task, where the task is to compare two axis-aligned bounding boxes, we can show that GIoU has a straightforward solution. In this case, the intersection and the smallest enclosing objects both have rectangular shapes. It can be shown that the coordinates of their vertices are simply the coordinates of one of the two bounding boxes being compared, which can be attained by comparing each vertex's coordinates using min and max functions. To check if two bounding boxes overlap, a condition must also be checked. Therefore, we have an exact solution to calculate both IoU and GIoU.
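A minimal sketch of this exact computation (our rendering of the procedure in Alg. 2, not the authors' released code, assuming well-formed boxes in $(x_1, y_1, x_2, y_2)$ form with $x_2 \geq x_1$ and $y_2 \geq y_1$):

```python
def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two axis-aligned boxes a, b = (x1, y1, x2, y2)."""
    # Intersection: max of the top-left, min of the bottom-right coordinates.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero when the boxes do not overlap

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest enclosing box C: min of the top-left, max of the bottom-right coordinates.
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)

    return iou, iou - (area_c - union) / area_c

# Overlapping boxes: GIoU is a slightly lower value than IoU (property 3).
print(iou_and_giou((0, 0, 4, 4), (1, 1, 5, 5)))    # ≈ (0.391, 0.311)
# Disjoint boxes: IoU is 0 regardless of distance, while GIoU keeps decreasing towards -1.
print(iou_and_giou((0, 0, 1, 1), (2, 2, 3, 3)))    # ≈ (0.0, -0.778)
print(iou_and_giou((0, 0, 1, 1), (9, 9, 10, 10)))  # (0.0, -0.98)
```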

Since back-propagating min, max and piece-wise linear functions, e.g. ReLU, is feasible, it can be shown that every component in Alg. 2 has a well-behaved derivative. Therefore, IoU or GIoU can be directly used as a loss, i.e. $L_{IoU}$ or $L_{GIoU}$, for optimizing deep neural network based object detectors. In this case, we are directly optimizing a metric as loss, which is an optimal choice for the metric. However, in all non-overlapping cases, IoU has zero gradient, which affects both training quality and convergence rate. GIoU, in contrast, has a gradient in all possible cases, including non-overlapping situations. In addition, using property 3, we show that GIoU has a strong correlation with IoU, especially in high IoU values. We also demonstrate this correlation qualitatively in Fig. 3 by taking over 10K random samples from the parameters of two 2D rectangles. In Fig. 3, we also observe that in the case of low overlap, e.g. IoU ≤ 0.2 and GIoU ≤ 0.2, GIoU has the opportunity to change more dramatically compared to IoU. To this end, GIoU can potentially have a steeper gradient in any possible state in these cases compared to IoU. Therefore, optimizing GIoU as loss, $L_{GIoU}$ can be a better choice compared to $L_{IoU}$, no matter which IoU-based performance measure is ultimately used. Our experimental results verify this claim.
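As a sketch of how $L_{GIoU} = 1 - GIoU$ can be dropped into a detector's training loop (a PyTorch illustration of the idea, not the authors' released code; the tensor layout and small epsilon are assumptions), the same min/max/clamp operations keep the loss differentiable even when the boxes do not overlap:

```python
import torch

def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L_GIoU = 1 - GIoU for batches of boxes in (x1, y1, x2, y2) format, shape (N, 4)."""
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-7)

    # Smallest enclosing box C for each pair.
    cx1 = torch.min(pred[:, 0], target[:, 0])
    cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2])
    cy2 = torch.max(pred[:, 3], target[:, 3])
    area_c = ((cx2 - cx1) * (cy2 - cy1)).clamp(min=1e-7)

    giou = iou - (area_c - union) / area_c
    return (1.0 - giou).mean()

# A non-overlapping prediction still yields a non-zero gradient.
pred = torch.tensor([[2.0, 2.0, 3.0, 3.0]], requires_grad=True)
target = torch.tensor([[0.0, 0.0, 1.0, 1.0]])
loss = giou_loss(pred, target)
loss.backward()
print(loss.item(), pred.grad)
```

Because the enclosing-box term still depends on the predicted coordinates, `pred.grad` is non-zero here, whereas an IoU-only loss would return a zero gradient for this disjoint pair.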

Loss Stability: We also investigate if there exist any extreme cases which make the loss unstable / undefined given any value for the predicted outputs.

4. Experimental Results
We apply the new bounding box regression loss $L_{GIoU}$ to the most popular 2D object detectors. To this end, we replace their default regression losses with $L_{GIoU}$, i.e. we replace $\ell_1$-smooth in Faster R-CNN and Mask R-CNN [22, 6] and MSE in YOLO v3 [21]. We also compare each baseline loss against $L_{IoU}$.

Dataset. We train all detection baselines and report all the results on two standard object detection benchmarks, i.e. the PASCAL VOC [4] and the Microsoft Common Objects in Context (MS COCO) [14] challenges. The details of their training protocol and their evaluation are provided in their own sections.
PASCAL VOC 2007: The Pascal Visual Object Classes (VOC) [4] benchmark is one of the most widely used datasets for classification, object detection and semantic segmentation. It consists of 9963 images with a 50/50 split for training and test, where objects from 20 pre-defined categories have been annotated with bounding boxes.
MS COCO: Another popular benchmark for image captioning, recognition, detection and segmentation is the more recent Microsoft Common Objects in Context (MS COCO) [14]. The COCO dataset consists of over 200,000 images across train, validation and test sets with over 500,000 annotated object instances from 80 categories.

Evaluation protocol. In this paper, we adopt the same performance measure as the MS COCO 2018 Challenge [14] to report all our results. This includes the calculation of mean Average Precision (mAP) over different class labels for a specific value of IoU threshold in order to determine true positives and false positives. The main performance measure used in this benchmark is shown by AP, which is mAP averaged across different values of IoU thresholds, i.e. IoU = {.5, .55, ··· , .95}. Additionally, we modify this evaluation script to use GIoU instead of IoU as the metric to decide about true positives and false positives. Therefore, we report another value for AP by averaging mAP across different values of GIoU thresholds, GIoU = {.5, .55, ··· , .95}. We also report the mAP value for IoU and GIoU thresholds equal to 0.75, shown as AP75 in the tables.
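A simplified sketch of how the reported AP aggregates the per-threshold results (not the official COCO evaluation code; `map_at` stands in for a hypothetical per-threshold mAP routine):

```python
import numpy as np

THRESHOLDS = np.linspace(0.50, 0.95, 10)  # .5, .55, ..., .95

def average_precision(map_at, metric="IoU"):
    """AP = mean of mAP over the ten thresholds; a detection counts as a true
    positive at threshold t when metric(prediction, ground truth) >= t."""
    return float(np.mean([map_at(t, metric) for t in THRESHOLDS]))

# The paper reports this aggregate twice (metric = "IoU" and metric = "GIoU"),
# plus the single-threshold value AP75 = map_at(0.75, metric).
```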

All detection baselines have also been evaluated using the test set of the MS COCO 2018 dataset, where the annotations are not accessible for the evaluation. Therefore in this case, we are only able to report results using the standard performance measure, i.e. IoU.

4.1 YOLO v3
Training protocol. We used the original Darknet implementation of YOLO v3 released by the authors. For baseline results (training using MSE loss), we used DarkNet-608 as the backbone network architecture in all experiments and followed exactly their training protocol using the reported default parameters and the number of iterations on each benchmark. To train YOLO v3 using IoU and GIoU losses, we simply replace the bounding box regression MSE loss with the $L_{IoU}$ and $L_{GIoU}$ losses explained in Alg. 2. Considering the additional MSE loss on classification, and since we replace an unbounded distance loss such as the MSE distance with a bounded distance, e.g. $L_{IoU}$ or $L_{GIoU}$, we need to regularize the new bounding box regression against the classification loss. However, we performed a very minimal effort to regularize these new regression losses against the MSE classification loss.

PASCAL VOC 2007. Following the original code's training protocol, we trained the network using each loss on both the training and validation set of the dataset up to 50K iterations. Their performance using the best network model for each loss has been evaluated using the PASCAL VOC 2007 test set and the results are reported in Tab. 1. Considering both the standard IoU-based and the new GIoU-based performance measures, the results in Tab. 1 show that training YOLO v3 using $L_{GIoU}$ as regression loss can considerably improve its performance compared to its own regression loss (MSE). Moreover, incorporating $L_{IoU}$ as regression loss can slightly improve the performance of YOLO v3 on this benchmark. However, the improvement is inferior compared to the case where it is trained by $L_{GIoU}$.
MS COCO. Following the original code's training protocol, we trained YOLO v3 using each loss on both the training set and 88% of the validation set of MS COCO 2014 up to 502k iterations. Then we evaluated the results using the remaining 12% of the validation set and reported the results in Tab. 2. We also compared them on the MS COCO 2018 Challenge by submitting the results to the COCO server. All results using the IoU-based performance measure are reported in Tab. 3. Similar to the PASCAL VOC experiment, the results show consistent improvement in performance for YOLO v3 when it is trained using $L_{GIoU}$ as regression loss. We have also investigated how each component, i.e. bounding box regression and classification losses, contributes to the final AP performance measure. We believe the localization accuracy for YOLO v3 significantly improves when the $L_{GIoU}$ loss is used (Fig. 4 (a)). However, with the current naive tuning of regularization parameters, balancing bounding box loss vs. classification loss, the classification scores may not be optimal compared to the baseline (Fig. 4 (b)). Since the AP-based performance measure is considerably affected by small classification errors, we believe the results can be further improved with a better search for regularization parameters.

4.2 Faster R-CNN and Mask R-CNN
Training protocol. We used the latest PyTorch implementations of Faster R-CNN [22] and Mask R-CNN [6], released by Facebook research. This code is analogous to the original Caffe2 implementation. For baseline results (trained using $\ell_1$-smooth), we used ResNet-50 as the backbone network architecture for both Faster R-CNN and Mask R-CNN in all experiments and followed their training protocol using the reported default parameters and the number of iterations on each benchmark. To train Faster R-CNN and Mask R-CNN using IoU and GIoU losses, we replaced their $\ell_1$-smooth loss for bounding box regression with the $L_{IoU}$ and $L_{GIoU}$ losses explained in Alg. 2. Similar to the YOLO v3 experiment, we undertook minimal effort to regularize the new regression loss against the other losses such as classification and segmentation losses. We simply multiplied the $L_{IoU}$ and $L_{GIoU}$ losses by a factor of 10 for all experiments.
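Concretely, this balancing amounts to a fixed scalar on the box term; a minimal sketch (our own paraphrase of that setup, with hypothetical loss tensors for the other heads) looks like:

```python
BOX_LOSS_WEIGHT = 10.0  # fixed factor used for L_IoU / L_GIoU in these experiments

def total_loss(loss_box, loss_cls, loss_rpn, loss_mask=None):
    """Combine the (bounded) box regression loss with the unchanged remaining losses."""
    loss = BOX_LOSS_WEIGHT * loss_box + loss_cls + loss_rpn
    if loss_mask is not None:  # Mask R-CNN only
        loss = loss + loss_mask
    return loss
```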

PASCAL VOC 2007. Since there is no instance mask annotation available in this dataset, we did not evaluate Mask R-CNN on this dataset. Therefore, we only trained Faster R-CNN using the aforementioned bounding box regression losses on the training set of the dataset for 20k iterations. Then, we searched for the best-performing model on the validation set over different parameters such as the number of training iterations and the bounding box regression loss regularizer. The final results on the test set of the dataset are reported in Tab. 4.
According to both the standard IoU-based and the new GIoU-based performance measure, the results in Tab. 4 show that training Faster R-CNN using $L_{GIoU}$ as the bounding box regression loss can consistently improve its performance compared to its own regression loss ($\ell_1$-smooth). Moreover, incorporating $L_{IoU}$ as the regression loss can slightly improve the performance of Faster R-CNN on this benchmark. The improvement is inferior compared to the case where it is trained using $L_{GIoU}$, see Fig. 5, where we visualize different values of mAP against different values of IoU thresholds, i.e. .5 ≤ IoU ≤ .95.

5. Conclusion
In this paper, we introduced a generalization to IoU as a new metric, namely GIoU, for comparing any two arbitrary convex shapes. We showed that this new metric has all of the appealing properties which IoU has while addressing its weaknesses. Therefore it can be a good alternative in all performance measures in 2D/3D vision tasks relying on the IoU metric.
We also provided an analytical solution for calculating GIoU between two axis-aligned rectangles. We showed that the derivative of GIoU as a distance can be computed and it can be used as a bounding box regression loss. By incorporating it into state-of-the-art object detection algorithms, we consistently improved their performance on popular object detection benchmarks such as PASCAL VOC and MS COCO using both the commonly used performance measures and also our new accuracy measure, i.e. GIoU-based average precision. Since the optimal loss for a metric is the metric itself, our GIoU loss can be used as the optimal bounding box regression loss in all applications which require 2D bounding box regression.
In the future, we plan to investigate the feasibility of deriving an analytic solution for GIoU in the case of two rotating rectangular cuboids. This extension and incorporating it as a loss could have great potential to improve the performance of 3D object detection frameworks.
