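IoU (Intersection over Union) is the most commonly used metric in object detection. In anchor-based methods it serves two purposes: it decides which samples are positive and which are negative, and it measures how close a predicted box is to the ground-truth box.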
import numpy as np

def Iou(box1, box2, wh=False):
    # Boxes are (xmin, ymin, xmax, ymax), or (xc, yc, w, h) when wh=True
    if not wh:
        xmin1, ymin1, xmax1, ymax1 = box1
        xmin2, ymin2, xmax2, ymax2 = box2
    else:
        # Convert center/size to corners (no int() truncation, so float boxes keep their precision)
        xmin1, ymin1 = box1[0]-box1[2]/2.0, box1[1]-box1[3]/2.0
        xmax1, ymax1 = box1[0]+box1[2]/2.0, box1[1]+box1[3]/2.0
        xmin2, ymin2 = box2[0]-box2[2]/2.0, box2[1]-box2[3]/2.0
        xmax2, ymax2 = box2[0]+box2[2]/2.0, box2[1]+box2[3]/2.0
    # Top-left and bottom-right corners of the intersection rectangle
    xx1 = np.max([xmin1, xmin2])
    yy1 = np.max([ymin1, ymin2])
    xx2 = np.min([xmax1, xmax2])
    yy2 = np.min([ymax1, ymax2])
    # Areas of the two boxes
    area1 = (xmax1-xmin1) * (ymax1-ymin1)
    area2 = (xmax2-xmin2) * (ymax2-ymin2)
    inter_area = (np.max([0, xx2-xx1])) * (np.max([0, yy2-yy1]))  # intersection area
    iou = inter_area / (area1+area2-inter_area+1e-6)  # intersection over union
    return iou
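To make the role of IoU in positive/negative sample assignment more concrete, here is a minimal, self-contained sketch. The helper names `iou_xyxy` and `label_anchors` and the thresholds 0.5 and 0.4 are illustrative assumptions, not values taken from any particular detector.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt, pos_thr=0.5, neg_thr=0.4):
    """Label each anchor 1 (positive), 0 (negative) or -1 (ignored).

    pos_thr and neg_thr are hypothetical thresholds chosen for illustration.
    """
    labels = []
    for anchor in anchors:
        overlap = iou_xyxy(anchor, gt)
        if overlap >= pos_thr:
            labels.append(1)
        elif overlap < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)
    return labels


gt_box = (0, 0, 10, 10)
anchors = [(0, 0, 10, 9), (0, 0, 10, 4.5), (20, 20, 30, 30)]
print(label_anchors(anchors, gt_box))  # prints [1, -1, 0]
```

The three anchors have IoU 0.9, 0.45 and 0 with the ground truth, so they fall into the positive, ignored and negative buckets respectively.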
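Paper: https://arxiv.org/abs/1902.09630
Code: https://github.com/generalized-iou/g-darknet
GIoU was proposed in a CVPR 2019 paper. Because IoU is a ratio, it is insensitive to the scale of the target object. In detection, however, optimizing the usual BBox regression losses (MSE loss, smooth-L1 loss, and so on) is not equivalent to optimizing IoU, and Ln norms are also sensitive to object scale. IoU itself, in turn, gives nothing to optimize when the boxes do not overlap.
The paper proposes using IoU directly as the regression loss, generalized so that non-overlapping boxes can still be optimized.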
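As the figure shows, there are two sets of examples, (a) and (b), in which bounding boxes are represented by (a) two corners (x1, y1, x2, y2) and (b) center and size (xc, yc, w, h). Across all three cases in each set, (a) the 2-norm distance ||.||2 and (b) the 1-norm distance ||.||1 between the two rectangles are exactly the same, yet their IoU and GIoU values differ widely.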
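The point of the figure can be reproduced numerically. The sketch below is an illustration with made-up boxes (not the boxes from the paper's figure): three predictions lie at the same 2-norm distance from the ground truth in corner coordinates, yet their IoU and GIoU differ. The helper names are assumptions.

```python
import math


def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Area of the smallest box enclosing both inputs
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    iou = inter / union
    return iou, iou - (c_area - union) / c_area


def l2_distance(a, b):
    """2-norm distance between two boxes viewed as 4-vectors of corner coordinates."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


gt = (0, 0, 4, 4)
preds = [(1, 1, 5, 5), (0, 0, 6, 4), (-1, -1, 5, 5)]
for p in preds:
    iou, giou = iou_and_giou(p, gt)
    print(p, "L2 = %.1f" % l2_distance(p, gt), "IoU = %.4f" % iou, "GIoU = %.4f" % giou)
```

All three predictions have an L2 distance of 2 from the ground truth, so an Ln-norm loss cannot tell them apart, while IoU and GIoU rank them differently.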
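$$GIoU = IoU - \frac{|A_c - U|}{|A_c|}$$
Range: -1 <= GIoU <= 1
The formula means: first compute A_c, the area of the smallest enclosing region (informally, the smallest box that contains both the predicted box and the ground-truth box), where U is the area of the union of the two boxes. Then compute IoU, then the fraction of the enclosing region that belongs to neither box, and finally subtract that fraction from IoU to obtain GIoU.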
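def Giou(rec1, rec2):
    # Each rectangle is (left, right, top, bottom), with top > bottom
    x1, x2, y1, y2 = rec1
    x3, x4, y3, y4 = rec2
    # Iou expects (xmin, ymin, xmax, ymax), so reorder the coordinates first
    iou = Iou((x1, y2, x2, y1), (x3, y4, x4, y3))
    # Area of the smallest enclosing box
    area_C = (max(x1,x2,x3,x4)-min(x1,x2,x3,x4))*(max(y1,y2,y3,y4)-min(y1,y2,y3,y4))
    area_1 = (x2-x1)*(y1-y2)
    area_2 = (x4-x3)*(y3-y4)
    sum_area = area_1 + area_2
    w1 = x2 - x1  # width of the first rectangle
    w2 = x4 - x3  # width of the second rectangle
    h1 = y1 - y2
    h2 = y3 - y4
    # Clamp at 0 so that disjoint boxes do not yield a spurious positive area
    W = max(0, min(x1,x2,x3,x4)+w1+w2-max(x1,x2,x3,x4))  # width of the overlap
    H = max(0, min(y1,y2,y3,y4)+h1+h2-max(y1,y2,y3,y4))  # height of the overlap
    Area = W*H  # intersection area
    add_area = sum_area - Area  # union area of the two rectangles
    end_area = (area_C - add_area)/area_C  # fraction of the enclosing box covered by neither rectangle
    giou = iou - end_area
    return giou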
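The stated range of GIoU can be checked with a small self-contained sketch. It uses the (xmin, ymin, xmax, ymax) convention rather than the (left, right, top, bottom) order of `Giou`, and `giou_xyxy` is a hypothetical helper name used only for this illustration.

```python
def giou_xyxy(a, b):
    """GIoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Area of the smallest box enclosing both inputs
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (c_area - union) / c_area


gt = (0, 0, 1, 1)
same = giou_xyxy(gt, gt)                      # identical boxes
near = giou_xyxy((2, 0, 3, 1), gt)            # disjoint, one unit apart
far = giou_xyxy((100, 100, 101, 101), gt)     # disjoint, far away
print(same, near, far)
```

Identical boxes give exactly 1, and disjoint boxes give negative values that approach -1 as the boxes move apart. Unlike IoU, which is 0 for both disjoint cases, GIoU still distinguishes the near box from the far one.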
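Paper: https://arxiv.org/pdf/1911.08287.pdf
DIoU fits the box-regression mechanism better than GIoU. It takes the distance between the target and the anchor, the overlap and the scale all into account, which makes box regression more stable and avoids problems such as divergence during training that can occur with IoU and GIoU.
Based on the problems of IoU and GIoU, the authors pose two questions:
- Is it feasible to directly minimize the normalized distance between the anchor box and the target box in order to converge faster?
- How can regression be made more accurate and faster when the box overlaps with, or even contains, the target box?
$$L_{DIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2}$$
Here $b$ and $b^{gt}$ are the center points of the predicted box and the ground-truth box, $\rho$ is the Euclidean distance between the two centers, and $c$ is the diagonal length of the smallest enclosing region that contains both boxes.
Although DIoU speeds up convergence by directly minimizing the distance between the centers of the predicted and ground-truth boxes, it ignores another important factor in bounding-box regression: the aspect ratio. In the figure, the three red boxes have the same area but different aspect ratios, and their centers coincide with the center of the green box. DIoU is identical in all three cases, so it cannot distinguish them.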
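The limitation described above is easy to verify numerically. The following sketch uses made-up boxes (a 10x10 green box and three red boxes of area 16 centered on it); `diou_xyxy` is a hypothetical helper written for this illustration.

```python
def diou_xyxy(a, b):
    """DIoU (IoU minus normalized center distance) for (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Squared distance between the two box centers
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 + \
           ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal of the smallest enclosing box
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    return inter / union - rho2 / (cw ** 2 + ch ** 2)


green = (0, 0, 10, 10)
reds = [(3, 3, 7, 7), (4, 1, 6, 9), (1, 4, 9, 6)]  # 4x4, 2x8 and 8x2, all centered at (5, 5)
values = [diou_xyxy(r, green) for r in reds]
print(values)
```

Each red box has IoU 16/100 with the green box and zero center distance, so all three DIoU values equal 0.16 even though the shapes are very different.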
import torch

def Diou(bboxes1, bboxes2):
    # bboxes1, bboxes2: tensors of shape (N, 4) and (M, 4).
    # The arithmetic below is element-wise, so N and M must be equal or broadcastable.
    rows = bboxes1.shape[0]
    cols = bboxes2.shape[0]
    dious = torch.zeros((rows, cols))
    if rows * cols == 0:  # one of the inputs is empty
        return dious
    exchange = False
    if bboxes1.shape[0] > bboxes2.shape[0]:
        bboxes1, bboxes2 = bboxes2, bboxes1
        dious = torch.zeros((cols, rows))
        exchange = True
    # xmin, ymin, xmax, ymax -> [:, 0], [:, 1], [:, 2], [:, 3]
    w1 = bboxes1[:, 2] - bboxes1[:, 0]
    h1 = bboxes1[:, 3] - bboxes1[:, 1]
    w2 = bboxes2[:, 2] - bboxes2[:, 0]
    h2 = bboxes2[:, 3] - bboxes2[:, 1]
    area1 = w1 * h1
    area2 = w2 * h2
    # Box centers
    center_x1 = (bboxes1[:, 2] + bboxes1[:, 0]) / 2
    center_y1 = (bboxes1[:, 3] + bboxes1[:, 1]) / 2
    center_x2 = (bboxes2[:, 2] + bboxes2[:, 0]) / 2
    center_y2 = (bboxes2[:, 3] + bboxes2[:, 1]) / 2
    # Corners of the intersection and of the smallest enclosing box
    inter_max_xy = torch.min(bboxes1[:, 2:], bboxes2[:, 2:])
    inter_min_xy = torch.max(bboxes1[:, :2], bboxes2[:, :2])
    out_max_xy = torch.max(bboxes1[:, 2:], bboxes2[:, 2:])
    out_min_xy = torch.min(bboxes1[:, :2], bboxes2[:, :2])
    inter = torch.clamp((inter_max_xy - inter_min_xy), min=0)
    inter_area = inter[:, 0] * inter[:, 1]
    # Squared center distance and squared diagonal of the enclosing box
    inter_diag = (center_x2 - center_x1)**2 + (center_y2 - center_y1)**2
    outer = torch.clamp((out_max_xy - out_min_xy), min=0)
    outer_diag = (outer[:, 0] ** 2) + (outer[:, 1] ** 2)
    union = area1 + area2 - inter_area
    dious = inter_area / union - (inter_diag) / outer_diag
    dious = torch.clamp(dious, min=-1.0, max=1.0)
    if exchange:
        dious = dious.T
    return dious
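CIoU was proposed in the same paper as DIoU. Of the three key factors of bbox regression, the aspect ratio was still missing from the computation, so the paper extends DIoU to CIoU. Its penalty term is:
$$R_{CIoU} = \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$
where $\alpha = \frac{v}{(1 - IoU) + v}$ is a trade-off weight and $v$ measures the consistency of the aspect ratios, defined as:
$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$
The complete CIoU loss is defined as:
$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$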
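As a rough illustration of how the aspect-ratio term changes the result, the sketch below evaluates the CIoU value (IoU minus the penalty term) for a 10x10 reference box and three centered boxes of equal area. The boxes and the helper name `ciou_xyxy` are assumptions made for this example.

```python
import math


def ciou_xyxy(pred, gt):
    """CIoU value (IoU minus the CIoU penalty) for (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    iou = inter / (w * h + w_gt * h_gt - inter)
    # Normalized squared center distance (the DIoU term)
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 + \
           ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency v and its trade-off weight alpha
    v = (4 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return iou - (rho2 / c2 + alpha * v)


gt = (0, 0, 10, 10)
square, tall, wide = (3, 3, 7, 7), (4, 1, 6, 9), (1, 4, 9, 6)
for box in (square, tall, wide):
    print(box, "CIoU = %.4f" % ciou_xyxy(box, gt))
```

The square box shares the aspect ratio of the reference box, so v is 0 and its CIoU equals its IoU of 0.16. The tall and wide boxes receive an additional penalty, which is what DIoU alone cannot provide.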
import math
import torch

def bbox_overlaps_ciou(bboxes1, bboxes2):
    # bboxes1, bboxes2: tensors of shape (N, 4) and (M, 4) in (xmin, ymin, xmax, ymax) format.
    # The arithmetic below is element-wise, so N and M must be equal or broadcastable.
    rows = bboxes1.shape[0]
    cols = bboxes2.shape[0]
    cious = torch.zeros((rows, cols))
    if rows * cols == 0:
        return cious
    exchange = False
    if bboxes1.shape[0] > bboxes2.shape[0]:
        bboxes1, bboxes2 = bboxes2, bboxes1
        cious = torch.zeros((cols, rows))
        exchange = True
    w1 = bboxes1[:, 2] - bboxes1[:, 0]
    h1 = bboxes1[:, 3] - bboxes1[:, 1]
    w2 = bboxes2[:, 2] - bboxes2[:, 0]
    h2 = bboxes2[:, 3] - bboxes2[:, 1]
    area1 = w1 * h1
    area2 = w2 * h2
    center_x1 = (bboxes1[:, 2] + bboxes1[:, 0]) / 2
    center_y1 = (bboxes1[:, 3] + bboxes1[:, 1]) / 2
    center_x2 = (bboxes2[:, 2] + bboxes2[:, 0]) / 2
    center_y2 = (bboxes2[:, 3] + bboxes2[:, 1]) / 2
    inter_max_xy = torch.min(bboxes1[:, 2:], bboxes2[:, 2:])
    inter_min_xy = torch.max(bboxes1[:, :2], bboxes2[:, :2])
    out_max_xy = torch.max(bboxes1[:, 2:], bboxes2[:, 2:])
    out_min_xy = torch.min(bboxes1[:, :2], bboxes2[:, :2])
    inter = torch.clamp((inter_max_xy - inter_min_xy), min=0)
    inter_area = inter[:, 0] * inter[:, 1]
    inter_diag = (center_x2 - center_x1)**2 + (center_y2 - center_y1)**2
    outer = torch.clamp((out_max_xy - out_min_xy), min=0)
    outer_diag = (outer[:, 0] ** 2) + (outer[:, 1] ** 2)
    union = area1 + area2 - inter_area
    u = (inter_diag) / outer_diag  # normalized squared center distance
    iou = inter_area / union
    # Aspect-ratio consistency term v
    v = (4 / (math.pi ** 2)) * torch.pow((torch.atan(w2 / h2) - torch.atan(w1 / h1)), 2)
    with torch.no_grad():
        # alpha is a weight only, so no gradient flows through it
        S = 1 - iou
        alpha = v / (S + v + 1e-7)
    # Use alpha * v so that the returned value matches the CIoU definition
    cious = iou - (u + alpha * v)
    cious = torch.clamp(cious, min=-1.0, max=1.0)
    if exchange:
        cious = cious.T
    return cious
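Paper: https://arxiv.org/pdf/2101.08158.pdf
To address the problems of CIoU, researchers split the aspect-ratio term of CIoU apart and proposed the EIoU loss, adding Focal to concentrate on high-quality predicted boxes. Like CIoU, EIoU is a loss-function solution and is used only in training.
The EIoU penalty builds on the CIoU penalty but splits the aspect-ratio factor into separate width and height terms for the target box and the predicted box. The loss has three parts: overlap loss, center-distance loss and width-height loss. The first two follow CIoU, while the width-height loss directly minimizes the differences in width and height between the target box and the predicted box, which speeds up convergence. The loss is:
$$L_{EIoU} = L_{IoU} + L_{dis} + L_{asp} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{C_w^2} + \frac{\rho^2(h, h^{gt})}{C_h^2}$$
where $C_w$ and $C_h$ are the width and height of the smallest enclosing box.
Combining the EIoU loss with the FocalL1 loss gives the final Focal-EIoU loss, where γ is a hyperparameter that controls the curvature of the curve:
$$L_{Focal\text{-}EIoU} = IoU^{\gamma} L_{EIoU}$$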
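The three parts of the EIoU loss and the focal reweighting can be sketched for a single pair of boxes as follows. This is a scalar illustration under stated assumptions: `eiou_loss` is a hypothetical helper, the boxes are made up, and `gamma=0.5` is only an example value.

```python
def eiou_loss(pred, gt, gamma=0.5):
    """Return (EIoU loss, Focal-EIoU loss) for (xmin, ymin, xmax, ymax) boxes.

    gamma is an illustrative value for the focal exponent, not a recommendation.
    """
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    iou = inter / (w * h + w_gt * h_gt - inter)
    # Width and height of the smallest enclosing box
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    # Center-distance term
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 + \
           ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    l_dis = rho2 / (cw ** 2 + ch ** 2)
    # Width and height terms, each normalized by the enclosing box
    l_asp = (w - w_gt) ** 2 / cw ** 2 + (h - h_gt) ** 2 / ch ** 2
    loss = (1 - iou) + l_dis + l_asp
    return loss, iou ** gamma * loss


loss, focal = eiou_loss((4, 1, 6, 9), (0, 0, 10, 10))
print("EIoU loss = %.4f, Focal-EIoU loss = %.4f" % (loss, focal))
```

For this pair the overlap loss is 0.84, the center-distance loss is 0 and the width-height loss is 0.64 + 0.04, so the EIoU loss is 1.52. The focal factor IoU^γ = 0.4 then scales it down, reflecting that a low-IoU box contributes less.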
import torch
import math
import numpy as np


def bbox_iou(box1, box2, xywh=False, giou=False, diou=False, ciou=False, eiou=False, eps=1e-7):
    """
    Compute IoU and several of its variants.

    Parameters
    ----------
    box1 shape(b, c, h, w, 4)
    box2 shape(b, c, h, w, 4)
    xywh whether boxes are given as center point and wh; if False, the input is the top-left and bottom-right coordinates
    giou whether to return GIoU
    diou whether to return DIoU
    ciou whether to return CIoU
    eiou whether to return EIoU
    eps  small constant that prevents division by zero

    Returns
    -------
    Tensor holding the selected IoU variant for each pair of boxes
    """
    # Get the coordinates of the bounding boxes
    if xywh:
        # Convert xywh to xyxy
        b1_x1, b1_x2 = box1[..., 0] - box1[..., 2] / 2, box1[..., 0] + box1[..., 2] / 2
        b1_y1, b1_y2 = box1[..., 1] - box1[..., 3] / 2, box1[..., 1] + box1[..., 3] / 2
        b2_x1, b2_x2 = box2[..., 0] - box2[..., 2] / 2, box2[..., 0] + box2[..., 2] / 2
        b2_y1, b2_y2 = box2[..., 1] - box2[..., 3] / 2, box2[..., 1] + box2[..., 3] / 2
    else:
        # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[..., 0], box1[..., 1], box1[..., 2], box1[..., 3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[..., 0], box2[..., 1], box2[..., 2], box2[..., 3]

    # Intersection area
    inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \
            (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)

    # Union area
    w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
    w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps
    union = w1 * h1 + w2 * h2 - inter + eps

    # IoU
    iou = inter / union

    if giou or diou or ciou or eiou:
        # Width and height of the smallest enclosing box
        cw = torch.max(b1_x2, b2_x2) - torch.min(b1_x1, b2_x1)
        ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1)
        if ciou or diou or eiou:
            # Squared diagonal of the smallest enclosing box
            c2 = cw ** 2 + ch ** 2 + eps
            # Squared distance between the two box centers
            rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 +
                    (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4
            if diou:
                # Return DIoU
                return iou - rho2 / c2
            elif ciou:
                v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2)
                with torch.no_grad():
                    alpha = v / (v - iou + (1 + eps))
                # Return CIoU
                return iou - (rho2 / c2 + v * alpha)
            elif eiou:
                rho_w2 = ((b2_x2 - b2_x1) - (b1_x2 - b1_x1)) ** 2
                rho_h2 = ((b2_y2 - b2_y1) - (b1_y2 - b1_y1)) ** 2
                cw2 = cw ** 2 + eps
                ch2 = ch ** 2 + eps
                # Return EIoU
                return iou - (rho2 / c2 + rho_w2 / cw2 + rho_h2 / ch2)
        else:
            c_area = cw * ch + eps  # convex area
            # Return GIoU
            return iou - (c_area - union) / c_area
    else:
        # Return IoU
        return iou


if __name__ == '__main__':
    box1 = torch.from_numpy(np.asarray([170, 110, 310, 370]))
    box1 = box1.expand(1, 1, 1, 1, 4)
    # Overlapping box
    box2 = torch.from_numpy(np.asarray([250, 60, 375, 300]))
    box2 = box2.expand(1, 1, 1, 1, 4)
    # Non-overlapping box
    box3 = torch.from_numpy(np.asarray([730, 420, 1000, 700]))
    box3 = box3.expand(1, 1, 1, 1, 4)

    print('iou (overlapping):', bbox_iou(box1, box2))
    print('giou (overlapping):', bbox_iou(box1, box2, giou=True))
    print('diou (overlapping):', bbox_iou(box1, box2, diou=True))
    print('ciou (overlapping):', bbox_iou(box1, box2, ciou=True))
    print('eiou (overlapping):', bbox_iou(box1, box2, eiou=True))
    print("=" * 20)
    print('iou (non-overlapping):', bbox_iou(box1, box3))
    print('giou (non-overlapping):', bbox_iou(box1, box3, giou=True))
    print('diou (non-overlapping):', bbox_iou(box1, box3, diou=True))
    print('ciou (non-overlapping):', bbox_iou(box1, box3, ciou=True))
    print('eiou (non-overlapping):', bbox_iou(box1, box3, eiou=True))
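Paper: https://arxiv.org/abs/2205.12740
SIoU additionally considers the angle of the vector between the ground-truth box and the predicted box and redefines the loss accordingly. It consists of four parts: angle cost, distance cost, shape cost and IoU cost.
The distance cost depends on the smallest enclosing rectangle of the ground-truth box and the predicted box:
$$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma\rho_t}\right), \quad \rho_x = \left(\frac{b_{c_x}^{gt} - b_{c_x}}{c_w}\right)^2, \quad \rho_y = \left(\frac{b_{c_y}^{gt} - b_{c_y}}{c_h}\right)^2, \quad \gamma = 2 - \Lambda$$
where $\Lambda$ is the angle cost and $c_w$, $c_h$ are the width and height of the smallest enclosing rectangle of the ground-truth box and the predicted box.
The shape cost is:
$$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^{\theta}, \quad \omega_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \quad \omega_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}$$
where $w$, $h$, $w^{gt}$, $h^{gt}$ are the widths and heights of the predicted box and the ground-truth box, and θ controls how much attention is paid to the shape cost. To avoid focusing so much on shape that the predicted box moves too little, the authors used a genetic algorithm and obtained a value of θ close to 4, and therefore define the parameter range as [2, 6].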
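To see what the exponent θ does to the shape cost, here is a small scalar sketch. The function name `shape_cost` and the example sizes are assumptions for illustration; the formula follows the shape-cost computation used in the code of this section.

```python
import math


def shape_cost(w, h, w_gt, h_gt, theta=4):
    """Shape cost: sum over width and height of (1 - exp(-omega)) ** theta."""
    # Relative width and height mismatch, each in [0, 1)
    omega_w = abs(w - w_gt) / max(w, w_gt)
    omega_h = abs(h - h_gt) / max(h, h_gt)
    return (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta


# Predicted box 3x4 against a ground-truth box 7x8
costs = [shape_cost(3, 4, 7, 8, theta) for theta in (2, 4, 6)]
for theta, cost in zip((2, 4, 6), costs):
    print("theta = %d, shape cost = %.4f" % (theta, cost))
```

Because each base `1 - exp(-omega)` is below 1, a larger θ shrinks the shape cost, so the loss pays less attention to shape mismatch. A perfect size match gives a cost of 0 for any θ.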
import torch
import math

# [[x, y, w, h]]: center coordinates, width and height
pred = torch.tensor([[1., 2., 3., 4.]])
gt = torch.tensor([[5., 6., 7., 8.]])
iou = 0.5  # IoU is fixed at 0.5 here for illustration

def siou1():
    # -------------------- Angle cost ------------------------------
    gt_p_center_D_value_w = torch.abs((gt[:, 0] - pred[:, 0]))  # horizontal offset between the two centers
    gt_p_center_D_value_h = torch.abs((gt[:, 1] - pred[:, 1]))  # vertical offset between the two centers
    sigma = torch.pow(gt_p_center_D_value_w ** 2 + gt_p_center_D_value_h ** 2, 0.5)  # distance between the two centers
    sin_alpha = torch.abs(gt_p_center_D_value_h) / sigma  # sine of angle α between the centers
    sin_beta = torch.abs(gt_p_center_D_value_w) / sigma  # sine of angle β between the centers
    threshold = torch.pow(torch.tensor(2.), 0.5) / 2  # angle threshold 0.7071068 = sin 45° = sqrt(2) / 2
    # torch.where(condition, a, b) returns a where the condition holds and b elsewhere
    sin_alpha = torch.where(sin_alpha < threshold, sin_beta, sin_alpha)  # if α is below 45°, optimize β; otherwise optimize α
    angle_cost = torch.cos(2 * (torch.arcsin(sin_alpha) - math.pi / 4))
    # ----------------- Distance cost -----------------------------
    # min_enclosing_rec_tl: top-left corner of the smallest enclosing rectangle
    # min_enclosing_rec_br: bottom-right corner of the smallest enclosing rectangle
    min_enclosing_rec_tl = torch.min(
        (pred[:, :2] - pred[:, 2:] / 2), (gt[:, :2] - gt[:, 2:] / 2))
    min_enclosing_rec_br = torch.max(
        (pred[:, :2] + pred[:, 2:] / 2), (gt[:, :2] + gt[:, 2:] / 2))
    # Width and height of the smallest enclosing rectangle
    min_enclosing_rec_br_w = (min_enclosing_rec_br - min_enclosing_rec_tl)[:, 0]
    min_enclosing_rec_br_h = (min_enclosing_rec_br - min_enclosing_rec_tl)[:, 1]
    # Squared ratio of the center offset to the enclosing width (height)
    rho_x = (gt_p_center_D_value_w / min_enclosing_rec_br_w) ** 2
    rho_y = (gt_p_center_D_value_h / min_enclosing_rec_br_h) ** 2
    gamma = 2 - angle_cost
    # Distance cost
    distance_cost = 2 - torch.exp(-gamma * rho_x) - torch.exp(-gamma * rho_y)
    # ---------------- Shape cost ----------------------
    w_pred = pred[:, 2]  # width of the predicted box
    w_gt = gt[:, 2]  # width of the ground-truth box
    h_pred = pred[:, -1]  # height of the predicted box
    h_gt = gt[:, -1]  # height of the ground-truth box
    # Absolute width (height) difference divided by the larger of the two widths (heights)
    omiga_w = torch.abs(w_pred - w_gt) / torch.max(w_pred, w_gt)
    omiga_h = torch.abs(h_pred - h_gt) / torch.max(h_pred, h_gt)
    # The authors used a genetic algorithm and found θ close to 4, so they define the range of θ as [2, 6]
    theta = 4
    # Shape cost
    shape_cost = torch.pow(1 - torch.exp(-1 * omiga_w), theta) + torch.pow(1 - torch.exp(-1 * omiga_h), theta)
    # ------------------ loss_siou ----------------------------
    siou = 1.0 - iou + 0.5 * (distance_cost + shape_cost)
    print(siou)

siou1()
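Paper: https://arxiv.org/abs/2301.10051
Code: https://github.com/Instinct323/wiou
Focal EIoU v1 was proposed to balance bounding-box regression (BBR) between high-quality and low-quality samples, but because its focusing mechanism (FM) is static, the potential of a non-monotonic FM is not fully exploited. Building on this idea, the authors propose an IoU-based loss with a dynamic non-monotonic FM, named Wise-IoU (WIoU).
Note: WIoU depends on hyperparameters, so it is advisable to tune them on your own experimental data.
The paper covers several focusing mechanisms: WIoU v1 builds an attention-based bounding-box loss, while WIoU v2 and v3 attach a focusing mechanism to it by defining how the gradient gain (focusing coefficient) is computed.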
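Training data inevitably contains low-quality examples, so geometric measures such as distance and aspect ratio aggravate the penalty on those examples and hurt the generalization of the model. A good loss function should weaken the geometric penalty when the anchor box coincides well with the target box, and intervening less in training gives the model better generalization. On this basis, the authors construct a distance attention from the distance metric and obtain WIoU v1, which has a two-layer attention mechanism:
$$L_{WIoUv1} = R_{WIoU} L_{IoU}, \quad R_{WIoU} = \exp\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{(W_g^2 + H_g^2)^*}\right)$$
where $W_g$ and $H_g$ are the dimensions of the smallest enclosing box (Figure 1). To prevent $R_{WIoU}$ from producing gradients that hinder convergence, $W_g$ and $H_g$ are detached from the computational graph (the superscript * denotes this operation). Because this effectively removes the factor that hinders convergence, no new metric such as aspect ratio is introduced.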
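A scalar sketch of WIoU v1 may help here. It only evaluates the loss value; in an autograd framework the denominator would additionally be detached, which plain Python cannot express. The helper name `wiou_v1` and the boxes are assumptions for illustration.

```python
import math


def wiou_v1(pred, gt):
    """Return (WIoU v1 loss, R_WIoU) for (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    union = (pred[2] - pred[0]) * (pred[3] - pred[1]) + \
            (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter
    l_iou = 1 - inter / union
    # Squared distance between the two box centers
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 + \
           ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    # Size of the smallest enclosing box (treated as a constant during training)
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    r_wiou = math.exp(rho2 / (wg ** 2 + hg ** 2))
    return r_wiou * l_iou, r_wiou


gt = (0, 0, 4, 4)
centered_loss, centered_r = wiou_v1((1, 1, 3, 3), gt)  # same center as gt
shifted_loss, shifted_r = wiou_v1((1, 1, 5, 5), gt)    # center shifted by (1, 1)
print(centered_r, shifted_r)
```

When the centers coincide, the distance attention is exactly 1 and the loss reduces to the plain IoU loss. A center offset makes the attention larger than 1, which amplifies the IoU loss of that box.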
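Focal Loss designs a monotonic focusing mechanism for cross entropy that effectively reduces the contribution of easy examples to the loss value. This lets the model focus on hard examples and improves classification. WIoU v2 is built in a similar way.
Monotonic focusing coefficient: $L_{IoU}^{\gamma*}$
so that
$$L_{WIoUv2} = L_{IoU}^{\gamma*} L_{WIoUv1}, \quad \gamma > 0$$
Its backpropagation changes accordingly:
$$\frac{\partial L_{WIoUv2}}{\partial L_{IoU}} = L_{IoU}^{\gamma*} \frac{\partial L_{WIoUv1}}{\partial L_{IoU}}$$
Note that the gradient gain is $r = L_{IoU}^{\gamma*} \in [0, 1]$. During training, the gradient gain decreases as $L_{IoU}$ decreases, which slows convergence in the late stage of training. The mean of $L_{IoU}$ is therefore introduced as a normalizing factor:
$$L_{WIoUv2} = \left(\frac{L_{IoU}^*}{\overline{L_{IoU}}}\right)^{\gamma} L_{WIoUv1}$$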
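The outlier degree of an anchor box is characterized by the ratio of $L_{IoU}^*$ to the mean $\overline{L_{IoU}}$:
$$\beta = \frac{L_{IoU}^*}{\overline{L_{IoU}}} \in [0, +\infty)$$
A small outlier degree means the anchor box is of high quality. It is assigned a small gradient gain so that BBR focuses on anchor boxes of ordinary quality. Assigning a small gradient gain to anchor boxes with a large outlier degree, in turn, effectively keeps low-quality examples from producing large harmful gradients. A non-monotonic focusing coefficient is constructed from β and applied to WIoU v1:
$$L_{WIoUv3} = r L_{WIoUv1}, \quad r = \frac{\beta}{\delta \alpha^{\beta - \delta}}$$
where δ is chosen so that r = 1 when β = δ. As Figure 8 shows, the gradient gain is largest when the outlier degree of an anchor box satisfies β = C (C is a constant). Because the mean $\overline{L_{IoU}}$ is dynamic, the quality criterion for anchor boxes is dynamic as well, which lets WIoU v3 choose at every moment the gradient-gain allocation that best fits the current situation.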
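The shape of the non-monotonic focusing coefficient can be explored with a few lines of plain Python. The defaults `alpha=1.9` and `delta=3.0` mirror the `gamma=1.9, delta=3` defaults of `_scaled_loss` in the code of this section; the function name `gradient_gain` is an assumption for this sketch.

```python
import math


def gradient_gain(beta, alpha=1.9, delta=3.0):
    """Non-monotonic focusing coefficient r = beta / (delta * alpha ** (beta - delta))."""
    return beta / (delta * alpha ** (beta - delta))


# Setting the derivative of r with respect to beta to zero gives beta = 1 / ln(alpha)
peak_beta = 1 / math.log(1.9)

for beta in (0.0, 0.5, peak_beta, 3.0, 6.0):
    print("beta = %.3f, r = %.4f" % (beta, gradient_gain(beta)))
```

The gain is 0 for a perfectly fitted box (β = 0), rises to a maximum at β = 1 / ln α, equals 1 at β = δ, and then decays for strong outliers. This is the behaviour the text describes: both very good and very bad anchor boxes receive a small gradient gain.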
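To prevent low-quality anchor boxes from being left behind in the early stage of training, the running mean of $L_{IoU}$ is initialized so that anchor boxes of a particular quality receive the highest gradient gain. To keep this strategy during the early stage of training, a small momentum m is needed to delay the moment at which the running mean approaches its true value. For a training run with n batches in which the improvement of AP slows markedly at iteration t (Figure 9), the paper recommends choosing the momentum from t and n; the code in this section uses `_momentum = 1 - 0.5 ** (1 / 7000)`.
In the middle and late stages of training, WIoU v3 assigns small gradient gains to low-quality anchor boxes to reduce harmful gradients. At the same time it attends to anchor boxes of ordinary quality to improve the localization performance of the model.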
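The effect of a small momentum on the running mean can be checked directly. The sketch below reuses the momentum expression from the `WIoU_Scale` class in this section and feeds a constant batch value of 0 purely to expose how fast the initial value decays; `update_mean` is a hypothetical helper name.

```python
# Momentum expression used by the WIoU_Scale class in this section
momentum = 1 - 0.5 ** (1 / 7000)


def update_mean(mean, batch_value, m=momentum):
    """Exponential running mean: keep (1 - m) of the old value, add m of the new one."""
    return (1 - m) * mean + m * batch_value


mean = 1.0  # initial value of the running mean
for _ in range(7000):
    # Feeding 0 isolates the decay of the initial value
    mean = update_mean(mean, 0.0)
print(mean)  # approximately 0.5
```

With this setting the weight of the initial value halves after 7000 updates, so the running mean needs thousands of iterations to approach the true average IoU loss, which delays the shift of the focusing mechanism as described above.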
import numpy as np
import torch, math


class WIoU_Scale:
    ''' monotonous: {
            None: origin v1
            True: monotonic FM v2
            False: non-monotonic FM v3
        }
        momentum: The momentum of running mean'''

    iou_mean = 1.
    monotonous = False
    _momentum = 1 - 0.5 ** (1 / 7000)
    _is_train = True

    def __init__(self, iou):
        self.iou = iou
        self._update(self)

    @classmethod
    def _update(cls, self):
        # Update the running mean of the IoU loss (shared across all instances)
        if cls._is_train:
            cls.iou_mean = (1 - cls._momentum) * cls.iou_mean + \
                           cls._momentum * self.iou.detach().mean().item()

    @classmethod
    def _scaled_loss(cls, self, gamma=1.9, delta=3):
        if isinstance(self.monotonous, bool):
            if self.monotonous:
                # v2: monotonic focusing coefficient
                return (self.iou.detach() / self.iou_mean).sqrt()
            else:
                # v3: non-monotonic focusing coefficient based on the outlier degree beta
                beta = self.iou.detach() / self.iou_mean
                alpha = delta * torch.pow(gamma, beta - delta)
                return beta / alpha
        return 1


def bbox_iou(box1, box2, xywh=True, GIoU=False, DIoU=False, CIoU=False, SIoU=False, EIoU=False, WIoU=False, Focal=False, alpha=1, gamma=0.5, scale=False, eps=1e-7):
    # Returns Intersection over Union (IoU) of box1(1,4) to box2(n,4)

    # Get the coordinates of bounding boxes
    if xywh:  # transform from xywh to xyxy
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)
        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2
        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_
    else:  # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)
        w1, h1 = b1_x2 - b1_x1, (b1_y2 - b1_y1).clamp(eps)
        w2, h2 = b2_x2 - b2_x1, (b2_y2 - b2_y1).clamp(eps)

    # Intersection area
    inter = (b1_x2.minimum(b2_x2) - b1_x1.maximum(b2_x1)).clamp(0) * \
            (b1_y2.minimum(b2_y2) - b1_y1.maximum(b2_y1)).clamp(0)

    # Union Area
    union = w1 * h1 + w2 * h2 - inter + eps
    if scale:
        self = WIoU_Scale(1 - (inter / union))

    # IoU
    # iou = inter / union # ori iou
    iou = torch.pow(inter/(union + eps), alpha)  # alpha iou
    if CIoU or DIoU or GIoU or EIoU or SIoU or WIoU:
        cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1)  # convex (smallest enclosing box) width
        ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1)  # convex height
        if CIoU or DIoU or EIoU or SIoU or WIoU:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
            c2 = (cw ** 2 + ch ** 2) ** alpha + eps  # convex diagonal squared
            rho2 = (((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4) ** alpha  # center dist ** 2
            if CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
                v = (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)
                with torch.no_grad():
                    alpha_ciou = v / (v - iou + (1 + eps))
                if Focal:
                    return iou - (rho2 / c2 + torch.pow(v * alpha_ciou + eps, alpha)), torch.pow(inter/(union + eps), gamma)  # Focal_CIoU
                else:
                    return iou - (rho2 / c2 + torch.pow(v * alpha_ciou + eps, alpha))  # CIoU
            elif EIoU:
                rho_w2 = ((b2_x2 - b2_x1) - (b1_x2 - b1_x1)) ** 2
                rho_h2 = ((b2_y2 - b2_y1) - (b1_y2 - b1_y1)) ** 2
                cw2 = torch.pow(cw ** 2 + eps, alpha)
                ch2 = torch.pow(ch ** 2 + eps, alpha)
                if Focal:
                    return iou - (rho2 / c2 + rho_w2 / cw2 + rho_h2 / ch2), torch.pow(inter/(union + eps), gamma)  # Focal_EIou
                else:
                    return iou - (rho2 / c2 + rho_w2 / cw2 + rho_h2 / ch2)  # EIou
            elif SIoU:
                # SIoU Loss https://arxiv.org/pdf/2205.12740.pdf
                s_cw = (b2_x1 + b2_x2 - b1_x1 - b1_x2) * 0.5 + eps
                s_ch = (b2_y1 + b2_y2 - b1_y1 - b1_y2) * 0.5 + eps
                sigma = torch.pow(s_cw ** 2 + s_ch ** 2, 0.5)
                sin_alpha_1 = torch.abs(s_cw) / sigma
                sin_alpha_2 = torch.abs(s_ch) / sigma
                threshold = pow(2, 0.5) / 2
                sin_alpha = torch.where(sin_alpha_1 > threshold, sin_alpha_2, sin_alpha_1)
                angle_cost = torch.cos(torch.arcsin(sin_alpha) * 2 - math.pi / 2)
                rho_x = (s_cw / cw) ** 2
                rho_y = (s_ch / ch) ** 2
                # Named gamma_siou so that it does not overwrite the Focal exponent `gamma`
                gamma_siou = angle_cost - 2
                distance_cost = 2 - torch.exp(gamma_siou * rho_x) - torch.exp(gamma_siou * rho_y)
                omiga_w = torch.abs(w1 - w2) / torch.max(w1, w2)
                omiga_h = torch.abs(h1 - h2) / torch.max(h1, h2)
                shape_cost = torch.pow(1 - torch.exp(-1 * omiga_w), 4) + torch.pow(1 - torch.exp(-1 * omiga_h), 4)
                if Focal:
                    return iou - torch.pow(0.5 * (distance_cost + shape_cost) + eps, alpha), torch.pow(inter/(union + eps), gamma)  # Focal_SIou
                else:
                    return iou - torch.pow(0.5 * (distance_cost + shape_cost) + eps, alpha)  # SIou
            elif WIoU:
                if Focal:
                    raise RuntimeError("WIoU do not support Focal.")
                elif scale:
                    return getattr(WIoU_Scale, '_scaled_loss')(self), (1 - iou) * torch.exp((rho2 / c2)), iou  # WIoU https://arxiv.org/abs/2301.10051
                else:
                    return iou, torch.exp((rho2 / c2))  # WIoU v1
            if Focal:
                return iou - rho2 / c2, torch.pow(inter/(union + eps), gamma)  # Focal_DIoU
            else:
                return iou - rho2 / c2  # DIoU
        c_area = cw * ch + eps  # convex area
        if Focal:
            return iou - torch.pow((c_area - union) / c_area + eps, alpha), torch.pow(inter/(union + eps), gamma)  # Focal_GIoU https://arxiv.org/pdf/1902.09630.pdf
        else:
            return iou - torch.pow((c_area - union) / c_area + eps, alpha)  # GIoU https://arxiv.org/pdf/1902.09630.pdf
    if Focal:
        return iou, torch.pow(inter/(union + eps), gamma)  # Focal_IoU
    else:
        return iou  # IoU
### yolov8
if type(iou) is tuple:
    if len(iou) == 2:
        loss_iou = ((1.0 - iou[0]) * iou[1].detach() * weight).sum() / target_scores_sum
    else:
        loss_iou = (iou[0] * iou[1] * weight).sum() / target_scores_sum
else:
    loss_iou = ((1.0 - iou) * weight).sum() / target_scores_sum

### yolov5
iou = bbox_iou(pbox, tbox[i], CIoU=True)
if type(iou) is tuple:
    if len(iou) == 2:
        lbox += (iou[1].detach().squeeze() * (1 - iou[0].squeeze())).mean()
        iou = iou[0].squeeze()
    else:
        lbox += (iou[0] * iou[1]).mean()
        iou = iou[2].squeeze()
else:
    lbox += (1.0 - iou.squeeze()).mean()  # iou loss
    iou = iou.squeeze()
Zhihu: https://zhuanlan.zhihu.com/p/94799295
CSDN: https://blog.csdn.net/xian0710830114/article/details/128177705
CSDN: https://blog.csdn.net/qq_55745968/article/details/128888122
CSDN: https://blog.csdn.net/weixin_43980331/article/details/126159134