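Whether a detector is anchor-based or anchor-free, training needs positive/negative sample matching (label assignment) to compute the classification or foreground/background loss. Current methods fall into two groups:
Group 1: fixed label assignment. Commonly used methods are MaxIoU, ATSS, and FCOS.
Group 2: dynamic label assignment. Commonly used methods are SimOTA and TaskAlign.
Matching principle: an anchor can be assigned to only one GT, but one GT can be assigned multiple anchors.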
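MaxIoU computes the IoU between each anchor box and each GT, then applies separate positive and negative thresholds to decide the matches. It is commonly used in Faster R-CNN, Mask R-CNN, SSD, YOLOv3, and others.
1. Compute the IoU between every GT and every anchor box.
2. For each anchor box, find its best-matching GT, i.e. the GT with the largest IoU, and record that maximum IoU.
3. If the maximum IoU < the negative threshold, the anchor box is a negative sample.
4. If the maximum IoU >= the positive threshold, the anchor box is a positive sample for that GT. Anchors that fall between the two thresholds keep the default value -1 and are ignored.
5. If forced positive matching is enabled (self.match_low_quality): for each GT, find the anchor with the largest IoU (its best match) and make that anchor a positive sample for the GT, provided the IoU is at least self.min_pos_iou.
6. Problem with step 5: suppose anchor A has IoU 0.9 with GT 1 and 0.8 with GT 2. By steps 2 and 4, anchor A is matched to GT 1, the GT it overlaps most. But if anchor A is also the anchor with the largest IoU for GT 2, step 5 reassigns anchor A to GT 2. GT 1 may then be left without any anchor, and anchor A is trained against a GT that is not its best match.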
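The steps above, including the overwrite problem of step 5, can be traced on a tiny example. What follows is a minimal pure-Python sketch, not the library implementation: the helper names `iou` and `max_iou_assign`, the default thresholds (0.5 / 0.4 / 0.0), and the IoU values in `overlaps` are all assumptions chosen for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def max_iou_assign(overlaps, pos_thr=0.5, neg_thr=0.4, min_pos_iou=0.0,
                   match_low_quality=True):
    """overlaps[g][a] is the IoU between GT g and anchor a.

    Returns one value per anchor: -1 = ignored, 0 = negative,
    k > 0 = positive for GT number k (1-based).
    """
    num_gts, num_anchors = len(overlaps), len(overlaps[0])
    assigned = [-1] * num_anchors
    # Steps 2-4: threshold each anchor on its best IoU over all GTs.
    for a in range(num_anchors):
        column = [overlaps[g][a] for g in range(num_gts)]
        best_iou = max(column)
        if best_iou < neg_thr:
            assigned[a] = 0
        elif best_iou >= pos_thr:
            assigned[a] = column.index(best_iou) + 1
    # Step 5: every GT claims its best anchor, overwriting earlier results.
    if match_low_quality:
        for g in range(num_gts):
            best_iou = max(overlaps[g])
            if best_iou >= min_pos_iou:
                assigned[overlaps[g].index(best_iou)] = g + 1
    return assigned


# Step 1 on one pair of boxes: intersection 50, union 150.
example_iou = iou((0, 0, 10, 10), (5, 0, 15, 10))

# Anchor 0 plays the role of "anchor A": IoU 0.9 with GT 1, 0.8 with GT 2.
overlaps = [
    [0.9, 0.45, 0.1],  # GT 1 vs anchors 0..2
    [0.8, 0.30, 0.2],  # GT 2 vs anchors 0..2
]
without_forcing = max_iou_assign(overlaps, match_low_quality=False)
with_forcing = max_iou_assign(overlaps, match_low_quality=True)
print(without_forcing, with_forcing)
```

With these numbers, `without_forcing` is `[1, -1, 0]` and `with_forcing` is `[2, -1, 0]`: anchor 0 moves from GT 1 to GT 2, and GT 1 ends up with no positive anchor.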
# Excerpt: a method of the MaxIoU assigner class. Requires `import torch`;
# AssignResult is the result container used by that class.
def assign_wrt_overlaps(self, overlaps, gt_labels=None):
    # overlaps has shape [num_gts, num_bboxes]
    num_gts, num_bboxes = overlaps.size(0), overlaps.size(1)
    # 1. assign -1 by default
    assigned_gt_inds = overlaps.new_full((num_bboxes, ), -1, dtype=torch.long)
    # for each anchor, the max IoU over all GTs: the best-matching GT per anchor
    max_overlaps, argmax_overlaps = overlaps.max(dim=0)
    # for each GT, the max IoU over all proposals: the best-matching anchor per GT
    gt_max_overlaps, gt_argmax_overlaps = overlaps.max(dim=1)
    # 2. assign negative: IoU below neg_iou_thr
    # the negative inds are set to be 0
    assigned_gt_inds[(max_overlaps >= 0) & (max_overlaps < self.neg_iou_thr)] = 0
    # 3. assign positive: above positive IoU threshold
    # positive entries store the 1-based GT index
    pos_inds = max_overlaps >= self.pos_iou_thr
    assigned_gt_inds[pos_inds] = argmax_overlaps[pos_inds] + 1
    if self.match_low_quality:
        # Low-quality matching will overwrite the assigned_gt_inds assigned
        # in Step 3. Thus, the assigned gt might not be the best one for
        # prediction.
        # For example, if bbox A has 0.9 and 0.8 iou with GT bbox 1 & 2,
        # bbox 1 will be assigned as the best target for bbox A in step 3.
        # However, if GT bbox 2's gt_argmax_overlaps = A, bbox A's
        # assigned_gt_inds will be overwritten to be bbox 2.
        # This might be the reason that it is not used in ROI Heads.
        for i in range(num_gts):
            if gt_max_overlaps[i] >= self.min_pos_iou:
                if self.gt_max_assign_all:
                    # when several bboxes tie for the max IoU with this GT,
                    # assign all of them to it
                    max_iou_inds = overlaps[i, :] == gt_max_overlaps[i]
                    assigned_gt_inds[max_iou_inds] = i + 1
                else:
                    # assign only a single argmax bbox
                    assigned_gt_inds[gt_argmax_overlaps[i]] = i + 1
    if gt_labels is not None:
        # labels default to -1; positives take the label of their GT
        assigned_labels = assigned_gt_inds.new_full((num_bboxes, ), -1)
        pos_inds = torch.nonzero(
            assigned_gt_inds > 0, as_tuple=False).squeeze()
        if pos_inds.numel() > 0:
            assigned_labels[pos_inds] = gt_labels[
                assigned_gt_inds[pos_inds] - 1]
    else:
        assigned_labels = None
    return AssignResult(num_gts, assigned_gt_inds, max_overlaps, labels=assigned_labels)
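The function above encodes its result in a single integer per anchor: -1 for ignored, 0 for negative, and `i + 1` for a positive matched to GT `i`. The sketch below shows one way to decode that convention in plain Python. The helper `split_assignment` and the sample values are hypothetical, introduced only to illustrate the encoding.

```python
def split_assignment(assigned_gt_inds, gt_labels):
    """Split per-anchor assignments into positive, negative and ignored indices."""
    pos = [i for i, v in enumerate(assigned_gt_inds) if v > 0]
    neg = [i for i, v in enumerate(assigned_gt_inds) if v == 0]
    ignored = [i for i, v in enumerate(assigned_gt_inds) if v == -1]
    # Positive entries are 1-based GT indices, so subtract 1 before the lookup.
    # Non-positive anchors get the placeholder label -1, as in the code above.
    labels = [gt_labels[v - 1] if v > 0 else -1 for v in assigned_gt_inds]
    return pos, neg, ignored, labels


# Four anchors and two GTs whose class labels are 7 and 3 (made-up values).
pos, neg, ignored, labels = split_assignment([2, -1, 0, 1], gt_labels=[7, 3])
print(pos, neg, ignored, labels)
```

Here anchors 0 and 3 are positives (for GT 2 and GT 1 respectively), anchor 2 is negative, and anchor 1 is ignored, so `labels` becomes `[3, -1, -1, 7]`.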
# ATSS assignment (method excerpt). Requires `import torch`;
# AssignResult is the result container used by the assigner class.
def assign(self,
           bboxes,
           num_level_bboxes,
           gt_bboxes,
           gt_bboxes_ignore=None,
           gt_labels=None,
           cls_scores=None,
           bbox_preds=None):
    INF = 100000000
    bboxes = bboxes[:, :4]
    num_gt, num_bboxes = gt_bboxes.size(0), bboxes.size(0)
    overlaps = self.iou_calculator(bboxes, gt_bboxes)  # [anchor_nums, gt_nums]
    # assign 0 by default
    assigned_gt_inds = overlaps.new_full((num_bboxes,), 0, dtype=torch.long)
    # 1. compute the center distance between all anchor boxes and all gt_bboxes
    gt_cx = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2.0
    gt_cy = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2.0
    gt_points = torch.stack((gt_cx, gt_cy), dim=1)  # [gt_nums, 2]
    bboxes_cx = (bboxes[:, 0] + bboxes[:, 2]) / 2.0
    bboxes_cy = (bboxes[:, 1] + bboxes[:, 3]) / 2.0
    bboxes_points = torch.stack((bboxes_cx, bboxes_cy), dim=1)  # [anchor_nums, 2]
    distances = (bboxes_points[:, None, :] - gt_points[None, :, :]).pow(2).sum(-1).sqrt()  # [anchor_nums, gt_nums]
    # 2. loop over the output levels (5 here); on each level, for each gt_bbox,
    # pick the topk=9 anchors with the smallest L2 center distance as candidates,
    # so each gt_bbox ends up with 9 x 5 = 45 candidate anchors
    candidate_idxs = []
    start_idx = 0
    for level, bboxes_per_level in enumerate(num_level_bboxes):
        # on each pyramid level, for each gt,
        # select k bbox whose center are closest to the gt center
        end_idx = start_idx + bboxes_per_level
        distances_per_level = distances[start_idx:end_idx, :]
        selectable_k = min(self.topk, bboxes_per_level)
        _, topk_idxs_per_level = distances_per_level.topk(selectable_k, dim=0, largest=False)
        candidate_idxs.append(topk_idxs_per_level + start_idx)
        start_idx = end_idx
    candidate_idxs = torch.cat(candidate_idxs, dim=0)
    # 3. compute the IoU between these 45 candidates and their GT; the sum of
    # the mean and the standard deviation is the positive-sample threshold
    candidate_overlaps = overlaps[candidate_idxs, torch.arange(num_gt)]  # [45, gt_nums]
    overlaps_mean_per_gt = candidate_overlaps.mean(0)  # [gt_nums]
    overlaps_std_per_gt = candidate_overlaps.std(0)  # [gt_nums]
    overlaps_thr_per_gt = overlaps_mean_per_gt + overlaps_std_per_gt
    # 4. keep the candidates whose IoU with the GT reaches the threshold
    is_pos = candidate_overlaps >= overlaps_thr_per_gt[None, :]
    # 5. check whether each candidate's center lies inside the GT; a candidate is
    # positive only if its center is inside the GT and its IoU passes the threshold
    for gt_idx in range(num_gt):
        # offset the indices so they address the flattened [gt_nums * anchor_nums] layout
        candidate_idxs[:, gt_idx] += gt_idx * num_bboxes
    ep_bboxes_cx = bboxes_cx.view(1, -1).expand(num_gt, num_bboxes).contiguous().view(-1)
    ep_bboxes_cy = bboxes_cy.view(1, -1).expand(num_gt, num_bboxes).contiguous().view(-1)
    candidate_idxs = candidate_idxs.view(-1)
    # distances from the candidate center to the four sides of the GT
    l_ = ep_bboxes_cx[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 0]
    t_ = ep_bboxes_cy[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 1]
    r_ = gt_bboxes[:, 2] - ep_bboxes_cx[candidate_idxs].view(-1, num_gt)
    b_ = gt_bboxes[:, 3] - ep_bboxes_cy[candidate_idxs].view(-1, num_gt)
    is_in_gts = torch.stack([l_, t_, r_, b_], dim=1).min(dim=1)[0] > 0.01
    is_pos = is_pos & is_in_gts
    # 6. if a candidate matches several GTs, assign it to the GT with the largest IoU
    overlaps_inf = torch.full_like(overlaps, -INF).t().contiguous().view(-1)
    index = candidate_idxs.view(-1)[is_pos.view(-1)]
    overlaps_inf[index] = overlaps.t().contiguous().view(-1)[index]
    overlaps_inf = overlaps_inf.view(num_gt, -1).t()
    max_overlaps, argmax_overlaps = overlaps_inf.max(dim=1)
    assigned_gt_inds[max_overlaps != -INF] = argmax_overlaps[max_overlaps != -INF] + 1
    if gt_labels is not None:
        assigned_labels = assigned_gt_inds.new_full((num_bboxes,), -1)
        pos_inds = torch.nonzero(assigned_gt_inds > 0, as_tuple=False).squeeze()
        if pos_inds.numel() > 0:
            assigned_labels[pos_inds] = gt_labels[assigned_gt_inds[pos_inds] - 1]
    else:
        assigned_labels = None
    return AssignResult(num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels)
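The vectorized code above can be hard to follow, so here is a scalar sketch of steps 1 to 5 for a single GT, written with the standard library only. It is an illustration under assumptions, not the library code: the function name `atss_positives_for_gt`, the two-level anchor layout, `topk=2`, and all box coordinates are made up. Step 6 (resolving anchors claimed by several GTs) is omitted because only one GT is involved.

```python
import math
import statistics


def atss_positives_for_gt(anchors_per_level, gt, topk=2):
    """Select positive anchors for one GT box (x1, y1, x2, y2).

    anchors_per_level: list of pyramid levels, each a list of anchor boxes.
    Returns (positives, threshold).
    """
    gcx, gcy = (gt[0] + gt[2]) / 2.0, (gt[1] + gt[3]) / 2.0

    def center(box):
        return (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0

    def center_distance(box):
        cx, cy = center(box)
        return math.hypot(cx - gcx, cy - gcy)

    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    # Steps 1-2: on every level keep the topk anchors closest to the GT center.
    candidates = []
    for level in anchors_per_level:
        candidates.extend(sorted(level, key=center_distance)[:topk])

    # Step 3: threshold = mean + sample standard deviation of candidate IoUs.
    ious = [iou(c, gt) for c in candidates]
    threshold = statistics.mean(ious) + statistics.stdev(ious)

    # Steps 4-5: keep candidates above the threshold whose center is inside the GT.
    positives = []
    for box, value in zip(candidates, ious):
        cx, cy = center(box)
        inside = min(cx - gt[0], cy - gt[1], gt[2] - cx, gt[3] - cy) > 0.01
        if value >= threshold and inside:
            positives.append(box)
    return positives, threshold


gt = (10, 10, 30, 30)
anchors_per_level = [
    [(16, 16, 24, 24), (0, 0, 8, 8), (24, 16, 32, 24)],      # small anchors
    [(12, 12, 32, 32), (30, 30, 50, 50), (0, 0, 20, 20)],    # large anchors
]
positives, threshold = atss_positives_for_gt(anchors_per_level, gt)
print(positives, round(threshold, 3))
```

In this toy setup only the large anchor `(12, 12, 32, 32)` passes: its IoU of about 0.68 exceeds the per-GT threshold of roughly 0.55, while the small anchors near the center have low IoU and are filtered out.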
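Advantages:
1. It accounts for the effect of the GT center on sample matching.
In RetinaNet, the closer an anchor box's center is to the GT center, the higher its IoU usually is; in FCOS, the closer a point is to the GT center, the better the prediction usually is.
2. The positive/negative threshold is not set by hand. It is computed per GT as the mean plus the standard deviation of the IoUs between the candidates and the GT.
The mean reflects how well the preset anchors fit the GT: when the mean is high the threshold should be raised, and when it is low the threshold should be lowered. The standard deviation indicates which feature levels suit the GT: a high standard deviation means the high-quality anchor boxes are concentrated on one level, so adding the standard deviation to the threshold filters out anchors from the other levels; a low standard deviation means several levels suit the GT. Using mean plus standard deviation as the IoU threshold therefore selects suitable anchor boxes from the appropriate feature levels automatically.
3. Anchor centers are restricted to the GT region.
An anchor box whose center lies outside the GT would predict from features outside the GT, which hurts training, so it should be excluded.
4. GTs of different sizes and aspect ratios receive a balanced number of anchors.
5. There is only one hyperparameter, K (topk in the code), and its influence is small, so the method is close to hyperparameter-free.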
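To make point 2 concrete, the following sketch compares the mean-plus-standard-deviation threshold for two hypothetical GTs. The IoU lists are invented for illustration, and `atss_threshold` is a helper name assumed here. `statistics.stdev` is the sample standard deviation.

```python
import statistics


def atss_threshold(candidate_ious):
    """Per-GT threshold: mean plus sample standard deviation of candidate IoUs."""
    return statistics.mean(candidate_ious) + statistics.stdev(candidate_ious)


# GT "a": one pyramid level fits far better than the rest (high std).
ious_a = [0.75, 0.70, 0.65, 0.10, 0.08, 0.05, 0.04, 0.02, 0.01]
# GT "b": several levels fit about equally well (low std).
ious_b = [0.35, 0.33, 0.32, 0.30, 0.30, 0.29, 0.28, 0.27, 0.26]

thr_a = atss_threshold(ious_a)
thr_b = atss_threshold(ious_b)
pos_a = [v for v in ious_a if v >= thr_a]
pos_b = [v for v in ious_b if v >= thr_b]
print(round(thr_a, 3), pos_a)
print(round(thr_b, 3), pos_b)
```

For GT "a" the large spread pushes the threshold to about 0.59, so only the three anchors from the well-fitting level survive. For GT "b" the threshold stays near 0.33, just above the mean, and the two best candidates are kept.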
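Disadvantages:
Even when all candidates are of poor quality (the mean IoU is very low), positives are still forcibly assigned, which easily leads to false detections.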
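A small numeric sketch of this drawback, using invented candidate IoUs: because the threshold is relative to the candidates themselves, a GT that no anchor fits well still gets a positive, whereas a fixed threshold (0.5 is assumed here purely for comparison) would select none.

```python
import statistics

# Hypothetical candidate IoUs for a GT that no anchor fits well.
poor_ious = [0.12, 0.10, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.02]

# Relative threshold: mean + sample standard deviation of the candidates.
threshold = statistics.mean(poor_ious) + statistics.stdev(poor_ious)
forced_positives = [v for v in poor_ious if v >= threshold]

# A fixed IoU threshold would reject every one of these candidates.
fixed_threshold_positives = [v for v in poor_ious if v >= 0.5]

print(round(threshold, 3), forced_positives, fixed_threshold_positives)
```

The relative threshold comes out at about 0.10, so the candidate with IoU 0.12 is accepted as a positive even though it barely overlaps the GT.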