✔️ SSD extracts six feature maps, starting from Conv4_3, with sizes (38,38), (19,19), (10,10), (5,5), (3,3), and (1,1); the number of prior boxes placed on each feature map differs.
✔️ Prior boxes are specified in two aspects: scale (i.e. size) and aspect ratio. The scales follow a linear rule:
as the feature map gets smaller, the prior-box scale increases linearly:
num_priors = 38×38×4 + 19×19×6 + 10×10×6 + 5×5×6 + 3×3×4 + 1×1×4 = 8732
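As a sanity check, both the scale schedule and the box count can be reproduced in a few lines. The snippet below follows the linear min/max-ratio scheme used by the original SSD300 configuration (the variable names are illustrative):

```python
img_size = 300
# Linear scale rule: scale percentages go from 20 to 90 over layers 2..6
min_ratio, max_ratio = 20, 90
step = (max_ratio - min_ratio) // 4                    # 4 intervals -> 17
min_sizes, max_sizes = [], []
for ratio in range(min_ratio, max_ratio + 1, step):    # 20, 37, 54, 71, 88
    min_sizes.append(img_size * ratio // 100)
    max_sizes.append(img_size * (ratio + step) // 100)
# conv4_3 uses a special, smaller scale of 0.1
min_sizes = [img_size * 10 // 100] + min_sizes         # [30, 60, 111, 162, 213, 264]
max_sizes = [img_size * 20 // 100] + max_sizes         # [60, 111, 162, 213, 264, 315]

# total number of prior boxes across all six maps
feature_maps = [38, 19, 10, 5, 3, 1]
boxes_per_cell = [4, 6, 6, 6, 4, 4]
num_priors = sum(f * f * n for f, n in zip(feature_maps, boxes_per_cell))  # 8732
```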
"""需要用到的参数:
min_dim = 300
"输入图最短边的尺寸"
feature_maps = [38, 19, 10, 5, 3, 1]
steps = [8, 16, 32, 64, 100, 300]
"共有6个特征图:
feature_maps指的是在某一层特征图中,遍历一行/列需要的步数
steps指特征图中两像素点相距n则在原图中相距steps[k]*n
由于steps由于网络结构所以为固定,所以作者应该是由300/steps[k]得到feature_maps"
min_sizes = [30, 60, 111, 162, 213, 264]
max_sizes = [60, 111, 162, 213, 264, 315]
"min_sizes和max_sizes共同使用为用于计算aspect_ratios=1时
rel size: sqrt(s_k * s_(k+1))时所用"
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]
"各层除1以外的aspect_ratios,可以看出是各不相同的,
这样每层特征图的每个像素点分别有[4,6,6,6,4,4]个default boxes
作者也在原文中提到这个可以根据自己的场景适当调整"
"""
from itertools import product
from math import sqrt

import torch


class PriorBox(object):
    """
    1. Computes the prior boxes: one set of boxes per feature-map cell.
    2. Total number of boxes: 38×38×4 + 19×19×6 + 10×10×6 + 5×5×6 + 3×3×4 + 1×1×4 = 8732
    3. cfg: SSD parameter configuration, a dict.
    """
    def __init__(self, cfg):
        super(PriorBox, self).__init__()
        self.img_size = cfg['img_size']
        self.feature_maps = cfg['feature_maps']
        self.min_sizes = cfg['min_sizes']
        self.max_sizes = cfg['max_sizes']
        self.steps = cfg['steps']
        self.aspect_ratios = cfg['aspect_ratios']
        self.clip = cfg['clip']
        self.version = cfg['name']
        self.variance = cfg['variance']

    def forward(self):
        mean = []  # holds the box parameters
        # iterate over the multi-scale maps: [38, 19, 10, 5, 3, 1]
        for k, f in enumerate(self.feature_maps):
            # iterate over every cell of the k-th map
            for i, j in product(range(f), repeat=2):
                # effective size of the k-th feature map
                f_k = self.img_size / self.steps[k]
                # center coordinates of each box, normalized to [0, 1]
                # (j indexes columns -> x, i indexes rows -> y)
                cx = (j + 0.5) / f_k
                cy = (i + 0.5) / f_k
                # ratio == 1 produces two square boxes:
                # r == 1, size = s_k
                s_k = self.min_sizes[k] / self.img_size
                mean += [cx, cy, s_k, s_k]
                # r == 1, size = sqrt(s_k * s_(k+1))
                s_k_plus = self.max_sizes[k] / self.img_size
                s_k_prime = sqrt(s_k * s_k_plus)
                mean += [cx, cy, s_k_prime, s_k_prime]
                # ratio != 1 produces rectangular boxes, in pairs r and 1/r
                for r in self.aspect_ratios[k]:
                    mean += [cx, cy, s_k * sqrt(r), s_k / sqrt(r)]
                    mean += [cx, cy, s_k / sqrt(r), s_k * sqrt(r)]
        # convert to a torch tensor of shape [num_priors, 4]
        boxes = torch.tensor(mean).view(-1, 4)
        # clip the outputs to [0, 1]
        if self.clip:
            boxes.clamp_(max=1, min=0)
        return boxes
# quick test
if __name__ == "__main__":
    # SSD300 CONFIGS
    voc = {
        'num_classes': 21,
        'lr_steps': (80000, 100000, 120000),
        'max_iter': 120000,
        'feature_maps': [38, 19, 10, 5, 3, 1],
        'img_size': 300,
        'steps': [8, 16, 32, 64, 100, 300],
        'min_sizes': [30, 60, 111, 162, 213, 264],
        'max_sizes': [60, 111, 162, 213, 264, 315],
        'aspect_ratios': [[2], [2, 3], [2, 3], [2, 3], [2], [2]],
        'variance': [0.1, 0.2],
        'clip': True,
        'name': 'VOC',
    }
    box = PriorBox(voc)
    print('Priors box shape:', box.forward().shape)
    print('Priors box:\n', box.forward())
Priors box shape: torch.Size([8732, 4])
Priors box:
tensor([[0.0133, 0.0133, 0.1000, 0.1000],
[0.0133, 0.0133, 0.1414, 0.1414],
[0.0133, 0.0133, 0.1414, 0.0707],
...,
[0.5000, 0.5000, 0.9612, 0.9612],
[0.5000, 0.5000, 1.0000, 0.6223],
[0.5000, 0.5000, 0.6223, 1.0000]])
✔️ During training we first need to determine which prior box each ground truth in the image is matched to; the bounding box predicted from that prior is then responsible for predicting it.
✔️ SSD matches priors to ground truths by two rules: 1. For each ground truth (gt) in the image, the prior with the highest IoU is matched to it; this guarantees that every gt is matched to some prior. 2. Among the remaining unmatched priors, if a prior's IoU with some gt exceeds a threshold (usually 0.5), that prior is also matched to that gt.
A prior matched to a gt is called a positive sample; a prior matched to no gt is a negative sample.
One gt may match several priors, but each prior matches at most one gt.
If several gts have IoU above the threshold with the same prior, the prior is matched only to the gt with the highest IoU.
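The two rules can be illustrated on a toy IoU matrix (the values are made up for illustration; the tensor calls mirror those used in the match function):

```python
import torch

# Hypothetical IoU matrix: 2 ground-truth boxes x 5 priors
overlaps = torch.tensor([[0.1, 0.7, 0.3, 0.00, 0.0],
                         [0.0, 0.2, 0.6, 0.55, 0.1]])
threshold = 0.5

# Rule 2: each prior takes the gt with which it has the highest IoU
best_truth_overlap, best_truth_idx = overlaps.max(0)   # per prior
# Rule 1: each gt keeps its single best prior, regardless of threshold
best_prior_overlap, best_prior_idx = overlaps.max(1)   # per gt: priors 1 and 2
# force those best priors to stay matched (2 is larger than any real IoU)
best_truth_overlap.index_fill_(0, best_prior_idx, 2)
for j in range(best_prior_idx.size(0)):
    best_truth_idx[best_prior_idx[j]] = j

positive = best_truth_overlap >= threshold
# priors 1, 2, 3 are positives; priors 0 and 4 fall below the threshold
```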
def match(threshold, truths, priors, variances, labels, loc_t, conf_t, idx):
    '''
    Target:
        Match each prior box with the ground truth box of the highest IoU,
        encode the bounding boxes, and write the matched indices,
        confidences and locations into the target tensors.
    Args:
        threshold: IoU threshold; priors below it are labeled as background
        truths: ground truth boxes, shape [N, 4]
        priors: prior boxes, shape [M, 4]
        variances: variances of the priors, list(float)
        labels: class labels of all objects in the image, shape [num_obj]
        loc_t: tensor to be filled with encoded location targets
        conf_t: tensor to be filled with matched class labels
        idx: current batch index
    '''
    overlaps = iou(truths, point_form(priors))  # shape [N, M]
    # [N, 1] the prior box with the highest IoU for each ground truth box
    best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
    # [1, M] the ground truth box with the highest IoU for each prior box
    best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
    # squeeze shapes
    best_prior_idx.squeeze_(1)       # (N)
    best_prior_overlap.squeeze_(1)   # (N)
    best_truth_idx.squeeze_(0)       # (M)
    best_truth_overlap.squeeze_(0)   # (M)
    # ensure each ground truth box matches some prior box;
    # the fill value 2 is an arbitrary constant > threshold
    best_truth_overlap.index_fill_(0, best_prior_idx, 2)  # ensure best prior
    # ensure each ground truth keeps the prior with which it has the highest IoU:
    # use best_prior_idx to overwrite the corresponding entries of best_truth_idx
    for j in range(best_prior_idx.size(0)):
        best_truth_idx[best_prior_idx[j]] = j
    # matched ground truth box for every prior, shape [M, 4]
    matches = truths[best_truth_idx]
    # class label for every prior, shape [M]; +1 because 0 is reserved for background
    conf = labels[best_truth_idx] + 1
    # priors with IoU < threshold are labeled as background, i.e. 0
    conf[best_truth_overlap < threshold] = 0
    # encode the bounding boxes
    loc = encode(matches, priors, variances)
    # write the matched loc and conf into loc_t and conf_t
    loc_t[idx] = loc    # [M, 4] encoded offsets to learn
    conf_t[idx] = conf  # [M] top class label for each prior
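match relies on two helpers that are not shown here, point_form and encode. For completeness, below is a sketch of both, essentially as they appear in the reference ssd.pytorch implementation; the variances (cf. variance = [0.1, 0.2] above) rescale the regression targets:

```python
import torch

def point_form(boxes):
    """Convert [cx, cy, w, h] boxes to [xmin, ymin, xmax, ymax] form."""
    return torch.cat((boxes[:, :2] - boxes[:, 2:] / 2,
                      boxes[:, :2] + boxes[:, 2:] / 2), 1)

def encode(matched, priors, variances):
    """Encode matched gt boxes (point form) as offsets w.r.t. priors (center form)."""
    # offset of the gt center from the prior center, scaled by prior size
    g_cxcy = (matched[:, :2] + matched[:, 2:]) / 2 - priors[:, :2]
    g_cxcy /= (variances[0] * priors[:, 2:])
    # log-ratio of gt size to prior size
    g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
    g_wh = torch.log(g_wh) / variances[1]
    return torch.cat([g_cxcy, g_wh], 1)  # [M, 4]

# a gt that coincides exactly with its prior encodes to (near) zero offsets
priors = torch.tensor([[0.5, 0.5, 0.2, 0.2]])        # center form
offsets = encode(point_form(priors), priors, [0.1, 0.2])
```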