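The role of anchors: an anchor is an auxiliary quantity. It is combined with the bounding box regression output (computed by the RPN Head, covered in the previous lesson) to obtain the coordinates of the predicted proposal boxes.
** My understanding is that the bounding box regression output is relative position information, expressed as coefficients.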
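To make the "relative coefficients" idea concrete, here is a minimal sketch of how such coefficients could be applied to an anchor. It assumes the commonly used Faster R-CNN parameterization (center shifts scaled by the anchor size, width and height scaled by an exponential); the function name `decode_box` and the `(dx, dy, dw, dh)` layout are illustrative assumptions, not something taken from this post's code.

```python
import math


def decode_box(anchor, deltas):
    """Apply regression coefficients (dx, dy, dw, dh) to one anchor.

    anchor is (xmin, ymin, xmax, ymax). The parameterization below is the
    commonly used Faster R-CNN one and is an assumption in this sketch.
    """
    xmin, ymin, xmax, ymax = anchor
    w, h = xmax - xmin, ymax - ymin
    cx, cy = xmin + 0.5 * w, ymin + 0.5 * h
    dx, dy, dw, dh = deltas
    # Center shifts are relative to the anchor size; sizes scale exponentially.
    pred_cx, pred_cy = cx + dx * w, cy + dy * h
    pred_w, pred_h = w * math.exp(dw), h * math.exp(dh)
    return (pred_cx - 0.5 * pred_w, pred_cy - 0.5 * pred_h,
            pred_cx + 0.5 * pred_w, pred_cy + 0.5 * pred_h)


# All-zero coefficients leave the anchor unchanged.
print(decode_box((-16.0, -16.0, 16.0, 16.0), (0.0, 0.0, 0.0, 0.0)))
# dx = 0.5 moves the box right by half of the anchor width.
print(decode_box((-16.0, -16.0, 16.0, 16.0), (0.5, 0.0, 0.0, 0.0)))
```

The point is only that the network never predicts absolute coordinates directly; it predicts adjustments that are meaningful relative to a given anchor.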
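Each size is paired with three aspect ratios, so it produces 3 anchors.
For example, with size=32 and aspect_ratios=(0.5, 1.0, 2.0), 3 anchors are created.
There are 5 sizes, (32, 64, 128, 256, 512), and each size generates 3 anchors with different aspect_ratios, so 15 anchors of different shapes are generated in total.
Continuing with size=32, aspect_ratios=(0.5, 1.0, 2.0) as the example: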
When aspect_ratios=0.5 (the aspect ratio is height / width), let the height be $x$ and the width be $2x$.
The anchor area is $x \cdot 2x = 32^2$; solving the equation gives $x = \frac{32}{\sqrt{2}}$.
So $anchor\_height = x = \frac{32}{\sqrt{2}}$ and $anchor\_width = 2x = 32\sqrt{2}$.
When aspect_ratios=1, $anchor\_height = 32$ and $anchor\_width = 32$.
When aspect_ratios=2, $anchor\_width = x = \frac{32}{\sqrt{2}}$ and $anchor\_height = 2x = 32\sqrt{2}$.
To simplify the computation, we define h_ratios and w_ratios: $h\_ratios = [\frac{1}{\sqrt{2}}, 1, \sqrt{2}], \quad w\_ratios = [\sqrt{2}, 1, \frac{1}{\sqrt{2}}]$
so that:
$anchor\_width = w\_ratios * scales = (\sqrt{2}, 1, \frac{1}{\sqrt{2}}) * 32 = [45.2548, 32.0000, 22.6274]$
$anchor\_height = h\_ratios * scales = (\frac{1}{\sqrt{2}}, 1, \sqrt{2}) * 32 = [22.6274, 32.0000, 45.2548]$
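The derivation above can be checked numerically. The following is a small sketch in plain Python (no PyTorch needed); `anchor_sides` is a hypothetical helper name used only for this illustration.

```python
import math


def anchor_sides(size, aspect_ratios):
    """Return (widths, heights) such that height / width == ratio
    and width * height == size ** 2 for every ratio."""
    h_ratios = [math.sqrt(r) for r in aspect_ratios]
    w_ratios = [1.0 / h for h in h_ratios]
    widths = [w * size for w in w_ratios]
    heights = [h * size for h in h_ratios]
    return widths, heights


widths, heights = anchor_sides(32, (0.5, 1.0, 2.0))
for w, h in zip(widths, heights):
    # Area stays at 32 ** 2 = 1024 while the shape changes.
    print(f"width={w:.4f} height={h:.4f} area={w * h:.1f} ratio={h / w:.2f}")
```

Taking the square root of the ratio for the height and its reciprocal for the width is what keeps the area fixed: the two factors multiply to 1.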
Next, taking the anchor center as [0, 0], we compute the top-left and bottom-right corner coordinates of the 3 anchors (half the width and half the height on each side of the center).
Finally, rounding the coordinates gives:
size=32, aspect_ratios=0.5 anchor: (xmin, ymin, xmax, ymax) = [-23., -11., 23., 11.]
size=32, aspect_ratios=1.0 anchor: (xmin, ymin, xmax, ymax) = [-16., -16., 16., 16.]
size=32, aspect_ratios=2.0 anchor: (xmin, ymin, xmax, ymax) = [-11., -23., 11., 23.]
The code is as follows:
def generate_anchors(self, scales, aspect_ratios, dtype, device):
    scales = torch.as_tensor(scales, dtype=dtype, device=device)
    aspect_ratios = torch.as_tensor(aspect_ratios, dtype=dtype, device=device)
    # aspect ratio = height / width, with the area fixed at scale ** 2
    h_ratios = torch.sqrt(aspect_ratios)
    w_ratios = 1.0 / h_ratios
    # pair every ratio with every scale, then flatten to 1-D
    ws = (w_ratios[:, None] * scales[None, :]).view(-1)
    hs = (h_ratios[:, None] * scales[None, :]).view(-1)
    # (xmin, ymin, xmax, ymax), centered on (0, 0)
    base_anchors = torch.stack([-ws, -hs, ws, hs], dim=1) / 2
    return base_anchors.round()  # round to the nearest integer
In the same way, we can compute the coordinates of the 12 anchors for sizes 64, 128, 256, and 512.
All 15 anchors are listed below:
[ -23., -11., 23., 11.],
[ -16., -16., 16., 16.],
[ -11., -23., 11., 23.],
[ -45., -23., 45., 23.],
[ -32., -32., 32., 32.],
[ -23., -45., 23., 45.],
[ -91., -45., 91., 45.],
[ -64., -64., 64., 64.],
[ -45., -91., 45., 91.],
[-181., -91., 181., 91.],
[-128., -128., 128., 128.],
[ -91., -181., 91., 181.],
[-362., -181., 362., 181.],
[-256., -256., 256., 256.],
[-181., -362., 181., 362.]
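The 15 rows above can be reproduced without PyTorch. This is a minimal sketch in plain Python; `base_anchors` is a hypothetical helper, and it takes a flat tuple of sizes rather than the one-size-per-tuple layout that the class in this post uses.

```python
import math


def base_anchors(sizes, aspect_ratios):
    """Anchors centered on (0, 0) as [xmin, ymin, xmax, ymax], rounded."""
    anchors = []
    for size in sizes:
        for ratio in aspect_ratios:
            h = math.sqrt(ratio) * size   # height / width == ratio
            w = size / math.sqrt(ratio)   # width * height == size ** 2
            # Halve first, then round, in the same order as the post's code.
            anchors.append([round(v / 2) for v in (-w, -h, w, h)])
    return anchors


anchors = base_anchors((32, 64, 128, 256, 512), (0.5, 1.0, 2.0))
print(len(anchors))
for row in anchors:
    print(row)
```

The values come out as Python integers instead of float tensors, but they match the table row for row.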
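So far, our anchors have only been playing among themselves. Next, we need to map them onto the original image.
The approach can be understood as: map every feature-map cell back to a position on the original image, then place all 15 anchors at that position.
Next, a worked example with the details: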
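Suppose the original image has size (10, 12) and the feature map has size (3, 4), both given as (height, width).
The scaling factor (called the stride here) is: stride=(10//3, 12//4)=(3, 3)
The feature-map coordinates along the height and width are mapped to the original image by the stride, as follows:
x-axis coordinates (width, 4 columns): $(0, 1, 2, 3) \times 3 = (0, 3, 6, 9)$
y-axis coordinates (height, 3 rows): $(0, 1, 2) \times 3 = (0, 3, 6)$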
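The stride and shift arithmetic can be sketched in a few lines of plain Python. Following the (height, width) unpacking used in this post's `grid_anchors` code, x runs over the 4 columns and y over the 3 rows; the variable names are illustrative.

```python
image_size = (10, 12)        # (height, width) of the original image
feature_map_size = (3, 4)    # (height, width) of the feature map

# Integer division: how many image pixels one feature-map cell covers.
stride_h = image_size[0] // feature_map_size[0]
stride_w = image_size[1] // feature_map_size[1]

# One shift per column (x) and one per row (y).
shifts_x = [col * stride_w for col in range(feature_map_size[1])]
shifts_y = [row * stride_h for row in range(feature_map_size[0])]

print("stride:", (stride_h, stride_w))
print("shifts_x:", shifts_x)
print("shifts_y:", shifts_y)
```

Because of the floor division, 10 // 3 drops the remainder, so the last row of shifts stops at 6 rather than reaching the bottom edge of the image.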
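Then torch.meshgrid() replicates the x and y coordinates for every cell of the feature map, and the results are flattened.
The call torch.stack([shift_x, shift_y, shift_x, shift_y], dim=1)
stacks them into one row per cell, which serves as the absolute center position of the anchors in the original image.
Adding the relative coordinates computed earlier, which are centered on (0, 0), gives the absolute coordinates of the anchors on the original image.
That is, each of the 12 cell positions is added to the 15 relative anchor coordinates computed earlier, giving 12 * 15 = 180 anchors in total. The 15 relative coordinates are: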
[ -23., -11., 23., 11.]
[ -16., -16., 16., 16.]
[ -11., -23., 11., 23.]
[ -45., -23., 45., 23.]
[ -32., -32., 32., 32.]
[ -23., -45., 23., 45.]
[ -91., -45., 91., 45.]
[ -64., -64., 64., 64.]
[ -45., -91., 45., 91.]
[-181., -91., 181., 91.]
[-128., -128., 128., 128.]
[ -91., -181., 91., 181.]
[-362., -181., 362., 181.]
[-256., -256., 256., 256.]
[-181., -362., 181., 362.]
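As a rough illustration of this broadcasting step, the sketch below places anchors on the grid with plain Python loops. `place_anchors` is a hypothetical name, and only the three size=32 anchors are used to keep the output short; with all 15 anchors the same loop yields 12 * 15 = 180 rows.

```python
def place_anchors(base_anchors, shifts_x, shifts_y):
    """Add every (shift_x, shift_y) to every base anchor.

    Rows first, then columns, then anchors: the same order as flattening
    meshgrid(indexing='ij') output and broadcasting over the anchors.
    """
    placed = []
    for sy in shifts_y:
        for sx in shifts_x:
            for x0, y0, x1, y1 in base_anchors:
                placed.append([x0 + sx, y0 + sy, x1 + sx, y1 + sy])
    return placed


base = [[-23, -11, 23, 11], [-16, -16, 16, 16], [-11, -23, 11, 23]]
all_anchors = place_anchors(base, [0, 3, 6, 9], [0, 3, 6])
print(len(all_anchors))   # 12 cells * 3 anchors
print(all_anchors[0])     # first cell (x=0, y=0), first anchor
print(all_anchors[3])     # second cell (x=3, y=0), first anchor
```

Note that many of the resulting boxes extend outside the image; this step only generates them.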
The code for this part:
def grid_anchors(self, feature_map_size, strides):
    cell_anchors = self.cell_anchors
    grid_height, grid_width = feature_map_size
    stride_height, stride_width = strides
    device = cell_anchors[0].device
    # feature-map column / row indices mapped to image coordinates
    shifts_x = torch.arange(0, grid_width, dtype=torch.float32, device=device) * stride_width
    shifts_y = torch.arange(0, grid_height, dtype=torch.float32, device=device) * stride_height
    # replicate x and y for every cell, then flatten
    shift_y, shift_x = torch.meshgrid([shifts_y, shifts_x], indexing='ij')
    shift_x = shift_x.reshape(-1)
    shift_y = shift_y.reshape(-1)
    # one (x, y, x, y) row per cell, matching (xmin, ymin, xmax, ymax)
    shifts = torch.stack([shift_x, shift_y, shift_x, shift_y], dim=1)
    # broadcast: (cells, 1, 4) + (1, anchors, 4) -> (cells, anchors, 4)
    shifts_anchor = shifts.view(-1, 1, 4) + cell_anchors[0].view(1, -1, 4)
    return shifts_anchor.reshape(-1, 4)  # Tensor(all_num_anchors, 4)
The complete class:
import torch


class AnchorsGenerator(torch.nn.Module):
    def __init__(self, sizes, aspect_ratios):
        # anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
        # aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
        super(AnchorsGenerator, self).__init__()
        self.sizes = sizes
        self.aspect_ratios = aspect_ratios
        self.cell_anchors = None
        self._cache = {}

    def forward(self, image_list, feature_maps):
        feature_map_size = feature_maps.shape[-2:]
        image_size = image_list.tensors.shape[-2:]
        dtype, device = feature_maps.dtype, feature_maps.device
        # stride: how many image pixels one feature-map cell covers
        strides = [torch.tensor(image_size[0] // feature_map_size[0], dtype=torch.int64, device=device),
                   torch.tensor(image_size[1] // feature_map_size[1], dtype=torch.int64, device=device)]
        # 3 anchors per size, concatenated into one (15, 4) tensor
        cell_anchors = [
            self.generate_anchors(sizes, aspect_ratios, dtype, device)
            for sizes, aspect_ratios in zip(self.sizes, self.aspect_ratios)
        ]
        self.cell_anchors = [torch.concat(cell_anchors, dim=0)]
        anchors_over_all_feature_maps = self.grid_anchors(feature_map_size, strides)
        # every image in the batch shares the same set of anchors
        anchors = [anchors_over_all_feature_maps for _ in range(feature_maps.shape[0])]
        return anchors

    def generate_anchors(self, scales, aspect_ratios, dtype, device):
        # type: (List[int], List[float], torch.dtype, torch.device) -> Tensor
        scales = torch.as_tensor(scales, dtype=dtype, device=device)
        aspect_ratios = torch.as_tensor(aspect_ratios, dtype=dtype, device=device)
        h_ratios = torch.sqrt(aspect_ratios)
        w_ratios = 1.0 / h_ratios
        ws = (w_ratios[:, None] * scales[None, :]).view(-1)
        hs = (h_ratios[:, None] * scales[None, :]).view(-1)
        base_anchors = torch.stack([-ws, -hs, ws, hs], dim=1) / 2
        return base_anchors.round()  # round to the nearest integer

    def grid_anchors(self, feature_map_size, strides):
        # type: (torch.Size([int, int]), List[Tensor, Tensor]) -> Tensor
        cell_anchors = self.cell_anchors
        grid_height, grid_width = feature_map_size
        stride_height, stride_width = strides
        device = cell_anchors[0].device
        shifts_x = torch.arange(0, grid_width, dtype=torch.float32, device=device) * stride_width
        shifts_y = torch.arange(0, grid_height, dtype=torch.float32, device=device) * stride_height
        shift_y, shift_x = torch.meshgrid([shifts_y, shifts_x], indexing='ij')
        shift_x = shift_x.reshape(-1)
        shift_y = shift_y.reshape(-1)
        shifts = torch.stack([shift_x, shift_y, shift_x, shift_y], dim=1)
        shifts_anchor = shifts.view(-1, 1, 4) + cell_anchors[0].view(1, -1, 4)
        return shifts_anchor.reshape(-1, 4)  # Tensor(all_num_anchors, 4)