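In an earlier post I covered how mmdetection creates models. This time I take Cascade R-CNN as an example and look at how the network is actually built.
Before getting to the network, look at the config file first. I mainly follow the official cascade_mask_rcnn_r50_fpn_1x.py here; for what each config option means, see the post that explains the parameters in mmdetection's configs.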
Building the Cascade R-CNN network
First, find the file that defines Cascade R-CNN: mmdet/models/detectors/cascade_rcnn.py
I split the construction of the network into five main parts:
- backbone
- neck
- rpn_head
- bbox_head
- mask_head
backbone
Cascade R-CNN uses ResNet-50 as its backbone. The backbone is created the same way as before: the supported models are registered in the registry and then instantiated through the builder.
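The register-then-build pattern can be sketched in a few lines of plain Python. This is only an illustration of the idea: `Registry`, `build_from_cfg`, and `ToyResNet` below are hypothetical stand-ins written for this post, not mmdetection's actual classes or signatures.

```python
class Registry:
    """Hypothetical minimal registry: maps a class name to the class."""

    def __init__(self, name):
        self.name = name
        self.module_dict = {}

    def register_module(self, cls):
        # Used as a decorator; stores the class under its own name.
        self.module_dict[cls.__name__] = cls
        return cls


BACKBONES = Registry('backbone')


@BACKBONES.register_module
class ToyResNet:
    """Stand-in for a real backbone; it only records its arguments."""

    def __init__(self, depth, out_indices=(0, 1, 2, 3)):
        self.depth = depth
        self.out_indices = out_indices


def build_from_cfg(cfg, registry):
    """Instantiate the class named by cfg['type'] with the remaining keys."""
    args = dict(cfg)  # copy so the caller's config dict is left untouched
    cls = registry.module_dict[args.pop('type')]
    return cls(**args)


backbone = build_from_cfg(dict(type='ToyResNet', depth=50), BACKBONES)
```

The point of the pattern is that the config only carries a string in `type`, so swapping the backbone means editing the config rather than the detector code.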
ResNet is defined in mmdet/models/backbones/resnet.py:
def forward(self, x):
    x = self.conv1(x)
    x = self.norm1(x)
    x = self.relu(x)
    x = self.maxpool(x)
    outs = []
    for i, layer_name in enumerate(self.res_layers):
        res_layer = getattr(self, layer_name)
        x = res_layer(x)
        if i in self.out_indices:
            outs.append(x)
    if len(outs) == 1:
        return outs[0]
    else:
        return tuple(outs)
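In forward, outs collects the outputs of several stages: they are first appended to a list and then converted to a tuple. Which stages are taken is decided by out_indices in the config: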
model = dict(
    type='CascadeRCNN',
    num_stages=3,
    pretrained='modelzoo://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
The backbone has 4 stages, and the outputs of all of them are used (out_indices=(0, 1, 2, 3)).
The backbone's main job is to extract image features.
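To make the effect of out_indices concrete, here is a small shape calculator. It is a sketch under the assumption of the standard ResNet-50 layout (the stem downsamples by 4, stage i has 256 * 2**i output channels and a total stride of 4 * 2**i); the helper name `resnet50_stage_shapes` is made up for this post and the 800x1216 input is just an example size.

```python
def resnet50_stage_shapes(height, width, out_indices=(0, 1, 2, 3)):
    """Return (channels, h, w) for each selected ResNet-50 stage.

    Assumes the standard layout: stage i has 256 * 2**i channels and a
    total stride of 4 * 2**i, with sizes rounded up at each downsampling.
    """
    shapes = []
    for i in range(4):
        stride = 4 * 2 ** i
        shape = (256 * 2 ** i,
                 (height + stride - 1) // stride,   # ceil division
                 (width + stride - 1) // stride)
        if i in out_indices:                        # same test as in forward
            shapes.append(shape)
    return tuple(shapes)


shapes = resnet50_stage_shapes(800, 1216)
for channels, h, w in shapes:
    print(channels, h, w)
```

The channel counts of the four selected stages are exactly the in_channels list that the neck expects.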
neck
This part mainly implements FPN (see the separate FPN explainer).
First, the FPN-related part of the config file:
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
in_channels matches the backbone outputs above, and out_channels is the output dimension (number of channels).
FPN is defined in mmdet/models/necks/fpn.py. In its __init__ method:
for i in range(self.start_level, self.backbone_end_level):
    l_conv = ConvModule(
        in_channels[i],
        out_channels,
        1,
        normalize=normalize,
        bias=self.with_bias,
        activation=self.activation,
        inplace=False)
    fpn_conv = ConvModule(
        out_channels,
        out_channels,
        3,
        padding=1,
        normalize=normalize,
        bias=self.with_bias,
        activation=self.activation,
        inplace=False)
    self.lateral_convs.append(l_conv)
    self.fpn_convs.append(fpn_conv)
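Here self.start_level is 0 and self.backbone_end_level is len(in_channels), so lateral_convs and fpn_convs have the same length as the list of inputs.
One way to read this: the backbone outputs feature maps from several levels, and each level gets its own ConvModule that brings it to the same number of channels. Together with the top-down addition in forward, this is what fuses high- and low-level features. It may sound convoluted, but it is easier to follow alongside the code.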
Below is part of the forward function.
# build laterals
laterals = [
    lateral_conv(inputs[i + self.start_level])
    for i, lateral_conv in enumerate(self.lateral_convs)
]
# part 1: from original levels
outs = [
    self.fpn_convs[i](laterals[i]) for i in range(used_backbone_levels)
]
This part can still be seen as feature extraction; actual object detection starts with the RPN part below.
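The forward excerpt above skips the step between "build laterals" and "part 1", where FPN merges levels top-down: each coarser map is upsampled by 2 and added to the next finer one. The toy below illustrates only that idea on single-channel nested lists. The function names are made up for this post, and it assumes the lateral 1x1 convs have already been applied; it is not mmdetection's code.

```python
def upsample_nearest_2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D list."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def top_down_merge(laterals):
    """Add each coarser level, upsampled 2x, into the next finer level.

    `laterals` is ordered fine -> coarse, and each level is half the
    size of the previous one.
    """
    merged = [[list(row) for row in level] for level in laterals]
    for i in range(len(merged) - 1, 0, -1):
        up = upsample_nearest_2x(merged[i])
        merged[i - 1] = [
            [a + b for a, b in zip(row_fine, row_up)]
            for row_fine, row_up in zip(merged[i - 1], up)
        ]
    return merged


laterals = [
    [[1] * 4 for _ in range(4)],   # finest level, 4x4
    [[10] * 2 for _ in range(2)],  # middle level, 2x2
    [[100]],                       # coarsest level, 1x1
]
merged = top_down_merge(laterals)
```

After the merge the finest map carries contributions from every coarser level, which is the "fusion" the text refers to; the 3x3 fpn_convs are then applied to the merged maps.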
RPN HEAD
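At first glance the rpn_head of Cascade R-CNN looks fairly simple, since it consists mainly of two small networks. Two files are involved: mmdet/models/anchor_head/anchor_head.py and mmdet/models/anchor_head/rpn_head.py. The class in the latter is a subclass of the one in the former.
First, the related config items: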
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        use_sigmoid_cls=True),
The main implementation of rpn_head is as follows:
# define the layers
def _init_layers(self):
    self.rpn_conv = nn.Conv2d(
        self.in_channels, self.feat_channels, 3, padding=1)
    self.rpn_cls = nn.Conv2d(self.feat_channels,
                             self.num_anchors * self.cls_out_channels, 1)
    self.rpn_reg = nn.Conv2d(self.feat_channels, self.num_anchors * 4, 1)

# forward pass for a single feature level
def forward_single(self, x):
    x = self.rpn_conv(x)
    x = F.relu(x, inplace=True)
    rpn_cls_score = self.rpn_cls(x)
    rpn_bbox_pred = self.rpn_reg(x)
    return rpn_cls_score, rpn_bbox_pred
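It is simple: there are only two branches, one that judges whether an anchor is foreground (rpn_cls) and one that predicts the box adjustments (rpn_reg). Here self.num_anchors = len(self.anchor_ratios) * len(self.anchor_scales), which is 3 * 1 = 3 with the config above.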
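As a rough illustration of what these config values mean, the sketch below computes the width and height of the base anchors per level. It assumes the base size of each level equals its stride and that a ratio is height divided by width; the helper `base_anchor_sizes` is written for this post and only mirrors the usual anchor construction, not mmdetection's exact code.

```python
import math


def base_anchor_sizes(base_size, scales, ratios):
    """Return (w, h) for every ratio/scale pair, with ratio = h / w."""
    sizes = []
    for ratio in ratios:
        h_ratio = math.sqrt(ratio)
        w_ratio = 1.0 / h_ratio
        for scale in scales:
            sizes.append((base_size * scale * w_ratio,
                          base_size * scale * h_ratio))
    return sizes


anchor_scales = [8]
anchor_ratios = [0.5, 1.0, 2.0]
anchor_strides = [4, 8, 16, 32, 64]

# Anchors per feature-map location, as in self.num_anchors.
num_anchors = len(anchor_ratios) * len(anchor_scales)

# Assumption: the base size of a level is its stride.
per_level = {
    stride: base_anchor_sizes(stride, anchor_scales, anchor_ratios)
    for stride in anchor_strides
}
for stride, sizes in per_level.items():
    print(stride, [(round(w, 1), round(h, 1)) for w, h in sizes])
```

Each level therefore handles one object scale (side length stride * 8 for the square anchor), and the three ratios only change the shape while keeping the area fixed.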
But the goal of the RPN is to produce proposals, so another function in anchor_head.py is needed, get_bboxes():
def get_bboxes(self, cls_scores, bbox_preds, img_metas, cfg,
               rescale=False):
    assert len(cls_scores) == len(bbox_preds)
    num_levels = len(cls_scores)
    mlvl_anchors = [
        self.anchor_generators[i].grid_anchors(cls_scores[i].size()[-2:], self.anchor_strides[i])
        for i in range(num_levels)
    ]
    result_list = []
    for img_id in range(len(img_metas)):
        cls_score_list = [
            cls_scores[i][img_id].detach() for i in range(num_levels)
        ]
        bbox_pred_list = [
            bbox_preds[i][img_id].detach() for i in range(num_levels)
        ]
        img_shape = img_metas[img_id]['img_shape']
        scale_factor = img_metas[img_id]['scale_factor']
        proposals = self.get_bboxes_single(cls_score_list, bbox_pred_list,
                                           mlvl_anchors, img_shape,
                                           scale_factor, cfg, rescale)
        result_list.append(proposals)
    return result_list
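Here self.anchor_generators[i].grid_anchors() first produces all the anchor boxes, and then self.get_bboxes_single() uses the RPN outputs from before to obtain the proposal boxes.
Roughly speaking, self.get_bboxes_single() first takes 2000 anchors at each scale level and concatenates them as the anchors for that image; applying NMS (thr=0.7) to these boxes then gives the required proposals.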
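The select-then-suppress step can be sketched in plain Python as a top-k by score followed by greedy NMS. This is a simplified illustration and not the code of get_bboxes_single: the function names are made up, boxes are (x1, y1, x2, y2) tuples, and the defaults nms_pre=2000 and nms_thr=0.7 simply reuse the numbers mentioned above.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def select_proposals(boxes, scores, nms_pre=2000, nms_thr=0.7):
    """Keep the nms_pre best boxes, then run greedy NMS on them.

    Returns the indices of the kept boxes, best score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    order = order[:nms_pre]
    keep = []
    for i in order:
        # Drop box i if it overlaps too much with a better kept box.
        if all(iou(boxes[i], boxes[j]) <= nms_thr for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 9)]
scores = [0.9, 0.8, 0.7, 0.6]
keep = select_proposals(boxes, scores)
```

In this example the last box overlaps the first with IoU 0.9 and is suppressed, while the second box (IoU about 0.68 with the first) survives the 0.7 threshold.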
This part also has its own loss, which is fairly involved, so I'll cover it later together with the other losses.
assigners and samplers
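The previous rpn step outputs a pile of proposals, but before they can be used for training they have to be split into positive and negative samples. That is the job of the assigners.
cascade_rcnn uses MaxIoUAssigner by default, defined in mmdet/core/bbox/assigners/max_iou_assigner.py.
The main method is assign():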
def assign(self, bboxes, gt_bboxes, gt_bboxes_ignore=None, gt_labels=None):
    """Assign gt to bboxes.

    This method assign a gt bbox to every bbox (proposal/anchor), each bbox
    will be assigned with -1, 0, or a positive number. -1 means don't care,
    0 means negative sample, positive number is the index (1-based) of
    assigned gt.
    The assignment is done in following steps, the order matters.

    1. assign every bbox to -1
    2. assign proposals whose iou with all gts < neg_iou_thr to 0
    3. for each bbox, if the iou with its nearest gt >= pos_iou_thr,
       assign it to that bbox
    4. for each gt bbox, assign its nearest proposals (may be more than
       one) to itself

    Args:
        bboxes (Tensor): Bounding boxes to be assigned, shape(n, 4).
        gt_bboxes (Tensor): Groundtruth boxes, shape (k, 4).
        gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are
            labelled as `ignored`, e.g., crowd boxes in COCO.
        gt_labels (Tensor, optional): Label of gt_bboxes, shape (k, ).

    Returns:
        :obj:`AssignResult`: The assign result.
    """
    if bboxes.shape[0] == 0 or gt_bboxes.shape[0] == 0:
        raise ValueError('No gt or bboxes')
    bboxes = bboxes[:, :4]
    overlaps = bbox_overlaps(gt_bboxes, bboxes)
    if (self.ignore_iof_thr > 0) and (gt_bboxes_ignore is not None) and (
            gt_bboxes_ignore.numel() > 0):
        if self.ignore_wrt_candidates:
            ignore_overlaps = bbox_overlaps(
                bboxes, gt_bboxes_ignore, mode='iof')
            ignore_max_overlaps, _ = ignore_overlaps.max(dim=1)
        else:
            ignore_overlaps = bbox_overlaps(
                gt_bboxes_ignore, bboxes, mode='iof')
            ignore_max_overlaps, _ = ignore_overlaps.max(dim=0)
        overlaps[:, ignore_max_overlaps > self.ignore_iof_thr] = -1
    assign_result = self.assign_wrt_overlaps(overlaps, gt_labels)
    return assign_result
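The four steps listed in the docstring can be mirrored in a few lines of plain Python. This is a sketch that follows the docstring, not the body of assign_wrt_overlaps: the function `assign_by_max_iou` is made up for this post, and the thresholds (0.7, 0.3, 0.3) are example values chosen for the demo rather than values taken from the config discussed here.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def assign_by_max_iou(bboxes, gt_bboxes, pos_iou_thr=0.7, neg_iou_thr=0.3,
                      min_pos_iou=0.3):
    """Return one value per bbox: -1 ignore, 0 negative, k > 0 = gt index."""
    overlaps = [[iou(gt, box) for box in bboxes] for gt in gt_bboxes]
    assigned = [-1] * len(bboxes)                    # step 1
    for j in range(len(bboxes)):
        column = [row[j] for row in overlaps]
        best = max(column)
        if best < neg_iou_thr:                       # step 2
            assigned[j] = 0
        if best >= pos_iou_thr:                      # step 3
            assigned[j] = column.index(best) + 1
    for g, row in enumerate(overlaps):               # step 4
        best = max(row)
        if best >= min_pos_iou:
            for j, value in enumerate(row):
                if value == best:
                    assigned[j] = g + 1
    return assigned


gt_bboxes = [(0, 0, 10, 10), (100, 100, 120, 120)]
bboxes = [(0, 0, 10, 10), (0, 0, 10, 5), (50, 50, 60, 60),
          (100, 100, 120, 112)]
assigned = assign_by_max_iou(bboxes, gt_bboxes)
```

In the demo the second box (IoU 0.5) stays at -1 because it is neither clearly positive nor clearly negative, and the last box (IoU 0.6) becomes positive only through step 4, as the best match of the second ground-truth box.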
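After the proposals are split into positive and negative samples, a sampler samples from them to produce the sampling result (SamplingResult) used for training.
cascade_rcnn uses RandomSampler by default, defined in mmdet/core/bbox/sampler/random_sampler.py: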
@staticmethod
def random_choice(gallery, num):
    """Random select some elements from the gallery.

    It seems that Pytorch's implementation is slower than numpy so we use
    numpy to randperm the indices.
    """
    assert len(gallery) >= num
    if isinstance(gallery, list):
        gallery = np.array(gallery)
    cands = np.arange(len(gallery))
    np.random.shuffle(cands)
    rand_inds = cands[:num]
    if not isinstance(gallery, np.ndarray):
        rand_inds = torch.from_numpy(rand_inds).long().to(gallery.device)
    return gallery[rand_inds]

def _sample_pos(self, assign_result, num_expected, **kwargs):
    """Randomly sample some positive samples."""
    pos_inds = torch.nonzero(assign_result.gt_inds > 0)
    if pos_inds.numel() != 0:
        pos_inds = pos_inds.squeeze(1)
    if pos_inds.numel() <= num_expected:
        return pos_inds
    else:
        return self.random_choice(pos_inds, num_expected)

def _sample_neg(self, assign_result, num_expected, **kwargs):
    """Randomly sample some negative samples."""
    neg_inds = torch.nonzero(assign_result.gt_inds == 0)
    if neg_inds.numel() != 0:
        neg_inds = neg_inds.squeeze(1)
    if len(neg_inds) <= num_expected:
        return neg_inds
    else:
        return self.random_choice(neg_inds, num_expected)
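It overrides the two sampling functions for the parent class to call.
The method that does most of the work is sample, defined in the parent class in mmdet/core/bbox/sampler/base_sampler.py: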
def sample(self,
           assign_result,
           bboxes,
           gt_bboxes,
           gt_labels=None,
           **kwargs):
    """Sample positive and negative bboxes.

    This is a simple implementation of bbox sampling given candidates,
    assigning results and ground truth bboxes.

    Args:
        assign_result (:obj:`AssignResult`): Bbox assigning results.
        bboxes (Tensor): Boxes to be sampled from.
        gt_bboxes (Tensor): Ground truth bboxes.
        gt_labels (Tensor, optional): Class labels of ground truth bboxes.

    Returns:
        :obj:`SamplingResult`: Sampling result.
    """
    bboxes = bboxes[:, :4]
    gt_flags = bboxes.new_zeros((bboxes.shape[0], ), dtype=torch.uint8)
    if self.add_gt_as_proposals:
        bboxes = torch.cat([gt_bboxes, bboxes], dim=0)
        assign_result.add_gt_(gt_labels)
        gt_ones = bboxes.new_ones(gt_bboxes.shape[0], dtype=torch.uint8)
        gt_flags = torch.cat([gt_ones, gt_flags])
    num_expected_pos = int(self.num * self.pos_fraction)
    pos_inds = self.pos_sampler._sample_pos(
        assign_result, num_expected_pos, bboxes=bboxes, **kwargs)
    # We found that sampled indices have duplicated items occasionally.
    # (may be a bug of PyTorch)
    pos_inds = pos_inds.unique()
    num_sampled_pos = pos_inds.numel()
    num_expected_neg = self.num - num_sampled_pos
    if self.neg_pos_ub >= 0:
        _pos = max(1, num_sampled_pos)
        neg_upper_bound = int(self.neg_pos_ub * _pos)
        if num_expected_neg > neg_upper_bound:
            num_expected_neg = neg_upper_bound
    neg_inds = self.neg_sampler._sample_neg(
        assign_result, num_expected_neg, bboxes=bboxes, **kwargs)
    neg_inds = neg_inds.unique()
    return SamplingResult(pos_inds, neg_inds, bboxes, gt_bboxes,
                          assign_result, gt_flags)
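The bookkeeping in sample (how many positives and negatives end up in the batch) is easier to see without tensors. The sketch below reproduces only that arithmetic with Python lists: `sample_indices` is a made-up helper, and num=512 and pos_fraction=0.25 are example values picked for the demo, not values quoted from the config in this post.

```python
import random


def sample_indices(gt_inds, num=512, pos_fraction=0.25, neg_pos_ub=-1,
                   seed=0):
    """Pick positive and negative indices the way sample() budgets them."""
    rng = random.Random(seed)
    pos = [i for i, g in enumerate(gt_inds) if g > 0]
    neg = [i for i, g in enumerate(gt_inds) if g == 0]

    num_expected_pos = int(num * pos_fraction)
    if len(pos) > num_expected_pos:
        pos = rng.sample(pos, num_expected_pos)

    # Negatives fill whatever the positives left over ...
    num_expected_neg = num - len(pos)
    if neg_pos_ub >= 0:
        # ... optionally capped at neg_pos_ub negatives per positive.
        upper = int(neg_pos_ub * max(1, len(pos)))
        num_expected_neg = min(num_expected_neg, upper)
    if len(neg) > num_expected_neg:
        neg = rng.sample(neg, num_expected_neg)
    return sorted(pos), sorted(neg)


# 200 positives, 1000 negatives, 50 ignored boxes.
gt_inds = [1] * 200 + [0] * 1000 + [-1] * 50
pos_inds, neg_inds = sample_indices(gt_inds)
```

Note that when there are fewer positives than the budget, the shortfall goes to the negatives, unless neg_pos_ub limits the ratio.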
The bboxes are now ready; next they are sent to the bbox head and the mask head.
bbox head
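Of course, the boxes obtained above cannot be fed to the bbox head directly. Before that, an RoI pooling step maps boxes of different sizes to a fixed size.
It is defined in mmdet/models/roi_extractors/single_level.py: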
def forward(self, feats, rois):
    if len(feats) == 1:
        return self.roi_layers[0](feats[0], rois)
    out_size = self.roi_layers[0].out_size
    num_levels = len(feats)
    target_lvls = self.map_roi_levels(rois, num_levels)
    roi_feats = torch.cuda.FloatTensor(rois.size()[0], self.out_channels,
                                       out_size, out_size).fill_(0)
    for i in range(num_levels):
        inds = target_lvls == i
        if inds.any():
            rois_ = rois[inds, :]
            roi_feats_t = self.roi_layers[i](feats[i], rois_)
            roi_feats[inds] += roi_feats_t
    return roi_feats
The roi_layers here use RoIAlign, and the resulting RoI features can then be sent to the bbox head.
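The call to map_roi_levels in the forward above decides which FPN level each RoI is pooled from. A common rule, and as far as I can tell the one used here, maps an RoI to a level by the log2 of its scale relative to a finest_scale. The sketch below is a plain-Python version of that rule; finest_scale=56 is an assumed default, the "+ 1" in the width and height is an assumed pixel convention, and the RoIs are plain (x1, y1, x2, y2) tuples without the batch-index column.

```python
import math


def map_roi_levels(rois, num_levels, finest_scale=56):
    """Map each RoI to a feature level index in [0, num_levels - 1].

    scale < 2 * finest_scale       -> level 0
    scale < 4 * finest_scale       -> level 1
    scale < 8 * finest_scale       -> level 2
    larger                         -> level 3 (or the last level)
    """
    levels = []
    for x1, y1, x2, y2 in rois:
        scale = math.sqrt((x2 - x1 + 1) * (y2 - y1 + 1))
        level = math.floor(math.log2(scale / finest_scale + 1e-6))
        levels.append(min(max(level, 0), num_levels - 1))
    return levels


rois = [(0, 0, 31, 31), (0, 0, 149, 149), (0, 0, 299, 299),
        (0, 0, 999, 999)]
levels = map_roi_levels(rois, num_levels=4)
```

Small boxes are pooled from the high-resolution levels and large boxes from the coarse ones, which is why the forward loop visits every level and fills in only the RoIs assigned to it.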
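The bbox head works much like the rpn part above: for each box it does classification and coordinate refinement. The rpn only distinguishes foreground from background, while here there are N+1 classes (the actual classes plus background). The code is in mmdet/models/bbox_head/convfc_bbox_head.py: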
def forward(self, x):
    # shared part
    if self.num_shared_convs > 0:
        for conv in self.shared_convs:
            x = conv(x)
    if self.num_shared_fcs > 0:
        if self.with_avg_pool:
            x = self.avg_pool(x)
        x = x.view(x.size(0), -1)
        for fc in self.shared_fcs:
            x = self.relu(fc(x))
    # separate branches
    x_cls = x
    x_reg = x
    for conv in self.cls_convs:
        x_cls = conv(x_cls)
    if x_cls.dim() > 2:
        if self.with_avg_pool:
            x_cls = self.avg_pool(x_cls)
        x_cls = x_cls.view(x_cls.size(0), -1)
    for fc in self.cls_fcs:
        x_cls = self.relu(fc(x_cls))
    for conv in self.reg_convs:
        x_reg = conv(x_reg)
    if x_reg.dim() > 2:
        if self.with_avg_pool:
            x_reg = self.avg_pool(x_reg)
        x_reg = x_reg.view(x_reg.size(0), -1)
    for fc in self.reg_fcs:
        x_reg = self.relu(fc(x_reg))
    cls_score = self.fc_cls(x_cls) if self.with_cls else None
    bbox_pred = self.fc_reg(x_reg) if self.with_reg else None
    return cls_score, bbox_pred
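The output of forward is the classification score and the box coordinates (the regression output) for each box.
These two results are then used to compute bbox_loss, which I'll also leave for a later post.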
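To give a feel for what "coordinate refinement" means, here is a sketch of how a predicted (dx, dy, dw, dh) is typically applied to a proposal in the standard R-CNN parameterization. It is a simplified illustration, not mmdetection's decoding function: `apply_deltas` is a made-up name, the stds are example values, and pixel-offset conventions are ignored.

```python
import math


def apply_deltas(box, deltas, means=(0.0, 0.0, 0.0, 0.0),
                 stds=(0.1, 0.1, 0.2, 0.2)):
    """Refine an (x1, y1, x2, y2) box with normalized regression deltas."""
    # Undo the normalization applied to the regression targets.
    dx, dy, dw, dh = (d * s + m for d, m, s in zip(deltas, means, stds))
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    # Shift the centre proportionally to the box size, rescale w and h.
    new_cx, new_cy = cx + w * dx, cy + h * dy
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


proposal = (10.0, 20.0, 50.0, 80.0)
unchanged = apply_deltas(proposal, (0.0, 0.0, 0.0, 0.0))
shifted = apply_deltas(proposal, (1.0, 0.0, 0.0, 0.0))
```

With all-zero deltas the proposal is returned unchanged, and a dx of 1.0 with std 0.1 moves the box right by 10% of its width.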
Next comes the mask head, the other branch that runs in parallel with the bbox head.
mask head
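The mask part follows the same flow as the bbox part: the proposals first go through RoI pooling. The RoI network here is the same kind as the one used for the bbox branch, with only some parameters changed.
It is defined in mmdet/models/mask_heads/fcn_mask_head.py: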
def forward(self, x):
    for conv in self.convs:
        x = conv(x)
    if self.upsample is not None:
        x = self.upsample(x)
        if self.upsample_method == 'deconv':
            x = self.relu(x)
    mask_pred = self.conv_logits(x)
    return mask_pred
The output of forward is a classification value for each pixel, and this result is likewise used to compute the mask loss.
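As a small illustration of "a classification value for each pixel": one common way to read such logits is to pass them through a sigmoid and threshold the result. The sketch below does that on a toy 2x2 map. The helper name and the 0.5 threshold are assumptions for this post, and the step of picking the channel of the predicted class is left out.

```python
import math


def binarize_mask(mask_logits, thr=0.5):
    """Turn per-pixel logits into a 0/1 mask via sigmoid + threshold."""
    return [
        [1 if 1.0 / (1.0 + math.exp(-v)) > thr else 0 for v in row]
        for row in mask_logits
    ]


logits = [[-2.0, 0.3],
          [4.0, -0.1]]
mask = binarize_mask(logits)
```

With a threshold of 0.5 this is the same as checking the sign of the logit, so only the pixels with positive logits end up in the mask.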
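Neither the forward output of the bbox head nor that of this part is the final result at test time; further processing is needed to get the test results. I'll write about that when I cover testing.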
Summary
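This post covered how cascade_rcnn is built in mmdetection. My original plan was to dig into the details of every part, but once I started reading the code it turned out to be too complex, so I first wrote up the overall flow, which amounts to the skeleton. Once the loss and testing parts are written, I'll come back and work through the details of each part.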