References:
mmdetection detection training and source-code walkthrough
Source-code walkthrough
Image object detection: Cascade R-CNN in practice
Adding focal loss to mmdetection
Training mmdetection on your own data
Training a custom dataset
$ virtualenv myproject
$ source myproject/bin/activate
The first command creates an isolated virtualenv environment in the myproject folder; the second command activates that isolated environment (virtualenv).
When creating a virtualenv you have to decide: should this virtualenv use the system-wide global packages, or only the packages installed inside the virtualenv?
By default, virtualenv does not use the global system packages.
If you want your virtualenv to see the global system packages, create it with the --system-site-packages flag, for example:
virtualenv --system-site-packages mycoolproject
To leave the virtualenv, run:
$ deactivate
References:
Training mmdetection on a VOC-format dataset
Reference 2
mmdetection
├── mmdet
├── tools
├── configs
├── data # create the data, VOCdevkit, VOC2007, Annotations, JPEGImages, ImageSets, Main folders by hand
│ ├── VOCdevkit
│ │ ├── VOC2007
│ │ │ ├── Annotations # put the xml files corresponding to test.txt and trainval.txt here
│ │ │ ├── JPEGImages # put the images corresponding to test.txt and trainval.txt here
│ │ │ ├── ImageSets
│ │ │ │ ├── Main
│ │ │ │ │ ├── test.txt (one test image name per line, without extension and without path)
│ │ │ │ │ ├── trainval.txt (one training image name per line, without extension and without path)
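A small sketch of my own (not from the original post) for producing trainval.txt and test.txt in exactly this form: it lists the xml files under Annotations and writes the bare file names (no extension, no path) into ImageSets/Main, assuming a simple random 80/20 split.

import os
import random

voc_root = 'data/VOCdevkit/VOC2007'
# collect file stems from the Annotations folder
stems = [os.path.splitext(f)[0]
         for f in os.listdir(os.path.join(voc_root, 'Annotations')) if f.endswith('.xml')]
random.shuffle(stems)
split = int(len(stems) * 0.8)  # 80% trainval, 20% test; adjust as needed

main_dir = os.path.join(voc_root, 'ImageSets', 'Main')
os.makedirs(main_dir, exist_ok=True)
with open(os.path.join(main_dir, 'trainval.txt'), 'w') as f:
    f.write('\n'.join(stems[:split]) + '\n')
with open(os.path.join(main_dir, 'test.txt'), 'w') as f:
    f.write('\n'.join(stems[split:]) + '\n')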
In the figure above, train=dict(...) is the configuration used during training; the same kind of configuration also has to be written for val and test.
Regarding the bold passage above: in my own training I have a single GPU and set it to train one image at a time, i.e. imgs_per_gpu=1. Part of the terminal output during training looks like this:
Epoch [1][50/4325] lr: 0.00100 s0.acc: 86.8906,s1.acc: 92.4492,s2.acc: 91.5039,loss: 1.6009
Epoch [1][100/4325] lr: 0.00116,s0.acc: 83.5664,s1.acc: 86.8912,s2.acc: 93.6975
Epoch [1][150/4325] lr: 0.00133,
Epoch [1][450/4325] lr: 0.00233
Epoch [1][500/4325] lr: 0.00250,
Epoch [1][550/4325] lr: 0.00250,
Epoch [1][4250/4325] lr: 0.00250,s0.acc: 91.8164,s1.acc: 91.8085,s2.acc: 90.3332,
Epoch [1][4300/4325] lr: 0.00250
Epoch [2][50/4325] lr: 0.00250,s0.acc: 92.3086,s1.acc: 93.0722,s2.acc: 91.9551,
Epoch [2][4250/4325] lr: 0.00250,s0.acc: 93.2656,s1.acc: 94.2887,s2.acc: 92.8941,
Epoch [2][4300/4325] lr: 0.00250
Epoch [3][50/4325] lr: 0.00250,
Epoch [3][4300/4325] lr: 0.00250, s0.acc: 94.0078,s1.acc: 94.5620,s2.acc: 93.4796,
Epoch [8][4300/4325] lr: 0.00250, the learning rate is unchanged through epoch 8
Epoch [9][250/4325] lr: 0.00025, the learning rate drops at epoch 9
Epoch [10][1900/4325] lr: 0.00025, the learning rate stays at the lower value through epoch 10
Epoch [11][950/4325] lr: 0.00025, the learning rate is still at the lower value in epoch 11
Epoch [12][4300/4325] lr: 0.00003, s1.acc: 98.8292, s2.acc: 98.5651, loss: 0.1845 (the learning rate drops again in the last epoch)
A pattern can be seen in this output:
lr_config = dict(
    policy='step',         # lr schedule policy
    warmup='linear',       # how the initial learning rate ramps up; 'linear' means a linear increase
    warmup_iters=500,      # the learning rate ramps up over the first 500 iterations
    warmup_ratio=1.0 / 3,  # starting learning rate as a fraction of the base lr
    step=[8, 11])          # drop the learning rate after epoch 8 and again after epoch 11 (i.e. from epoch 9 and from epoch 12, as seen in the log above)
The learning rate later settles at 0.0025 because of this line:
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)
checkpoint_config = dict(interval=1)  # save the model once per epoch
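As a sanity check on the learning-rate values printed in the log above, here is a small sketch of my own (assuming mmcv's linear-warmup formula lr = base_lr * (1 - (1 - iter/warmup_iters) * (1 - warmup_ratio)) and the default step decay factor gamma = 0.1); it reproduces the printed values:

base_lr = 0.0025
warmup_iters, warmup_ratio = 500, 1.0 / 3

def warmup_lr(i):
    # assumed mmcv linear warmup formula
    return base_lr * (1 - (1 - i / warmup_iters) * (1 - warmup_ratio))

print(round(warmup_lr(50), 5))   # 0.001   -> matches "Epoch [1][50/4325] lr: 0.00100"
print(round(warmup_lr(100), 5))  # 0.00117 -> the log prints 0.00116 (rounding)
print(round(warmup_lr(500), 5))  # 0.0025  -> warmup finished, lr stays at the base value

gamma = 0.1  # assumed default decay factor for the 'step' policy
print(base_lr * gamma)       # 0.00025 -> seen from epoch 9 on (step at epoch 8)
print(base_lr * gamma ** 2)  # 2.5e-05 -> printed as 0.00003 in epoch 12 (step at epoch 11)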
Excerpt: the 8684 after the epoch number is the number of iterations (batches) per epoch, not the total image count. The training set actually has 69472 images; only 8684 shows up because mmdetection by default trains 2 images per GPU and that author used 4 GPUs, so one batch is 2 * 4 = 8 images, and 69472 / 8 = 8684, i.e. 8684 batches per epoch.
Since my training is detection only, with no segmentation involved, I only evaluate bbox.
Not relevant to my own training, just an excerpt about training on COCO data: for COCO, "AP" by default means mAP, not the AP of a single class. A VOC-format dataset does not print mAP directly after testing; you have to save the test results and compute mAP separately.
The learning rate setting also matters a lot.
The figure above mentions:
With 4 GPUs and 2 images per GPU, one batch is 8 images, and the learning rate is 0.01.
With 1 GPU processing 1 image, one batch is 1 image, and the learning rate is 0.01 / 8 = 0.00125.
With 1 GPU processing 2 images, one batch is 2 images, and the learning rate is 0.01 / 4 = 0.0025.
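This is just linear scaling of the learning rate with the total batch size; a tiny sketch of the arithmetic (my own helper, not mmdetection code):

def scaled_lr(num_gpus, imgs_per_gpu, base_lr=0.01, base_batch=8):
    # base point taken from the text above: 4 GPUs x 2 imgs/GPU = batch 8 at lr 0.01
    return base_lr * (num_gpus * imgs_per_gpu) / base_batch

print(scaled_lr(4, 2))  # 0.01
print(scaled_lr(1, 1))  # 0.00125
print(scaled_lr(1, 2))  # 0.0025  (the value used for this training run)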
The actual training command that produced the log below:
python3 tools/train.py configs/cascade_rcnn_r50_fpn_1x.py
After each epoch, the model weights are saved (as a .pth file) under mmdetection/work_dirs/cascade_rcnn_r50_fpn_1x/. The work_dirs directory is generated automatically and does not need to be created by hand.
The training log looks like this:
The 12 epochs took about 3 hours to train.
(venv) syy@syy1996:~/software/mmdetection$ python3 tools/train.py configs/cascade_rcnn_r50_fpn_1x.py
2019-08-02 16:56:30,801 - INFO - Distributed training: False
2019-08-02 16:56:31,199 - INFO - load model from: modelzoo://resnet50
/home/syy/software/mmdetection2/venv/lib/python3.6/site-packages/mmcv/runner/checkpoint.py:140: UserWarning: The URL scheme of "modelzoo://" is deprecated, please use "torchvision://" instead
2019-08-02 16:56:31,312 - WARNING - unexpected key in source state_dict: fc.weight, fc.bias
missing keys in source state_dict: layer3.0.bn2.num_batches_tracked, layer1.0.bn3.num_batches_tracked, layer3.0.bn1.num_batches_tracked, layer3.1.bn2.num_batches_tracked, layer1.2.bn1.num_batches_tracked, layer2.2.bn2.num_batches_tracked, layer1.2.bn2.num_batches_tracked, layer3.0.downsample.1.num_batches_tracked, layer4.0.bn1.num_batches_tracked, layer2.1.bn3.num_batches_tracked, layer2.3.bn1.num_batches_tracked, layer2.2.bn3.num_batches_tracked, layer2.0.bn3.num_batches_tracked, layer2.0.bn1.num_batches_tracked, layer2.0.bn2.num_batches_tracked, layer3.2.bn3.num_batches_tracked, layer2.1.bn2.num_batches_tracked, layer4.2.bn2.num_batches_tracked, layer3.1.bn1.num_batches_tracked, layer3.4.bn1.num_batches_tracked, layer2.3.bn3.num_batches_tracked, layer4.0.bn2.num_batches_tracked, layer4.2.bn3.num_batches_tracked, layer4.2.bn1.num_batches_tracked, layer3.0.bn3.num_batches_tracked, layer3.3.bn3.num_batches_tracked, layer1.2.bn3.num_batches_tracked, layer3.5.bn2.num_batches_tracked, layer4.0.bn3.num_batches_tracked, layer3.3.bn2.num_batches_tracked, layer3.5.bn3.num_batches_tracked, layer3.3.bn1.num_batches_tracked, layer1.0.bn1.num_batches_tracked, layer1.0.downsample.1.num_batches_tracked, layer1.1.bn3.num_batches_tracked, layer1.1.bn1.num_batches_tracked, layer3.4.bn3.num_batches_tracked, layer3.4.bn2.num_batches_tracked, layer3.1.bn3.num_batches_tracked, layer4.1.bn2.num_batches_tracked, layer2.3.bn2.num_batches_tracked, layer4.1.bn1.num_batches_tracked, layer4.0.downsample.1.num_batches_tracked, layer4.1.bn3.num_batches_tracked, layer3.2.bn1.num_batches_tracked, layer3.5.bn1.num_batches_tracked, layer2.2.bn1.num_batches_tracked, layer2.1.bn1.num_batches_tracked, bn1.num_batches_tracked, layer2.0.downsample.1.num_batches_tracked, layer3.2.bn2.num_batches_tracked, layer1.0.bn2.num_batches_tracked, layer1.1.bn2.num_batches_tracked
Start running, INFO - workflow: [('train', 1)], max: 12 epochs
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument
Epoch [1][50/4325] lr: 0.00100, eta: 3:28:16, time: 0.241, data_time: 0.004, memory: 1563, loss_rpn_cls: 0.4341, loss_rpn_bbox: 0.0504, s0.loss_cls: 0.4917, s0.acc: 86.8906, s0.loss_bbox: 0.2669, s1.loss_cls: 0.1773, s1.acc: 92.4492, s1.loss_bbox: 0.0840, s2.loss_cls: 0.0841, s2.acc: 91.5039, s2.loss_bbox: 0.0123, loss: 1.6009
Epoch [1][100/4325] lr: 0.00116, eta: 3:17:24, time: 0.216, data_time: 0.002, memory: 1571, loss_rpn_cls: 0.1258, loss_rpn_bbox: 0.0358, s0.loss_cls: 0.4672, s0.acc: 83.5664, s0.loss_bbox: 0.3891, s1.loss_cls: 0.1946, s1.acc: 86.8912, s1.loss_bbox: 0.2468, s2.loss_cls: 0.0624, s2.acc: 93.6975, s2.loss_bbox: 0.0546, loss: 1.5763
Epoch [1][150/4325] lr: 0.00133, eta: 3:13:18, time: 0.215, data_time: 0.002, memory: 1580, loss_rpn_cls: 0.1031, loss_rpn_bbox: 0.0360, s0.loss_cls: 0.4536, s0.acc: 84.0625, s0.loss_bbox: 0.3364, s1.loss_cls: 0.2223, s1.acc: 84.1253, s1.loss_bbox: 0.3252, s2.loss_cls: 0.0901, s2.acc: 87.5991, s2.loss_bbox: 0.1127, loss: 1.6793
Epoch [12][4200/4325] lr: 0.00003, eta: 0:00:26, time: 0.213, data_time: 0.002, memory: 1580, loss_rpn_cls: 0.0020, loss_rpn_bbox: 0.0040, s0.loss_cls: 0.0476, s0.acc: 98.0547, s0.loss_bbox: 0.0247, s1.loss_cls: 0.0161, s1.acc: 98.6515, s1.loss_bbox: 0.0298, s2.loss_cls: 0.0081, s2.acc: 98.7081, s2.loss_bbox: 0.0291, loss: 0.1614
Epoch [12][4250/4325] lr: 0.00003, eta: 0:00:15, time: 0.209, data_time: 0.002, memory: 1580, loss_rpn_cls: 0.0014, loss_rpn_bbox: 0.0051, s0.loss_cls: 0.0530, s0.acc: 97.9336, s0.loss_bbox: 0.0276, s1.loss_cls: 0.0157, s1.acc: 98.8566, s1.loss_bbox: 0.0379, s2.loss_cls: 0.0091, s2.acc: 98.7018, s2.loss_bbox: 0.0382, loss: 0.1880
Epoch [12][4300/4325] lr: 0.00003, eta: 0:00:05, time: 0.212, data_time: 0.002, memory: 1580, loss_rpn_cls: 0.0022, loss_rpn_bbox: 0.0057, s0.loss_cls: 0.0495, s0.acc: 97.9922, s0.loss_bbox: 0.0284, s1.loss_cls: 0.0147, s1.acc: 98.8292, s1.loss_bbox: 0.0388, s2.loss_cls: 0.0081, s2.acc: 98.5651, s2.loss_bbox: 0.0371, loss: 0.1845
Note:
Some problems I ran into before the training command ran successfully:
python3 tools/train.py configs/faster_rcnn_r50_fpn_1x.py --gpus 1 --validate --work_dir work_dirs
loading annotations into memory...
2019-07-30 19:08:43,571 - INFO - THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument
RuntimeError: CUDA out of memory. Tried to allocate 40.00 MiB (GPU 0; 5.79 GiB total capacity; 4.47 GiB already allocated; 34.56 MiB free; 26.71 MiB cached)
python3 tools/train.py configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712.py
File "/home/syy/software/mmdetection/mmdet/ops/dcn/functions/deform_conv.py", line 5, in
from .. import deform_conv_cuda
ImportError: libcudart.so.10.1: cannot open shared object file: No such file or directory
python3 tools/train.py configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712.py
File "tools/train.py", line 8, in
from mmdet import __version__
ModuleNotFoundError: No module named 'mmdet'
python3 tools/train.py
File "/home/syy/software/mmdetection2/mmdet/datasets/builder.py", line 20, in _concat_dataset
data_cfg['img_prefix'] = img_prefixes[i]
IndexError: list index out of range
python3 tools/train.py configs/cascade_rcnn_r50_fpn_1x.py
FileNotFoundError: [Errno 2] No such file or directory: '/home/syy/data/VOCdevkit/VOC2007/JPEGImages/0005873.xml'
(venv) syy@syy1996:~/software/mmdetection$
Run testing on the test set and save the results:
# the saved results go to result.pkl
python3 tools/test.py configs/cascade_rcnn_r50_fpn_1x.py work_dirs/cascade_rcnn_r50_fpn_1x/epoch_12.pth --out ./result.pkl
python3 tools/voc_eval.py result.pkl ./configs/cascade_rcnn_r50_fpn_1x.py
The most reliable reference: training mmdetection on a COCO-format dataset
python3 tools/train.py configs/faster_rcnn_r50_fpn_1x.py
mmdetection
├── mmdet
├── tools
├── configs
├── checkpoints  # store the weight files here
├── data
│ ├── coco
│ │ ├── annotations/instances_train2017.json
│ │ ├── train2017
│ │ ├── val2017
│ │ ├── test2017
Follow this layout exactly, including the exact names and letter case, because that is how the names are written in the source code.
cd mmdetection
mkdir data
ln -s $COCO_ROOT data
Replace $COCO_ROOT with the (full) path to your own dataset.
修改相关文件的第一步:
定义数据种类,需要修改的地方在mmdetection/mmdet/datasets/coco.py。把CLASSES的那个tuple改为自己数据集对应的种类tuple即可。
CLASSES = ('WaterBottle', 'Emulsion', )
Step 2 of modifying the relevant files:
Modify the dataset categories in coco_classes in mmdetection/mmdet/core/evaluation/class_names.py; this determines the class names shown on the result images at test time. For example:
def coco_classes():
    return [
        'WaterBottle', 'Emulsion'
    ]
Step 3 of modifying the relevant files:
In configs/mask_rcnn_r101_fpn_1x.py (the config used by my demo1.py), modify num_classes in the model dict, img_scale in the data dict, and lr (the learning rate) in optimizer.
num_classes=3,  # number of classes + 1
img_scale=(640, 480),  # max and min side of the input image; change it in all three places: train, val, and test
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)  # with 8 GPUs lr=0.02, with 4 GPUs lr=0.01; I only have one GPU, so lr=0.0025
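Instead of hand-editing, the same fields can also be patched programmatically; a small sketch of my own using mmcv.Config (the attribute paths follow the v1-style mask_rcnn config and are assumptions; they may differ in other versions):

import mmcv

cfg = mmcv.Config.fromfile('configs/mask_rcnn_r101_fpn_1x.py')
cfg.model.bbox_head.num_classes = 3    # number of classes + 1 for background
cfg.model.mask_head.num_classes = 3
cfg.data.train.img_scale = (640, 480)  # change it for train, val and test alike
cfg.data.val.img_scale = (640, 480)
cfg.data.test.img_scale = (640, 480)
cfg.optimizer.lr = 0.0025              # 1 GPU, imgs_per_gpu=2
print(cfg.optimizer)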
Step 4 of modifying the relevant files:
Create a work_dirs folder under the mmdetection directory.
sudo python3 setup.py develop
python3 tools/train.py configs/faster_rcnn_r50_fpn_1x.py --gpus 1 --validate --work_dir work_dirs
--validate in the command above controls whether each checkpoint created during training is evaluated. If distributed training is used and --validate is set, every checkpoint created during training is evaluated. (Without distributed training, --validate has no effect, because the mmdet.apis._non_dist_train function called by train_detector does nothing with the validate argument.)
There are two ways to run testing.
checkpoint_file = 'work_dirs/epoch_100.pth'
python3 tools/test.py configs/mask_rcnn_r101_fpn_1x.py work_dirs/epoch_100.pth --out ./result/result_100.pkl --eval bbox --show
However, this test command raises an error.
Testing with demo.py does produce results, but prints the warning "warnings.warn('Class names are not saved in the checkpoint's ')". The test command in this step errors out and the program aborts, but the root cause is the same: the .pth files saved during training contain no CLASSES information, so the image results cannot be shown. The official code therefore has to be modified as follows.
Modification:
Change line 29 of mmdetection/tools/test.py to:
if show:
    model.module.show_result(data, result, dataset.img_norm_cfg, dataset='coco')
The formatted output here is the COCO detection evaluation metrics; the brief description below is excerpted from the COCO dataset documentation.
Average Precision (AP):
  AP            % AP at IoU=.50:.05:.95 (primary challenge metric)
  AP^IoU=.50    % AP at IoU=.50 (PASCAL VOC metric)
  AP^IoU=.75    % AP at IoU=.75 (strict metric)
AP Across Scales:
  AP^small      % AP for small objects:  area < 32^2
  AP^medium     % AP for medium objects: 32^2 < area < 96^2
  AP^large      % AP for large objects:  area > 96^2
Average Recall (AR):
  AR^max=1      % AR given 1 detection per image
  AR^max=10     % AR given 10 detections per image
  AR^max=100    % AR given 100 detections per image
AR Across Scales:
  AR^small      % AR for small objects:  area < 32^2
  AR^medium     % AR for medium objects: 32^2 < area < 96^2
  AR^large      % AR for large objects:  area > 96^2
At the implementation level, this is done in mmdet.core.evaluation.coco_utils.py: the coco_eval function calls the pycocotools package from Microsoft's COCO API.
It constructs a COCOeval object, configures its parameters, and calls evaluate, accumulate, and summarize in turn to evaluate the test results on the dataset.
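For reference, a minimal sketch of that evaluate / accumulate / summarize chain called directly through pycocotools (the file names here are placeholders):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('data/coco/annotations/instances_val2017.json')  # ground-truth annotations
coco_dt = coco_gt.loadRes('results.bbox.json')                  # detections in COCO result format

coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')  # use 'segm' for mask evaluation
coco_eval.evaluate()    # match detections to ground truth per image / category
coco_eval.accumulate()  # accumulate precision / recall over IoU thresholds and areas
coco_eval.summarize()   # prints the 12 AP / AR numbers listed above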
They call the _dist_train and _non_dist_train functions respectively.
How to use test.py:
python3 tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --gpus <GPU_NUM> --out <OUT_FILE>
python3 tools/test.py configs/mask_rcnn_r50_fpn_1x.py <CHECKPOINT_FILE> --gpus 8 --out results.pkl --eval bbox segm
python3 tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --show
Inside this single_test function, the model is tested and its outputs are produced through the following main steps.
First, torch.nn.Module.eval puts the model into evaluation mode:
model.eval()
Then it iterates over the data_loader to read data and, following the standard PyTorch flow, disables gradient computation, feeds the data through the model, and collects the outputs (it also takes care of visualizing the detection results in the X server and refreshing the progress bar in the shell):
for i, data in enumerate(data_loader):
    with torch.no_grad():
        result = model(return_loss=False, rescale=not show, **data)
    results.append(result)

    if show:
        model.module.show_result(data, result, dataset.img_norm_cfg,
                                 dataset=dataset.CLASSES)

    batch_size = data['img'][0].size(0)
    for _ in range(batch_size):
        prog_bar.update()
log_config = dict(
    interval=10,  # print log info every 10 batches
    hooks=[
        dict(type='TextLoggerHook'),        # style of the console output
        dict(type='TensorboardLoggerHook')  # requires tensorflow and tensorboard to be installed
    ])
Using the components above, mmdetection implements several general detection pipelines, such as SingleStageDetector and TwoStageDetector.
from ..registry import NECKS

@NECKS.register_module
class PAFPN(nn.Module):

    def __init__(self,
                 in_channels,
                 out_channels,
                 num_outs,
                 start_level=0,
                 end_level=-1,
                 add_extra_convs=False):
        pass

    def forward(self, inputs):
        # implementation is ignored
        pass
Step 2: modify the config file
Original:
neck=dict(
    type='FPN',
    in_channels=[256, 512, 1024, 2048],
    out_channels=256,
    num_outs=5)
Changed to:
neck=dict(
    type='PAFPN',
    in_channels=[256, 512, 1024, 2048],
    out_channels=256,
    num_outs=5)
References:
Source-code reading notes (2): Loss
RPN_loss
bbox_loss
mask_loss
The RPN loss is as follows:
def loss(self,
         cls_scores,
         bbox_preds,
         gt_bboxes,
         img_metas,
         cfg,
         gt_bboxes_ignore=None):
    losses = super(RPNHead, self).loss(
        cls_scores,
        bbox_preds,
        gt_bboxes,
        None,
        img_metas,
        cfg,
        gt_bboxes_ignore=gt_bboxes_ignore)
    return dict(
        loss_rpn_cls=losses['loss_cls'], loss_rpn_bbox=losses['loss_bbox'])
The loss function it calls is as follows:
def loss(self,
         cls_scores,
         bbox_preds,
         gt_bboxes,
         gt_labels,
         img_metas,
         cfg,
         gt_bboxes_ignore=None):
    featmap_sizes = [featmap.size()[-2:] for featmap in cls_scores]
    assert len(featmap_sizes) == len(self.anchor_generators)

    anchor_list, valid_flag_list = self.get_anchors(
        featmap_sizes, img_metas)  # this step produces all anchors plus a validity flag
                                   # (computed from whether the bbox falls outside the image border)
    label_channels = self.cls_out_channels if self.use_sigmoid_cls else 1
    cls_reg_targets = anchor_target(
        anchor_list,
        valid_flag_list,
        gt_bboxes,
        img_metas,
        self.target_means,
        self.target_stds,
        cfg,
        gt_bboxes_ignore_list=gt_bboxes_ignore,
        gt_labels_list=gt_labels,
        label_channels=label_channels,
        sampling=self.sampling)
    if cls_reg_targets is None:
        return None
    (labels_list, label_weights_list, bbox_targets_list, bbox_weights_list,
     num_total_pos, num_total_neg) = cls_reg_targets
    num_total_samples = (
        num_total_pos + num_total_neg if self.sampling else num_total_pos)
    losses_cls, losses_bbox = multi_apply(
        self.loss_single,
        cls_scores,
        bbox_preds,
        labels_list,
        label_weights_list,
        bbox_targets_list,
        bbox_weights_list,
        num_total_samples=num_total_samples,
        cfg=cfg)
    return dict(loss_cls=losses_cls, loss_bbox=losses_bbox)
def loss_single(self, cls_score, bbox_pred, labels, label_weights,
                bbox_targets, bbox_weights, num_total_samples, cfg):
    # classification loss
    labels = labels.reshape(-1)
    label_weights = label_weights.reshape(-1)
    cls_score = cls_score.permute(0, 2, 3,
                                  1).reshape(-1, self.cls_out_channels)
    loss_cls = self.loss_cls(
        cls_score, labels, label_weights, avg_factor=num_total_samples)
    # regression loss
    bbox_targets = bbox_targets.reshape(-1, 4)
    bbox_weights = bbox_weights.reshape(-1, 4)
    bbox_pred = bbox_pred.permute(0, 2, 3, 1).reshape(-1, 4)
    loss_bbox = self.loss_bbox(
        bbox_pred,
        bbox_targets,
        bbox_weights,
        avg_factor=num_total_samples)
    return loss_cls, loss_bbox
The losses used here are CrossEntropyLoss (cross-entropy) and SmoothL1Loss.
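For reference, the two losses written out directly in PyTorch, a minimal sketch for illustration only (mmdetection wraps them with its own element weights and avg_factor averaging):

import torch
import torch.nn.functional as F

def smooth_l1(pred, target, beta=1.0):
    # SmoothL1: 0.5 * x^2 / beta if |x| < beta, else |x| - 0.5 * beta
    diff = torch.abs(pred - target)
    return torch.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)

cls_score = torch.randn(4, 81)       # e.g. 80 classes + background
labels = torch.randint(0, 81, (4,))
loss_cls = F.cross_entropy(cls_score, labels)

bbox_pred = torch.randn(4, 4)
bbox_target = torch.randn(4, 4)
loss_bbox = smooth_l1(bbox_pred, bbox_target).mean()
print(loss_cls.item(), loss_bbox.item())

The bbox head's own loss function is shown next.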
def loss(self,
         cls_score,
         bbox_pred,
         labels,
         label_weights,
         bbox_targets,
         bbox_weights,
         reduce=True):
    losses = dict()
    if cls_score is not None:
        losses['loss_cls'] = self.loss_cls(
            cls_score, labels, label_weights, reduce=reduce)
        losses['acc'] = accuracy(cls_score, labels)
    if bbox_pred is not None:
        pos_inds = labels > 0
        if self.reg_class_agnostic:
            pos_bbox_pred = bbox_pred.view(bbox_pred.size(0), 4)[pos_inds]
        else:
            pos_bbox_pred = bbox_pred.view(bbox_pred.size(0), -1,
                                           4)[pos_inds, labels[pos_inds]]
        losses['loss_bbox'] = self.loss_bbox(
            pos_bbox_pred,
            bbox_targets[pos_inds],
            bbox_weights[pos_inds],
            avg_factor=bbox_targets.size(0))
    return losses
def get_target(self, sampling_results, gt_masks, rcnn_train_cfg):
    pos_proposals = [res.pos_bboxes for res in sampling_results]
    pos_assigned_gt_inds = [
        res.pos_assigned_gt_inds for res in sampling_results
    ]
    mask_targets = mask_target(pos_proposals, pos_assigned_gt_inds,
                               gt_masks, rcnn_train_cfg)
    return mask_targets
def mask_target(pos_proposals_list, pos_assigned_gt_inds_list, gt_masks_list,
                cfg):
    cfg_list = [cfg for _ in range(len(pos_proposals_list))]
    mask_targets = map(mask_target_single, pos_proposals_list,
                       pos_assigned_gt_inds_list, gt_masks_list, cfg_list)
    mask_targets = torch.cat(list(mask_targets))
    return mask_targets
def mask_target_single(pos_proposals, pos_assigned_gt_inds, gt_masks, cfg):
    mask_size = cfg.mask_size
    num_pos = pos_proposals.size(0)
    mask_targets = []
    if num_pos > 0:
        proposals_np = pos_proposals.cpu().numpy()
        pos_assigned_gt_inds = pos_assigned_gt_inds.cpu().numpy()
        for i in range(num_pos):
            gt_mask = gt_masks[pos_assigned_gt_inds[i]]
            bbox = proposals_np[i, :].astype(np.int32)
            x1, y1, x2, y2 = bbox
            w = np.maximum(x2 - x1 + 1, 1)
            h = np.maximum(y2 - y1 + 1, 1)
            # mask is uint8 both before and after resizing
            target = mmcv.imresize(gt_mask[y1:y1 + h, x1:x1 + w],
                                   (mask_size, mask_size))
            mask_targets.append(target)
        mask_targets = torch.from_numpy(np.stack(mask_targets)).float().to(
            pos_proposals.device)
    else:
        mask_targets = pos_proposals.new_zeros((0, mask_size, mask_size))
    return mask_targets
def loss(self, mask_pred, mask_targets, labels):
    loss = dict()
    if self.class_agnostic:
        loss_mask = self.loss_mask(mask_pred, mask_targets,
                                   torch.zeros_like(labels))
    else:
        loss_mask = self.loss_mask(mask_pred, mask_targets, labels)
    loss['loss_mask'] = loss_mask
    return loss
mmdetection integrates many object detection models; among the better-performing ones is Cascade R-CNN.
In a two-stage model, candidate boxes for the target objects are predicted first. Whether a candidate box is a positive sample (a box to keep) is usually decided by the IoU (intersection over union) between the box and the ground truth. A common IoU threshold is 0.5, but 0.5 also lets through many useless candidates. In the figures below, the left image uses a threshold of 0.5 and the right one 0.7: at 0.5 there are clearly many useless boxes, while at 0.7 the result is much cleaner. The drawback of 0.7 is that some candidate boxes are inevitably missed, especially for tiny objects, and with too few positive samples overfitting becomes likely.
The point of Cascade R-CNN is to address this IoU-threshold problem. It does so with a cascaded detection scheme, as shown in figure (d) below.
Figure (d) is the cascaded variant: compared with figure (b), each stage uses a different IoU threshold, typically 0.5 / 0.6 / 0.7. The cascade refines and re-detects the candidate boxes stage by stage.
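To make the threshold effect concrete, a small sketch of my own (not mmdetection code) showing how the same proposal passes the 0.5 threshold but fails the 0.7 one:

def iou(box_a, box_b):
    # boxes given as (x1, y1, x2, y2)
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

gt = (10, 10, 110, 110)
proposal = (20, 20, 120, 120)
v = iou(gt, proposal)
print(round(v, 2))  # ~0.68
print(v >= 0.5)     # True  -> kept as a positive sample at threshold 0.5
print(v >= 0.7)     # False -> treated as negative at threshold 0.7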
'''
Running this script lets you view the detection results for a single image
or for every image in a directory.
Just swap the config and weights files to run detection with a different network.
mmdetection wraps things up: the build_detector function builds the model,
and the inference_detector function runs inference.
The different modules are packaged as backbone / neck / head parts and
declared in the config; by reading the config and registering the modules,
they are wrapped and assembled into a network through high-level calls.
If you want to train your own dataset, the COCO format is the most convenient:
you can first annotate with labelImg to get xml files and then convert them
to COCO json format.
'''
# the ipdb library is for debugging: after importing it, set a breakpoint with ipdb.set_trace(); ipdb has to be installed separately
import ipdb
import sys,os,torch,mmcv
from mmcv.runner import load_checkpoint
# the import below, when executed, locates and calls Registry to register the five kinds of modules
'''
registry
Purpose: register module placeholders before the program runs, so that the
corresponding modules can be configured and filled in directly in the config file.
Five categories:
BACKBONES = Registry('backbone')
NECKS = Registry('neck')
ROI_EXTRACTORS = Registry('roi_extractor')
HEADS = Registry('head')
DETECTORS = Registry('detector')
'''
from mmdet.models import build_detector
from mmdet.apis import inference_detector,show_result
if __name__=='__main__':
    # debug statement
    # ipdb.set_trace()
    '''
    mmcv.Config.fromfile
    Wrapper method: cfg is not built by instantiating the Config class directly,
    but through its fromfile method.
    The function returns a Config instance: Config(cfg_dict, filename=filename);
    the cfg_dict argument wraps the config file (e.g. mask_rcnn_r101_fpn_1x.py)
    in one big dict whose nested dicts are the dicts in the py file, ending in
    key-value pairs for each option; filename is the path of the py config file.
    '''
    # set the model config file below to the one you need; many are provided under the configs folder
    cfg = mmcv.Config.fromfile('configs/cascade_rcnn_r101_fpn_1x.py')
    # no pretrained model is set for inference
    cfg.model.pretrained = None
    # for inference only the model and test parts of cfg are passed in; everything else is training parameters
    model = build_detector(cfg.model, test_cfg=cfg.test_cfg)
    '''
    Change the path below to wherever the downloaded weight file is stored;
    the weight file has to match the config file.
    Download link for the weights:
    https://github.com/open-mmlab/mmdetection/blob/master/MODEL_ZOO.md
    load_checkpoint(model, filename, map_location=None, strict=False, logger=None)
    This function loads a model from a URL or from a file, in three steps:
    first, torch.load loads the file at path into the variable checkpoint;
    second, the weights are extracted into state_dict, because the pth may also
    contain model or optimizer data;
    third, load_state_dict loads the weights into the model.
    '''
    _ = load_checkpoint(model, 'weights/cascade_rcnn_r101_fpn_1x_…….pth')  # the name is not written out in full
    # print(model)  # print the expanded model
    # test a single image; change the path to your actual one
    img = mmcv.imread('/py/pic/2.jpg')
    result = inference_detector(model, img, cfg)
    show_result(img, result)
    # test all images in a folder; change the path to your actual one
    path = 'your_path'
    imgs = os.listdir(path)
    for i in range(len(imgs)):
        imgs[i] = os.path.join(path, imgs[i])
    for i, result in enumerate(inference_detector(model, imgs, cfg, device='cuda:0')):
        print(i, imgs[i])
        show_result(imgs[i], result)
Among the config files shipped with mmdetection, focal loss is only enabled for RetinaNet. The reason is that one-stage detectors regress from dense anchors in a single step, where positive and negative samples are extremely imbalanced; focal loss mainly addresses this positive/negative imbalance and the fact that hard and easy samples would otherwise carry the same weight (the difference from OHEM is that OHEM concentrates only on hard samples and ignores easy ones). The related parameters appearing in the config include the following, and a minimal focal-loss sketch follows the list:
smoothl1_beta=0.11,
gamma=2.0,
alpha=0.25,
allowed_border=-1,
pos_weight=-1,
debug=False
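A minimal sketch of focal loss using the gamma / alpha values above (binary, sigmoid-based, illustrative only; mmdetection's own sigmoid_focal_loss is a CUDA op built around the same idea):

import torch
import torch.nn.functional as F

def focal_loss(pred_logits, targets, gamma=2.0, alpha=0.25):
    p = torch.sigmoid(pred_logits)
    ce = F.binary_cross_entropy_with_logits(pred_logits, targets, reduction='none')
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()        # (1 - p_t)^gamma down-weights easy samples

logits = torch.randn(8)
targets = torch.randint(0, 2, (8,)).float()
print(focal_loss(logits, targets))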