"轻松掌握 MMDetection 整体构建流程(一)" (Easily Mastering the Overall MMDetection Build Pipeline, Part 1) - article by OpenMMLab on Zhihu, 20210512
# ---------------------------20210518 divider---------------------------
the initial project version of mmdet_v2.7;
add my code for calculating COCO-style mAP to the following files (see the sketch after this list):
- tools/train.py
- mmdet/datasets/voc.py
- mmdet/datasets/__init__.py
- mmdet/core/evaluation/mean_ap.py
add the following files to the project:
- mmdet/configs/_base_/datasets/voc07_mini_cocostyle.py
- mmdet/configs/_base_/datasets/voc0712_cocostyle.py
- mmdet/datasets/voc_mini.py
- tools/hcy_train.py
- configs/faster_rcnn/bl_01_faster_rcnn_r50_fpn_1x_voc.py
- configs/retinanet/bl_01_retinanet_r50_fpn_1x_voc.py
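A minimal sketch of the idea behind the COCO-style mAP change (the wrapper name voc_coco_style_map is made up here, and the exact eval_map signature may differ across mmdet versions): evaluate the VOC detections at the ten COCO IoU thresholds 0.50:0.05:0.95 and average the resulting mAPs.
import numpy as np
from mmdet.core.evaluation import eval_map

def voc_coco_style_map(det_results, annotations, logger=None):
    """COCO-style mAP@[.50:.95] computed with mmdet's VOC-style eval_map()."""
    iou_thrs = np.linspace(0.5, 0.95, 10)  # 0.50, 0.55, ..., 0.95
    aps = []
    for iou_thr in iou_thrs:
        mean_ap, _ = eval_map(det_results, annotations,
                              iou_thr=float(iou_thr), logger=logger)
        aps.append(mean_ap)
    return sum(aps) / len(aps)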
# ---------------------------20210519 divider---------------------------
write the code for the following schemes:
- hhid
add the following files to the project:
- configs/hhid/hhid_r50_fpn_1x_voc.py
- mmdet/models/detectors/hhid.py
# ---------------------------20201203 divider---------------------------
the initial project version of MMDet v2.7.0;
add my code for calculating COCO-style mAP to the following files:
- tools/train.py
- mmdet/datasets/voc.py
- mmdet/core/evaluation/mean_ap.py
# ---------------------------20201215 divider---------------------------
add comments to the official code;
hc-y_modifybug:... assigned_gt_inds ... max_overlaps in mmdet/core/bbox/assigners/max_iou_assigner.py;
add the following files to mmdet/configs/_base_/datasets/:
- coco_detection_multiscale.py
- voc07_mini_cocostyle.py
- voc0712_cocostyle.py
add the following files to mmdet/datasets/:
- voc_mini.py
write the code for the following schemes:
- v4v5_11_retinanet_r50_fpn_1x_voc
- v4v5_11_faster_rcnn_r50_fpn_1x_voc.py
- v4v6_11_retinanet_r50_fpn_1x_voc
- v4v7_11_retinanet_r50_fpn_1x_voc
- v4v8_11_retinanet_r50_fpn_1x_voc
- v4v9_11_retinanet_r50_fpn_1x_voc
- v5v1_11_retinanet_r50_fpn_1x_voc
- v5v2_11_retinanet_r50_fpn_1x_voc
# ---------------------------20210308 divider---------------------------
write the code for the following schemes:
- los_rcnn_r50_fpn_1x_carton_mini55_fourcat
- los_rcnn_r50_fpn_1x_carton_onecat
add comments to the official code about the following models:
- SABL RetinaNet;
- YOLACT;
add comments to the official code:
- mmdet/models/detectors/single_stage.py
- mmdet/models/roi_heads/mask_heads/fcn_mask_head.py
- mmdet/datasets/pipelines/loading.py
- mmdet/datasets/pipelines/formating.py
- mmdet/datasets/custom.py
add the following files to mmdet/configs/_base_/datasets/:
- carton_instance_onecat.py
- carton_instance_mini55_onecat.py
- carton_detection_onecat.py
- carton_detection_fourcat.py
- carton_detection_mini55_onecat.py
add the following files to mmdet/datasets/:
- carton.py, including the classes CartonDatasetOneCat, CartonDatasetFourCat, and CartonminiDatasetOneCat;
modify the official code of the following file:
- mmdet/models/detectors/base.py def show_result(): in order to visualize bboxes and masks on the image, modify the outline thickness of the bboxes, change the fill color of the mask from a random color to a specified color, and add some code for drawing the contour of the mask;
# ---------------------------20210622 divider---------------------------
add the following files to the project:
- carton_detection_fourcat.py
- carton_detection_mini55_onecat.py
add or modify the following files related to the LOS R-CNN model:
- mmdet/models/roi_heads/standard_roi_head.py
add or modify the following files related to the FCOS model on carton:
- configs/fcos/bl_01_fcos_r50_caffe_fpn_gn-head_6x2_1x_carton.py
- configs/fcos/bl_02_fcos_r50_caffe_fpn_gn-head_6x2_1x_carton.py
- configs/fcos/bl_01_fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_6x2_1x_carton.py
- configs/fcos/bl_02_fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_6x2_1x_carton.py
- configs/fcos/bl_02_fcos_center-normbbox-centeronreg_r50_caffe_fpn_gn-head_6x2_1x_carton.py
- configs/fcos/v2v1_11_fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_6x2_1x_carton.py
- mmdet/models/dense_heads/v1_fcos_head.py
add or modify the following files related to the SKU110K dataset:
- configs/_base_/datasets/sku_detection_onecat.py
- mmdet/datasets/sku.py, including the class SkuDatasetOneCat;
- configs/fcos/bl_01_fcos_r50_caffe_fpn_gn-head_6x2_1x_sku.py
- configs/fcos/bl_02_fcos_r50_caffe_fpn_gn-head_6x2_1x_sku.py
- configs/fcos/bl_02_fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_6x2_1x_sku.py
- configs/cascade_rcnn/bl_01_cascade_rcnn_r50_fpn_1x_sku.py
add or modify the following files related to the LSNet model:
- configs/_base_/datasets/carton_instance_onecat_dpp.py
- configs/_base_/datasets/coco_lsvr.py
- configs/lsnet/xxx.py
- mmdet/models/dense_heads/lsnet_head.py
- mmdet/models/dense_heads/lscpvnet_head.py
- mmdet/datasets/pipelines/loading_plus.py
- mmdet/core/bbox/transforms_plus.py
- mmdet/core/bbox/assigners/centroid_assigner.py
- mmdet/ops/corner_pool/xxx.py
- mmdet/ops/dcn/xxx.py
- mmdet/models/losses/cross_iou_loss.py
- mmdet/core/post_processing/bbox_nms_plus.py
# ---------------------------20210706 divider---------------------------
add or modify the following files related to the LSNet model:
- configs/_base_/datasets/carton_instance_onecat_dpp.py
- configs/_base_/datasets/coco_lsvr.py
- configs/_base_/datasets/carton_onecat_lsvr_dpp.py
- configs/_base_/datasets/carton_onecat_lsvr_dpp_mini55.py
# in order to load the "extreme_points" field in the annotation json file
- mmdet/datasets/custom.py, results['extreme_fields'] = [], results['keypoint_fields'] = []
- mmdet/datasets/coco.py, gt_extremes_ann = []
- mmdet/datasets/pipelines/loading_plus.py
- mmdet/datasets/pipelines/transforms_plus.py class ResizeV1v1(), class RandomFlipV1v1()
- mmdet/core/mask/structures.py class PolygonMasks() def flip(), related to class RandomFlipV1v1()
- mmdet/datasets/pipelines/formating.py class DefaultFormatBundle()
- mmdet/core/evaluation/eval_hooks.py class EvalHook(), class DistEvalHook()
- mmdet/apis/test.py, single_gpu_test(), multi_gpu_test()
- mmdet/core/mask/utils.py, def get_rle(), def encode_poly_results()
- mmdet/core/mask/__init__.py encode_poly_results
- tools/test.py, single_gpu_test(), multi_gpu_test()
- mmdet/apis/test.py, single_gpu_test(), multi_gpu_test()
- setup_mmdet_v221.py
- mmdet/ops/corner_pool/xxx.py
- mmdet/ops/dcn/xxx.py
- mmdet/models/dense_heads/lsnet_head.py
- mmdet/models/dense_heads/lscpvnet_head.py
- mmdet/models/losses/cross_iou_loss.py
- mmdet/core/post_processing/bbox_nms_plus.py
- mmdet/models/detectors/lsnet.py def show_result()
- mmcv/visualization/image.py def imshow_extremes(), def imshow_polygons_v2(), def imshow_pose()
add or modify the following files, which were copied from 20210507_LSNet-main_Duankaiwen:
- mmdet/ops/chamfer_2d/xxx.py
- mmdet/models/losses/chamfer_loss.py
- mmdet/models/losses/focal_loss.py
- mmdet/core/bbox/assigners/fcos_assigner.py
- mmdet/core/bbox/assigners/point_assigner_v2.py
- mmdet/core/bbox/assigners/point_ct_assigner.py
- mmdet/core/bbox/assigners/point_hm_assigner.py
- mmdet/datasets/pipelines/loading_reppointsv2.py
- mmdet/datasets/pipelines/formating_reppointsv2.py
- mmdet/datasets/coco_pose.py
- mmdet/models/backbones/mobilenet.py
- mmdet/models/dense_heads/reppoints_v2_head.py
- mmdet/models/dense_heads/dense_reppoints_head.py
- mmdet/models/dense_heads/dense_reppoints_v2_head.py
- mmdet/models/detectors/reppoints_v2_detector.py
- mmdet/models/detectors/dense_reppoints_detector.py
- mmdet/models/detectors/dense_reppoints_v2_detector.py
# ---------------------------2020xxxx divider---------------------------
to be added
# ---------------------------2020xxxx divider---------------------------
# ---------------------------mmdetection_v2.1.0 divider---------------------------
the initial project version;
try to train mask_rcnn_r50_fpn_1x on Poker dataset;
add comments to the official code;
try to train faster_rcnn_r50_fpn_1x on Poker dataset;
read the code of faster_rcnn_r50_fpn_1x on Poker dataset;
# ---------------------------mmdetection_v2.1.0 divider---------------------------
# ---------------------------mmdetection_v2.3.0 divider---------------------------
# ---------------------------20200817 divider---------------------------
the initial project version
# ---------------------------20200817 divider---------------------------
try to train faster_rcnn_r50_fpn_1x on Poker dataset
# ---------------------------20200830 divider---------------------------
add comments to the official code;
hc-y_modifybug:... assigned_gt_inds ... max_overlaps in mmdet/core/bbox/assigners/max_iou_assigner.py;
learn the use of pycocotools/coco.py and pycocotools/cocoeval.py;
modify my code in tools/usr_train.py;
add my code 'record data' to mmdet/models/dense_heads/anchor_head.py;
# ---------------------------20200904 divider---------------------------
add comments to the official code;
hc-y_modifybug:... if unmap_outputs ... in mmdet/models/dense_heads/anchor_head.py;
add my code 'record data' to mmdet/models/dense_heads/anchor_head.py and mmdet/models/dense_heads/rpn_head.py;
record data of {img_meta, anchors, rpn_preds, rpn_loss, proposals} and save to /work_dirs/usr_recorddata with flag_record_data=True;
modify my code in mmcv/runner/epoch_based_runner.py;
complete usr_recorddata_analysis.py for single image;
# ---------------------------20200908 divider---------------------------
add comments to the official code;
hc-y_modifybug:def loss():avg_factor=bbox_targets.size(0) in mmdet/models/roi_heads/bbox_heads/bbox_head.py;
add my code 'record data' to mmdet/models/dense_heads/anchor_head.py and mmdet/models/dense_heads/rpn_head.py;
record data of {img_meta, anchors, rpn_preds, rpn_loss, proposals} and save to /work_dirs/usr_recorddata with flag_record_data=True;
add my code 'record data' to mmdet/models/roi_heads/standard_roi_head.py;
record data of {q0, assign_pg_cls_labels, rcnn_preds, rcnn_loss} and save to /work_dirs/usr_recorddata with flag_record_data=True;
modify my code in mmcv/runner/epoch_based_runner.py;
add code to usr_recorddata_analysis.py for single image;
# ---------------------------20200908 divider---------------------------
replace flag_record_data with flag_train_record_data in mmdet/models/dense_heads/anchor_head.py, mmdet/models/dense_heads/rpn_head.py, and mmdet/models/roi_heads/standard_roi_head.py;
# ---------------------------20200911 divider---------------------------
add comments to the official code;
add my code 'record data' to mmdet/models/roi_heads/standard_roi_head.py;
record data of {q0, assign_pg_cls_labels, rcnn_preds, rcnn_loss} and save to /work_dirs/usr_recorddata with flag_train_record_data=True;
read code by debugging my_workspace/test_demo_model/usr_demo_detectobject.py and add my code 'record data';
add my code 'record data' to mmdet/core/post_processing/bbox_nms.py and mmdet/models/roi_heads/bbox_heads/bbox_head.py and mmdet/models/roi_heads/standard_roi_head.py;
record data of {q0, q1, q2, q3} and save to /work_dirs/usr_recorddata with flag_train_record_data=True;
add code to usr_recorddata_analysis.py for single image;
# ---------------------------20200912 divider---------------------------
modify my code in mmdet/core/post_processing/bbox_nms.py;
data can be collected successfully on the {Poker, coco, VOCdevkit} datasets, but whether the data is desired and available remains to be confirmed;
- record data: {img_meta, A, G, assign_G_ind, IoU(A,assign_G), pos_inds, neg_inds}
- record data: {rpn_cls_scores, rpn_bbox_preds, rpn_cls_labels, rpn_bbox_targets}
- record data: {p0, p0_scores, p1, p1_scores, p1_inds, p2, p2_scores, p2_inds, p3, p3_scores, p3_inds}
- record data: {p3_inrcnn, G, assign_G_ind, IoU(p3_inrcnn,assign_G), assign_G_cls_label, pos_inds, neg_inds}
- record data: {rcnn_cls_scores, rcnn_bbox_preds, rcnn_cls_labels, rcnn_bbox_targets}
- record data: {q0, q0_scores_after_softmax, q1, q1_scores_after_softmax, q1_inds, q2, q2_scores_after_softmax, q2_inds, q3, q3_scores_after_softmax, q3_inds}
# ---------------------------20200912 divider---------------------------
revert the code of {anchor_head.py, rpn_head.py, standard_roi_head.py, bbox_head.py, bbox_nms.py, test_mixins.py} to the official code;
# ---------------------------20200917 divider---------------------------
data can be collected successfully on the {Poker, coco, VOCdevkit} datasets, and whether the data excluding {p0, p1, p2, p3, q0, q1, q2, q3} is desired and available has been confirmed;
- record data: {img_meta, all_a, G, inside_a_flags, assign_G_ind, IoU(inside_a,assign_G), inside_a_pos_bbox_targets, sampling_a_pos_inds, sampling_a_neg_inds}
- record data: {all_a_cls_scores, all_a_bbox_preds, sampling_a_cls_labels, sampling_a_bbox_targets}
- record data: {p0, p0_scores, p1, p1_scores, p1_inds, p2, p2_scores, p2_inds, p3, p3_scores, p3_inds}
- record data: {p3_inrcnn, G, assign_G_ind, IoU(p3_inrcnn,assign_G), assign_G_cls_label, sampling_p3_inrcnn_pos_inds, sampling_p3_inrcnn_neg_inds, all_p3_inrcnn_cls_scores, all_p3_inrcnn_bbox_preds, all_p3_inrcnn_pos_bbox_targets}
- record data: {sampling_p3_inrcnn_cls_scores, sampling_p3_inrcnn_bbox_preds, sampling_p3_inrcnn_cls_labels, sampling_p3_inrcnn_bbox_targets}
- record data: {q0, q0_scores_after_softmax, q1, q1_scores_after_softmax, q1_inds, q2, q2_scores_after_softmax, q2_inds, q3, q3_scores_after_softmax, q3_inds}
# ---------------------------20201007 divider---------------------------
review the inheritance and polymorphism of classes in Python, and perform code review;
write the code for the following two schemes:
- v2v1_11_faster_rcnn_r50_fpn_1x_voc
- v2v2_11_faster_rcnn_r50_fpn_1x_voc
# ---------------------------20201008 divider---------------------------
add comments to the code in the following file:
- mmdet/core/bbox/assigners/v2v1_max_iof_assigner.py
- mmdet/core/bbox/obtain_gt_bgs.py
- mmdet/models/roi_heads/test_mixins.py
# ---------------------------20201113 divider---------------------------
add comments to the official code;
read the code about the following papers:
- mmdet/configs/ghm
- mmdet/configs/pisa
- mmdet/configs/gfl
revise the config of learning rate in my code;
write the code for the following schemes:
- v3v1_00_retinanet_r50_fpn_1x_voc
- v3v1_11_retinanet_r50_fpn_1x_poker
- v4v1_11_retinanet_r50_fpn_1x_voc
add my code for calculating COCO-style mAP to the following code:
- mmdet/datasets/poker.py
- mmdet/datasets/voc.py
- mmdet/core/evaluation/mean_ap.py
# ---------------------------20201203 divider---------------------------
add comments to the official code;
write the code for the following schemes:
- etc
- v4v5_11_retinanet_r50_fpn_1x_voc
- v4v6_11_retinanet_r50_fpn_1x_voc
add the following file to mmdet/configs/_base_/datasets/:
- coco_detection_multiscale.py
- voc07_mini_cocostyle.py
- voc0712_cocostyle.py
submit this commit message before updating the MMDet version from v2.3.0 to v2.7.0;
# ---------------------------2020xxxx divider---------------------------
to be added
# ---------------------------2020xxxx divider---------------------------
cfg = {Config} Config (path: ../../configs/mask_rcnn/usr_mask_rcnn_r50_fpn_1x_poker.py): {'model': {}, 'key': {}, }
_cfg_dict = {ConfigDict} {'model': {}, 'key': {}, }
_filename = {str} '../../configs/mask_rcnn/usr_mask_rcnn_r50_fpn_1x_poker.py'
_text = {str} 'the value stores the contents of ../../configs/mask_rcnn/usr_mask_rcnn_r50_fpn_1x_poker.py itself and of the 4 files it includes'
filename = {str} '../../configs/mask_rcnn/usr_mask_rcnn_r50_fpn_1x_poker.py'
pretty_text = {str} 'the value is identical to the content of usr_dumpconfig_mask_rcnn_r50_fpn_1x_poker.py'
text = {str} 'the value stores the contents of ../../configs/mask_rcnn/usr_mask_rcnn_r50_fpn_1x_poker.py itself and of the 4 files it includes'
we set FG labels to [0, num_class-1] and the BG label to num_class in other heads since mmdet v2.0; however, we keep the BG label as 0 and the FG label as 1 in the RPN head;
gt_bboxes_ignore (Tensor, optional)
: Ground truth bboxes that are labelled as ignored
, e.g., crowd boxes in COCO.
imgs (List[Tensor])
: the outer list indicates test-time augmentations and inner Tensor should have a shape NxCxHxW, which contains all images in the batch.
mmdet/models/detectors/base.py
Meaning: one Tensor represents N=batchsize images (each CxHxW); applying different image transforms to this group of images yields multiple Tensors, and collecting these Tensors into a list gives List[Tensor];
here, the number of Tensors equals the number of image transforms applied, and N in each Tensor is the number of images;
img_metas (List[List[dict]])
: the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch.
mmdet/models/detectors/base.py
Meaning: one List[dict] holds the img_metas info of batchsize images; applying different image transforms to this group of images yields multiple List[dict], and collecting them into a list gives List[List[dict]];
here, the number of List[dict] equals the number of image transforms applied; the number of dicts equals the number of images; each dict stores the img_metas info of a single image;
proposals (List[List[Tensor]])
: the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch. The Tensor should have a shape Px4, where P is the number of proposals.
mmdet/models/detectors/base.py
Meaning: one List[Tensor] corresponds to batchsize images; applying different image transforms to this group of images yields multiple List[Tensor], and combining them gives List[List[Tensor]];
here, the number of List[Tensor] equals the number of image transforms applied; the number of Tensors equals the number of images; each Tensor stores the proposals of a single image;
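To make the three nesting conventions above concrete, here is a toy construction (the shapes, the number of augs, and the 100 proposals per image are all illustrative, not from the source):
import torch

batch_size, num_augs = 2, 3  # e.g. original + flip + one extra scale
# imgs: List[Tensor], one Tensor of shape (N, C, H, W) per test-time aug
imgs = [torch.randn(batch_size, 3, 800, 1333) for _ in range(num_augs)]
# img_metas: List[List[dict]]; outer list over augs, inner list over images
img_metas = [[dict(img_shape=(800, 1333, 3), flip=(aug_id == 1))
              for _ in range(batch_size)] for aug_id in range(num_augs)]
# proposals: List[List[Tensor]]; each inner Tensor holds one image's (P, 4) boxes
proposals = [[torch.rand(100, 4) for _ in range(batch_size)]
             for _ in range(num_augs)]
assert len(imgs) == len(img_metas) == len(proposals) == num_augs
assert all(len(metas) == batch_size for metas in img_metas)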
feature map
usually a 4D tensor, with shape torch.Size([N, C, H, W]);
bbox_targets_list (list[Tensor])
: BBox targets of each level.
logger.info('Environment info:\n' + dash_line + env_info + '\n' + dash_line)
# c-y_note: logger.info() writes the log line "2020-08-01 20:41:12,985 - mmdet - INFO - Environment info: ..." to the .log file
# check whether each variable var (with variable name 'var_name') is of type list
for var, name in [(imgs, 'imgs'), (img_metas, 'img_metas')]:
    if not isinstance(var, list):
        raise TypeError(f'{name} must be a list, but got {type(var)}')
mmdet/models/detectors/base.py
class BaseDetector(nn.Module, metaclass=ABCMeta):
    """Base class for detectors."""
    def train_step(self, data, optimizer):
        losses = self(**data)  # hc-y_Q20200903: what is this operation?
Similarly there is self(x); self(x) should be specific to PyTorch, related to nn.Module.
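Regarding the question hc-y_Q20200903 above: calling a module instance, self(**data) or self(x), goes through nn.Module.__call__, which runs any registered hooks and then dispatches to forward(). A minimal demonstration:
import torch
import torch.nn as nn

class Toy(nn.Module):
    def forward(self, x, scale=1.0):
        return x * scale

m = Toy()
data = dict(x=torch.ones(2), scale=3.0)
# m(**data) is equivalent to m.__call__(**data), which invokes m.forward(**data)
# after running registered hooks; this is why losses = self(**data) ends up in
# the detector's forward().
assert torch.equal(m(**data), m.forward(**data))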
mmdet/models/dense_heads/rpn_test_mixin.py
class RPNTestMixin(object):
    """Test methods of RPN."""
    def simple_test_rpn(self, x, img_metas):
        rpn_outs = self(x)  # hc-y_note: self(x) calls the corresponding module's def forward(self, feats)
        # hc-y_note: this line executes the same way as proposal_list = self.get_bboxes(*outs, img_metas, cfg=proposal_cfg)
        # in mmdet/models/dense_heads/base_dense_head.py at training time
        proposal_list = self.get_bboxes(*rpn_outs, img_metas)
        return proposal_list
mmdet/models/dense_heads/base_dense_head.py
class BaseDenseHead(nn.Module, metaclass=ABCMeta):
    """Base class for DenseHeads."""
    def forward_train(self, x, img_metas, gt_bboxes, gt_labels=None, gt_bboxes_ignore=None, proposal_cfg=None, **kwargs):
        outs = self(x)  # hc-y_note: tuple[2 Tensors], the 2 Tensors are cls_scores and bbox_preds; i.e. fetch the preds produced by the RPN: {cls_scores, bbox_preds};
        if gt_labels is None:
            loss_inputs = outs + (gt_bboxes, img_metas)
        else:
            loss_inputs = outs + (gt_bboxes, gt_labels, img_metas)
        losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)  # hc-y_note: mmdet/models/dense_heads/anchor_head.py def loss()
        if proposal_cfg is None:
            return losses
        else:
            proposal_list = self.get_bboxes(*outs, img_metas, cfg=proposal_cfg)  # hc-y_note: mmdet/models/dense_heads/anchor_head.py def get_bboxes()
            return losses, proposal_list
type(scale_factor)  # numpy.ndarray, dtype=float32, [1.5873016 1.5873016 1.5873016 1.5873016]
bboxes  # torch.Size([rpn_nms_post, num_cls*4])
mmdet/models/roi_heads/bbox_heads/bbox_head.py
class BBoxHead(nn.Module):
    def get_bboxes():
        if rescale:
            if isinstance(scale_factor, float):
                bboxes /= scale_factor
            else:
                # torch.Tensor.new_tensor(): returns a new Tensor with data as the tensor data;
                # by default, the returned Tensor has the same torch.dtype and torch.device as this tensor.
                scale_factor = bboxes.new_tensor(scale_factor)
                bboxes = (bboxes.view(bboxes.size(0), -1, 4) /
                          scale_factor).view(bboxes.size()[0], -1)
mmdet/models/dense_heads/rpn_head.py
class RPNHead(RPNTestMixin, AnchorHead):
    def _get_bboxes_single():
        for idx in range(len(cls_scores)):  # hc-y_note: within a single image, process each scale level in turn
            pass
        dets, keep = batched_nms(proposals, scores, ids, nms_cfg)  # hc-y_highlight: NMS is executed independently on each scale level;
mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py
class SingleRoIExtractor(BaseRoIExtractor):
    def forward():
        for i in range(num_levels):  # hc-y_note: level by level, apply RoIAlign to the rois sampled from all images in a batch;
mmdet/core/post_processing/bbox_nms.py
def multiclass_nms():
    dets, keep = batched_nms(bboxes, scores, labels, nms_cfg)  # hc-y_note: NMS is executed class by class; !!!note that the same bbox may be predicted with multiple class labels at once; this looks improvable!!!
Why is the default value of finest_scale 56 in mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py?
Why the default value of finest_scale is 56? · Issue #2843 · open-mmlab/mmdetection · GitHub 20200529
The corelation between anchor_scale and map_roi_levels ? · Issue #2387 · open-mmlab/mmdetection · GitHub 20200403
It seems strange, as shown below:
P2: anchor_scale set for rpn_head = 32,  mapped_roi_scale for roi_extractor = 0-112
P3: anchor_scale set for rpn_head = 64,  mapped_roi_scale for roi_extractor = 112-224
P4: anchor_scale set for rpn_head = 128, mapped_roi_scale for roi_extractor = 224-448
P5: anchor_scale set for rpn_head = 256, mapped_roi_scale for roi_extractor = 448-
So why doesn't the mapped_roi_scale for the roi_extractor approximately match the anchor_scale for the rpn_head?
hc-y_Q20200907: if the anchor_scale were kept consistent with the RoI mapping scale, would performance improve?
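The ranges in the table come from SingleRoIExtractor.map_roi_levels; a sketch of that rule, paraphrased from mmdet v2.x (details may differ slightly between versions):
import torch

def map_roi_levels(rois, num_levels, finest_scale=56):
    """Map rois (batch_idx, x1, y1, x2, y2) to FPN levels by their scale."""
    scale = torch.sqrt(
        (rois[:, 3] - rois[:, 1]) * (rois[:, 4] - rois[:, 2]))
    # scale < 112 -> lvl 0, 112-224 -> lvl 1, 224-448 -> lvl 2, >= 448 -> lvl 3
    target_lvls = torch.floor(torch.log2(scale / finest_scale + 1e-6))
    return target_lvls.clamp(min=0, max=num_levels - 1).long()
With finest_scale=56, floor(log2(scale/56)) changes value exactly at 112, 224, and 448, which is precisely the 0-112 / 112-224 / 224-448 / 448- mapping listed above.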
It follows that, if the detector's total loss is loss = lambda_1*rpn_loss + lambda_2*roi_loss, then the rpn_losses here
(mmdet/models/detectors/two_stage.py) correspond to lambda_1*rpn_loss and the roi_losses to lambda_2*roi_loss;
mmdet/models/detectors/base.py: loss, log_vars = self._parse_losses(losses)
mmdet/models/detectors/base.py: def _parse_losses(self, losses): sums the classification and localization losses into the total loss
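A sketch of what _parse_losses does, simplified from mmdet/models/detectors/base.py (the distributed reduction shown further below is omitted here): every entry of the losses dict whose key contains 'loss' is summed into the total loss.
from collections import OrderedDict
import torch

def parse_losses_sketch(losses):
    log_vars = OrderedDict()
    for name, value in losses.items():
        if isinstance(value, torch.Tensor):
            log_vars[name] = value.mean()
        elif isinstance(value, list):  # e.g. per-level losses from an FPN head
            log_vars[name] = sum(v.mean() for v in value)
    # only keys containing 'loss' contribute to the optimized total
    loss = sum(v for k, v in log_vars.items() if 'loss' in k)
    log_vars['loss'] = loss
    return loss, log_vars
So rpn_losses and roi_losses are simply merged into one dict and summed; the lambda weights live inside the individual loss terms (e.g. the loss_weight field of each loss's config).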
img_metas[img_id].img_shape = <class 'tuple'> (800, 600, 3), i.e., (H, W, C)
mmdet/core/bbox/coder/delta_xywh_bbox_coder.py
def bbox2delta(proposals, gt, means=(0., 0., 0., 0.), stds=(1., 1., 1., 1.)):
    """Compute deltas of proposals w.r.t. gt.

    We usually compute the deltas of x, y, w, h of proposals w.r.t ground
    truth bboxes to get regression target.
    This is the inverse function of :func:`delta2bbox`.

    Args:
        proposals (Tensor): Boxes to be transformed, shape (N, ..., 4)
        gt (Tensor): Gt bboxes to be used as base, shape (N, ..., 4)
        means (Sequence[float]): Denormalizing means for delta coordinates
        stds (Sequence[float]): Denormalizing standard deviation for delta
            coordinates

    Returns:
        Tensor: deltas with shape (N, 4), where columns represent
            dx, dy, dw, dh.
    """
    pass

def delta2bbox(rois, deltas, means=(0., 0., 0., 0.), stds=(1., 1., 1., 1.), max_shape=None, wh_ratio_clip=16 / 1000):
    """Apply deltas to shift/scale base boxes.

    Typically the rois are anchor or proposed bounding boxes and the deltas
    are network outputs used to shift/scale those boxes.
    This is the inverse function of :func:`bbox2delta`.

    Args:
        rois (Tensor): Boxes to be transformed. Has shape (N, 4)
        deltas (Tensor): Encoded offsets with respect to each roi.
            Has shape (N, 4 * num_classes). Note N = num_anchors * W * H when
            rois is a grid of anchors. Offset encoding follows [1]_.
        means (Sequence[float]): Denormalizing means for delta coordinates
        stds (Sequence[float]): Denormalizing standard deviation for delta
            coordinates
        max_shape (tuple[int, int]): Maximum bounds for boxes, specifies (H, W)
        wh_ratio_clip (float): Maximum aspect ratio for boxes.

    Returns:
        Tensor: Boxes with shape (N, 4), where columns represent
            tl_x, tl_y, br_x, br_y.
    """
    pass
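A runnable sketch of the delta encoding the docstring above describes (it mirrors the usual mmdet v2.x implementation up to minor details such as clamping):
import torch

def bbox2delta_sketch(proposals, gt,
                      means=(0., 0., 0., 0.), stds=(1., 1., 1., 1.)):
    # centers and sizes of the proposals ...
    px = (proposals[..., 0] + proposals[..., 2]) * 0.5
    py = (proposals[..., 1] + proposals[..., 3]) * 0.5
    pw = proposals[..., 2] - proposals[..., 0]
    ph = proposals[..., 3] - proposals[..., 1]
    # ... and of the ground-truth boxes
    gx = (gt[..., 0] + gt[..., 2]) * 0.5
    gy = (gt[..., 1] + gt[..., 3]) * 0.5
    gw = gt[..., 2] - gt[..., 0]
    gh = gt[..., 3] - gt[..., 1]
    deltas = torch.stack([(gx - px) / pw, (gy - py) / ph,
                          torch.log(gw / pw), torch.log(gh / ph)], dim=-1)
    # normalize with the coder's target_means / target_stds
    return (deltas - deltas.new_tensor(means)) / deltas.new_tensor(stds)
delta2bbox() inverts this: denormalize with means/stds, then x = px + dx*pw and w = pw*exp(dw) (with dw clamped via wh_ratio_clip), and finally convert centers/sizes back to corner coordinates.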
Notes from 20200907:
the "backward" and "update weights" operations happen in:
# mmcv/runner/epoch_based_runner.py
class EpochBasedRunner(BaseRunner):
    def train():
        self.call_hook('after_train_iter')

# mmcv/runner/base_runner.py
class BaseRunner(metaclass=ABCMeta):
    def call_hook(self, fn_name):
        # when hook is <mmcv.runner.hooks.optimizer.OptimizerHook object at 0x7f2ac0811250>,
        # class OptimizerHook.after_train_iter() gets called

# mmcv/runner/hooks/optimizer.py
class OptimizerHook(Hook):
    def after_train_iter(self, runner):
        runner.optimizer.zero_grad()
        runner.outputs['loss'].backward()  # c-y_note: backward
        if self.grad_clip is not None:
            grad_norm = self.clip_grads(runner.model.parameters())
            if grad_norm is not None:
                # Add grad norm to the logger
                runner.log_buffer.update({'grad_norm': float(grad_norm)},
                                         runner.outputs['num_samples'])
        runner.optimizer.step()  # c-y_note: update weights
mmdet/core/anchor/anchor_generator.py def single_level_grid_anchors()
shift_xx = torch.tensor([ 0, 4, 0, 4, 0, 4])
shift_yy = torch.tensor([ 0, 0, 4, 4, 8, 8])
shifts = torch.stack([shift_xx, shift_yy, shift_xx, shift_yy], dim=-1)
>>> shifts
Out[13]:
tensor([[0, 0, 0, 0],
[4, 0, 4, 0],
[0, 4, 0, 4],
[4, 4, 4, 4],
[0, 8, 0, 8],
[4, 8, 4, 8]])
>>> shifts.shape
Out[14]: torch.Size([6, 4])
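Continuing this example: single_level_grid_anchors broadcasts the base anchors against these shifts to obtain all grid anchors (a sketch; the base_anchors values below are made up):
import torch

shift_xx = torch.tensor([0, 4, 0, 4, 0, 4])
shift_yy = torch.tensor([0, 0, 4, 4, 8, 8])
shifts = torch.stack([shift_xx, shift_yy, shift_xx, shift_yy], dim=-1)  # (6, 4)

# two hypothetical base anchors (x1, y1, x2, y2) at the first grid cell
base_anchors = torch.tensor([[-2., -2., 2., 2.],
                             [-4., -4., 4., 4.]])            # (A, 4), A=2
all_anchors = base_anchors[None, :, :] + shifts[:, None, :]  # (6, A, 4)
all_anchors = all_anchors.view(-1, 4)                        # (6*A, 4)
# layout: all A anchors of grid cell 0 first, then cell 1, and so on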
Notes from 20200903:
# mmcv/ops/nms.py
# the NMS implementation in MMDet
def batched_nms(boxes, scores, idxs, nms_cfg, class_agnostic=False):
    """Performs non-maximum suppression in a batched fashion.

    Modified from https://github.com/pytorch/vision/blob
    /505cd6957711af790211896d32b40291bea1bc21/torchvision/ops/boxes.py#L39.
    In order to perform NMS independently per class, we add an offset to all
    the boxes. The offset is dependent only on the class idx, and is large
    enough so that boxes from different classes do not overlap.

    Arguments:
        boxes (torch.Tensor): boxes in shape (N, 4).
        scores (torch.Tensor): scores in shape (N, ).
        idxs (torch.Tensor): each index value correspond to a bbox cluster,
            and NMS will not be applied between elements of different idxs,
            shape (N, ).
        nms_cfg (dict): specify nms type and other parameters like iou_thr.
        class_agnostic (bool): if true, nms is class agnostic,
            i.e. IoU thresholding happens over all boxes,
            regardless of the predicted class
    Returns:
        tuple: kept dets and indice.
    """
    nms_cfg_ = nms_cfg.copy()
    class_agnostic = nms_cfg_.pop('class_agnostic', class_agnostic)
    if class_agnostic:
        boxes_for_nms = boxes
    else:
        max_coordinate = boxes.max()  # c-y_note: returns a single scalar value
        offsets = idxs.to(boxes) * (max_coordinate + 1)  # idxs.dtype: torch.int64, boxes.dtype: torch.float32
        boxes_for_nms = boxes + offsets[:, None]
    nms_type = nms_cfg_.pop('type', 'nms')
    nms_op = eval(nms_type)  # c-y_Q20200903: what does the built-in eval() do? (here it evaluates the string nms_type and returns the function object of that name)
    dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg_)  # c-y_note: kept dets (boxes and scores) and indice, which is always the same data type as the input.
    boxes = boxes[keep]  # torch.Size([num_boxes, 4])
    scores = dets[:, -1]  # torch.Size([num_boxes])
    return torch.cat([boxes, scores[:, None]], -1), keep  # c-y_note: dim=-1 here makes .cat() concatenate along the last dimension
Notes from 20200910:
Key point: boolean indexing on torch.Tensor and the use of torch.masked_select();
code snippets that help in understanding mmdet/core/post_processing/bbox_nms.py:
# mmdet/core/post_processing/bbox_nms.py
import torch

temp_bboxes = torch.randn(6, 3, 4)  # torch.Size([6, 3, 4])
temp_scores = torch.rand(6, 3)
temp_valid_mask = temp_scores > 0.5  # torch.Size([6, 3])
temp_bboxes[temp_valid_mask].shape  # torch.Size([11, 4])
print('value of temp_valid_mask.nonzero():\n', temp_valid_mask.nonzero())  # torch.Size([11, 2])
temp_valid_mask_inds = temp_valid_mask.nonzero()
temp_label = temp_valid_mask.nonzero()[:, 1]  # same values as temp_cls_inds
print('value of torch.where(temp_valid_mask == True):\n', torch.where(temp_valid_mask == True))  # tuple of len(temp_valid_mask.size())=2 Tensors, each of shape torch.Size([11])
temp_q0_inds = torch.where(temp_valid_mask == True)[0]  # torch.Size([11])
temp_cls_inds = torch.where(temp_valid_mask == True)[1]  # torch.Size([11])
# the following three lines all produce the same result
torch.masked_select(temp_bboxes, torch.stack((temp_valid_mask, temp_valid_mask, temp_valid_mask, temp_valid_mask), -1)).view(-1, 4)
temp_bboxes[temp_valid_mask]  # torch.Size([11, 4])
temp_bboxes[temp_q0_inds, temp_cls_inds]
torch.masked_select(temp_scores, temp_valid_mask)  # torch.Size([11])
Feel free to put the dataset at any place you want, and then soft link the dataset under the data/ folder:
cd data/coco
ln -s /media/your_username/A42C33A02C336D04/dataset/coco2017/annotations_trainval2017/annotations annotations
ln -s /media/your_username/A42C33A02C336D04/dataset/coco2017/train_image_2017/train2017 train2017
ln -s /media/your_username/A42C33A02C336D04/dataset/coco2017/val_image_2017 val2017
ln -s /media/your_username/A42C33A02C336D04/dataset/coco2017/test_image_2017/test2017 test2017
cd data/cityscapes
ln -s /media/your_username/A42C33A02C336D04/dataset/cityscapes/annotations annotations
ln -s /media/your_username/A42C33A02C336D04/dataset/cityscapes/leftImg8bit leftImg8bit
ln -s /media/your_username/A42C33A02C336D04/dataset/cityscapes/gtFine gtFine
cd data/VOCdevkit
ln -s "/media/your_username/A42C33A02C336D04/dataset/Pascal VOC Dataset/VOC2007" VOC2007
ln -s "/media/your_username/A42C33A02C336D04/dataset/Pascal VOC Dataset/VOC2012" VOC2012
This step has not been performed yet:
The cityscapes annotations have to be converted into the coco format using tools/convert_datasets/cityscapes.py:
mmdetection/model_zoo.md at master · open-mmlab/mmdetection · GitHub
Download the pretrained model to be tested from the link above, e.g. faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth, create a new checkpoints directory under the mmdetection directory, and store the model there.
wget "https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth"
For how to download files in the terminal, see:
Download Files From Google Drive With curl/wget - DEV
You can use the following commands to test a dataset.
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show]
Examples:
Assume that you have already downloaded the checkpoints to the directory checkpoints/.
# Test Faster R-CNN and visualize the results. Press any key for the next image.
python tools/test.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
--show
# Test Faster R-CNN and save the painted images for latter visualization.
python tools/test.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
--show-dir my_workspace/faster_rcnn_r50_fpn_1x_results
# Test Faster R-CNN on PASCAL VOC (without saving the test results) and evaluate the mAP.
python tools/test.py configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712.py \
checkpoints/SOME_CHECKPOINT.pth \
--eval mAP
# ------ the following still needs revision ------
# Test Mask R-CNN with 8 GPUs, and evaluate the bbox and mask AP.
./tools/dist_test.sh configs/mask_rcnn_r50_fpn_1x_coco.py \
checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \
8 --out results.pkl --eval bbox segm
# Test Mask R-CNN with 8 GPUs, and evaluate the classwise bbox and mask AP.
./tools/dist_test.sh configs/mask_rcnn_r50_fpn_1x_coco.py \
checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \
8 --out results.pkl --eval bbox segm --options "classwise=True"
# Test Mask R-CNN on COCO test-dev with 8 GPUs, and generate the json file to be submitted to the official evaluation server.
./tools/dist_test.sh configs/mask_rcnn_r50_fpn_1x_coco.py \
checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \
8 --format-only --options "jsonfile_prefix=./mask_rcnn_test-dev_results"
# You will get two json files mask_rcnn_test-dev_results.bbox.json and mask_rcnn_test-dev_results.segm.json.
# Test Mask R-CNN on Cityscapes test with 8 GPUs, and generate the txt and png files to be submitted to the official evaluation server.
./tools/dist_test.sh configs/cityscapes/mask_rcnn_r50_fpn_1x_cityscapes.py \
checkpoints/mask_rcnn_r50_fpn_1x_cityscapes_20200227-afe51d5a.pth \
8 --format-only --options "txtfile_prefix=./mask_rcnn_cityscapes_test_results"
# The generated png and txt would be under ./mask_rcnn_cityscapes_test_results directory.
We provide a demo script to test a single image.
python demo/image_demo.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--score-thr ${SCORE_THR}]
Examples:
python demo/image_demo.py demo/demo.jpg configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth --device cpu
We provide a webcam demo to illustrate the results.
python demo/webcam_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--camera-id ${CAMERA-ID}] [--score-thr ${SCORE_THR}]
Examples:
python demo/webcam_demo.py configs/faster_rcnn_r50_fpn_1x_coco.py \
checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
Synchronous interface
Here is an example of building the model and test given images.
from mmdet.apis import init_detector, inference_detector
import mmcv
config_file = 'configs/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth'
# build the model from a config file and a checkpoint file
model = init_detector(config_file, checkpoint_file, device='cuda:0')
# test a single image and show the results
img = 'test.jpg' # or img = mmcv.imread(img), which will only load it once
result = inference_detector(model, img)
# visualize the results in a new window
model.show_result(img, result)
# or save the visualization results to image files
model.show_result(img, result, out_file='result.jpg')
# test a video and show the results
video = mmcv.VideoReader('video.mp4')
for frame in video:
result = inference_detector(model, frame)
model.show_result(frame, result, wait_time=1)
A notebook demo can be found in demo/inference_demo.ipynb.
Call hierarchy of the methods of class EpochBasedRunner(BaseRunner): run() --> train()/val() --> run_iter();
Call hierarchy of the methods of class IterBasedRunner(BaseRunner): run() --> train()/val() --> run_iter();
Evaluating the same model with different IMS_PER_BATCH values, or on different GPU devices, yields identical AP values;
Average Precision (AP) @[ IoU=0.60 | area= all | maxDets=1000 ] = 0.945
Average Precision (AP) @[ IoU=0.70 | area= all | maxDets=1000 ] = 0.926
Average Precision (AP) @[ IoU=0.80 | area= all | maxDets=1000 ] = 0.880
Average Precision (AP) @[ IoU=0.90 | area= all | maxDets=1000 ] = 0.731
cd ~/miniconda2/envs/usr_mmlab/lib/python3.8/site-packages/mmpycocotools-12.0.3-py3.8-linux-x86_64.egg/pycocotools
vim cocoeval.py
# cd ~/anaconda3/envs/mmlab/lib/python3.9/site-packages/pycocotools/cocoeval.py
line 563
def setDetParams(self):
    self.imgIds = []
    self.catIds = []
    # np.arange causes trouble. the data point on arange is slightly
    # larger than the true value
    # originally: self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
    # originally: self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)
    self.iouThrs = np.linspace(50,
                               95,
                               int(np.round((0.95 - .5) / .05)) + 1,
                               endpoint=True) / 100
    self.recThrs = np.linspace(.0,
                               100,
                               int(np.round((1.00 - .0) / .01)) + 1,
                               endpoint=True) / 100
line 582
def setKpParams(self):
    self.imgIds = []
    self.catIds = []
    # np.arange causes trouble. the data point on arange is slightly
    # larger than the true value
    # originally: self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
    # originally: self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01)) + 1, endpoint=True)
    self.iouThrs = np.linspace(50,
                               95,
                               int(np.round((0.95 - .5) / .05)) + 1,
                               endpoint=True) / 100
    self.recThrs = np.linspace(.0,
                               100,
                               int(np.round((1.00 - .0) / .01)) + 1,
                               endpoint=True) / 100
def _summarize( ap=1, iouThr=None, areaRng='all', maxDets=100 ):
    # pick one of the two: either modify self.iouThrs and self.recThrs, or modify t = np.where(iouThr == p.iouThrs)[0];
    t = np.where(abs(iouThr - p.iouThrs) < 1e-5)[0]  # usr_modify0110: p.iouThrs[-2]=0.8999999999999999; originally t = np.where(iouThr == p.iouThrs)[0];
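A quick numpy check of the float drift this workaround targets (the value 0.8999999999999999 is what the author's log reports; the exact floats may vary by platform):
import numpy as np

iou_thrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1,
                       endpoint=True)
print(iou_thrs[-2])                         # may print 0.8999999999999999
print(np.any(iou_thrs == 0.9))              # may be False due to the drift
print(np.any(abs(0.9 - iou_thrs) < 1e-5))   # robust comparison: True
The integer-based construction np.linspace(50, 95, 10) / 100 sidesteps the drift, since 90/100 rounds to exactly the same float as the literal 0.9; that is why the modification above offers either fix.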
line 498
def _summarizeDets():
    stats = np.zeros((16, ))
    stats[0] = _summarize(1)
    stats[1] = _summarize(1, iouThr=.5, maxDets=self.params.maxDets[2])
    stats[2] = _summarize(1, iouThr=.75, maxDets=self.params.maxDets[2])
    stats[3] = _summarize(1, areaRng='small', maxDets=self.params.maxDets[2])
    stats[4] = _summarize(1, areaRng='medium', maxDets=self.params.maxDets[2])
    stats[5] = _summarize(1, areaRng='large', maxDets=self.params.maxDets[2])
    stats[6] = _summarize(0, maxDets=self.params.maxDets[0])
    stats[7] = _summarize(0, maxDets=self.params.maxDets[1])
    stats[8] = _summarize(0, maxDets=self.params.maxDets[2])
    stats[9] = _summarize(0, areaRng='small', maxDets=self.params.maxDets[2])
    stats[10] = _summarize(0, areaRng='medium', maxDets=self.params.maxDets[2])
    stats[11] = _summarize(0, areaRng='large', maxDets=self.params.maxDets[2])
    stats[12] = _summarize(1, iouThr=.6, maxDets=self.params.maxDets[2])
    stats[13] = _summarize(1, iouThr=.7, maxDets=self.params.maxDets[2])
    stats[14] = _summarize(1, iouThr=.8, maxDets=self.params.maxDets[2])
    stats[15] = _summarize(1, iouThr=.9, maxDets=self.params.maxDets[2])
    return stats
mmdet_20211213/mmdet/datasets/coco.py
def evaluate()
(omitted)
# mapping of cocoEval.stats
coco_metric_names = {
'mAP': 0,
'mAP_50': 1,
'mAP_75': 2,
'mAP_s': 3,
'mAP_m': 4,
'mAP_l': 5,
'AR@100': 6,
'AR@300': 7,
'AR@1000': 8,
'AR_s@1000': 9,
'AR_m@1000': 10,
'AR_l@1000': 11,
'mAP_60': 12, # usr_add1231:
'mAP_70': 13, # usr_add1231:
'mAP_80': 14, # usr_add1231:
'mAP_90': 15, # usr_add1231:
}
(omitted)
metric_items = [
'mAP', 'mAP_50', 'mAP_75', 'mAP_s', 'mAP_m', 'mAP_l',
'mAP_60', 'mAP_70', 'mAP_80', 'mAP_90', # usr_add1231:
]
(omitted)
ap = cocoEval.stats[:6]
ap_extra = cocoEval.stats[-4:] # usr_add1231:
eval_results[f'{metric}_mAP_copypaste'] = (
    f'{ap[0]:.3f} {ap[1]:.3f} {ap[2]:.3f} {ap[3]:.3f} '
    f'{ap[4]:.3f} {ap[5]:.3f} '
    f'{ap_extra[-4]:.3f} {ap_extra[-3]:.3f} {ap_extra[-2]:.3f} {ap_extra[-1]:.3f}')  # usr_add1231:
(omitted)
On Linux, .so files are dynamic link libraries, similar in function and purpose to .dll files on Windows.
Normally, linking against libraries is done at compile time: all the relevant object files and the libraries involved are linked into a single executable. At runtime the program no longer depends on the libraries, since every required function has already been copied into it. Such libraries are therefore called static libraries, typically named in the form "libxxx.a".
Alternatively, the linking and loading of some library functions can be deferred until runtime; this is the well-known dynamic link library technique.
Excerpted from: linux 中的.so和.a文件 - 心田居士 - 博客园 20190616
The .so files under the mmdetection/build path are generated by compiling the related files (.py, .so, .cpp, .cu) under mmdetection/mmdet/ops/;
therefore, if you modify the files under mmdetection/mmdet/ops/, you need to recompile mmdetection with python setup.py develop
or pip install -v -e .
Without recompiling, the old dynamic link libraries will still be loaded.
Notes from 20200729:
mmdetection v2.1.0 99a31d2 on 9 Jun 2020
Folder PATH listing for volume 软件 (Windows tree output):
mmdetection/build
│
├─lib.linux-x86_64-3.8
│ └─mmdet
│ └─ops
│ ├─carafe
│ │ carafe_ext.cpython-38-x86_64-linux-gnu.so
│ │ carafe_naive_ext.cpython-38-x86_64-linux-gnu.so
│ │
│ ├─corner_pool
│ │ corner_pool_ext.cpython-38-x86_64-linux-gnu.so
│ │
│ ├─dcn
│ │ deform_conv_ext.cpython-38-x86_64-linux-gnu.so
│ │ deform_pool_ext.cpython-38-x86_64-linux-gnu.so
│ │
│ ├─masked_conv
│ │ masked_conv2d_ext.cpython-38-x86_64-linux-gnu.so
│ │
│ ├─nms
│ │ nms_ext.cpython-38-x86_64-linux-gnu.so
│ │
│ ├─roi_align
│ │ roi_align_ext.cpython-38-x86_64-linux-gnu.so
│ │
│ ├─roi_pool
│ │ roi_pool_ext.cpython-38-x86_64-linux-gnu.so
│ │
│ ├─sigmoid_focal_loss
│ │ sigmoid_focal_loss_ext.cpython-38-x86_64-linux-gnu.so
│ │
│ └─utils
│ compiling_info.cpython-38-x86_64-linux-gnu.so
│
└─temp.linux-x86_64-3.8
└─mmdet
└─ops
├─carafe
│ └─src
│ │ carafe_ext.o
│ │ carafe_naive_ext.o
│ │
│ └─cuda
│ carafe_cuda.o
│ carafe_cuda_kernel.o
│ carafe_naive_cuda.o
│ carafe_naive_cuda_kernel.o
│
├─corner_pool
│ └─src
│ corner_pool.o
│
├─dcn
│ └─src
│ │ deform_conv_ext.o
│ │ deform_pool_ext.o
│ │
│ └─cuda
│ deform_conv_cuda.o
│ deform_conv_cuda_kernel.o
│ deform_pool_cuda.o
│ deform_pool_cuda_kernel.o
│
├─masked_conv
│ └─src
│ │ masked_conv2d_ext.o
│ │
│ └─cuda
│ masked_conv2d_cuda.o
│ masked_conv2d_kernel.o
│
├─nms
│ └─src
│ │ nms_ext.o
│ │
│ ├─cpu
│ │ nms_cpu.o
│ │
│ └─cuda
│ nms_cuda.o
│ nms_kernel.o
│
├─roi_align
│ └─src
│ │ roi_align_ext.o
│ │
│ ├─cpu
│ │ roi_align_v2.o
│ │
│ └─cuda
│ roi_align_kernel.o
│ roi_align_kernel_v2.o
│
├─roi_pool
│ └─src
│ │ roi_pool_ext.o
│ │
│ └─cuda
│ roi_pool_kernel.o
│
├─sigmoid_focal_loss
│ └─src
│ │ sigmoid_focal_loss_ext.o
│ │
│ └─cuda
│ sigmoid_focal_loss_cuda.o
│
└─utils
└─src
compiling_info.o
After force-stopping model training with Ctrl+C in the terminal, if the GPU has not been released, the processes can be killed with these commands:
nvidia-smi  # check GPU utilization
fuser -v /dev/nvidia*  # use fuser to show the process IDs of all processes occupying nvidia devices
kill -9 13754  # kill the process with PID 13754 (PID is the process ID)
ps -A -ostat,ppid,pid,cmd | grep -e '^[Zz]'  # show hidden (zombie) processes
深度学习训练已经停止了,可GPU内存还在占用着,怎么办?_u014264373的博客 20200819
Some processes on the GPU cannot be killed: although the parent process was killed, the memory of its child processes has not been released, so the parent process must be found and killed again. Look up the parent process with
ps -ef | grep <PID>
and then kill it with
kill -9 <parent PID>
关于GPU上进程杀不死的解决_geter_CS的博客-CSDN博客 20190214
如何查看并准确找到占用GPU的程序_XCCCCZ的博客-CSDN博客
查询GPU使用情况以及杀死GPU上的多个无用进程_南国那片枫叶的博客-CSDN博客 20190507
Explanation of the printed output of nvidia-smi:
Explained Output of nvidia-smi Utility | by Shachi Kaul | Analytics Vidhya | Medium 20191216
GPU状态监测 nvidia-smi 命令详解_黄飞的博客专栏-CSDN博客 20180201
nvidia-smi查看GPU的使用信息并分析_薰衣草PK向日葵的博客-CSDN博客 20191113
Linux查看GPU信息和使用情况 - Oops!# - 博客园 20181127
Problem:
nvidia-smi: command not found, but GPU works fine;
Cause analysis and solution:
first run sudo apt purge nvidia-*, then in "System Settings | Software & Updates | Additional Drivers" select a driver and click "Apply Changes";
Method 1: press Ctrl+Alt+F1~F7 to close the graphical interface; Ctrl+Alt+F8 to reopen it;
Linux:Xorg占用现存过大问题_Hz_xi的博客-CSDN博客 20210715
Note: after closing the graphical interface via Ctrl+Alt+F1~F7, GPU-Util indeed drops. But when remote-controlling the machine via Sunflower (向日葵), the screen goes black once the graphical interface is closed, and there is no way to reopen it remotely.
Method 2 (untested): the problem is mainly that the system uses the discrete GPU by default; Xorg can be switched to the integrated GPU by changing its graphics configuration;
解决/usr/lib/xorg/Xorg占用gpu显存的问题 - 简书 20210706
usrname@usrname:~$ cat /etc/X11/xorg.conf
cat: /etc/X11/xorg.conf: No such file or directory
Method 3: just run sudo prime-select on-demand and reboot.
How to prevent Xorg process from using the GPU? on Ubuntu 20.04.3 LTS (with a RTX 3050 Ti) - Graphics / Linux / Linux - NVIDIA Developer Forums
generix Top Contributor 20220121:
Just run sudo prime-select on-demand and reboot. This will run everything on the intel igpu, only leaving a 4MB process on the nvidia left.
If you also want to get rid of that, additionally create the file you mentioned in your first post; you'll then have to make sure the nvidia-persistenced daemon is started on boot.
user142861 20220121:
Hey @generix , Well done! It was much simpler than I thought. Indeed, I now have my /usr/lib/xorg/Xorg process with only 4MB and nothing else. Exactly the kind of GPU optimization I was looking for.
Thanks a lot for your quick support.
Tried running sudo prime-select on-demand, but this usage is not valid on the current machine:
usrname@usrname:/usr/share/X11/xorg.conf.d$ sudo prime-select on-demand
Usage: /usr/bin/prime-select nvidia|intel|query
(usr_mmlab) usrname@usrname-System-Product-Name:~/usrname_workdir/usr/Pytorch_WorkSpace/OpenSourcePlatform/mmdetection$ ./tools/dist_train.sh configs/faster_rcnn/v2v3_11_faster_rcnn_r50_fpn_1x_voc.py 2
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
2020-10-10 11:39:12,029 - mmdet - INFO - Environment info:
Prepend CUDA_VISIBLE_DEVICES=0,1 to the command run in the terminal, e.g. CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=1 python train_net.py;
指定GPU训练 CUDA_VISIBLE_DEVICES_xkx_07_10的博客-CSDN博客 20190819
(to read) Pytorch 多GPU训练-单运算节点-All you need - walter_xh - 博客园 20190926
PyTorch Distributed Developers
Distributed communication package - torch.distributed — PyTorch 1.8.1 documentation
mmdet/models/dense_heads/gfl_head.py
def reduce_mean(tensor):
    if not (dist.is_available() and dist.is_initialized()):
        return tensor
    tensor = tensor.clone()
    # first divide by the number of processes in the current process group;
    # then reduce the tensor data across all machines so that all get the final result;
    # after the call, ``tensor`` will be bitwise identical in all processes;
    dist.all_reduce(tensor.div_(dist.get_world_size()), op=dist.ReduceOp.SUM)
    return tensor
num_total_samples = reduce_mean(torch.tensor(num_total_pos).cuda()).item()
num_total_samples = max(num_total_samples, 1.0)
mmdet/models/detectors/base.py
log_vars['loss'] = loss
for loss_name, loss_value in log_vars.items():
    # reduce loss when distributed training
    if dist.is_available() and dist.is_initialized():
        loss_value = loss_value.data.clone()
        dist.all_reduce(loss_value.div_(dist.get_world_size()))
    log_vars[loss_name] = loss_value.item()
model training is not reproducible · Issue #2773 · open-mmlab/mmdetection · GitHub 20200524
@ZwwWayne sure, here are the 4 mAPs obtained with the command above: 0.6258, 0.6283, 0.6226, 0.6197.
tmp_a = [0.6258, 0.6283, 0.6226, 0.6197]; fluctuation range: max(tmp_a) - min(tmp_a) = 0.0086
model training is not reproducible · Issue #2773 · open-mmlab/mmdetection · GitHub 20200524
Hi @aabramovrepo ,
I also try to run the same config for four times, and obtain 0.8086, 0.7974, 0.7985, and 0.8009 AP. So it seems that the performance on VOC is indeed more unstable than the detectors on COCO dataset.
tmp_b = [0.8086, 0.7974, 0.7985, 0.8009]; fluctuation range: max(tmp_b) - min(tmp_b) = 0.0112
Divergence while training Mask RCNN with ResNet (50 or 101 backbone) on a custom COCO type format dataset · Issue #3557 · open-mmlab/mmdetection · GitHub 20200814
How to install CUDA 10.0 and CUDA 10.1 side by side?
With CUDA 10.0 already installed, install CUDA 10.1 on top of it: simply download and install CUDA 10.1 and cuDNN 7.6.1 from the official site;
With both CUDA 10.0 and CUDA 10.1 installed, how to switch between them on Linux?
When CUDA 10.0 is needed:
sudo rm -rf /usr/local/cuda
sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda
nvcc --version
When CUDA 10.1 is needed:
sudo rm -rf /usr/local/cuda
sudo ln -s /usr/local/cuda-10.1 /usr/local/cuda
nvcc --version
Problem:
det_bboxes, det_scores = (pred_bboxes[:, :4], pred_bboxes[:, 4])
batched_nms(det_bboxes, det_scores, det_labels, nms_cfg)
Traceback (most recent call last):
File "/home/usrname/miniconda2/envs/usr_mmlab/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3343, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "" , line 1, in
batched_nms(det_bboxes, det_scores, det_labels, nms_cfg)
File "/home/usrname/miniconda2/envs/usr_mmlab/lib/python3.8/site-packages/mmcv/ops/nms.py", line 259, in batched_nms
dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg_)
File "/home/usrname/miniconda2/envs/usr_mmlab/lib/python3.8/site-packages/mmcv/utils/misc.py", line 310, in new_func
output = old_func(*args, **kwargs)
File "/home/usrname/miniconda2/envs/usr_mmlab/lib/python3.8/site-packages/mmcv/ops/nms.py", line 113, in nms
inds = NMSop.apply(boxes, scores, iou_threshold, offset)
File "/home/usrname/miniconda2/envs/usr_mmlab/lib/python3.8/site-packages/mmcv/ops/nms.py", line 18, in forward
inds = ext_module.nms(
RuntimeError: scores must be contiguous (nms at ./mmcv/ops/csrc/pytorch/nms.cpp:67)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f788a3d1627 in /home/usrname/miniconda2/envs/usr_mmlab/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: nms(at::Tensor, at::Tensor, float, int) + 0x5b3 (0x7f7865bb7f53 in /home/usrname/miniconda2/envs/usr_mmlab/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #2: + 0xc7e09 (0x7f7865ae5e09 in /home/usrname/miniconda2/envs/usr_mmlab/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #3: ...
(omitted)
Cause analysis and solution:
the cause is that the tensor det_scores (the column slice pred_bboxes[:, 4], hence a non-contiguous view) is not contiguous; the solution is to call .contiguous() on it, i.e., change the call to batched_nms(det_bboxes, det_scores.contiguous(), det_labels, nms_cfg);
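The non-contiguity comes from the column slice; a minimal check (toy shapes):
import torch

pred_bboxes = torch.randn(100, 5)  # (x1, y1, x2, y2, score) per row
det_scores = pred_bboxes[:, 4]     # a strided view, not a dense copy
print(det_scores.is_contiguous())                # False: stride skips 5 floats
print(det_scores.contiguous().is_contiguous())   # True: .contiguous() copies into dense memory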
mmcv/utils/registry.py def build_from_cfg():
the line return obj_cls(**args) kept failing to construct the instance; it later turned out that this was because two def __init__() functions were defined in the class;
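A toy reproduction of that pitfall: a second def __init__() in the same class body silently rebinds the name, so only the last signature exists when build_from_cfg calls return obj_cls(**args).
class Broken:
    def __init__(self, num_classes):
        self.num_classes = num_classes

    def __init__(self, depth):  # silently replaces the first __init__
        self.depth = depth

# Broken(num_classes=80) now raises:
#   TypeError: __init__() got an unexpected keyword argument 'num_classes'
obj = Broken(depth=50)  # only the last definition is in effect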
Problem:
after my own dataset had been registered, the error xxxDataset is not in the dataset registry was still raised;
Cause analysis and solution:
because mmdet v2.18.0 had been installed with the mim install mmdet command, the program was actually still executing the previously installed scripts; so first uninstall with pip uninstall mmdet, then reinstall mmdet with the following commands, and the problem is solved;
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -r requirements/build.txt
pip install -v -e . # or "python setup.py develop"
- jshilong commented on 7 Aug 2021
Please make sure that the mmdet you use is modified which adding the SunrgbdDataset,I am afraid you are using the version installed in the environment before
SUNRGBDDataset is not in the dataset registry · Issue #5800 · open-mmlab/mmdetection · GitHub
Problem:
2022-05-18 09:41:08,390 - mmcv - INFO - Reducer buckets have been rebuilt in this iteration.
2022-05-18 09:41:22,162 - mmdet - INFO - Epoch [290][50/9846] lr: 4.621e-04, eta: 8 days, 2:48:26, time: 0.345, data_time: 0.145, memory: 7110, loss_cls: 1.0795, loss_bbox: 2.1580, loss_obj: 1.6240, loss: 4.8616
2022-05-18 09:41:36,016 - mmdet - INFO - Epoch [290][100/9846] lr: 4.620e-04, eta: 7 days, 7:37:21, time: 0.277, data_time: 0.098, memory: 7110, loss_cls: 0.7758, loss_bbox: 2.1439, loss_obj: 1.4736, loss: 4.3933
2022-05-18 09:41:50,135 - mmdet - INFO - Epoch [290][150/9846] lr: 4.620e-04, eta: 7 days, 2:13:19, time: 0.282, data_time: 0.101, memory: 7110, loss_cls: 0.7527, loss_bbox: 2.1367, loss_obj: 1.4209, loss: 4.3103
2022-05-18 09:42:04,179 - mmdet - INFO - Epoch [290][200/9846] lr: 4.620e-04, eta: 6 days, 23:18:18, time: 0.281, data_time: 0.112, memory: 7110, loss_cls: 0.7673, loss_bbox: 2.1797, loss_obj: 1.4003, loss: 4.3473
Traceback (most recent call last):
File "/media/user/USER/amax/USER_users/usrname/OpenSourcePlatform/mmdet_2.23.0/./tools/train.py", line 223, in <module>
main()
File "/media/user/USER/amax/USER_users/usrname/OpenSourcePlatform/mmdet_2.23.0/./tools/train.py", line 212, in main
train_detector(
File "/media/user/USER/amax/USER_users/usrname/OpenSourcePlatform/mmdet_2.23.0/mmdet/apis/train.py", line 208, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/user/anaconda3/envs/usr_mmlab1213/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/user/anaconda3/envs/usr_mmlab1213/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/user/anaconda3/envs/usr_mmlab1213/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
outputs = self.model.train_step(data_batch, self.optimizer,
File "/home/user/anaconda3/envs/usr_mmlab1213/lib/python3.9/site-packages/mmcv/parallel/distributed.py", line 42, in train_step
and self.reducer._rebuild_buckets()):
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
making sure all `forward` function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
Parameter indices which did not receive grad for rank 2: 372 373 374 375 376 377
In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 13219 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 13220 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 13222 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 13221) of binary: /home/user/anaconda3/envs/usr_mmlab1213/bin/python
Traceback (most recent call last):
File "/home/user/anaconda3/envs/usr_mmlab1213/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/user/anaconda3/envs/usr_mmlab1213/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/user/anaconda3/envs/usr_mmlab1213/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/home/user/anaconda3/envs/usr_mmlab1213/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/user/anaconda3/envs/usr_mmlab1213/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/user/anaconda3/envs/usr_mmlab1213/lib/python3.9/site-packages/torch/distributed/run.py", line 710, in run
elastic_launch(
File "/home/user/anaconda3/envs/usr_mmlab1213/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/user/anaconda3/envs/usr_mmlab1213/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
./tools/train.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Cause analysis:
Solution:
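As the error message itself suggests, one common workaround in mmdet v2.x is to enable unused-parameter detection; a sketch (mmdet/apis/train.py reads this key from the config and forwards it to MMDistributedDataParallel; whether it fixes the root cause, i.e. parameters 372-377 not contributing to the loss, depends on the model):
# appended at the top level of the training config
find_unused_parameters = True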
Bold in math formulas: \textbf{} or {\bf memory}
Bold italic in math formulas: \bm{}
Excerpted from "bookname_author"
This article is a repost; original link: 名称 20200505
Highlight color legend: marks key points
"Personally, I think:" marks personal views pending verification
Divider
Divider
Problem:
Cause analysis:
Solution: