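First the how-to, then the source-code-level details.
Dataset
For convenience, convert your data to COCO format yourself; my changes build on that format. If you would rather not convert the dataset, write your own data_loader following the example further down.
COCO-style dataset [assuming everything lives under the detectron2 project directory].
The example here trains a pedestrian detector, so there is only one class, person.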
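For readers converting their own data, here is a minimal sketch of a COCO-style detection annotation file with a single person category. The helper name make_minimal_coco, the file name, the image size, and the box values are placeholders made up for illustration; only the field layout follows the COCO detection format. Category id 1 is used for person, as in COCO.

```python
import json


def make_minimal_coco(images, boxes_per_image):
    """Build a COCO-style detection dict with one category (person, id 1).

    images: list of (file_name, width, height) tuples
    boxes_per_image: list (same length) of lists of [x, y, w, h] boxes
    """
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": 1, "name": "person"}],
    }
    ann_id = 1
    pairs = zip(images, boxes_per_image)
    for image_id, ((file_name, width, height), boxes) in enumerate(pairs, start=1):
        coco["images"].append(
            {"id": image_id, "file_name": file_name, "width": width, "height": height}
        )
        for x, y, w, h in boxes:
            coco["annotations"].append(
                {
                    "id": ann_id,
                    "image_id": image_id,
                    "category_id": 1,      # person
                    "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
                    "area": w * h,
                    "iscrowd": 0,
                }
            )
            ann_id += 1
    return coco


example = make_minimal_coco(
    images=[("000001.jpg", 640, 480)],        # placeholder file name and size
    boxes_per_image=[[[100, 120, 50, 160]]],  # placeholder box
)
print(json.dumps(example, indent=2))
```

Writing the result with json.dump gives an annotation file in the same shape as the COCO instances files, just with one category.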
Modify the function _get_coco_instances_meta() in ./detectron2/data/datasets/builtin_meta.py.
Comment out everything in the function before the final return ret, and set ret to what you need. Here is my code:
def _get_coco_instances_meta():
    # thing_ids = [k["id"] for k in COCO_CATEGORIES if k["isthing"] == 1]
    # thing_colors = [k["color"] for k in COCO_CATEGORIES if k["isthing"] == 1]
    # assert len(thing_ids) == 80, len(thing_ids)
    # # Mapping from the incontiguous COCO category id to an id in [0, 79]
    # thing_dataset_id_to_contiguous_id = {k: i for i, k in enumerate(thing_ids)}
    # thing_classes = [k["name"] for k in COCO_CATEGORIES if k["isthing"] == 1]
    # ret = {
    #     "thing_dataset_id_to_contiguous_id": thing_dataset_id_to_contiguous_id,
    #     "thing_classes": thing_classes,
    #     "thing_colors": thing_colors,
    # }
    # Single class: COCO category id 1 (person) maps to contiguous id 0.
    ret = {
        "thing_dataset_id_to_contiguous_id": {1: 0},
        "thing_classes": ["person"],
        "thing_colors": [[220, 20, 60]],
    }
    # print("my ret: ", ret)
    return ret
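The hard-coded ret above is a special case of the mapping the commented-out code computes. As a rough sketch, the same three fields can be derived from any list of categories. The names build_instances_meta and MY_CATEGORIES are hypothetical and not part of detectron2; the entries only mimic the id/name/color fields of COCO_CATEGORIES.

```python
def build_instances_meta(categories):
    """Derive detection metadata from COCO_CATEGORIES-style entries.

    categories: list of dicts with "id", "name" and "color" keys.
    """
    thing_ids = [c["id"] for c in categories]
    return {
        # Dataset ids may have gaps; training needs ids in [0, num_classes - 1].
        "thing_dataset_id_to_contiguous_id": {k: i for i, k in enumerate(thing_ids)},
        "thing_classes": [c["name"] for c in categories],
        "thing_colors": [c["color"] for c in categories],
    }


# Hypothetical category list for the single-class pedestrian case.
MY_CATEGORIES = [{"id": 1, "name": "person", "color": [220, 20, 60]}]
meta = build_instances_meta(MY_CATEGORIES)
print(meta)

# With non-consecutive dataset ids the mapping closes the gap.
two_class = build_instances_meta(
    [
        {"id": 1, "name": "person", "color": [220, 20, 60]},
        {"id": 3, "name": "car", "color": [0, 0, 142]},
    ]
)
print(two_class["thing_dataset_id_to_contiguous_id"])  # {1: 0, 3: 1}
```

For the single-class list this reproduces the ret dictionary shown above, so it can serve as a starting point when there is more than one class.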
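Notes:
The _get_coco_instances_meta() change above only covers detection. If you are doing segmentation or keypoints, it does not apply as-is; once you understand how it works (explained later), adapt it yourself.
COCO_CATEGORIES is defined at the top of builtin_meta.py. You could also crudely edit that definition instead, but I have not tried it and do not know whether it causes bugs.
Change the config file in two places: set both MODEL.RETINANET.NUM_CLASSES and MODEL.ROI_HEADS.NUM_CLASSES to 1 (for COCO they would be 80).
My config file, config.yaml, is as follows: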
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  NUM_WORKERS: 4
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
  - coco_2017_val
  TRAIN:
  - coco_2017_train
GLOBAL:
  HACK: 1.0
INPUT:
  CROP:
    ENABLED: false
    SIZE:
    - 0.9
    - 0.9
    TYPE: relative_range
  FORMAT: BGR
  MASK_FORMAT: polygon
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN:
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
  MIN_SIZE_TRAIN_SAMPLING: choice
MODEL:
  ANCHOR_GENERATOR:
    ANGLES:
    - - -90
      - 0
      - 90
    ASPECT_RATIOS:
    - - 0.5
      - 1.0
      - 2.0
    NAME: DefaultAnchorGenerator
    SIZES:
    - - 32
      - 40.31747359663594
      - 50.79683366298238
    - - 64
      - 80.63494719327188
      - 101.59366732596476
    - - 128
      - 161.26989438654377
      - 203.18733465192952
    - - 256
      - 322.53978877308754
      - 406.37466930385904
    - - 512
      - 645.0795775461751
      - 812.7493386077181
  BACKBONE:
    FREEZE_AT: 2
    NAME: build_retinanet_resnet_fpn_backbone
  DEVICE: cuda
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES:
    - res3
    - res4
    - res5
    NORM: ''
    OUT_CHANNELS: 256
  KEYPOINT_ON: false
  LOAD_PROPOSALS: false
  MASK_ON: false
  META_ARCHITECTURE: RetinaNet
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
    INSTANCE_LOSS_WEIGHT: 1.0
  PIXEL_MEAN:
  - 103.53
  - 116.28
  - 123.675
  PIXEL_STD:
  - 1.0
  - 1.0
  - 1.0
  PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
  RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    - false
    - false
    - false
    - false
    DEPTH: 50
    NORM: FrozenBN
    NUM_GROUPS: 1
    OUT_FEATURES:
    - res3
    - res4
    - res5
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: true
    WIDTH_PER_GROUP: 64
  RETINANET:
    BBOX_REG_WEIGHTS:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    FOCAL_LOSS_ALPHA: 0.25
    FOCAL_LOSS_GAMMA: 2.0
    IN_FEATURES:
    - p3
    - p4
    - p5
    - p6
    - p7
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.4
    - 0.5
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 1
    NUM_CONVS: 4
    PRIOR_PROB: 0.01
    SCORE_THRESH_TEST: 0.05
    SMOOTH_L1_LOSS_BETA: 0.1
    TOPK_CANDIDATES_TEST: 1000
  ROI_BOX_CASCADE_HEAD:
    BBOX_REG_WEIGHTS:
    - - 10.0
      - 10.0
      - 5.0
      - 5.0
    - - 20.0
      - 20.0
      - 10.0
      - 10.0
    - - 30.0
      - 30.0
      - 15.0
      - 15.0
    IOUS:
    - 0.5
    - 0.6
    - 0.7
  ROI_BOX_HEAD:
    BBOX_REG_WEIGHTS:
    - 10.0
    - 10.0
    - 5.0
    - 5.0
    CLS_AGNOSTIC_BBOX_REG: false
    CONV_DIM: 256
    FC_DIM: 1024
    NAME: ''
    NORM: ''
    NUM_CONV: 0
    NUM_FC: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
    SMOOTH_L1_BETA: 0.0
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - 1
    IOU_THRESHOLDS:
    - 0.5
    NAME: Res5ROIHeads
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 1
    POSITIVE_FRACTION: 0.25
    PROPOSAL_APPEND_GT: true
    SCORE_THRESH_TEST: 0.05
  ROI_KEYPOINT_HEAD:
    CONV_DIMS:
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    LOSS_WEIGHT: 1.0
    MIN_KEYPOINTS_PER_IMAGE: 1
    NAME: KRCNNConvDeconvUpsampleHead
    NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
    NUM_KEYPOINTS: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  ROI_MASK_HEAD:
    CLS_AGNOSTIC_MASK: false
    CONV_DIM: 256
    NAME: MaskRCNNConvUpsampleHead
    NORM: ''
    NUM_CONV: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  RPN:
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_WEIGHTS:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    BOUNDARY_THRESH: -1
    HEAD_NAME: StandardRPNHead
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.3
    - 0.7
    LOSS_WEIGHT: 1.0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOPK_TEST: 1000
    POST_NMS_TOPK_TRAIN: 2000
    PRE_NMS_TOPK_TEST: 6000
    PRE_NMS_TOPK_TRAIN: 12000
    SMOOTH_L1_BETA: 0.0
  SEM_SEG_HEAD:
    COMMON_STRIDE: 4
    CONVS_DIM: 128
    IGNORE_VALUE: 255
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    LOSS_WEIGHT: 1.0
    NAME: SemSegFPNHead
    NORM: GN
    NUM_CLASSES: 54
  WEIGHTS: models/COCORetinaNet_R50.pkl
OUTPUT_DIR: ./output
SEED: -1
SOLVER:
  BASE_LR: 0.0001
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 5000
  GAMMA: 0.1
  IMS_PER_BATCH: 32
  LR_SCHEDULER_NAME: WarmupMultiStepLR
  MAX_ITER: 270000
  MOMENTUM: 0.9
  STEPS:
  - 210000
  - 250000
  WARMUP_FACTOR: 0.001
  WARMUP_ITERS: 1000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: 0.0001
  WEIGHT_DECAY_NORM: 0.0
TEST:
  AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    - 400
    - 500
    - 600
    - 700
    - 800
    - 900
    - 1000
    - 1100
    - 1200
  DETECTIONS_PER_IMAGE: 100
  EVAL_PERIOD: 0
  EXPECTED_RESULTS: []
  KEYPOINT_OKS_SIGMAS: []
  PRECISE_BN:
    ENABLED: false
    NUM_ITER: 200
VERSION: 2
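Rather than editing the two NUM_CLASSES entries by hand, they can also be overridden when the config is loaded. The following is a minimal sketch, assuming config.yaml is the file above and that its keys match the installed detectron2 version (the key set changes between releases). The helper names num_classes_overrides and load_cfg are made up for illustration; get_cfg, merge_from_file and merge_from_list are the detectron2 config calls.

```python
def num_classes_overrides(thing_classes):
    """Return key/value overrides that keep both heads in sync with the class list."""
    n = len(thing_classes)
    return [
        "MODEL.RETINANET.NUM_CLASSES", n,
        "MODEL.ROI_HEADS.NUM_CLASSES", n,
    ]


def load_cfg(config_file, thing_classes):
    """Load a detectron2 config and apply the class-count overrides."""
    # Imported lazily so the helper above also works without detectron2 installed.
    from detectron2.config import get_cfg

    cfg = get_cfg()
    cfg.merge_from_file(config_file)
    cfg.merge_from_list(num_classes_overrides(thing_classes))
    return cfg


overrides = num_classes_overrides(["person"])
print(overrides)
```

With detectron2 installed, cfg = load_cfg("config.yaml", ["person"]) would give a config whose two NUM_CLASSES values are 1, matching the single-class metadata.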
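Example: writing your own data_loader
I am skipping the explanation for now; later I will explain what the code means and how detectron2 reads in data. For the moment, read the code yourselves~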
import os
import json
import itertools

import cv2  # needed for cv2.imread below
import numpy as np
from detectron2.structures import BoxMode


# Write a function that loads the dataset into detectron2's standard format.
# img_dir = "coco_person"
def get_balloon_dicts(img_dir):
    # NOTE: as written, img_dir itself is opened as the annotation JSON.
    # If img_dir is a directory, join the annotation file name here.
    json_file = os.path.join(img_dir)
    with open(json_file) as f:
        imgs_anns = json.load(f)

    dataset_dicts = []
    # Assumes imgs_anns["images"] is a dict of per-image entries, each with
    # a "filename" and a dict of polygon "regions".
    for _, v in imgs_anns["images"].items():
        record = {}

        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]

        record["file_name"] = filename
        record["height"] = height
        record["width"] = width

        annos = v["regions"]
        objs = []
        for _, anno in annos.items():
            assert not anno["region_attributes"]
            anno = anno["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            # Shift to pixel centers, then flatten to [x0, y0, x1, y1, ...].
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = list(itertools.chain.from_iterable(poly))

            obj = {
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,  # single class
                "iscrowd": 0,
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts


from detectron2.data import DatasetCatalog, MetadataCatalog

# Register each split under its own name; d=d binds the loop variable now.
for d in ["train", "val"]:
    DatasetCatalog.register("balloon/" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
    MetadataCatalog.get("balloon/" + d).set(thing_classes=["balloon"])
balloon_metadata = MetadataCatalog.get("balloon/train")
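The innermost step of the loader, turning a polygon given as separate x and y lists into a flat segmentation plus an XYXY bounding box, can be looked at in isolation. This is a small stand-alone sketch without detectron2; the function name polygon_to_annotation is hypothetical, and the caller would still add "bbox_mode": BoxMode.XYXY_ABS as in the loader above.

```python
import itertools


def polygon_to_annotation(px, py, category_id=0):
    """Convert polygon vertex lists into a detectron2-style annotation dict.

    px, py: x and y coordinates of the polygon vertices (same length, >= 3).
    The "bbox_mode" field is left out so this sketch has no detectron2 import.
    """
    if len(px) != len(py) or len(px) < 3:
        raise ValueError("need matching x/y lists with at least 3 vertices")
    # Shift vertices to pixel centers, then flatten to [x0, y0, x1, y1, ...].
    poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
    flat = list(itertools.chain.from_iterable(poly))
    return {
        "bbox": [min(px), min(py), max(px), max(py)],  # x_min, y_min, x_max, y_max
        "segmentation": [flat],
        "category_id": category_id,
        "iscrowd": 0,
    }


ann = polygon_to_annotation([10, 30, 20], [5, 5, 25])
print(ann["bbox"])          # [10, 5, 30, 25]
print(ann["segmentation"])  # [[10.5, 5.5, 30.5, 5.5, 20.5, 25.5]]
```

Using the built-in min and max keeps the box values as plain Python numbers, which is convenient if the dataset dicts are later serialized to JSON.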