Training detectron2 on your own dataset

First I'll cover how to do it, then the source-code-level details.

  1. Dataset
    For convenience, please convert your data to COCO format yourself; everything below builds on that. If you don't want to convert your dataset, write your own data_loader following the example at the end. (A registration sketch for a COCO-format dataset follows the directory layout below.)

    COCO dataset layout (assuming everything lives under the detectron2 project directory):

    • datasets
      • coco
        • annotations
          • instances_train2017.json
          • instances_val2017.json
        • train2017
          • image001.jpg
          • image002.jpg
          • image004.jpg
        • val2017
          • image003.jpg
          • image005.jpg
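    If your annotations are already COCO-style json but under other names or paths, you don't have to mirror the layout above: detectron2 ships a helper for registering any COCO-format dataset. A minimal sketch, where the dataset names "person_train"/"person_val" and the paths are my assumptions (adjust them to your own files):

    from detectron2.data.datasets import register_coco_instances

    # Dataset names, json paths and image roots below are illustrative assumptions.
    register_coco_instances(
        "person_train", {},
        "datasets/coco/annotations/instances_train2017.json",
        "datasets/coco/train2017",
    )
    register_coco_instances(
        "person_val", {},
        "datasets/coco/annotations/instances_val2017.json",
        "datasets/coco/val2017",
    )

    The registered names can then be referenced in DATASETS.TRAIN / DATASETS.TEST of the config in step 3.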
  2. Example: training a pedestrian detector (only one class, person)
    Modify the _get_coco_instances_meta() function in ./detectron2/data/datasets/builtin_meta.py.
    Before the final return ret, simply comment out the earlier code in this function and change ret to what you need. Here is my code:

    def _get_coco_instances_meta():
        # thing_ids = [k["id"] for k in COCO_CATEGORIES if k["isthing"] == 1]
        # thing_colors = [k["color"] for k in COCO_CATEGORIES if k["isthing"] == 1]
        # assert len(thing_ids) == 80, len(thing_ids)
        # # Mapping from the incontiguous COCO category id to an id in [0, 79]
        # thing_dataset_id_to_contiguous_id = {k: i for i, k in enumerate(thing_ids)}
        # thing_classes = [k["name"] for k in COCO_CATEGORIES if k["isthing"] == 1]
        # ret = {
        #     "thing_dataset_id_to_contiguous_id": thing_dataset_id_to_contiguous_id,
        #     "thing_classes": thing_classes,
        #     "thing_colors": thing_colors,
        # }

        ret = {
            "thing_dataset_id_to_contiguous_id": {1: 0},
            "thing_classes": ["person"],
            "thing_colors": [[220, 20, 60]],
        }
        # print("my ret: ", ret)
        return ret
    

    Notes:

    • I'm doing pedestrian detection, which is why I modify _get_coco_instances_meta(). If you're doing segmentation or keypoints, this doesn't apply directly; once you understand the underlying mechanism you can adapt it yourself.
    • The second field of ret is my pedestrian label; the definitions for the first and third fields can be found in COCO_CATEGORIES at the top of builtin_meta.py. Alternatively you could crudely edit the COCO_CATEGORIES definition itself, but I haven't tried that, so I can't rule out bugs.
    • If you're also doing pedestrian detection, this modification works as-is. If you're doing another task, make sure to read the code (or leave a comment and I can offer some suggestions). A quick sanity check for the new metadata is sketched right after these notes.
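
    To sanity-check that the metadata change took effect, you can query MetadataCatalog after importing detectron2. A minimal sketch, assuming you keep the built-in dataset names coco_2017_train / coco_2017_val used in the config below:

    from detectron2.data import MetadataCatalog

    # The built-in COCO datasets pick up _get_coco_instances_meta() when detectron2 registers them.
    meta = MetadataCatalog.get("coco_2017_train")
    print(meta.thing_classes)                        # expected: ['person']
    print(meta.thing_dataset_id_to_contiguous_id)    # expected: {1: 0}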
  3. Edit the config file
    Two places: set both MODEL.RETINANET.NUM_CLASSES and MODEL.ROI_HEADS.NUM_CLASSES to 1 (for standard COCO these are originally 80). A sketch of launching training from this file follows the yaml below.
    My config file, config.yaml, is as follows:

    CUDNN_BENCHMARK: false
    DATALOADER:
      ASPECT_RATIO_GROUPING: true
      NUM_WORKERS: 4
      REPEAT_THRESHOLD: 0.0
      SAMPLER_TRAIN: TrainingSampler
    DATASETS:
      PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
      PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
      PROPOSAL_FILES_TEST: []
      PROPOSAL_FILES_TRAIN: []
      TEST:
      - coco_2017_val
      TRAIN:
      - coco_2017_train
    GLOBAL:
      HACK: 1.0
    INPUT:
      CROP:
        ENABLED: false
        SIZE:
        - 0.9
        - 0.9
        TYPE: relative_range
      FORMAT: BGR
      MASK_FORMAT: polygon
      MAX_SIZE_TEST: 1333
      MAX_SIZE_TRAIN: 1333
      MIN_SIZE_TEST: 800
      MIN_SIZE_TRAIN:
      - 640
      - 672
      - 704
      - 736
      - 768
      - 800
      MIN_SIZE_TRAIN_SAMPLING: choice
    MODEL:
      ANCHOR_GENERATOR:
        ANGLES:
        - - -90
          - 0
          - 90
        ASPECT_RATIOS:
        - - 0.5
          - 1.0
          - 2.0
        NAME: DefaultAnchorGenerator
        SIZES:
        - - 32
          - 40.31747359663594
          - 50.79683366298238
        - - 64
          - 80.63494719327188
          - 101.59366732596476
        - - 128
          - 161.26989438654377
          - 203.18733465192952
        - - 256
          - 322.53978877308754
          - 406.37466930385904
        - - 512
          - 645.0795775461751
          - 812.7493386077181
      BACKBONE:
        FREEZE_AT: 2
        NAME: build_retinanet_resnet_fpn_backbone
      DEVICE: cuda
      FPN:
        FUSE_TYPE: sum
        IN_FEATURES:
        - res3
        - res4
        - res5
        NORM: ''
        OUT_CHANNELS: 256
      KEYPOINT_ON: false
      LOAD_PROPOSALS: false
      MASK_ON: false
      META_ARCHITECTURE: RetinaNet
      PANOPTIC_FPN:
        COMBINE:
          ENABLED: true
          INSTANCES_CONFIDENCE_THRESH: 0.5
          OVERLAP_THRESH: 0.5
          STUFF_AREA_LIMIT: 4096
        INSTANCE_LOSS_WEIGHT: 1.0
      PIXEL_MEAN:
      - 103.53
      - 116.28
      - 123.675
      PIXEL_STD:
      - 1.0
      - 1.0
      - 1.0
      PROPOSAL_GENERATOR:
        MIN_SIZE: 0
        NAME: RPN
      RESNETS:
        DEFORM_MODULATED: false
        DEFORM_NUM_GROUPS: 1
        DEFORM_ON_PER_STAGE:
        - false
        - false
        - false
        - false
        DEPTH: 50
        NORM: FrozenBN
        NUM_GROUPS: 1
        OUT_FEATURES:
        - res3
        - res4
        - res5
        RES2_OUT_CHANNELS: 256
        RES5_DILATION: 1
        STEM_OUT_CHANNELS: 64
        STRIDE_IN_1X1: true
        WIDTH_PER_GROUP: 64
      RETINANET:
        BBOX_REG_WEIGHTS:
        - 1.0
        - 1.0
        - 1.0
        - 1.0
        FOCAL_LOSS_ALPHA: 0.25
        FOCAL_LOSS_GAMMA: 2.0
        IN_FEATURES:
        - p3
        - p4
        - p5
        - p6
        - p7
        IOU_LABELS:
        - 0
        - -1
        - 1
        IOU_THRESHOLDS:
        - 0.4
        - 0.5
        NMS_THRESH_TEST: 0.5
        NUM_CLASSES: 1
        NUM_CONVS: 4
        PRIOR_PROB: 0.01
        SCORE_THRESH_TEST: 0.05
        SMOOTH_L1_LOSS_BETA: 0.1
        TOPK_CANDIDATES_TEST: 1000
      ROI_BOX_CASCADE_HEAD:
        BBOX_REG_WEIGHTS:
        - - 10.0
          - 10.0
          - 5.0
          - 5.0
        - - 20.0
          - 20.0
          - 10.0
          - 10.0
        - - 30.0
          - 30.0
          - 15.0
          - 15.0
        IOUS:
        - 0.5
        - 0.6
        - 0.7
      ROI_BOX_HEAD:
        BBOX_REG_WEIGHTS:
        - 10.0
        - 10.0
        - 5.0
        - 5.0
        CLS_AGNOSTIC_BBOX_REG: false
        CONV_DIM: 256
        FC_DIM: 1024
        NAME: ''
        NORM: ''
        NUM_CONV: 0
        NUM_FC: 0
        POOLER_RESOLUTION: 14
        POOLER_SAMPLING_RATIO: 0
        POOLER_TYPE: ROIAlignV2
        SMOOTH_L1_BETA: 0.0
      ROI_HEADS:
        BATCH_SIZE_PER_IMAGE: 512
        IN_FEATURES:
        - res4
        IOU_LABELS:
        - 0
        - 1
        IOU_THRESHOLDS:
        - 0.5
        NAME: Res5ROIHeads
        NMS_THRESH_TEST: 0.5
        NUM_CLASSES: 1
        POSITIVE_FRACTION: 0.25
        PROPOSAL_APPEND_GT: true
        SCORE_THRESH_TEST: 0.05
      ROI_KEYPOINT_HEAD:
        CONV_DIMS:
        - 512
        - 512
        - 512
        - 512
        - 512
        - 512
        - 512
        - 512
        LOSS_WEIGHT: 1.0
        MIN_KEYPOINTS_PER_IMAGE: 1
        NAME: KRCNNConvDeconvUpsampleHead
        NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
        NUM_KEYPOINTS: 17
        POOLER_RESOLUTION: 14
        POOLER_SAMPLING_RATIO: 0
        POOLER_TYPE: ROIAlignV2
      ROI_MASK_HEAD:
        CLS_AGNOSTIC_MASK: false
        CONV_DIM: 256
        NAME: MaskRCNNConvUpsampleHead
        NORM: ''
        NUM_CONV: 0
        POOLER_RESOLUTION: 14
        POOLER_SAMPLING_RATIO: 0
        POOLER_TYPE: ROIAlignV2
      RPN:
        BATCH_SIZE_PER_IMAGE: 256
        BBOX_REG_WEIGHTS:
        - 1.0
        - 1.0
        - 1.0
        - 1.0
        BOUNDARY_THRESH: -1
        HEAD_NAME: StandardRPNHead
        IN_FEATURES:
        - res4
        IOU_LABELS:
        - 0
        - -1
        - 1
        IOU_THRESHOLDS:
        - 0.3
        - 0.7
        LOSS_WEIGHT: 1.0
        NMS_THRESH: 0.7
        POSITIVE_FRACTION: 0.5
        POST_NMS_TOPK_TEST: 1000
        POST_NMS_TOPK_TRAIN: 2000
        PRE_NMS_TOPK_TEST: 6000
        PRE_NMS_TOPK_TRAIN: 12000
        SMOOTH_L1_BETA: 0.0
      SEM_SEG_HEAD:
        COMMON_STRIDE: 4
        CONVS_DIM: 128
        IGNORE_VALUE: 255
        IN_FEATURES:
        - p2
        - p3
        - p4
        - p5
        LOSS_WEIGHT: 1.0
        NAME: SemSegFPNHead
        NORM: GN
        NUM_CLASSES: 54
      WEIGHTS: models/COCORetinaNet_R50.pkl
    OUTPUT_DIR: ./output
    SEED: -1
    SOLVER:
      BASE_LR: 0.0001
      BIAS_LR_FACTOR: 1.0
      CHECKPOINT_PERIOD: 5000
      GAMMA: 0.1
      IMS_PER_BATCH: 32
      LR_SCHEDULER_NAME: WarmupMultiStepLR
      MAX_ITER: 270000
      MOMENTUM: 0.9
      STEPS:
      - 210000
      - 250000
      WARMUP_FACTOR: 0.001
      WARMUP_ITERS: 1000
      WARMUP_METHOD: linear
      WEIGHT_DECAY: 0.0001
      WEIGHT_DECAY_BIAS: 0.0001
      WEIGHT_DECAY_NORM: 0.0
    TEST:
      AUG:
        ENABLED: false
        FLIP: true
        MAX_SIZE: 4000
        MIN_SIZES:
        - 400
        - 500
        - 600
        - 700
        - 800
        - 900
        - 1000
        - 1100
        - 1200
      DETECTIONS_PER_IMAGE: 100
      EVAL_PERIOD: 0
      EXPECTED_RESULTS: []
      KEYPOINT_OKS_SIGMAS: []
      PRECISE_BN:
        ENABLED: false
        NUM_ITER: 200
    VERSION: 2
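
    To launch training from this file, one option is detectron2's default training loop. A minimal sketch, assuming the yaml above is saved as config.yaml and the checkpoint in MODEL.WEIGHTS exists (both are assumptions about your setup):

    from detectron2.config import get_cfg
    from detectron2.engine import DefaultTrainer

    cfg = get_cfg()                      # start from detectron2's default config
    cfg.merge_from_file("config.yaml")   # overlay the file shown above
    cfg.freeze()

    trainer = DefaultTrainer(cfg)        # builds model, optimizer and data loader from cfg
    trainer.resume_or_load(resume=False)
    trainer.train()

    Equivalently, you can pass the file to tools/train_net.py via --config-file config.yaml.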
    
    
  4. Example of writing your own data_loader
    Skipped for now: I'll explain what it means and how detectron2 reads in data later. For the moment, just read the code yourselves~

import os
import itertools
import json

import cv2
import numpy as np

from detectron2.structures import BoxMode


# Write a function that loads the dataset into detectron2's standard format.
# img_dir = "coco_person"
def get_balloon_dicts(img_dir):
    # VIA-style annotation file inside img_dir (adjust the file name to your own annotations)
    json_file = os.path.join(img_dir, "via_region_data.json")
    with open(json_file) as f:
        imgs_anns = json.load(f)

    dataset_dicts = []
    for idx, v in enumerate(imgs_anns.values()):
        record = {}

        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]

        record["file_name"] = filename
        record["image_id"] = idx
        record["height"] = height
        record["width"] = width

        annos = v["regions"]
        objs = []
        for _, anno in annos.items():
            assert not anno["region_attributes"]
            anno = anno["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            # Polygon vertices (shifted to pixel centers), flattened to [x0, y0, x1, y1, ...]
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = list(itertools.chain.from_iterable(poly))

            obj = {
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,
                "iscrowd": 0
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts

from detectron2.data import DatasetCatalog, MetadataCatalog
for d in ["train", "val"]:
    DatasetCatalog.register("balloon/" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
    MetadataCatalog.get("balloon/" + d).set(thing_classes=["balloon"])
balloon_metadata = MetadataCatalog.get("balloon/train")
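
Once registered, you can check that the loader produces sensible dicts by drawing a few samples with detectron2's Visualizer (this is essentially the verification step from the official balloon tutorial; saving to files instead of displaying a window is my own choice):

import random
import cv2
from detectron2.utils.visualizer import Visualizer

# Draw ground-truth boxes/polygons for a few random samples and save them to disk.
dataset_dicts = get_balloon_dicts("balloon/train")
for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=balloon_metadata, scale=0.5)
    vis = visualizer.draw_dataset_dict(d)
    cv2.imwrite("vis_" + os.path.basename(d["file_name"]), vis.get_image()[:, :, ::-1])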

你可能感兴趣的:(Deep,Learning,PYTHON)