Training Mask R-CNN on your own dataset with the official Detectron (Caffe2 implementation) on Ubuntu 16.04

1. Source code and pretrained model
(1) Official source code:
https://github.com/facebookresearch/Detectron
(2) Pretrained model (R-50):
https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/MSRA/R-50.pkl

2. Environment setup

Omitted here; see the markdown documentation (INSTALL.md) in the source repository.

3. Dataset preparation
(1) Dataset location

$DETECTRON/detectron/datasets/data

Place your dataset under the directory above.

(2) Dataset directory layout (the layout is up to you). Mine, for example (a sketch of the expected annotation format follows the tree):

hainu
|_ train
|  |_ .jpg
|  |_ ...
|  |_ .jpg
|_ val
|  |_ .jpg
|  |_ ...
|  |_ .jpg
|_ annotations
   |_ instances_train.json
   |_ instances_val.json
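
For reference, instances_train.json and instances_val.json are expected to follow the COCO instances format. Below is only a minimal illustrative sketch of that structure (all values and the category name are placeholders), not a generator for real annotations:

    # Minimal sketch of the COCO-style instance annotation format (illustrative values only)
    import json

    coco_like = {
        "images": [
            {"id": 1, "file_name": "000001.jpg", "width": 800, "height": 600},
        ],
        "annotations": [
            {
                "id": 1,
                "image_id": 1,
                "category_id": 1,
                "bbox": [100.0, 120.0, 200.0, 150.0],  # [x, y, width, height]
                "area": 200.0 * 150.0,
                "iscrowd": 0,
                # polygon given as a flat list of x,y coordinates
                "segmentation": [[100.0, 120.0, 300.0, 120.0, 300.0, 270.0, 100.0, 270.0]],
            },
        ],
        "categories": [
            {"id": 1, "name": "anthrax", "supercategory": "none"},
        ],
    }

    with open("instances_train.json", "w") as f:
        json.dump(coco_like, f)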

(3) Register your own dataset

Register your dataset in $DETECTRON/detectron/datasets/dataset_catalog.py.

Opening dataset_catalog.py, you can see the coco_2014 datasets are registered as follows:

    'coco_2014_train': {
        _IM_DIR:
            _DATA_DIR + '/coco/coco_train2014',
        _ANN_FN:
            _DATA_DIR + '/coco/annotations/instances_train2014.json'
    },
    'coco_2014_val': {
        _IM_DIR:
            _DATA_DIR + '/coco/coco_val2014',
        _ANN_FN:
            _DATA_DIR + '/coco/annotations/instances_val2014.json'
    },

Register your own dataset by following the same pattern:

    'anthrax_train': {
        _IM_DIR:
            _DATA_DIR + '/hainu/train',
        _ANN_FN:
            _DATA_DIR + '/hainu/annotations/instances_train.json'
    },
    'anthrax_val': {
        _IM_DIR:
            _DATA_DIR + '/hainu/val',
        _ANN_FN:
            _DATA_DIR + '/hainu/annotations/instances_val.json'
    },

With that, the dataset is registered. A quick sanity check of the annotation files is sketched below.
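
Before kicking off training, it can save time to verify that the registered JSON files load cleanly. The snippet below is an optional check using pycocotools (a Detectron dependency); the path follows the hainu layout above and should be adjusted to yours:

    # Optional sanity check: confirm the registered annotation file parses,
    # and report how many images/categories/annotations it contains.
    from pycocotools.coco import COCO

    ann_file = 'detectron/datasets/data/hainu/annotations/instances_train.json'  # adjust to your path
    coco = COCO(ann_file)
    print('images:      %d' % len(coco.getImgIds()))
    print('categories:  %d' % len(coco.getCatIds()))
    print('annotations: %d' % len(coco.getAnnIds()))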

4. Training Mask R-CNN on your own dataset
Note: training and testing commands:

# Train
python2 tools/train_net.py \
    --cfg configs/getting_started/tutorial_1gpu_e2e_mask_rcnn_R-50-FPN.yaml \
    OUTPUT_DIR  /home/output
# For inference there are two options: test_net.py and infer_simple.py. test_net.py reports the detailed evaluation metrics, infer_simple.py does not. So you can either evaluate:
# Test / evaluate
python2 tools/test_net.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_2x.yaml \
    TEST.WEIGHTS /home/output/model_final.pkl \
    NUM_GPUS 1

# ...or run infer_simple.py instead. Put the images to test under $DETECTRON/demo and change the --wts argument to point to your own weights.
python tools/infer_simple.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
    --output-dir /tmp/detectron-visualizations \
    --image-ext jpg \
    --wts https://dl.fbaipublicfiles.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
    demo
    

(1) Before training, edit the yaml file under $DETECTRON/configs/getting_started.

(2) In the source downloaded in February 2019, the yaml files under $DETECTRON/configs/getting_started only cover training Faster R-CNN, so to train Mask R-CNN the mask-related parts have to be added (explained below).

########################### Error record #############################

Training:
python2 tools/train_net.py \
    --cfg configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml \
    OUTPUT_DIR  /home/output 
Testing:
python2 tools/test_net.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_2x.yaml \
    TEST.WEIGHTS /home/output/model_final.pkl \
    NUM_GPUS 1

Running python2 tools/test_net.py afterwards then fails with:

INFO net.py:  57: Loading weights from: t/home/output/model_final.pkl
INFO net.py:  86: fcn1_w not found
INFO net.py:  86: fcn1_b not found
INFO net.py:  86: fcn2_w not found
INFO net.py:  86: fcn2_b not found
INFO net.py:  86: fcn3_w not found
INFO net.py:  86: fcn3_b not found
INFO net.py:  86: fcn4_w not found
INFO net.py:  86: fcn4_b not found
INFO net.py:  86: conv5_mask_w not found
INFO net.py:  86: conv5_mask_b not found
INFO net.py:  86: mask_fcn_logits_w not found
INFO net.py:  86: mask_fcn_logits_b not found
...
Offending Blob name: gpu_0/_[mask]_fcn1_w.
Error from operator:
input: "gpu_0/_[mask]_roi_feat" input: "gpu_0/_[mask]_fcn1_w" input: "gpu_0/_[mask]_fcn1_b" output: "gpu_0/_[mask]_fcn1" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN" debug_info: "  File \"tools/infer_simple.py\", line 147, in \n    main(args)\n  File \"tools/infer_simple.py\", line 99, in main\n    model = infer_engine.initialize_model_from_cfg()\n  File \"/home/siyu/Detectron/lib/core/test_engine.py\", line 266, in initialize_model_from_cfg\n    model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)\n  File \"/home/siyu/Detectron/lib/modeling/model_builder.py\", line 124, in create\n    return get_func(model_type_func)(model)\n  File \"/home/siyu/Detectron/lib/modeling/model_builder.py\", line 89, in generalized_rcnn\n    freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY\n  File \"/home/siyu/Detectron/lib/modeling/model_builder.py\", line 229, in build_generic_detection_model\n    optim.build_data_parallel_model(model, _single_gpu_build_func)\n  File \"/home/siyu/Detectron/lib/modeling/optimizer.py\", line 54, in build_data_parallel_model\n    single_gpu_build_func(model)\n  File \"/home/siyu/Detectron/lib/modeling/model_builder.py\", line 210, in _single_gpu_build_func\n    spatial_scale_conv\n  File \"/home/siyu/Detectron/lib/modeling/model_builder.py\", line 272, in _add_roi_mask_head\n    model, blob_in, dim_in, spatial_scale_in\n  File \"/home/siyu/Detectron/lib/modeling/mask_rcnn_heads.py\", line 113, in mask_rcnn_fcn_head_v1up4convs\n    model, blob_in, dim_in, spatial_scale, 4\n  File \"/home/siyu/Detectron/lib/modeling/mask_rcnn_heads.py\", line 151, in mask_rcnn_fcn_head_v1upXconvs\n    bias_init=(\'ConstantFill\', {\'value\': 0.})\n  File \"/home/siyu/anaconda3/envs/python27/lib/python2.7/site-packages/caffe2/python/cnn.py\", line 112, in Conv\n    **kwargs\n  File \"/home/siyu/anaconda3/envs/python27/lib/python2.7/site-packages/caffe2/python/brew.py\", line 121, in scope_wrapper\n    return func(*args, **new_kwargs)\n  File \"/home/siyu/anaconda3/envs/python27/lib/python2.7/site-packages/caffe2/python/helpers/conv.py\", line 201, in conv\n    group, transform_inputs, **kwargs)\n  File \"/home/siyu/anaconda3/envs/python27/lib/python2.7/site-packages/caffe2/python/helpers/conv.py\", line 154, in _ConvBase\n    **kwargs)\n  File \"/home/siyu/anaconda3/envs/python27/lib/python2.7/site-packages/caffe2/python/core.py\", line 2047, in \n    op_type, *args, **kwargs)\n  File \"/home/siyu/anaconda3/envs/python27/lib/python2.7/site-packages/caffe2/python/core.py\", line 2024, in _CreateAndAddToSelf\n    op = CreateOperator(op_type, inputs, outputs, **kwargs)\n"
Original python traceback for operator 6 in network `mask_net` in exception above (most recent call last):
Traceback (most recent call last):
  File "tools/infer_simple.py", line 147, in 
  ...

Cause: training used the tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml config while testing used e2e_mask_rcnn_R-50-FPN_2x.yaml. As the file names suggest, one is a Faster R-CNN config and the other is a Mask R-CNN config; the trained weights contain no mask head, so the mismatch is bound to fail.

Solution: write a yaml config suited to training Mask R-CNN yourself, as described in (3) below. You can also confirm the mismatch by inspecting the checkpoint, as sketched right after this paragraph.
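
As a quick diagnostic, the snippet below lists the blob names stored in model_final.pkl; if no mask-related blobs (e.g. conv5_mask_w, mask_fcn_logits_w) show up, the weights were trained without a mask head. It assumes the checkpoint is a pickled dict with a 'blobs' key, which is how Detectron normally saves weights:

    # Diagnostic sketch: list the blobs saved in a Detectron checkpoint.
    # Assumes the usual Detectron format: a pickled dict with a 'blobs' key.
    import pickle

    with open('/home/output/model_final.pkl', 'rb') as f:
        ckpt = pickle.load(f)

    blob_names = sorted(ckpt['blobs'].keys())
    mask_blobs = [b for b in blob_names if 'mask' in b]
    print('total blobs: %d, mask-related blobs: %d' % (len(blob_names), len(mask_blobs)))
    # An empty mask_blobs list means the weights came from a Faster R-CNN config (no mask head).
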
(3) The approach I took (single-GPU training):

a. Under $DETECTRON/configs/getting_started, copy tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml, rename the copy to tutorial_1gpu_e2e_mask_rcnn_R-50-FPN.yaml, and add the mask-related parts so that it can train a Mask R-CNN model.

b. What to change

1) Basic parameters: adjust NUM_CLASSES, WEIGHTS and DATASETS to your own setup.

2) Add the mask parts:

The MODEL section of tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml reads:

MODEL:
  TYPE: generalized_rcnn
  CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
  NUM_CLASSES: 2
  FASTER_RCNN: True

Change it to:

MODEL:
  TYPE: mask_rcnn       # modified
  CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
  NUM_CLASSES: 2
  FASTER_RCNN: True
  MASK_ON: True       # added

Then copy the MRCNN section from $DETECTRON/configs/test_time_aug/e2e_mask_rcnn_R-50-FPN_2x.yaml, shown below:

MRCNN:
  ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
  RESOLUTION: 28  # (output mask resolution) default 14
  ROI_XFORM_METHOD: RoIAlign
  ROI_XFORM_RESOLUTION: 14  # default 7
  ROI_XFORM_SAMPLING_RATIO: 2  # default 0
  DILATION: 1  # default 2
  CONV_INIT: MSRAFill  # default GaussianFill

Paste that MRCNN section after the FAST_RCNN block in $DETECTRON/configs/getting_started/tutorial_1gpu_e2e_mask_rcnn_R-50-FPN.yaml. The complete modified config file looks like this (a quick way to validate it is sketched after the config):

MODEL:
  TYPE: mask_rcnn
  CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
  NUM_CLASSES: 2
  FASTER_RCNN: True
  MASK_ON: True
NUM_GPUS: 1
SOLVER:
  WEIGHT_DECAY: 0.0001
  LR_POLICY: steps_with_decay
  BASE_LR: 0.0025
  GAMMA: 0.1
  MAX_ITER: 1000   #60000
  STEPS: [0, 500, 999] #[0, 30000, 40000]
  # Equivalent schedules with...
  # 1 GPU:
  #   BASE_LR: 0.0025
  #   MAX_ITER: 60000
  #   STEPS: [0, 30000, 40000]
  # 2 GPUs:
  #   BASE_LR: 0.005
  #   MAX_ITER: 30000
  #   STEPS: [0, 15000, 20000]
  # 4 GPUs:
  #   BASE_LR: 0.01
  #   MAX_ITER: 15000
  #   STEPS: [0, 7500, 10000]
  # 8 GPUs:
  #   BASE_LR: 0.02
  #   MAX_ITER: 7500
  #   STEPS: [0, 3750, 5000]
FPN:
  FPN_ON: True
  MULTILEVEL_ROIS: True
  MULTILEVEL_RPN: True
FAST_RCNN:
  ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
  ROI_XFORM_METHOD: RoIAlign
  ROI_XFORM_RESOLUTION: 7
  ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
  ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
  RESOLUTION: 28  # (output mask resolution) default 14
  ROI_XFORM_METHOD: RoIAlign
  ROI_XFORM_RESOLUTION: 14  # default 7
  ROI_XFORM_SAMPLING_RATIO: 2  # default 0
  DILATION: 1  # default 2
  CONV_INIT: MSRAFill  # default GaussianFill
TRAIN:
  WEIGHTS: /home/ubuntu/detectron/ImageNetPretrained/MSRA/R-50.pkl
  DATASETS: ('anthrax_train',)
  SCALES: (500,)
  MAX_SIZE: 833
  BATCH_SIZE_PER_IM: 256
  RPN_PRE_NMS_TOP_N: 2000  # Per FPN level
TEST:
  DATASETS: ('anthrax_val',)
  SCALE: 500
  MAX_SIZE: 833
  NMS: 0.5
  RPN_PRE_NMS_TOP_N: 1000  # Per FPN level
  RPN_POST_NMS_TOP_N: 1000
  FORCE_JSON_DATASET_EVAL: True   
OUTPUT_DIR: .
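
Before launching a long run it is worth checking that the edited yaml merges cleanly into Detectron's config and that MASK_ON actually took effect. The following sketch assumes it is run from $DETECTRON with the detectron package importable; it only reads the config, nothing is trained:

    # Sketch: parse the edited yaml with Detectron's own config utilities and
    # print the switches relevant to the faster-vs-mask mismatch described above.
    from detectron.core.config import cfg, merge_cfg_from_file

    merge_cfg_from_file('configs/getting_started/tutorial_1gpu_e2e_mask_rcnn_R-50-FPN.yaml')

    print('MODEL.TYPE:        %s' % cfg.MODEL.TYPE)
    print('MODEL.MASK_ON:     %s' % cfg.MODEL.MASK_ON)
    print('MODEL.NUM_CLASSES: %d' % cfg.MODEL.NUM_CLASSES)
    print('TRAIN.DATASETS:    %s' % (cfg.TRAIN.DATASETS,))
    print('TEST.DATASETS:     %s' % (cfg.TEST.DATASETS,))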

After making these changes, rerun python2 tools/train_net.py to train the model and python2 tools/test_net.py to evaluate it. If the error above still appears, try rebooting the machine.
Training and testing commands after the changes:

Train:
python2 tools/train_net.py \
    --cfg configs/getting_started/tutorial_1gpu_e2e_mask_rcnn_R-50-FPN.yaml \
    OUTPUT_DIR  /home/output

Test:
python2 tools/test_net.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_2x.yaml \
    TEST.WEIGHTS /home/output/model_final.pkl \
    NUM_GPUS 1

Error records:
1. If running python2 tools/train_net.py or python2 tools/test_net.py fails with the following:

INFO task_evaluation.py:  75: Evaluating detections
Traceback (most recent call last):
  File "tools/train_net.py", line 281, in 
    main()
  File "tools/train_net.py", line 122, in main
    test_model(checkpoints['final'], args.multi_gpu_testing, args.opts)
  File "tools/train_net.py", line 277, in test_model
    test_net.main(multi_gpu_testing=multi_gpu_testing)
  File "/nfs/zapdos/home/data/vision3/cw234/detectron/tools/test_net.py", line 98, in main
    output_dir, ind_range=ind_range, multi_gpu_testing=multi_gpu_testing
  File "/nfs/zapdos/home/data/vision3/cw234/detectron/lib/core/test_engine.py", line 83, in run_inference
    results = parent_func(output_dir, multi_gpu=multi_gpu_testing)
  File "/nfs/zapdos/home/data/vision3/cw234/detectron/lib/core/test_engine.py", line 110, in test_net_on_dataset
    dataset, all_boxes, all_segms, all_keyps, output_dir
  File "/nfs/zapdos/home/data/vision3/cw234/detectron/lib/datasets/task_evaluation.py", line 59, in evaluate_all
    dataset, all_boxes, output_dir, use_matlab=use_matlab
  File "/nfs/zapdos/home/data/vision3/cw234/detectron/lib/datasets/task_evaluation.py", line 97, in evaluate_boxes
    'No evaluator for dataset: {}'.format(dataset.name)
NotImplementedError: No evaluator for dataset: ego_data

Try the following:
add FORCE_JSON_DATASET_EVAL: True to the TEST section of the .yaml file:

TEST:
  ...
  FORCE_JSON_DATASET_EVAL: True

2. If running python2 tools/train_net.py or python2 tools/test_net.py fails with the following error:

Loading and preparing results...
DONE (t=0.05s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *segm*
Traceback (most recent call last):
  File "tools/test_net.py", line 116, in 
    check_expected_results=True,
  File "/home/ubuntu/detectron/detectron/core/test_engine.py", line 128, in run_inference
    all_results = result_getter()
  File "/home/ubuntu/detectron/detectron/core/test_engine.py", line 108, in result_getter
    multi_gpu=multi_gpu_testing
  File "/home/ubuntu/detectron/detectron/core/test_engine.py", line 164, in test_net_on_dataset
    dataset, all_boxes, all_segms, all_keyps, output_dir
  File "/home/ubuntu/detectron/detectron/datasets/task_evaluation.py", line 64, in evaluate_all
    results = evaluate_masks(dataset, all_boxes, all_segms, output_dir)
  File "/home/ubuntu/detectron/detectron/datasets/task_evaluation.py", line 114, in evaluate_masks
    cleanup=not_comp
  File "/home/ubuntu/detectron/detectron/datasets/json_dataset_evaluator.py", line 56, in evaluate_masks
    coco_eval = _do_segmentation_eval(json_dataset, res_file, output_dir)
  File "/home/ubuntu/detectron/detectron/datasets/json_dataset_evaluator.py", line 117, in _do_segmentation_eval
    coco_eval.evaluate()
  File "build/bdist.linux-x86_64/egg/pycocotools/cocoeval.py", line 141, in evaluate
  File "build/bdist.linux-x86_64/egg/pycocotools/cocoeval.py", line 105, in _prepare
  File "build/bdist.linux-x86_64/egg/pycocotools/cocoeval.py", line 93, in _toMask
  File "build/bdist.linux-x86_64/egg/pycocotools/coco.py", line 416, in annToRLE
  File "pycocotools/_mask.pyx", line 293, in pycocotools._mask.frPyObjects
TypeError: Argument 'bb' has incorrect type (expected numpy.ndarray, got list)

Nine times out of ten this means the bbox annotation of some image in val.json is broken. Find the badly annotated image and re-annotate it, or simply delete it, and the problem goes away. The helper sketched below can help locate such annotations.
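
The following rough helper only checks for the obvious problems (a bbox that is not a list of four numbers, an empty or non-list segmentation); the annotation path is a placeholder based on the hainu layout above:

    # Rough helper: flag annotations in a COCO-style json whose bbox/segmentation
    # fields look malformed (a common cause of the pycocotools type error above).
    import json

    with open('detectron/datasets/data/hainu/annotations/instances_val.json') as f:
        data = json.load(f)

    img_name = {im['id']: im['file_name'] for im in data['images']}
    for ann in data['annotations']:
        bbox = ann.get('bbox')
        seg = ann.get('segmentation')
        bad_bbox = not (isinstance(bbox, list) and len(bbox) == 4
                        and all(isinstance(v, (int, float)) for v in bbox))
        bad_seg = not isinstance(seg, (list, dict)) or (isinstance(seg, list) and len(seg) == 0)
        if bad_bbox or bad_seg:
            print('check annotation id %s in image %s' %
                  (ann.get('id'), img_name.get(ann.get('image_id'))))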

3. python2 tools/visualize_results.py only writes out a single image
When I ran visualize_results.py, only one image was produced even though there were many test images. Opening visualize_results.py revealed a parameter issue; the script defines this argument:

    parser.add_argument(
        '--first',
        dest='first',
        help='only visualize the first k images',
        default=10000,       # previously set to 1; since no --first value was passed at run time, only one image was output
        type=int
    )
