1. Prepare the source code and pretrained model
(1) Official source code:
https://github.com/facebookresearch/Detectron
(2) Pretrained model (R-50):
https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
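2. Set up the runtime environment
Omitted; see the .md documentation that ships with the source code.
3. Prepare the dataset
(1) Dataset location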
$DETECTRON/detectron/datasets/data
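Place the dataset in the directory above.
(2) Dataset directory structure (the layout is up to you). Mine, for example:
hainu
|_ train
|  |_ <im-1-name>.jpg
|  |_ ...
|  |_ <im-N-name>.jpg
|_ val
|  |_ <im-1-name>.jpg
|  |_ ...
|  |_ <im-N-name>.jpg
|_ annotations
   |_ instances_train.json
   |_ instances_val.json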
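Before registering the dataset, it can help to confirm that each annotation file is COCO-style JSON. The sketch below is not part of Detectron: `summarize_coco` and `summarize_coco_file` are hypothetical helpers, and they assume the file has the top-level `images`, `annotations`, and `categories` keys of the COCO format. The summary also reports the value NUM_CLASSES would take, since Detectron counts the background as one class.

```python
import json


def summarize_coco(data):
    """Summarize a COCO-style annotation dict (hypothetical helper)."""
    # Fail early if one of the three top-level COCO sections is missing.
    for key in ('images', 'annotations', 'categories'):
        if key not in data:
            raise KeyError('missing top-level key: ' + key)
    return {
        'images': len(data['images']),
        'annotations': len(data['annotations']),
        'categories': len(data['categories']),
        # NUM_CLASSES in the yaml config counts the background class too.
        'num_classes': len(data['categories']) + 1,
    }


def summarize_coco_file(path):
    """Load an annotation file from disk and summarize it."""
    with open(path) as f:
        return summarize_coco(json.load(f))
```

For my layout the call would be `summarize_coco_file('detectron/datasets/data/hainu/annotations/instances_train.json')`, run from $DETECTRON. With a single foreground category, `num_classes` comes out as 2.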
(3) Register your dataset
Register your dataset in $DETECTRON/detectron/datasets/dataset_catalog.py.
Open dataset_catalog.py and you will see that the coco_2014 datasets are registered like this:
    'coco_2014_train': {
        _IM_DIR:
            _DATA_DIR + '/coco/coco_train2014',
        _ANN_FN:
            _DATA_DIR + '/coco/annotations/instances_train2014.json'
    },
    'coco_2014_val': {
        _IM_DIR:
            _DATA_DIR + '/coco/coco_val2014',
        _ANN_FN:
            _DATA_DIR + '/coco/annotations/instances_val2014.json'
    },
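Register your own dataset the same way (the annotation file names must match the files on disk, here instances_train.json and instances_val.json):
    'anthrax_train': {
        _IM_DIR:
            _DATA_DIR + '/hainu/train',
        _ANN_FN:
            _DATA_DIR + '/hainu/annotations/instances_train.json'
    },
    'anthrax_val': {
        _IM_DIR:
            _DATA_DIR + '/hainu/val',
        _ANN_FN:
            _DATA_DIR + '/hainu/annotations/instances_val.json'
    },
The dataset is now registered.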
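A misspelled path in the catalog only shows up once training starts, so a quick existence check can save a round trip. This is a minimal sketch, not Detectron code: `missing_paths` is a hypothetical helper, the dictionary keys are placeholders standing in for `_IM_DIR` and `_ANN_FN`, and `DATA_DIR` assumes the script is run from $DETECTRON.

```python
import os


def missing_paths(catalog):
    """Return sorted (dataset_name, path) pairs whose path does not exist."""
    missing = []
    for name, entry in catalog.items():
        for path in entry.values():
            if not os.path.exists(path):
                missing.append((name, path))
    return sorted(missing)


if __name__ == '__main__':
    # Hypothetical stand-in for the entries added to dataset_catalog.py.
    DATA_DIR = 'detectron/datasets/data'
    catalog = {
        'anthrax_train': {
            'image_directory': DATA_DIR + '/hainu/train',
            'annotation_file':
                DATA_DIR + '/hainu/annotations/instances_train.json',
        },
        'anthrax_val': {
            'image_directory': DATA_DIR + '/hainu/val',
            'annotation_file':
                DATA_DIR + '/hainu/annotations/instances_val.json',
        },
    }
    for name, path in missing_paths(catalog):
        print('%s: missing %s' % (name, path))
```

If the script prints nothing, every registered directory and annotation file was found.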
4. Train Mask R-CNN on your own dataset
Note: the commands for training and testing the network are:
# Training
python2 tools/train_net.py \
--cfg configs/getting_started/tutorial_1gpu_e2e_mask_rcnn_R-50-FPN.yaml \
OUTPUT_DIR /home/output
# There are two ways to run prediction: test_net.py and infer_simple.py. test_net.py reports detailed evaluation metrics; infer_simple.py does not.
# Prediction
python2 tools/test_net.py \
--cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_2x.yaml \
TEST.WEIGHTS /home/output/model_final.pkl \
NUM_GPUS 1
# Prediction can also be run this way: put the images to test in $DETECTRON/demo and change the --wts argument below to point at your own weights
python tools/infer_simple.py \
--cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
--output-dir /tmp/detectron-visualizations \
--image-ext jpg \
--wts https://dl.fbaipublicfiles.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
demo
(1) Before training, edit the yaml file under $DETECTRON/configs/getting_started.
(2) In the source code downloaded in February 2019, the yaml files under $DETECTRON/configs/getting_started only cover training Faster R-CNN, so to train Mask R-CNN the mask section has to be added (explained below).
########################### Error record #############################
Training:
python2 tools/train_net.py \
--cfg configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml \
OUTPUT_DIR /home/output
Prediction:
python2 tools/test_net.py \
--cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_2x.yaml \
TEST.WEIGHTS /home/output/model_final.pkl \
NUM_GPUS 1
Running python2 tools/test_net.py then produces the following error:
INFO net.py: 57: Loading weights from: t/home/output/model_final.pkl
INFO net.py: 86: fcn1_w not found
INFO net.py: 86: fcn1_b not found
INFO net.py: 86: fcn2_w not found
INFO net.py: 86: fcn2_b not found
INFO net.py: 86: fcn3_w not found
INFO net.py: 86: fcn3_b not found
INFO net.py: 86: fcn4_w not found
INFO net.py: 86: fcn4_b not found
INFO net.py: 86: conv5_mask_w not found
INFO net.py: 86: conv5_mask_b not found
INFO net.py: 86: mask_fcn_logits_w not found
INFO net.py: 86: mask_fcn_logits_b not found
...
Offending Blob name: gpu_0/_[mask]_fcn1_w.
Error from operator:
input: "gpu_0/_[mask]_roi_feat" input: "gpu_0/_[mask]_fcn1_w" input: "gpu_0/_[mask]_fcn1_b" output: "gpu_0/_[mask]_fcn1" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN" debug_info: " File \"tools/infer_simple.py\", line 147, in \n main(args)\n File \"tools/infer_simple.py\", line 99, in main\n model = infer_engine.initialize_model_from_cfg()\n File \"/home/siyu/Detectron/lib/core/test_engine.py\", line 266, in initialize_model_from_cfg\n model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)\n File \"/home/siyu/Detectron/lib/modeling/model_builder.py\", line 124, in create\n return get_func(model_type_func)(model)\n File \"/home/siyu/Detectron/lib/modeling/model_builder.py\", line 89, in generalized_rcnn\n freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY\n File \"/home/siyu/Detectron/lib/modeling/model_builder.py\", line 229, in build_generic_detection_model\n optim.build_data_parallel_model(model, _single_gpu_build_func)\n File \"/home/siyu/Detectron/lib/modeling/optimizer.py\", line 54, in build_data_parallel_model\n single_gpu_build_func(model)\n File \"/home/siyu/Detectron/lib/modeling/model_builder.py\", line 210, in _single_gpu_build_func\n spatial_scale_conv\n File \"/home/siyu/Detectron/lib/modeling/model_builder.py\", line 272, in _add_roi_mask_head\n model, blob_in, dim_in, spatial_scale_in\n File \"/home/siyu/Detectron/lib/modeling/mask_rcnn_heads.py\", line 113, in mask_rcnn_fcn_head_v1up4convs\n model, blob_in, dim_in, spatial_scale, 4\n File \"/home/siyu/Detectron/lib/modeling/mask_rcnn_heads.py\", line 151, in mask_rcnn_fcn_head_v1upXconvs\n bias_init=(\'ConstantFill\', {\'value\': 0.})\n File \"/home/siyu/anaconda3/envs/python27/lib/python2.7/site-packages/caffe2/python/cnn.py\", line 112, in Conv\n **kwargs\n File 
\"/home/siyu/anaconda3/envs/python27/lib/python2.7/site-packages/caffe2/python/brew.py\", line 121, in scope_wrapper\n return func(*args, **new_kwargs)\n File \"/home/siyu/anaconda3/envs/python27/lib/python2.7/site-packages/caffe2/python/helpers/conv.py\", line 201, in conv\n group, transform_inputs, **kwargs)\n File \"/home/siyu/anaconda3/envs/python27/lib/python2.7/site-packages/caffe2/python/helpers/conv.py\", line 154, in _ConvBase\n **kwargs)\n File \"/home/siyu/anaconda3/envs/python27/lib/python2.7/site-packages/caffe2/python/core.py\", line 2047, in \n op_type, *args, **kwargs)\n File \"/home/siyu/anaconda3/envs/python27/lib/python2.7/site-packages/caffe2/python/core.py\", line 2024, in _CreateAndAddToSelf\n op = CreateOperator(op_type, inputs, outputs, **kwargs)\n"
Original python traceback for operator 6 in network `mask_net` in exception above (most recent call last):
Traceback (most recent call last):
File "tools/infer_simple.py", line 147, in
...
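Cause: training used tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml, while prediction used e2e_mask_rcnn_R-50-FPN_2x.yaml. As the file names show, one is a Faster R-CNN config and the other a Mask R-CNN config. Weights trained without a mask head contain none of the mask parameters (fcn1_w, fcn1_b, ...), so the mask network cannot be initialized from them.
Fix: write a training yaml file suited to Mask R-CNN; see (3) below.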
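To tell in advance whether a trained model_final.pkl contains a mask head, one can list its blob names and look for the ones reported as "not found" in the log. The sketch below is an illustration under assumptions, not Detectron's own loader: `missing_mask_blobs` and `load_blob_names` are hypothetical helpers, the weights are assumed to sit either under a `blobs` key or directly in the top-level dict, and the file is assumed to be read with the same Python 2 used for training.

```python
import pickle

# Mask-head parameter names taken from the "not found" lines in the log.
MASK_BLOBS = [
    'fcn1_w', 'fcn1_b', 'fcn2_w', 'fcn2_b',
    'fcn3_w', 'fcn3_b', 'fcn4_w', 'fcn4_b',
    'conv5_mask_w', 'conv5_mask_b',
    'mask_fcn_logits_w', 'mask_fcn_logits_b',
]


def missing_mask_blobs(blob_names):
    """Return the mask-head blobs absent from an iterable of blob names."""
    present = set(blob_names)
    return [name for name in MASK_BLOBS if name not in present]


def load_blob_names(path):
    """Read blob names from a weights .pkl (layout is an assumption)."""
    with open(path, 'rb') as f:
        data = pickle.load(f)
    # Assumption: weights are stored under 'blobs', else at the top level.
    blobs = data['blobs'] if 'blobs' in data else data
    return list(blobs.keys())
```

Calling `missing_mask_blobs(load_blob_names('/home/output/model_final.pkl'))` on weights trained with the Faster R-CNN config should return all twelve names; an empty list suggests the mask head was trained.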
(3) My approach (single-GPU training):
a. In $DETECTRON/configs/getting_started, copy tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml, rename the copy to tutorial_1gpu_e2e_mask_rcnn_R-50-FPN.yaml, and add the mask section to it so that it can train a Mask R-CNN model.
b. Changes
1) Basic parameters: adjust NUM_CLASSES, WEIGHTS, and DATASETS to match your own setup.
2) Add the mask section:
The MODEL section of tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml is:
MODEL:
  TYPE: generalized_rcnn
  CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
  NUM_CLASSES: 2
  FASTER_RCNN: True
Change it to:
MODEL:
  TYPE: mask_rcnn  # changed
  CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
  NUM_CLASSES: 2
  FASTER_RCNN: True
  MASK_ON: True  # added
Then copy the MRCNN section from $DETECTRON/configs/test_time_aug/e2e_mask_rcnn_R-50-FPN_2x.yaml, shown here:
MRCNN:
  ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
  RESOLUTION: 28  # (output mask resolution) default 14
  ROI_XFORM_METHOD: RoIAlign
  ROI_XFORM_RESOLUTION: 14  # default 7
  ROI_XFORM_SAMPLING_RATIO: 2  # default 0
  DILATION: 1  # default 2
  CONV_INIT: MSRAFill  # default GaussianFill
Paste this MRCNN section into $DETECTRON/configs/getting_started/tutorial_1gpu_e2e_mask_rcnn_R-50-FPN.yaml, right after the FAST_RCNN section. The complete modified config file is:
MODEL:
  TYPE: mask_rcnn
  CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
  NUM_CLASSES: 2
  FASTER_RCNN: True
  MASK_ON: True
NUM_GPUS: 1
SOLVER:
  WEIGHT_DECAY: 0.0001
  LR_POLICY: steps_with_decay
  BASE_LR: 0.0025
  GAMMA: 0.1
  MAX_ITER: 1000  # 60000
  STEPS: [0, 500, 999]  # [0, 30000, 40000]
  # Equivalent schedules with...
  # 1 GPU:
  #   BASE_LR: 0.0025
  #   MAX_ITER: 60000
  #   STEPS: [0, 30000, 40000]
  # 2 GPUs:
  #   BASE_LR: 0.005
  #   MAX_ITER: 30000
  #   STEPS: [0, 15000, 20000]
  # 4 GPUs:
  #   BASE_LR: 0.01
  #   MAX_ITER: 15000
  #   STEPS: [0, 7500, 10000]
  # 8 GPUs:
  #   BASE_LR: 0.02
  #   MAX_ITER: 7500
  #   STEPS: [0, 3750, 5000]
FPN:
  FPN_ON: True
  MULTILEVEL_ROIS: True
  MULTILEVEL_RPN: True
FAST_RCNN:
  ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
  ROI_XFORM_METHOD: RoIAlign
  ROI_XFORM_RESOLUTION: 7
  ROI_XFORM_SAMPLING_RATIO: 2
MRCNN:
  ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
  RESOLUTION: 28  # (output mask resolution) default 14
  ROI_XFORM_METHOD: RoIAlign
  ROI_XFORM_RESOLUTION: 14  # default 7
  ROI_XFORM_SAMPLING_RATIO: 2  # default 0
  DILATION: 1  # default 2
  CONV_INIT: MSRAFill  # default GaussianFill
TRAIN:
  WEIGHTS: /home/ubuntu/detectron/ImageNetPretrained/MSRA/R-50.pkl
  DATASETS: ('anthrax_train',)
  SCALES: (500,)
  MAX_SIZE: 833
  BATCH_SIZE_PER_IM: 256
  RPN_PRE_NMS_TOP_N: 2000  # Per FPN level
TEST:
  DATASETS: ('anthrax_val',)
  SCALE: 500
  MAX_SIZE: 833
  NMS: 0.5
  RPN_PRE_NMS_TOP_N: 1000  # Per FPN level
  RPN_POST_NMS_TOP_N: 1000
  FORCE_JSON_DATASET_EVAL: True
OUTPUT_DIR: .
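The "Equivalent schedules" comments in the SOLVER section follow a simple pattern: with N GPUs the learning rate is multiplied by N while the iteration count and step boundaries are divided by N. The sketch below reproduces those numbers; `scale_schedule` is a hypothetical helper written for illustration, and its defaults are the 1-GPU values from the comments, not the shortened 1000-iteration schedule used above.

```python
def scale_schedule(num_gpus, base_lr=0.0025, max_iter=60000,
                   steps=(0, 30000, 40000)):
    """Scale a 1-GPU schedule to num_gpus GPUs (illustrative sketch)."""
    if num_gpus < 1:
        raise ValueError('num_gpus must be at least 1')
    if max_iter % num_gpus or any(s % num_gpus for s in steps):
        raise ValueError('schedule does not divide evenly by num_gpus')
    return {
        # More GPUs mean more images per iteration, so the rate grows...
        'BASE_LR': base_lr * num_gpus,
        # ...and fewer iterations cover the same number of images.
        'MAX_ITER': max_iter // num_gpus,
        'STEPS': [s // num_gpus for s in steps],
    }


if __name__ == '__main__':
    for n in (1, 2, 4, 8):
        print(n, scale_schedule(n))
```

For the shortened run in the config above, the same helper can be called as `scale_schedule(2, max_iter=1000, steps=(0, 500, 1000))` after choosing step values that divide evenly.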
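After making these changes, rerun python2 tools/train_net.py to train the model, then predict with python2 tools/test_net.py. If the error above still appears, first check that TEST.WEIGHTS points to the newly trained model rather than the old Faster R-CNN weights; failing that, try restarting the computer.
Training and prediction commands after the change:
Training: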
python2 tools/train_net.py \
--cfg configs/getting_started/tutorial_1gpu_e2e_mask_rcnn_R-50-FPN.yaml \
OUTPUT_DIR /home/output
Prediction:
python2 tools/test_net.py \
--cfg configs/getting_started/tutorial_1gpu_e2e_mask_rcnn_R-50-FPN.yaml \
TEST.WEIGHTS /home/output/model_final.pkl \
NUM_GPUS 1
Error log:
1. If running python2 tools/train_net.py or python2 tools/test_net.py produces the following error:
INFO task_evaluation.py: 75: Evaluating detections
Traceback (most recent call last):
File "tools/train_net.py", line 281, in
main()
File "tools/train_net.py", line 122, in main
test_model(checkpoints['final'], args.multi_gpu_testing, args.opts)
File "tools/train_net.py", line 277, in test_model
test_net.main(multi_gpu_testing=multi_gpu_testing)
File "/nfs/zapdos/home/data/vision3/cw234/detectron/tools/test_net.py", line 98, in main
output_dir, ind_range=ind_range, multi_gpu_testing=multi_gpu_testing
File "/nfs/zapdos/home/data/vision3/cw234/detectron/lib/core/test_engine.py", line 83, in run_inference
results = parent_func(output_dir, multi_gpu=multi_gpu_testing)
File "/nfs/zapdos/home/data/vision3/cw234/detectron/lib/core/test_engine.py", line 110, in test_net_on_dataset
dataset, all_boxes, all_segms, all_keyps, output_dir
File "/nfs/zapdos/home/data/vision3/cw234/detectron/lib/datasets/task_evaluation.py", line 59, in evaluate_all
dataset, all_boxes, output_dir, use_matlab=use_matlab
File "/nfs/zapdos/home/data/vision3/cw234/detectron/lib/datasets/task_evaluation.py", line 97, in evaluate_boxes
'No evaluator for dataset: {}'.format(dataset.name)
NotImplementedError: No evaluator for dataset: ego_data
Try the following:
Add FORCE_JSON_DATASET_EVAL: True to the TEST section of the .yaml file:
TEST:
  ...
  FORCE_JSON_DATASET_EVAL: True
2. If running python2 tools/train_net.py or python2 tools/test_net.py produces the following error:
Loading and preparing results...
DONE (t=0.05s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *segm*
Traceback (most recent call last):
File "tools/test_net.py", line 116, in
check_expected_results=True,
File "/home/ubuntu/detectron/detectron/core/test_engine.py", line 128, in run_inference
all_results = result_getter()
File "/home/ubuntu/detectron/detectron/core/test_engine.py", line 108, in result_getter
multi_gpu=multi_gpu_testing
File "/home/ubuntu/detectron/detectron/core/test_engine.py", line 164, in test_net_on_dataset
dataset, all_boxes, all_segms, all_keyps, output_dir
File "/home/ubuntu/detectron/detectron/datasets/task_evaluation.py", line 64, in evaluate_all
results = evaluate_masks(dataset, all_boxes, all_segms, output_dir)
File "/home/ubuntu/detectron/detectron/datasets/task_evaluation.py", line 114, in evaluate_masks
cleanup=not_comp
File "/home/ubuntu/detectron/detectron/datasets/json_dataset_evaluator.py", line 56, in evaluate_masks
coco_eval = _do_segmentation_eval(json_dataset, res_file, output_dir)
File "/home/ubuntu/detectron/detectron/datasets/json_dataset_evaluator.py", line 117, in _do_segmentation_eval
coco_eval.evaluate()
File "build/bdist.linux-x86_64/egg/pycocotools/cocoeval.py", line 141, in evaluate
File "build/bdist.linux-x86_64/egg/pycocotools/cocoeval.py", line 105, in _prepare
File "build/bdist.linux-x86_64/egg/pycocotools/cocoeval.py", line 93, in _toMask
File "build/bdist.linux-x86_64/egg/pycocotools/coco.py", line 416, in annToRLE
File "pycocotools/_mask.pyx", line 293, in pycocotools._mask.frPyObjects
TypeError: Argument 'bb' has incorrect type (expected numpy.ndarray, got list)
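This error is almost always caused by a faulty annotation for some image in the validation JSON (instances_val.json in my case). The traceback fails inside annToRLE, that is, while an annotation is being converted to a mask. Find the badly annotated image and re-annotate it, or simply remove it from the dataset, and the problem goes away.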
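Hunting for the bad annotation by hand is tedious, so a small scan over the JSON can narrow it down. This is a sketch under assumptions rather than a definitive diagnosis: `find_suspicious_annotations` is a hypothetical helper, it assumes COCO-style `bbox` ([x, y, width, height]) and polygon `segmentation` fields, and the rule that a polygon needs at least six numbers reflects my understanding that pycocotools treats a four-number polygon as a bounding box, which is one way to reach this TypeError.

```python
def find_suspicious_annotations(data):
    """Return (image_id, annotation_id, reason) tuples worth rechecking."""
    problems = []
    for ann in data.get('annotations', []):
        key = (ann.get('image_id'), ann.get('id'))
        bbox = ann.get('bbox')
        # COCO boxes are [x, y, width, height] with positive size.
        if not isinstance(bbox, list) or len(bbox) != 4:
            problems.append(key + ('bbox is not a list of 4 numbers',))
        elif bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(key + ('bbox has non-positive size',))
        segm = ann.get('segmentation')
        # Polygons are flat [x1, y1, x2, y2, ...] lists; RLE dicts are skipped.
        if isinstance(segm, list):
            if not segm:
                problems.append(key + ('segmentation is empty',))
            for poly in segm:
                if (not isinstance(poly, list) or len(poly) < 6
                        or len(poly) % 2):
                    problems.append(key + ('polygon is malformed',))
    return problems
```

Load the validation file with `json.load` and pass the resulting dict to the function; each returned tuple names the image and annotation to inspect.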
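3. python2 tools/visualize_results.py outputs only one image
When I ran visualize_results.py there were many test images, yet only one image was written out. Opening visualize_results.py showed that an argument default was to blame.
The argument is defined like this: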
    parser.add_argument(
        '--first',
        dest='first',
        help='only visualize the first k images',
        default=10000,  # I had earlier set default to 1 and did not pass --first at run time, so only 1 image was output
        type=int
    )
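The behaviour is plain argparse: when --first is not given on the command line, the default is used. The standalone sketch below copies only the argument definition shown above; `build_parser` is a hypothetical wrapper, and nothing else from visualize_results.py is reproduced.

```python
import argparse


def build_parser():
    """Build a parser holding only the --first argument."""
    parser = argparse.ArgumentParser(description='--first default demo')
    parser.add_argument(
        '--first',
        dest='first',
        help='only visualize the first k images',
        default=10000,
        type=int
    )
    return parser


if __name__ == '__main__':
    parser = build_parser()
    # No flag given: the default applies.
    print(parser.parse_args([]).first)  # prints 10000
    # Flag given: the command-line value wins.
    print(parser.parse_args(['--first', '5']).first)  # prints 5
```

Passing --first on the command line is therefore a safer way to limit the output than editing the default in the script.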