Command used:
python demo/demo.py --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input 001.jpg --output results --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
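Everything after `--opts` is forwarded to the config as flat KEY VALUE pairs (demo.py hands them to yacs via `cfg.merge_from_list(args.opts)`). A minimal pure-Python mock of that merging (the toy `cfg` dict here is a stand-in, not the real config) shows what the override in the command above does:

```python
# Minimal mock of yacs-style merge_from_list: --opts is a flat
# [KEY1, VALUE1, KEY2, VALUE2, ...] list whose dotted keys address
# nested config entries. The cfg dict here is a toy stand-in.
def merge_from_list(cfg, opts):
    assert len(opts) % 2 == 0, "opts must be KEY VALUE pairs"
    for key, value in zip(opts[0::2], opts[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return cfg

cfg = {"MODEL": {"WEIGHTS": "", "DEVICE": "cuda"}}
merge_from_list(cfg, ["MODEL.WEIGHTS",
                      "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"])
print(cfg["MODEL"]["WEIGHTS"])  # the override lands on the nested key
```

This is why the `detectron2://...` weights URL from the command line shows up later in the checkpoint-loading log lines.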
Input image:
Output result:
The data returned from predictor() is as follows:
[05/14 15:39:49 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml', input=['001.jpg'], opts=['MODEL.WEIGHTS', 'detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl'], output='results', video_input=None, webcam=False)
: cpu_device= cpu
[05/14 15:39:51 fvcore.common.checkpoint]: Loading checkpoint from detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
[05/14 15:39:51 fvcore.common.file_io]: URL https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl cached in /home/lappai/.torch/fvcore_cache/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
[05/14 15:39:51 fvcore.common.checkpoint]: Reading a file from 'Detectron2 Model Zoo'
: args.input= ['001.jpg']
__call__:
: run_on_image predictions= {'instances': Instances(num_instances=16, image_height=342, image_width=512, fields=[pred_boxes: Boxes(tensor([[8.4740e+00, 4.6892e+01, 1.4996e+02, 3.3636e+02],
[1.2094e+02, 2.8676e+01, 2.4164e+02, 3.4125e+02],
[3.9989e+02, 1.1977e+02, 5.0410e+02, 3.4135e+02],
[2.3525e+02, 5.9974e+01, 3.8057e+02, 3.4017e+02],
[3.5989e+02, 1.0638e+02, 4.3428e+02, 3.2155e+02],
[4.1590e+02, 1.0385e+02, 4.4406e+02, 1.5214e+02],
[2.7101e+02, 8.3826e+01, 3.0224e+02, 1.5380e+02],
[2.8008e+02, 1.1305e+02, 3.2311e+02, 1.8048e+02],
[3.1624e+02, 1.6404e+02, 4.0676e+02, 2.9497e+02],
[3.0986e+02, 5.6478e+01, 3.8312e+02, 2.0319e+02],
[1.1140e+00, 8.9818e+01, 6.5928e+01, 1.8706e+02],
[0.0000e+00, 1.0031e+02, 5.6573e+01, 3.3716e+02],
[1.3246e-01, 1.2312e+02, 6.7227e+01, 1.6550e+02],
[1.3788e-02, 8.6321e+01, 2.8173e+01, 1.4170e+02],
[4.8467e+02, 1.7300e+02, 5.1018e+02, 2.8373e+02],
[4.0865e+02, 9.6892e+01, 4.2856e+02, 1.4300e+02]], device='cuda:0')), scores: tensor([0.9969, 0.9952, 0.9943, 0.9886, 0.9663, 0.9632, 0.8624, 0.7518, 0.6952,
0.6793, 0.5957, 0.5795, 0.5773, 0.5474, 0.5355, 0.5209],
device='cuda:0'), pred_classes: tensor([ 0, 0, 0, 0, 0, 0, 0, 0, 26, 0, 0, 0, 0, 0, 26, 0],
device='cuda:0'), pred_masks: tensor([[[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
...,
[False, False, False, ..., False, False, False]],
...(15 more truncated boolean masks, one per instance, elided here)...
], device='cuda:0')])}
[05/14 15:39:51 detectron2]: 001.jpg: detected 16 instances in 0.12s
: args.output= results
: out_filename= results
According to the output format specification at
https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
When in training mode, the builtin models output a dict[str->ScalarTensor] with all the losses.
When in inference mode, the builtin models output a list[dict], one dict for each image. Based on the tasks the model is doing, each dict may contain the following fields:
"instances": Instances object with the following fields:
    "pred_boxes": Boxes object storing N boxes, one for each detected instance.
    "scores": Tensor, a vector of N scores.
    "pred_classes": Tensor, a vector of N labels in range [0, num_categories).
    "pred_masks": a Tensor of shape (N, H, W), masks for each detected instance.
    "pred_keypoints": a Tensor of shape (N, num_keypoint, 3). Each row in the last dimension is (x, y, score). Scores are larger than 0.
"sem_seg": Tensor of (num_categories, H, W), the semantic segmentation prediction.
"proposals": Instances object with the following fields:
    "proposal_boxes": Boxes object storing N boxes.
    "objectness_logits": a torch vector of N scores.
"panoptic_seg": A tuple of (Tensor, list[dict]). The tensor has shape (H, W), where each element represents the segment id of the pixel. Each dict describes one segment id and has the following fields:
    "id": the segment id
    "isthing": whether the segment is a thing or stuff
    "category_id": the category id of this segment. It represents the thing class id when isthing==True, and the stuff class id otherwise.
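To make the specification concrete, here is a plain-Python mock (lists instead of torch Tensors, tuples instead of Boxes objects, so it runs without detectron2) of what one entry of the inference-mode list[dict] looks like for instance segmentation. The box and score numbers are copied from the log above; only the first three of the 16 instances are included for brevity:

```python
# Plain-Python mock of one inference-mode output dict for instance
# segmentation; real detectron2 uses torch Tensors and Boxes objects.
N, H, W = 3, 342, 512  # first 3 of the 16 instances; image size from the log
mock_output = {
    "instances": {
        "image_size": (H, W),
        "pred_boxes":   [(8.47, 46.89, 149.96, 336.36),   # (x1, y1, x2, y2)
                         (120.94, 28.68, 241.64, 341.25),
                         (399.89, 119.77, 504.10, 341.35)],
        "scores":       [0.9969, 0.9952, 0.9943],         # one score per instance
        "pred_classes": [0, 0, 0],                        # labels in [0, num_categories)
        "pred_masks":   [[[False] * W for _ in range(H)]  # shape (N, H, W)
                         for _ in range(N)],
    }
}
inst = mock_output["instances"]
# All per-instance fields line up along the first (N) dimension
assert len(inst["pred_boxes"]) == len(inst["scores"]) == N
assert len(inst["pred_masks"]) == N and len(inst["pred_masks"][0]) == H
```

The invariant worth remembering is the last one: every `pred_*` field and `scores` is indexed by the same instance dimension N, so index i always refers to the same detected object across all fields.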
We are using instance segmentation here. From the specification above and the actual log, the rough structure and meaning of the output data are as follows:
Instances
    num_instances=16,   // number of detected instances, i.e. how many objects were found
    image_height=342,   // dimensions of the input image
    image_width=512,
    fields              // sub-fields of the structure
        pred_boxes: Boxes(tensor[16x4],  // box coordinates of the N instances; 16 objects detected here
            device,     //='cuda:0', the device the model ran on
        scores: tensor([16x1],           // detection confidence of each instance
            device,     //='cuda:0'
        pred_classes: tensor([16x1],     // class id of each instance (for COCO's contiguous ids, 0 is most likely "person" and 26 "handbag")
            device,     //='cuda:0'
        pred_masks: tensor([16xHxW],     // per-instance masks; per the docs above the shape is (N, H, W), here (16, 342, 512)
            device,     //='cuda:0'
        keypoints       // absent from this log: keypoint detection was not requested in the command
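As a sanity check on the fields above, the scores and pred_classes from the log can be post-processed with plain Python. The class names in the comments are my reading of COCO's contiguous category ids, not something the log itself states:

```python
from collections import Counter

# scores and pred_classes copied verbatim from the log above
scores  = [0.9969, 0.9952, 0.9943, 0.9886, 0.9663, 0.9632, 0.8624, 0.7518,
           0.6952, 0.6793, 0.5957, 0.5795, 0.5773, 0.5474, 0.5355, 0.5209]
classes = [0, 0, 0, 0, 0, 0, 0, 0, 26, 0, 0, 0, 0, 0, 26, 0]

# How many instances per class id? (in COCO's contiguous ids,
# 0 should be "person" and 26 "handbag" -- my reading, not from the log)
per_class = Counter(classes)
print(per_class)   # Counter({0: 14, 26: 2})

# Keep only high-confidence detections, mirroring a stricter
# confidence_threshold than the 0.5 used in the demo run
keep = [(c, s) for c, s in zip(classes, scores) if s >= 0.9]
print(len(keep))   # 6 detections survive at threshold 0.9
```

Raising the threshold from 0.5 to 0.9 drops the run from 16 detections to 6, all of class 0, which matches the intuition that the low-score tail of the list is the least reliable.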