Detectron2 Source Code Analysis - demo - Object Detection 2 - Data Parsing

Input command:

python demo/demo.py --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input 001.jpg --output results --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl 

Input image:

[Figure 1: the test image]


Output result:

[Figure 2: the detection result drawn on the image]


The data returned by predictor() looks like this:

[05/14 15:39:49 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml', input=['001.jpg'], opts=['MODEL.WEIGHTS', 'detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl'], output='results', video_input=None, webcam=False)
: cpu_device= cpu
[05/14 15:39:51 fvcore.common.checkpoint]: Loading checkpoint from detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
[05/14 15:39:51 fvcore.common.file_io]: URL https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl cached in /home/lappai/.torch/fvcore_cache/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
[05/14 15:39:51 fvcore.common.checkpoint]: Reading a file from 'Detectron2 Model Zoo'
: args.input= ['001.jpg']
__call__:
: run_on_image predictions= {'instances': Instances(num_instances=16, image_height=342, image_width=512, fields=[pred_boxes: Boxes(tensor([[8.4740e+00, 4.6892e+01, 1.4996e+02, 3.3636e+02],
        [1.2094e+02, 2.8676e+01, 2.4164e+02, 3.4125e+02],
        [3.9989e+02, 1.1977e+02, 5.0410e+02, 3.4135e+02],
        [2.3525e+02, 5.9974e+01, 3.8057e+02, 3.4017e+02],
        [3.5989e+02, 1.0638e+02, 4.3428e+02, 3.2155e+02],
        [4.1590e+02, 1.0385e+02, 4.4406e+02, 1.5214e+02],
        [2.7101e+02, 8.3826e+01, 3.0224e+02, 1.5380e+02],
        [2.8008e+02, 1.1305e+02, 3.2311e+02, 1.8048e+02],
        [3.1624e+02, 1.6404e+02, 4.0676e+02, 2.9497e+02],
        [3.0986e+02, 5.6478e+01, 3.8312e+02, 2.0319e+02],
        [1.1140e+00, 8.9818e+01, 6.5928e+01, 1.8706e+02],
        [0.0000e+00, 1.0031e+02, 5.6573e+01, 3.3716e+02],
        [1.3246e-01, 1.2312e+02, 6.7227e+01, 1.6550e+02],
        [1.3788e-02, 8.6321e+01, 2.8173e+01, 1.4170e+02],
        [4.8467e+02, 1.7300e+02, 5.1018e+02, 2.8373e+02],
        [4.0865e+02, 9.6892e+01, 4.2856e+02, 1.4300e+02]], device='cuda:0')), scores: tensor([0.9969, 0.9952, 0.9943, 0.9886, 0.9663, 0.9632, 0.8624, 0.7518, 0.6952,
        0.6793, 0.5957, 0.5795, 0.5773, 0.5474, 0.5355, 0.5209],
       device='cuda:0'), pred_classes: tensor([ 0,  0,  0,  0,  0,  0,  0,  0, 26,  0,  0,  0,  0,  0, 26,  0],
       device='cuda:0'), pred_masks: tensor([[[False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         ...,
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False]],

        [[False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         ...,
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False]],

        [[False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         ...,
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False]],

        ...,

        [[False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         ...,
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False]],

        [[False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         ...,
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False]],

        [[False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         ...,
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False]]], device='cuda:0')])}
[05/14 15:39:51 detectron2]: 001.jpg: detected 16 instances in 0.12s
: args.output= results
: out_filename= results
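These prints come from debug statements added to demo.py. For reference, here is a minimal sketch of how the same predictions can be obtained programmatically with DefaultPredictor; the config path, weights URL and image name mirror the command above, and setting SCORE_THRESH_TEST roughly reproduces --confidence-threshold (demo.py also sets a few other thresholds):

import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # roughly what --confidence-threshold 0.5 does

predictor = DefaultPredictor(cfg)  # builds the model and loads the checkpoint
image = cv2.imread("001.jpg")      # BGR ndarray, the format demo.py reads
outputs = predictor(image)         # a dict; instance segmentation fills the "instances" key
print(outputs["instances"])        # the Instances object dumped in the log above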

The output format is documented here:

https://detectron2.readthedocs.io/tutorials/models.html#model-output-format

Model Output Format

When in training mode, the builtin models output a dict[str->ScalarTensor] with all the losses.

When in inference mode, the builtin models output a list[dict], one dict for each image. Based on the tasks the model is doing, each dict may contain the following fields:

  • “instances”: Instances object with the following fields:

    • “pred_boxes”: Boxes object storing N boxes, one for each detected instance.

    • “scores”: Tensor, a vector of N scores.

    • “pred_classes”: Tensor, a vector of N labels in range [0, num_categories).

    • “pred_masks”: a Tensor of shape (N, H, W), masks for each detected instance.

    • “pred_keypoints”: a Tensor of shape (N, num_keypoint, 3). Each row in the last dimension is (x, y, score). Scores are larger than 0.

  • “sem_seg”: Tensor of (num_categories, H, W), the semantic segmentation prediction.

  • “proposals”: Instances object with the following fields:

    • “proposal_boxes”: Boxes object storing N boxes.

    • “objectness_logits”: a torch vector of N scores.

  • “panoptic_seg”: A tuple of (Tensor, list[dict]). The tensor has shape (H, W), where each element represents the segment id of the pixel. Each dict describes one segment id and has the following fields:

    • “id”: the segment id

    • “isthing”: whether the segment is a thing or stuff

    • “category_id”: the category id of this segment. It represents the thing class id when isthing==True, and the stuff class id otherwise.
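We run instance segmentation here, so only the “instances” entry of the output dict applies. A minimal sketch of unpacking its fields (assuming outputs is the dict returned by the predictor sketch earlier; in COCO's 80-class list, index 0 is person):

instances = outputs["instances"].to("cpu")  # move off the GPU for inspection

boxes   = instances.pred_boxes.tensor  # (N, 4) tensor: x1, y1, x2, y2 per box
scores  = instances.scores             # (N,) detection confidences
classes = instances.pred_classes       # (N,) class indices
masks   = instances.pred_masks         # (N, H, W) boolean masks

for i in range(len(instances)):
    print(f"instance {i}: class={classes[i].item()} "
          f"score={scores[i].item():.3f} box={boxes[i].tolist()}")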


Combining the specification above with the actual log, the rough structure and meaning of the output data is:

Instances
    num_instances=16,   // number of detected instances; put plainly, how many objects were found
    image_height=342,   // size of the input image
    image_width=512,
    fields:             // sub-fields of the structure
        pred_boxes: Boxes(tensor[16x4])   // box coordinates of the N instances; 16 objects detected here
        scores: tensor[16]                // detection confidence of each instance
        pred_classes: tensor[16]          // class index of each instance
        pred_masks: tensor[16x342x512]    // per the docs above, shape (N, H, W): one boolean mask per instance at full image resolution
    // every tensor carries device='cuda:0', the device the model ran on
    // pred_keypoints is absent from this log: the config used does not run keypoint detection
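The pred_masks layout can be checked directly: each of the 16 instances gets its own boolean mask the size of the image, and summing a mask gives that instance's segmented area in pixels (a small sketch, continuing from the snippet above):

masks = instances.pred_masks   # torch.BoolTensor
print(masks.shape)             # torch.Size([16, 342, 512]) for this image

areas = masks.sum(dim=(1, 2))  # pixel count of each instance's mask
print(areas)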

