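Instance segmentation vs. semantic segmentation: semantic segmentation assigns only a class label to each pixel; instance segmentation also separates each individual object.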
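A tiny illustration of the difference, as a sketch only: the 2x4 "image", the class id `CAT`, and the helper `instances_to_semantic` are made up for this example. Two cats are stored as two instance masks; merging them into a semantic map keeps the class but loses which pixel belongs to which cat.

```python
def instances_to_semantic(instances, height, width, background=0):
    """Collapse (class_id, binary_mask) instances into one per-pixel class map."""
    semantic = [[background] * width for _ in range(height)]
    for class_id, mask in instances:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    semantic[y][x] = class_id
    return semantic


CAT = 1  # hypothetical class id

# Instance segmentation: one binary mask per object.
cat_a = [[1, 1, 0, 0],
         [1, 1, 0, 0]]
cat_b = [[0, 0, 0, 1],
         [0, 0, 0, 1]]
instances = [(CAT, cat_a), (CAT, cat_b)]

# Semantic segmentation: a single map of class ids, no object identity.
semantic = instances_to_semantic(instances, height=2, width=4)
for row in semantic:
    print(row)
```

Both cats end up with the same value in `semantic`, so the two objects can no longer be told apart from the map alone.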
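Reference: "MS COCO数据集介绍以及pycocotools简单使用" (an introduction to the MS COCO dataset and basic pycocotools usage), from 太阳花的小绿豆's blog on CSDN.
Recommended reading: https://www.zhihu.com/question/53405779/answer/399478988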
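Faster R-CNN is a direct end-to-end pipeline: unlike Fast R-CNN, it does not need a separate Selective Search (SS) stage. The RPN (Region Proposal Network) replaces SS, which is what makes it end-to-end.
The frameworks keep getting simpler, and the results keep getting better.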
(base) zhr@zhr-Lenovo-Legion-Y7000P2020H:/home/helen/code/deep-learning-for-image-processing$ conda activate fastai2
(fastai2) zhr@zhr-Lenovo-Legion-Y7000P2020H:/home/helen/code/deep-learning-for-image-processing$ /home/zhr/miniconda3/envs/fastai2/bin/python /home/helen/code/deep-learning-for-image-processing/pytorch_object_detection/faster_rcnn/train_res50_fpn.py
Namespace(device='cuda:0', data_path='/home/helen/dataset/VOCtrainval_06-Nov-2007', num_classes=20, output_dir='./save_weights', resume='', start_epoch=0, epochs=15, lr=0.01, momentum=0.9, weight_decay=0.0001, batch_size=2, aspect_ratio_group_factor=3, amp=False)
Using cuda device training.
Using [0, 0.5, 0.6299605249474366, 0.7937005259840997, 1.0, 1.2599210498948732, 1.5874010519681994, 2.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [ 4 8 400 57 103 1845 58 26]
Using 2 dataloader workers
_IncompatibleKeys(missing_keys=[], unexpected_keys=['fc.weight', 'fc.bias'])
/home/zhr/miniconda3/envs/fastai2/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Epoch: [0] [ 0/1250] eta: 0:16:50.245979 lr: 0.000020 loss: 3.9301 (3.9301) loss_classifier: 3.1865 (3.1865) loss_box_reg: 0.1150 (0.1150) loss_objectness: 0.5403 (0.5403) loss_rpn_box_reg: 0.0882 (0.0882) time: 0.8082 data: 0.1655 max mem: 3213
Epoch: [0] [ 50/1250] eta: 0:10:51.982285 lr: 0.000519 loss: 0.9857 (1.8274) loss_classifier: 0.5287 (1.2986) loss_box_reg: 0.4109 (0.2776) loss_objectness: 0.0314 (0.2208) loss_rpn_box_reg: 0.0147 (0.0305) time: 0.5342 data: 0.0001 max mem: 3564
Epoch: [0] [ 100/1250] eta: 0:10:16.309807 lr: 0.001019 loss: 0.5225 (1.2822) loss_classifier: 0.3056 (0.8365) loss_box_reg: 0.1879 (0.2742) loss_objectness: 0.0218 (0.1443) loss_rpn_box_reg: 0.0073 (0.0273) time: 0.5390 data: 0.0001 max mem: 3564
Epoch: [0] [ 150/1250] eta: 0:09:44.290952 lr: 0.001518 loss: 0.4234 (1.0831) loss_classifier: 0.2392 (0.6689) loss_box_reg: 0.1529 (0.2802) loss_objectness: 0.0269 (0.1081) loss_rpn_box_reg: 0.0044 (0.0259) time: 0.5095 data: 0.0001 max mem: 3564
Epoch: [0] [ 200/1250] eta: 0:09:17.379252 lr: 0.002018 loss: 0.3337 (0.9938) loss_classifier: 0.1420 (0.5824) loss_box_reg: 0.1097 (0.2923) loss_objectness: 0.0411 (0.0925) loss_rpn_box_reg: 0.0409 (0.0267) time: 0.5314 data: 0.0001 max mem: 3564
Epoch: [0] [ 250/1250] eta: 0:08:55.809067 lr: 0.002517 loss: 0.6095 (0.9255) loss_classifier: 0.2653 (0.5296) loss_box_reg: 0.3108 (0.2890) loss_objectness: 0.0097 (0.0808) loss_rpn_box_reg: 0.0237 (0.0260) time: 0.5611 data: 0.0001 max mem: 3564
Epoch: [0] [ 300/1250] eta: 0:08:27.952396 lr: 0.003017 loss: 0.6465 (0.8819) loss_classifier: 0.2775 (0.4919) loss_box_reg: 0.2594 (0.2901) loss_objectness: 0.0776 (0.0735) loss_rpn_box_reg: 0.0319 (0.0264) time: 0.5391 data: 0.0001 max mem: 3564
Epoch: [0] [ 350/1250] eta: 0:08:01.532866 lr: 0.003516 loss: 0.5011 (0.8319) loss_classifier: 0.1880 (0.4560) loss_box_reg: 0.2810 (0.2828) loss_objectness: 0.0193 (0.0667) loss_rpn_box_reg: 0.0129 (0.0264) time: 0.5380 data: 0.0001 max mem: 3564
Epoch: [0] [ 400/1250] eta: 0:07:32.227619 lr: 0.004016 loss: 0.9177 (0.7983) loss_classifier: 0.4662 (0.4321) loss_box_reg: 0.4026 (0.2783) loss_objectness: 0.0273 (0.0621) loss_rpn_box_reg: 0.0215 (0.0259) time: 0.5185 data: 0.0001 max mem: 3564
Epoch: [0] [ 450/1250] eta: 0:07:04.867693 lr: 0.004515 loss: 0.5924 (0.7654) loss_classifier: 0.1544 (0.4091) loss_box_reg: 0.3071 (0.2733) loss_objectness: 0.0731 (0.0576) loss_rpn_box_reg: 0.0578 (0.0254) time: 0.5164 data: 0.0001 max mem: 3564
Epoch: [0] [ 500/1250] eta: 0:06:38.074630 lr: 0.005015 loss: 0.3341 (0.7497) loss_classifier: 0.1391 (0.3970) loss_box_reg: 0.0885 (0.2717) loss_objectness: 0.0195 (0.0556) loss_rpn_box_reg: 0.0871 (0.0255) time: 0.5333 data: 0.0001 max mem: 3564
Epoch: [0] [ 550/1250] eta: 0:06:10.807065 lr: 0.005514 loss: 0.1902 (0.7351) loss_classifier: 0.0802 (0.3851) loss_box_reg: 0.0925 (0.2711) loss_objectness: 0.0157 (0.0533) loss_rpn_box_reg: 0.0019 (0.0256) time: 0.5072 data: 0.0001 max mem: 3564
Epoch: [0] [ 600/1250] eta: 0:05:44.049398 lr: 0.006014 loss: 0.3604 (0.7184) loss_classifier: 0.1228 (0.3740) loss_box_reg: 0.1714 (0.2674) loss_objectness: 0.0401 (0.0516) loss_rpn_box_reg: 0.0261 (0.0254) time: 0.5430 data: 0.0001 max mem: 3564
Epoch: [0] [ 650/1250] eta: 0:05:17.771165 lr: 0.006513 loss: 0.4614 (0.6996) loss_classifier: 0.2250 (0.3619) loss_box_reg: 0.1799 (0.2622) loss_objectness: 0.0104 (0.0501) loss_rpn_box_reg: 0.0460 (0.0254) time: 0.5383 data: 0.0001 max mem: 3564
Epoch: [0] [ 700/1250] eta: 0:04:51.137477 lr: 0.007013 loss: 0.8279 (0.6829) loss_classifier: 0.3304 (0.3515) loss_box_reg: 0.4181 (0.2582) loss_objectness: 0.0566 (0.0482) loss_rpn_box_reg: 0.0228 (0.0249) time: 0.5235 data: 0.0001 max mem: 3564
Epoch: [0] [ 750/1250] eta: 0:04:24.435647 lr: 0.007512 loss: 0.8482 (0.6704) loss_classifier: 0.3065 (0.3429) loss_box_reg: 0.5166 (0.2559) loss_objectness: 0.0114 (0.0467) loss_rpn_box_reg: 0.0138 (0.0249) time: 0.5250 data: 0.0001 max mem: 3564
Epoch: [0] [ 800/1250] eta: 0:03:58.256285 lr: 0.008012 loss: 0.3776 (0.6573) loss_classifier: 0.1871 (0.3344) loss_box_reg: 0.1464 (0.2527) loss_objectness: 0.0362 (0.0453) loss_rpn_box_reg: 0.0078 (0.0248) time: 0.5322 data: 0.0001 max mem: 3564
Epoch: [0] [ 850/1250] eta: 0:03:31.801975 lr: 0.008511 loss: 0.5743 (0.6512) loss_classifier: 0.1657 (0.3305) loss_box_reg: 0.3795 (0.2515) loss_objectness: 0.0139 (0.0444) loss_rpn_box_reg: 0.0152 (0.0248) time: 0.5123 data: 0.0001 max mem: 3564
Epoch: [0] [ 900/1250] eta: 0:03:05.681250 lr: 0.009011 loss: 0.3209 (0.6391) loss_classifier: 0.1504 (0.3230) loss_box_reg: 0.1290 (0.2475) loss_objectness: 0.0192 (0.0437) loss_rpn_box_reg: 0.0224 (0.0249) time: 0.5535 data: 0.0001 max mem: 3564
Epoch: [0] [ 950/1250] eta: 0:02:39.160406 lr: 0.009510 loss: 0.3390 (0.6315) loss_classifier: 0.1408 (0.3179) loss_box_reg: 0.0968 (0.2446) loss_objectness: 0.0551 (0.0437) loss_rpn_box_reg: 0.0463 (0.0252) time: 0.5313 data: 0.0001 max mem: 3564
Epoch: [0] [1000/1250] eta: 0:02:12.473630 lr: 0.010000 loss: 0.6111 (0.6244) loss_classifier: 0.2668 (0.3134) loss_box_reg: 0.2741 (0.2423) loss_objectness: 0.0424 (0.0434) loss_rpn_box_reg: 0.0279 (0.0253) time: 0.5298 data: 0.0001 max mem: 3564
Epoch: [0] [1050/1250] eta: 0:01:45.931352 lr: 0.010000 loss: 0.1799 (0.6206) loss_classifier: 0.0963 (0.3109) loss_box_reg: 0.0665 (0.2412) loss_objectness: 0.0159 (0.0431) loss_rpn_box_reg: 0.0012 (0.0254) time: 0.5216 data: 0.0001 max mem: 3564
Epoch: [0] [1100/1250] eta: 0:01:19.383105 lr: 0.010000 loss: 0.3410 (0.6164) loss_classifier: 0.1912 (0.3075) loss_box_reg: 0.1053 (0.2405) loss_objectness: 0.0146 (0.0428) loss_rpn_box_reg: 0.0299 (0.0257) time: 0.5140 data: 0.0001 max mem: 3564
Epoch: [0] [1150/1250] eta: 0:00:52.859734 lr: 0.010000 loss: 0.3323 (0.6092) loss_classifier: 0.1147 (0.3034) loss_box_reg: 0.1117 (0.2384) loss_objectness: 0.0517 (0.0419) loss_rpn_box_reg: 0.0541 (0.0255) time: 0.5126 data: 0.0001 max mem: 3564
Epoch: [0] [1200/1250] eta: 0:00:26.451852 lr: 0.010000 loss: 0.8478 (0.6050) loss_classifier: 0.3946 (0.3007) loss_box_reg: 0.3991 (0.2376) loss_objectness: 0.0318 (0.0414) loss_rpn_box_reg: 0.0224 (0.0254) time: 0.5384 data: 0.0001 max mem: 3564
Epoch: [0] [1249/1250] eta: 0:00:00.529085 lr: 0.010000 loss: 0.5046 (0.6003) loss_classifier: 0.2864 (0.2969) loss_box_reg: 0.1911 (0.2361) loss_objectness: 0.0120 (0.0417) loss_rpn_box_reg: 0.0152 (0.0256) time: 0.5260 data: 0.0001 max mem: 3564
Epoch: [0] Total time: 0:11:01 (0.5292 s / it)
creating index...
index created!
Test: [ 0/2510] eta: 0:09:30.997376 model_time: 0.1030 (0.1030) evaluator_time: 0.0103 (0.0103) time: 0.2275 data: 0.1136 max mem: 3564
Test: [ 100/2510] eta: 0:04:19.837475 model_time: 0.1070 (0.1019) evaluator_time: 0.0035 (0.0044) time: 0.1081 data: 0.0001 max mem: 3564
Test: [ 200/2510] eta: 0:04:08.381130 model_time: 0.0998 (0.1020) evaluator_time: 0.0105 (0.0045) time: 0.1107 data: 0.0001 max mem: 3564
Test: [ 300/2510] eta: 0:03:56.536819 model_time: 0.0983 (0.1018) evaluator_time: 0.0041 (0.0044) time: 0.1066 data: 0.0001 max mem: 3564
Test: [ 400/2510] eta: 0:03:46.088226 model_time: 0.0989 (0.1019) evaluator_time: 0.0042 (0.0046) time: 0.1066 data: 0.0001 max mem: 3564
Test: [ 500/2510] eta: 0:03:35.095542 model_time: 0.1007 (0.1018) evaluator_time: 0.0039 (0.0046) time: 0.1068 data: 0.0001 max mem: 3564
Test: [ 600/2510] eta: 0:03:24.057438 model_time: 0.0991 (0.1017) evaluator_time: 0.0059 (0.0045) time: 0.1056 data: 0.0001 max mem: 3564
Test: [ 700/2510] eta: 0:03:13.869030 model_time: 0.1081 (0.1021) evaluator_time: 0.0024 (0.0045) time: 0.1076 data: 0.0001 max mem: 3564
Test: [ 800/2510] eta: 0:03:03.341802 model_time: 0.0984 (0.1022) evaluator_time: 0.0061 (0.0045) time: 0.1050 data: 0.0001 max mem: 3564
Test: [ 900/2510] eta: 0:02:52.732500 model_time: 0.0994 (0.1022) evaluator_time: 0.0057 (0.0045) time: 0.1091 data: 0.0001 max mem: 3564
Test: [1000/2510] eta: 0:02:41.909927 model_time: 0.0996 (0.1021) evaluator_time: 0.0051 (0.0046) time: 0.1084 data: 0.0001 max mem: 3564
Test: [1100/2510] eta: 0:02:31.036624 model_time: 0.0991 (0.1020) evaluator_time: 0.0040 (0.0046) time: 0.1080 data: 0.0001 max mem: 3564
Test: [1200/2510] eta: 0:02:20.235565 model_time: 0.0989 (0.1020) evaluator_time: 0.0017 (0.0046) time: 0.1063 data: 0.0001 max mem: 3564
Test: [1300/2510] eta: 0:02:09.559061 model_time: 0.0985 (0.1020) evaluator_time: 0.0032 (0.0046) time: 0.1059 data: 0.0001 max mem: 3564
Test: [1400/2510] eta: 0:01:58.835945 model_time: 0.1080 (0.1020) evaluator_time: 0.0085 (0.0045) time: 0.1059 data: 0.0001 max mem: 3564
Test: [1500/2510] eta: 0:01:48.172328 model_time: 0.0986 (0.1020) evaluator_time: 0.0053 (0.0046) time: 0.1067 data: 0.0001 max mem: 3564
Test: [1600/2510] eta: 0:01:37.436264 model_time: 0.1076 (0.1020) evaluator_time: 0.0040 (0.0046) time: 0.1088 data: 0.0001 max mem: 3564
Test: [1700/2510] eta: 0:01:26.723859 model_time: 0.0989 (0.1020) evaluator_time: 0.0039 (0.0046) time: 0.1053 data: 0.0001 max mem: 3564
Test: [1800/2510] eta: 0:01:16.011011 model_time: 0.0987 (0.1020) evaluator_time: 0.0036 (0.0046) time: 0.1078 data: 0.0001 max mem: 3564
Test: [1900/2510] eta: 0:01:05.321071 model_time: 0.0878 (0.1021) evaluator_time: 0.0170 (0.0045) time: 0.1078 data: 0.0001 max mem: 3564
Test: [2000/2510] eta: 0:00:54.577022 model_time: 0.1124 (0.1020) evaluator_time: 0.0149 (0.0045) time: 0.1038 data: 0.0001 max mem: 3564
Test: [2100/2510] eta: 0:00:43.881791 model_time: 0.0977 (0.1020) evaluator_time: 0.0047 (0.0045) time: 0.1071 data: 0.0001 max mem: 3564
Test: [2200/2510] eta: 0:00:33.196065 model_time: 0.0995 (0.1021) evaluator_time: 0.0050 (0.0046) time: 0.1082 data: 0.0001 max mem: 3564
Test: [2300/2510] eta: 0:00:22.478021 model_time: 0.0994 (0.1020) evaluator_time: 0.0061 (0.0046) time: 0.1059 data: 0.0001 max mem: 3564
Test: [2400/2510] eta: 0:00:11.767563 model_time: 0.0984 (0.1020) evaluator_time: 0.0062 (0.0046) time: 0.1059 data: 0.0001 max mem: 3564
Test: [2500/2510] eta: 0:00:01.069602 model_time: 0.1081 (0.1019) evaluator_time: 0.0037 (0.0046) time: 0.1075 data: 0.0001 max mem: 3564
Test: [2509/2510] eta: 0:00:00.106963 model_time: 0.1083 (0.1019) evaluator_time: 0.0094 (0.0046) time: 0.1069 data: 0.0001 max mem: 3564
Test: Total time: 0:04:28 (0.1070 s / it)
Averaged stats: model_time: 0.1083 (0.1019) evaluator_time: 0.0094 (0.0046)
Accumulating evaluation results...
DONE (t=2.25s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.237
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.539
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.158
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.089
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.192
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.255
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.274
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.425
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.436
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.193
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.370
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.457
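The evaluation now fills values into the result arrays. The dimensions are: 10 IoU thresholds, 101 recall thresholds (0.00 to 1.00 in steps of 0.01), 20 classes, 4 area ranges (all, small, medium, large), and 3 maxDets settings (1, 10, 100). In pycocotools, precision is an array of shape [10, 101, 20, 4, 3]; recall has no recall-threshold axis, so its shape is [10, 20, 4, 3].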
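To make the "filling in" step concrete, here is a minimal sketch of how one cell of the precision array (one IoU threshold, one class, one area range, one maxDets value) could be computed. It assumes the default pycocotools grids (10 IoU thresholds, 101 recall thresholds) and the 20 VOC classes from the run above; the function names are illustrative and are not the pycocotools API.

```python
from bisect import bisect_left


def iou_thresholds():
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]   # 0.50 ... 0.95


def recall_thresholds():
    return [i / 100 for i in range(101)]                   # 0.00 ... 1.00


def precision_at_recall_grid(tp_flags, num_gt):
    """tp_flags: detections sorted by descending score, True = matched a GT."""
    if num_gt <= 0:
        raise ValueError("need at least one non-ignored ground truth")
    tp = fp = 0
    recalls, precisions = [], []
    for flag in tp_flags:
        if flag:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing as recall grows (interpolated precision).
    for i in range(len(precisions) - 1, 0, -1):
        if precisions[i] > precisions[i - 1]:
            precisions[i - 1] = precisions[i]
    grid = []
    for r in recall_thresholds():
        idx = bisect_left(recalls, r)  # first detection reaching recall >= r
        grid.append(precisions[idx] if idx < len(precisions) else 0.0)
    return grid


# Shape of the full precision array: [IoU, recall, class, area, maxDets].
precision_shape = (len(iou_thresholds()), len(recall_thresholds()), 20, 4, 3)

# One cell: three detections (hit, miss, hit) against two ground truths.
grid = precision_at_recall_grid([True, False, True], num_gt=2)
ap = sum(grid) / len(grid)
print(precision_shape, round(ap, 3))
```

In this toy case the first 51 recall points get precision 1.0 and the remaining 50 get 2/3, so the AP for this cell is about 0.835. The reported AP values are means over such cells across the chosen slices of the array.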
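The detailed per-image logic is in evaluateImg in coco_eval: it decides which detections match which ground truths and which ones are ignored, and the statistics are then computed from those match and ignore flags.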
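The match-and-ignore idea can be sketched as a greedy loop. This is a simplified illustration, not the actual evaluateImg code: it handles a single IoU threshold, assumes boxes in (x1, y1, x2, y2) format, and leaves out crowd annotations and area-range filtering. The names `box_iou` and `match_detections` are made up for this example.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(dets, gts, gt_ignore, iou_thr=0.5):
    """dets: list of (score, box); gts: list of boxes; gt_ignore: list of bools.

    Returns (score, matched_gt_index_or_None, ignored) per detection,
    in descending score order.
    """
    # Visit regular ground truths before ignored ones.
    gt_order = sorted(range(len(gts)), key=lambda g: gt_ignore[g])
    gt_taken = [False] * len(gts)
    results = []
    for score, box in sorted(dets, key=lambda d: -d[0]):
        best_iou, best_gt = iou_thr, None
        for g in gt_order:
            if gt_taken[g]:
                continue  # each ground truth is matched at most once here
            # Already matched a regular GT: do not trade it for an ignored one.
            if best_gt is not None and not gt_ignore[best_gt] and gt_ignore[g]:
                break
            iou = box_iou(box, gts[g])
            if iou < best_iou:
                continue
            best_iou, best_gt = iou, g
        if best_gt is not None:
            gt_taken[best_gt] = True
        # A detection matched to an ignored GT counts as neither TP nor FP.
        ignored = best_gt is not None and gt_ignore[best_gt]
        results.append((score, best_gt, ignored))
    return results


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
gt_ignore = [False, True]
dets = [
    (0.9, (0, 0, 10, 10)),    # matches gt 0: true positive
    (0.8, (1, 1, 10, 10)),    # gt 0 already taken: false positive
    (0.7, (20, 20, 30, 30)),  # matches the ignored gt 1: ignored
    (0.6, (50, 50, 60, 60)),  # overlaps nothing: false positive
]
results = match_detections(dets, gts, gt_ignore)
for r in results:
    print(r)
```

Detections that are matched and not ignored are the true positives; unmatched, non-ignored detections are the false positives. Ignored detections are dropped before the precision and recall curves are accumulated.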