Author | facebookresearch
Translator | Flin
Source | GitHub

Benchmarks

Here we benchmark the training speed of Mask R-CNN in Detectron2 against several other popular open-source Mask R-CNN implementations.

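Settings

Hardware: 8 NVIDIA V100s with NVLink.
Software: Python 3.7, CUDA 10.0, cuDNN 7.6.4, PyTorch 1.3.0 (link: https://download.pytorch.org/...), TensorFlow 1.15.0rc2, Keras 2.2.5, MxNet 1.6.0b20190820.
Model: an end-to-end R-50-FPN Mask-RCNN model, using the same hyperparameters as the Detectron baseline config (https://github.com/facebookre...).
Metrics: we use the average throughput over iterations 100-500, which skips GPU warmup time. Note that for R-CNN-style models the throughput typically changes during training, because it depends on the model's predictions. This metric is therefore not directly comparable with the "train speed" in the model zoo, which is the average speed over the entire training run.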
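
The metric described under "Metrics" can be sketched in a few lines. This is a minimal illustration, not the script used for the benchmark: the function name `average_throughput`, the list of per-iteration timings, and the batch size of 16 images per iteration are all assumptions made for the example.

```python
from typing import Sequence


def average_throughput(
    iter_times: Sequence[float],
    images_per_iter: int,
    start: int = 100,
    end: int = 500,
) -> float:
    """Return images/second averaged over iterations [start, end).

    iter_times[i] is the wall-clock duration (in seconds) of iteration i.
    Iterations before `start` are ignored so that GPU warmup does not
    distort the average.
    """
    window = iter_times[start:end]
    if not window:
        raise ValueError("no iterations fall inside the measurement window")
    return images_per_iter * len(window) / sum(window)


# Hypothetical timings: 100 slow warmup iterations, then 400 steady ones.
times = [1.0] * 100 + [0.25] * 400
print(average_throughput(times, images_per_iter=16))  # 64.0
```

Including the warmup iterations (`start=0`) would pull the figure down to 40.0 for the same made-up timings, which is why the window starts at iteration 100.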

Main Results

| Implementation | Throughput (img/s) |
| --- | --- |
| Detectron2 | 59 |
| maskrcnn-benchmark | 51 |
| tensorpack | 50 |
| mmdetection | 41 |
| simpledet | 39 |
| Detectron | 19 |
| matterport/Mask_RCNN | 14 |
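
To read the table as relative speedups, the throughputs can be divided by a chosen baseline. The snippet below is only a convenience sketch; the numbers are copied from the table above, and the helper name `relative_speed` is made up for this example.

```python
# Throughput in img/s, copied from the table above.
throughput = {
    "Detectron2": 59,
    "maskrcnn-benchmark": 51,
    "tensorpack": 50,
    "mmdetection": 41,
    "simpledet": 39,
    "Detectron": 19,
    "matterport/Mask_RCNN": 14,
}


def relative_speed(results, baseline):
    """Return each implementation's throughput as a multiple of `baseline`."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


ratios = relative_speed(throughput, baseline="Detectron")
for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name:<22}{ratio:.2f}x")
```

With these figures, Detectron2 comes out at roughly 3.1 times the throughput of the original Detectron and about 16% above maskrcnn-benchmark.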

Links to each implementation:

Detectron2: https://github.com/facebookre...
maskrcnn-benchmark: https://github.com/facebookre...
tensorpack: https://github.com/tensorpack...
mmdetection: https://github.com/open-mmlab...
simpledet: https://github.com/TuSimple/s...
Detectron: https://github.com/facebookre...
matterport/Mask_RCNN: https://github.com/matterport...

Details for each implementation:

__Detectron2__:

python tools/train_net.py  --config-file configs/Detectron1-Comparisons/mask_rcnn_R_50_FPN_noaug_1x.yaml --num-gpus 8

__maskrcnn-benchmark__: use commit `0ce8f6f` with `sed -i 's/torch.uint8/torch.bool/g' **/*.py` to make it compatible with the latest PyTorch. Then run training with
```
python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml
```
The speed we observed is faster than the one reported in its model zoo, likely due to different software versions.
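
The `**/*.py` pattern in the `sed` command only recurses into subdirectories when the shell supports recursive globbing (for example bash with `globstar` enabled). As a hedged alternative, the same substitution can be sketched in Python. The function name `replace_in_tree` and its parameters are invented for this example and are not part of maskrcnn-benchmark.

```python
from pathlib import Path


def replace_in_tree(root: str, old: str = "torch.uint8", new: str = "torch.bool") -> int:
    """Replace `old` with `new` in every .py file under `root`.

    The match is literal (unlike the sed pattern, where `.` matches any
    character). Returns the number of files that were modified.
    """
    changed = 0
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed += 1
    return changed
```

Calling `replace_in_tree(".")` from the root of the checked-out repository would rewrite the files in place, so it is worth running on a clean working tree where the result can be reviewed with `git diff`.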

__tensorpack__: at commit `caafda`, `export TF_CUDNN_USE_AUTOTUNE=0`, then run

mpirun -np 8 ./train.py --config DATA.BASEDIR=/data/coco TRAINER=horovod BACKBONE.STRIDE_1X1=True TRAIN.STEPS_PER_EPOCH=50 --load ImageNet-R50-AlignPadding.npz

__mmdetection__: at commit `4d9a5f`, apply the following diff, then run

./tools/dist_train.sh configs/mask_rcnn_r50_fpn_1x.py 8

The speed we observed is faster than the one reported in its model zoo, likely due to different software versions.

Diff that makes it use the same hyperparameters:

```
diff --git i/configs/mask_rcnn_r50_fpn_1x.py w/configs/mask_rcnn_r50_fpn_1x.py
index 04f6d22..ed721f2 100644
--- i/configs/mask_rcnn_r50_fpn_1x.py
+++ w/configs/mask_rcnn_r50_fpn_1x.py
@@ -1,14 +1,15 @@
 # model settings
 model = dict(
     type='MaskRCNN',
-    pretrained='torchvision://resnet50',
+    pretrained='open-mmlab://resnet50_caffe',
     backbone=dict(
         type='ResNet',
         depth=50,
         num_stages=4,
         out_indices=(0, 1, 2, 3),
         frozen_stages=1,
-        style='pytorch'),
+        norm_cfg=dict(type="BN", requires_grad=False),
+        style='caffe'),
     neck=dict(
         type='FPN',
@@ -115,7 +116,7 @@ test_cfg = dict(
 dataset_type = 'CocoDataset'
 data_root = 'data/coco/'
 img_norm_cfg = dict(
-    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+    mean=[123.675, 116.28, 103.53], std=[1.0, 1.0, 1.0], to_rgb=False)
 train_pipeline = [
     dict(type='LoadImageFromFile'),
     dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
```

__SimpleDet__: at commit `9187a1`, run

python detection_train.py --config config/mask_r50v1_fpn_1x.py


__Detectron__: run

python tools/train_net.py --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml

Note that many of its operations run on the CPU, so its performance is limited.

__matterport/Mask_RCNN__: at commit `3deaec`, apply the following diff, `export TF_CUDNN_USE_AUTOTUNE=0`, then run

python coco.py train --dataset=/data/coco/ --model=imagenet

Note that many small details in this implementation might differ from Detectron's standards.

Diff that makes it use the same hyperparameters:

```
diff --git i/mrcnn/model.py w/mrcnn/model.py
index 62cb2b0..61d7779 100644
--- i/mrcnn/model.py
+++ w/mrcnn/model.py
@@ -2367,8 +2367,8 @@ class MaskRCNN():
             epochs=epochs,
             steps_per_epoch=self.config.STEPS_PER_EPOCH,
             callbacks=callbacks,
-            validation_data=val_generator,
-            validation_steps=self.config.VALIDATION_STEPS,
+            #validation_data=val_generator,
+            #validation_steps=self.config.VALIDATION_STEPS,
             max_queue_size=100,
             workers=workers,
             use_multiprocessing=True,
diff --git i/mrcnn/parallel_model.py w/mrcnn/parallel_model.py
index d2bf53b..060172a 100644
--- i/mrcnn/parallel_model.py
+++ w/mrcnn/parallel_model.py
@@ -32,6 +32,7 @@ class ParallelModel(KM.Model):
         keras_model: The Keras model to parallelize
         gpu_count: Number of GPUs. Must be > 1
         """
+        super().__init__()
         self.inner_model = keras_model
         self.gpu_count = gpu_count
         merged_outputs = self.make_parallel()
diff --git i/samples/coco/coco.py w/samples/coco/coco.py
index 5d172b5..239ed75 100644
--- i/samples/coco/coco.py
+++ w/samples/coco/coco.py
@@ -81,7 +81,10 @@ class CocoConfig(Config):
     IMAGES_PER_GPU = 2
 
     # Uncomment to train on 8 GPUs (default is 1)
-    # GPU_COUNT = 8
+    GPU_COUNT = 8
+    BACKBONE = "resnet50"
+    STEPS_PER_EPOCH = 50
+    TRAIN_ROIS_PER_IMAGE = 512
 
     # Number of classes (including background)
     NUM_CLASSES = 1 + 80  # COCO has 80 classes
@@ -496,29 +499,10 @@ if __name__ == '__main__':
         # *** This training schedule is an example. Update to your needs ***
 
         # Training - Stage 1
-        print("Training network heads")
         model.train(dataset_train, dataset_val,
                     learning_rate=config.LEARNING_RATE,
                     epochs=40,
-                    layers='heads',
-                    augmentation=augmentation)
-
-        # Training - Stage 2
-        # Finetune layers from ResNet stage 4 and up
-        print("Fine tune Resnet stage 4 and up")
-        model.train(dataset_train, dataset_val,
-                    learning_rate=config.LEARNING_RATE,
-                    epochs=120,
-                    layers='4+',
-                    augmentation=augmentation)
-
-        # Training - Stage 3
-        # Fine tune all layers
-        print("Fine tune all layers")
-        model.train(dataset_train, dataset_val,
-                    learning_rate=config.LEARNING_RATE / 10,
-                    epochs=160,
-                    layers='all',
+                    layers='3+',
                     augmentation=augmentation)
 
     elif args.command == "evaluate":
```

Original article: https://detectron2.readthedocs.io/notes/benchmarks.html


