Detectron benchmark 12

By Facebook Research. Compiled by Flin. Source: GitHub

Benchmarking

Here we benchmark the training speed of a Mask R-CNN in Detectron2 against some other popular open source Mask R-CNN implementations.

Setup

Hardware: 8 NVIDIA V100s with NVLink.
Software: Python 3.7, CUDA 10.0, cuDNN 7.6.4, PyTorch 1.3.0 (link: https://download.pytorch.org/whl/nightly/cu100/torch-1.3.0%2Bcu100-cp37-cp37m-linux_x86_64.whl), TensorFlow 1.15.0rc2, Keras 2.2.5, MXNet 1.6.0b20190820.
Model: an end-to-end R-50-FPN Mask R-CNN model, using the same hyperparameters as the Detectron baseline config (https://github.com/facebookresearch/Detectron/blob/master/configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml).
Metric: we use the average throughput over iterations 100-500, which skips the GPU warm-up time. Note that for R-CNN-style models, throughput typically changes during training because it depends on the model's predictions. This metric is therefore not directly comparable with the "train speed" in the model zoo, which is the average speed over the entire training run.
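
As a rough illustration of the metric described above, the sketch below averages throughput over a window of iterations after warm-up. It is not taken from any of the benchmarked code bases: the function name, the half-open window [100, 500), the batch size of 16 and the timings are all assumptions made for the example.

```python
from typing import Sequence


def average_throughput(
    iter_seconds: Sequence[float],
    images_per_iter: int,
    start: int = 100,
    end: int = 500,
) -> float:
    """Return images per second averaged over iterations [start, end).

    iter_seconds[i] is the wall-clock time of iteration i. Whether the
    original benchmark includes iteration 500 is an assumption here.
    """
    window = iter_seconds[start:end]
    if not window:
        raise ValueError("not enough iterations recorded for the requested window")
    total_images = images_per_iter * len(window)
    return total_images / sum(window)


# Hypothetical timings: 100 slow warm-up iterations, then a steady state.
timings = [1.0] * 100 + [0.25] * 400
print(average_throughput(timings, images_per_iter=16))
```

Because the first 100 iterations are excluded, the slow warm-up steps do not pull the average down.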

Main results

<table>
<tr> <th>Implementation</th><th>Throughput (img/s)</th> </tr>
<tr> <td>Detectron2</td><td>59</td> </tr>
<tr> <td>maskrcnn-benchmark</td><td>51</td> </tr>
<tr> <td>tensorpack</td><td>50</td> </tr>
<tr> <td>mmdetection</td><td>41</td> </tr>
<tr> <td>simpledet</td><td>39</td> </tr>
<tr> <td>Detectron</td><td>19</td> </tr>
<tr> <td>matterport/Mask_RCNN</td><td>14</td> </tr>
</table>

Links for each implementation:

Detectron2: https://github.com/facebookresearch/detectron2/
maskrcnn-benchmark: https://github.com/facebookresearch/maskrcnn-benchmark/
tensorpack: https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN
mmdetection: https://github.com/open-mmlab/mmdetection/
simpledet: https://github.com/TuSimple/simpledet/
Detectron: https://github.com/facebookresearch/Detectron
matterport/Mask_RCNN: https://github.com/matterport/Mask_RCNN/
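
To put the main results in relative terms, the sketch below divides Detectron2's throughput by that of each implementation. The numbers are copied from the results table; the helper name `speedup_over` is only illustrative and not part of any of these libraries.

```python
from typing import Dict

# Throughput in images per second, as listed in the main results table.
THROUGHPUT = {
    "Detectron2": 59,
    "maskrcnn-benchmark": 51,
    "tensorpack": 50,
    "mmdetection": 41,
    "simpledet": 39,
    "Detectron": 19,
    "matterport/Mask_RCNN": 14,
}


def speedup_over(reference: str, table: Dict[str, float]) -> Dict[str, float]:
    """Return how many times faster `reference` is than each entry."""
    ref = table[reference]
    return {name: round(ref / value, 2) for name, value in table.items()}


speedups = speedup_over("Detectron2", THROUGHPUT)
for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```

These ratios describe only this hardware and software setup and should not be read as general performance claims.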

Details of each implementation:

Detectron2:

python tools/train_net.py  --config-file configs/Detectron1-Comparisons/mask_rcnn_R_50_FPN_noaug_1x.yaml --num-gpus 8

maskrcnn-benchmark: use commit 0ce8f6f with `sed -i 's/torch.uint8/torch.bool/g' **/*.py` to make it compatible with the latest PyTorch. Then, run
```
python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml
```
We observed that this is faster than the speed reported in its model zoo, likely due to different software versions.
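
For readers who cannot use the sed one-liner above (for example on a platform without GNU sed), a minimal Python sketch of the same substitution might look like the following. The function name `replace_in_tree` is an assumption for this example. It matches the literal text `torch.uint8`, whereas in the sed pattern the dot matches any character, and it always recurses into subdirectories, which `**/*.py` does only in shells with recursive globbing enabled.

```python
from pathlib import Path


def replace_in_tree(root: str, old: str, new: str, pattern: str = "*.py") -> int:
    """Replace every literal occurrence of `old` with `new` under `root`.

    Only files matching `pattern` are touched. Returns the number of
    files that were actually modified.
    """
    changed = 0
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        updated = text.replace(old, new)
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed


# Example call, to be run from the root of the maskrcnn-benchmark checkout:
# replace_in_tree(".", "torch.uint8", "torch.bool")
```

The call is left commented out because it rewrites files in place; running it on a clean checkout makes the change easy to review with `git diff`.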

tensorpack: at commit caafda, `export TF_CUDNN_USE_AUTOTUNE=0`, then run

mpirun -np 8 ./train.py --config DATA.BASEDIR=/data/coco TRAINER=horovod BACKBONE.STRIDE_1X1=True TRAIN.STEPS_PER_EPOCH=50 --load ImageNet-R50-AlignPadding.npz

mmdetection: at commit 4d9a5f, apply the following diff, then run

./tools/dist_train.sh configs/mask_rcnn_r50_fpn_1x.py 8

We observed that this is faster than the speed reported in its model zoo, likely due to different software versions.

<details> <summary> (diff to make it use the same hyperparameters - click to expand) </summary>

diff --git i/configs/mask_rcnn_r50_fpn_1x.py w/configs/mask_rcnn_r50_fpn_1x.py
index 04f6d22..ed721f2 100644
--- i/configs/mask_rcnn_r50_fpn_1x.py
+++ w/configs/mask_rcnn_r50_fpn_1x.py
@@ -1,14 +1,15 @@
# model settings
model = dict(
  type='MaskRCNN',
-    pretrained='torchvision://resnet50',
+    pretrained='open-mmlab://resnet50_caffe',
  backbone=dict(
    type='ResNet',
    depth=50,
    num_stages=4,
    out_indices=(0, 1, 2, 3),
    frozen_stages=1,
-        style='pytorch'),
+        norm_cfg=dict(type="BN", requires_grad=False),
+        style='caffe'),
  neck=dict(
    type='FPN',
    in_channels=[256, 512, 1024, 2048],
@@ -115,7 +116,7 @@ test_cfg = dict(
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
-    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+    mean=[123.675, 116.28, 103.53], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
  dict(type='LoadImageFromFile'),
  dict(type='LoadAnnotations', with_bbox=True, with_mask=True),

</details>
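
To make the `img_norm_cfg` change in the mmdetection diff above more concrete, here is a small sketch of per-pixel normalization. It assumes that the pipeline optionally reverses the channel order when `to_rgb` is set and then computes `(value - mean) / std` per channel; this is an illustration of that assumed behaviour, not mmdetection's actual code, and `normalize_pixel` is a hypothetical helper.

```python
from typing import List, Sequence


def normalize_pixel(
    pixel: Sequence[float],
    mean: Sequence[float],
    std: Sequence[float],
    to_rgb: bool,
) -> List[float]:
    """Normalize one 3-channel pixel.

    If to_rgb is True, the channel order is reversed first (assuming the
    image was loaded as BGR), then each channel is shifted by its mean
    and divided by its std.
    """
    channels = list(reversed(pixel)) if to_rgb else list(pixel)
    return [(c - m) / s for c, m, s in zip(channels, mean, std)]


mean = [123.675, 116.28, 103.53]

# Settings before the diff: scale by per-channel std and convert to RGB.
before = normalize_pixel([100.0, 120.0, 140.0], mean, [58.395, 57.12, 57.375], to_rgb=True)

# Settings after the diff: std of 1.0 means only the mean is subtracted,
# and the channel order is left untouched.
after = normalize_pixel([100.0, 120.0, 140.0], mean, [1.0, 1.0, 1.0], to_rgb=False)

print(before)
print(after)
```

With a std of 1.0 the inputs keep their 0-255 scale, which matches the caffe-style backbone the diff switches to.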

SimpleDet: at commit 9187a1, run

python detection_train.py --config config/mask_r50v1_fpn_1x.py

Detectron: run
```
python tools/train_net.py --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml
```
Note that many of its ops run on CPUs, so performance is limited.

matterport/Mask_RCNN: at commit 3deaec, apply the following diff, `export TF_CUDNN_USE_AUTOTUNE=0`, then run
```
python coco.py train --dataset=/data/coco/ --model=imagenet
```

Note that many small details in this implementation may differ from Detectron's standards.

<details> <summary> (diff to make it use the same hyperparameters - click to expand) </summary>

diff --git i/mrcnn/model.py w/mrcnn/model.py
index 62cb2b0..61d7779 100644
--- i/mrcnn/model.py
+++ w/mrcnn/model.py
@@ -2367,8 +2367,8 @@ class MaskRCNN():
      epochs=epochs,
      steps_per_epoch=self.config.STEPS_PER_EPOCH,
      callbacks=callbacks,
-            validation_data=val_generator,
-            validation_steps=self.config.VALIDATION_STEPS,
+            #validation_data=val_generator,
+            #validation_steps=self.config.VALIDATION_STEPS,
      max_queue_size=100,
      workers=workers,
      use_multiprocessing=True,
diff --git i/mrcnn/parallel_model.py w/mrcnn/parallel_model.py
index d2bf53b..060172a 100644
--- i/mrcnn/parallel_model.py
+++ w/mrcnn/parallel_model.py
@@ -32,6 +32,7 @@ class ParallelModel(KM.Model):
    keras_model: The Keras model to parallelize
    gpu_count: Number of GPUs. Must be > 1
    """
+        super().__init__()
    self.inner_model = keras_model
    self.gpu_count = gpu_count
    merged_outputs = self.make_parallel()
diff --git i/samples/coco/coco.py w/samples/coco/coco.py
index 5d172b5..239ed75 100644
--- i/samples/coco/coco.py
+++ w/samples/coco/coco.py
@@ -81,7 +81,10 @@ class CocoConfig(Config):
  IMAGES_PER_GPU = 2

  # Uncomment to train on 8 GPUs (default is 1)
-    # GPU_COUNT = 8
+    GPU_COUNT = 8
+    BACKBONE = "resnet50"
+    STEPS_PER_EPOCH = 50
+    TRAIN_ROIS_PER_IMAGE = 512

  # Number of classes (including background)
  NUM_CLASSES = 1 + 80  # COCO has 80 classes
@@ -496,29 +499,10 @@ if __name__ == '__main__':
    # *** This training schedule is an example. Update to your needs ***

    # Training - Stage 1
-        print("Training network heads")
    model.train(dataset_train, dataset_val,
          learning_rate=config.LEARNING_RATE,
          epochs=40,
-                    layers='heads',
-                    augmentation=augmentation)
-
-        # Training - Stage 2
-        # Finetune layers from ResNet stage 4 and up
-        print("Fine tune Resnet stage 4 and up")
-        model.train(dataset_train, dataset_val,
-                    learning_rate=config.LEARNING_RATE,
-                    epochs=120,
-                    layers='4+',
-                    augmentation=augmentation)
-
-        # Training - Stage 3
-        # Fine tune all layers
-        print("Fine tune all layers")
-        model.train(dataset_train, dataset_val,
-                    learning_rate=config.LEARNING_RATE / 10,
-                    epochs=160,
-                    layers='all',
+                    layers='3+',
          augmentation=augmentation)

  elif args.command == "evaluate":

</details>

Original link: https://detectron2.readthedocs.io/notes/benchmarks.html


Posted by johnkelly on Tue, 02 Jun 2020 21:27:10 -0700
