mmrotate

Reposted: mmrotate

1. Environment setup

1.1 Installing in a virtual environment

Create the virtual environment:

conda create -n open-mmlab python=3.7

Install PyTorch (including torchvision and cudatoolkit):

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda install pytorch==1.7.0 cudatoolkit=10.1 torchvision==0.8.0

Install openmim:

pip install openmim

Install mmcv-full, mmdet, and mmrotate through openmim:

mim install mmcv-full
mim install mmdet
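# mmrotate is a special case: try `mim install mmrotate` first; if that fails, install it manually.
# Download mmrotate from the official site, or clone it:
git clone https://github.com/open-mmlab/mmrotate.git
cd mmrotate
pip install -r requirements/build.txt
pip install -v -e .  # may need C++ build tools; installing the Visual Studio build tools is enough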

1.2 Installing with Docker

Build an image from the Dockerfile (an alternative to the method above):

# build an image with PyTorch 1.6, CUDA 10.1
docker build -t mmrotate docker/
# start a container from the image
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmrotate/data mmrotate

2. Verifying the installation

# cd mmrotate 
python demo/image_demo.py \
        demo/demo.jpg \
        work_dirs1/oriented_rcnn_r50_fpn_1x_dota_v3/oriented_rcnn_r50_fpn_1x_dota_v3.py \
        work_dirs2/oriented_rcnn_r50_fpn_1x_dota_v3/epoch_12.pth
# Note: work_dirs1 is the directory holding your config, and work_dirs2 is the directory holding your weights.

3. Building your own dataset

3.1 Dataset download links

mmrotate currently ships dataset support only for DOTA, SSDD, HRSC, and HRSID. For these you only need to change the image and label path (data_root) in the corresponding config. Any other dataset has to be built yourself.

DOTA: https://captain-whu.github.io/DOTA/dataset.html
SSDD: https://pan.baidu.com/s/1_uezALB6eZ7DiPIozFoGJQ (password: 0518)
HRSC: https://aistudio.baidu.com/aistudio/datasetdetail/54106
HRSID: https://pan.baidu.com/share/init?surl=vks9fj64Bb06U170GNL7mw (password: 0518)

3.2 Dataset directory layout

DOTA directory layout

# DOTA                         
mmrotate
├── mmrotate
├── tools
├── configs
├── data
│   ├── DOTA
│   │   ├── train
│   │   ├── val
│   │   ├── test

SSDD directory layout

# ssdd
mmrotate
├── mmrotate
├── tools
├── configs
├── data
│   ├── ssdd
│   │   ├── train
│   │   ├── test

HRSC directory layout

mmrotate
├── mmrotate
├── tools
├── configs
├── data
│   ├── hrsc
│   │   ├── FullDataSet
│   │   │   ├─ AllImages
│   │   │   ├─ Annotations
│   │   │   ├─ LandMask
│   │   │   ├─ Segmentations
│   │   ├── ImageSets

HRSID directory layout

mmrotate
├── mmrotate
├── tools
├── configs
├── data
│   ├── hrsid
│   │   ├── trainsplit
│   │   ├── valsplit
│   │   ├── testsplit

3.3 Corresponding config changes

Change data_root in configs/_base_/datasets/dotav1.py to the path of the split DOTA dataset.
Change data_root in configs/_base_/datasets/ssdd.py to data/ssdd/.
Change data_root in configs/_base_/datasets/hrsc.py to data/hrsc/.
Change data_root in configs/_base_/datasets/hrsid.py to data/hrsid/.

3.4 Splitting the DOTA dataset

Crop the original images into 1024×1024 patches with an overlap of 200 by running:

python tools/data/dota/split/img_split.py --base-json \
  tools/data/dota/split/split_configs/ss_trainval.json

python tools/data/dota/split/img_split.py --base-json \
  tools/data/dota/split/split_configs/ss_test.json

To build a multi-scale dataset, run the following commands:

python tools/data/dota/split/img_split.py --base-json \
  tools/data/dota/split/split_configs/ms_trainval.json

python tools/data/dota/split/img_split.py --base-json \
  tools/data/dota/split/split_configs/ms_test.json

Change img_dirs and ann_dirs in the JSON config files to point at your data before running these commands.
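
To make the 1024/200 numbers concrete, here is a minimal sketch of how window positions can be laid out along one image axis. The function name and the rule that the last window is shifted back to sit flush with the image edge are assumptions for illustration; this is not the actual logic of img_split.py.

```python
def window_starts(length, size=1024, overlap=200):
    """Start offsets of sliding windows that cover the range [0, length)."""
    if length <= size:
        # The image is no larger than one window: a single patch is enough.
        return [0]
    step = size - overlap  # 824 pixels for the 1024/200 setting
    starts = list(range(0, length - size, step))
    # Assumed edge rule: place the final window flush with the image border.
    starts.append(length - size)
    return starts


print(window_starts(2000))
```

With a 2000-pixel side, neighbouring windows overlap by at least 200 pixels, and the last window overlaps more because it is pulled back to the border.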

4. Testing a model

There are three cases: single GPU, single node with multiple GPUs, and multiple nodes.

# single-gpu
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]

# multi-gpu
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [optional arguments]

# multi-node in slurm environment
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] --launcher slurm

Example: using RotatedRetinaNet on the DOTA-1.0 dataset, the commands below generate a compressed file for online submission. (Change data_root first.)

# single GPU
python ./tools/test.py  \
  configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le90.py \
  checkpoints/SOME_CHECKPOINT.pth --format-only \
  --eval-options submission_dir=work_dirs/Task1_results
# single node via the multi-GPU launcher, with the GPU count set to 1
./tools/dist_test.sh  \
  configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le90.py \
  checkpoints/SOME_CHECKPOINT.pth 1 --format-only \
  --eval-options submission_dir=work_dirs/Task1_results

You can change the test set path in data_root to the val set or the trainval set for offline evaluation.

# single GPU
python ./tools/test.py \
  configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le90.py \
  checkpoints/SOME_CHECKPOINT.pth --eval mAP
# single node via the multi-GPU launcher, with the GPU count set to 1
./tools/dist_test.sh  \
  configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le90.py \
  checkpoints/SOME_CHECKPOINT.pth 1 --eval mAP

Visualize the results:

python ./tools/test.py \
  configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le90.py \
  checkpoints/SOME_CHECKPOINT.pth \
  --show-dir work_dirs/vis

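5. Training a model

There are five ways to train a model: on a single GPU, on multiple GPUs, on multiple machines, with Slurm job management, and as multiple jobs on one machine.

1. Training on a single GPU

# single GPU; to set the working directory from the command line, add --work-dir ${YOUR_WORK_DIR}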
python tools/train.py ${CONFIG_FILE} [optional arguments]

2. Training on multiple GPUs

# single node, multiple GPUs
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]

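Optional arguments:
--no-validate (not recommended): by default the codebase runs evaluation during training. Use --no-validate to disable it.
--work-dir ${WORK_DIR}: override the working directory specified in the config file.
--resume-from ${CHECKPOINT_FILE}: resume from a previous checkpoint file.
Difference between resume-from and load-from: resume-from loads both the model weights and the optimizer state, and the epoch is also inherited from the given checkpoint. It is typically used to resume training that was interrupted unexpectedly. load-from loads only the model weights, and training starts from epoch 0. It is typically used for fine-tuning.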
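
The two loading modes can be illustrated with a toy checkpoint. This is only a sketch: the dict keys ('state_dict', 'optimizer', 'epoch') and the function name are hypothetical and stand in for whatever the real checkpoint file contains.

```python
import copy


def start_state(checkpoint, mode):
    """Return (weights, optimizer_state, start_epoch) for a new training run.

    `mode` is 'resume' (continue an interrupted run) or 'load' (fine-tune).
    """
    weights = copy.deepcopy(checkpoint['state_dict'])
    if mode == 'resume':
        # Resume: weights, optimizer state, and the epoch counter all carry over.
        return weights, copy.deepcopy(checkpoint['optimizer']), checkpoint['epoch']
    if mode == 'load':
        # Load: only the weights are reused; the optimizer starts fresh at epoch 0.
        return weights, None, 0
    raise ValueError(f'unknown mode: {mode!r}')


toy_checkpoint = {
    'state_dict': {'conv.weight': [0.1, 0.2]},
    'optimizer': {'momentum_buffer': [0.01, 0.02]},
    'epoch': 7,
}
print(start_state(toy_checkpoint, 'resume')[2])
print(start_state(toy_checkpoint, 'load')[2])
```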

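3. Training on multiple machines

If the machines are connected only by Ethernet, you can simply run the following commands:
# on the first machine:
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
# on the second machine:
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
Note: without a high-speed interconnect such as InfiniBand, this is usually slow.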

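4. Launching multiple jobs on one machine

If you launch multiple jobs on a single machine, e.g. two 4-GPU training jobs on a machine with 8 GPUs, you need to give each job a different port (29500 by default) to avoid communication conflicts.

(1) If you launch training jobs with dist_train.sh, you can set the port in the command.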

CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
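
When several jobs share a machine, the command lines can also be generated so that no two jobs get the same port. The helper below is a hypothetical sketch (its name and arguments are not part of mmrotate); it only builds strings in the same shape as the dist_train.sh commands above.

```python
def job_commands(config, gpu_groups, base_port=29500):
    """Build one dist_train.sh command per GPU group, each with its own port."""
    commands = []
    for offset, gpus in enumerate(gpu_groups):
        devices = ','.join(str(gpu) for gpu in gpus)
        # Each job takes the next port after base_port to avoid conflicts.
        commands.append(
            f'CUDA_VISIBLE_DEVICES={devices} PORT={base_port + offset} '
            f'./tools/dist_train.sh {config} {len(gpus)}'
        )
    return commands


for command in job_commands('my_config.py', [[0, 1, 2, 3], [4, 5, 6, 7]]):
    print(command)
```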

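(2) If you launch training jobs with Slurm, you need to modify the config files (usually the sixth line from the bottom) to set different communication ports.

# in config1.py:
dist_params = dict(backend='nccl', port=29500)
# in config2.py:
dist_params = dict(backend='nccl', port=29501)
# launch the two jobs: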
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR}
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}

5. Benchmark and model zoo

5.1 Benchmark and model zoo

  • Rotated RetinaNet-OBB/HBB (ICCV’2017)
  • Rotated FasterRCNN-OBB (TPAMI’2017)
  • Rotated RepPoints-OBB (ICCV’2019)
  • RoI Transformer (CVPR’2019)
  • Gliding Vertex (TPAMI’2020)
  • Rotated ATSS-OBB (CVPR’2020)
  • CSL (ECCV’2020)
  • R3Det (AAAI’2021)
  • S2A-Net (TGRS’2021)
  • ReDet (CVPR’2021)
  • Beyond Bounding-Box (CVPR’2021)
  • Oriented R-CNN (ICCV’2021)
  • GWD (ICML’2021)
  • KLD (NeurIPS’2021)
  • SASM (AAAI’2022)
  • KFIoU (arXiv)
  • G-Rep (stay tuned)

5.2 Results on DOTA v1.0

| Backbone | mAP | Angle | lr schd | Mem (GB) | Inf Time (fps) | Aug | Batch Size | Configs | Download |
|---|---|---|---|---|---|---|---|---|---|
| ResNet50 (1024,1024,200) | 59.44 | oc | 1x | 3.45 | 15.6 | - | 2 | rotated_reppoints_r50_fpn_1x_dota_oc | model / log |
| ResNet50 (1024,1024,200) | 64.55 | oc | 1x | 3.38 | 15.7 | - | 2 | rotated_retinanet_hbb_r50_fpn_1x_dota_oc | model / log |
| ResNet50 (1024,1024,200) | 65.59 | oc | 1x | 3.12 | 18.5 | - | 2 | rotated_atss_hbb_r50_fpn_1x_dota_oc | model / log |
| ResNet50 (1024,1024,200) | 66.45 | oc | 1x | 3.53 | 15.3 | - | 2 | sasm_reppoints_r50_fpn_1x_dota_oc | model / log |
| ResNet50 (1024,1024,200) | 68.42 | le90 | 1x | 3.38 | 16.9 | - | 2 | rotated_retinanet_obb_r50_fpn_1x_dota_le90 | model / log |
| ResNet50 (1024,1024,200) | 68.79 | le90 | 1x | 2.36 | 22.4 | - | 2 | rotated_retinanet_obb_r50_fpn_fp16_1x_dota_le90 | model / log |
| ResNet50 (1024,1024,200) | 69.49 | le135 | 1x | 4.05 | 8.6 | - | 2 | g_reppoints_r50_fpn_1x_dota_le135 | model / log |
| ResNet50 (1024,1024,200) | 69.51 | le90 | 1x | 4.40 | 24.0 | - | 2 | rotated_retinanet_obb_csl_gaussian_r50_fpn_fp16_1x_dota_le90 | model / log |
| ResNet50 (1024,1024,200) | 69.55 | oc | 1x | 3.39 | 15.5 | - | 2 | rotated_retinanet_hbb_gwd_r50_fpn_1x_dota_oc | model / log |
| ResNet50 (1024,1024,200) | 69.60 | le90 | 1x | 3.38 | 15.1 | - | 2 | rotated_retinanet_hbb_kfiou_r50_fpn_1x_dota_le90 | model / log |
| ResNet50 (1024,1024,200) | 69.63 | le135 | 1x | 3.45 | 16.1 | - | 2 | cfa_r50_fpn_1x_dota_le135 | model / log |
| ResNet50 (1024,1024,200) | 69.76 | oc | 1x | 3.39 | 15.6 | - | 2 | rotated_retinanet_hbb_kfiou_r50_fpn_1x_dota_oc | model / log |
| ResNet50 (1024,1024,200) | 69.77 | le135 | 1x | 3.38 | 15.3 | - | 2 | rotated_retinanet_hbb_kfiou_r50_fpn_1x_dota_le135 | model / log |
| ResNet50 (1024,1024,200) | 69.79 | le135 | 1x | 3.38 | 17.2 | - | 2 | rotated_retinanet_obb_r50_fpn_1x_dota_le135 | model / log |
| ResNet50 (1024,1024,200) | 69.80 | oc | 1x | 3.54 | 12.4 | - | 2 | r3det_r50_fpn_1x_dota_oc | model / log |
| ResNet50 (1024,1024,200) | 69.94 | oc | 1x | 3.39 | 15.6 | - | 2 | rotated_retinanet_hbb_kld_r50_fpn_1x_dota_oc | model / log |
| ResNet50 (1024,1024,200) | 70.18 | oc | 1x | 3.23 | 15.6 | - | 2 | r3det_tiny_r50_fpn_1x_dota_oc | model / log |
| ResNet50 (1024,1024,200) | 70.64 | le90 | 1x | 3.12 | 18.2 | - | 2 | rotated_atss_obb_r50_fpn_1x_dota_le90 | model / log |
| ResNet50 (1024,1024,200) | 71.83 | oc | 1x | 3.54 | 12.4 | - | 2 | r3det_kld_r50_fpn_1x_dota_oc | model / log |
| ResNet50 (1024,1024,200) | 72.29 | le135 | 1x | 3.19 | 18.8 | - | 2 | rotated_atss_obb_r50_fpn_1x_dota_le135 | model / log |
| ResNet50 (1024,1024,200) | 72.68 | oc | 1x | 3.62 | 12.2 | - | 2 | r3det_kfiou_ln_r50_fpn_1x_dota_oc | model / log |
| ResNet50 (1024,1024,200) | 72.76 | oc | 1x | 3.44 | 14.0 | - | 2 | r3det_tiny_kld_r50_fpn_1x_dota_oc | model / log |
| ResNet50 (1024,1024,200) | 73.23 | le90 | 1x | 8.45 | 16.4 | - | 2 | gliding_vertex_r50_fpn_1x_dota_le90 | model / log |
| ResNet50 (1024,1024,200) | 73.40 | le90 | 1x | 8.46 | 16.5 | - | 2 | rotated_faster_rcnn_r50_fpn_1x_dota_le90 | model / log |
| ResNet50 (1024,1024,200) | 73.45 | oc | 40e | 3.45 | 16.1 | - | 2 | cfa_r50_fpn_40e_dota_oc | model / log |
| ResNet50 (1024,1024,200) | 73.91 | le135 | 1x | 3.14 | 15.5 | - | 2 | s2anet_r50_fpn_1x_dota_le135 | model / log |
| ResNet50 (1024,1024,200) | 74.19 | le135 | 1x | 2.17 | 17.4 | - | 2 | s2anet_r50_fpn_fp16_1x_dota_le135 | model / log |
| ResNet50 (1024,1024,200) | 75.63 | le90 | 1x | 7.37 | 21.2 | - | 2 | oriented_rcnn_r50_fpn_fp16_1x_dota_le90 | model / log |
| ResNet50 (1024,1024,200) | 75.69 | le90 | 1x | 8.46 | 16.2 | - | 2 | oriented_rcnn_r50_fpn_1x_dota_le90 | model / log |
| ResNet50 (1024,1024,200) | 75.75 | le90 | 1x | 7.56 | 19.3 | - | 2 | roi_trans_r50_fpn_fp16_1x_dota_le90 | model / log |
| ReResNet50 (1024,1024,200) | 75.99 | le90 | 1x | 7.71 | 13.3 | - | 2 | redet_re50_refpn_fp16_1x_dota_le90 | model / log |
| ResNet50 (1024,1024,200) | 76.08 | le90 | 1x | 8.67 | 14.4 | - | 2 | roi_trans_r50_fpn_1x_dota_le90 | model / log |
| ResNet50 (1024,1024,200) | 76.50 | le90 | 1x |  | 17.5 | MS+RR | 2 | rotated_retinanet_obb_r50_fpn_1x_dota_ms_rr_le90 | model / log |
| ReResNet50 (1024,1024,200) | 76.68 | le90 | 1x | 9.32 | 10.9 | - | 2 | redet_re50_refpn_1x_dota_le90 | model / log |
| Swin-tiny (1024,1024,200) | 77.51 | le90 | 1x |  | 10.9 | - | 2 | roi_trans_swin_tiny_fpn_1x_dota_le90 | model / log |
| ResNet50 (1024,1024,200) | 79.66 | le90 | 1x |  | 14.4 | MS+RR | 2 | roi_trans_r50_fpn_1x_dota_ms_rr_le90 | model / log |
| ReResNet50 (1024,1024,200) | 79.87 | le90 | 1x |  | 10.9 | MS+RR | 2 | redet_re50_refpn_1x_dota_ms_rr_le90 | model / log |

  • MS means multiple scale image split.
  • RR means random rotation.
  • The above models are trained with 1 * 1080Ti/2080Ti and inferred with 1 * 2080Ti.
  • model / log: model weights / training log.

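6. Understanding config files

6.1 Updating config keys in a dict chain

  • Config options can be specified following the order of the dict keys in the original config. For example, --cfg-options model.backbone.norm_eval=False switches all BN modules in the model backbone to training mode.

6.2 Updating keys inside a list of configs

  • Some config dicts are composed as a list in the config. For example, the training pipeline data.train.pipeline is normally a list, e.g. [dict(type='LoadImageFromFile'), ...]. To change LoadImageFromFile to LoadImageFromWebcam in the pipeline, specify --cfg-options data.train.pipeline.0.type=LoadImageFromWebcam.

6.3 Updating list/tuple values

  • The value to update may be a list or a tuple. For example, the config file normally sets workflow=[('train', 1)]. To change this key, specify --cfg-options workflow="[(train,1),(val,1)]". Note that the quotation mark " is required to support list/tuple data types, and no whitespace is allowed inside the quotation marks of the specified value.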
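
How a dotted key such as data.train.pipeline.0.type reaches into a nested config can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the parser that the command-line option actually uses; the function name is hypothetical.

```python
def apply_override(cfg, dotted_key, value):
    """Set a nested entry of `cfg` addressed by a key such as 'a.b.0.c'."""
    parts = dotted_key.split('.')
    node = cfg
    for part in parts[:-1]:
        if isinstance(node, list):
            # A numeric part selects an element of a list.
            node = node[int(part)]
        else:
            # Any other part is a dict key; create it if it is missing.
            node = node.setdefault(part, {})
    last = parts[-1]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value
    return cfg


cfg = {
    'model': {'backbone': {'norm_eval': True}},
    'data': {'train': {'pipeline': [{'type': 'LoadImageFromFile'}]}},
}
apply_override(cfg, 'model.backbone.norm_eval', False)
apply_override(cfg, 'data.train.pipeline.0.type', 'LoadImageFromWebcam')
print(cfg)
```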

6.4 Config file naming convention

{model}_[model setting]_{backbone}_{neck}_[norm setting]_[misc]_[gpu x batch_per_gpu]_{dataset}_{data setting}_{angle version}

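{xxx} is a required field and [yyy] is optional.

  • {model}: model type, such as rotated_faster_rcnn, rotated_retinanet, etc.
  • [model setting]: specific setting for some models, such as hbb for rotated_retinanet.
  • {backbone}: backbone type, such as r50 (ResNet-50) or swin_tiny (Swin-tiny).
  • {neck}: neck type, such as fpn or refpn.
  • [norm setting]: bn (Batch Normalization) is used unless specified; other norm layer types are gn (Group Normalization) and syncbn (Synchronized Batch Normalization). gn-head/gn-neck means GN is applied to the head/neck only, while gn-all means GN is applied to the entire model, i.e. backbone, neck, and head.
  • [misc]: miscellaneous settings/plugins of the model, e.g. dconv, gcb, attention, albu, mstrain.
  • [gpu x batch_per_gpu]: number of GPUs and samples per GPU; 1xb2 is used by default.
  • {dataset}: dataset, such as dota.
  • {angle version}: angle definition, such as oc, le135, or le90.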
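
Because the angle version is the last field of the name, it can be read off a config file name mechanically. The helper below is a hypothetical sketch that assumes the three angle versions named in this document (oc, le135, le90) and an optional .py suffix.

```python
ANGLE_VERSIONS = ('oc', 'le135', 'le90')


def angle_version_of(config_name):
    """Return the trailing angle-version field of a config file name."""
    stem = config_name[:-3] if config_name.endswith('.py') else config_name
    # The angle version is the text after the last underscore.
    last_field = stem.rsplit('_', 1)[-1]
    if last_field not in ANGLE_VERSIONS:
        raise ValueError(f'no known angle version in {config_name!r}')
    return last_field


print(angle_version_of('rotated_retinanet_obb_r50_fpn_1x_dota_le90.py'))
```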

6.5 Config example

The RotatedRetinaNet config with ResNet50 and FPN:

angle_version = 'oc'  # The angle version
model = dict(
    type='RotatedRetinaNet',  # The name of detector
    backbone=dict(  # The config of backbone
        type='ResNet',  # The type of the backbone
        depth=50,  # The depth of backbone
        num_stages=4,  # Number of stages of the backbone.
        out_indices=(0, 1, 2, 3),  # The index of output feature maps produced in each stages
        frozen_stages=1,  # The weights in the first 1 stage are frozen
        zero_init_residual=False,  # Whether to use zero init for last norm layer in resblocks to let them behave as identity.
        norm_cfg=dict(  # The config of normalization layers.
            type='BN',  # Type of norm layer, usually it is BN or GN
            requires_grad=True),  # Whether to train the gamma and beta in BN
        norm_eval=True,  # Whether to freeze the statistics in BN
        style='pytorch',  # The style of backbone, 'pytorch' means that stride 2 layers are in 3x3 conv, 'caffe' means stride 2 layers are in 1x1 convs.
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),  # The ImageNet pretrained backbone to be loaded
    neck=dict(
        type='FPN',  # The neck of detector is FPN. We also support 'ReFPN'
        in_channels=[256, 512, 1024, 2048],  # The input channels, this is consistent with the output channels of backbone
        out_channels=256,  # The output channels of each level of the pyramid feature map
        start_level=1,  # Index of the start input backbone level used to build the feature pyramid
        add_extra_convs='on_input',  # It specifies the source feature map of the extra convs
        num_outs=5),  # The number of output scales
    bbox_head=dict(
        type='RotatedRetinaHead',  # The type of bbox head is 'RotatedRetinaHead'
        num_classes=15,  # Number of classes for classification
        in_channels=256,  # Input channels for bbox head
        stacked_convs=4,  # Number of stacking convs of the head
        feat_channels=256,  # Number of hidden channels
        assign_by_circumhbbox='oc',  # The angle version of obb2hbb
        anchor_generator=dict(  # The config of anchor generator
            type='RotatedAnchorGenerator',  # The type of anchor generator
            octave_base_scale=4,  # The base scale of octave.
            scales_per_octave=3,  #  Number of scales for each octave.
            ratios=[1.0, 0.5, 2.0],  # The ratio between height and width.
            strides=[8, 16, 32, 64, 128]),  # The strides of the anchor generator. This is consistent with the FPN feature strides.
        bbox_coder=dict(  # Config of box coder to encode and decode the boxes during training and testing
            type='DeltaXYWHAOBBoxCoder',  # Type of box coder.
            angle_range='oc',  # The angle version of box coder.
            norm_factor=None,  # The norm factor of box coder.
            edge_swap=False,  # The edge swap flag of box coder.
            proj_xy=False,  # The project flag of box coder.
            target_means=(0.0, 0.0, 0.0, 0.0, 0.0),  # The target means used to encode and decode boxes
            target_stds=(1.0, 1.0, 1.0, 1.0, 1.0)),  # The standard deviations used to encode and decode boxes
        loss_cls=dict(  # Config of loss function for the classification branch
            type='FocalLoss',  # Type of loss for classification branch
            use_sigmoid=True,  #  Whether the prediction is used for sigmoid or softmax
            gamma=2.0,  # The gamma for calculating the modulating factor
            alpha=0.25,  # A balanced form for Focal Loss
            loss_weight=1.0),  # Loss weight of the classification branch
        loss_bbox=dict(  # Config of loss function for the regression branch
            type='L1Loss',  # Type of loss
            loss_weight=1.0)),  # Loss weight of the regression branch
    train_cfg=dict(  # Config of training hyperparameters
        assigner=dict(  # Config of assigner
            type='MaxIoUAssigner',  # Type of assigner
            pos_iou_thr=0.5,  # IoU >= threshold 0.5 will be taken as positive samples
            neg_iou_thr=0.4,  # IoU < threshold 0.4 will be taken as negative samples
            min_pos_iou=0,  # The minimal IoU threshold to take boxes as positive samples
            ignore_iof_thr=-1,  # IoF threshold for ignoring bboxes
            iou_calculator=dict(type='RBboxOverlaps2D')),  # Type of Calculator for IoU
        allowed_border=-1,  # The border allowed after padding for valid anchors.
        pos_weight=-1,  # The weight of positive samples during training.
        debug=False),  # Whether to set the debug mode
    test_cfg=dict(  # Config of testing hyperparameters
        nms_pre=2000,  # The number of boxes before NMS
        min_bbox_size=0,  # The allowed minimal box size
        score_thr=0.05,  # Threshold to filter out boxes
        nms=dict(iou_thr=0.1), # NMS threshold
        max_per_img=2000))  # The number of boxes to be kept after NMS.
dataset_type = 'DOTADataset'  # Dataset type, this will be used to define the dataset
data_root = '../datasets/split_1024_dota1_0/'  # Root path of data
img_norm_cfg = dict(  # Image normalization config to normalize the input images
    mean=[123.675, 116.28, 103.53],  # Mean values used to pre-train the backbone models
    std=[58.395, 57.12, 57.375],  # Standard deviations used to pre-train the backbone models
    to_rgb=True)  # The channel order of the images used to pre-train the backbone models
train_pipeline = [  # Training pipeline
    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path
    dict(type='LoadAnnotations',  # Second pipeline to load annotations for current image
         with_bbox=True),  # Whether to use bounding box, True for detection
    dict(type='RResize',  # Augmentation pipeline that resize the images and their annotations
         img_scale=(1024, 1024)),  # The largest scale of image
    dict(type='RRandomFlip',  # Augmentation pipeline that flip the images and their annotations
         flip_ratio=0.5,  # The ratio or probability to flip
         version='oc'),  # The angle version
    dict(
        type='Normalize',  # Augmentation pipeline that normalize the input images
        mean=[123.675, 116.28, 103.53],  # These keys are the same of img_norm_cfg since the
        std=[58.395, 57.12, 57.375],  # keys of img_norm_cfg are used here as arguments
        to_rgb=True),
    dict(type='Pad',  # Padding config
         size_divisor=32),  # The number the padded images should be divisible
    dict(type='DefaultFormatBundle'),  # Default format bundle to gather data in the pipeline
    dict(type='Collect',  # Pipeline that decides which keys in the data should be passed to the detector
         keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path
    dict(
        type='MultiScaleFlipAug',  # An encapsulation that encapsulates the testing augmentations
        img_scale=(1024, 1024),  # Decides the largest scale for testing, used for the Resize pipeline
        flip=False,  # Whether to flip images during testing
        transforms=[
            dict(type='RResize'),  # Use resize augmentation
            dict(
                type='Normalize',  # Normalization config, the values are from img_norm_cfg
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad',  # Padding config to pad images divisible by 32.
                 size_divisor=32),
            dict(type='DefaultFormatBundle'),  # Default format bundle to gather data in the pipeline
            dict(type='Collect',  # Collect pipeline that collect necessary keys for testing.
                 keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,  # Batch size of a single GPU
    workers_per_gpu=2,  # Worker to pre-fetch data for each single GPU
    train=dict(  # Train dataset config
        type='DOTADataset',  # Type of dataset
        ann_file=
        '../datasets/split_1024_dota1_0/trainval/annfiles/',  # Path of annotation file
        img_prefix=
        '../datasets/split_1024_dota1_0/trainval/images/',  # Prefix of image path
        pipeline=[  # pipeline, this is passed by the train_pipeline created before.
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='RResize', img_scale=(1024, 1024)),
            dict(type='RRandomFlip', flip_ratio=0.5, version='oc'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ],
        version='oc'),
    val=dict(  # Validation dataset config
        type='DOTADataset',
        ann_file=
        '../datasets/split_1024_dota1_0/trainval/annfiles/',
        img_prefix=
        '../datasets/split_1024_dota1_0/trainval/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        version='oc'),
    test=dict(  # Test dataset config, modify the ann_file for test-dev/test submission
        type='DOTADataset',
        ann_file=
        '../datasets/split_1024_dota1_0/test/images/',
        img_prefix=
        '../datasets/split_1024_dota1_0/test/images/',
        pipeline=[  # Pipeline is passed by test_pipeline created before
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        version='oc'))
evaluation = dict(  # The config to build the evaluation hook
    interval=12,  # Evaluation interval
    metric='mAP')  # Metrics used during evaluation
optimizer = dict(  # Config used to build optimizer
    type='SGD',  # Type of optimizers
    lr=0.0025,  # Learning rate of optimizers
    momentum=0.9,  # Momentum
    weight_decay=0.0001)  # Weight decay of SGD
optimizer_config = dict(  # Config used to build the optimizer hook
    grad_clip=dict(
        max_norm=35,
        norm_type=2))
lr_config = dict(  # Learning rate scheduler config used to register LrUpdater hook
    policy='step',  # The policy of scheduler
    warmup='linear',  # The warmup policy, also support `exp` and `constant`.
    warmup_iters=500,  # The number of iterations for warmup
    warmup_ratio=0.3333333333333333,  # The ratio of the starting learning rate used for warmup
    step=[8, 11])  # Steps to decay the learning rate
runner = dict(
    type='EpochBasedRunner',  # Type of runner to use (i.e. IterBasedRunner or EpochBasedRunner)
    max_epochs=12) # Runner that runs the workflow in total max_epochs. For IterBasedRunner use `max_iters`
checkpoint_config = dict(  # Config to set the checkpoint hook
    interval=12)  # The save interval is 12
log_config = dict(  # config to register logger hook
    interval=50,  # Interval to print the log
    hooks=[
        # dict(type='TensorboardLoggerHook')  # The Tensorboard logger is also supported
        dict(type='TextLoggerHook')
    ])  # The logger used to record the training process.
dist_params = dict(backend='nccl')  # Parameters to setup distributed training, the port can also be set.
log_level = 'INFO'  # The level of logging.
load_from = None  # load models as a pre-trained model from a given path. This will not resume training.
resume_from = None  # Resume checkpoints from a given path, the training will be resumed from the epoch when the checkpoint was saved.
workflow = [('train', 1)]  # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once. The workflow trains the model for 12 epochs according to max_epochs.
work_dir = './work_dirs/rotated_retinanet_hbb_r50_fpn_1x_dota_oc'  # Directory to save the model checkpoints and logs for the current experiments.
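
To see what the lr_config values above imply, the sketch below computes the learning rate for a given epoch and iteration. It assumes a decay factor of 0.1 at each step epoch (the config does not state it) and a linear warmup that ramps from warmup_ratio times the learning rate up to the full value; the function name is hypothetical and this is not the hook's actual code.

```python
def learning_rate(epoch, cur_iter, base_lr=0.0025, steps=(8, 11), gamma=0.1,
                  warmup_iters=500, warmup_ratio=1.0 / 3):
    """Learning rate for a 0-based epoch and a global iteration count."""
    # Step policy: multiply by gamma once for every step epoch already reached.
    # gamma=0.1 is an assumption; it is not set in the config shown above.
    regular_lr = base_lr * gamma ** sum(epoch >= step for step in steps)
    if cur_iter < warmup_iters:
        # Linear warmup: start at warmup_ratio * lr and grow to the regular lr.
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        return regular_lr * (1 - k)
    return regular_lr


for epoch, cur_iter in [(0, 0), (0, 250), (0, 500), (8, 100000), (11, 100000)]:
    print(epoch, cur_iter, learning_rate(epoch, cur_iter))
```

Under these assumptions the rate starts at one third of 0.0025, reaches 0.0025 after 500 iterations, and drops by a factor of ten at epochs 8 and 11.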

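6.6 FAQ

Some intermediate variables are used in the config files, such as train_pipeline/test_pipeline in datasets. Note that when modifying an intermediate variable in a child config, the user must pass the intermediate variable into the corresponding fields again. For example, suppose we want to use an offline multi-scale strategy to train RoI-Trans. train_pipeline is the intermediate variable we want to modify.

# First define the new train_pipeline/test_pipeline, then pass them into data.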
_base_ = ['./roi_trans_r50_fpn_1x_dota_le90.py']

data_root = '../datasets/split_ms_dota1_0/'
angle_version = 'le90'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(
        type='RRandomFlip',
        flip_ratio=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal'],
        version=angle_version),
    dict(
        type='PolyRandomRotate',
        rotate_ratio=0.5,
        angles_range=180,
        auto_bound=False,
        version=angle_version),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
data = dict(
    train=dict(
        pipeline=train_pipeline,
        ann_file=data_root + 'trainval/annfiles/',
        img_prefix=data_root + 'trainval/images/'),
    val=dict(
        ann_file=data_root + 'trainval/annfiles/',
        img_prefix=data_root + 'trainval/images/'),
    test=dict(
        ann_file=data_root + 'test/images/',
        img_prefix=data_root + 'test/images/'))

# Similarly, to switch from SyncBN to BN or MMSyncBN, we need to replace every norm_cfg in the config.
_base_ = './roi_trans_r50_fpn_1x_dota_le90.py'
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    backbone=dict(norm_cfg=norm_cfg),
    neck=dict(norm_cfg=norm_cfg),
    ...)
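
Why an intermediate variable has to be passed again can be shown with a toy merge. The sketch below assumes that a child config is merged into its base key by key and that the base already stored the old pipeline inside data.train; the merge function is a simplified stand-in, not the real config loader.

```python
def merge(base, child):
    """Recursively merge `child` into a copy of `base`; non-dict values replace."""
    merged = dict(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


old_pipeline = [{'type': 'LoadImageFromFile'}]
new_pipeline = old_pipeline + [{'type': 'PolyRandomRotate'}]
base = {
    'train_pipeline': old_pipeline,
    'data': {'train': {'pipeline': old_pipeline}},
}

# Redefining only the variable leaves data.train.pipeline untouched.
only_variable = merge(base, {'train_pipeline': new_pipeline})
# Passing the variable into data again updates the field that is actually used.
passed_again = merge(base, {
    'train_pipeline': new_pipeline,
    'data': {'train': {'pipeline': new_pipeline}},
})
print(len(only_variable['data']['train']['pipeline']))
print(len(passed_again['data']['train']['pipeline']))
```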

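7. Customizing datasets

7.1 Reorganizing new data into an existing format

# The simplest way is to convert your dataset to an existing dataset format (DOTA).
# Annotation txt file in DOTA format:
184 2875 193 2923 146 2932 137 2885 plane 0
66 2095 75 2142 21 2154 11 2107 plane 0
...
# Each line represents one object and records it as a 10-element array A.
# A[0:8]: polygon in the format (x1, y1, x2, y2, x3, y3, x4, y4).
# A[8]: category.
# A[9]: difficulty.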
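7.2 Using a customized dataset

  1. Modify the config file to use the customized dataset.
  2. Check the annotations of the customized dataset.

7.3 Customized dataset example

Here is an example of the two steps above, which uses a customized 5-class dataset in DOTA format to train an existing RotatedRetinaNet R50-FPN detector.

1. Modify the config file to use the customized dataset

The config file needs changes in two places:

  1. The data field. Specifically, explicitly add the classes field to data.train, data.val, and data.test.
  2. The num_classes field in the model part. Explicitly overwrite every num_classes from its default value (e.g. 80 in COCO, 15 in DOTA) with your number of classes.
# configs/my_custom_config.py: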

# the new config inherits the base configs to highlight the necessary modification
_base_ = './rotated_retinanet_hbb_r50_fpn_1x_dota_oc.py'

# 1. dataset settings
dataset_type = 'DOTADataset'
classes = ('a', 'b', 'c', 'd', 'e')
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        # explicitly add your class names to the field `classes`
        classes=classes,
        ann_file='path/to/your/train/annotation_data',
        img_prefix='path/to/your/train/image_data'),
    val=dict(
        type=dataset_type,
        # explicitly add your class names to the field `classes`
        classes=classes,
        ann_file='path/to/your/val/annotation_data',
        img_prefix='path/to/your/val/image_data'),
    test=dict(
        type=dataset_type,
        # explicitly add your class names to the field `classes`
        classes=classes,
        ann_file='path/to/your/test/annotation_data',
        img_prefix='path/to/your/test/image_data'))

# 2. model settings
model = dict(
    bbox_head=dict(
        type='RotatedRetinaHead',
        # explicitly over-write all the `num_classes` field from default 15 to 5.
        num_classes=5))

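2. Check the annotations of the custom dataset

Assuming your custom dataset is in DOTA format, make sure its annotations are correct:

The classes field in the config file should contain exactly the same elements, in the same order, as the category names A[8] in the txt annotations. MMRotate automatically maps the non-contiguous ids in the categories to contiguous label indices, so the string order of the names in the classes field determines the order of the label indices. The order of classes in the config also determines the label text shown when predicted bounding boxes are visualized.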
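
The check described above can be sketched in a few lines. Both helpers below (build_label_map and find_unknown_categories) are hypothetical names used only for illustration; they show how the tuple order gives label indices and how to spot category names in the annotations that the config does not list.

```python
def build_label_map(classes):
    """Map each class name to its label index, following the tuple order."""
    return {name: index for index, name in enumerate(classes)}


def find_unknown_categories(annotation_lines, classes):
    """Return category names (field A[8]) that are missing from `classes`."""
    known = set(classes)
    unknown = set()
    for line in annotation_lines:
        fields = line.split()
        # Only well-formed 10-field DOTA rows are inspected here.
        if len(fields) == 10 and fields[8] not in known:
            unknown.add(fields[8])
    return sorted(unknown)


classes = ('a', 'b', 'c', 'd', 'e')
lines = ['0 0 4 0 4 4 0 4 a 0', '1 1 5 1 5 5 1 5 f 0']
label_map = build_label_map(classes)
unknown = find_unknown_categories(lines, classes)
print(label_map, unknown)
```

An empty result from find_unknown_categories means every annotated category has a slot in classes; reordering classes changes the indices returned by build_label_map.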

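7.4 Customize datasets with dataset wrappers

MMRotate also supports several dataset wrappers that mix datasets or modify the dataset distribution for training. It currently supports three dataset wrappers:

  • RepeatDataset: simply repeats the whole dataset.
  • ClassBalancedDataset: repeats the dataset in a class-balanced manner.
  • ConcatDataset: concatenates datasets.

7.4.1 Repeat dataset

# We use `RepeatDataset` as a wrapper to repeat a dataset. For example, if the original dataset is `Dataset_A`, the config to repeat it N times looks like this: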
dataset_A_train = dict(
        type='RepeatDataset',
        times=N,
        dataset=dict(  # This is the original config of Dataset_A
            type='Dataset_A',
            ...
            pipeline=train_pipeline
        )
    )
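
What repeating means for indexing can be shown with a small stand-in. RepeatSketch below is a hypothetical plain-Python class written for this illustration, not the wrapper shipped with the library; it assumes the wrapped object supports len() and integer indexing.

```python
class RepeatSketch:
    """Illustrative wrapper: presents `dataset` as if repeated `times` times."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __len__(self):
        return self.times * self._ori_len

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Wrap around so every pass over the data reuses the same samples.
        return self.dataset[idx % self._ori_len]


repeated = RepeatSketch(['img0', 'img1', 'img2'], times=2)
print(len(repeated), repeated[4])
```

One epoch over the wrapper therefore visits each original sample `times` times, which reduces per-epoch overhead on small datasets.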

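7.4.2 Class-balanced dataset

# We use the ClassBalancedDataset wrapper to repeat a dataset based on category frequency. The dataset to repeat
# must implement self.get_cat_ids(idx) to support ClassBalancedDataset. For example, to repeat Dataset_A with oversample_thr=1e-3, the config looks like this: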
dataset_A_train = dict(
        type='ClassBalancedDataset',
        oversample_thr=1e-3,
        dataset=dict(  # This is the original config of Dataset_A
            type='Dataset_A',
            ...
            pipeline=train_pipeline
        )
    )
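
The sketch below shows one way frequency-based repetition can be computed. It assumes the commonly used rule r(c) = max(1, sqrt(t / f(c))), where f(c) is the fraction of images containing category c and t is oversample_thr, with each image repeated ceil(max r(c)) times. The function name repeat_factors and the toy data are hypothetical, and the library's handling of edge cases (such as images without annotations) may differ.

```python
import math
from collections import Counter


def repeat_factors(cat_ids_per_image, oversample_thr):
    """Per-image repeat counts derived from category frequencies (sketch)."""
    num_images = len(cat_ids_per_image)
    image_count = Counter()
    for cat_ids in cat_ids_per_image:
        image_count.update(set(cat_ids))  # count each category once per image
    # f(c): fraction of images that contain category c
    freq = {c: n / num_images for c, n in image_count.items()}
    # r(c) = max(1, sqrt(t / f(c)))
    cat_repeat = {
        c: max(1.0, math.sqrt(oversample_thr / f)) for c, f in freq.items()
    }
    # r(I) = max over the categories in image I, rounded up
    return [
        math.ceil(max((cat_repeat[c] for c in set(cat_ids)), default=1.0))
        for cat_ids in cat_ids_per_image
    ]


# Category 0 appears in all 4 images (f = 1.0); category 1 in 1 of 4 (f = 0.25).
factors = repeat_factors([[0], [0], [0], [0, 1]], oversample_thr=0.5)
print(factors)
```

Only the image containing the rare category is oversampled here; frequent categories keep a factor of 1.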

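7.4.3 Concatenate datasets

There are three ways to concatenate datasets.

1、If the datasets to concatenate are of the same type and have different annotation files, you can concatenate the dataset configs as follows.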

dataset_A_train = dict(
    type='Dataset_A',
    ann_file = ['anno_file_1', 'anno_file_2'],
    pipeline=train_pipeline
)
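# If the concatenated dataset is used for testing or evaluation, this approach evaluates each dataset separately. To evaluate the concatenated datasets as a whole, set separate_eval=False as follows.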
dataset_A_train = dict(
    type='Dataset_A',
    ann_file = ['anno_file_1', 'anno_file_2'],
    separate_eval=False,
    pipeline=train_pipeline
)

2、If the datasets to concatenate are of different types, you can concatenate the dataset configs as follows.

dataset_A_train = dict()
dataset_B_train = dict()

data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train = [
        dataset_A_train,
        dataset_B_train
    ],
    val = dataset_A_val,
    test = dataset_A_test
    )
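# Note: if the concatenated dataset is used for testing or evaluation, this approach also evaluates each dataset separately.

3、ConcatDataset can also be defined explicitly, as follows.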

dataset_A_val = dict()
dataset_B_val = dict()

data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dataset_A_train,
    val=dict(
        type='ConcatDataset',
        datasets=[dataset_A_val, dataset_B_val],
        separate_eval=False))
# This approach lets users evaluate all the datasets as a single one by setting separate_eval=False.
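
Concatenation boils down to mapping a global index onto one of the member datasets. The following is a minimal sketch of that mapping using cumulative sizes; ConcatSketch is a hypothetical class for illustration only and says nothing about how evaluation is carried out.

```python
import bisect
from itertools import accumulate


class ConcatSketch:
    """Illustrative concatenation of indexable datasets."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # Running totals, e.g. sizes [2, 3] -> cumulative sizes [2, 5]
        self.cumulative_sizes = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # First dataset whose cumulative size exceeds idx
        dataset_idx = bisect.bisect_right(self.cumulative_sizes, idx)
        offset = self.cumulative_sizes[dataset_idx - 1] if dataset_idx > 0 else 0
        return self.datasets[dataset_idx][idx - offset]


merged = ConcatSketch([['a0', 'a1'], ['b0', 'b1', 'b2']])
print(len(merged), merged[1], merged[2], merged[4])
```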

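4、Notes

  1. The option separate_eval=False assumes that the datasets use self.data_infos during evaluation. COCO datasets therefore do not support this behavior, because they do not rely entirely on self.data_infos for evaluation. Combining different types of datasets and evaluating them as a whole has not been tested and is not recommended.
  2. Evaluating ClassBalancedDataset and RepeatDataset is not supported, so evaluating concatenated datasets of these types is not supported either.
  3. A more complex example that repeats Dataset_A N times and Dataset_B M times, and then concatenates the repeated datasets, is shown below.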
dataset_A_train = dict(
    type='RepeatDataset',
    times=N,
    dataset=dict(
        type='Dataset_A',
        ...
        pipeline=train_pipeline
    )
)
dataset_A_val = dict(
    ...
    pipeline=test_pipeline
)
dataset_A_test = dict(
    ...
    pipeline=test_pipeline
)
dataset_B_train = dict(
    type='RepeatDataset',
    times=M,
    dataset=dict(
        type='Dataset_B',
        ...
        pipeline=train_pipeline
    )
)
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train = [
        dataset_A_train,
        dataset_B_train
    ],
    val = dataset_A_val,
    test = dataset_A_test
)

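8、Customize models

Model components fall into five basic types:

  • backbone: usually an FCN network that extracts feature maps, e.g. ResNet, Swin.
  • neck: the component between the backbone and the heads, e.g. FPN, ReFPN.
  • head: the component for a specific task, e.g. bbox prediction.
  • roi extractor: the part that extracts RoI features from feature maps, e.g. RoI Align Rotated.
  • loss: the component in a head that computes the loss, e.g. FocalLoss, GWDLoss, KFIoULoss.

8.1 Develop new components

8.1.1 Add a new backbone

1. Define a new backbone (e.g. MobileNet)

# Create a new file mmrotate/models/backbones/mobilenet.py.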
import torch.nn as nn
from mmrotate.models.builder import ROTATED_BACKBONES

@ROTATED_BACKBONES.register_module()
class MobileNet(nn.Module):

    def __init__(self, arg1, arg2):
        pass

    def forward(self, x):  # should return a tuple
        pass

2. Import the module

# You can add the following line to mmrotate/models/backbones/__init__.py
from .mobilenet import MobileNet
# or add the following to the config file to avoid modifying the original code.
custom_imports = dict(
    imports=['mmrotate.models.backbones.mobilenet'],
    allow_failed_imports=False)

3. Use the backbone in your config file

model = dict(
    ...
    backbone=dict(
        type='MobileNet',
        arg1=xxx,
        arg2=xxx),
    ...
)
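
The three steps (register, import, refer to the class by its type string) rely on a name-to-class registry. The toy version below illustrates the idea only; RegistrySketch, BACKBONES_SKETCH, and ToyBackbone are hypothetical names, and the real registry has more features than this.

```python
class RegistrySketch:
    """Toy name-to-class registry; not the actual library implementation."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        def decorator(cls):
            # The decorator only runs when the defining module is imported,
            # which is why step 2 (importing the module) is required.
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        args = dict(cfg)  # copy so the caller's config stays untouched
        module_type = args.pop('type')
        if module_type not in self._modules:
            raise KeyError(f'{module_type} is not registered in {self.name}')
        return self._modules[module_type](**args)


BACKBONES_SKETCH = RegistrySketch('backbone')


@BACKBONES_SKETCH.register_module()
class ToyBackbone:
    def __init__(self, arg1, arg2):
        self.arg1, self.arg2 = arg1, arg2


backbone = BACKBONES_SKETCH.build(dict(type='ToyBackbone', arg1=1, arg2=2))
print(type(backbone).__name__, backbone.arg1, backbone.arg2)
```

If the module that defines the class is never imported, the lookup fails with a "not registered" error, which is the usual symptom of skipping the import step.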

8.1.2 Add a new neck

1. Define a neck (e.g. PAFPN)

# Create a new file mmrotate/models/necks/pafpn.py.
import torch.nn as nn

from mmrotate.models.builder import ROTATED_NECKS

@ROTATED_NECKS.register_module()
class PAFPN(nn.Module):

    def __init__(self,
                in_channels,
                out_channels,
                num_outs,
                start_level=0,
                end_level=-1,
                add_extra_convs=False):
        pass

    def forward(self, inputs):
        # implementation is ignored
        pass

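2. Import the module

# You can add the following line to mmrotate/models/necks/__init__.py
from .pafpn import PAFPN
# or add the following to the config file to avoid modifying the original code.
custom_imports = dict(
    imports=['mmrotate.models.necks.pafpn'],
    allow_failed_imports=False)

3. Modify the config file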

neck=dict(
    type='PAFPN',
    in_channels=[256, 512, 1024, 2048],
    out_channels=256,
    num_outs=5)

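8.1.3 Add a new head

# First, add the new bbox head in mmrotate/models/roi_heads/bbox_heads/double_bbox_head.py. Double Head R-CNN implements a new bbox head for object detection. To implement a bbox head, we basically need to implement three functions of the new module, as follows.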
from mmrotate.models.builder import ROTATED_HEADS
from mmrotate.models.roi_heads.bbox_heads.bbox_head import BBoxHead

@ROTATED_HEADS.register_module()
class DoubleConvFCBBoxHead(BBoxHead):
    r"""Bbox head used in Double-Head R-CNN

                                      /-> cls
                  /-> shared convs ->
                                      \-> reg
    roi features
                                      /-> cls
                  \-> shared fc    ->
                                      \-> reg
    """  # noqa: W605

    def __init__(self,
                 num_convs=0,
                 num_fcs=0,
                 conv_out_channels=1024,
                 fc_out_channels=1024,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN'),
                 **kwargs):
        kwargs.setdefault('with_avg_pool', True)
        super(DoubleConvFCBBoxHead, self).__init__(**kwargs)

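    def forward(self, x_cls, x_reg):
        pass  # implementation omitted

# Second, implement a new RoI head if necessary. We plan to inherit the new DoubleHeadRoIHead from StandardRoIHead, which already implements the following functions.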
import torch
from mmdet.core import bbox2result, bbox2roi, build_assigner, build_sampler
from mmrotate.models.builder import ROTATED_HEADS, build_head, build_roi_extractor
from mmrotate.models.roi_heads.base_roi_head import BaseRoIHead
from mmrotate.models.roi_heads.test_mixins import BBoxTestMixin, MaskTestMixin


@ROTATED_HEADS.register_module()
class StandardRoIHead(BaseRoIHead, BBoxTestMixin, MaskTestMixin):
    """Simplest base roi head including one bbox head and one mask head.
    """

    def init_assigner_sampler(self):
        pass  # implementation omitted

    def init_bbox_head(self, bbox_roi_extractor, bbox_head):
        pass  # implementation omitted

    def forward_dummy(self, x, proposals):
        pass  # implementation omitted

    def forward_train(self,
                      x,
                      img_metas,
                      proposal_list,
                      gt_bboxes,
                      gt_labels,
                      gt_bboxes_ignore=None,
                      gt_masks=None):
        pass  # implementation omitted

    def _bbox_forward(self, x, rois):
        pass  # implementation omitted

    def _bbox_forward_train(self, x, sampling_results, gt_bboxes, gt_labels,
                            img_metas):
        pass  # implementation omitted

    def simple_test(self,
                    x,
                    proposal_list,
                    img_metas,
                    proposals=None,
                    rescale=False):
        """Test without augmentation."""

# The Double Head modification is mainly in the _bbox_forward logic; everything else is inherited from StandardRoIHead. In mmrotate/models/roi_heads/double_roi_head.py, we implement the new RoI head as follows:
from mmrotate.models.builder import ROTATED_HEADS
from mmrotate.models.roi_heads.standard_roi_head import StandardRoIHead


@ROTATED_HEADS.register_module()
class DoubleHeadRoIHead(StandardRoIHead):
    """RoI head for Double Head RCNN

    https://arxiv.org/abs/1904.06493
    """

    def __init__(self, reg_roi_scale_factor, **kwargs):
        super(DoubleHeadRoIHead, self).__init__(**kwargs)
        self.reg_roi_scale_factor = reg_roi_scale_factor

    def _bbox_forward(self, x, rois):
        bbox_cls_feats = self.bbox_roi_extractor(
            x[:self.bbox_roi_extractor.num_inputs], rois)
        bbox_reg_feats = self.bbox_roi_extractor(
            x[:self.bbox_roi_extractor.num_inputs],
            rois,
            roi_scale_factor=self.reg_roi_scale_factor)
        if self.with_shared_head:
            bbox_cls_feats = self.shared_head(bbox_cls_feats)
            bbox_reg_feats = self.shared_head(bbox_reg_feats)
        cls_score, bbox_pred = self.bbox_head(bbox_cls_feats, bbox_reg_feats)

        bbox_results = dict(
            cls_score=cls_score,
            bbox_pred=bbox_pred,
            bbox_feats=bbox_cls_feats)
        return bbox_results

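Finally, users need to add the modules to mmrotate/models/roi_heads/bbox_heads/__init__.py and mmrotate/models/roi_heads/__init__.py so that the corresponding registry can find and load them.

Alternatively, users can add the following to the config file to achieve the same goal.

custom_imports=dict(
    imports=['mmrotate.models.roi_heads.double_roi_head', 'mmrotate.models.roi_heads.bbox_heads.double_bbox_head'])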

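8.1.4 Add a new loss

Assume you want to add a new loss, MyLoss, for bounding box regression. To add a new loss function, implement it in mmrotate/models/losses/my_loss.py. The weighted_loss decorator enables the loss to be weighted per element.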

import torch
import torch.nn as nn

from mmrotate.models.builder import ROTATED_LOSSES
from mmdet.models.losses.utils import weighted_loss

@weighted_loss
def my_loss(pred, target):
    assert pred.size() == target.size() and target.numel() > 0
    loss = torch.abs(pred - target)
    return loss

@ROTATED_LOSSES.register_module()
class MyLoss(nn.Module):

    def __init__(self, reduction='mean', loss_weight=1.0):
        super(MyLoss, self).__init__()
        self.reduction = reduction
        self.loss_weight = loss_weight

    def forward(self,
                pred,
                target,
                weight=None,
                avg_factor=None,
                reduction_override=None):
        assert reduction_override in (None, 'none', 'mean', 'sum')
        reduction = (
            reduction_override if reduction_override else self.reduction)
        loss_bbox = self.loss_weight * my_loss(
            pred, target, weight, reduction=reduction, avg_factor=avg_factor)
        return loss_bbox
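
The weighting and reduction that the decorator adds around the element-wise loss can be sketched in plain Python, without tensors. The function weight_reduce_sketch is a hypothetical illustration of that behavior as assumed here (multiply by per-element weights, then apply 'none', 'sum', or 'mean', with an optional avg_factor replacing the element count for 'mean'); it is not the library code.

```python
def weight_reduce_sketch(losses, weight=None, reduction='mean', avg_factor=None):
    """Weight element-wise losses, then reduce them (plain-Python sketch)."""
    if weight is not None:
        losses = [l * w for l, w in zip(losses, weight)]
    if reduction == 'none':
        return losses
    total = sum(losses)
    if reduction == 'sum':
        if avg_factor is not None:
            raise ValueError('avg_factor cannot be used with reduction="sum"')
        return total
    if reduction == 'mean':
        # avg_factor, when given, replaces the number of elements
        denom = avg_factor if avg_factor is not None else len(losses)
        return total / denom
    raise ValueError(f'unknown reduction: {reduction}')


pred, target = [1.0, 2.0, 5.0], [1.5, 2.0, 3.0]
l1 = [abs(p - t) for p, t in zip(pred, target)]  # element-wise L1, as in my_loss
mean_loss = weight_reduce_sketch(l1, weight=[1.0, 1.0, 0.5])
print(l1, mean_loss)
```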

Then users need to add it to mmrotate/models/losses/__init__.py.

from .my_loss import MyLoss, my_loss

Alternatively, you can add the following to the config file to achieve the same goal.

custom_imports=dict(
    imports=['mmrotate.models.losses.my_loss'])

To use it, modify the loss_xxx field. Since MyLoss is for regression, you need to modify the loss_bbox field in the head.

loss_bbox=dict(type='MyLoss', loss_weight=1.0)

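9、Customize runtime settings

9.1 Customize optimization settings

9.1.1 Customize an optimizer supported by PyTorch

All optimizers implemented in PyTorch are already supported; the only change needed is to the optimizer field of the config file. For example, to use Adam (note that performance may drop considerably), the modification could be as follows.

optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001)
# To change the model's learning rate, users only need to modify lr in the optimizer config. Arguments can be set directly, following PyTorch's API documentation.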


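9.1.2 Customize a self-implemented optimizer

(1) Define a new optimizer

Assume you want to add an optimizer named MyOptimizer, which takes the arguments a, b, and c. You need to create a new directory named mmrotate/core/optimizer, and then implement the new optimizer in a file, e.g. mmrotate/core/optimizer/my_optimizer.py:

from mmcv.runner.optimizer import OPTIMIZERS
from torch.optim import Optimizer


@OPTIMIZERS.register_module()
class MyOptimizer(Optimizer):

    def __init__(self, a, b, c):
        pass  # implementation omitted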

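(2) Add the optimizer to the registry

For the module defined above to be found, it must first be imported into the main namespace. There are two ways to do that.

  • Modify mmrotate/core/optimizer/__init__.py to import it.

The newly defined module should be imported in mmrotate/core/optimizer/__init__.py so that the registry can find the new module and add it:

from .my_optimizer import MyOptimizer
  • Use custom_imports in the config to import it manually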
custom_imports = dict(imports=['mmrotate.core.optimizer.my_optimizer'], allow_failed_imports=False)

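The module mmrotate.core.optimizer.my_optimizer will be imported at the beginning of the program, and the class MyOptimizer is then registered automatically. Note that only the module containing the class MyOptimizer should be imported; mmrotate.core.optimizer.my_optimizer.MyOptimizer cannot be imported directly.

In fact, with this import method users can use a completely different directory structure, as long as the module root can be located in PYTHONPATH.

(3) Specify the optimizer in the config file

You can then use MyOptimizer in the optimizer field of the config file. In the configs, optimizers are defined by the optimizer field, like this:

optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)

To use your own optimizer, the field can be changed to: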

optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value)

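9.1.3 Customize the optimizer constructor

Some models may need parameter-specific optimization settings, e.g. weight decay for BatchNorm layers. Users can make such fine-grained parameter adjustments by customizing the optimizer constructor.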

from mmcv.utils import build_from_cfg

from mmcv.runner.optimizer import OPTIMIZER_BUILDERS, OPTIMIZERS
from mmrotate.utils import get_root_logger
from .my_optimizer import MyOptimizer


@OPTIMIZER_BUILDERS.register_module()
class MyOptimizerConstructor(object):

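    def __init__(self, optimizer_cfg, paramwise_cfg=None):
        pass  # implementation omitted

    def __call__(self, model):

        return my_optimizer
# The default optimizer constructor (DefaultOptimizerConstructor in MMCV) can also serve as a template for a new optimizer constructor.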

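9.1.4 Additional settings

Tricks not implemented by the optimizer itself should be implemented through the optimizer constructor (e.g. setting per-parameter learning rates) or through hooks. Below are some common settings that can stabilize or accelerate training. Feel free to create a PR to share more settings.

  • Use gradient clipping to stabilize training: some models need gradient clipping to keep the training process stable. An example is as follows: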
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))

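If your config inherits a base config that already sets optimizer_config, you may need _delete_=True to override the unnecessary settings. See the config documentation for more details.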
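
To show what max_norm=35 with norm_type=2 means numerically, here is a small sketch of clipping by the global L2 norm on a flat list of floats. It follows the documented behavior of PyTorch's clip_grad_norm_ (scale all gradients by max_norm / (total_norm + eps) when that factor is below 1); the function name clip_by_global_norm and the sample values are made up for this illustration.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale gradients so their global L2 norm does not exceed max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        # Every gradient is scaled by the same factor, preserving direction.
        grads = [g * clip_coef for g in grads]
    return grads, total_norm


clipped, norm_before = clip_by_global_norm([30.0, 40.0], max_norm=35)
print(norm_before, clipped)
```

Gradients whose norm is already below max_norm pass through unchanged, so the setting only intervenes on unusually large updates.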

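  • Use momentum scheduling to accelerate model convergence: a momentum scheduler is supported that modifies the model's momentum according to the learning rate, which can make the model converge faster. The momentum scheduler is usually used together with an LR scheduler; for example, the following config is used in 3D detection to accelerate convergence. For more details, refer to the implementations of CyclicLrUpdater and CyclicMomentumUpdater.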
lr_config = dict(
    policy='cyclic',
    target_ratio=(10, 1e-4),
    cyclic_times=1,
    step_ratio_up=0.4,
)
momentum_config = dict(
    policy='cyclic',
    target_ratio=(0.85 / 0.95, 1),
    cyclic_times=1,
    step_ratio_up=0.4,
)

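9.1.5 Customize the training workflow

The workflow is a list of (phase, epochs) pairs that specifies the running order and the number of epochs. By default it is set to

workflow = [('train', 1)]

which means running 1 epoch of training. Sometimes users may want to check some metrics (e.g. loss, accuracy) of the model on the validation set. In that case, the workflow can be set to

# so that 1 epoch of training and 1 epoch of validation are run iteratively.
workflow = [('train', 1), ('val', 1)]

Notes

  1. The model's parameters are not updated during a val epoch.
  2. The total_epochs keyword in the config only controls the number of training epochs and does not affect the validation workflow.
  3. The workflows [('train', 1), ('val', 1)] and [('train', 1)] do not change the behavior of EvalHook, because EvalHook is called by after_train_epoch, and the validation workflow only affects hooks that are called through after_val_epoch. Therefore, the only difference between [('train', 1), ('val', 1)] and [('train', 1)] is that the runner computes losses on the validation set after each training epoch.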
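
The interaction between the workflow list and the epoch budget can be sketched as a small loop. run_workflow below is a hypothetical simplification of how an epoch-based runner might walk the workflow, written under the assumption stated in note 2 that only training epochs count toward the total; it performs no training and is not the MMCV runner. The workflow is assumed to contain a 'train' phase.

```python
def run_workflow(workflow, max_epochs):
    """Return the sequence of (phase, epoch) steps an epoch-based loop runs."""
    executed = []
    train_epoch = 0
    while train_epoch < max_epochs:
        for mode, epochs in workflow:
            for _ in range(epochs):
                if mode == 'train':
                    if train_epoch >= max_epochs:
                        break
                    train_epoch += 1  # only training advances the epoch counter
                executed.append((mode, train_epoch))
    return executed


schedule = run_workflow([('train', 1), ('val', 1)], max_epochs=2)
print(schedule)
```

With the default workflow [('train', 1)] the same function yields only training steps, which matches the description above.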

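9.1.6 Customize hooks

1. Implement a new hook

In some cases users may need to implement a new hook. MMRotate supports customized hooks in training, so users can implement a hook directly in mmrotate or in their mmdet-based codebase and use it by modifying only the training config. Here is an example that creates a new hook in mmrotate and uses it in training.

from mmcv.runner import HOOKS, Hook


@HOOKS.register_module()
class MyHook(Hook):

    def __init__(self, a, b):
        pass

    def before_run(self, runner):
        pass

    def after_run(self, runner):
        pass

    def before_epoch(self, runner):
        pass

    def after_epoch(self, runner):
        pass

    def before_iter(self, runner):
        pass

    def after_iter(self, runner):
        pass

Note: depending on the hook's functionality, users need to specify what the hook does at each stage of training in before_run, after_run, before_epoch, after_epoch, before_iter, and after_iter.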

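2. Register the new hook

Next, MyHook needs to be imported. Assuming the file is mmrotate/core/utils/my_hook.py, there are two ways to do that:

  • Modify mmrotate/core/utils/__init__.py to import it.

    The newly defined module should be imported in mmrotate/core/utils/__init__.py so that the registry can find the new module and add it:

from .my_hook import MyHook
  • Use custom_imports in the config to import it manually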
custom_imports = dict(imports=['mmrotate.core.utils.my_hook'], allow_failed_imports=False)

3. Modify the config

custom_hooks = [
    dict(type='MyHook', a=a_value, b=b_value)
]

You can also set the priority of the hook by adding the key priority, set to 'NORMAL' or 'HIGHEST', as below

custom_hooks = [
    dict(type='MyHook', a=a_value, b=b_value, priority='NORMAL')
]

Note: by default, the hook's priority is set to NORMAL during registration.
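
How the priority key affects execution order can be sketched as a stable sort over hook configs. The numeric levels below follow MMCV's priority table (lower values run earlier); order_hooks, EarlyHook, and OtherHook are hypothetical names used only for this illustration, and the sketch orders config dicts rather than real hook objects.

```python
# Priority levels, lower value = runs earlier.
PRIORITY = {
    'HIGHEST': 0, 'VERY_HIGH': 10, 'HIGH': 30, 'ABOVE_NORMAL': 40,
    'NORMAL': 50, 'BELOW_NORMAL': 60, 'LOW': 70, 'VERY_LOW': 90,
    'LOWEST': 100,
}


def order_hooks(hook_cfgs):
    """Sort hook configs by priority; ties keep their registration order."""
    return sorted(
        hook_cfgs, key=lambda cfg: PRIORITY[cfg.get('priority', 'NORMAL')])


hooks = order_hooks([
    dict(type='MyHook'),                              # defaults to NORMAL
    dict(type='TextLoggerHook', priority='VERY_LOW'),
    dict(type='EarlyHook', priority='HIGHEST'),       # hypothetical hook
    dict(type='OtherHook', priority='NORMAL'),        # hypothetical hook
])
names = [h['type'] for h in hooks]
print(names)
```

Because the sort is stable, two hooks with the same priority run in the order they were listed.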

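9.1.7 Use hooks implemented in MMCV

If the hook is already implemented in MMCV, you can directly modify the config to use it, as shown below

4. Example: NumClassCheckHook

A customized hook named NumClassCheckHook is implemented to check whether num_classes in the head matches the length of CLASSES in the dataset.

It is set in default_runtime.py.

custom_hooks = [dict(type='NumClassCheckHook')]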

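9.1.8 Modify default runtime hooks

Some common hooks are not registered through custom_hooks. They are

  • log_config
  • checkpoint_config
  • evaluation
  • lr_config
  • optimizer_config
  • momentum_config

Among these hooks, only the logger hook has VERY_LOW priority; the others have NORMAL priority. The tutorials above already cover how to modify optimizer_config, momentum_config, and lr_config. Here we show what can be done with log_config, checkpoint_config, and evaluation.

Checkpoint config

The MMCV runner uses checkpoint_config to initialize CheckpointHook.

Note: users can set max_keep_ckpts to keep only a small number of checkpoints, or use save_optimizer to decide whether to store the optimizer's state dict. More details of the arguments are in the CheckpointHook documentation.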
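
A minimal sketch of such a setting is shown here. The keys interval, max_keep_ckpts, and save_optimizer are the CheckpointHook arguments mentioned above; the specific values are illustrative assumptions, not recommended defaults.

```python
checkpoint_config = dict(
    interval=1,           # save a checkpoint every epoch
    max_keep_ckpts=3,     # keep only the 3 most recent checkpoints (example value)
    save_optimizer=True)  # also store the optimizer's state dict
```

Keeping the optimizer state makes checkpoints larger but allows training to resume exactly where it stopped.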

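Log config

log_config wraps multiple logger hooks and allows setting the logging interval. MMCV currently supports WandbLoggerHook, MlflowLoggerHook, and TensorboardLoggerHook. Detailed usage can be found in the documentation.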

log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

Evaluation config

The evaluation config is used to initialize EvalHook. Apart from the key interval, the other arguments, such as metric, are passed to dataset.evaluate()

evaluation = dict(interval=1, metric='bbox')

10、Log analysis

tools/analysis_tools/analyze_logs.py plots loss/mAP curves given a training log file. Run pip install seaborn first to install the dependency.

python tools/analysis_tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}]

Examples:

  • Plot the classification loss of some run.
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls
  • Plot the classification and regression losses of some run, and save the figure as a pdf.
python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_bbox --out losses.pdf
  • Compare the bbox mAP of two runs in the same figure.
python tools/analysis_tools/analyze_logs.py plot_curve log1.json log2.json --keys bbox_mAP --legend run1 run2
  • Compute the average training speed.
python tools/analysis_tools/analyze_logs.py cal_train_time log.json [--include-outliers]

The output is expected to look like the following.

-----Analyze train time of work_dirs/some_exp/20190611_192040.log.json-----
slowest epoch 11, average time is 1.2024
fastest epoch 1, average time is 1.1909
time std over epochs is 0.0028
average iter time: 1.1959 s/iter
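
For readers who want to inspect a log by hand, the sketch below computes similar averages. It assumes that each line of the .log.json file is a JSON object and that training records carry "mode", "epoch", and "time" keys; average_iter_time and the sample records are invented for this illustration, and the real tool additionally handles outliers.

```python
import json
from collections import defaultdict
from statistics import mean


def average_iter_time(log_lines):
    """Average the `time` field of training records, grouped by epoch."""
    times = defaultdict(list)
    for line in log_lines:
        record = json.loads(line)
        # Skip the header line and validation records.
        if record.get('mode') == 'train' and 'time' in record:
            times[record['epoch']].append(record['time'])
    per_epoch = {epoch: mean(vals) for epoch, vals in times.items()}
    overall = mean(t for vals in times.values() for t in vals)
    return per_epoch, overall


sample = [
    '{"env_info": "..."}',
    '{"mode": "train", "epoch": 1, "iter": 50, "time": 1.0}',
    '{"mode": "train", "epoch": 1, "iter": 100, "time": 2.0}',
    '{"mode": "val", "epoch": 1, "iter": 10}',
    '{"mode": "train", "epoch": 2, "iter": 50, "time": 3.0}',
]
per_epoch, overall = average_iter_time(sample)
print(per_epoch, overall)
```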

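11、Visualization

11.1 Visualize datasets

tools/misc/browse_dataset.py helps users browse a detection dataset (both images and bounding box annotations) visually, or save the images to a specified directory.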

python tools/misc/browse_dataset.py ${CONFIG} [-h] [--skip-type ${SKIP_TYPE[SKIP_TYPE...]}] [--output-dir ${OUTPUT_DIR}] [--not-show] [--show-interval ${SHOW_INTERVAL}]
