sinian_四年

mmrotate

Reposted: mmrotate

1. Environment setup

1.1 Installing in a virtual environment

Create the virtual environment

conda create -n open-mmlab python=3.7

Install PyTorch (including torchvision and cudatoolkit)

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda install pytorch==1.7.0 cudatoolkit=10.1 torchvision==0.8.0

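Install openmim

pip install openmim

Install mmcv-full, mmdet, and mmrotate through openmim

mim install mmcv-full
mim install mmdet
# mmrotate is a special case: try `mim install mmrotate` first, and fall back to a manual install if that fails.
# For a manual install, download mmrotate from the official site or clone the repository:
git clone https://github.com/open-mmlab/mmrotate.git
cd mmrotate
pip install -r requirements/build.txt
pip install -v -e .  # may need a C++ build toolchain; installing the Visual Studio build tools is enough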

1.2 Installing with Docker

Build an image from the Dockerfile (an alternative to the method above)

# build an image with PyTorch 1.6, CUDA 10.1
docker build -t mmrotate docker/
# start a container from the image
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmrotate/data mmrotate

2. Verifying the installation

# cd mmrotate 
python demo/image_demo.py \
        demo/demo.jpg \
        work_dirs1/oriented_rcnn_r50_fpn_1x_dota_v3/oriented_rcnn_r50_fpn_1x_dota_v3.py \
        work_dirs2/oriented_rcnn_r50_fpn_1x_dota_v3/epoch_12.pth
# Note: work_dirs1 is the directory that holds your config, and work_dirs2 is the directory that holds your weights
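
Before running the demo, it can help to confirm that the packages from section 1 are visible to the active interpreter. The sketch below is only an illustration using the standard library; the helper name `find_missing` is hypothetical, and the import names (for example `mmcv` for mmcv-full) are assumptions about how the packages are exposed.

```python
import importlib.util

# Import names of the packages installed in section 1 (assumed; mmcv-full is imported as "mmcv").
REQUIRED = ("torch", "torchvision", "mmcv", "mmdet", "mmrotate")


def find_missing(packages=REQUIRED):
    """Return the names in `packages` that cannot be located by this interpreter."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    print("all packages found" if not missing else f"missing: {', '.join(missing)}")
```

An empty result only shows that the packages can be found; the demo command above is still the real end-to-end check.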

3. Preparing your own dataset

3.1 Dataset download links

mmrotate currently ships dataset support only for DOTA, SSDD, HRSC, and HRSID. For these datasets you only need to change the path in the config that points to the images and labels (data_root). Any other dataset has to be set up by yourself.

DOTA download: https://captain-whu.github.io/DOTA/dataset.html
SSDD download: https://pan.baidu.com/s/1_uezALB6eZ7DiPIozFoGJQ (password: 0518)
HRSC download: https://aistudio.baidu.com/aistudio/datasetdetail/54106
HRSID download: https://pan.baidu.com/share/init?surl=vks9fj64Bb06U170GNL7mw (password: 0518)

3.2 Dataset directory layout

DOTA dataset layout

# DOTA                         
mmrotate
├── mmrotate
├── tools
├── configs
├── data
│   ├── DOTA
│   │   ├── train
│   │   ├── val
│   │   ├── test

SSDD dataset layout

# ssdd
mmrotate
├── mmrotate
├── tools
├── configs
├── data
│   ├── ssdd
│   │   ├── train
│   │   ├── test

HRSC dataset layout

mmrotate
├── mmrotate
├── tools
├── configs
├── data
│   ├── hrsc
│   │   ├── FullDataSet
│   │   │   ├─ AllImages
│   │   │   ├─ Annotations
│   │   │   ├─ LandMask
│   │   │   ├─ Segmentations
│   │   ├── ImageSets

HRSID dataset layout

mmrotate
├── mmrotate
├── tools
├── configs
├── data
│   ├── hrsid
│   │   ├── trainsplit
│   │   ├── valsplit
│   │   ├── testsplit

3.3 Updating the config files

change data_root in configs/_base_/datasets/dotav1.py to split DOTA dataset.
change data_root in configs/_base_/datasets/ssdd.py to data/ssdd/.
change data_root in configs/_base_/datasets/hrsc.py to data/hrsc/.
change data_root in configs/_base_/datasets/hrisd.py to data/hrsid/.

3.4 Splitting the DOTA dataset

Crop the original images into 1024×1024 patches with an overlap of 200 by running:

python tools/data/dota/split/img_split.py --base-json \
  tools/data/dota/split/split_configs/ss_trainval.json

python tools/data/dota/split/img_split.py --base-json \
  tools/data/dota/split/split_configs/ss_test.json

If you want to get a multiple scale dataset, you can run the following command.

python tools/data/dota/split/img_split.py --base-json \
  tools/data/dota/split/split_configs/ms_trainval.json

python tools/data/dota/split/img_split.py --base-json \
  tools/data/dota/split/split_configs/ms_test.json

Before running these commands, set img_dirs and ann_dirs in the JSON file to your own paths.
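
A patch size of 1024 with an overlap of 200 corresponds to a sliding window that advances 824 pixels at a time. The sketch below shows one way the window origins along a single image axis could be computed, assuming the last window is shifted back so that it ends exactly at the image border. The function name is hypothetical and this is an illustration of the idea, not the code of img_split.py.

```python
def window_starts(length, size=1024, gap=200):
    """Return the start offsets of sliding windows along one axis of `length` pixels."""
    step = size - gap  # 824 for the single-scale setting described above
    if length <= size:
        return [0]  # the image fits in one window
    starts = list(range(0, length - size, step))
    starts.append(length - size)  # final window is aligned with the border
    return starts


print(window_starts(2000))  # [0, 824, 976]
```

Applying the same function to the width and the height gives the grid of patch origins for one image.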

4. Testing a model

There are three modes: single GPU, single node with multiple GPUs, and multiple nodes.

# single-gpu
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]

# multi-gpu
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [optional arguments]

# multi-node in slurm environment
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments] --launcher slurm

Example: following RotatedRetinaNet on the DOTA-1.0 dataset, you can generate a compressed file for online submission. (Change data_root first.)

# single GPU
python ./tools/test.py  \
  configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le90.py \
  checkpoints/SOME_CHECKPOINT.pth --format-only \
  --eval-options submission_dir=work_dirs/Task1_results
# single node, multiple GPUs; the number of GPUs is set to 1 here
./tools/dist_test.sh  \
  configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le90.py \
  checkpoints/SOME_CHECKPOINT.pth 1 --format-only \
  --eval-options submission_dir=work_dirs/Task1_results

You can change the test-set path in data_root to the val set or the trainval set for offline evaluation.

# single GPU
python ./tools/test.py \
  configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le90.py \
  checkpoints/SOME_CHECKPOINT.pth --eval mAP
# single node, multiple GPUs; the number of GPUs is set to 1 here
./tools/dist_test.sh  \
  configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le90.py \
  checkpoints/SOME_CHECKPOINT.pth 1 --eval mAP

Visualizing the results

python ./tools/test.py \
  configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le90.py \
  checkpoints/SOME_CHECKPOINT.pth \
  --show-dir work_dirs/vis

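5. Training a model

There are five ways to train a model: on a single GPU, on multiple GPUs, on multiple machines, with jobs managed by Slurm, and with multiple jobs launched on one machine.

1. Training on a single GPU

# single GPU; to set the working directory on the command line, add --work-dir ${YOUR_WORK_DIR}
python tools/train.py ${CONFIG_FILE} [optional arguments]

2. Training on multiple GPUs

# single node, multiple GPUs
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]

The optional arguments are:
--no-validate (not recommended): by default the codebase runs evaluation during training. Pass --no-validate to disable this.
--work-dir ${WORK_DIR}: overrides the working directory specified in the config file.
--resume-from ${CHECKPOINT_FILE}: resumes from a previous checkpoint file.
Difference between resume-from and load-from: resume-from loads both the model weights and the optimizer state, and the epoch is inherited from the given checkpoint. It is typically used to resume a training run that was interrupted unexpectedly. load-from loads only the model weights, and training starts from epoch 0. It is typically used for fine-tuning.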
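
The difference between resume-from and load-from can be pictured with a toy checkpoint. The sketch below uses plain dicts; the checkpoint layout (`state_dict`, `optimizer`, `meta["epoch"]`) and the function name are assumptions made for illustration, not the actual loading code.

```python
def initial_state(checkpoint, mode):
    """Return what training would start from for mode 'resume_from' or 'load_from'."""
    if mode == "resume_from":
        # Weights, optimizer state, and epoch counter are all restored.
        return {
            "weights": checkpoint["state_dict"],
            "optimizer": checkpoint["optimizer"],
            "epoch": checkpoint["meta"]["epoch"],
        }
    if mode == "load_from":
        # Only the weights are reused; the optimizer is fresh and the epoch restarts at 0.
        return {"weights": checkpoint["state_dict"], "optimizer": None, "epoch": 0}
    raise ValueError(f"unknown mode: {mode!r}")


ckpt = {"state_dict": {"w": 1.0}, "optimizer": {"momentum": 0.9}, "meta": {"epoch": 7}}
print(initial_state(ckpt, "resume_from")["epoch"])  # 7
print(initial_state(ckpt, "load_from")["epoch"])    # 0
```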

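3. Training on multiple machines

If you launch on multiple machines that are connected only by Ethernet, you can simply run the following commands:
# on the first machine:
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
# on the second machine:
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
Note: without a high-speed interconnect such as InfiniBand, this is usually slow.

4. Launching multiple jobs on one machine

If you launch multiple jobs on a single machine, e.g. two 4-GPU training jobs on a machine with 8 GPUs, you need to give each job a different port (29500 by default) to avoid communication conflicts.

(1) If you launch training jobs with dist_train.sh, you can set the port in the command.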

CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4

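(2) If you launch training jobs with Slurm, you need to modify the config files (usually the 6th line from the bottom of the config file) to set different communication ports.

# in config1.py:
dist_params = dict(backend='nccl', port=29500)
# in config2.py:
dist_params = dict(backend='nccl', port=29501)
# launch the two jobs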
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR}
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}
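
When several jobs share a machine, picking ports by hand is error-prone. One possible approach, sketched below with the standard library only, is to ask the operating system for an unused port and pass it as PORT or in dist_params. The helper name is hypothetical, and the port is only known to be free at the moment of the call, so another process could still claim it before the job starts.

```python
import socket


def find_free_port():
    """Ask the OS for a TCP port that is currently unused on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(f"PORT={find_free_port()}")
```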

5. Benchmark and model zoo

5.1 Benchmark and model zoo

Rotated RetinaNet-OBB/HBB (ICCV’2017)
Rotated FasterRCNN-OBB (TPAMI’2017)
Rotated RepPoints-OBB (ICCV’2019)
RoI Transformer (CVPR’2019)
Gliding Vertex (TPAMI’2020)
Rotated ATSS-OBB (CVPR’2020)
CSL (ECCV’2020)
R3Det (AAAI’2021)
S2A-Net (TGRS’2021)
ReDet (CVPR’2021)
Beyond Bounding-Box (CVPR’2021)
Oriented R-CNN (ICCV’2021)
GWD (ICML’2021)
KLD (NeurIPS’2021)
SASM (AAAI’2022)
KFIoU (arXiv)
G-Rep (stay tuned)

5.2 Results on DOTA v1.0

Backbone	mAP	Angle	lr schd	Mem (GB)	Inf Time (fps)	Aug	Batch Size	Configs	Download
ResNet50 (1024,1024,200)	59.44	oc	1x	3.45	15.6	-	2	rotated_reppoints_r50_fpn_1x_dota_oc	model | log
ResNet50 (1024,1024,200)	64.55	oc	1x	3.38	15.7	-	2	rotated_retinanet_hbb_r50_fpn_1x_dota_oc	model | log
ResNet50 (1024,1024,200)	65.59	oc	1x	3.12	18.5	-	2	rotated_atss_hbb_r50_fpn_1x_dota_oc	model | log
ResNet50 (1024,1024,200)	66.45	oc	1x	3.53	15.3	-	2	sasm_reppoints_r50_fpn_1x_dota_oc	model | log
ResNet50 (1024,1024,200)	68.42	le90	1x	3.38	16.9	-	2	rotated_retinanet_obb_r50_fpn_1x_dota_le90	model | log
ResNet50 (1024,1024,200)	68.79	le90	1x	2.36	22.4	-	2	rotated_retinanet_obb_r50_fpn_fp16_1x_dota_le90	model | log
ResNet50 (1024,1024,200)	69.49	le135	1x	4.05	8.6	-	2	g_reppoints_r50_fpn_1x_dota_le135	model | log
ResNet50 (1024,1024,200)	69.51	le90	1x	4.40	24.0	-	2	rotated_retinanet_obb_csl_gaussian_r50_fpn_fp16_1x_dota_le90	model | log
ResNet50 (1024,1024,200)	69.55	oc	1x	3.39	15.5	-	2	rotated_retinanet_hbb_gwd_r50_fpn_1x_dota_oc	model | log
ResNet50 (1024,1024,200)	69.60	le90	1x	3.38	15.1	-	2	rotated_retinanet_hbb_kfiou_r50_fpn_1x_dota_le90	model | log
ResNet50 (1024,1024,200)	69.63	le135	1x	3.45	16.1	-	2	cfa_r50_fpn_1x_dota_le135	model | log
ResNet50 (1024,1024,200)	69.76	oc	1x	3.39	15.6	-	2	rotated_retinanet_hbb_kfiou_r50_fpn_1x_dota_oc	model | log
ResNet50 (1024,1024,200)	69.77	le135	1x	3.38	15.3	-	2	rotated_retinanet_hbb_kfiou_r50_fpn_1x_dota_le135	model | log
ResNet50 (1024,1024,200)	69.79	le135	1x	3.38	17.2	-	2	rotated_retinanet_obb_r50_fpn_1x_dota_le135	model | log
ResNet50 (1024,1024,200)	69.80	oc	1x	3.54	12.4	-	2	r3det_r50_fpn_1x_dota_oc	model | log
ResNet50 (1024,1024,200)	69.94	oc	1x	3.39	15.6	-	2	rotated_retinanet_hbb_kld_r50_fpn_1x_dota_oc	model | log
ResNet50 (1024,1024,200)	70.18	oc	1x	3.23	15.6	-	2	r3det_tiny_r50_fpn_1x_dota_oc	model | log
ResNet50 (1024,1024,200)	70.64	le90	1x	3.12	18.2	-	2	rotated_atss_obb_r50_fpn_1x_dota_le90	model | log
ResNet50 (1024,1024,200)	71.83	oc	1x	3.54	12.4	-	2	r3det_kld_r50_fpn_1x_dota_oc	model | log
ResNet50 (1024,1024,200)	72.29	le135	1x	3.19	18.8	-	2	rotated_atss_obb_r50_fpn_1x_dota_le135	model | log
ResNet50 (1024,1024,200)	72.68	oc	1x	3.62	12.2	-	2	r3det_kfiou_ln_r50_fpn_1x_dota_oc	model | log
ResNet50 (1024,1024,200)	72.76	oc	1x	3.44	14.0	-	2	r3det_tiny_kld_r50_fpn_1x_dota_oc	model | log
ResNet50 (1024,1024,200)	73.23	le90	1x	8.45	16.4	-	2	gliding_vertex_r50_fpn_1x_dota_le90	model | log
ResNet50 (1024,1024,200)	73.40	le90	1x	8.46	16.5	-	2	rotated_faster_rcnn_r50_fpn_1x_dota_le90	model | log
ResNet50 (1024,1024,200)	73.45	oc	40e	3.45	16.1	-	2	cfa_r50_fpn_40e_dota_oc	model | log
ResNet50 (1024,1024,200)	73.91	le135	1x	3.14	15.5	-	2	s2anet_r50_fpn_1x_dota_le135	model | log
ResNet50 (1024,1024,200)	74.19	le135	1x	2.17	17.4	-	2	s2anet_r50_fpn_fp16_1x_dota_le135	model | log
ResNet50 (1024,1024,200)	75.63	le90	1x	7.37	21.2	-	2	oriented_rcnn_r50_fpn_fp16_1x_dota_le90	model | log
ResNet50 (1024,1024,200)	75.69	le90	1x	8.46	16.2	-	2	oriented_rcnn_r50_fpn_1x_dota_le90	model | log
ResNet50 (1024,1024,200)	75.75	le90	1x	7.56	19.3	-	2	roi_trans_r50_fpn_fp16_1x_dota_le90	model | log
ReResNet50 (1024,1024,200)	75.99	le90	1x	7.71	13.3	-	2	redet_re50_refpn_fp16_1x_dota_le90	model | log
ResNet50 (1024,1024,200)	76.08	le90	1x	8.67	14.4	-	2	roi_trans_r50_fpn_1x_dota_le90	model | log
ResNet50 (1024,1024,200)	76.50	le90	1x		17.5	MS+RR	2	rotated_retinanet_obb_r50_fpn_1x_dota_ms_rr_le90	model | log
ReResNet50 (1024,1024,200)	76.68	le90	1x	9.32	10.9	-	2	redet_re50_refpn_1x_dota_le90	model | log
Swin-tiny (1024,1024,200)	77.51	le90	1x		10.9	-	2	roi_trans_swin_tiny_fpn_1x_dota_le90	model | log
ResNet50 (1024,1024,200)	79.66	le90	1x		14.4	MS+RR	2	roi_trans_r50_fpn_1x_dota_ms_rr_le90	model | log
ReResNet50 (1024,1024,200)	79.87	le90	1x		10.9	MS+RR	2	redet_re50_refpn_1x_dota_ms_rr_le90	model | log

MS means multiple scale image split.
RR means random rotation.
The above models are trained with 1 * 1080Ti/2080Ti and inferred with 1 * 2080Ti.
model|log : model weight|log

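6. Understanding config files

6.1 Updating config keys through a chain of dicts

Config options can be specified following the order of the dict keys in the original config. For example, --cfg-options model.backbone.norm_eval=False switches all BN modules in the model backbone to train mode.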

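6.2 Updating keys inside a list of configs

Some config dicts are composed as a list in the config. For example, the training pipeline data.train.pipeline is normally a list, e.g. [dict(type='LoadImageFromFile'), ...]. To change LoadImageFromFile to LoadImageFromWebcam in that pipeline, specify --cfg-options data.train.pipeline.0.type=LoadImageFromWebcam.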
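
The dotted keys in 6.1 and 6.2 can be read as a path through nested dicts and lists, where a numeric part such as 0 indexes into a list. The sketch below mimics that behaviour on plain Python containers. The function `set_by_path` is hypothetical and is not the implementation used by the command-line tools.

```python
def set_by_path(cfg, dotted_key, value):
    """Set `value` at `dotted_key` in nested dicts/lists; numeric parts index into lists."""
    *parents, last = dotted_key.split(".")
    node = cfg
    for part in parents:
        node = node[int(part)] if isinstance(node, list) else node[part]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


cfg = {
    "model": {"backbone": {"norm_eval": True}},
    "data": {"train": {"pipeline": [{"type": "LoadImageFromFile"}]}},
}
set_by_path(cfg, "model.backbone.norm_eval", False)
set_by_path(cfg, "data.train.pipeline.0.type", "LoadImageFromWebcam")
print(cfg["model"]["backbone"]["norm_eval"])        # False
print(cfg["data"]["train"]["pipeline"][0]["type"])  # LoadImageFromWebcam
```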

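6.3 Updating values of lists/tuples

The value to update may itself be a list or a tuple. For example, the config file normally sets workflow=[('train', 1)]. To change this key, specify --cfg-options workflow="[(train,1),(val,1)]". Note that the quotation mark " is required to support list/tuple data types, and no whitespace is allowed inside the quotation marks of the specified value.

6.4 Config file naming convention

{model}_[model setting]_{backbone}_{neck}_[norm setting]_[misc]_[gpu x batch_per_gpu]_{dataset}_{data setting}_{angle version}

{xxx} is a required field and [yyy] is optional.

{model}: model type, e.g. rotated_faster_rcnn, rotated_retinanet.
[model setting]: specific setting for some models, e.g. hbb for rotated_retinanet.
{backbone}: backbone type, e.g. r50 (ResNet-50), swin_tiny (SWIN-tiny).
{neck}: neck type, e.g. fpn, refpn.
[norm_setting]: bn (Batch Normalization) is used unless specified; other norm layer types are gn (Group Normalization) and syncbn (Synchronized Batch Normalization). gn-head/gn-neck means GN is applied only in the head/neck, while gn-all means GN is applied in the whole model, i.e. backbone, neck, and head.
[misc]: miscellaneous settings/plugins of the model, e.g. dconv, gcb, attention, albu, mstrain.
[gpu x batch_per_gpu]: number of GPUs and samples per GPU; 1xb2 is used by default.
{dataset}: dataset, e.g. dota.
{angle version}: angle definition, e.g. oc, le135, or le90.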
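
Because {angle version} is the last field of the name, it can be read off a config file name mechanically. The sketch below does only that, assuming the three angle versions listed above are the only valid values. The helper name is hypothetical, and the other fields are not parsed because several of them contain underscores themselves.

```python
ANGLE_VERSIONS = ("oc", "le90", "le135")


def angle_version_of(config_name):
    """Return the trailing angle-version token of a config name, or None if absent."""
    stem = config_name.rsplit("/", 1)[-1]  # drop any directory part
    if stem.endswith(".py"):
        stem = stem[:-3]
    last = stem.rsplit("_", 1)[-1]
    return last if last in ANGLE_VERSIONS else None


print(angle_version_of("configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le90.py"))  # le90
```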

6.5 Config example

The config of RotatedRetinaNet with ResNet50 and FPN

angle_version = 'oc'  # The angle version
model = dict(
    type='RotatedRetinaNet',  # The name of detector
    backbone=dict(  # The config of backbone
        type='ResNet',  # The type of the backbone
        depth=50,  # The depth of backbone
        num_stages=4,  # Number of stages of the backbone.
        out_indices=(0, 1, 2, 3),  # The index of output feature maps produced in each stages
        frozen_stages=1,  # The weights in the first stage are frozen
        zero_init_residual=False,  # Whether to use zero init for last norm layer in resblocks to let them behave as identity.
        norm_cfg=dict(  # The config of normalization layers.
            type='BN',  # Type of norm layer, usually it is BN or GN
            requires_grad=True),  # Whether to train the gamma and beta in BN
        norm_eval=True,  # Whether to freeze the statistics in BN
        style='pytorch',  # The style of backbone, 'pytorch' means that stride 2 layers are in 3x3 conv, 'caffe' means stride 2 layers are in 1x1 convs.
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),  # The ImageNet pretrained backbone to be loaded
    neck=dict(
        type='FPN',  # The neck of detector is FPN. We also support 'ReFPN'
        in_channels=[256, 512, 1024, 2048],  # The input channels, this is consistent with the output channels of backbone
        out_channels=256,  # The output channels of each level of the pyramid feature map
        start_level=1,  # Index of the start input backbone level used to build the feature pyramid
        add_extra_convs='on_input',  # It specifies the source feature map of the extra convs
        num_outs=5),  # The number of output scales
    bbox_head=dict(
        type='RotatedRetinaHead',  # The type of bbox head is 'RotatedRetinaHead'
        num_classes=15,  # Number of classes for classification
        in_channels=256,  # Input channels for bbox head
        stacked_convs=4,  # Number of stacking convs of the head
        feat_channels=256,  # Number of hidden channels
        assign_by_circumhbbox='oc',  # The angle version of obb2hbb
        anchor_generator=dict(  # The config of anchor generator
            type='RotatedAnchorGenerator',  # The type of anchor generator
            octave_base_scale=4,  # The base scale of octave.
            scales_per_octave=3,  #  Number of scales for each octave.
            ratios=[1.0, 0.5, 2.0],  # The ratio between height and width.
            strides=[8, 16, 32, 64, 128]),  # The strides of the anchor generator. This is consistent with the FPN feature strides.
        bbox_coder=dict(  # Config of box coder to encode and decode the boxes during training and testing
            type='DeltaXYWHAOBBoxCoder',  # Type of box coder.
            angle_range='oc',  # The angle version of box coder.
            norm_factor=None,  # The norm factor of box coder.
            edge_swap=False,  # The edge swap flag of box coder.
            proj_xy=False,  # The project flag of box coder.
            target_means=(0.0, 0.0, 0.0, 0.0, 0.0),  # The target means used to encode and decode boxes
            target_stds=(1.0, 1.0, 1.0, 1.0, 1.0)),  # The standard deviations used to encode and decode boxes
        loss_cls=dict(  # Config of loss function for the classification branch
            type='FocalLoss',  # Type of loss for classification branch
            use_sigmoid=True,  #  Whether the prediction is used for sigmoid or softmax
            gamma=2.0,  # The gamma for calculating the modulating factor
            alpha=0.25,  # A balanced form for Focal Loss
            loss_weight=1.0),  # Loss weight of the classification branch
        loss_bbox=dict(  # Config of loss function for the regression branch
            type='L1Loss',  # Type of loss
            loss_weight=1.0)),  # Loss weight of the regression branch
    train_cfg=dict(  # Config of training hyperparameters
        assigner=dict(  # Config of assigner
            type='MaxIoUAssigner',  # Type of assigner
            pos_iou_thr=0.5,  # IoU >= threshold 0.5 will be taken as positive samples
            neg_iou_thr=0.4,  # IoU < threshold 0.4 will be taken as negative samples
            min_pos_iou=0,  # The minimal IoU threshold to take boxes as positive samples
            ignore_iof_thr=-1,  # IoF threshold for ignoring bboxes
            iou_calculator=dict(type='RBboxOverlaps2D')),  # Type of Calculator for IoU
        allowed_border=-1,  # The border allowed after padding for valid anchors.
        pos_weight=-1,  # The weight of positive samples during training.
        debug=False),  # Whether to set the debug mode
    test_cfg=dict(  # Config of testing hyperparameters
        nms_pre=2000,  # The number of boxes before NMS
        min_bbox_size=0,  # The allowed minimal box size
        score_thr=0.05,  # Threshold to filter out boxes
        nms=dict(iou_thr=0.1), # NMS threshold
        max_per_img=2000))  # The number of boxes to be kept after NMS.
dataset_type = 'DOTADataset'  # Dataset type, this will be used to define the dataset
data_root = '../datasets/split_1024_dota1_0/'  # Root path of data
img_norm_cfg = dict(  # Image normalization config to normalize the input images
    mean=[123.675, 116.28, 103.53],  # Mean values used to pre-training the pre-trained backbone models
    std=[58.395, 57.12, 57.375],  # Standard deviations used when pre-training the pre-trained backbone models
    to_rgb=True)  # The channel orders of image used to pre-training the pre-trained backbone models
train_pipeline = [  # Training pipeline
    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path
    dict(type='LoadAnnotations',  # Second pipeline to load annotations for current image
         with_bbox=True),  # Whether to use bounding box, True for detection
    dict(type='RResize',  # Augmentation pipeline that resize the images and their annotations
         img_scale=(1024, 1024)),  # The largest scale of image
    dict(type='RRandomFlip',  # Augmentation pipeline that flip the images and their annotations
         flip_ratio=0.5,  # The ratio or probability to flip
         version='oc'),  # The angle version
    dict(
        type='Normalize',  # Augmentation pipeline that normalize the input images
        mean=[123.675, 116.28, 103.53],  # These keys are the same of img_norm_cfg since the
        std=[58.395, 57.12, 57.375],  # keys of img_norm_cfg are used here as arguments
        to_rgb=True),
    dict(type='Pad',  # Padding config
         size_divisor=32),  # The number the padded images should be divisible
    dict(type='DefaultFormatBundle'),  # Default format bundle to gather data in the pipeline
    dict(type='Collect',  # Pipeline that decides which keys in the data should be passed to the detector
         keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path
    dict(
        type='MultiScaleFlipAug',  # An encapsulation that encapsulates the testing augmentations
        img_scale=(1024, 1024),  # Decides the largest scale for testing, used for the Resize pipeline
        flip=False,  # Whether to flip images during testing
        transforms=[
            dict(type='RResize'),  # Use resize augmentation
            dict(
                type='Normalize',  # Normalization config, the values are from img_norm_cfg
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad',  # Padding config to pad images divisible by 32.
                 size_divisor=32),
            dict(type='DefaultFormatBundle'),  # Default format bundle to gather data in the pipeline
            dict(type='Collect',  # Collect pipeline that collect necessary keys for testing.
                 keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,  # Batch size of a single GPU
    workers_per_gpu=2,  # Worker to pre-fetch data for each single GPU
    train=dict(  # Train dataset config
        type='DOTADataset',  # Type of dataset
        ann_file=
        '../datasets/split_1024_dota1_0/trainval/annfiles/',  # Path of annotation file
        img_prefix=
        '../datasets/split_1024_dota1_0/trainval/images/',  # Prefix of image path
        pipeline=[  # pipeline, this is passed by the train_pipeline created before.
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='RResize', img_scale=(1024, 1024)),
            dict(type='RRandomFlip', flip_ratio=0.5, version='oc'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ],
        version='oc'),
    val=dict(  # Validation dataset config
        type='DOTADataset',
        ann_file=
        '../datasets/split_1024_dota1_0/trainval/annfiles/',
        img_prefix=
        '../datasets/split_1024_dota1_0/trainval/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        version='oc'),
    test=dict(  # Test dataset config, modify the ann_file for test-dev/test submission
        type='DOTADataset',
        ann_file=
        '../datasets/split_1024_dota1_0/test/images/',
        img_prefix=
        '../datasets/split_1024_dota1_0/test/images/',
        pipeline=[  # Pipeline is passed by test_pipeline created before
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        version='oc'))
evaluation = dict(  # The config to build the evaluation hook
    interval=12,  # Evaluation interval
    metric='mAP')  # Metrics used during evaluation
optimizer = dict(  # Config used to build optimizer
    type='SGD',  # Type of optimizers
    lr=0.0025,  # Learning rate of optimizers
    momentum=0.9,  # Momentum
    weight_decay=0.0001)  # Weight decay of SGD
optimizer_config = dict(  # Config used to build the optimizer hook
    grad_clip=dict(
        max_norm=35,
        norm_type=2))
lr_config = dict(  # Learning rate scheduler config used to register LrUpdater hook
    policy='step',  # The policy of scheduler
    warmup='linear',  # The warmup policy, also support `exp` and `constant`.
    warmup_iters=500,  # The number of iterations for warmup
    warmup_ratio=0.3333333333333333,  # The ratio of the starting learning rate used for warmup
    step=[8, 11])  # Steps to decay the learning rate
runner = dict(
    type='EpochBasedRunner',  # Type of runner to use (i.e. IterBasedRunner or EpochBasedRunner)
    max_epochs=12) # Runner that runs the workflow in total max_epochs. For IterBasedRunner use `max_iters`
checkpoint_config = dict(  # Config to set the checkpoint hook
    interval=12)  # The save interval is 12
log_config = dict(  # config to register logger hook
    interval=50,  # Interval to print the log
    hooks=[
        # dict(type='TensorboardLoggerHook')  # The Tensorboard logger is also supported
        dict(type='TextLoggerHook')
    ])  # The logger used to record the training process.
dist_params = dict(backend='nccl')  # Parameters to setup distributed training, the port can also be set.
log_level = 'INFO'  # The level of logging.
load_from = None  # load models as a pre-trained model from a given path. This will not resume training.
resume_from = None  # Resume checkpoints from a given path, the training will be resumed from the epoch at which the checkpoint was saved.
workflow = [('train', 1)]  # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once. The workflow trains the model for 12 epochs according to max_epochs.
work_dir = './work_dirs/rotated_retinanet_hbb_r50_fpn_1x_dota_oc'  # Directory to save the model checkpoints and logs for the current experiments.
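
To see what the optimizer and lr_config settings above amount to, the learning rate can be computed by hand. The sketch below assumes a decay factor of 0.1 at each step (a common default that the config above does not state), 0-based epoch counting, and a linear warmup that starts from warmup_ratio times the regular rate. The function name is hypothetical.

```python
def learning_rate(epoch, cur_iter, base_lr=0.0025, steps=(8, 11), gamma=0.1,
                  warmup_iters=500, warmup_ratio=1.0 / 3):
    """LR at 0-based `epoch` and global iteration `cur_iter` under step decay + linear warmup."""
    # Step policy: multiply by gamma once for every step boundary already reached.
    regular_lr = base_lr * gamma ** sum(epoch >= s for s in steps)
    if cur_iter < warmup_iters:
        # Linear warmup: scale grows from warmup_ratio (iteration 0) towards 1.
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        return regular_lr * (1 - k)
    return regular_lr


for epoch, it in [(0, 0), (0, 250), (0, 500), (8, 10**6), (11, 10**6)]:
    print(epoch, it, learning_rate(epoch, it))
```

Under these assumptions the rate starts at one third of 0.0025, reaches 0.0025 after 500 iterations, and drops tenfold at epochs 8 and 11.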

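6.6 FAQ

Some intermediate variables are used in the config files, such as train_pipeline/test_pipeline in the datasets. Note that when you modify an intermediate variable in a child config, you need to pass it into the corresponding fields again. For example, suppose we want to train RoI-Trans with an offline multi-scale strategy. train_pipeline is the intermediate variable we want to modify.

# We first define the new train_pipeline/test_pipeline and pass them into data.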
_base_ = ['./roi_trans_r50_fpn_1x_dota_le90.py']

data_root = '../datasets/split_ms_dota1_0/'
angle_version = 'le90'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(
        type='RRandomFlip',
        flip_ratio=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal'],
        version=angle_version),
    dict(
        type='PolyRandomRotate',
        rotate_ratio=0.5,
        angles_range=180,
        auto_bound=False,
        version=angle_version),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
data = dict(
    train=dict(
        pipeline=train_pipeline,
        ann_file=data_root + 'trainval/annfiles/',
        img_prefix=data_root + 'trainval/images/'),
    val=dict(
        ann_file=data_root + 'trainval/annfiles/',
        img_prefix=data_root + 'trainval/images/'),
    test=dict(
        ann_file=data_root + 'test/images/',
        img_prefix=data_root + 'test/images/'))

# Similarly, to switch from SyncBN to BN or MMSyncBN, every norm_cfg in the config has to be replaced.
_base_ = './roi_trans_r50_fpn_1x_dota_le90.py'
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    backbone=dict(norm_cfg=norm_cfg),
    neck=dict(norm_cfg=norm_cfg),
    ...)
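
Why the intermediate variable has to be passed again can be shown with plain dicts. In the sketch below, `merge` is a hypothetical stand-in for config inheritance (nested dicts are merged, other values are replaced); it is an illustration of the idea, not the real config loader.

```python
def merge(base, child):
    """Recursively merge `child` into a copy of `base` (dicts merge, other values replace)."""
    merged = dict(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Base config: the variable is evaluated once and its value is stored in data.train.pipeline.
base_pipeline = [dict(type="LoadImageFromFile"), dict(type="RResize")]
base = dict(train_pipeline=base_pipeline, data=dict(train=dict(pipeline=base_pipeline)))

new_pipeline = base_pipeline + [dict(type="PolyRandomRotate")]

# Child config that only redefines the intermediate variable.
only_variable = merge(base, dict(train_pipeline=new_pipeline))
# Child config that also passes the variable into data.train.pipeline again.
passed_again = merge(base, dict(train_pipeline=new_pipeline,
                                data=dict(train=dict(pipeline=new_pipeline))))

print(len(only_variable["data"]["train"]["pipeline"]))  # 2: the dataset still uses the old pipeline
print(len(passed_again["data"]["train"]["pipeline"]))   # 3: the new pipeline is in effect
```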

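7. Custom datasets

7.1 Reorganizing new data into an existing format

# The simplest way is to convert your dataset to an existing dataset format (DOTA).
# Annotation txt file in DOTA format:
184 2875 193 2923 146 2932 137 2885 plane 0
66 2095 75 2142 21 2154 11 2107 plane 0
...
# Each line represents one object and records it as a 10-element array A.
# A[0:8]: the polygon, in the format (x1, y1, x2, y2, x3, y3, x4, y4).
# A[8]: the category.
# A[9]: the difficulty.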
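
A single annotation line in this format can be split into its three parts with a few lines of Python. The sketch below is a minimal parser for illustration only; the function name is hypothetical, and it assumes whitespace-separated fields with an integer difficulty flag.

```python
def parse_dota_line(line):
    """Split one DOTA annotation line into (polygon, category, difficulty)."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    polygon = [float(v) for v in fields[:8]]  # x1, y1, ..., x4, y4
    return polygon, fields[8], int(fields[9])


polygon, category, difficulty = parse_dota_line("184 2875 193 2923 146 2932 137 2885 plane 0")
print(category, difficulty)  # plane 0
```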

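7.2 Customizing a new dataset

Modify the config file to use the custom dataset.
Check the annotations of the custom dataset.

7.3 Example: customizing a new dataset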

Here is an example that walks through the two steps above. It trains an existing RotatedRetinaNet R50-FPN detector on a custom 5-class dataset in DOTA format.

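1. Modify the config file to use the custom dataset

Two parts of the config file need to change:

The data field. Specifically, you need to explicitly add the classes field to data.train, data.val, and data.test.

The num_classes field in the model part. Explicitly overwrite every default value of num_classes (e.g. 80 in COCO) with the number of your classes.

# configs/my_custom_config.py: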

# the new config inherits the base configs to highlight the necessary modification
_base_ = './rotated_retinanet_hbb_r50_fpn_1x_dota_oc.py'

# 1. dataset settings
dataset_type = 'DOTADataset'
classes = ('a', 'b', 'c', 'd', 'e')
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        # explicitly add your class names to the field `classes`
        classes=classes,
        ann_file='path/to/your/train/annotation_data',
        img_prefix='path/to/your/train/image_data'),
    val=dict(
        type=dataset_type,
        # explicitly add your class names to the field `classes`
        classes=classes,
        ann_file='path/to/your/val/annotation_data',
        img_prefix='path/to/your/val/image_data'),
    test=dict(
        type=dataset_type,
        # explicitly add your class names to the field `classes`
        classes=classes,
        ann_file='path/to/your/test/annotation_data',
        img_prefix='path/to/your/test/image_data'))

# 2. model settings
model = dict(
    bbox_head=dict(
        type='RotatedRetinaHead',
        # explicitly overwrite every `num_classes` field, from the default 15 to 5.
        num_classes=5))

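2. Check the annotations of the custom dataset

Assuming your custom dataset is in DOTA format, make sure its annotations are correct:

The classes field in the config file must contain exactly the same elements, in the same order, as the A[8] values in the txt annotations. MMRotate automatically maps the non-contiguous ids in the categories to contiguous label indices, so the order of the names in the classes field determines the order of the label indices. The order of classes in the config also determines the label text shown when predicted bounding boxes are visualized.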
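
A quick consistency check along these lines can be sketched as follows. `build_label_map` and `find_unknown_categories` are hypothetical helpers, not MMRotate functions, and the sketch assumes each object line has exactly 10 whitespace-separated fields.

```python
from typing import Dict, Iterable, Sequence, Set


def build_label_map(classes: Sequence[str]) -> Dict[str, int]:
    """Map each class name to a label index, following the order in `classes`."""
    return {name: index for index, name in enumerate(classes)}


def find_unknown_categories(annotation_lines: Iterable[str],
                            classes: Sequence[str]) -> Set[str]:
    """Return category names used in annotations but missing from `classes`."""
    known = set(classes)
    unknown = set()
    for line in annotation_lines:
        fields = line.split()
        # An object line has 10 fields; fields[8] is the category (A[8]).
        if len(fields) == 10 and fields[8] not in known:
            unknown.add(fields[8])
    return unknown


classes = ('a', 'b', 'c', 'd', 'e')
lines = [
    '184 2875 193 2923 146 2932 137 2885 a 0',
    '66 2095 75 2142 21 2154 11 2107 plane 0',
]
print(build_label_map(classes))
print(find_unknown_categories(lines, classes))
```

Any name reported by the second function is a category that the config does not list, which usually means a typo in the annotations or a missing entry in classes.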

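7.4 Customizing datasets with dataset wrappers

MMRotate also supports several dataset wrappers that mix datasets or change the dataset distribution for training. It currently supports three wrappers:

RepeatDataset: simply repeats the whole dataset.
ClassBalancedDataset: repeats the dataset in a class-balanced manner.
ConcatDataset: concatenates datasets.

7.4.1 Repeating a dataset

# We use `RepeatDataset` as a wrapper to repeat a dataset. For example, suppose the original dataset is `Dataset_A`; to repeat it N times, the config looks like this: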
dataset_A_train = dict(
        type='RepeatDataset',
        times=N,
        dataset=dict(  # This is the original config of Dataset_A
            type='Dataset_A',
            ...
            pipeline=train_pipeline
        )
    )
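
The idea behind the wrapper can be sketched in a few lines. `RepeatedView` below is a hypothetical stand-in written for illustration, not the class used by MMRotate; it only shows how an index into the repeated dataset maps back to the original one.

```python
from typing import Any, Sequence


class RepeatedView:
    """Present `dataset` as if it were repeated `times` times (illustrative only)."""

    def __init__(self, dataset: Sequence[Any], times: int) -> None:
        self.dataset = dataset
        self.times = times
        self._original_len = len(dataset)

    def __len__(self) -> int:
        return self.times * self._original_len

    def __getitem__(self, idx: int) -> Any:
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Wrap around: index 4 of a 3-item dataset is item 1 of the original.
        return self.dataset[idx % self._original_len]


view = RepeatedView(['img_0', 'img_1', 'img_2'], times=2)
print(len(view), view[4])
```

No data is copied; one "epoch" over the wrapper simply visits each original sample `times` times, which reduces per-epoch overhead on small datasets.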

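7.4.2 Class-balanced dataset

# We use the ClassBalancedDataset wrapper to repeat a dataset according to category frequency. The dataset to be repeated must implement
# self.get_cat_ids(idx) to support ClassBalancedDataset. For example, to repeat Dataset_A with oversample_thr=1e-3, the config looks like this: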
dataset_A_train = dict(
        type='ClassBalancedDataset',
        oversample_thr=1e-3,
        dataset=dict(  # This is the original config of Dataset_A
            type='Dataset_A',
            ...
            pipeline=train_pipeline
        )
    )
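
To show what oversample_thr controls, here is a sketch of the repeat-factor rule as I understand it: each category c gets r(c) = max(1, sqrt(thr / f(c))), where f(c) is the fraction of images containing c, and each image is repeated ceil(max r(c)) times over its categories. The function name `repeat_factors` and the set-per-image input are assumptions made for this example; the real wrapper obtains categories through get_cat_ids.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def repeat_factors(image_categories: Sequence[Set[int]],
                   oversample_thr: float) -> List[int]:
    """Return how many times each image would be repeated (illustrative sketch)."""
    num_images = len(image_categories)
    # f(c): number of images that contain category c (sets count each once).
    counts = Counter(cat for cats in image_categories for cat in cats)
    # r(c) = max(1, sqrt(thr / f(c))): rare categories get a factor above 1.
    category_repeat = {
        cat: max(1.0, math.sqrt(oversample_thr / (count / num_images)))
        for cat, count in counts.items()
    }
    # r(I) = max over the categories in image I, rounded up to whole repeats.
    return [
        math.ceil(max((category_repeat[cat] for cat in cats), default=1.0))
        for cats in image_categories
    ]


# Category 0 appears in every image, category 1 in only one of four.
print(repeat_factors([{0}, {0}, {0}, {0, 1}], oversample_thr=0.5))
```

Categories whose frequency is already at or above the threshold keep a factor of 1, so only images containing rare categories are duplicated.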

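7.4.3 Concatenating datasets

There are three ways to concatenate datasets.

1. If the datasets to concatenate are of the same type and differ only in their annotation files, you can concatenate the dataset configs as follows.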

dataset_A_train = dict(
    type='Dataset_A',
    ann_file = ['anno_file_1', 'anno_file_2'],
    pipeline=train_pipeline
)
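# If the concatenated dataset is used for testing or evaluation, this approach evaluates each dataset separately. To evaluate the concatenated datasets as a whole, set separate_eval=False as follows.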
dataset_A_train = dict(
    type='Dataset_A',
    ann_file = ['anno_file_1', 'anno_file_2'],
    separate_eval=False,
    pipeline=train_pipeline
)

2. If the datasets to concatenate are of different types, you can concatenate the dataset configs as follows.

dataset_A_train = dict()
dataset_B_train = dict()

data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train = [
        dataset_A_train,
        dataset_B_train
    ],
    val = dataset_A_val,
    test = dataset_A_test
    )
# Note: if the concatenated dataset is used for testing or evaluation, this approach also evaluates each dataset separately.

3. We also support defining ConcatDataset explicitly, as follows.

dataset_A_val = dict()
dataset_B_val = dict()

data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dataset_A_train,
    val=dict(
        type='ConcatDataset',
        datasets=[dataset_A_val, dataset_B_val],
        separate_eval=False))
# This approach lets users evaluate all the datasets as a single one by setting separate_eval=False.

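4. Notes

The separate_eval=False option assumes that the datasets use self.data_infos during evaluation. COCO datasets therefore do not support this behavior, because they do not rely fully on self.data_infos for evaluation. Combining datasets of different types and evaluating them as a whole has not been tested and is not recommended.
Evaluating ClassBalancedDataset and RepeatDataset is not supported, so evaluating concatenated datasets of these types is not supported either.
A more complex example repeats Dataset_A and Dataset_B N and M times respectively, then concatenates the repeated datasets, as follows.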

dataset_A_train = dict(
    type='RepeatDataset',
    times=N,
    dataset=dict(
        type='Dataset_A',
        ...
        pipeline=train_pipeline
    )
)
dataset_A_val = dict(
    ...
    pipeline=test_pipeline
)
dataset_A_test = dict(
    ...
    pipeline=test_pipeline
)
dataset_B_train = dict(
    type='RepeatDataset',
    times=M,
    dataset=dict(
        type='Dataset_B',
        ...
        pipeline=train_pipeline
    )
)
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train = [
        dataset_A_train,
        dataset_B_train
    ],
    val = dataset_A_val,
    test = dataset_A_test
)

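8. Custom models

Model components fall into five basic types:

backbone: usually an FCN network that extracts feature maps, e.g. ResNet, Swin.
neck: the component between the backbone and the heads, e.g. FPN, ReFPN.
head: the component for a specific task, e.g. bbox prediction.
roi extractor: the part that extracts RoI features from feature maps, e.g. RoI Align Rotated.
loss: the component in the head that computes losses, e.g. FocalLoss, GWDLoss, KFIoULoss.

8.1 Developing new components

8.1.1 Adding a new backbone

1. Define a new backbone (e.g. MobileNet)

# Create a new file mmrotate/models/backbones/mobilenet.py.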
import torch.nn as nn
from mmrotate.models.builder import ROTATED_BACKBONES

@ROTATED_BACKBONES.register_module()
class MobileNet(nn.Module):

    def __init__(self, arg1, arg2):
        pass

    def forward(self, x):  # should return a tuple
        pass

2. Import the module

# Either add the following line to mmrotate/models/backbones/__init__.py
from .mobilenet import MobileNet
# or add the following to the config file to avoid modifying the original code.
custom_imports = dict(
    imports=['mmrotate.models.backbones.mobilenet'],
    allow_failed_imports=False)

3. Use the backbone in your config file

model = dict(
    ...
    backbone=dict(
        type='MobileNet',
        arg1=xxx,
        arg2=xxx),
    ...
)
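
The three steps above (define, import, reference by `type`) all rest on a registry that maps class names to classes. The sketch below is a deliberately simplified, hypothetical `MiniRegistry`, not the registry that MMRotate actually uses; it only illustrates why the module must be imported before `type='MobileNet'` can be resolved.

```python
from typing import Any, Callable, Dict


class MiniRegistry:
    """Simplified stand-in for a name-to-class registry (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    def register_module(self) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            # Registration happens when the decorated class is imported.
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg: Dict[str, Any]) -> Any:
        args = dict(cfg)  # copy so the caller's config is not mutated
        cls_name = args.pop('type')
        if cls_name not in self._modules:
            raise KeyError(f'{cls_name} is not registered in {self.name}')
        return self._modules[cls_name](**args)


TOY_BACKBONES = MiniRegistry('backbone')


@TOY_BACKBONES.register_module()
class ToyBackbone:
    def __init__(self, arg1: int, arg2: int) -> None:
        self.arg1 = arg1
        self.arg2 = arg2


backbone = TOY_BACKBONES.build(dict(type='ToyBackbone', arg1=1, arg2=2))
print(type(backbone).__name__, backbone.arg1, backbone.arg2)
```

If the file defining the class is never imported, the decorator never runs and the lookup fails, which is exactly what the `__init__.py` import or the `custom_imports` entry prevents.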

8.1.2 Adding a new neck

1. Define a neck (e.g. PAFPN)

# Create a new file mmrotate/models/necks/pafpn.py.
import torch.nn as nn
from mmrotate.models.builder import ROTATED_NECKS

@ROTATED_NECKS.register_module()
class PAFPN(nn.Module):

    def __init__(self,
                in_channels,
                out_channels,
                num_outs,
                start_level=0,
                end_level=-1,
                add_extra_convs=False):
        pass

    def forward(self, inputs):
        # implementation is ignored
        pass

2. Import the module

# Either add the following line to mmrotate/models/necks/__init__.py
from .pafpn import PAFPN
# or add the following to the config file to avoid modifying the original code.
custom_imports = dict(
    imports=['mmrotate.models.necks.pafpn'],
    allow_failed_imports=False)

3. Modify the config file

neck=dict(
    type='PAFPN',
    in_channels=[256, 512, 1024, 2048],
    out_channels=256,
    num_outs=5)

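8.1.3 Adding a new head

# First, add a new bbox head in mmrotate/models/roi_heads/bbox_heads/double_bbox_head.py. Double Head R-CNN implements a new bbox head for object detection. To implement a bbox head, we basically need to implement three functions of the new module, as follows.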
from mmrotate.models.builder import ROTATED_HEADS
from mmrotate.models.roi_heads.bbox_heads.bbox_head import BBoxHead

@ROTATED_HEADS.register_module()
class DoubleConvFCBBoxHead(BBoxHead):
    r"""Bbox head used in Double-Head R-CNN

                                      /-> cls
                  /-> shared convs ->
                                      \-> reg
    roi features
                                      /-> cls
                  \-> shared fc    ->
                                      \-> reg
    """  # noqa: W605

    def __init__(self,
                 num_convs=0,
                 num_fcs=0,
                 conv_out_channels=1024,
                 fc_out_channels=1024,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN'),
                 **kwargs):
        kwargs.setdefault('with_avg_pool', True)
        super(DoubleConvFCBBoxHead, self).__init__(**kwargs)

    def forward(self, x_cls, x_reg):
        pass  # implementation omitted

# Second, implement a new RoI head if necessary. We plan to inherit the new DoubleHeadRoIHead from StandardRoIHead. StandardRoIHead already implements the following functions.
import torch
from mmdet.core import bbox2result, bbox2roi, build_assigner, build_sampler
from mmrotate.models.builder import ROTATED_HEADS, build_head, build_roi_extractor
from mmrotate.models.roi_heads.base_roi_head import BaseRoIHead
from mmrotate.models.roi_heads.test_mixins import BBoxTestMixin, MaskTestMixin


@ROTATED_HEADS.register_module()
class StandardRoIHead(BaseRoIHead, BBoxTestMixin, MaskTestMixin):
    """Simplest base roi head including one bbox head and one mask head.
    """

    def init_assigner_sampler(self):
        pass  # body omitted

    def init_bbox_head(self, bbox_roi_extractor, bbox_head):
        pass  # body omitted

    def forward_dummy(self, x, proposals):
        pass  # body omitted

    def forward_train(self,
                      x,
                      img_metas,
                      proposal_list,
                      gt_bboxes,
                      gt_labels,
                      gt_bboxes_ignore=None,
                      gt_masks=None):
        pass  # body omitted

    def _bbox_forward(self, x, rois):
        pass  # body omitted

    def _bbox_forward_train(self, x, sampling_results, gt_bboxes, gt_labels,
                            img_metas):
        pass  # body omitted

    def simple_test(self,
                    x,
                    proposal_list,
                    img_metas,
                    proposals=None,
                    rescale=False):
        """Test without augmentation."""

# The Double Head modification is mainly in the _bbox_forward logic; everything else is inherited from StandardRoIHead. In mmrotate/models/roi_heads/double_roi_head.py, we implement the new RoI head as follows:
from mmrotate.models.builder import ROTATED_HEADS
from mmrotate.models.roi_heads.standard_roi_head import StandardRoIHead


@ROTATED_HEADS.register_module()
class DoubleHeadRoIHead(StandardRoIHead):
    """RoI head for Double Head RCNN

    https://arxiv.org/abs/1904.06493
    """

    def __init__(self, reg_roi_scale_factor, **kwargs):
        super(DoubleHeadRoIHead, self).__init__(**kwargs)
        self.reg_roi_scale_factor = reg_roi_scale_factor

    def _bbox_forward(self, x, rois):
        bbox_cls_feats = self.bbox_roi_extractor(
            x[:self.bbox_roi_extractor.num_inputs], rois)
        bbox_reg_feats = self.bbox_roi_extractor(
            x[:self.bbox_roi_extractor.num_inputs],
            rois,
            roi_scale_factor=self.reg_roi_scale_factor)
        if self.with_shared_head:
            bbox_cls_feats = self.shared_head(bbox_cls_feats)
            bbox_reg_feats = self.shared_head(bbox_reg_feats)
        cls_score, bbox_pred = self.bbox_head(bbox_cls_feats, bbox_reg_feats)

        bbox_results = dict(
            cls_score=cls_score,
            bbox_pred=bbox_pred,
            bbox_feats=bbox_cls_feats)
        return bbox_results

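Finally, users need to add the modules to mmrotate/models/roi_heads/bbox_heads/__init__.py and mmrotate/models/roi_heads/__init__.py so that the corresponding registry can find and load them.

Alternatively, users can add the following to the config file to achieve the same goal.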

custom_imports=dict(
    imports=['mmrotate.models.roi_heads.double_roi_head', 'mmrotate.models.roi_heads.bbox_heads.double_bbox_head'])

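8.1.4 Adding a new loss

Suppose you want to add a new loss, MyLoss, for bounding box regression. To add a new loss function, implement it in mmrotate/models/losses/my_loss.py. The weighted_loss decorator lets the loss be weighted per element.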

import torch
import torch.nn as nn

from mmrotate.models.builder import ROTATED_LOSSES
from mmdet.models.losses.utils import weighted_loss

@weighted_loss
def my_loss(pred, target):
    assert pred.size() == target.size() and target.numel() > 0
    loss = torch.abs(pred - target)
    return loss

@ROTATED_LOSSES.register_module()
class MyLoss(nn.Module):

    def __init__(self, reduction='mean', loss_weight=1.0):
        super(MyLoss, self).__init__()
        self.reduction = reduction
        self.loss_weight = loss_weight

    def forward(self,
                pred,
                target,
                weight=None,
                avg_factor=None,
                reduction_override=None):
        assert reduction_override in (None, 'none', 'mean', 'sum')
        reduction = (
            reduction_override if reduction_override else self.reduction)
        loss_bbox = self.loss_weight * my_loss(
            pred, target, weight, reduction=reduction, avg_factor=avg_factor)
        return loss_bbox
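
To illustrate what a decorator like weighted_loss adds on top of an element-wise loss, here is a torch-free sketch on plain Python lists. `weighted_loss_sketch` is a hypothetical re-implementation of the idea for illustration; its handling of weight, reduction, and avg_factor follows my understanding of the usual convention and should not be read as the library's exact code.

```python
import functools
from typing import Callable, List, Optional, Sequence, Union

Number = Union[int, float]


def weighted_loss_sketch(loss_func: Callable[..., List[float]]):
    """Wrap an element-wise loss with weighting and reduction (illustrative)."""

    @functools.wraps(loss_func)
    def wrapper(pred: Sequence[Number],
                target: Sequence[Number],
                weight: Optional[Sequence[Number]] = None,
                reduction: str = 'mean',
                avg_factor: Optional[Number] = None):
        losses = loss_func(pred, target)  # one unreduced value per element
        if weight is not None:
            losses = [loss * w for loss, w in zip(losses, weight)]
        if reduction == 'none':
            return losses
        if reduction == 'sum':
            if avg_factor is not None:
                raise ValueError('avg_factor cannot be used with reduction="sum"')
            return sum(losses)
        if reduction == 'mean':
            # avg_factor replaces the element count as the denominator.
            denominator = len(losses) if avg_factor is None else avg_factor
            return sum(losses) / denominator
        raise ValueError(f'unknown reduction: {reduction}')

    return wrapper


@weighted_loss_sketch
def l1_loss(pred, target):
    return [abs(p - t) for p, t in zip(pred, target)]


print(l1_loss([1.0, 2.0, 4.0], [0.0, 0.0, 0.0], weight=[1, 0, 1], avg_factor=2))
```

The point of the pattern is that the loss author only writes the element-wise part, while masking out samples (weight 0) and normalizing by a custom count (avg_factor) come for free.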

Then add it to mmrotate/models/losses/__init__.py.

from .my_loss import MyLoss, my_loss

Alternatively, add the following to the config file to achieve the same goal.

custom_imports=dict(
    imports=['mmrotate.models.losses.my_loss'])

To use it, modify the loss_xxx field. Because MyLoss is a regression loss, modify the loss_bbox field in the head.

loss_bbox=dict(type='MyLoss', loss_weight=1.0)

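9. Custom runtime settings

9.1 Customizing optimization settings

9.1.1 Using optimizers supported by PyTorch

All optimizers implemented by PyTorch are supported; the only change needed is the optimizer field of the config file. For example, to use Adam (note that performance may drop considerably), modify the config as follows.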

optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001)
# To change the model's learning rate, just modify lr in the optimizer config. Arguments can be set directly, following PyTorch's API documentation.

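9.1.2 Using a self-implemented optimizer

(1) Define a new optimizer

Suppose you want to add an optimizer named MyOptimizer with arguments a, b, and c. Create a new directory named mmrotate/core/optimizer, then implement the new optimizer in a file there, e.g. mmrotate/core/optimizer/my_optimizer.py: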

from mmdet.core.optimizer.registry import OPTIMIZERS
from torch.optim import Optimizer


@OPTIMIZERS.register_module()
class MyOptimizer(Optimizer):

    def __init__(self, a, b, c):
        pass  # implementation omitted

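(2) Add the optimizer to the registry

For the module defined above to be found, it must first be imported into the main namespace. There are two ways to do that.

Modify mmrotate/core/optimizer/__init__.py to import it.

The newly defined module should be imported in mmrotate/core/optimizer/__init__.py so that the registry finds the new module and adds it:

from .my_optimizer import MyOptimizer

Use custom_imports in the config to import it manually.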

custom_imports = dict(imports=['mmrotate.core.optimizer.my_optimizer'], allow_failed_imports=False)

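The module mmrotate.core.optimizer.my_optimizer is imported at the start of the program, and the MyOptimizer class is then registered automatically. Note that only the package containing the MyOptimizer class should be imported; mmrotate.core.optimizer.my_optimizer.MyOptimizer cannot be imported directly.

In fact, with this import approach users can use a completely different directory layout, as long as the module root can be located in PYTHONPATH.

(3) Specify the optimizer in the config file

You can then use MyOptimizer in the optimizer field of the config file. In the configs, optimizers are defined by the optimizer field, like this: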

optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)

To use your own optimizer, change the field to:

optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value)

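9.1.3 Customizing the optimizer constructor

Some models need parameter-specific optimization settings, e.g. weight decay for BatchNorm layers. Users can make such fine-grained parameter adjustments by customizing the optimizer constructor.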

from mmcv.utils import build_from_cfg

from mmcv.runner.optimizer import OPTIMIZER_BUILDERS, OPTIMIZERS
from mmrotate.utils import get_root_logger
from .my_optimizer import MyOptimizer


@OPTIMIZER_BUILDERS.register_module()
class MyOptimizerConstructor(object):

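    def __init__(self, optimizer_cfg, paramwise_cfg=None):
        pass  # implementation omitted

    def __call__(self, model):
        my_optimizer = ...  # build the optimizer from the model's parameters here
        return my_optimizer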
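# The default optimizer constructor is implemented in MMCV (DefaultOptimizerConstructor); it can also serve as a template for a new optimizer constructor.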

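9.1.4 Additional settings

Tricks not implemented by the optimizer should be implemented through the optimizer constructor (e.g. setting parameter-wise learning rates) or through hooks. Below are some common settings that can stabilize or accelerate training. Feel free to open a PR with more settings.

Use gradient clipping to stabilize training: some models need gradient clipping to stabilize the training process. An example: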

optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))

If your config inherits a base config that already sets optimizer_config, you may need _delete_=True to override the unneeded settings. See the config documentation for details.
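
For readers who want to see what max_norm and norm_type mean numerically, here is a torch-free sketch of norm-based gradient clipping on a flat list of gradient values. The function `clip_grad_norm_sketch` is a hypothetical illustration of the standard rule (scale all gradients by max_norm / total_norm when the total norm exceeds max_norm), not the call that the training loop actually makes.

```python
from typing import List


def clip_grad_norm_sketch(grads: List[float],
                          max_norm: float,
                          norm_type: float = 2.0) -> float:
    """Scale `grads` in place so their norm is at most `max_norm`.

    Returns the total norm measured before clipping.
    """
    total_norm = sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)
    # The small epsilon avoids division by zero when all gradients are zero.
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        for i, g in enumerate(grads):
            grads[i] = g * clip_coef
    return total_norm


grads = [30.0, 40.0]  # L2 norm is 50, above the max_norm of 35
norm_before = clip_grad_norm_sketch(grads, max_norm=35, norm_type=2)
print(norm_before, grads)
```

Because every gradient is multiplied by the same coefficient, the update direction is preserved and only its length is capped.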

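Use momentum scheduling to accelerate convergence: a momentum scheduler adjusts the model's momentum according to the learning rate, which can make the model converge faster. The momentum scheduler is usually used together with the LR scheduler; for example, the following config is used in 3D detection to accelerate convergence. For more details, see the implementations of CyclicLrUpdater and CyclicMomentumUpdater.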

lr_config = dict(
    policy='cyclic',
    target_ratio=(10, 1e-4),
    cyclic_times=1,
    step_ratio_up=0.4,
)
momentum_config = dict(
    policy='cyclic',
    target_ratio=(0.85 / 0.95, 1),
    cyclic_times=1,
    step_ratio_up=0.4,
)
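
The shape of such a schedule can be sketched as follows. This is a simplified illustration that assumes a single cycle and cosine annealing between the phase endpoints; `cyclic_value` and `annealing_cos` are names chosen for this example, and the real updater hooks may differ in details.

```python
import math
from typing import Tuple


def annealing_cos(start: float, end: float, factor: float) -> float:
    """Move from `start` (factor 0) to `end` (factor 1) along a cosine curve."""
    return end + 0.5 * (start - end) * (math.cos(math.pi * factor) + 1.0)


def cyclic_value(base: float,
                 cur_iter: int,
                 max_iters: int,
                 target_ratio: Tuple[float, float] = (10.0, 1e-4),
                 step_ratio_up: float = 0.4) -> float:
    """Value of a one-cycle schedule at `cur_iter` (0 <= cur_iter < max_iters)."""
    iter_up = int(step_ratio_up * max_iters)
    if iter_up <= 0 or iter_up >= max_iters:
        raise ValueError('max_iters is too small for this step_ratio_up')
    peak = base * target_ratio[0]
    final = base * target_ratio[1]
    if cur_iter < iter_up:
        # Phase 1: ramp from the base value up to base * target_ratio[0].
        return annealing_cos(base, peak, cur_iter / iter_up)
    # Phase 2: anneal from the peak down to base * target_ratio[1].
    return annealing_cos(peak, final, (cur_iter - iter_up) / (max_iters - iter_up))


for it in (0, 20, 40, 70, 99):
    print(it, round(cyclic_value(0.01, it, max_iters=100), 6))
```

With the lr_config above, the learning rate rises to 10 times its base value during the first 40% of training and then decays to 1e-4 times the base; the momentum config moves in the opposite direction over the same phases.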

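9.1.5 Customizing the workflow

A workflow is a list of (phase, epochs) pairs that specifies the running order and the number of epochs. By default it is set to

workflow = [('train', 1)]

which means running 1 epoch of training. Sometimes users may want to check some metrics (e.g. loss, accuracy) of the model on the validation set. In that case, the workflow can be set to

# so that 1 epoch of training and 1 epoch of validation run alternately.
[('train', 1), ('val', 1)]

Notes:

The model's parameters are not updated during a val epoch.
The total_epochs keyword in the config only controls the number of training epochs and does not affect the validation workflow.
The workflows [('train', 1), ('val', 1)] and [('train', 1)] do not change the behavior of EvalHook, because EvalHook is called by after_train_epoch, and the validation workflow only affects hooks that are called through after_val_epoch. The only difference between [('train', 1), ('val', 1)] and [('train', 1)] is therefore that the runner computes losses on the validation set after each training epoch.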

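9.1.6 Customizing hooks

1. Implement a new hook

In some cases users may need to implement a new hook. MMRotate supports custom hooks in training, so users can implement a hook directly in mmrotate or in their mmdet-based codebase and use it just by modifying the training config. Here is an example of creating a new hook in mmrotate and using it in training.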

from mmcv.runner import HOOKS, Hook


@HOOKS.register_module()
class MyHook(Hook):

    def __init__(self, a, b):
        pass

    def before_run(self, runner):
        pass

    def after_run(self, runner):
        pass

    def before_epoch(self, runner):
        pass

    def after_epoch(self, runner):
        pass

    def before_iter(self, runner):
        pass

    def after_iter(self, runner):
        pass

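Note: depending on what the hook is for, users need to specify what it does at each stage of training in before_run, after_run, before_epoch, after_epoch, before_iter, and after_iter.

2. Register the new hook

Next, MyHook needs to be imported. Assuming the file is mmrotate/core/utils/my_hook.py, there are two ways to do that:

Modify mmrotate/core/utils/__init__.py to import it.

The newly defined module should be imported in mmrotate/core/utils/__init__.py so that the registry finds the new module and adds it:

from .my_hook import MyHook

Use custom_imports in the config to import it manually.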

custom_imports = dict(imports=['mmrotate.core.utils.my_hook'], allow_failed_imports=False)

3. Modify the config

custom_hooks = [
    dict(type='MyHook', a=a_value, b=b_value)
]

You can also set the priority of the hook to 'NORMAL' or 'HIGHEST' by adding the key priority, as follows:

custom_hooks = [
    dict(type='MyHook', a=a_value, b=b_value, priority='NORMAL')
]

Note: by default, the hook's priority is set to NORMAL during registration.
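
How the stage methods and the priority interact can be sketched with a toy runner. `MiniRunner` and `RecordingHook` are hypothetical classes for illustration only, and the numeric priority values are an assumption modeled on my recollection of MMCV's levels (lower value runs first).

```python
from typing import List, Tuple

# Assumed numeric levels; a lower value means the hook is called earlier.
PRIORITY = {'HIGHEST': 0, 'NORMAL': 50, 'VERY_LOW': 90, 'LOWEST': 100}


class RecordingHook:
    """Toy hook that records every stage it is called at."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    def _record(self, stage: str) -> None:
        self.log.append(f'{self.name}.{stage}')

    def before_run(self, runner): self._record('before_run')
    def after_run(self, runner): self._record('after_run')
    def before_epoch(self, runner): self._record('before_epoch')
    def after_epoch(self, runner): self._record('after_epoch')
    def before_iter(self, runner): self._record('before_iter')
    def after_iter(self, runner): self._record('after_iter')


class MiniRunner:
    """Toy training loop that calls registered hooks in priority order."""

    def __init__(self) -> None:
        self._hooks: List[Tuple[int, RecordingHook]] = []

    def register_hook(self, hook: RecordingHook, priority: str = 'NORMAL') -> None:
        value = PRIORITY[priority]
        # Insert behind hooks of equal or higher priority to keep ties stable.
        idx = len(self._hooks)
        while idx > 0 and self._hooks[idx - 1][0] > value:
            idx -= 1
        self._hooks.insert(idx, (value, hook))

    def call_hook(self, stage: str) -> None:
        for _, hook in self._hooks:
            getattr(hook, stage)(self)

    def run(self, epochs: int, iters_per_epoch: int) -> None:
        self.call_hook('before_run')
        for _ in range(epochs):
            self.call_hook('before_epoch')
            for _ in range(iters_per_epoch):
                self.call_hook('before_iter')
                self.call_hook('after_iter')  # the training step would sit between
            self.call_hook('after_epoch')
        self.call_hook('after_run')


log: List[str] = []
runner = MiniRunner()
runner.register_hook(RecordingHook('A', log))                      # NORMAL
runner.register_hook(RecordingHook('B', log), priority='HIGHEST')
runner.run(epochs=1, iters_per_epoch=1)
print(log[:4])
```

Even though hook A is registered first, hook B is called first at every stage because of its higher priority.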

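9.1.7 Using hooks implemented in MMCV

If the hook is already implemented in MMCV, you can use it directly by modifying the config, as shown below.

4. Example: `NumClassCheckHook`

We implement a custom hook named NumClassCheckHook to check whether num_classes in the head matches the length of CLASSES in the dataset.

We set it in default_runtime.py.

custom_hooks = [dict(type='NumClassCheckHook')]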

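9.1.8 Modifying default runtime hooks

Some common hooks are not registered through custom_hooks. They are:

log_config
checkpoint_config
evaluation
lr_config
optimizer_config
momentum_config

Among these hooks, only the logger hook has VERY_LOW priority; the others have NORMAL priority. The tutorials above already cover how to modify optimizer_config, momentum_config, and lr_config. Here we show what can be done with log_config, checkpoint_config, and evaluation.

Checkpoint config

The MMCV runner uses checkpoint_config to initialize CheckpointHook.

Note: users can set max_keep_ckpts to keep only a small number of checkpoints, or use save_optimizer to decide whether to store the optimizer's state dict. See the CheckpointHook documentation in MMCV for more details of the arguments.

Log config

log_config wraps multiple logger hooks and lets you set the logging interval. MMCV currently supports WandbLoggerHook, MlflowLoggerHook, and TensorboardLoggerHook. Detailed usage can be found in the documentation.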

log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

Evaluation config

The evaluation config is used to initialize EvalHook. Apart from the key interval, other arguments such as metric are passed to dataset.evaluate().

evaluation = dict(interval=1, metric='bbox')

10. Log analysis

tools/analysis_tools/analyze_logs.py plots loss/mAP curves from a training log file. Run pip install seaborn first to install the dependency.

python tools/analysis_tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}]

Examples:

Plot the classification loss of a run.

python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls

Plot the classification and regression losses of a run and save the figure as a pdf.

python tools/analysis_tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_bbox --out losses.pdf

Compare the bbox mAP of two runs in the same figure.

python tools/analysis_tools/analyze_logs.py plot_curve log1.json log2.json --keys bbox_mAP --legend run1 run2

Compute the average training speed.

python tools/analysis_tools/analyze_logs.py cal_train_time log.json [--include-outliers]

The output is expected to look like this.

-----Analyze train time of work_dirs/some_exp/20190611_192040.log.json-----
slowest epoch 11, average time is 1.2024
fastest epoch 1, average time is 1.1909
time std over epochs is 0.0028
average iter time: 1.1959 s/iter
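
The numbers in that report can be reproduced with a short script. The sketch below is not the tool's code; it assumes that each line of the .log.json file is a JSON object carrying mode, epoch, and time keys, and that "outliers" means the first iteration of each epoch. The helper names are made up for this example.

```python
import json
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List


def load_train_times(lines: Iterable[str]) -> Dict[int, List[float]]:
    """Collect per-iteration times of training records, grouped by epoch."""
    times = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Skip records that are not training iterations (e.g. env info, val).
        if record.get('mode') == 'train' and 'epoch' in record and 'time' in record:
            times[record['epoch']].append(record['time'])
    return dict(times)


def summarize_train_time(times: Dict[int, List[float]],
                         include_outliers: bool = False) -> Dict[str, float]:
    """Compute slowest/fastest epoch, std over epochs, and average iter time."""
    epoch_means: Dict[int, float] = {}
    kept_all: List[float] = []
    for epoch, values in sorted(times.items()):
        # Assumption: the first iteration of an epoch is the outlier to drop.
        kept = values if include_outliers else values[1:]
        if not kept:
            continue
        epoch_means[epoch] = statistics.mean(kept)
        kept_all.extend(kept)
    if not epoch_means:
        raise ValueError('no training iterations found')
    return {
        'slowest_epoch': max(epoch_means, key=epoch_means.get),
        'fastest_epoch': min(epoch_means, key=epoch_means.get),
        'std_over_epochs': statistics.pstdev(list(epoch_means.values())),
        'average_iter_time': statistics.mean(kept_all),
    }


sample_log = [
    '{"seed": 0}',
    '{"mode": "train", "epoch": 1, "iter": 50, "time": 5.0}',
    '{"mode": "train", "epoch": 1, "iter": 100, "time": 1.0}',
    '{"mode": "train", "epoch": 1, "iter": 150, "time": 1.0}',
    '{"mode": "val", "epoch": 1, "iter": 10}',
    '{"mode": "train", "epoch": 2, "iter": 50, "time": 5.0}',
    '{"mode": "train", "epoch": 2, "iter": 100, "time": 2.0}',
    '{"mode": "train", "epoch": 2, "iter": 150, "time": 2.0}',
]
print(summarize_train_time(load_train_times(sample_log)))
```

Dropping the first record of each epoch keeps warm-up effects such as data-loader startup from inflating the average.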

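11. Visualization

11.1 Visualizing datasets

tools/misc/browse_dataset.py helps users browse a detection dataset visually (both images and bounding box annotations), or save the images to a specified directory.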

python tools/misc/browse_dataset.py ${CONFIG} [-h] [--skip-type ${SKIP_TYPE[SKIP_TYPE...]}] [--output-dir ${OUTPUT_DIR}] [--not-show] [--show-interval ${SHOW_INTERVAL}]
