MMDetection: GPU Training

Prerequisites

${CONFIG_FILE}: a config file under configs/

configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py

${CHECKPOINT_FILE}: path to the model weights

checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth

[--out ${RESULT_FILE}]: where the test results are written

[--eval ${EVAL_METRICS}]: the evaluation metrics to compute

${GPU_NUM}: number of GPUs
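The placeholders above map naturally onto shell variables. The sketch below is one possible way to set them and confirm that the two input files exist before launching anything. `require_file` is a hypothetical helper, not part of MMDetection, the `configs/faster_rcnn/` path assumes the repository layout used in the training examples, and the GPU count is an arbitrary example value.

```shell
#!/bin/sh
# Placeholder values; adjust them to your own checkout.
CONFIG_FILE=configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
CHECKPOINT_FILE=checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
GPU_NUM=4   # assumption: four GPUs are available

# Hypothetical helper: report a missing file and return non-zero.
require_file() {
    if [ ! -f "$1" ]; then
        echo "missing: $1" >&2
        return 1
    fi
}

# Intended to be run from the MMDetection repository root.
if require_file "$CONFIG_FILE" && require_file "$CHECKPOINT_FILE"; then
    echo "inputs found, ready to use $GPU_NUM GPU(s)"
fi
```

Checking the paths first avoids waiting for the framework to start up only to fail on a mistyped file name.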

Testing on a dataset

# single-gpu
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show]

# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
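To make the single-GPU template concrete, here is a minimal dry-run sketch that fills in the placeholders and prints the resulting command rather than executing it. The output file name `results.pkl` and the metric `bbox` are assumptions chosen for illustration, and `test_cmd` is a hypothetical helper.

```shell
#!/bin/sh
CONFIG_FILE=configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
CHECKPOINT_FILE=checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
RESULT_FILE=results.pkl   # assumption: any writable output path
EVAL_METRICS=bbox         # assumption: box metric on a COCO-style dataset

# Hypothetical helper: print the single-GPU test command (dry run).
test_cmd() {
    echo "python tools/test.py $CONFIG_FILE $CHECKPOINT_FILE --out $RESULT_FILE --eval $EVAL_METRICS"
}

# Review the printed line, then run it from the repository root.
test_cmd
```

Printing the command first makes it easy to spot a wrong path or metric before a long evaluation starts.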

Model training

Single machine, single GPU

python tools/train.py ${CONFIG_FILE}

Example:

python tools/train.py ./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py

To set a working directory, append the argument: --work_dir ${WORK_DIR}

Single machine, multiple GPUs

./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]

Example:

./tools/dist_train.sh ./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py 4
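When only some of the GPUs on the machine should be used, one common approach is to restrict them with the standard `CUDA_VISIBLE_DEVICES` environment variable and derive `${GPU_NUM}` from it, so the two never disagree. The sketch below assumes device ids 0 to 3 and uses a hypothetical `count_gpus` helper; it prints the launch command instead of running it.

```shell
#!/bin/sh
# Assumption: expose only GPUs 0-3 to the training job.
CUDA_VISIBLE_DEVICES=0,1,2,3
export CUDA_VISIBLE_DEVICES

# Hypothetical helper: count the comma-separated device ids.
count_gpus() {
    echo "$1" | tr ',' '\n' | grep -c .
}

GPU_NUM=$(count_gpus "$CUDA_VISIBLE_DEVICES")

# Dry run: print the multi-GPU launch command.
echo "./tools/dist_train.sh ./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py $GPU_NUM"
```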

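Optional arguments:
--validate: run evaluation every k epochs during training (k defaults to 1)
--work_dir ${WORK_DIR}: set the working directory
--resume_from ${CHECKPOINT_FILE}: resume from an earlier checkpoint file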
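The working-directory and resume options are often combined so that an interrupted run can be restarted with the same command. The following is a rough sketch only: it assumes the most recent checkpoint is available as `latest.pth` inside the working directory, `resume_args` is a hypothetical helper, and the `work_dirs/...` path is made up for illustration. The flag spellings follow the list above; `python tools/train.py --help` shows what your installed version accepts.

```shell
#!/bin/sh
# Hypothetical helper: print the resume flag only if a checkpoint exists.
# Assumption: the latest checkpoint is stored as latest.pth in the work dir.
resume_args() {
    if [ -e "$1/latest.pth" ]; then
        echo "--resume_from $1/latest.pth"
    fi
}

WORK_DIR=work_dirs/faster_rcnn_r50_fpn_1x_coco   # hypothetical path
CONFIG_FILE=./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py

# Dry run: print the command; resume_args expands to nothing on a fresh run.
echo "./tools/dist_train.sh $CONFIG_FILE 4 --work_dir $WORK_DIR $(resume_args "$WORK_DIR")"
```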

Multiple machines, multiple GPUs

On a cluster managed by Slurm:

./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} [${GPUS}]

Example: train Faster R-CNN on 16 GPUs in the test partition

./tools/slurm_train.sh test Faster_r50_1x configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py /home/xxx/faster_rcnn_r50_fpn_1x 16 
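The positional arguments in this example can be kept in named variables, which also makes it easy to estimate how many nodes the job will span. The sketch below is illustrative only: the figure of 8 GPUs per node is an assumption about the cluster and does not come from the example, and the script prints the launch command rather than submitting it.

```shell
#!/bin/sh
PARTITION=test
JOB_NAME=Faster_r50_1x
CONFIG_FILE=configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
WORK_DIR=/home/xxx/faster_rcnn_r50_fpn_1x
GPUS=16
GPUS_PER_NODE=8   # assumption about the cluster, not taken from the example

# Ceiling division: number of nodes needed to host all requested GPUs.
NODES=$(( (GPUS + GPUS_PER_NODE - 1) / GPUS_PER_NODE ))
echo "requesting $GPUS GPUs, about $NODES node(s) at $GPUS_PER_NODE GPUs each"

# Dry run: print the launch command.
echo "./tools/slurm_train.sh $PARTITION $JOB_NAME $CONFIG_FILE $WORK_DIR $GPUS"
```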

Reference

MMDetection Chinese documentation, 2. Getting Started
