(2022.10.24) Note: the instance maps in the scannet-frames-25k subset use a different format from the raw ScanNetV2 instance maps (i.e. the .png files under scene*_*/instance/ or scene*_*/instance-filt/); [5] mentions this. So if you want to train on the raw ScanNetV2 data, before converting to COCO object detection annotations with the code adapted in this post, you first have to merge the raw ScanNetV2 label maps and instance maps into 25k-style instance maps, roughly:
instance_map = 1000 * raw_label_map + raw_instance_map
These two conversions (label image and instance image) are implemented in convert_scannet_label_image.py and convert_scannet_instance_image.py under ScanNet/BenchmarkScripts/2d_helpers/ in [2]. (That said, if you visualize the raw instance maps yourself, you will find that instance IDs are global within a scan, i.e. the same instance keeps the same ID across frames of the same scan, so instances can be tracked across frames, which may be useful at evaluation time. The instance-map conversion code provided by [2] seems to break this property; if you need it, hack your own version that preserves the global instance IDs.)
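A minimal sketch of that merge (assuming the label map has already been remapped to the benchmark label IDs, i.e. what convert_scannet_label_image.py does; the file names and helper below are mine, not from [2]):
# merge-raw-to-25k.py -- rough sketch, NOT the official conversion code
import numpy as np
from PIL import Image

def merge_to_25k_instance(label_png, instance_png):
    """Combine one frame's label map and instance map (e.g. from
    scene0000_00/label-filt/ and scene0000_00/instance-filt/) into a
    scannet-frames-25k style instance map, keeping the raw, scan-global
    instance ID in the lower digits."""
    label = np.array(Image.open(label_png)).astype(np.int32)
    instance = np.array(Image.open(instance_png)).astype(np.int32)
    # `label` is assumed to be already remapped to the benchmark label IDs
    # (what convert_scannet_label_image.py does).
    # Formula from the note above; pixels without an instance keep the bare
    # label value -- I have not verified that exact convention.
    return np.where(instance > 0, 1000 * label + instance, label)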
The task is to train an object detection model on ScanNet [1,2] using MMDetection [3,4]. Steps:
ScanNet is mainly used in the 3D domain, but its data comes as RGB-D sequences, and the RGB sequences can be treated as videos. There are two releases, v1 and v2; the full v2 is about 1.8 TB. See [1-2,5-7] for more information; the download script download-scannet.py is in [8].
[5] mentions the scannet_frames_25k subset, which is what this post mainly uses. Comparing against the code in [8], it is sampled from the full v2, keeping roughly one frame out of every 100. Run:
python download-scannet.py -o . --preprocessed_frames
python download-scannet.py -o . --test_frames_2d
to download the files (the script did not work for me, so I pasted the download links into Thunder instead). Unzip and inspect the structure:
scannet_frames_25k/
|- scene0000_00/ # one scan
| |- color/ # RGB sequence, i.e. the video (jpg)
| |- depth/ # depth sequence (png)
| |- instance/ # instance mask (png)
| |- label/
| |- pose/
| |- intrinsics_color.txt
| `- intrinsics_depth.txt
|- scene0000_01/ # another scan
...
scannet_frames_test/
|- scene0707_00/
| |- color/
| |- depth/
| |- pose/
| |- intrinsics_color.txt
| `- intrinsics_depth.txt
|- scene0708_00/
...
As can be seen, the test set lacks the label-related files.
The official split is provided in [2], under ScanNet/Tasks/Benchmark/, divided into train/val/test. Comparing the v2 split files (txt) there against the two zips above shows that this subset contains the same scans as the full release; only the frame sequence within each scan is subsampled.
Judging from MMDetection's configuration files, train and val need to sit in different directories, each with its own json annotation file. Since the test set lacks instance/, which the later COCO conversion relies on, this post drops the test data and uses val in its place.
# split-scannet.py
import os
import os.path as osp

"""split ScanNet
Only `scannet_frames_25k/` is used while `scannet_frames_test/` is ignored,
because the scenes in it have no `**/instance/` sub-folder, which
is needed by `convert2panoptic.py`.
So I simply reuse the validation set as the test set, as in the COCO
configuration file in MMDetection.
These 2 subsets are then converted separately to produce the separate
annotation json files needed by the configuration.
"""

DATA_ROOT = "/data"
# DATA_P = [osp.join(DATA_ROOT, p) for p in ("scannet_frames_25k", "scannet_frames_test")]
DATA_P = osp.join(DATA_ROOT, "scannet_frames_25k")

# check the number of scans
dir_list = next(os.walk(DATA_P))[1]
print("#data:", len(dir_list))  # 1513 -> ALL
print("conclusion: contains all data, only frames are down-sampled")

SPLIT_P = osp.join(os.environ["HOME"], "codes", "ScanNet", "Tasks", "Benchmark")
# scannet_frames_25k is only available for ScanNet v2
VER = "v2"
DEST = "data/scannet-frames"  # data/ inside the code directory, NOT /data/
if not osp.exists(DEST):
    os.makedirs(DEST)

# the test set has NO `**/instance/*.png`, which is needed by `convert2panoptic.py`
for subset in ["train", "val"]:
    split_file = osp.join(SPLIT_P, "scannet" + VER + "_" + subset + ".txt")
    # soft-link all scans of this subset to `sub_dest`
    sub_dest = osp.join(DEST, subset)
    if not osp.exists(sub_dest):
        os.makedirs(sub_dest)
    with open(split_file, "r") as f:
        for line in f:
            line = line.strip()
            if "" == line:
                continue
            os.system("ln -s {} {}".format(
                osp.join(DATA_P, line),
                osp.join(sub_dest, line)))
    print(subset, "DONE")
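A quick count of the linked scans per subset (a small check of my own, not required by the pipeline):
# count the scans linked into each subset (values printed, nothing asserted)
import os
import os.path as osp

for subset in ("train", "val"):
    scans = os.listdir(osp.join("data/scannet-frames", subset))
    print(subset, "#scans:", len(scans))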
One of the data organizations recommended by MMDetection is converting to the COCO format [9-11]. rvc_devkit [12] provides a conversion script, rconvert_scannet_coco.sh, whose core is the conversion code from [2]: convert2panoptic.py. But that script is actually prepared for the panoptic segmentation task (item 4 of [9]), whereas here the object detection annotation format (item 1 of [9]) is needed.
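For reference, the entries that the converter below has to emit per image and per bounding box (COCO object detection format, item 1 of [9]) look roughly like this; the concrete values are made up for illustration:
# one entry of the "images" list
image_entry = {
    "id": 0,                                      # integer image ID
    "width": 1296, "height": 968,                 # illustrative values
    "file_name": "scene0000_00/color/000000.jpg",
}
# one entry of the "annotations" list (one per bounding box)
annotation_entry = {
    "id": 0,                       # integer, unique over the whole json
    "image_id": 0,                 # refers to image_entry["id"]
    "category_id": 5,              # e.g. "chair" among the 20 benchmark classes
    "bbox": [100, 200, 50, 80],    # [x, y, width, height] in pixels
    "area": 4000,
    "iscrowd": 0,
    "segmentation": [],            # polygons; left empty here (detection only)
}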
So, based on convert2panoptic.py, I adapted a conversion script for object detection (convert-scannet-coco-objdet.py):
# convert-scannet-coco-objdet.py
#!/usr/bin/python
#
# Convert to COCO-style object detection format (http://cocodataset.org/#format-data).
#
"""iTom's modified version (2022.9.13)
This file is inherited from
rvc_devkit/segmentation/conv_scannet/convert2panoptic.py
which is the same as
ScanNet/BenchmarkScripts/convert2panoptic.py
But I modify it to fit the object detection format and to be
able to distinguish different subsets (i.e. train/val/test)
in terms of the output json annotation files.
There are several modifications:
(a) Additional arguments
- an additional optional argument of `convert2panoptic`:
subset_tag, default = None
- an additional optional command-line argument:
--subset-tag -> args.subsetTag, default = None
If this argument is used, the name of output json annotation file
will be modified accordingly.
(b) Move to COCO object detection annotation format instead of the
original panoptic format. I borrowed the functions, i.e.
- binary_mask_to_polygon
- close_contour
for polygon format segmentation info calculation. But the results
are discarded due to
- their weirdly large volume
(~38G for the val set & ~114G for the training set!)
- that they are not used in the detection task
If you want to re-enable it, you may need to install
- scikit-image
(NOTE: I suspect this is buggy somewhere.)
(c) Change extension to ".jpg" in `images/file_name` field.
"""
# python imports
from __future__ import print_function, absolute_import, division, unicode_literals
from itertools import count
import os
import glob
import sys
import argparse
import json
import numpy as np
# iTom: for polygon calculation
from skimage import measure
# Image processing
from PIL import Image
EVAL_LABELS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33, 34, 36, 39]
EVAL_LABEL_NAMES = ["wall", "floor", "cabinet", "bed", "chair", "sofa", "table", "door", "window", "bookshelf", "picture", "counter", "desk", "curtain", "refrigerator", "shower curtain", "toilet", "sink", "bathtub", "otherfurniture"]
EVAL_LABEL_CATS = ["indoor", "indoor", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "appliance", "furniture", "furniture", "appliance", "furniture", "furniture"]
EVAL_LABEL_COLORS = [(174, 199, 232), (152, 223, 138), (31, 119, 180), (255, 187, 120), (188, 189, 34), (140, 86, 75), (255, 152, 150), (214, 39, 40), (197, 176, 213), (148, 103, 189), (196, 156, 148), (23, 190, 207), (247, 182, 210), (219, 219, 141), (255, 127, 14), (158, 218, 229), (44, 160, 44), (112, 128, 144), (227, 119, 194), (82, 84, 163)]
def splitall(path):
allparts = []
while 1:
parts = os.path.split(path)
if parts[0] == path: # sentinel for absolute paths
allparts.insert(0, parts[0])
break
elif parts[1] == path: # sentinel for relative paths
allparts.insert(0, parts[1])
break
else:
path = parts[0]
allparts.insert(0, parts[1])
return allparts
def close_contour(contour):
"""iTom: helper function for binary mask -> polygon conversion
from: https://github.com/waspinator/pycococreator/blob/master/pycococreatortools/pycococreatortools.py#L20
"""
if not np.array_equal(contour[0], contour[-1]):
contour = np.vstack((contour, contour[0]))
return contour
def binary_mask_to_polygon(binary_mask, tolerance=0):
"""iTom: Converts a binary mask to COCO polygon representation
Args:
binary_mask: a 2D binary numpy array where '1's represent the object
tolerance: Maximum distance from original points of polygon to approximated
polygonal chain. If tolerance is 0, the original coordinate array is returned.
from: https://github.com/waspinator/pycococreator/blob/master/pycococreatortools/pycococreatortools.py#L35
ref:
- https://github.com/cocodataset/cocoapi/issues/131
- https://stackoverflow.com/questions/68663512/image-segmentation-mask-to-polygon-for-coco-json
- https://stackoverflow.com/questions/58884265/python-convert-binary-mask-to-polygon
"""
polygons = []
# pad mask to close contours of shapes which start and end at an edge
padded_binary_mask = np.pad(binary_mask, pad_width=1, mode='constant', constant_values=0)
contours = measure.find_contours(padded_binary_mask, 0.5)
# contours = np.subtract(contours, 1) # iTom: original but buggy
for i in range(len(contours)): # iTom: change to for-loop subtraction
        contours[i] = np.subtract(contours[i], 1)
for contour in contours:
contour = close_contour(contour)
contour = measure.approximate_polygon(contour, tolerance)
if len(contour) < 3:
continue
contour = np.flip(contour, axis=1)
segmentation = contour.ravel().tolist()
# after padding and subtracting 1 we may get -0.5 points in our segmentation
segmentation = [0 if i < 0 else i for i in segmentation]
polygons.append(segmentation)
return polygons
# The main method
def convert2panoptic(scannetPath, outputFolder=None, subset_tag=None, beginAnnoId=0, beginImageId=0, thingsOnly=False):
"""iTom's modification
subset_tag: str, an optional subset distinguishing string for
train/val seperation. One can simply ignore it to get the
original output json file name.
"""
if outputFolder is None:
outputFolder = scannetPath
# find files
search = os.path.join(scannetPath, "*", "instance", "*.png")
files = glob.glob(search)
files.sort()
# quit if we did not find anything
if not files:
print(
"Did not find any files for using matching pattern {}. Please consult the README.".format(search)
)
sys.exit(-1)
# a bit verbose
print("Converting {} annotation files.".format(len(files)))
outputBaseFile = "scannet_objdet"
if subset_tag is not None:
outputBaseFile = outputBaseFile + "_" + subset_tag
print("iTom: modifying json annotation file name to:", outputBaseFile)
outFile = os.path.join(outputFolder, "{}.json".format(outputBaseFile))
print("Json file with the annotations in panoptic format will be saved in {}".format(outFile))
# panopticFolder = os.path.join(outputFolder, outputBaseFile)
# if not os.path.isdir(panopticFolder):
# print("Creating folder {} for panoptic segmentation PNGs".format(panopticFolder))
# os.mkdir(panopticFolder)
# print("Corresponding segmentations in .png format will be saved in {}".format(panopticFolder))
categories = []
cls_is_things = {} # iTom
for idx in range(len(EVAL_LABELS)):
label = EVAL_LABELS[idx]
name = EVAL_LABEL_NAMES[idx]
cat = EVAL_LABEL_CATS[idx]
color = EVAL_LABEL_COLORS[idx]
isthing = label > 2
cls_is_things[int(label)] = isthing # iTom
if thingsOnly and not isthing: # iTom
continue
categories.append({'id': int(label),
'name': name,
'color': color,
'supercategory': cat,
'isthing': isthing})
images = []
annotations = []
for progress, f in enumerate(files):
originalFormat = np.array(Image.open(f))
parts = splitall(f)
fileName = parts[-1]
sceneName = parts[-3]
outputFileName = "{}__{}".format(sceneName, fileName)
inputFileName = os.path.join(sceneName, "color", fileName)
# imageId = os.path.splitext(outputFileName)[0]
imageId = beginImageId
beginImageId += 1
        # image entry; the id is now just a running integer counter (beginImageId)
images.append({"id": imageId,
"width": int(originalFormat.shape[1]),
"height": int(originalFormat.shape[0]),
"file_name": inputFileName.replace(".png", ".jpg")})
# "file_name": inputFileName})
# pan_format = np.zeros(
# (originalFormat.shape[0], originalFormat.shape[1], 3), dtype=np.uint8
# )
segmentIds = np.unique(originalFormat)
segmInfo = []
for i_seg, segmentId in enumerate(segmentIds):
isCrowd = 0
if segmentId < 1000:
semanticId = segmentId
else:
semanticId = segmentId // 1000
if semanticId not in EVAL_LABELS:
continue
if thingsOnly and not cls_is_things[semanticId]: # iTom
continue
mask = originalFormat == segmentId
color = [segmentId % 256, segmentId // 256, segmentId // 256 // 256]
# pan_format[mask] = color
area = np.sum(mask) # segment area computation
# bbox computation for a segment
hor = np.sum(mask, axis=0)
hor_idx = np.nonzero(hor)[0]
x = hor_idx[0]
width = hor_idx[-1] - x + 1
vert = np.sum(mask, axis=1)
vert_idx = np.nonzero(vert)[0]
y = vert_idx[0]
height = vert_idx[-1] - y + 1
bbox = [int(x), int(y), int(width), int(height)]
segmInfo.append({"id": int(segmentId),
"category_id": int(semanticId),
"area": int(area),
"bbox": bbox,
"iscrowd": isCrowd})
            # COCO object detection format:
# - https://cocodataset.org/#format-data
# ref:
# - https://zhuanlan.zhihu.com/p/29393415
# - https://zhuanlan.zhihu.com/p/263454360
            # polygon = binary_mask_to_polygon(mask)  # weirdly large, discarded
polygon = []
# # annoId = imageId + "_" + str(i_seg) # "scene0046_00__000200_2"
# spaceId, scanId = sceneName.split("scene")[1].split("_")
# imgFileNum = os.path.splitext(fileName)[0]
# annoId = int(spaceId) * 1000000 + int(scanId) * 10000 + int(imgFileNum) + i_seg
# # print("annoId:", annoId, "<-", spaceId, scanId, imgFileNum, i_seg)
annoId = beginAnnoId
beginAnnoId += 1
annotations.append({'id': annoId,
'image_id': imageId,
'category_id': int(semanticId),
"segmentation": polygon,
'area': int(area),
'bbox': bbox,
"iscrowd": isCrowd})
# break # debug
## iTom: original panoptic annotation, removed
# annotations.append({'image_id': imageId,
# 'file_name': outputFileName,
# "segments_info": segmInfo})
# Image.fromarray(pan_format).save(os.path.join(panopticFolder, outputFileName))
print("\rProgress: {:>3.2f} %".format((progress + 1) * 100 / len(files)), end=' ')
sys.stdout.flush()
# break # debug
print("\nSaving the json file {}".format(outFile))
d = {'images': images,
'annotations': annotations,
'categories': categories}
with open(outFile, 'w') as f:
json.dump(d, f, sort_keys=True, indent=4)
return beginAnnoId, beginImageId
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--dataset-folder",
dest="scannetPath",
help="path to the ScanNet data 'scannet_frames_25k' folder",
required=True,
type=str)
parser.add_argument("--output-folder",
dest="outputFolder",
help="path to the output folder.",
default=None,
type=str)
# iTom-added, optional
parser.add_argument("--subset-tag",
dest="subsetTag",
help="(iTom, optional) distinguishing str for train/val separation",
default=None,
type=str)
parser.add_argument("--begin-anno-id",
dest="beginAnnoId",
help="(iTom) annotation IDs will start from this number." \
"When convert for each subset sequentially, " \
"use this to ensure that there is no duplicated annotation ID",
default=0,
type=int)
parser.add_argument("--begin-image-id",
dest="beginImageId",
help="(iTom) image IDs will start from this number." \
"When convert for each subset sequentially, " \
"use this to ensure that there is no duplicated annotation ID",
default=0,
type=int)
parser.add_argument('--things-only',
dest="thingsOnly",
action="store_true",
help="keep thing classes & drop stuff classes")
args = parser.parse_args()
last_unused_anno_id, last_unused_image_id = convert2panoptic(
args.scannetPath, args.outputFolder, args.subsetTag, args.beginAnnoId, args.beginImageId, args.thingsOnly)
# record the last unused annotation & image ID to interact with `scripts/split-cvt2coco.sh`
with open("last-unused-anno-id.txt", "w") as f:
f.write(str(last_unused_anno_id))
with open("last-unused-image-id.txt", "w") as f:
f.write(str(last_unused_image_id))
# call the main
if __name__ == "__main__":
main()
The body is still convert2panoptic.py; the changes are as follows. The annotations entries now follow the COCO object detection format described in [9,10]. The segmentation field is supposed to use the polygon format (the original convert2panoptic.py hard-codes isCrowd = 0, and COCO expects polygons when iscrowd is 0); the binary-mask-to-polygon code is adapted from [13], see also [14-16], but the resulting polygons are discarded in the end because of their size. The length of annotations, i.e. the number of annotation entries, equals the number of bounding boxes in the whole (sub)set, so the annotations/id field just needs to give every bbox a distinct integer ID. Note that it really has to be a number, otherwise an error is raised complaining that it cannot be converted to a number. To guarantee the uniqueness of annotations/id I also measured this dataset: the numbers in the image file names are all multiples of 100 and do not exceed 100 after dividing by 100, and the number of segments per image also stays under 100, which is where the (now commented-out) annoId encoding in the code came from; the final code simply uses the running counters beginAnnoId / beginImageId instead. Finally, the images/file_name field has its .png suffix replaced with .jpg: the inherited code keeps the .png suffix of the instance maps, while the RGB frames under color/ actually end in .jpg. Invocation:
#!/bin/bash
# split-cvt2coco.sh
DEST=data/scannet-frames  # output of the split step above
anno_id=0 # interact with `convert-scannet-coco-objdet.py`
image_id=0
for subset in train val; do
python convert-scannet-coco-objdet.py \
--dataset-folder $DEST/$subset \
--output-folder $DEST \
--things-only \
--subset-tag $subset \
--begin-anno-id $anno_id \
--begin-image-id $image_id
# update beginning (i.e. last unused) annotation & image ID
anno_id=`cat last-unused-anno-id.txt`
image_id=`cat last-unused-image-id.txt`
done
rm last-unused-anno-id.txt
rm last-unused-image-id.txt
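A quick sanity check of the converted annotations (my own helper, not part of the toolchain above): it verifies that image and annotation IDs do not collide across the two json files.
# check-coco-json.py -- optional sanity check of the converted annotations
import json
import os.path as osp

DEST = "data/scannet-frames"
img_ids, anno_ids = set(), set()
for subset in ("train", "val"):
    with open(osp.join(DEST, "scannet_objdet_{}.json".format(subset))) as f:
        d = json.load(f)
    print(subset, "#images:", len(d["images"]),
          "#annotations:", len(d["annotations"]),
          "#categories:", len(d["categories"]))
    for im in d["images"]:
        assert im["id"] not in img_ids, "duplicated image id across subsets"
        img_ids.add(im["id"])
    for a in d["annotations"]:
        assert a["id"] not in anno_ids, "duplicated annotation id across subsets"
        anno_ids.add(a["id"])
print("OK: all image/annotation IDs are unique")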
Because files from the MMDetection repository are needed later, MMDetection is installed from source, following the installation guide get_started.md in [4]. Installation script:
#!/bin/bash
# env-mmdetection.sh
echo "set up the conda virtual environment"
CONDA_P=~/miniconda3
ENV=openmmlab
if [ ! -d $CONDA_P/envs/$ENV ]; then
conda create --name $ENV python=3.8 -y
fi
CONDA_BIN=$CONDA_P/envs/$ENV/bin
$CONDA_BIN/pip install torch==1.8.2 torchvision==0.9.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
# used in mmdetection/demo/video_gpuaccel_demo.py
$CONDA_BIN/pip install ffmpegcv scipy scikit-image
conda install -n $ENV ffmpeg -y
$CONDA_BIN/pip install -U openmim
$CONDA_BIN/mim install mmcv-full==1.6.0 mmengine
# avoid bug: KeyError: 'Cascade Mask R-CNN'
# (i.e. open-mmlab/mim issues #125)
# https://github.com/open-mmlab/mim/issues/125
$CONDA_BIN/mim install mmdet==2.24.0
if [ ! -d mmdetection ]; then
echo try to clone from the original github repo
git clone https://github.com/open-mmlab/mmdetection.git
# git submodule add https://github.com/open-mmlab/mmdetection.git
if [ $? -ne 0 ]; then
        echo "* FAILED to clone from github"
echo clone from a gitee transit repo instead
git clone https://gitee.com/xoxleoxox/mmdetection
# git submodule add https://gitee.com/xoxleoxox/mmdetection
fi
fi
cd mmdetection
$CONDA_BIN/pip install -v -e .
echo "verify the installation"
$CONDA_BIN/mim download mmdet --config yolov3_mobilenetv2_320_300e_coco --dest .
$CONDA_BIN/python demo/image_demo.py demo/demo.jpg yolov3_mobilenetv2_320_300e_coco.py \
yolov3_mobilenetv2_320_300e_coco_20210719_215349-d18dff72.pth --device cpu --out-file result.jpg
Note: with a plain source install, an error showed up when verifying the installation / running the demo (see [17]); that is why the script has a line installing mmdet==2.24.0. But training with mmdet 2.24.0 raises another error, so a newer version still has to be installed from source afterwards, which overwrites the old 2.24.0.
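When debugging this kind of version conflict, it helps to check which versions actually ended up in the environment, e.g.:
# print the versions installed in the openmmlab env
import torch, mmcv, mmdet
print("torch:", torch.__version__, "| cuda:", torch.version.cuda)
print("mmcv :", mmcv.__version__)
print("mmdet:", mmdet.__version__)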
Since MMDetection's code is needed, the MMDetection repository is added to the project directory as a git submodule, following [18-20]. Script:
#!/bin/bash
# add-submodules.sh
# echo rvc_devkit
# git submodule add https://github.com/ozendelait/rvc_devkit.git
# if [ $? -ne 0 ]; then
# git submodule add https://gitee.com/tyloeng/rvc_devkit.git
# fi
echo mmdetection
git submodule add https://github.com/open-mmlab/mmdetection.git
if [ $? -ne 0 ]; then
git submodule add https://gitee.com/xoxleoxox/mmdetection
fi
# echo ScanNet
# git submodule add https://github.com/ScanNet/ScanNet.git
# if [ $? -ne 0 ]; then
# git submodule add https://gitee.com/gxdcode/ScanNet.git
# fi
git submodule update --init --recursive
git submodule update --remote
#CONDA_P=~/miniconda3
#ENV=openmmlab
#CONDA_BIN=$CONDA_P/envs/$ENV/bin
#cd rvc_devkit
#$CONDA_BIN/pip install -r requirements.txt
#cd objdet
#$CONDA_BIN/pip install -r requirements.txt
To train an existing model on a new dataset with MMDetection, see the examples 2_new_data_model.md and 1_exist_data_model.md in [4]. The data was prepared above; what remains is mainly the configuration files. Mirroring the layout of [4], I created a configs/ directory in my own project and copied two configuration files from [4], renaming them (see the tree below):
The project directory now looks like this:
my-project/
|- convert-scannet-coco-objdet.py
|- split-scannet.py
|- data/
| `- scannet-frames/
| |- train/ # produced by the split step
| |- val/ # produced by the split step
| |- scannet_objdet_train/ # produced by the annotation conversion
| |- scannet_objdet_val/ # produced by the annotation conversion
| |- scannet_objdet_train.json # produced by the annotation conversion
| `- scannet_objdet_val.json # produced by the annotation conversion
|- mmdetection/ # submodule
|- configs/ # mirrors the layout of mmdetection/configs/
| |- common/
| | `- mstrain_3x_scannet.py
| `- faster_rcnn/
| `- faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py
`- scripts/
|- add-submodules.sh
|- env-mmdetection.sh
|- find_gpu.sh
`- train-faster-rcnn-scannet-frames.sh
The two configuration files are as follows.
configs/common/mstrain_3x_scannet.py: copied from mmdetection/configs/common/mstrain_3x_coco.py, with the data section pointed at my own dataset and the _base_ reference path adjusted; I also added classes, data/train/dataset/classes, data/val/classes and data/test/classes. (I have not tested this classes change, so I am not sure whether anything else has to be changed accordingly.)
## iTom Notes
# Inherited from `mmdetection/configs/common/mstrain_3x_coco.py`,
# this file is designed for training Faster R-CNN on converted ScanNet-frames-25k.
import os.path as osp
_base_ = '../../mmdetection/configs/_base_/default_runtime.py'
# dataset settings
dataset_type = 'CocoDataset'
classes = (
"wall", "floor", "cabinet", "bed", "chair",
"sofa", "table", "door", "window", "bookshelf",
"picture", "counter", "desk", "curtain", "refrigerator",
"shower curtain", "toilet", "sink", "bathtub", "otherfurniture"
)
data_root = 'data/scannet-frames/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
# In mstrain 3x config, img_scale=[(1333, 640), (1333, 800)],
# multiscale_mode='range'
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(
type='Resize',
img_scale=[(1333, 640), (1333, 800)],
multiscale_mode='range',
keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
# Use RepeatDataset to speed up training
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type='RepeatDataset',
times=3,
dataset=dict(
type=dataset_type,
            ann_file=osp.join(data_root, 'scannet_objdet_train.json'),
img_prefix=osp.join(data_root, 'train/'),
pipeline=train_pipeline,
classes=classes)),
val=dict(
type=dataset_type,
        ann_file=osp.join(data_root, 'scannet_objdet_val.json'),
img_prefix=osp.join(data_root, 'val/'),
pipeline=test_pipeline,
classes=classes),
test=dict(
type=dataset_type,
        ann_file=osp.join(data_root, 'scannet_objdet_val.json'),
img_prefix=osp.join(data_root, 'val/'),
pipeline=test_pipeline,
classes=classes))
evaluation = dict(interval=1, metric='bbox')
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
# Experiments show that using step=[9, 11] has higher performance
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[9, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py: copied from mmdetection/configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_coco.py, with _base_ now referencing the modified common config above (reference paths adjusted) and model/roi_head/bbox_head/num_classes changed to 20. (Again, I have not tested this change, so other corresponding modifications may be needed.)
## iTom Notes
# Inherited from `mmdetection/configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_coco.py`,
# this file is designed for training Faster R-CNN on converted ScanNet-frames-25k.
_base_ = [
# '../common/mstrain_3x_coco.py',
'../common/mstrain_3x_scannet.py',
# '../_base_/models/faster_rcnn_r50_fpn.py'
'../../mmdetection/configs/_base_/models/faster_rcnn_r50_fpn.py'
]
model = dict(
backbone=dict(
type='ResNeXt',
depth=101,
groups=64,
base_width=4,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
style='pytorch',
init_cfg=dict(
type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')),
roi_head=dict(bbox_head=dict(num_classes=20)))
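Before launching training, the merged configuration can be loaded and inspected to confirm that the _base_ chain and the annotation paths resolve as intended (my own check, using mmcv's Config from the MMDetection 2.x stack):
# check-config.py -- print the resolved data/model settings
from mmcv import Config

cfg = Config.fromfile(
    "configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py")
print("num_classes:", cfg.model.roi_head.bbox_head.num_classes)
# train is wrapped in RepeatDataset, so the real dataset sits one level deeper
print("train ann_file:", cfg.data.train.dataset.ann_file)
print("val   ann_file:", cfg.data.val.ann_file)
print("#classes in config:", len(cfg.data.val.classes))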
Distributed training uses the script provided by MMDetection:
#!/bin/bash
# train-faster-rcnn-scannet-frames.sh
clear
echo run \`conda activate openmmlab\` first
config=configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py
. scripts/find_gpu.sh 4 14845
PATH=/usr/local/cuda/bin:$PATH \
PYTHONPATH=mmdetection/mmdet:$PYTHONPATH \
CUDA_VISIBLE_DEVICES=${gpu_id} \
MMDET_DATASETS=`pwd`/data/scannet-frames/ \
bash mmdetection/tools/dist_train.sh \
$config ${n_gpu_found}
# python mmdetection/tools/train.py \
# $config
In this script, /usr/local/cuda/bin is prepended to $PATH so that the nvcc under the CUDA directory is used instead of /usr/bin/nvcc, see [22]. Run bash scripts/train-faster-rcnn-scannet-frames.sh to start training. About num_classes: if it does not match the dataset's class list, MMDetection aborts with an error like "num_classes (3) in Shared2FCBBoxHead of MMDataParallel does not matches the length of CLASSES (80) in CocoDataset" (mmdetection issue #4828).
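To catch that mismatch before a full training run, one can compare the class list in the config with the categories actually present in the converted json (my own snippet; note that with --things-only, wall and floor are absent from the json, so it may legitimately contain fewer categories than the 20 names in classes):
# compare config classes vs. categories in the converted annotations
import json
from mmcv import Config

cfg = Config.fromfile(
    "configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py")
cfg_classes = list(cfg.data.train.dataset.classes)
assert cfg.model.roi_head.bbox_head.num_classes == len(cfg_classes), \
    "num_classes must equal the length of the classes list"
with open("data/scannet-frames/scannet_objdet_train.json") as f:
    json_names = {c["name"] for c in json.load(f)["categories"]}
# e.g. wall/floor show up here when the conversion was run with --things-only
print("in config but absent from json:", [c for c in cfg_classes if c not in json_names])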