Training MMDetection on ScanNet

(2022.10.24) Note: the instance maps in the scannet-frames-25k subset use a format different from the raw ScanNetV2 instance maps (i.e. the .png files under scene*_*/instance/ or scene*_*/instance-filt/); [5] mentions this. So if you want to train on the raw ScanNetV2 data, then before converting to COCO object detection annotations with the conversion code modified in this post, you first have to take the raw ScanNetV2

  • raw instance maps: scene*_*/instance/*.png (or scene*_*/instance-filt/*.png), and
  • raw label maps: scene*_*/label/*.png (or scene*_*/label-filt/*.png)

and merge them into instance maps in the scannet-frames-25k format. The steps are:

  1. The class IDs in the raw label maps are ScanNet's own label IDs; map them to NYU40 class IDs with scannetv2-labels.combined.tsv;
  2. Merge the instance map and the remapped label map: instance_map = 1000 * raw_label_map + raw_instance_map

These two steps are implemented in convert_scannet_label_image.py and convert_scannet_instance_image.py under ScanNet/BenchmarkScripts/2d_helpers/ in [2]. (However, if you visualise a raw instance map yourself, you will notice that instance IDs are global within a scan, i.e. the same instance keeps the same ID across frames of the same scan, so an instance can be tracked across frames, which may come in handy for evaluation. The instance-map conversion code from [2] seems to destroy this property; if you need it, hack your own copy that keeps the global instance IDs.)
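
If you want to do this merge yourself while keeping the global instance IDs, a minimal sketch could look like the following. This is my own sketch, not the helper code from [2]; it assumes scannetv2-labels.combined.tsv has 'id' (ScanNet label ID) and 'nyu40id' columns, and that a frame's label and instance PNGs share the same resolution.

# merge-raw-scannet-maps.py -- my own sketch (not the helper scripts from [2]);
# assumes scannetv2-labels.combined.tsv has 'id' (ScanNet label ID) and
# 'nyu40id' columns, and that a frame's label/instance PNGs share a resolution.
import csv

import numpy as np
from PIL import Image

def load_label_mapping(tsv_path, from_col="id", to_col="nyu40id"):
    """Build {ScanNet label ID -> NYU40 ID} from scannetv2-labels.combined.tsv."""
    mapping = {}
    with open(tsv_path) as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row[from_col]:
                mapping[int(row[from_col])] = int(row[to_col] or 0)
    return mapping

def merge_to_25k_format(label_png, instance_png, mapping):
    """Encode one frame as 1000 * nyu40_label + raw_instance_id (cf. step 2).
    Keeping the raw instance IDs preserves the per-scan global instance IDs."""
    raw_label = np.array(Image.open(label_png), dtype=np.int64)
    raw_inst = np.array(Image.open(instance_png), dtype=np.int64)
    nyu_label = np.zeros_like(raw_label)
    for scannet_id, nyu_id in mapping.items():
        nyu_label[raw_label == scannet_id] = nyu_id
    return (1000 * nyu_label + raw_inst).astype(np.uint16)

# hypothetical usage:
# m = load_label_mapping("scannetv2-labels.combined.tsv")
# merged = merge_to_25k_format("scene0000_00/label-filt/0.png",
#                              "scene0000_00/instance-filt/0.png", m)
# Image.fromarray(merged).save("scene0000_00__0.png")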


The goal: train an object detection model on ScanNet [1,2] using MMDetection [3,4]. Steps:

  1. Download ScanNet-frames-25k (a subset of ScanNet);
  2. Split the dataset;
  3. Convert the annotations to the COCO object detection format;
  4. Set up the MMDetection environment (install from source);
  5. Modify MMDetection files and train.

ScanNet

ScanNet is mainly used in the 3D domain, but its data are RGB-D sequences, whose RGB parts can be treated as videos. There are two versions, v1 and v2; the full v2 is about 1.8T. For more information see [1-2,5-7]; the download script download-scannet.py is in [8].

[5] mentions the scannet_frames_25k subset, which is what this post mainly uses. Comparing with the code in [8], it is sampled from the full v2, roughly one frame out of every 100. The files to download are:

  • scannet_frames_25k.zip, ~5.6G, 1513 scans (i.e. RGB-D sequences, treated here simply as videos);
  • scannet_frames_test.zip, ~610M, 100 scans, the corresponding test set.

Run:

python download-scannet.py -o . --preprocessed_frames
python download-scannet.py -o . --test_frames_2d

to download them (the script did not work for me, so I pasted the download links into Thunder). Unpack and look at the file structure:

scannet_frames_25k/
|- scene0000_00/	# one scan
|  |- color/		# RGB sequence, the video (jpg)
|  |- depth/		# depth sequence (png)
|  |- instance/		# instance masks (png)
|  |- label/
|  |- pose/
|  |- intrinsics_color.txt
|  `- intrinsics_depth.txt
|- scene0000_01/	# another scan
...

scannet_frames_test/
|- scene0707_00/
|  |- color/
|  |- depth/
|  |- pose/
|  |- intrinsics_color.txt
|  `- intrinsics_depth.txt
|- scene0708_00/
...

As you can see, the test set is missing the label-related files.

[2] provides the official split under ScanNet/Tasks/Benchmark/, into train/val/test. Comparing the v2 split files (txt) there with the two zips above shows:

  • scannet_frames_25k.zip = train + val
  • scannet_frames_test.zip = test

So this subset should contain as many scans as the full release; only the sequence within each scan is subsampled.

Splitting

Judging from MMDetection's configuration files, train and val should be placed in separate directories, each with its own json annotation file. Since the test set lacks instance/, which the later COCO conversion needs, this post drops the test data and uses val instead.

  • Output path: data/scannet-frames/
# split-scannet.py
import os
import os.path as osp

"""split ScanNet
Only `scannet_frames_25k/` is used while `scannet_frames_test/` is ignored
    because the scenes in it have no `**/instance/` sub-folder, which
    is needed by `convert2panoptic.py`.
So I simply reuse the validation set as the test set, as in the COCO
    configuration file in MMDetection.
These 2 subsets are then converted separately to produce separate
    annotation json files as needed in the configuration.
"""

DATA_ROOT = "/data"
# DATA_P = [osp.join(DATA_ROOT, p) for p in ("scannet_frames_25k", "scannet_frames_test")]
DATA_P = osp.join(DATA_ROOT, "scannet_frames_25k")

# check the number of scans
dir_list = next(os.walk(DATA_P))[1]
print("#data:", len(dir_list))  # 1513 -> ALL
print("conclusion: contains all data, only frames are down-sampled")

SPLIT_P = osp.join(os.environ["HOME"], "codes", "ScanNet", "Tasks", "Benchmark")
# scannet_frames_25k is only available for ScanNetv2
VER = "v2"
DEST = "data/scannet-frames"  # data/ inside the project directory, not /data/
if not osp.exists(DEST):
    os.makedirs(DEST)

# the test set has NO `**/instance/*.png`, which `convert2panoptic.py` needs
for subset in ["train", "val"]:
    split_file = osp.join(SPLIT_P, "scannet"+VER+"_"+subset+".txt")

    # soft-link all scans of this subset to `sub_dest`
    sub_dest = osp.join(DEST, subset)
    if not osp.exists(sub_dest):
        os.makedirs(sub_dest)

    with open(split_file, "r") as f:
        for line in f:
            line = line.strip()
            if "" == line:
                continue
            os.system("ln -s {} {}".format(
                osp.join(DATA_P, line),
                osp.join(sub_dest, line)))
    print(subset, "DONE")
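
A quick sanity check after splitting (my own addition, not part of the original pipeline): count the scans linked into each subset directory. With the official v2 split this should print 1201 for train and 312 for val, 1513 in total.

# check-split.py -- my own sanity check, not part of the original pipeline
import os
import os.path as osp

DEST = "data/scannet-frames"
for subset in ("train", "val"):
    sub = osp.join(DEST, subset)
    n = sum(osp.isdir(osp.join(sub, d)) for d in os.listdir(sub))
    print(subset, "scans:", n)  # expect 1201 (train) and 312 (val)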

Convert to COCO Format

One of the data layouts MMDetection recommends is the COCO format [9-11]. rvc_devkit [12] provides a conversion script, rconvert_scannet_coco.sh, whose core is the conversion code from [2]: convert2panoptic.py. But that is actually meant for the panoptic segmentation task (see item 4 of [9]), whereas here the object detection annotation format is needed (item 1 of [9]).

  • (2022.10.8) Another conversion implementation [26]: scannet_train_val_to_efficientps.py, whose core is panoptic2detection_coco_format.py [27]

So, based on convert2panoptic.py, I wrote a conversion script for object detection (convert-scannet-coco-objdet.py):

# convert-scannet-coco-objdet.py

#!/usr/bin/python
#
# Convert to COCO-style object detection format (http://cocodataset.org/#format-data).
#

"""iTom's modified version (2022.9.13)
This file is inherited from
    rvc_devkit/segmentation/conv_scannet/convert2panoptic.py
which is the same as
    ScanNet/BenchmarkScripts/convert2panoptic.py
But I modify it to fit the object detection format and to be
    able to distinguish different subsets (i.e. train/val/test)
    in terms of the output json annotation files.

There are several modifications:

(a) Additional arguments
    - an additional optional argument of `convert2panoptic`:
        subset_tag, default = None
    - an additional optional command-line argument:
        --subset-tag -> args.subsetTag, default = None
If this argument is used, the name of output json annotation file
    will be modified accordingly.

(b) Move to COCO object detection annotation format instead of the
    original panoptic format. I borrowed the functions, i.e.
        - binary_mask_to_polygon
        - close_contour
    for polygon format segmentation info calculation. But the results
    are discarded due to
        - their weirdly large volume
            (~38G for val set & ~114G for the training set !)
        - that they are not used in detection task
    If you want to reenable it, you may need to install
        - scikit-image
    (NOTE: I suspect this is buggy somewhere.)

(c) Change extension to ".jpg" in `images/file_name` field.
"""

# python imports
from __future__ import print_function, absolute_import, division, unicode_literals
from itertools import count
import os
import glob
import sys
import argparse
import json
import numpy as np

# iTom: for polygon calculation
from skimage import measure

# Image processing
from PIL import Image

EVAL_LABELS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33, 34, 36, 39]
EVAL_LABEL_NAMES = ["wall", "floor", "cabinet", "bed", "chair", "sofa", "table", "door", "window", "bookshelf", "picture", "counter", "desk", "curtain", "refrigerator", "shower curtain", "toilet", "sink", "bathtub", "otherfurniture"]
EVAL_LABEL_CATS = ["indoor", "indoor", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "furniture", "appliance", "furniture", "furniture", "appliance", "furniture", "furniture"]
EVAL_LABEL_COLORS = [(174, 199, 232), (152, 223, 138), (31, 119, 180), (255, 187, 120), (188, 189, 34), (140, 86, 75), (255, 152, 150), (214, 39, 40), (197, 176, 213), (148, 103, 189), (196, 156, 148), (23, 190, 207), (247, 182, 210), (219, 219, 141), (255, 127, 14), (158, 218, 229), (44, 160, 44), (112, 128, 144), (227, 119, 194), (82, 84, 163)]

def splitall(path):
    allparts = []
    while 1:
        parts = os.path.split(path)
        if parts[0] == path:  # sentinel for absolute paths
            allparts.insert(0, parts[0])
            break
        elif parts[1] == path: # sentinel for relative paths
            allparts.insert(0, parts[1])
            break
        else:
            path = parts[0]
            allparts.insert(0, parts[1])
    return allparts


def close_contour(contour):
    """iTom: helper function for binary mask -> polygon conversion
    from: https://github.com/waspinator/pycococreator/blob/master/pycococreatortools/pycococreatortools.py#L20
    """
    if not np.array_equal(contour[0], contour[-1]):
        contour = np.vstack((contour, contour[0]))
    return contour


def binary_mask_to_polygon(binary_mask, tolerance=0):
    """iTom: Converts a binary mask to COCO polygon representation
    Args:
        binary_mask: a 2D binary numpy array where '1's represent the object
        tolerance: Maximum distance from original points of polygon to approximated
            polygonal chain. If tolerance is 0, the original coordinate array is returned.

    from: https://github.com/waspinator/pycococreator/blob/master/pycococreatortools/pycococreatortools.py#L35
    ref:
    - https://github.com/cocodataset/cocoapi/issues/131
    - https://stackoverflow.com/questions/68663512/image-segmentation-mask-to-polygon-for-coco-json
    - https://stackoverflow.com/questions/58884265/python-convert-binary-mask-to-polygon
    """
    polygons = []
    # pad mask to close contours of shapes which start and end at an edge
    padded_binary_mask = np.pad(binary_mask, pad_width=1, mode='constant', constant_values=0)
    contours = measure.find_contours(padded_binary_mask, 0.5)
    # contours = np.subtract(contours, 1)  # iTom: original but buggy
    for i in range(len(contours)):  # iTom: change to for-loop subtraction
        contours[i] = np.subtract(contours[i], 1)
    for contour in contours:
        contour = close_contour(contour)
        contour = measure.approximate_polygon(contour, tolerance)
        if len(contour) < 3:
            continue
        contour = np.flip(contour, axis=1)
        segmentation = contour.ravel().tolist()
        # after padding and subtracting 1 we may get -0.5 points in our segmentation
        segmentation = [0 if i < 0 else i for i in segmentation]
        polygons.append(segmentation)

    return polygons


# The main method
def convert2panoptic(scannetPath, outputFolder=None, subset_tag=None, beginAnnoId=0, beginImageId=0, thingsOnly=False):
    """iTom's modification
    subset_tag: str, an optional subset distinguishing string for
        train/val separation. One can simply ignore it to get the
        original output json file name.
    """

    if outputFolder is None:
        outputFolder = scannetPath

    # find files
    search = os.path.join(scannetPath, "*", "instance", "*.png")
    files = glob.glob(search)
    files.sort()
    # quit if we did not find anything
    if not files:
        print(
            "Did not find any files for using matching pattern {}. Please consult the README.".format(search)
        )
        sys.exit(-1)
    # a bit verbose
    print("Converting {} annotation files.".format(len(files)))

    outputBaseFile = "scannet_objdet"
    if subset_tag is not None:
        outputBaseFile = outputBaseFile + "_" + subset_tag
        print("iTom: modifying json annotation file name to:", outputBaseFile)
    outFile = os.path.join(outputFolder, "{}.json".format(outputBaseFile))
    print("Json file with the annotations in COCO object detection format will be saved in {}".format(outFile))
    # panopticFolder = os.path.join(outputFolder, outputBaseFile)
    # if not os.path.isdir(panopticFolder):
    #     print("Creating folder {} for panoptic segmentation PNGs".format(panopticFolder))
    #     os.mkdir(panopticFolder)
    # print("Corresponding segmentations in .png format will be saved in {}".format(panopticFolder))

    categories = []
    cls_is_things = {}  # iTom
    for idx in range(len(EVAL_LABELS)):
        label = EVAL_LABELS[idx]
        name = EVAL_LABEL_NAMES[idx]
        cat = EVAL_LABEL_CATS[idx]
        color = EVAL_LABEL_COLORS[idx]
        isthing = label > 2
        cls_is_things[int(label)] = isthing  # iTom
        if thingsOnly and not isthing:  # iTom
            continue
        categories.append({'id': int(label),
                           'name': name,
                           'color': color,
                           'supercategory': cat,
                           'isthing': isthing})

    images = []
    annotations = []
    for progress, f in enumerate(files):

        originalFormat = np.array(Image.open(f))

        parts = splitall(f)
        fileName = parts[-1]
        sceneName = parts[-3]
        outputFileName = "{}__{}".format(sceneName, fileName)
        inputFileName = os.path.join(sceneName, "color", fileName)
        # imageId = os.path.splitext(outputFileName)[0]
        imageId = beginImageId
        beginImageId += 1
        # image entry, id for image is its filename without extension
        images.append({"id": imageId,
                       "width": int(originalFormat.shape[1]),
                       "height": int(originalFormat.shape[0]),
                       "file_name": inputFileName.replace(".png", ".jpg")})
                       # "file_name": inputFileName})

        # pan_format = np.zeros(
        #     (originalFormat.shape[0], originalFormat.shape[1], 3), dtype=np.uint8
        # )
        segmentIds = np.unique(originalFormat)
        segmInfo = []
        for i_seg, segmentId in enumerate(segmentIds):
            isCrowd = 0
            if segmentId < 1000:
                semanticId = segmentId
            else:
                semanticId = segmentId // 1000
            if semanticId not in EVAL_LABELS:
                continue
            if thingsOnly and not cls_is_things[semanticId]:  # iTom
                continue

            mask = originalFormat == segmentId
            color = [segmentId % 256, segmentId // 256, segmentId // 256 // 256]
            # pan_format[mask] = color

            area = np.sum(mask) # segment area computation

            # bbox computation for a segment
            hor = np.sum(mask, axis=0)
            hor_idx = np.nonzero(hor)[0]
            x = hor_idx[0]
            width = hor_idx[-1] - x + 1
            vert = np.sum(mask, axis=1)
            vert_idx = np.nonzero(vert)[0]
            y = vert_idx[0]
            height = vert_idx[-1] - y + 1
            bbox = [int(x), int(y), int(width), int(height)]

            segmInfo.append({"id": int(segmentId),
                            "category_id": int(semanticId),
                            "area": int(area),
                            "bbox": bbox,
                            "iscrowd": isCrowd})

            # COCO object detection format:
            #   - https://cocodataset.org/#format-data
            # ref:
            #   - https://zhuanlan.zhihu.com/p/29393415
            #   - https://zhuanlan.zhihu.com/p/263454360

            # polygon = binary_mask_to_polygon(mask)  # weirdly large, discarded
            polygon = []

            # # annoId = imageId + "_" + str(i_seg)  # "scene0046_00__000200_2"
            # spaceId, scanId = sceneName.split("scene")[1].split("_")
            # imgFileNum = os.path.splitext(fileName)[0]
            # annoId = int(spaceId) * 1000000 + int(scanId) * 10000 + int(imgFileNum) + i_seg
            # # print("annoId:", annoId, "<-", spaceId, scanId, imgFileNum, i_seg)
            annoId = beginAnnoId
            beginAnnoId += 1

            annotations.append({'id': annoId,
                                'image_id': imageId,
                                'category_id': int(semanticId),
                                "segmentation": polygon,
                                'area': int(area),
                                'bbox': bbox,
                                "iscrowd": isCrowd})
            # break  # debug

        ## iTom: original panoptic annotation, removed
        # annotations.append({'image_id': imageId,
        #                     'file_name': outputFileName,
        #                     "segments_info": segmInfo})

        # Image.fromarray(pan_format).save(os.path.join(panopticFolder, outputFileName))

        print("\rProgress: {:>3.2f} %".format((progress + 1) * 100 / len(files)), end=' ')
        sys.stdout.flush()
        # break  # debug

    print("\nSaving the json file {}".format(outFile))
    d = {'images': images,
        'annotations': annotations,
        'categories': categories}
    with open(outFile, 'w') as f:
        json.dump(d, f, sort_keys=True, indent=4)

    return beginAnnoId, beginImageId


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset-folder",
                        dest="scannetPath",
                        help="path to the ScanNet data 'scannet_frames_25k' folder",
                        required=True,
                        type=str)
    parser.add_argument("--output-folder",
                        dest="outputFolder",
                        help="path to the output folder.",
                        default=None,
                        type=str)
    # iTom-added, optional
    parser.add_argument("--subset-tag",
                        dest="subsetTag",
                        help="(iTom, optional) distinguishing str for train/val separation",
                        default=None,
                        type=str)
    parser.add_argument("--begin-anno-id",
                        dest="beginAnnoId",
                        help="(iTom) annotation IDs will start from this number. " \
                            "When converting each subset sequentially, " \
                            "use this to ensure that there is no duplicated annotation ID",
                        default=0,
                        type=int)
    parser.add_argument("--begin-image-id",
                        dest="beginImageId",
                        help="(iTom) image IDs will start from this number. " \
                            "When converting each subset sequentially, " \
                            "use this to ensure that there is no duplicated image ID",
                        default=0,
                        type=int)
    parser.add_argument('--things-only',
                        dest="thingsOnly",
                        action="store_true",
                        help="keep thing classes & drop stuff classes")
    args = parser.parse_args()

    last_unused_anno_id, last_unused_image_id = convert2panoptic(
        args.scannetPath, args.outputFolder, args.subsetTag, args.beginAnnoId, args.beginImageId, args.thingsOnly)
    # record the last unused annotation & image ID to interact with `scripts/split-cvt2coco.sh`
    with open("last-unused-anno-id.txt", "w") as f:
        f.write(str(last_unused_anno_id))
    with open("last-unused-image-id.txt", "w") as f:
        f.write(str(last_unused_image_id))


# call the main
if __name__ == "__main__":
    main()

The bulk is still convert2panoptic.py; the changes are:

  • Rewrite the annotations into the COCO object detection format required by [9].
    • The segmentation field uses the polygon format as described in [9,10] (the original convert2panoptic.py hard-codes isCrowd = 0); the code that turns binary masks into polygons is borrowed from [13], see also [14-16].
    • But I ended up discarding it, because the resulting json files were absurdly large (val ~38G, train ~114G; COCO is much bigger than ScanNet, yet its whole annotation zip is only ~241M). I suspect a bug somewhere, and object detection does not seem to need it anyway.
    • According to [10], the length of annotations, i.e. the number of annotation entries, equals the number of bounding boxes in the whole (sub)set, so the annotations/id field only needs to give each bbox a distinct integer ID. Note: it must be a number, otherwise an error is raised complaining that it cannot be converted to a number.
    • To guarantee uniqueness of annotations/id, I packed the space ID, the scan ID, the number in the image file name within a scan, and the segmentation index within an image into a single integer (see the sketch after this list). I also checked this dataset: the image file-name numbers are all multiples of 100 and, divided by 100, stay below 100; and each image has fewer than 100 segmentations. Hence the annoId computation in the code (kept there as commented-out lines; the script as posted simply uses the running --begin-anno-id counter).
  • Added a subset tag argument so that train and val produce different json files.
    • Simply ignore this argument to get a single json, as with the original convert2panoptic.py.
  • The images/file_name field has its .png suffix changed to .jpg.
    • The original code enumerates frames via **/instance/*.png and assumes the .png suffix, but the RGB frames under color/ actually use the .jpg suffix.
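
For reference, the composite annoId scheme mentioned above boils down to the following sketch; it mirrors the commented-out lines in convert-scannet-coco-objdet.py, while the script as posted uses the running --begin-anno-id counter instead.

# a sketch of the composite annoId scheme (mirrors the commented-out code)
import os.path as osp

def composite_anno_id(scene_name, file_name, i_seg):
    """e.g. scene_name='scene0046_00', file_name='000200.png', i_seg=2"""
    space_id, scan_id = scene_name.split("scene")[1].split("_")  # '0046', '00'
    img_file_num = int(osp.splitext(file_name)[0])               # 200
    return int(space_id) * 1000000 + int(scan_id) * 10000 + img_file_num + i_seg

assert composite_anno_id("scene0046_00", "000200.png", 2) == 46000202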

Invocation:

#!/bin/bash

DEST=data/scannet-frames  # output of the splitting step above

anno_id=0  # interact with `convert-scannet-coco-objdet.py`
image_id=0
for subset in train val; do
    python convert-scannet-coco-objdet.py \
        --dataset-folder $DEST/$subset \
        --output-folder $DEST \
        --things-only \
        --subset-tag $subset \
        --begin-anno-id $anno_id \
        --begin-image-id $image_id

    # update beginning (i.e. last unused) annotation & image ID
    anno_id=`cat last-unused-anno-id.txt`
    image_id=`cat last-unused-image-id.txt`
done
rm last-unused-anno-id.txt
rm last-unused-image-id.txt
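
After the conversion, a quick check (my own addition, assuming pycocotools is installed) verifies that both json files load and that the counts look sane:

# check-coco-json.py -- my own sanity check; assumes pycocotools is installed
from pycocotools.coco import COCO

for subset in ("train", "val"):
    coco = COCO("data/scannet-frames/scannet_objdet_{}.json".format(subset))
    print(subset,
          "#images:", len(coco.imgs),
          "#annotations:", len(coco.anns),
          "#categories:", len(coco.cats))
    # image/annotation IDs stay unique across the two subsets thanks to the
    # --begin-*-id bookkeeping in the shell loop above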

Environment

Since MMDetection's source files are needed later, install it from source, following the installation guide get_started.md in [4]. Installation script:

#!/bin/bash
# env-mmdetection.sh

echo "create the conda virtual environment"
CONDA_P=~/miniconda3
ENV=openmmlab
if [ ! -d $CONDA_P/envs/$ENV ]; then
    conda create --name $ENV python=3.8 -y
fi
CONDA_BIN=$CONDA_P/envs/$ENV/bin

$CONDA_BIN/pip install torch==1.8.2 torchvision==0.9.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
# used in mmdetection/demo/video_gpuaccel_demo.py
$CONDA_BIN/pip install ffmpegcv scipy scikit-image
conda install -n $ENV ffmpeg -y

$CONDA_BIN/pip install -U openmim
$CONDA_BIN/mim install mmcv-full==1.6.0 mmengine
# avoid bug: KeyError: 'Cascade Mask R-CNN'
#   (i.e. open-mmlab/mim issues #125)
# https://github.com/open-mmlab/mim/issues/125
$CONDA_BIN/mim install mmdet==2.24.0

if [ ! -d mmdetection ]; then
    echo try to clone from the original github repo
    git clone https://github.com/open-mmlab/mmdetection.git
    # git submodule add https://github.com/open-mmlab/mmdetection.git
    if [ $? -ne 0 ]; then
        echo "* FAILED to clone from github"
        echo clone from a gitee transit repo instead
        git clone https://gitee.com/xoxleoxox/mmdetection
        # git submodule add https://gitee.com/xoxleoxox/mmdetection
    fi
fi
cd mmdetection
$CONDA_BIN/pip install -v -e .

echo "verify the installation"
$CONDA_BIN/mim download mmdet --config yolov3_mobilenetv2_320_300e_coco --dest .
$CONDA_BIN/python demo/image_demo.py demo/demo.jpg yolov3_mobilenetv2_320_300e_coco.py \
    yolov3_mobilenetv2_320_300e_coco_20210719_215349-d18dff72.pth --device cpu --out-file result.jpg

Note: a previous source install raised an error while verifying the installation / running the demo, see [17], hence the line installing mmdet==2.24.0; but training with mmdet 2.24.0 errors out as well, so a newer version still has to be installed from source, which overwrites the old 2.24.0.
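
Because the editable source install overwrites the pinned mmdet==2.24.0, it is worth confirming afterwards which version actually gets imported (my own quick check):

# a quick check that the editable (source) install is the one being imported
import mmcv
import mmdet

print("mmdet", mmdet.__version__, "from", mmdet.__file__)
print("mmcv-full", mmcv.__version__)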

Training

MMDetection code

Since MMDetection's code is needed, the MMDetection repository is added to the project directory as a git submodule, see [18-20]. Script:

#!/bin/bash
# add-submodules.sh

# echo rvc_devkit
# git submodule add https://github.com/ozendelait/rvc_devkit.git
# if [ $? -ne 0 ]; then
#     git submodule add https://gitee.com/tyloeng/rvc_devkit.git
# fi

echo mmdetection
git submodule add https://github.com/open-mmlab/mmdetection.git
if [ $? -ne 0 ]; then
    git submodule add https://gitee.com/xoxleoxox/mmdetection
fi

# echo ScanNet
# git submodule add https://github.com/ScanNet/ScanNet.git
# if [ $? -ne 0 ]; then
#     git submodule add https://gitee.com/gxdcode/ScanNet.git
# fi

git submodule update --init --recursive
git submodule update --remote

#CONDA_P=~/miniconda3
#ENV=openmmlab
#CONDA_BIN=$CONDA_P/envs/$ENV/bin

#cd rvc_devkit
#$CONDA_BIN/pip install -r requirements.txt
#cd objdet
#$CONDA_BIN/pip install -r requirements.txt

configuration files

To train an existing model on a new dataset with MMDetection, see the examples 2_new_data_model.md and 1_exist_data_model.md in [4]. The data were prepared above; what remains is mainly the configuration files. Following the structure of [4], I created a configs/ directory in my own project and copied two configuration files from [4], renaming them:

  • mstrain_3x_scannet.py (from mmdetection/configs/common/mstrain_3x_coco.py)
  • faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py (from mmdetection/configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_coco.py)

At this point the project directory looks like:

my-project/
|- convert-scannet-coco-objdet.py
|- split-scannet.py
|- data/
|  `- scannet-frames/
|     |- train/						# produced by splitting
|     |- val/						# produced by splitting
|     |- scannet_objdet_train/		# produced by the annotation conversion
|     |- scannet_objdet_val/		# produced by the annotation conversion
|     |- scannet_objdet_train.json	# produced by the annotation conversion
|     `- scannet_objdet_val.json	# produced by the annotation conversion
|- mmdetection/						# submodule
|- configs/							# mirrors the structure of mmdetection/configs/
|  |- common/
|  |  `- mstrain_3x_scannet.py
|  `- faster_rcnn/
|     `- faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py
`- scripts/
   |- add-submodules.sh
   |- env-mmdetection.sh
   |- find_gpu.sh
   `- train-faster-rcnn-scannet-frames.sh

The two configuration files:

  • mstrain_3x_scannet.py (change the data section to our own data and fix the _base_ reference path)
  • (2022.9.17) Following [24-25], switch the class set to ScanNet's: change classes, data/train/dataset/classes, data/val/classes and data/test/classes. (I have not tested this change, so there may be other things that need changing accordingly.)
## iTom Notes
# Inherited from `mmdetection/configs/common/mstrain_3x_coco.py`,
# this file is designed for training Faster R-CNN on converted ScanNet-frames-25k.
import os.path as osp

_base_ = '../../mmdetection/configs/_base_/default_runtime.py'
# dataset settings
dataset_type = 'CocoDataset'
classes = (
    "wall", "floor", "cabinet", "bed", "chair",
    "sofa", "table", "door", "window", "bookshelf",
    "picture", "counter", "desk", "curtain", "refrigerator",
    "shower curtain", "toilet", "sink", "bathtub", "otherfurniture"
)
data_root = 'data/scannet-frames/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

# In mstrain 3x config, img_scale=[(1333, 640), (1333, 800)],
# multiscale_mode='range'
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Resize',
        img_scale=[(1333, 640), (1333, 800)],
        multiscale_mode='range',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

# Use RepeatDataset to speed up training
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='RepeatDataset',
        times=3,
        dataset=dict(
            type=dataset_type,
            ann_file=osp.join(data_root, 'scannet_objdet_train.json'),
            img_prefix=osp.join(data_root, 'train/'),
            pipeline=train_pipeline,
            classes=classes)),
    val=dict(
        type=dataset_type,
        ann_file=osp.join(data_root, 'scannet_objdet_val.json'),
        img_prefix=osp.join(data_root, 'val/'),
        pipeline=test_pipeline,
        classes=classes),
    test=dict(
        type=dataset_type,
        ann_file=osp.join(data_root, 'scannet_objdet_val.json'),
        img_prefix=osp.join(data_root, 'val/'),
        pipeline=test_pipeline,
        classes=classes))
evaluation = dict(interval=1, metric='bbox')

# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)

# learning policy
# Experiments show that using step=[9, 11] has higher performance
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[9, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
  • faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py (point _base_ to the modified config above and fix the reference paths)
  • (2022.9.17) Following [24-25], switch the class set to ScanNet's: change model/roi_head/bbox_head/num_classes. (I have not tested this change, so there may be other things that need changing accordingly.)
## iTom Notes
# Inherited from `mmdetection/configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_coco.py`,
# this file is designed for training Faster R-CNN on converted ScanNet-frames-25k.

_base_ = [
    # '../common/mstrain_3x_coco.py',
    '../common/mstrain_3x_scannet.py',
    # '../_base_/models/faster_rcnn_r50_fpn.py'
    '../../mmdetection/configs/_base_/models/faster_rcnn_r50_fpn.py'
]
model = dict(
    backbone=dict(
        type='ResNeXt',
        depth=101,
        groups=64,
        base_width=4,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        style='pytorch',
        init_cfg=dict(
            type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')),
    roi_head=dict(bbox_head=dict(num_classes=20)))
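
Before launching training, a quick load of the final config (my own check, using mmcv's Config API) confirms that the _base_ chain resolves and that num_classes matches the 20 ScanNet classes:

# check-config.py -- my own check that the config resolves and is consistent
from mmcv import Config

cfg = Config.fromfile(
    "configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py")
print("#classes:", len(cfg.data.train.dataset.classes))            # 20
print("num_classes:", cfg.model.roi_head.bbox_head.num_classes)    # 20
print("train ann_file:", cfg.data.train.dataset.ann_file)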

training

Distributed training with the script provided by MMDetection:

#!/bin/bash
# train-faster-rcnn-scannet-frames.sh
clear

echo run \`conda activate openmmlab\` first

config=configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_mstrain_3x_scannet.py

. scripts/find_gpu.sh 4 14845

PATH=/usr/local/cuda/bin:$PATH \
PYTHONPATH=mmdetection/mmdet:$PYTHONPATH \
CUDA_VISIBLE_DEVICES=${gpu_id} \
MMDET_DATASETS=`pwd`/data/scannet-frames/ \
bash mmdetection/tools/dist_train.sh \
    $config ${n_gpu_found}
# python mmdetection/tools/train.py \
#     $config

Where:

  • find_gpu.sh is from [21];
  • putting CUDA's bin/ directory at the front of $PATH ensures the nvcc inside the CUDA directory is used instead of /usr/bin/nvcc, see [22].

Run bash scripts/train-faster-rcnn-scannet-frames.sh to start training.

References

  1. (CVPR 2017) ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes
  2. ScanNet/ScanNet
  3. (arXiv 2019) MMDetection: Open MMLab Detection Toolbox and Benchmark
  4. open-mmlab/mmdetection
  5. ScanNet Benchmark
  6. 关于ScanNet数据集
  7. 深度学习(1)RGB-D数据集:ScanNet
  8. scannet数据集下载文件
  9. COCO | Data format
  10. COCO数据集的标注格式
  11. COCO数据集标注详解
  12. ozendelait/rvc_devkit
  13. waspinator/pycococreator
  14. convert mask binary image to polygon format #131
  15. Image segmentation mask to polygon for coco json
  16. Python - convert binary mask to polygon
  17. KeyError: ‘Cascade Mask R-CNN’ #125
  18. Git Tools - Submodules
  19. Git submodule 子模块的管理和使用
  20. Git Submodule使用完整教程
  21. shell监视gpu使用情况
  22. 装detectron2报错:nvcc fatal : No input files specified; use option --help for more information
  23. facebookresearch/detr/datasets/coco.py/convert_coco_poly_to_mask
  24. AssertionError: The num_classes (3) in Shared2FCBBoxHead of MMDataParallel does not matches the length of CLASSES 80) in CocoDataset #4828
  25. Train with customized datasets | Prepare a config
  26. ScanNet-EfficientPS/tools/scannet_train_val_to_efficientps.py
  27. panopticapi/converters/panoptic2detection_coco_format.py
