【Copy-Paste】《Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation》


CVPR-2021


文章目录

  • 1 Background and Motivation
  • 2 Related Work
  • 3 Advantages / Contributions
  • 4 Method
  • 5 Experiments
    • 5.1 Datasets
    • 5.2 Copy-Paste is robust to training configurations
    • 5.3 Copy-Paste helps data-efficiency
    • 5.4 Copy-Paste and self-training are additive
    • 5.5 Copy-Paste improves COCO state-of-the-art
    • 5.6 Copy-Paste produces better representations for PASCAL detection and segmentation
    • 5.7 Copy-Paste provides strong gains on LVIS
  • 6 Appendix
    • 6.1 Ablation on the Copy-Paste method
    • 6.2 Copy-Paste provides more gain on harder categories of COCO
    • 6.3 How likely objects are copied to an un-matched scene?
    • 6.4 Benchmark results on different object sizes
  • 7 Conclusion(own)


1 Background and Motivation

Instance segmentation is often data-hungry.

Manual annotation is expensive, so this paper focuses on data augmentation methods to alleviate this problem.

Although many data augmentation methods have been proposed, they are more general-purpose in nature and have not been designed specifically for instance segmentation.

The authors propose Copy-Paste, a data augmentation method for instance segmentation: randomly picking objects and pasting them at random locations on the target image.

The method is similar to 【Cut, Paste and Learn】《Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection》, with the following differences:

1)it does not use geometric transformations (e.g. rotation), and Gaussian blurring of the pasted instances is found not to be beneficial

2)it is extended to semi-supervised learning

3)objects contained in one image are pasted into another image already populated with instances (rather than pasting isolated foregrounds onto pure background images)

4)evaluation moves from the GMU Kitchen dataset to the more widely used COCO and LVIS


2 Related Work

  • Data Augmentations
    mainly used for encoding invariances to data transformations, which is beneficial for classification
  • Mixing Image Augmentations(mixup, CutMix and Mosaic)
    still not object-aware and have not been designed specifically for the task of instance segmentation
  • Copy-Paste Augmentation
  • Instance Segmentation
  • Long-Tail Visual Recognition
    data re-sampling and loss re-weighting; the authors' method yields significant gains

3 Advantages / Contributions


The proposed Copy-Paste method provides a significant boost on top of baselines across multiple settings.

it gives solid improvements across a wide range of settings with variability in backbone architecture, extent of scale jittering, training schedule and image size.

4 Method


  • scale jittering and random horizontal flipping

  • select a random subset of objects from one of the images and paste them onto the other image

  • adjust the ground-truth annotations: remove fully occluded objects and update the masks and bounding boxes of partially occluded objects (a minimal sketch of this step follows the list)
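A minimal sketch of this paste-and-update step on binary instance masks (NumPy arrays). The function names and the small visibility threshold used to decide "fully occluded" are my own assumptions, not from the paper's code:

import numpy as np

def mask_to_box(mask):
    # tight (x1, y1, x2, y2) box around a binary mask
    ys, xs = np.where(mask > 0)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def paste_and_update(main_masks, src_masks, keep_prob=0.5, min_visible_frac=0.01):
    # main_masks, src_masks: lists of HxW uint8 binary masks at the same resolution
    # 1) pick a random subset of source objects to paste
    pasted = [m for m in src_masks if np.random.rand() < keep_prob]
    if not pasted:
        return main_masks, [mask_to_box(m) for m in main_masks]

    # union of all pasted pixels; these occlude the main image's instances
    occluder = (np.sum(pasted, axis=0) > 0).astype(np.uint8)

    new_masks, new_boxes = [], []
    for m in main_masks:
        visible = m * (1 - occluder)                        # strip occluded pixels
        if visible.sum() < min_visible_frac * max(m.sum(), 1):
            continue                                        # (almost) fully occluded -> drop it
        new_masks.append(visible)
        new_boxes.append(mask_to_box(visible))              # box recomputed from the updated mask

    for m in pasted:                                        # pasted objects keep their full masks
        new_masks.append(m)
        new_boxes.append(mask_to_box(m))
    return new_masks, new_boxes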

generated images can look very different from real images

1)Blending Pasted Objects

I_1 × α + I_2 × (1 − α)

where α is the binary mask of the pasted objects, I_1 is the pasted image and I_2 is the main image

Simply composing without any blending has similar performance, unlike 【Cut, Paste and Learn】《Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection》, which explored various composing strategies. A sketch of the compositing step is given below.
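A sketch of this compositing step, assuming alpha is the binary union mask of all pasted objects (the function name, the smoothing flag and the kernel size are my assumptions):

import numpy as np
import cv2

def blend(pasted_img, main_img, alpha, smooth=False):
    # compose I_1 * alpha + I_2 * (1 - alpha)
    # pasted_img (I_1), main_img (I_2): HxWx3 uint8 images of the same size
    # alpha: HxW binary mask of the pasted objects (1 = pasted pixel)
    alpha = alpha.astype(np.float32)
    if smooth:
        # optionally soften the seams with a Gaussian filter; plain composition works just as well
        alpha = cv2.GaussianBlur(alpha, (5, 5), 0)
    alpha = alpha[..., None]                   # HxW -> HxWx1 for broadcasting
    out = pasted_img * alpha + main_img * (1.0 - alpha)
    return out.astype(np.uint8)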

2)Large Scale Jittering


  • standard scale jittering (SSJ): resize scale range 0.8 to 1.25

  • large scale jittering (LSJ): resize scale range 0.1 to 2.0

3)Self-training Copy-Paste

The standard self-training pipeline:

  • train a supervised model with Copy-Paste augmentation on labeled data
  • generate pseudo labels on unlabeled data
  • paste ground-truth instances into pseudo labeled and supervised labeled images and train a model on this new data
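A pseudocode-level sketch of these three steps; the train / predict / copy_paste callables are stand-ins supplied by the caller, not the paper's actual training code:

import random
from typing import Any, Callable, Iterable, List, Tuple

Example = Tuple[Any, Any]  # (image, instance annotations)

def self_training_copy_paste(
    labeled: List[Example],
    unlabeled: Iterable[Any],
    train: Callable[[List[Example]], Any],               # data -> trained model
    predict: Callable[[Any, Any], Any],                  # (model, image) -> predicted annotations
    copy_paste: Callable[[Example, Example], Example],   # paste objects from the 2nd example into the 1st
) -> Any:
    # 1) supervised model trained with Copy-Paste augmentation on the labeled data
    teacher = train([copy_paste(ex, random.choice(labeled)) for ex in labeled])

    # 2) generate pseudo labels on the unlabeled data
    pseudo: List[Example] = [(img, predict(teacher, img)) for img in unlabeled]

    # 3) paste ground-truth instances into pseudo-labeled and labeled images,
    #    then train a new model on this combined data
    combined = [copy_paste(ex, random.choice(labeled)) for ex in pseudo + labeled]
    return train(combined)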

5 Experiments

training a model from scratch with large scale jittering and Copy-Paste augmentation requires 576 epochs while training with only standard scale jittering takes 96 epochs.

With scale jittering and Copy-Paste, training runs much longer and the model keeps improving.

5.1 Datasets

  • COCO
  • VOC
  • LVIS
    LVIS has 1203 classes to simulate the long-tail distribution of classes in natural images.

5.2 Copy-Paste is robust to training configurations

robust across a variety of training iterations, models and training hyperparameters.

1)Robustness to backbone initialization


2)Robustness to training schedules

Training longer further improves performance.

3)Copy-Paste is additive to large scale jittering augmentation
mixup works reasonably well with SSJ, but its gains are largely cancelled out under LSJ, whereas Copy-Paste remains compatible with LSJ.

4)Copy-Paste works across backbone architectures and image sizes

The improvements hold across different image sizes.


5.3 Copy-Paste helps data-efficiency

The same performance can be reached with less data.

a model trained on 75% of COCO with Copy-Paste and LSJ has a similar AP to a model trained on 100% of COCO with LSJ.

5.4 Copy-Paste and self-training are additive

Self-training experiments, which introduce additional (unlabeled) datasets.

1)Data to Paste on

  • supervised COCO data (120k images)
  • pseudo labeled data (110k images from unlabeled COCO and 610k from Objects365).

The results in Table 3 line up with those in Table 2.

2)Data to Copy from

pasting pseudo labeled objects from an unlabeled dataset directly into the COCO labeled dataset.

Pasting in this reverse direction gives no additional AP improvement.

5.5 Copy-Paste improves COCO state-of-the-art


5.6 Copy-Paste produces better representations for PASCAL detection and segmentation

Transfer learning experiments on the PASCAL VOC 2007 dataset, using a model trained with Copy-Paste on COCO as the pre-trained backbone.


5.7 Copy-Paste provides strong gains on LVIS

LVIS is a long-tailed dataset.

two different training paradigms typically used for LVIS:

  • single-stage where a detector is trained directly on the LVIS dataset

  • two-stage where the model from the first stage is fine-tuned with class re-balancing losses to help handle the class imbalance.
    The per-class loss weight takes the form (1 − β)/(1 − β^n), where n is the number of instances of the class and β = 0.999 (a sketch is given after the list).
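A minimal sketch of this effective-number re-weighting; normalizing the weights to sum to the number of classes is an assumption about a common convention, not something stated above:

import numpy as np

def class_balanced_weights(instance_counts, beta=0.999):
    # per-class weight (1 - beta) / (1 - beta**n), then normalized so the weights sum to the number of classes
    n = np.asarray(instance_counts, dtype=np.float64)
    weights = (1.0 - beta) / (1.0 - np.power(beta, n))
    return weights * len(n) / weights.sum()

# a rare class (5 instances) gets a far larger weight than a frequent one (50000 instances)
print(class_balanced_weights([5, 500, 50000]))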

1)Copy-Paste improves single-stage LVIS training

APr (the AP for rare classes)

Repeat Factor Sampling (RFS) is used to handle the class imbalance problem on LVIS.

The results show that Copy-Paste and RFS are compatible; for reference, a sketch of how RFS works is given below.
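A sketch of how RFS computes per-image repeat factors, following the LVIS formulation (t = 0.001 is the threshold commonly used; the toy example uses a larger t just to make the effect visible):

import numpy as np
from collections import Counter

def repeat_factors(images_categories, t=0.001):
    # category-level factor r(c) = max(1, sqrt(t / f(c))), where f(c) is the fraction
    # of images containing category c; image-level factor = max over its categories
    num_images = len(images_categories)
    freq = Counter(c for cats in images_categories for c in set(cats))
    r_cat = {c: max(1.0, np.sqrt(t * num_images / n)) for c, n in freq.items()}
    return [max((r_cat[c] for c in cats), default=1.0) for cats in images_categories]

# toy example: with t=0.5, the image containing the rare category 7 gets a repeat factor of about 2.2
print(repeat_factors([{1, 2}, {1}, {1, 7}, {2}, {1, 2}, {2}, {1}, {2}, {1}, {2}], t=0.5))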

2)Copy-Paste improves two-stage LVIS training


3)Comparison with the state-of-the-art


6 Appendix

6.1 Ablation on the Copy-Paste method

1)Subset of pasted objects

Pasting a random subset of objects works best; pasting only one object or all objects is not as good as a random subset.

2)Blending

Table 10 shows that the method is not sensitive to the blending strategy; dropping blending entirely does not hurt performance.

3)Scale jittering

random scale jittering on both the pasted image (image that pasted objects are being copied from) and the main image.
Most of the gain from LSJ comes from augmenting the main image.

6.2 Copy-Paste provides more gain on harder categories of COCO

The x-axis sorts categories by their baseline AP; the y-axis shows the relative improvement (in percent), not the absolute AP gain.

6.3 How likely objects are copied to an un-matched scene?

compute the probability of copying objects to an unmatched scene category

Only indoor vs. outdoor scenes are distinguished.

We found there are 42538 indoor and 71017 outdoor images (the category of the remaining 4732 images could not be estimated).
Objects are copied to an unmatched scene in about half (46.8% = 23.4% + 23.4%) of the generated images.
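A quick check of this number from the counts above (excluding the unclassified images is my assumption about how it was computed):

indoor, outdoor = 42538, 71017
total = indoor + outdoor                       # the ~4.7k unclassified images are ignored
p_in, p_out = indoor / total, outdoor / total
# a mismatch happens when an indoor source is pasted onto an outdoor target, or vice versa
p_mismatch = p_in * p_out + p_out * p_in
print(f"{p_in * p_out:.1%} + {p_out * p_in:.1%} = {p_mismatch:.1%}")
# prints: 23.4% + 23.4% = 46.9% (the paper's 46.8% comes from summing the rounded terms)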

6.4 Benchmark results on different object sizes


7 Conclusion(own)

  • The results are genuinely impressive: first place on both COCO object detection and instance segmentation, reaching 57.3 box AP and 49.1 mask AP (Table 4)!

  • InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting. In ICCV, 2019.

  • Self-training for semi-supervised learning


  • Copy-Paste: a simple yet effective data augmentation


  • Code reproduction: Copy-Paste data augmentation for semantic segmentation
    The data follows the VOC directory format.

"""
Unofficial implementation of Copy-Paste for semantic segmentation
"""
 
from PIL import Image
import imgviz
import cv2
import argparse
import os
import numpy as np
import tqdm
 
 
def save_colored_mask(mask, save_path):
    lbl_pil = Image.fromarray(mask.astype(np.uint8), mode="P")
    colormap = imgviz.label_colormap()
    lbl_pil.putpalette(colormap.flatten())
    lbl_pil.save(save_path)
 
 
def random_flip_horizontal(mask, img, p=0.5):
    if np.random.random() < p:
        img = img[:, ::-1, :]
        mask = mask[:, ::-1]
    return mask, img
 
 
def img_add(img_src, img_main, mask_src):
    if len(img_main.shape) == 3:
        h, w, c = img_main.shape
    elif len(img_main.shape) == 2:
        h, w = img_main.shape
    mask = np.asarray(mask_src, dtype=np.uint8)
    sub_img01 = cv2.add(img_src, np.zeros(np.shape(img_src), dtype=np.uint8), mask=mask)  # cut out the source foreground
    mask_02 = cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)
    mask_02 = np.asarray(mask_02, dtype=np.uint8)
    sub_img02 = cv2.add(img_main, np.zeros(np.shape(img_main), dtype=np.uint8),
                        mask=mask_02)  # cut out the region of img_main covered by the source foreground
    img_main = img_main - sub_img02 + cv2.resize(sub_img01, (img_main.shape[1], img_main.shape[0]),
                                                 interpolation=cv2.INTER_NEAREST)  # clear that region in the main image, then paste the source foreground onto it
    return img_main
 
 
def rescale_src(mask_src, img_src, h, w):
    if len(mask_src.shape) == 3:
        h_src, w_src, c = mask_src.shape
    elif len(mask_src.shape) == 2:
        h_src, w_src = mask_src.shape
    max_reshape_ratio = min(h / h_src, w / w_src)
    rescale_ratio = np.random.uniform(0.2, max_reshape_ratio)
 
    # reshape src img and mask
    rescale_h, rescale_w = int(h_src * rescale_ratio), int(w_src * rescale_ratio)
    mask_src = cv2.resize(mask_src, (rescale_w, rescale_h),
                          interpolation=cv2.INTER_NEAREST)
    # mask_src = mask_src.resize((rescale_w, rescale_h), Image.NEAREST)
    img_src = cv2.resize(img_src, (rescale_w, rescale_h),
                         interpolation=cv2.INTER_LINEAR)
 
    # set paste coord
    py = int(np.random.random() * (h - rescale_h))
    px = int(np.random.random() * (w - rescale_w))
 
    # paste src img and mask to a zeros background
    img_pad = np.zeros((h, w, 3), dtype=np.uint8)
    mask_pad = np.zeros((h, w), dtype=np.uint8)
    img_pad[py:int(py + h_src * rescale_ratio), px:int(px + w_src * rescale_ratio), :] = img_src
    mask_pad[py:int(py + h_src * rescale_ratio), px:int(px + w_src * rescale_ratio)] = mask_src
 
    return mask_pad, img_pad
 
 
def Large_Scale_Jittering(mask, img, min_scale=0.1, max_scale=2.0):
    rescale_ratio = np.random.uniform(min_scale, max_scale)
    h, w, _ = img.shape
 
    # rescale
    h_new, w_new = int(h * rescale_ratio), int(w * rescale_ratio)
    img = cv2.resize(img, (w_new, h_new), interpolation=cv2.INTER_LINEAR)
    mask = cv2.resize(mask, (w_new, h_new), interpolation=cv2.INTER_NEAREST)
    # mask = mask.resize((w_new, h_new), Image.NEAREST)
 
    # crop or padding
    x, y = int(np.random.uniform(0, abs(w_new - w))), int(np.random.uniform(0, abs(h_new - h)))
    if rescale_ratio <= 1.0:  # padding
        img_pad = np.ones((h, w, 3), dtype=np.uint8) * 168
        mask_pad = np.zeros((h, w), dtype=np.uint8)
        img_pad[y:y+h_new, x:x+w_new, :] = img
        mask_pad[y:y+h_new, x:x+w_new] = mask
        return mask_pad, img_pad
    else:  # crop
        img_crop = img[y:y+h, x:x+w, :]
        mask_crop = mask[y:y+h, x:x+w]
        return mask_crop, img_crop
 
 
def copy_paste(mask_src, img_src, mask_main, img_main):
    mask_src, img_src = random_flip_horizontal(mask_src, img_src)
    mask_main, img_main = random_flip_horizontal(mask_main, img_main)
 
    # LSJ, Large_Scale_Jittering
    if args.lsj:  # uses the global `args` parsed in __main__
        mask_src, img_src = Large_Scale_Jittering(mask_src, img_src)
        mask_main, img_main = Large_Scale_Jittering(mask_main, img_main)
    else:
        # rescale mask_src/img_src to less than mask_main/img_main's size
        h, w, _ = img_main.shape
        mask_src, img_src = rescale_src(mask_src, img_src, h, w)
 
    img = img_add(img_src, img_main, mask_src)  # paste the source foreground onto the main image
    mask = img_add(mask_src, mask_main, mask_src)  # apply the same compositing to the label masks
 
    return mask, img
 
 
def main(args):
    # input path
    segclass = os.path.join(args.input_dir, 'SegmentationClass')
    JPEGs = os.path.join(args.input_dir, 'JPEGImages')
 
    # create output path
    os.makedirs(args.output_dir, exist_ok=True)
    os.makedirs(os.path.join(args.output_dir, 'SegmentationClass'), exist_ok=True)
    os.makedirs(os.path.join(args.output_dir, 'JPEGImages'), exist_ok=True)
 
    masks_path = os.listdir(segclass)
    tbar = tqdm.tqdm(masks_path, ncols=100)
    for mask_path in tbar:
        # get source mask and img
        mask_src = np.asarray(Image.open(os.path.join(segclass, mask_path)), dtype=np.uint8)
        img_src = cv2.imread(os.path.join(JPEGs, mask_path.replace('.png', '.jpg')))
 
        # random choice main mask/img
        mask_main_path = np.random.choice(masks_path)
        mask_main = np.asarray(Image.open(os.path.join(segclass, mask_main_path)), dtype=np.uint8)
        img_main = cv2.imread(os.path.join(JPEGs, mask_main_path.replace('.png', '.jpg')))
 
        # Copy-Paste data augmentation
        mask, img = copy_paste(mask_src, img_src, mask_main, img_main)  # apply Copy-Paste augmentation
 
        mask_filename = "copy_paste_" + mask_path
        img_filename = mask_filename.replace('.png', '.jpg')
        save_colored_mask(mask, os.path.join(args.output_dir, 'SegmentationClass', mask_filename))  # save the augmented mask
        cv2.imwrite(os.path.join(args.output_dir, 'JPEGImages', img_filename), img)  # save the composited image
 
 
def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_dir", default="../dataset/VOCdevkit2012/VOC2012", type=str,
                        help="input annotated directory")
    parser.add_argument("--output_dir", default="../dataset/VOCdevkit2012/VOC2012_copy_paste", type=str,
                        help="output dataset directory")
    parser.add_argument("--lsj", default=True, type=bool, help="if use Large Scale Jittering")
    return parser.parse_args()
 
 
if __name__ == '__main__':
    args = get_args()
    main(args)
