Paper: FMix: Enhancing Mixed Sample Data Augmentation (2020)
Official code: FMix-Pytorch
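In recent years, mixed sample data augmentation has received increasing attention, with successful variants such as MixUp and CutMix. Inspired by CutMix, we propose FMix. FMix uses binary masks obtained by thresholding low-frequency images sampled from Fourier space. The paper reports that FMix outperforms both MixUp and CutMix.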
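MixUp interpolates between samples, whereas CutMix cuts a rectangular region out of a randomly chosen image and pastes it into another image to produce the output image. MixUp suppresses the model's ability to learn specific features, including the more fine-grained ones.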
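To make the difference concrete, here is a minimal NumPy sketch of the two operations. It is an illustration only, not the reference implementation of either method: the function names `mixup` and `cutmix`, the 8x8 stand-in images, and the choice to keep the rectangle fully inside the image are assumptions made for this example.

```python
import numpy as np

def mixup(x1, x2, lam):
    """Pixel-wise interpolation: every output pixel blends both inputs."""
    return lam * x1 + (1.0 - lam) * x2

def cutmix(x1, x2, lam, rng):
    """Paste a random rectangle of x2 into x1.

    The rectangle covers roughly (1 - lam) of the image area, so about
    lam of the pixels still come from x1.
    """
    h, w = x1.shape[:2]
    cut_h = int(round(h * float(np.sqrt(1.0 - lam))))
    cut_w = int(round(w * float(np.sqrt(1.0 - lam))))
    # Simplification for this sketch: the box always lies inside the image.
    top = int(rng.integers(0, h - cut_h + 1))
    left = int(rng.integers(0, w - cut_w + 1))
    out = x1.copy()
    out[top:top + cut_h, left:left + cut_w] = x2[top:top + cut_h, left:left + cut_w]
    # Fraction of pixels actually kept from x1 (used to weight the labels).
    lam_eff = 1.0 - (cut_h * cut_w) / (h * w)
    return out, lam_eff

rng = np.random.default_rng(0)
x1 = np.zeros((8, 8))   # stand-in for image A
x2 = np.ones((8, 8))    # stand-in for image B

mixed_up = mixup(x1, x2, lam=0.75)
cut_mixed, lam_eff = cutmix(x1, x2, lam=0.75, rng=rng)

print("MixUp pixel values: ", np.unique(mixed_up))
print("CutMix pixel values:", np.unique(cut_mixed))
```

In the MixUp output every pixel is a blend of both images, while in the CutMix output each pixel comes from exactly one image. FMix keeps this second property and relaxes the shape of the region.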
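We argue that CutMix lets the model retain a good understanding of the real data, because the features it observes usually come from a single data point. At the same time, CutMix reduces the risk of overfitting by dynamically increasing the number of observable data points. However, CutMix only ever uses rectangular regions, which is an unnecessary restriction.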
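We find that interpolation can cause premature compression, biasing the model toward more general features. Mask-based mixing, in contrast, preserves the distribution of semantic information and is better suited to augmentation for classification tasks.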
To address these issues, FMix is proposed. FMix uses masks of arbitrary shape (not just a rectangle) while retaining the strengths of CutMix.
The FMix procedure is as follows:
The resulting effect is shown in the figure below:
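import math
import random

import numpy as np
from scipy.stats import beta


def sample_lam(alpha, reformulate=False):
    """Sample a lambda value from a symmetric Beta distribution with the given alpha.

    :param alpha: Alpha value for beta distribution
    :param reformulate: If True, uses the reformulation of [1].
    """
    if reformulate:
        lam = beta.rvs(alpha + 1, alpha)
    else:
        # Draw one Beta random variate with alpha as both shape parameters.
        lam = beta.rvs(alpha, alpha)
    return lam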
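To see how `alpha` shapes the sampled mixing ratio, the sketch below draws many values from Beta(alpha, alpha) and from the shifted Beta(alpha + 1, alpha) used when `reformulate=True`. It uses the standard library's `random.betavariate` instead of `scipy.stats.beta` so that it runs without extra dependencies. The helper names `draw` and `frac_near_edges`, the sample size, and the chosen alpha values are assumptions for this example.

```python
import random

random.seed(0)

def draw(a, b, n=20000):
    # random.betavariate(a, b) is the standard-library Beta sampler.
    return [random.betavariate(a, b) for _ in range(n)]

def mean(samples):
    return sum(samples) / len(samples)

def frac_near_edges(samples, margin=0.1):
    """Share of samples within `margin` of 0 or 1."""
    return sum(s < margin or s > 1 - margin for s in samples) / len(samples)

small = draw(0.2, 0.2)    # alpha < 1: U-shaped, lam mostly close to 0 or 1
uniform = draw(1.0, 1.0)  # alpha = 1: uniform on [0, 1]
large = draw(5.0, 5.0)    # alpha > 1: concentrated around 0.5
shifted = draw(2.0, 1.0)  # Beta(alpha + 1, alpha) with alpha = 1: mean above 0.5

for name, s in [("alpha=0.2", small), ("alpha=1.0", uniform),
                ("alpha=5.0", large), ("Beta(2, 1)", shifted)]:
    print(name, "mean=%.2f" % mean(s), "near edges=%.2f" % frac_near_edges(s))
```

The symmetric cases all average to about 0.5 and differ only in how often lambda lands near 0 or 1. The shifted case has mean (alpha + 1) / (2 * alpha + 1), so the first image tends to dominate the mix.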
def make_low_freq_image(decay, shape, ch=1):
    """Sample a low-frequency image from Fourier space.

    :param decay: Decay power for frequency decay prop 1/f**d
    :param shape: Shape of desired mask, list up to 3 dims
    :param ch: Number of channels for desired mask
    """
    freqs = fftfreqnd(*shape)
    spectrum = get_spectrum(freqs, decay, ch, *shape)
    # Combine two real-valued slices into one complex spectrum.
    spectrum = spectrum[:, 0] + 1j * spectrum[:, 1]
    # Inverse real FFT back to image space.
    mask = np.real(np.fft.irfftn(spectrum, shape))

    # Keep one channel and crop to the requested shape.
    if len(shape) == 1:
        mask = mask[:1, :shape[0]]
    if len(shape) == 2:
        mask = mask[:1, :shape[0], :shape[1]]
    if len(shape) == 3:
        mask = mask[:1, :shape[0], :shape[1], :shape[2]]

    # Rescale to the range [0, 1].
    mask = mask - mask.min()
    mask = mask / mask.max()
    return mask
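The idea behind a "low-frequency image" can be checked with a simplified, self-contained sketch: filter white Gaussian noise in Fourier space with a 1/f**d envelope and compare how rough the result is for different decay powers. Note that this is not the code above. It uses a full complex FFT (`np.fft.ifft2`) and keeps the real part rather than `irfftn`, and the names `low_freq_noise` and `roughness` are made up for this example.

```python
import numpy as np

def low_freq_noise(h, w, decay_power, rng):
    """Filter white Gaussian noise with a 1/f**decay_power envelope."""
    fy = np.fft.fftfreq(h)[:, None]      # vertical frequencies, shape (h, 1)
    fx = np.fft.fftfreq(w)[None, :]      # horizontal frequencies, shape (1, w)
    f = np.sqrt(fx ** 2 + fy ** 2)
    f = np.maximum(f, 1.0 / max(h, w))   # avoid dividing by zero at f = 0
    noise = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
    img = np.real(np.fft.ifft2(noise / f ** decay_power))
    img = img - img.min()
    return img / img.max()               # rescale to [0, 1]

def roughness(img):
    """Mean absolute difference between horizontally adjacent pixels."""
    return float(np.abs(np.diff(img, axis=1)).mean())

rng = np.random.default_rng(0)
white = low_freq_noise(64, 64, decay_power=0.0, rng=rng)   # no decay: plain noise
smooth = low_freq_noise(64, 64, decay_power=3.0, rng=rng)  # strong decay: large blobs

print("roughness, decay 0:", roughness(white))
print("roughness, decay 3:", roughness(smooth))
```

A larger decay power suppresses high frequencies more strongly, so neighbouring pixels become more similar and thresholding the image yields a few large connected regions instead of speckle.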
def fftfreqnd(h, w=None, z=None):
    """ Get bin values for discrete fourier transform of size (h, w, z)

    :param h: Required, first dimension size
    :param w: Optional, second dimension size
    :param z: Optional, third dimension size
    """
    fz = fx = 0
    fy = np.fft.fftfreq(h)

    if w is not None:
        fy = np.expand_dims(fy, -1)

        if w % 2 == 1:
            fx = np.fft.fftfreq(w)[: w // 2 + 2]
        else:
            fx = np.fft.fftfreq(w)[: w // 2 + 1]

    if z is not None:
        fy = np.expand_dims(fy, -1)
        if z % 2 == 1:
            fz = np.fft.fftfreq(z)[:, None]
        else:
            fz = np.fft.fftfreq(z)[:, None]

    # Radial frequency magnitude of every bin.
    return np.sqrt(fx * fx + fy * fy + fz * fz)
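The slicing `[: w // 2 + 1]` mirrors how NumPy stores the FFT of real-valued data: only about half of the bins along the last axis are kept, because the rest are redundant. The short check below illustrates this for a hypothetical 8x8 image; the variable names are chosen for this example only.

```python
import numpy as np

h, w = 8, 8
fy = np.fft.fftfreq(h)[:, None]        # all h bins, shape (8, 1)
fx = np.fft.fftfreq(w)[: w // 2 + 1]   # first w // 2 + 1 bins, shape (5,)
radial = np.sqrt(fx * fx + fy * fy)    # broadcasts to shape (8, 5)

# The real FFT of an (h, w) image has the same shape as `radial`.
image = np.random.default_rng(0).standard_normal((h, w))
half_spectrum = np.fft.rfftn(image)
restored = np.fft.irfftn(half_spectrum, s=image.shape)

print(radial.shape, half_spectrum.shape)
print(np.allclose(restored, image))
```

The last bin kept in `fx` is the negative Nyquist frequency, but since only its square enters the radial magnitude, the sign does not matter.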
def get_spectrum(freqs, decay_power, ch, h, w=0, z=0):
    """ Get the Fourier-space image (spectrum) for the given size and decay_power

    :param freqs: Bin values for the discrete fourier transform
    :param decay_power: Decay power for frequency decay prop 1/f**d
    :param ch: Number of channels for the resulting mask
    :param h: Required, first dimension size
    :param w: Optional, second dimension size
    :param z: Optional, third dimension size
    """
    # 1/f**d envelope; frequencies are clamped from below to avoid division by zero.
    scale = np.ones(1) / (np.maximum(freqs, np.array([1. / max(w, h, z)])) ** decay_power)

    # Gaussian noise with a trailing axis of size 2.
    param_size = [ch] + list(freqs.shape) + [2]
    param = np.random.randn(*param_size)

    # Add leading and trailing axes so the envelope broadcasts against the noise.
    scale = np.expand_dims(scale, -1)[None, :]

    return scale * param
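A small numeric example makes the effect of `decay_power` on the envelope easier to see. The frequencies, the image side length of 32, and the helper name `envelope` below are assumptions for illustration; the formula is the same clamp-then-divide expression used for `scale` above.

```python
import numpy as np

freqs = np.array([0.0, 0.05, 0.1, 0.25, 0.5])  # example radial frequencies
size = 32                                       # hypothetical image side length

def envelope(freqs, decay_power, size):
    # Clamp at 1 / size so the zero-frequency bin does not divide by zero.
    clamped = np.maximum(freqs, 1.0 / size)
    return 1.0 / clamped ** decay_power

for d in (0.0, 1.0, 3.0):
    print("decay_power =", d, np.round(envelope(freqs, d, size), 2))
```

With `decay_power = 0` all frequencies are weighted equally (white noise). As the power grows, the lowest bins are amplified far more than the highest ones, which is what pushes the sampled image toward large, smooth structures.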
def binarise_mask(mask, lam, in_shape, max_soft=0.0):
    """ Binarise the low-frequency image so that its mean value is lambda

    :param mask: Low frequency image, usually the result of `make_low_freq_image`
    :param lam: Mean value of final mask
    :param in_shape: Shape of inputs
    :param max_soft: Softening value between 0 and 0.5 which smooths hard edges in the mask.
    :return: Mask of shape (1, *in_shape)
    """
    # Pixel indices sorted from highest to lowest value.
    idx = mask.reshape(-1).argsort()[::-1]
    mask = mask.reshape(-1)
    # Number of pixels set to 1; lam * size is rounded up or down at random.
    num = math.ceil(lam * mask.size) if random.random() > 0.5 else math.floor(lam * mask.size)

    # Cap the softening so the soft band fits inside both regions.
    eff_soft = max_soft
    if max_soft > lam or max_soft > (1 - lam):
        eff_soft = min(lam, 1 - lam)

    soft = int(mask.size * eff_soft)
    num_low = num - soft
    num_high = num + soft

    mask[idx[:num_high]] = 1
    mask[idx[num_low:]] = 0
    # Linear ramp from 1 to 0 across the soft band (empty when soft == 0).
    mask[idx[num_low:num_high]] = np.linspace(1, 0, (num_high - num_low))

    mask = mask.reshape((1, *in_shape))
    return mask
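Putting the last step to use, the sketch below thresholds a grey-scale image so that a fraction `lam` of its pixels becomes 1, then mixes two inputs and their labels with that mask. It is a simplified, hard-edged stand-in for the code above, under several assumptions: `top_fraction_mask` is a hypothetical helper, the smooth sinusoidal `grey` image replaces a real call to `make_low_freq_image`, and the constant images and one-hot labels are toy data.

```python
import numpy as np

def top_fraction_mask(grey, lam):
    """Binary mask that is 1 on the `lam` fraction of highest-valued pixels."""
    flat = grey.reshape(-1)
    order = flat.argsort()[::-1]          # indices from largest to smallest
    num_ones = int(round(lam * flat.size))
    mask = np.zeros(flat.size)
    mask[order[:num_ones]] = 1.0
    return mask.reshape(grey.shape)

# Smooth stand-in for a low-frequency image (assumption for this example).
yy, xx = np.mgrid[0:16, 0:16]
grey = np.sin(xx / 5.0) + np.cos(yy / 7.0)

lam = 0.3
mask = top_fraction_mask(grey, lam)

x1 = np.full((16, 16), 10.0)    # stand-in for image A
x2 = np.full((16, 16), -10.0)   # stand-in for image B
mixed = mask * x1 + (1.0 - mask) * x2

# Labels are mixed in the same proportion as the pixels.
y1 = np.array([1.0, 0.0])
y2 = np.array([0.0, 1.0])
mixed_label = mask.mean() * y1 + (1.0 - mask.mean()) * y2

print("mask mean:", mask.mean())
print("mixed label:", mixed_label)
```

Because the mask is chosen by rank rather than by a fixed threshold value, its mean matches `lam` up to rounding regardless of how the grey-scale values are distributed. In training, the same proportion can be applied to the two loss terms instead of to the labels.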