Existing deep learning real denoising methods require a large amount of noisy-clean image pairs for supervision. Nonetheless, capturing a real noisy-clean dataset is an unacceptable expensive and cumbersome procedure. To alleviate this problem, this work investigates how to generate realistic noisy images. Firstly, we formulate a simple yet reasonable noise model that treats each real noisy pixel as a random variable. This model splits the noisy image generation problem into two sub-problems: image domain alignment and noise domain alignment. Subsequently, we propose a novel framework, namely Pixel-level Noise-aware Generative Adversarial Network (PNGAN). PNGAN employs a pre-trained real denoiser to map the fake and real noisy images into a nearly noise-free solution space to perform image domain alignment. Simultaneously, PNGAN establishes a pixel-level adversarial training to conduct noise domain alignment. Additionally, for better noise fitting, we present an efficient architecture Simple Multi-scale Network (SMNet) as the generator. Qualitative validation shows that noise generated by PNGAN is highly similar to real noise in terms of intensity and distribution. Quantitative experiments demonstrate that a series of denoisers trained with the generated noisy images achieve state-of-the-art (SOTA) results on four real denoising benchmarks. Part of codes, pre-trained models, and results are available at https://github.com/caiyuanhao1998/PNGAN for comparisons.
现有的深度学习真实去噪方法需要大量有噪声的干净图像对进行监督。尽管如此,捕获一个真正有噪声的干净数据集是一个不可接受的昂贵且繁琐的过程。为了缓解这一问题,本工作研究了如何生成真实的噪声图像。首先,我们建立了一个简单而合理的噪声模型,将每个真实的噪声像素视为一个随机变量。该模型将噪声图像生成问题分为两个子问题:图像域对齐和噪声域对齐。随后,我们提出了一种新的框架,即像素级噪声感知生成对抗网络(PNGAN)。PNGAN使用预训练的实去噪器将假噪声和实噪声图像映射到几乎无噪声的解决方案空间中,以执行图像域对准。同时,PNGAN建立像素级对抗训练以进行噪声域对准。此外,为了更好地拟合噪声,我们提出了一种高效的结构简单多尺度网络(SMNet)作为生成器。定性验证表明,PNGAN产生的噪声在强度和分布方面与真实噪声高度相似。定量实验表明,用生成的噪声图像训练的一系列去噪器在四个真实去噪基准上实现了最先进的(SOTA)结果。部分代码、预训练模型和结果可在https://github.com/caiyuanhao1998/PNGAN用于比较。
Image denoising has various applications ranging from consumer electronics to medical imaging devices. Neural methods have enjoyed great success in real noise removal in high-resolution images. However, training complex neural networks often require many clean-noisy image pairs that correctly model the underlying noise distribution. In addition, image denoising is not easily self-supervised due to the complex nature of real-world image noise. Unfortunately, training data for image denoising has been notoriously difficult and expensive to gather, with the most notable denoising datasets being SIDD and DND. Neither SIDD nor DND are large datasets: SIDD contains 160 scenes with 150 similar clean-noisy image pairs for each scene captured using smartphones, and DND contains 50 clean-noisy image pairs captured using consumer-grade cameras.
图像去噪具有从消费电子产品到医疗成像设备的各种应用。神经方法在高分辨率图像中去除真实噪声方面取得了巨大成功。然而,训练复杂的神经网络通常需要许多干净的噪声图像对,这些图像对正确地模拟潜在的噪声分布。此外,由于真实世界图像噪声的复杂性,图像去噪不容易自我监督。不幸的是,用于图像去噪的训练数据是出了名的困难和昂贵,最著名的去噪数据集是SIDD和DND。SIDD和DND都不是大型数据集:SIDD包含160个场景,每个场景使用智能手机捕获150个类似的干净噪声图像对,DND包含50个使用消费级相机捕获的干净噪声的图像对。
Data augmentation methods in training neural networks have been proven to help improve the robustness of a model by increasing the diversity in data. Surprisingly, data augmentation in generating synthetic noisy training data has been quite tricky because real-world noise cannot be easily approximated. Naive approaches such as adding additive white Gaussian noise (AWGN) usually do not help improve the model performance because real-world noise is fundamentally a different distribution than Gaussian.
训练神经网络中的数据增强方法已被证明有助于通过增加数据的多样性来提高模型的鲁棒性。令人惊讶的是,生成合成噪声训练数据时的数据增强非常棘手,因为真实世界的噪声不容易近似。诸如添加加性高斯白噪声(AWGN)之类的简单方法通常无助于提高模型性能,因为真实世界的噪声基本上与高斯分布不同。
Brooks et al. and Zamir et al. proposed noise-synthesizing methods that attempts to generate realistic noise by first converting an RGB image to a RAW image, before a non-neural RAW noise model is applied and the image is converted back to RGB.
Brooks等人和Zamir等人提出了噪声合成方法,该方法尝试在应用非神经RAW噪声模型并将图像转换回RGB之前,通过首先将RGB图像转换为RAW图像来生成真实的噪声。
Taking it one step further, Cai et al. took a neural approach to realistic noise modeling and achieved impressive and ground-breaking results. They proposed a GAN architecture, namely PNGAN. They train a generator that learns to generate real noise from AWGN and a discriminator that distinguishes between real and synthetic noise. Cai et al. trained their GAN using real-noisy training data (SIDD and DND) and generated synthetic noisy-clean image pairs from high-resolution image datasets, including DIV2K and Flickr2K. They finetuned existing denoising models using a mixture of real-noisy data and synthetic-noisy data and observed noticeable improvements.
更进一步,Cai等人采用了神经方法进行真实的噪声建模,并取得了令人印象深刻的突破性成果。他们提出了一种GAN架构,即PNGAN。他们训练一个学习从AWGN生成真实噪声的生成器和一个区分真实噪声和合成噪声的鉴别器。Cai等人使用真实的噪声训练数据(SIDD和DND)训练他们的GAN,并从高分辨率图像数据集(包括DIV2K和Flickr2K)生成合成的噪声干净图像对。他们使用真实噪声数据和合成噪声数据的混合来微调现有的去噪模型,并观察到明显的改进。
Based on this finding, we hypothesied that most current complex denoising models are under-trained due to the scarcity of training data. And the objective of this project is to implement (from scatch) the noise-synthesizing neural networks proposed by Cai et al. and generate realistic noisy training data to better improve existing denoising models, namely Restormer proposed by Zamir et al., which was trained using only real noisy data (SIDD). We have since observed improved denoising performance in our finetuned Restormer model.
基于这一发现,我们假设,由于缺乏训练数据,目前大多数复杂的去噪模型都训练不足。本项目的目标是实现(来自scatch)Cai等人提出的噪声合成神经网络,并生成真实的噪声训练数据,以更好地改进现有的去噪模型,即Zamir等人提出的Restormer,其仅使用真实噪声数据(SIDD)进行训练。此后,我们在微调的Restormer模型中观察到了改进的去噪性能。
To train our PNGAN models, we applied similar hyper-parameter settings provided by Cai et al. We trained both the generator and denoiser using the SIDD dataset. For synthetic noise settings, we used the setting1 proposed by the paper. We used the same 128x128 patch setting provided by Cai et al. We trained the model using a batch size of 8 for 60,000 iterations using Adam optimizer and cosine annealing learning rate scheduler with the same learning rate setting. All the models are trained on a single NVIDIA V100 GPU. The following figure shows an example of synthetic noise generation.
为了训练我们的PNGAN模型,我们应用了Cai等人提供的类似超参数设置。我们使用SIDD数据集训练了生成器和去噪器。对于合成噪声设置,我们使用了论文提出的设置1。我们使用了Cai等人提供的相同的128x128补丁设置。我们使用Adam优化器和具有相同学习速率设置的余弦退火学习速率调度器,使用8的批量大小训练了60000次迭代的模型。所有型号均在单个NVIDIA V100 GPU上训练。下图显示了合成噪声生成的示例。
To generate synthetic clean-noisy image pairs for denoiser finetuning, we selected the DIV2K dataset, which is a dataset composed of diverse 2k-resolution high quality images which we consider relatively noise-free. From DIV2K, we cropped the high-resolution images to 128x128 patches and feed them through PNGAN generator to gather noisy input. We then mixed the synthetic-noisy (DIV2K-PNGAN) data with real-noisy (SIDD) data to finetune the pretrained Restormer model. We used the same AdamW optimizer provided by Restormer, and we finetuned the Restormer model for 100,800 iterations using a batch size of 4 due to the memory constraint. The following figure shows an example of the Restormer denoiser.
为了生成用于去噪器微调的合成干净噪声图像对,我们选择了DIV2K数据集,这是一个由各种2k分辨率高质量图像组成的数据集,我们认为这些图像相对无噪声。从DIV2K,我们将高分辨率图像裁剪成128x128个补丁,并通过PNGAN生成器将其馈送以收集噪声输入。然后,我们将合成噪声(DIV2K-PNGAN)数据与真实噪声(SIDD)数据混合,以微调预训练的Restormer模型。我们使用了Restormer提供的相同的AdamW优化器,由于内存限制,我们使用4的批处理大小对Restormer模型进行了100800次迭代的微调。下图显示了Restormer去噪器的示例。
In benchmarks, we observed encouraging results in applying PNGAN’s synthetic noisy data to finetune the Restormer denoising model. Overall, we observed meaningful performance increase in both PSNR (peak signal-to-noise ratio) and SSIM (structural simmilarity index measure) after finetuning Restormer with PNGAN-noise. To prove the effectiveness of our noise model, we also included benchmarks of the model finetuned with Gaussian noise augmented data, which shows that Gaussian noise augmentation actually degrades the performance of the original denoising model, suggesting that real noise distribution cannot be approximated with simple Gaussian noise.
在基准测试中,我们观察到了应用PNGAN的合成噪声数据来微调Restormer去噪模型的令人鼓舞的结果。总的来说,我们观察到在用PNGAN噪声微调Restormer后,PSNR(峰值信噪比)和SSIM(结构相似性指数测量)的性能都有显著提高。为了证明我们的噪声模型的有效性,我们还包括了用高斯噪声增强数据微调的模型基准,这表明高斯噪声增强实际上降低了原始去噪模型的性能,这表明真实的噪声分布不能用简单的高斯噪声来近似。