Diffusion models have emerged as a powerful new family of deep generative models, achieving record-breaking performance in many applications, including image synthesis, video generation, and molecule design. In this survey, we provide an overview of the rapidly expanding body of work on diffusion models, organizing the research into three key areas: efficient sampling, improved likelihood estimation, and handling data with special structures. We also discuss the potential of combining diffusion models with other generative models to enhance results. We further review the wide-ranging applications of diffusion models in computer vision, natural language processing, temporal data modeling, and other fields, as well as interdisciplinary applications in other scientific disciplines. This survey aims to provide a contextualized, in-depth view of the state of diffusion models, identifying the key areas of focus and pointing out potential directions for further exploration. Github: https://github.com/YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy.
Generative models have been used to address a variety of image restoration tasks, including super-resolution, inpainting, and translation [10,47,61,103,137,174,187,282]. Image super-resolution aims to recover a high-resolution image from a low-resolution input, while image inpainting involves reconstructing missing or corrupted regions of an image.
Several approaches leverage diffusion models for these tasks. For example, Super-Resolution via Repeated Refinement (SR3) [202] uses DDPM for conditional image generation, performing super-resolution through a stochastic, iterative denoising process. The Cascaded Diffusion Model (CDM) [91] consists of multiple diffusion models arranged in sequence, each generating images of progressively higher resolution. Both SR3 and CDM apply the diffusion process directly to the input images, which results in a large number of evaluation steps.
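To make the conditioning concrete, the following is a minimal sketch of SR3-style conditional iterative refinement, assuming a noise-prediction network eps_model(x_t, lr_up, t) that is conditioned on the upsampled low-resolution image; the names and the linear noise schedule are illustrative assumptions, not SR3's actual implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sr3_sample(eps_model, lr_image, scale, timesteps=1000):
    """Generate a high-resolution sample conditioned on `lr_image` of shape (B, C, h, w)."""
    betas = torch.linspace(1e-4, 0.02, timesteps)            # linear noise schedule (assumed)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    lr_up = F.interpolate(lr_image, scale_factor=scale, mode="bicubic")
    x_t = torch.randn_like(lr_up)                            # start from pure Gaussian noise

    for t in reversed(range(timesteps)):
        t_batch = torch.full((x_t.size(0),), t, dtype=torch.long)
        eps = eps_model(x_t, lr_up, t_batch)                 # predict noise, conditioned on lr_up
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x_t - coef * eps) / torch.sqrt(alphas[t])    # DDPM posterior mean
        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = mean + torch.sqrt(betas[t]) * noise            # ancestral sampling step
    return x_t
```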
To allow diffusion models to be trained with limited computational resources, some methods [198,234] use pre-trained autoencoders to move the diffusion process into a latent space. The Latent Diffusion Model (LDM) [198] simplifies the training and sampling of denoising diffusion models without sacrificing quality.
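A minimal sketch of a latent-diffusion training step in the spirit of LDM [198]: images are first encoded by a frozen, pre-trained autoencoder, and the usual DDPM noise-prediction loss is applied in latent space. The interfaces autoencoder.encode and eps_model are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def latent_diffusion_loss(autoencoder, eps_model, images, alpha_bars):
    """One training step: diffuse the latent code, predict the noise, return the MSE loss."""
    with torch.no_grad():
        z0 = autoencoder.encode(images)                        # frozen pre-trained encoder
    t = torch.randint(0, alpha_bars.size(0), (z0.size(0),))    # random timestep per sample
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(z0)
    z_t = torch.sqrt(a_bar) * z0 + torch.sqrt(1.0 - a_bar) * noise   # forward diffusion in latent space
    return F.mse_loss(eps_model(z_t, t), noise)                # simple DDPM objective on latents
```

Because z0 is much smaller than the pixel-space image, both training and sampling operate on far fewer dimensions, which is the source of the computational savings.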
For inpainting, RePaint [147] employs an enhanced denoising strategy that uses resampling iterations to better condition the generated image (see Fig. 5). Meanwhile, Palette [200] uses conditional diffusion models to build a unified framework for four image generation tasks: colorization, inpainting, uncropping, and JPEG restoration.
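A minimal sketch of RePaint-style inpainting with resampling [147]: at every reverse step the known pixels are taken from a forward-diffused copy of the input, the masked pixels come from the model, and each step is repeated several times by re-noising one step forward so the two regions blend. The eps_model interface and schedule are assumptions.

```python
import torch

@torch.no_grad()
def repaint(eps_model, image, mask, timesteps=250, resample=5):
    """Inpaint `image` where `mask` is 0 (mask = 1 marks known pixels)."""
    betas = torch.linspace(1e-4, 0.02, timesteps)
    alphas, alpha_bars = 1.0 - betas, torch.cumprod(1.0 - betas, dim=0)
    x_t = torch.randn_like(image)

    for t in reversed(range(timesteps)):
        for r in range(resample):
            # Known region: forward-diffuse the original image to noise level t.
            known = torch.sqrt(alpha_bars[t]) * image + \
                    torch.sqrt(1.0 - alpha_bars[t]) * torch.randn_like(image)
            x_t = mask * known + (1.0 - mask) * x_t

            # Unknown region: one reverse DDPM step.
            t_batch = torch.full((image.size(0),), t, dtype=torch.long)
            eps = eps_model(x_t, t_batch)
            mean = (x_t - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
            x_prev = mean + torch.sqrt(betas[t]) * torch.randn_like(x_t) if t > 0 else mean

            if r < resample - 1 and t > 0:
                # Resampling: diffuse x_{t-1} forward one step and redo the transition.
                x_t = torch.sqrt(alphas[t]) * x_prev + torch.sqrt(betas[t]) * torch.randn_like(x_prev)
            else:
                x_t = x_prev
    return x_t
```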
Image translation focuses on synthesizing images with a specific desired style [103]. SDEdit [161] leverages a stochastic differential equation (SDE) prior to improve fidelity. Specifically, it first adds noise to the input image and then denoises the image through the reverse SDE.
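A minimal sketch of SDEdit-style guided synthesis [161], written here with a discrete DDPM approximation of the reverse SDE rather than SDEdit's continuous-time formulation: the guide image is diffused forward to an intermediate noise level t0 and then denoised back to t = 0, with t0 trading faithfulness to the guide against realism. Names and schedule are assumptions.

```python
import torch

@torch.no_grad()
def sdedit(eps_model, guide, t0_ratio=0.5, timesteps=1000):
    """Edit `guide` by perturbing it to an intermediate noise level and denoising."""
    betas = torch.linspace(1e-4, 0.02, timesteps)
    alphas, alpha_bars = 1.0 - betas, torch.cumprod(1.0 - betas, dim=0)

    t0 = int(t0_ratio * timesteps)                       # how much of the guide's structure to keep
    x_t = torch.sqrt(alpha_bars[t0]) * guide + \
          torch.sqrt(1.0 - alpha_bars[t0]) * torch.randn_like(guide)   # perturb the guide

    for t in reversed(range(t0 + 1)):                    # reverse process from t0 down to 0
        t_batch = torch.full((guide.size(0),), t, dtype=torch.long)
        eps = eps_model(x_t, t_batch)
        mean = (x_t - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = mean + torch.sqrt(betas[t]) * noise
    return x_t
```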
[10] Georgios Batzolis, Jan Stanczuk, Carola-Bibiane Schönlieb, and Christian Etmann. 2021. Conditional image generation with score-based diffusion models. arXiv preprint arXiv:2111.13606 (2021).
[47] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition. 248–255.
[61] Patrick Esser, Robin Rombach, and Bjorn Ommer. 2021. Taming transformers for high-resolution image synthesis. In IEEE Conference on Computer Vision and Pattern Recognition. 12873–12883.
[103] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. 2017. Image-to-image translation with conditional adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition. 1125–1134.
[147] Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. 2022. Repaint: Inpainting using denoising diffusion probabilistic models. In IEEE Conference on Computer Vision and Pattern Recognition. 11461–11471.
[161] Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. 2021. Sdedit: Guided image synthesis and editing with stochastic differential equations. In International Conference on Learning Representations.
[174] Muzaffer Özbey, Salman UH Dar, Hasan A Bedel, Onat Dalmaz, Şaban Özturk, Alper Güngör, and Tolga Çukur. 2022. Unsupervised Medical Image Translation with Adversarial Diffusion Models. arXiv preprint arXiv:2207.08208 (2022).
[187] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. Zero-shot text-to-image generation. In International Conference on Machine Learning. 8821–8831.
[198] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In IEEE Conference on Computer Vision and Pattern Recognition. 10684–10695.
[200] Chitwan Saharia, William Chan, Huiwen Chang, Chris Lee, Jonathan Ho, Tim Salimans, David Fleet, and Mohammad Norouzi. 2022. Palette: Image-to-image diffusion models. In Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings. 1–10.
[202] Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J Fleet, and Mohammad Norouzi. 2022. Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence (2022).
[234] Arash Vahdat, Karsten Kreis, and Jan Kautz. 2021. Score-based generative modeling in latent space. In Advances in Neural Information Processing Systems, Vol. 34. 11287–11302.
[282] Min Zhao, Fan Bao, Chongxuan Li, and Jun Zhu. 2022. Egsde: Unpaired image-to-image translation via energy-guided stochastic differential equations. arXiv preprint arXiv:2207.06635 (2022).
Semantic segmentation aims to label every image pixel according to an established set of object categories. Generative pre-training can improve the label efficiency of semantic segmentation models, and recent work has shown that the representations learned by DDPMs contain high-level semantic information useful for segmentation [9,76]. Few-shot methods that exploit these learned representations outperform alternatives such as VDVAE [33] and ALAE [179]. Similarly, Decoder Denoising Pretraining (DDeP) [17] integrates diffusion models with denoising autoencoders [239] and delivers promising results on label-efficient semantic segmentation.
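A minimal sketch of label-efficient segmentation with diffusion features, in the spirit of [9]: intermediate U-Net activations of a frozen, pre-trained DDPM at a few noise levels are upsampled, concatenated per pixel, and fed to a small pixel-wise classifier trained on the few labeled images. The feature hook unet_features(x_t, t), returning a list of feature maps, is an assumed interface.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ddpm_pixel_features(unet_features, image, alpha_bars, steps=(50, 150, 250)):
    """Return a (B, D, H, W) feature tensor built from frozen diffusion-model activations."""
    feats = []
    with torch.no_grad():
        for t in steps:
            a_bar = alpha_bars[t]
            x_t = torch.sqrt(a_bar) * image + torch.sqrt(1 - a_bar) * torch.randn_like(image)
            t_batch = torch.full((image.size(0),), t, dtype=torch.long)
            for f in unet_features(x_t, t_batch):            # multi-scale U-Net activations
                feats.append(F.interpolate(f, size=image.shape[-2:], mode="bilinear"))
    return torch.cat(feats, dim=1)                           # per-pixel feature vectors

class PixelClassifier(nn.Module):
    """A small MLP (1x1 convolutions) applied independently at every pixel."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(feat_dim, 256, 1), nn.ReLU(),
                                 nn.Conv2d(256, num_classes, 1))
    def forward(self, feats):
        return self.mlp(feats)                               # (B, num_classes, H, W) logits
```

Only the pixel classifier is trained; the diffusion model stays frozen, which is what makes the approach attractive when labels are scarce.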
[9] Dmitry Baranchuk, Andrey Voynov, Ivan Rubachev, Valentin Khrulkov, and Artem Babenko. 2021. Label-Efficient Semantic Segmentation with Diffusion Models. In International Conference on Learning Representations.
[76] Alexandros Graikos, Nikolay Malkin, Nebojsa Jojic, and Dimitris Samaras. 2022. Diffusion models as plug-and-play priors. In Advances in Neural Information Processing Systems.
[17] Emmanuel Asiedu Brempong, Simon Kornblith, Ting Chen, Niki Parmar, Matthias Minderer, and Mohammad Norouzi. 2022. Denoising Pretraining for Semantic Segmentation. In IEEE Conference on Computer Vision and Pattern Recognition. 4175–4186.
[239] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. 2008. Extracting and composing robust features with denoising autoencoders. In International Conference on Machine Learning. 1096–1103.