Note 1

SIREN

Abstract

Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail, and fail to represent a signal’s spatial and temporal derivatives, despite the fact that these are essential to many physical signals defined implicitly as the solution to partial differential equations. We propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or SIRENs, are ideally suited for representing complex natural signals and their derivatives. We analyze SIREN activation statistics to propose a principled initialization scheme and demonstrate the representation of images, wavefields, video, sound, and their derivatives. Further, we show how SIRENs can be leveraged to solve challenging boundary value problems, such as particular Eikonal equations (yielding signed distance functions), the Poisson equation, and the Helmholtz and wave equations. Lastly, we combine SIRENs with hypernetworks to learn priors over the space of SIREN functions. Please see the project website for a video overview of the proposed method and all applications.

(Note: the project website also provides a Colab environment.)

Introduction

We are interested in a class of functions Φ that satisfy equations of the form
F(x, Φ, ∇xΦ, ∇x²Φ, …) = 0,   Φ : x ↦ Φ(x).    (1)
This implicit problem formulation takes as input the spatial or spatio-temporal coordinates x ∈ R^m and, optionally, derivatives of Φ with respect to these coordinates. Our goal is then to learn a neural network that parameterizes Φ to map x to some quantity of interest while satisfying the constraint presented in Equation (1). Thus, Φ is implicitly defined by the relation defined by F and we refer to neural networks that parameterize such implicitly defined functions as implicit neural representations. As we show in this paper, a surprisingly wide variety of problems across scientific fields fall into this form, such as modeling many different types of discrete signals in image, video, and audio processing using a continuous and differentiable representation, learning 3D shape representations via signed distance functions [1–4], and, more generally, solving boundary value problems, such as the Poisson, Helmholtz, or wave equations.

A continuous parameterization offers several benefits over alternatives, such as discrete grid-based representations. For example, due to the fact that Φ is defined on the continuous domain of x, it can be significantly more memory efficient than a discrete representation, allowing it to model fine detail that is not limited by the grid resolution but by the capacity of the underlying network architecture. Being differentiable implies that gradients and higher-order derivatives can be computed analytically, for example using automatic differentiation, which again makes these models independent of conventional grid resolutions. Finally, with well-behaved derivatives, implicit neural representations may offer a new toolbox for solving inverse problems, such as differential equations.

For these reasons, implicit neural representations have seen significant research interest over the last year (Sec. 2). Most of these recent representations build on ReLU-based multilayer perceptrons (MLPs). While promising, these architectures lack the capacity to represent fine details in the underlying signals, and they typically do not represent the derivatives of a target signal well. This is partly due to the fact that ReLU networks are piecewise linear, their second derivative is zero everywhere, and they are thus incapable of modeling information contained in higher-order derivatives of natural signals. While alternative activations, such as tanh or softplus, are capable of representing higher-order derivatives, we demonstrate that their derivatives are often not well behaved and also fail to represent fine details.

To address these limitations, we leverage MLPs with periodic activation functions for implicit neural representations. We demonstrate that this approach is not only capable of representing details in the signals better than ReLU-MLPs, or positional encoding strategies proposed in concurrent work [5], but that these properties also uniquely apply to the derivatives, which is critical for many applications we explore in this paper.

To summarize, the contributions of our work include:

  • A continuous implicit neural representation using periodic activation functions that fits complicated signals, such as natural images and 3D shapes, and their derivatives robustly.
  • An initialization scheme for training these representations and validation that distributions of these representations can be learned using hypernetworks.
  • Demonstration of applications in: image, video, and audio representation; 3D shape reconstruction; solving first-order differential equations that aim at estimating a signal by supervising only with its gradients; and solving second-order differential equations.

Related Work

Implicit neural representations.

Recent work has demonstrated the potential of fully connected networks as continuous, memory-efficient implicit representations for shape parts [6, 7], objects [1, 4, 8, 9], or scenes [10–13]. These representations are typically trained from some form of 3D data as either signed distance functions [1, 4, 8–12] or occupancy networks [2, 14]. In addition to representing shape, some of these models have been extended to also encode object appearance [3, 5, 10, 15, 16], which can be trained using (multiview) 2D image data using neural rendering [17]. Temporally aware extensions [18] and variants that add part-level semantic segmentation [19] have also been proposed.

Periodic nonlinearities.

Periodic nonlinearities have been investigated repeatedly over the past decades, but have so far failed to robustly outperform alternative activation functions. Early work includes Fourier neural networks, engineered to mimic the Fourier transform via single-hidden-layer networks [20, 21]. Other work explores neural networks with periodic activations for simple classification tasks [22–24] and recurrent neural networks [25–29]. It has been shown that such models have universal function approximation properties [30–32]. Compositional pattern producing networks [33, 34] also leverage periodic nonlinearities, but rely on a combination of different nonlinearities via evolution in a genetic algorithm framework. Motivated by the discrete cosine transform, Klocek et al. [35] leverage cosine activation functions for image representation, but they do not study the derivatives of these representations or other applications explored in our work. Inspired by these and other seminal works, we explore MLPs with periodic activation functions for applications involving implicit neural representations and their derivatives, and we propose principled initialization and generalization schemes.

Neural DE Solvers.

Neural networks have long been investigated in the context of solving differential equations (DEs) [36], and have previously been introduced as implicit representations for this task [37]. Early work on this topic involved simple neural network models, consisting of MLPs or radial basis function networks with few hidden layers and hyperbolic tangent or sigmoid nonlinearities [37–39]. The limited capacity of these shallow networks typically constrained results to 1D solutions or simple 2D surfaces. Modern approaches to these techniques leverage recent optimization frameworks and auto-differentiation, but use similar architectures based on MLPs. Still, solving more sophisticated equations with higher dimensionality, more constraints, or more complex geometries is feasible [40–42]. However, we show that the commonly used MLPs with smooth, non-periodic activation functions fail to accurately model high-frequency information and higher-order derivatives even with dense supervision.

Neural ODEs [43] are related to this topic, but are very different in nature. Whereas implicit neural representations can be used to directly solve ODEs or PDEs from supervision on the system dynamics, neural ODEs allow for continuous function modeling by pairing a conventional ODE solver (e.g., implicit Adams or Runge-Kutta) with a network that parameterizes the dynamics of a function. The proposed architecture may be complementary to this line of work.

Formulation

Just the mathematical derivation.

Experiments

In this section, we leverage SIRENs to solve challenging boundary value problems using different types of supervision of the derivatives of Φ. We first solve the Poisson equation via direct supervision of its derivatives. We then solve a particular form of the Eikonal equation, placing a unit-norm constraint on gradients, parameterizing the class of signed distance functions (SDFs). SIREN significantly outperforms ReLU-based SDFs, capturing large scenes at a high level of detail. We then solve the second-order Helmholtz partial differential equation, and the challenging inverse problem of full-waveform inversion. Finally, we combine SIRENs with hypernetworks, learning a prior over the space of parameterized functions. All code and data will be made publicly available.

(Of course, for these notes we only need to focus on the SDFs.)

Solving the Poisson Equation

Not a focus of these notes.

Representing Shapes with Signed Distance Functions

Inspired by recent work on shape representation with differentiable signed distance functions (SDFs) [1, 4, 9], we fit SDFs directly on oriented point clouds using both ReLU-based implicit neural representations and SIRENs. This amounts to solving a particular Eikonal boundary value problem that constrains the norm of spatial gradients |∇xΦ| to be 1 almost everywhere. Note that ReLU networks are seemingly ideal for representing SDFs, as their gradients are locally constant and their second derivatives are 0. Adequate training procedures for working directly with point clouds were described in prior work [4, 9]. We fit a SIREN to an oriented point cloud using a loss of the form.
L_sdf = ∫_Ω ‖ |∇xΦ(x)| − 1 ‖ dx + ∫_{Ω0} ‖Φ(x)‖ + (1 − ⟨∇xΦ(x), n(x)⟩) dx + ∫_{Ω∖Ω0} ψ(Φ(x)) dx
Here, ψ(x) = exp(−α · |Φ(x)|), α >> 1 penalizes off-surface points for creating SDF values close to 0. Ω is the whole domain and we denote the zero-level set of the SDF as Ω0. The model Φ(x) is supervised using oriented points sampled on a mesh, where we require the SIREN to respect Φ(x) = 0 and its normals n(x) = ∇Φ(x). During training, each minibatch contains an equal number of points on and off the mesh, each one randomly sampled over Ω. As seen in Fig. 4, the proposed periodic activations significantly increase the details of objects and the complexity of scenes that can be represented by these neural SDFs, parameterizing a full room with only a single five-layer fully connected neural network. This is in contrast to concurrent work that addresses the same failure of conventional MLP architectures to represent complex or large scenes by locally decoding a discrete representation, such as a voxel grid, into an implicit neural representation of geometry [11–13].

Solving the Helmholtz and Wave Equations

Not a focus of these notes.

Learning a Space of Implicit Functions

Not a focus of these notes.

Discussion and Conclusion

The question of how to represent a signal is at the core of many problems across science and engineering. Implicit neural representations may provide a new tool for many of these by offering a number of potential benefits over conventional continuous and discrete representations. We demonstrate that periodic activation functions are ideally suited for representing complex natural signals and their derivatives using implicit neural representations. We also prototype several boundary value problems that our framework is capable of solving robustly. There are several exciting avenues for future work, including the exploration of other types of inverse problems and applications in areas beyond implicit neural representations, for example neural ODEs [43]. With this work, we make important contributions to the emerging field of implicit neural representation learning and its applications.

Broader Impact

The proposed SIREN representation enables accurate representations of natural signals, such as images, audio, and video in a deep learning framework. This may be an enabler for downstream tasks involving such signals, such as classification for images or speech-to-text systems for audio. Such applications may be leveraged for both positive and negative ends. SIREN may in the future further enable novel approaches to the generation of such signals. This has potential for misuse in impersonating actors without their consent. For an in-depth discussion of such so-called DeepFakes, we refer the reader to a recent review article on neural rendering [17].

Supplementary Material

Empirical evaluation

We validate our theoretical derivation with an experiment. We assemble a 6-layer, single-input SIREN with 2048 hidden units, and initialize it according to the proposed initialization scheme. We draw 2^8 inputs in a linear range from −1 to 1 and plot the histogram of activations after each linear layer and after each sine activation. We further compute the 1D Fast Fourier Transform of all activations in a layer. Lastly, we compute the sum of activations in the final layer and compute the gradient of this sum w.r.t. each activation. The results can be visualized in Figure 2. The distribution of activations nearly perfectly matches the predicted Gauss-Normal distribution after each linear layer and the arcsine distribution after each sine nonlinearity. As discussed in the main text, frequency components of the spectrum similarly remain comparable, with the maximum frequency growing only slowly. We verified this initialization scheme empirically for a 50-layer SIREN with similar results. Finally, similar to the distribution of activations, we plot the distribution of gradients and empirically demonstrate that it stays almost perfectly constant across layers, demonstrating that SIREN does not suffer from either vanishing or exploding gradients at initialization. We leave a formal investigation of the distribution of gradients to future work.

Representing Shapes with Signed Distance Functions

We performed an additional baseline using the ReLU network with positional encoding [35], shown in Figure 4. Similar to the results we obtained using the ReLU positional encoding on images, the zero-level set of the SDF in which the shape is encoded features high frequencies that are not present in the target shape, while the level of detail remains low (despite being much higher than with plain ReLU; see the main paper).

Data. We use the Thai statue from the Stanford 3D Scanning Repository (http://graphics.stanford.edu/data/3Dscanrep/). The room is a free 3D model from Turbosquid.com. We sample each mesh by subdividing it until we obtain 10 million points and their normals. These are then converted to the .xyz format, which we load from our code.

Visibly excellent results.
