[NeRF] Study Notes (continuously updated)

Contents

  • Preface
  • I. Raysampler
    • 1. RayBundle
    • 2. MultinomialRaysampler
    • 3. ProbabilisticRaysampler
  • II. Raymarcher
    • 1. EmissionAbsorptionRaymarcher
    • 2. EmissionAbsorptionNeRFRaymarcher
  • III. NeuralRadianceField
    • 1. HarmonicEmbedding
    • 2. MLPWithInputSkips
    • 3. NeuralRadianceField
  • IV. Renderer
    • 1. ImplicitRenderer
    • 2. RadianceFieldRenderer


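Preface

NeRF is extremely hot right now, and many groups are trying all sorts of things in this direction. To be able to talk with the colleagues working on these algorithms without getting lost, I decided to spend some time understanding it properly. I have often picked up new topics for this kind of reason before, but I only read through things by myself and never organized proper notes. Later, influenced by an expert I admire, I realized that careful notes are not only for myself: they also give later learners something to consult when they have questions. So I decided to keep some public study notes, partly to force myself to organize things properly, and partly because it would be gratifying if they turned out to help someone else.
Since I am recording as I learn, the logical flow will be fairly weak; I will reorganize the notes if I get the chance.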

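First, here is my own learning order, along with recommended references:

  1. Original paper: NeRF shares its name with the Hasbro toy, so searching for the keyword often surfaces links to the toy first. Here is the link to the original NeRF paper directly: https://www.matthewtancik.com/nerf;
  2. Notes by an expert at DAMO Academy, "[NeRF] Code + Logic Detailed Analysis": https://samuel92.blog.csdn.net/article/details/118959540?spm=1001.2014.3001.5502 . These notes are very well written, and I strongly recommend keeping them open while reading the code. They are detailed and use flowcharts to explain the whole pipeline and its details clearly; all my notes can do is repeat that content and add a few very small details;
  3. PyTorch code: pytorch3d contains a NeRF project: https://github.com/facebookresearch/pytorch3d/tree/main/projects/nerf . I set up the runtime environment on a Mac, but since it has no GPU I only use it for debugging and reading the code. I ran into some problems during setup, which I may write up later;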

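To make later reading easier, I will try to keep everything in a single article. I will gradually organize and extend these notes and keep refining the details. This is my first time doing this, so it is slow, but I want to do it carefully and hope it helps both me and others.

0408: Updated the preface; everything after it is still template content for now.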


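I. Raysampler

1. RayBundle

The RayBundle class is a NamedTuple containing origins, directions, lengths, and xys (the xy coordinates of the sampled image locations). The class comment notes that directions do not have to be normalized. The ray_bundle_to_ray_points function converts a RayBundle into the full set of points (a Tensor).
There is also one sentence in the comment that I have not fully understood: they define unit vector in the respective 1D coordinate systems. A plausible reading, given the formula below, is that lengths are measured in multiples of the direction vector (point = origin + length * direction), so each direction acts as the unit vector of the 1D coordinate along its own ray.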

    Args:
        rays_origins: A tensor of shape `(..., 3)`
        rays_directions: A tensor of shape `(..., 3)`
        rays_lengths: A tensor of shape `(..., num_points_per_ray)`

    Returns:
        rays_points: A tensor of shape `(..., num_points_per_ray, 3)`
            containing the points sampled along each ray.
    # Expand dimensions by indexing with None, then let tensor broadcasting do the computation:
    # https://pytorch.org/tutorials/beginner/introyt/tensors_deeper_tutorial.html#in-brief-tensor-broadcasting
    rays_points = (
        rays_origins[..., None, :]
        + rays_lengths[..., :, None] * rays_directions[..., None, :]
    )
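
To make the formula concrete, here is a minimal sketch for a single ray, written with plain Python tuples instead of tensors so the arithmetic is visible. The function name `ray_points` is made up for this illustration; the library function operates on batched tensors via broadcasting.

```python
def ray_points(origin, direction, lengths):
    """Return origin + t * direction for every depth t in lengths."""
    return [
        tuple(o + t * d for o, d in zip(origin, direction))
        for t in lengths
    ]


# One ray starting at the world origin and pointing along +z.
# The direction is deliberately NOT normalized (its norm is 2).
points = ray_points(
    origin=(0.0, 0.0, 0.0),
    direction=(0.0, 0.0, 2.0),
    lengths=[0.5, 1.0, 1.5],
)
print(points)  # [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (0.0, 0.0, 3.0)]
```

Because the direction has norm 2, a length of 0.5 lands at z = 1.0: lengths are expressed in multiples of the direction vector, not in absolute distance.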

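2. MultinomialRaysampler

MultinomialRaysampler is an nn.Module. Its class comment is fairly clear and even includes a diagram, so I copy it in full below. NDCMultinomialRaysampler is a special case of this class whose arguments are fixed according to NDC space. Put simply, the class defines a box whose extent is given by min_x, min_y, min_depth, max_x, max_y, and max_depth; each dimension is divided uniformly into image_width, image_height, and n_pts_per_ray grid points for sampling. The sampled points are then mapped back to world coordinates with the unproject_points function of the cameras passed in, which yields the sampled RayBundle.
Compared with the earlier version (0.6.1), this class has been updated with several new parameters that add extra functionality: n_rays_per_image makes it possible to subsample rays in the xy plane as well, unit_directions specifies whether direction vectors are normalized, and stratified_sampling specifies whether each ray is sampled with stratified random sampling.
I will also note the MonteCarloRaysampler comment here: For practical purposes, this is similar to MultinomialRaysampler without a mask, however sampling at real-valued locations bypassing replacement checks may be faster.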

    Samples a fixed number of points along rays which are regularly distributed
    in a batch of rectangular image grids. Points along each ray
    have uniformly-spaced z-coordinates between a predefined
    minimum and maximum depth.

    The raysampler first generates a 3D coordinate grid of the following form:
    
       / min_x, min_y, max_depth -------------- / max_x, min_y, max_depth
      /                                        /|
     /                                        / |     ^
    / min_depth                    min_depth /  |     |
    min_x ----------------------------- max_x   |     | image
    min_y                               min_y   |     | height
    |                                       |   |     |
    |                                       |   |     v
    |                                       |   |
    |                                       |   / max_x, max_y,     ^
    |                                       |  /  max_depth        /
    min_x                               max_y /                   / n_pts_per_ray
    max_y ----------------------------- max_x/ min_depth         v
              < --- image_width --- >
    

    In order to generate ray points, `MultinomialRaysampler` takes each 3D point of
    the grid (with coordinates `[x, y, depth]`) and unprojects it
    with `cameras.unproject_points([x, y, depth])`, where `cameras` are an
    additional input to the `forward` function.

    Note that this is a generic implementation that can support any image grid
    coordinate convention. For a raysampler which follows the PyTorch3D
    coordinate conventions please refer to `NDCMultinomialRaysampler`.
    As such, `NDCMultinomialRaysampler` is a special case of `MultinomialRaysampler`.

The comment on the forward function is also quite clear, so I copy it directly:

        Args:
            cameras: A batch of `batch_size` cameras from which the rays are emitted.
            mask: if given, the rays are sampled from the mask. Should be of size
                (batch_size, image_height, image_width).
            min_depth: The minimum depth of a ray-point.
            max_depth: The maximum depth of a ray-point.
            n_rays_per_image: If given, this amount of rays are sampled from the grid.
            n_pts_per_ray: The number of points sampled along each ray.
            stratified_sampling: if set, performs stratified sampling in n_pts_per_ray
                bins for each ray; otherwise takes n_pts_per_ray deterministic points
                on each ray with uniform offsets.
        Returns:
            A named tuple RayBundle with the following fields:
            origins: A tensor of shape
                `(batch_size, s1, s2, 3)`
                denoting the locations of ray origins in the world coordinates.
            directions: A tensor of shape
                `(batch_size, s1, s2, 3)`
                denoting the directions of each ray in the world coordinates.
            lengths: A tensor of shape
                `(batch_size, s1, s2, n_pts_per_ray)`
                containing the z-coordinate (=depth) of each ray in world units.
            xys: A tensor of shape
                `(batch_size, s1, s2, 2)`
                containing the 2D image coordinates of each ray or,
                if mask is given, `(batch_size, n, 1, 2)`
            Here `s1, s2` refer to spatial dimensions. Unless the mask is
            given, they equal `(image_height, image_width)`, otherwise `(n, 1)`,
            where `n` is `n_rays_per_image` if provided, otherwise the minimum
            cardinality of the mask in the batch.
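
The difference between deterministic uniform depths and stratified sampling can be sketched in plain Python. This is only an illustration of the idea, under the assumption that the bins are delimited by the midpoints between neighboring uniform depths; the function names are hypothetical and the library code works on tensors and may differ in detail.

```python
import random


def uniform_depths(min_depth, max_depth, n_pts_per_ray):
    """Deterministic depths, evenly spaced from min_depth to max_depth inclusive."""
    if n_pts_per_ray == 1:
        return [min_depth]
    step = (max_depth - min_depth) / (n_pts_per_ray - 1)
    return [min_depth + i * step for i in range(n_pts_per_ray)]


def stratified_depths(min_depth, max_depth, n_pts_per_ray, rng=random):
    """One random depth per bin (assumed bin edges: midpoints between uniform depths)."""
    centers = uniform_depths(min_depth, max_depth, n_pts_per_ray)
    mids = [0.5 * (a + b) for a, b in zip(centers[:-1], centers[1:])]
    lower = centers[:1] + mids   # lower edge of each bin
    upper = mids + centers[-1:]  # upper edge of each bin
    return [lo + (hi - lo) * rng.random() for lo, hi in zip(lower, upper)]


print(uniform_depths(2.0, 6.0, 5))
print(stratified_depths(2.0, 6.0, 5, rng=random.Random(0)))
```

Each stratified sample stays inside its own bin, so the depths remain ordered along the ray while still varying from one call to the next.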

3. ProbabilisticRaysampler

The comment is again fairly clear. This sampler is used when render_pass == "fine".

   Implements the importance sampling of points along rays.
   The input is a `RayBundle` object with a `ray_weights` tensor
   which specifies the probabilities of sampling a point along each ray.

   This raysampler is used for the fine rendering pass of NeRF.
   As such, the forward pass accepts the RayBundle output by the
   raysampling of the coarse rendering pass. Hence, it does not
   take cameras as input.
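
Importance sampling along a ray is commonly done with inverse transform sampling over a piecewise-constant distribution. The sketch below shows that generic technique in plain Python for one ray; it is not the library's code, and the function name, the `bin_edges` argument, and the handling of edge cases are assumptions made for illustration.

```python
import bisect
import random


def sample_along_ray(bin_edges, weights, n_samples, rng=random):
    """Draw depths with probability proportional to the per-bin weights.

    bin_edges has len(weights) + 1 entries; bin k spans
    bin_edges[k] .. bin_edges[k + 1].
    """
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    # Cumulative distribution function evaluated at the bin edges.
    cdf = [0.0]
    for w in weights:
        cdf.append(cdf[-1] + w / total)
    samples = []
    for _ in range(n_samples):
        u = rng.random()
        # Locate the bin whose CDF interval contains u ...
        k = min(bisect.bisect_right(cdf, u) - 1, len(weights) - 1)
        span = cdf[k + 1] - cdf[k]
        # ... and interpolate linearly inside that bin.
        frac = (u - cdf[k]) / span if span > 0.0 else 0.5
        samples.append(bin_edges[k] + frac * (bin_edges[k + 1] - bin_edges[k]))
    return sorted(samples)


# All of the weight sits in the middle bin, so every sample falls in [1, 2].
fine = sample_along_ray([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0], 8, rng=random.Random(0))
print(fine)
```

Bins with larger coarse weights receive more fine samples, which concentrates the fine pass near the surfaces found by the coarse pass.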

II. Raymarcher

1. EmissionAbsorptionRaymarcher

Carrying the comment over as is:

Raymarch using the Emission-Absorption (EA) algorithm.

    The algorithm independently renders each ray by analyzing density and
    feature values sampled at (typically uniformly) spaced 3D locations along
    each ray. The density values `rays_densities` are of shape
    `(..., n_points_per_ray)`, their values should range between [0, 1], and
    represent the opaqueness of each point (the higher the less transparent).
    The feature values `rays_features` of shape
    `(..., n_points_per_ray, feature_dim)` represent the content of the
    point that is supposed to be rendered in case the given point is opaque
    (i.e. its density -> 1.0).

    EA first utilizes `rays_densities` to compute the absorption function
    along each ray as follows:
        
        absorption = cumprod(1 - rays_densities, dim=-1)
        
    The value of absorption at position `absorption[..., k]` specifies
    how much light has reached `k`-th point along a ray since starting
    its trajectory at `k=0`-th point.

    Each ray is then rendered into a tensor `features` of shape `(..., feature_dim)`
    by taking a weighed combination of per-ray features `rays_features` as follows:
        ```
        weights = absorption * rays_densities
        features = (rays_features * weights).sum(dim=-2)
        ```
    Where `weights` denote a function that has a strong peak around the location
    of the first surface point that a given ray passes through.

    Note that for a perfectly bounded volume (with a strictly binary density),
    the `weights = cumprod(1 - rays_densities, dim=-1) * rays_densities`
    function would yield 0 everywhere. In order to prevent this,
    the result of the cumulative product is shifted `self.surface_thickness`
    elements along the ray direction.

The part of the actual implementation to focus on:
        rays_densities = rays_densities[..., 0]
        absorption = _shifted_cumprod(
            (1.0 + eps) - rays_densities, shift=self.surface_thickness
        )
        weights = rays_densities * absorption
        features = (weights[..., None] * rays_features).sum(dim=-2)
        opacities = 1.0 - torch.prod(1.0 - rays_densities, dim=-1, keepdim=True)

        return torch.cat((features, opacities), dim=-1)
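
The effect of the shift is easiest to see on a single ray with binary densities. The following plain-Python sketch mirrors the computation above for scalar features; the helper names are my own, the `eps` term is left out, and it assumes `0 <= shift <= len(densities)`.

```python
def shifted_cumprod(values, shift=1):
    """Cumulative product shifted right by `shift` slots, padded with ones.

    Assumes 0 <= shift <= len(values).
    """
    cumprod, running = [], 1.0
    for v in values:
        running *= v
        cumprod.append(running)
    return [1.0] * shift + cumprod[: len(cumprod) - shift]


def emission_absorption(densities, features, shift=1):
    """Render one ray with scalar features; returns (feature, opacity, weights)."""
    absorption = shifted_cumprod([1.0 - d for d in densities], shift)
    weights = [d * a for d, a in zip(densities, absorption)]
    feature = sum(w * f for w, f in zip(weights, features))
    transmittance = 1.0
    for d in densities:
        transmittance *= 1.0 - d
    return feature, 1.0 - transmittance, weights


# A ray that crosses empty space and then hits a fully opaque surface.
densities = [0.0, 1.0, 1.0]
features = [10.0, 20.0, 30.0]
feature, opacity, weights = emission_absorption(densities, features)
print(weights, feature, opacity)  # [0.0, 1.0, 0.0] 20.0 1.0
```

With `shift=0` the weights for the same ray are all zero, which is exactly the degenerate case the docstring warns about; shifting by one slot gives the first opaque point a weight of 1.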

2. EmissionAbsorptionNeRFRaymarcher

The difference from the previous class is that it additionally returns weights and does not return opacities.

    This is essentially the `pytorch3d.renderer.EmissionAbsorptionRaymarcher`
    which additionally returns the rendering weights. It also skips returning
    the computation of the alpha-mask which is, in case of NeRF, equal to 1
    everywhere.

    The weights are later used in the NeRF pipeline to carry out the importance
    ray-sampling for the fine rendering pass.

Implementation:
        rays_densities = rays_densities[..., 0]
        absorption = _shifted_cumprod(
            (1.0 + eps) - rays_densities, shift=self.surface_thickness
        )
        weights = rays_densities * absorption
        features = (weights[..., None] * rays_features).sum(dim=-2)

III. NeuralRadianceField

1. HarmonicEmbedding

This class implements the positional encoding from the paper.

        Given an input tensor `x` of shape [minibatch, ... , dim],
        the harmonic embedding layer converts each feature
        (i.e. vector along the last dimension) in `x`
        into a series of harmonic features `embedding`,
        where for each i in range(dim) the following are present
        in embedding[...]:
            ```
            [
                sin(f_1*x[..., i]),
                sin(f_2*x[..., i]),
                ...
                sin(f_N * x[..., i]),
                cos(f_1*x[..., i]),
                cos(f_2*x[..., i]),
                ...
                cos(f_N * x[..., i]),
                x[..., i],              # only present if append_input is True.
            ]
            ```
        where N corresponds to `n_harmonic_functions-1`, and f_i is a scalar
        denoting the i-th frequency of the harmonic embedding.

        If `logspace==True`, the frequencies `[f_1, ..., f_N]` are
        powers of 2:
            `f_1, ..., f_N = 2**torch.arange(n_harmonic_functions)`

        If `logspace==False`, frequencies are linearly spaced between
        `1.0` and `2**(n_harmonic_functions-1)`:
            `f_1, ..., f_N = torch.linspace(
                1.0, 2**(n_harmonic_functions-1), n_harmonic_functions
            )`

        Note that `x` is also premultiplied by the base frequency `omega_0`
        before evaluating the harmonic functions.
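
As a rough illustration of the log-spaced case, the sketch below embeds one small vector using only the `math` module. The function is a hypothetical stand-in rather than the library class, and the output ordering (all sines, then all cosines, then the raw input) is one possible layout chosen for this example.

```python
import math


def harmonic_embedding(x, n_harmonic_functions=6, omega_0=1.0, append_input=True):
    """Embed each coordinate of x with sin/cos at frequencies omega_0 * 2**k."""
    freqs = [omega_0 * 2.0 ** k for k in range(n_harmonic_functions)]
    # For every coordinate, one angle per frequency.
    angles = [f * xi for xi in x for f in freqs]
    out = [math.sin(a) for a in angles] + [math.cos(a) for a in angles]
    if append_input:
        out += list(x)
    return out


emb = harmonic_embedding([0.1, -0.2, 0.3], n_harmonic_functions=10)
print(len(emb))  # 63 = 3 coordinates * 2 functions * 10 frequencies + 3 raw inputs
```

The output size is `dim * 2 * n_harmonic_functions`, plus `dim` when the raw input is appended, so a 3D position with 10 frequencies becomes a 63-dimensional vector in this sketch.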

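2. MLPWithInputSkips

HarmonicEmbedding computes γ(x); MLPWithInputSkips implements the part of the paper's MLP network marked with a red box:
[Figure 1: the NeRF MLP architecture from the paper, with the MLPWithInputSkips part boxed in red]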
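
One way to picture the input skips is to list the input and output width of every linear layer. The sketch below assumes that a skip is implemented by concatenating the embedded input onto the hidden activations just before the chosen layers; the function name, its arguments, and the numbers in the example are illustrative assumptions, not values read from the library.

```python
def skip_mlp_layer_dims(n_layers, input_dim, hidden_dim, skip_dim, input_skips):
    """Return (in_features, out_features) for each linear layer of the MLP."""
    dims = []
    for i in range(n_layers):
        if i == 0:
            dim_in = input_dim                # first layer sees the embedding
        elif i in input_skips:
            dim_in = hidden_dim + skip_dim    # hidden activations + re-injected input
        else:
            dim_in = hidden_dim
        dims.append((dim_in, hidden_dim))
    return dims


# Example: 8 layers of width 256, a 60-dim embedding re-injected at layer index 5.
for i, (dim_in, dim_out) in enumerate(skip_mlp_layer_dims(8, 60, 256, 60, {5})):
    print(i, dim_in, dim_out)
```

Only the layers listed in `input_skips` get a wider input; every other hidden layer keeps the plain `hidden_dim` width.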

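3. NeuralRadianceField

Implements the NeuralRadianceField model from the paper; the model structure is as shown in the figure above. It takes ray_bundle and density_noise_std as input and produces rays_densities and rays_colors.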

IV. Renderer

1. ImplicitRenderer

The constructor takes raysampler and raymarcher instances. The forward function receives cameras and volumetric_function, produces images, and also returns the ray_bundle. The core code is as follows:

        # first call the ray sampler that returns the RayBundle parametrizing
        # the rendering rays.
        ray_bundle = self.raysampler(
            cameras=cameras, volumetric_function=volumetric_function, **kwargs
        )
        # ray_bundle.origins - minibatch x ... x 3
        # ray_bundle.directions - minibatch x ... x 3
        # ray_bundle.lengths - minibatch x ... x n_pts_per_ray
        # ray_bundle.xys - minibatch x ... x 2

        # given sampled rays, call the volumetric function that
        # evaluates the densities and features at the locations of the
        # ray points
        rays_densities, rays_features = volumetric_function(
            ray_bundle=ray_bundle, cameras=cameras, **kwargs
        )
        # ray_densities - minibatch x ... x n_pts_per_ray x density_dim
        # ray_features - minibatch x ... x n_pts_per_ray x feature_dim

        # finally, march along the sampled rays to obtain the renders
        images = self.raymarcher(
            rays_densities=rays_densities,
            rays_features=rays_features,
            ray_bundle=ray_bundle,
            **kwargs
        )
        # images - minibatch x ... x (feature_dim + opacity_dim)

        return images, ray_bundle
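
The three stages (sample rays, evaluate the volumetric function, march along the rays) can be mimicked with toy stand-ins in plain Python. Everything in this sketch is hypothetical: the `render` function, the dictionary-based ray bundle, and the toy components only illustrate how the pieces are composed, not the library's actual types.

```python
def render(raysampler, volumetric_function, raymarcher, cameras):
    """Compose sampler -> volumetric function -> raymarcher."""
    ray_bundle = raysampler(cameras)
    densities, features = volumetric_function(ray_bundle)
    images = raymarcher(densities, features)
    return images, ray_bundle


def toy_raysampler(cameras):
    # One ray per camera, each with three sample depths.
    return [{"camera": cam, "lengths": [1.0, 2.0, 3.0]} for cam in cameras]


def toy_volume(ray_bundle):
    # An opaque wall starts at depth 2; the "feature" of a point is its depth.
    densities = [[1.0 if t >= 2.0 else 0.0 for t in ray["lengths"]] for ray in ray_bundle]
    features = [list(ray["lengths"]) for ray in ray_bundle]
    return densities, features


def toy_raymarcher(densities, features):
    # Emission-absorption accumulation with scalar features, one value per ray.
    rendered = []
    for ray_d, ray_f in zip(densities, features):
        transmittance, value = 1.0, 0.0
        for d, f in zip(ray_d, ray_f):
            value += transmittance * d * f
            transmittance *= 1.0 - d
        rendered.append(value)
    return rendered


images, bundle = render(toy_raysampler, toy_volume, toy_raymarcher, cameras=["cam0", "cam1"])
print(images)  # [2.0, 2.0]
```

Each ray renders to 2.0, the depth of the first opaque sample, because the wall absorbs everything behind it.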

2. RadianceFieldRenderer

    This class holds pointers to the fine and coarse renderer objects, which are
    instances of `pytorch3d.renderer.ImplicitRenderer`, and pointers to the
    neural networks representing the fine and coarse Neural Radiance Fields,
    which are instances of `NeuralRadianceField`.

    The rendering forward pass proceeds as follows:
        1) For a given input camera, rendering rays are generated with the
            `NeRFRaysampler` object of `self._renderer['coarse']`.
            In the training mode (`self.training==True`), the rays are a set
                of `n_rays_per_image` random 2D locations of the image grid.
            In the evaluation mode (`self.training==False`), the rays correspond
                to the full image grid. The rays are further split to
                `chunk_size_test`-sized chunks to prevent out-of-memory errors.
        2) For each ray point, the coarse `NeuralRadianceField` MLP is evaluated.
            The pointer to this MLP is stored in `self._implicit_function['coarse']`
        3) The coarse radiance field is rendered with the
            `EmissionAbsorptionNeRFRaymarcher` object of `self._renderer['coarse']`.
        4) The coarse raymarcher outputs a probability distribution that guides
            the importance raysampling of the fine rendering pass. The
            `ProbabilisticRaysampler` stored in `self._renderer['fine'].raysampler`
            implements the importance ray-sampling.
        5) Similar to 2) the fine MLP in `self._implicit_function['fine']`
            labels the ray points with occupancies and colors.
        6) `self._renderer['fine'].raymarcher` generates the final fine render.
        7) The fine and coarse renders are compared to the ground truth input image
            with PSNR and MSE metrics.
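
For step 7, the two metrics can be written down in a few lines. This is a generic sketch for pixel values in [0, 1] using flat Python lists; the function names and the sample pixel values are made up for illustration.

```python
import math


def mse(rendered, target):
    """Mean squared error between two equally long lists of pixel values."""
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(rendered)


def psnr(mse_value, max_value=1.0):
    """Peak signal-to-noise ratio in dB for signals with the given peak value."""
    return 10.0 * math.log10(max_value ** 2 / mse_value)


target = [0.4, 0.6, 0.5, 0.5]
coarse = [0.6, 0.4, 0.7, 0.3]   # hypothetical coarse render
fine = [0.5, 0.5, 0.6, 0.4]     # hypothetical fine render

for name, render_values in (("coarse", coarse), ("fine", fine)):
    error = mse(render_values, target)
    print(name, error, psnr(error))
```

A lower MSE gives a higher PSNR; halving the per-pixel error, as in the fine render here, raises the PSNR by roughly 6 dB.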
