NeRF is all the rage right now, and many teams around me are experimenting in this direction. To be able to talk with the algorithm folks without getting lost, I decided to spend some time understanding it properly. I have often picked things up this way before, but I only ever read through the material without writing anything down. Influenced by someone I greatly respect, I came to see that carefully organized notes help not only me but also later learners who want something to consult when questions come up. So I decided to keep some public study notes: partly to force myself to organize things properly, and partly in the hope that they might be of some help to others.
Since I am writing as I learn, the flow may be rough; I will reorganize it when I get the chance.
First, my own learning order and some recommended references:
For ease of reading I will try to keep everything in one article, which I will gradually extend and polish. This is my first attempt, so progress is slow, but I want to do it carefully so that it ends up useful to me and to others.
0408: added this preface; the sections below are still template content for now.
RayBundle is a NamedTuple containing origins, directions, lengths, and xys (the xy coordinates of the sampled image pixels). The class comment notes that directions do not have to be normalized. All the sampled points (a Tensor) can be obtained from a RayBundle via the ray_bundle_to_ray_points function.
One sentence of the comment I still don't fully understand: "they define unit vector in the respective 1D coordinate systems".
Args:
rays_origins: A tensor of shape `(..., 3)`
rays_directions: A tensor of shape `(..., 3)`
rays_lengths: A tensor of shape `(..., num_points_per_ray)`
Returns:
rays_points: A tensor of shape `(..., num_points_per_ray, 3)`
containing the points sampled along each ray.
The dimensions are expanded with None indexing and the computation is carried out via tensor broadcasting:
https://pytorch.org/tutorials/beginner/introyt/tensors_deeper_tutorial.html#in-brief-tensor-broadcasting
rays_points = (
    rays_origins[..., None, :]
    + rays_lengths[..., :, None] * rays_directions[..., None, :]
)
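To see the broadcasting concretely, here is a tiny shape walk-through with made-up sizes (2 cameras, 8 rays each, 16 points per ray); the tensor names mirror the snippet above:
```
import torch

rays_origins = torch.zeros(2, 8, 3)    # (batch, n_rays, 3)
rays_directions = torch.ones(2, 8, 3)  # (batch, n_rays, 3)
rays_lengths = torch.linspace(0.1, 1.0, 16).expand(2, 8, 16)  # (batch, n_rays, n_pts_per_ray)

rays_points = (
    rays_origins[..., None, :]                # (2, 8, 1, 3)
    + rays_lengths[..., :, None]              # (2, 8, 16, 1)
    * rays_directions[..., None, :]           # (2, 8, 1, 3)
)
print(rays_points.shape)  # torch.Size([2, 8, 16, 3])
```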
MultinomialRaysampler is an nn.Module. Its class comment is quite clear and even includes a diagram, so I copy it in full below; NDCMultinomialRaysampler is a special case of this class with the arguments fixed to NDC space. In short: a box is specified by min_x, min_y, min_depth, max_x, max_y, max_depth; each dimension is divided evenly into a grid according to image_width, image_height, and n_pts_per_ray and sampled; the samples are then unprojected to world coordinates through the unproject_points function of the cameras passed in, yielding the sampled RayBundle.
Compared with the previous version (0.6.1), this class gained a few parameters and corresponding features: n_rays_per_image allows subsampling in the xy plane as well; unit_directions specifies whether the direction vectors are normalized; and stratified_sampling specifies whether each ray is sampled with stratified random sampling (a quick sketch of what that means follows below).
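As I understand it, stratified random sampling splits the depth range into uniform bins and jitters one sample inside each bin. A minimal sketch of the idea (not PyTorch3D's actual code; the function name and signature are mine):
```
import torch

def stratified_depths(min_depth: float, max_depth: float,
                      n_pts_per_ray: int, n_rays: int) -> torch.Tensor:
    # Split [min_depth, max_depth] into n_pts_per_ray equal bins and draw
    # one uniformly random depth inside each bin, per ray.
    edges = torch.linspace(min_depth, max_depth, n_pts_per_ray + 1)
    lower, upper = edges[:-1], edges[1:]
    u = torch.rand(n_rays, n_pts_per_ray)
    return lower + (upper - lower) * u  # (n_rays, n_pts_per_ray)
```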
For MonteCarloRaysampler let me just quote its comment here: "For practical purposes, this is similar to MultinomialRaysampler without a mask, however sampling at real-valued locations bypassing replacement checks may be faster."
Samples a fixed number of points along rays which are regularly distributed
in a batch of rectangular image grids. Points along each ray
have uniformly-spaced z-coordinates between a predefined
minimum and maximum depth.
The raysampler first generates a 3D coordinate grid of the following form:
   / min_x, min_y, max_depth -------------- / max_x, min_y, max_depth
  /                                        /|
 /                                        / |     ^
/ min_depth                    min_depth /  |     |
min_x ----------------------------- max_x   |     | image
min_y                               min_y   |     | height
|                                       |   |     |
|                                       |   |     v
|                                       |   |
|                                       |   / max_x, max_y,        ^
|                                       |  / max_depth            /
min_x                               max_y /                      / n_pts_per_ray
max_y ----------------------------- max_x/                     / min_depth
                   < --- image_width --- >
In order to generate ray points, `MultinomialRaysampler` takes each 3D point of
the grid (with coordinates `[x, y, depth]`) and unprojects it
with `cameras.unproject_points([x, y, depth])`, where `cameras` are an
additional input to the `forward` function.
Note that this is a generic implementation that can support any image grid
coordinate convention. For a raysampler which follows the PyTorch3D
coordinate conventions please refer to `NDCMultinomialRaysampler`.
As such, `NDCMultinomialRaysampler` is a special case of `MultinomialRaysampler`.
The forward function's docstring is also quite clear, so I copy it over too:
Args:
cameras: A batch of `batch_size` cameras from which the rays are emitted.
mask: if given, the rays are sampled from the mask. Should be of size
(batch_size, image_height, image_width).
min_depth: The minimum depth of a ray-point.
max_depth: The maximum depth of a ray-point.
n_rays_per_image: If given, this amount of rays are sampled from the grid.
n_pts_per_ray: The number of points sampled along each ray.
stratified_sampling: if set, performs stratified sampling in n_pts_per_ray
bins for each ray; otherwise takes n_pts_per_ray deterministic points
on each ray with uniform offsets.
Returns:
A named tuple RayBundle with the following fields:
origins: A tensor of shape
`(batch_size, s1, s2, 3)`
denoting the locations of ray origins in the world coordinates.
directions: A tensor of shape
`(batch_size, s1, s2, 3)`
denoting the directions of each ray in the world coordinates.
lengths: A tensor of shape
`(batch_size, s1, s2, n_pts_per_ray)`
containing the z-coordinate (=depth) of each ray in world units.
xys: A tensor of shape
`(batch_size, s1, s2, 2)`
containing the 2D image coordinates of each ray or,
if mask is given, `(batch_size, n, 1, 2)`
Here `s1, s2` refer to spatial dimensions. Unless the mask is
given, they equal `(image_height, image_width)`, otherwise `(n, 1)`,
where `n` is `n_rays_per_image` if provided, otherwise the minimum
cardinality of the mask in the batch.
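To make the shapes above concrete, a minimal usage sketch (assuming a recent PyTorch3D version where NDCMultinomialRaysampler is available, and a single default FoVPerspectiveCameras):
```
from pytorch3d.renderer import FoVPerspectiveCameras, NDCMultinomialRaysampler

cameras = FoVPerspectiveCameras()  # batch_size = 1, default pose
raysampler = NDCMultinomialRaysampler(
    image_width=64, image_height=64,
    n_pts_per_ray=32, min_depth=0.1, max_depth=3.0,
)
ray_bundle = raysampler(cameras=cameras)
# No mask / n_rays_per_image, so s1, s2 = (image_height, image_width):
print(ray_bundle.origins.shape)   # (1, 64, 64, 3)
print(ray_bundle.lengths.shape)   # (1, 64, 64, 32)
print(ray_bundle.xys.shape)       # (1, 64, 64, 2)
```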
The docstring of ProbabilisticRaysampler is also quite clear; this sampler is used in the render_pass == "fine" stage:
Implements the importance sampling of points along rays.
The input is a `RayBundle` object with a `ray_weights` tensor
which specifies the probabilities of sampling a point along each ray.
This raysampler is used for the fine rendering pass of NeRF.
As such, the forward pass accepts the RayBundle output by the
raysampling of the coarse rendering pass. Hence, it does not
take cameras as input.
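The gist of importance sampling from the coarse weights, as a hedged sketch (PyTorch3D's ProbabilisticRaysampler does a more careful version of the same idea; the names here are mine, and I assume 2D `(n_rays, n_pts)` inputs):
```
import torch

def importance_sample_depths(lengths: torch.Tensor, weights: torch.Tensor,
                             n_pts_importance: int) -> torch.Tensor:
    # Treat the normalized coarse weights as a categorical distribution over
    # the coarse depth samples and resample depths in proportion to them.
    probs = weights.clamp(min=1e-5)
    probs = probs / probs.sum(dim=-1, keepdim=True)
    idx = torch.multinomial(probs, n_pts_importance, replacement=True)
    depths = torch.gather(lengths, -1, idx)
    return depths.sort(dim=-1).values  # raymarching expects sorted depths
```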
Copying the docstring again, this time for EmissionAbsorptionRaymarcher:
Raymarch using the Emission-Absorption (EA) algorithm.
The algorithm independently renders each ray by analyzing density and
feature values sampled at (typically uniformly) spaced 3D locations along
each ray. The density values `rays_densities` are of shape
`(..., n_points_per_ray)`, their values should range between [0, 1], and
represent the opaqueness of each point (the higher the less transparent).
The feature values `rays_features` of shape
`(..., n_points_per_ray, feature_dim)` represent the content of the
point that is supposed to be rendered in case the given point is opaque
(i.e. its density -> 1.0).
EA first utilizes `rays_densities` to compute the absorption function
along each ray as follows:
absorption = cumprod(1 - rays_densities, dim=-1)
The value of absorption at position `absorption[..., k]` specifies
how much light has reached `k`-th point along a ray since starting
its trajectory at `k=0`-th point.
Each ray is then rendered into a tensor `features` of shape `(..., feature_dim)`
by taking a weighted combination of per-ray features `rays_features` as follows:
```
weights = absorption * rays_densities
features = (rays_features * weights).sum(dim=-2)
```
Where `weights` denote a function that has a strong peak around the location
of the first surface point that a given ray passes through.
Note that for a perfectly bounded volume (with a strictly binary density),
the `weights = cumprod(1 - rays_densities, dim=-1) * rays_densities`
function would yield 0 everywhere. In order to prevent this,
the result of the cumulative product is shifted `self.surface_thickness`
elements along the ray direction.
The parts of the actual implementation to focus on:
rays_densities = rays_densities[..., 0]  # drop the trailing density_dim=1 axis
absorption = _shifted_cumprod(
    (1.0 + eps) - rays_densities, shift=self.surface_thickness
)  # transmittance up to (but not including) each point
weights = rays_densities * absorption  # EA rendering weights
features = (weights[..., None] * rays_features).sum(dim=-2)  # weighted feature sum
opacities = 1.0 - torch.prod(1.0 - rays_densities, dim=-1, keepdim=True)  # per-ray alpha
return torch.cat((features, opacities), dim=-1)
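The helper `_shifted_cumprod` is what implements the `surface_thickness` shift mentioned in the note above; roughly it looks like this (a sketch close to, but not guaranteed identical to, the PyTorch3D helper):
```
import torch

def _shifted_cumprod(x: torch.Tensor, shift: int = 1) -> torch.Tensor:
    # Cumulative product along the ray, shifted right by `shift` and padded
    # with ones, so point k sees the transmittance *before* it. This is what
    # keeps `weights` from collapsing to 0 for strictly binary densities.
    x_cumprod = torch.cumprod(x, dim=-1)
    return torch.cat(
        [torch.ones_like(x_cumprod[..., :shift]), x_cumprod[..., :-shift]],
        dim=-1,
    )
```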
The difference from the previous class is that it additionally returns the weights, and opacities are not returned.
This is essentially the `pytorch3d.renderer.EmissionAbsorptionRaymarcher`
which additionally returns the rendering weights. It also skips returning
the computation of the alpha-mask which is, in case of NeRF, equal to 1
everywhere.
The weights are later used in the NeRF pipeline to carry out the importance
ray-sampling for the fine rendering pass.
The concrete implementation:
rays_densities = rays_densities[..., 0]
absorption = _shifted_cumprod(
    (1.0 + eps) - rays_densities, shift=self.surface_thickness
)
weights = rays_densities * absorption
features = (weights[..., None] * rays_features).sum(dim=-2)
return features, weights  # the weights later drive fine-pass importance sampling
This class implements the positional encoding γ(x) from the paper.
Given an input tensor `x` of shape [minibatch, ... , dim],
the harmonic embedding layer converts each feature
(i.e. vector along the last dimension) in `x`
into a series of harmonic features `embedding`,
where for each i in range(dim) the following are present
in embedding[...]:
```
[
sin(f_1*x[..., i]),
sin(f_2*x[..., i]),
...
sin(f_N * x[..., i]),
cos(f_1*x[..., i]),
cos(f_2*x[..., i]),
...
cos(f_N * x[..., i]),
x[..., i], # only present if append_input is True.
]
```
where N corresponds to `n_harmonic_functions`, and f_i is a scalar
denoting the i-th frequency of the harmonic embedding.
If `logspace==True`, the frequencies `[f_1, ..., f_N]` are
powers of 2:
`f_1, ..., f_N = 2**torch.arange(n_harmonic_functions)`
If `logspace==False`, frequencies are linearly spaced between
`1.0` and `2**(n_harmonic_functions-1)`:
`f_1, ..., f_N = torch.linspace(
1.0, 2**(n_harmonic_functions-1), n_harmonic_functions
)`
Note that `x` is also premultiplied by the base frequency `omega_0`
before evaluating the harmonic functions.
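A minimal self-contained sketch of this embedding (logspace frequencies, `append_input=True`, with omega_0 folded into the frequencies); the real class is HarmonicEmbedding:
```
import torch

def harmonic_embedding(x: torch.Tensor, n_harmonic_functions: int = 6,
                       omega_0: float = 1.0) -> torch.Tensor:
    # frequencies f_i = omega_0 * 2**i, i = 0 .. n_harmonic_functions-1
    frequencies = omega_0 * 2.0 ** torch.arange(
        n_harmonic_functions, dtype=x.dtype, device=x.device
    )
    embed = (x[..., None] * frequencies).reshape(*x.shape[:-1], -1)
    # [sin(f*x) for all f, then cos(f*x) for all f, then x itself]
    return torch.cat([embed.sin(), embed.cos(), x], dim=-1)

# gamma(x) for a batch of 3D points: 3 * (2*6 + 1) = 39 output channels
print(harmonic_embedding(torch.rand(4, 3)).shape)  # torch.Size([4, 39])
```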
HarmonicEmbedding computes γ(x); MLPWithInputSkips implements the red-boxed part of the MLP network in the paper's figure (a hedged sketch of the idea follows):
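Lacking the figure here, a toy version of the skip idea: the embedded input is concatenated back into an intermediate layer (the paper does this partway through the density MLP). The class and parameter names below are mine, not the project's:
```
import torch
import torch.nn as nn

class TinyMLPWithInputSkips(nn.Module):
    # An MLP whose input is re-concatenated at the layers listed in
    # `input_skips`, mimicking the skip arrow in the NeRF figure.
    def __init__(self, n_layers=8, input_dim=39, hidden_dim=256, input_skips=(4,)):
        super().__init__()
        self.input_skips = set(input_skips)
        layers = []
        for i in range(n_layers):
            dim_in = input_dim if i == 0 else hidden_dim
            if i in self.input_skips:
                dim_in += input_dim
            layers.append(nn.Linear(dim_in, hidden_dim))
        self.layers = nn.ModuleList(layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = x
        for i, layer in enumerate(self.layers):
            if i in self.input_skips:
                y = torch.cat([y, x], dim=-1)
            y = torch.relu(layer(y))
        return y
```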
NeuralRadianceField implements the model part of the paper, with the structure shown in the figure above; given a ray_bundle and density_noise_std it produces rays_densities and rays_colors.
ImplicitRenderer is initialized with raysampler and raymarcher instances; its forward function receives cameras and a volumetric_function and returns images together with the ray_bundle. The core code:
# first call the ray sampler that returns the RayBundle parametrizing
# the rendering rays.
ray_bundle = self.raysampler(
    cameras=cameras, volumetric_function=volumetric_function, **kwargs
)
# ray_bundle.origins - minibatch x ... x 3
# ray_bundle.directions - minibatch x ... x 3
# ray_bundle.lengths - minibatch x ... x n_pts_per_ray
# ray_bundle.xys - minibatch x ... x 2

# given sampled rays, call the volumetric function that
# evaluates the densities and features at the locations of the
# ray points
rays_densities, rays_features = volumetric_function(
    ray_bundle=ray_bundle, cameras=cameras, **kwargs
)
# ray_densities - minibatch x ... x n_pts_per_ray x density_dim
# ray_features - minibatch x ... x n_pts_per_ray x feature_dim

# finally, march along the sampled rays to obtain the renders
images = self.raymarcher(
    rays_densities=rays_densities,
    rays_features=rays_features,
    ray_bundle=ray_bundle,
    **kwargs
)
# images - minibatch x ... x (feature_dim + opacity_dim)
return images, ray_bundle
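Putting the three pieces together, a minimal end-to-end sketch with a dummy volumetric function (constant density and white color; the function name is mine):
```
import torch
from pytorch3d.renderer import (
    EmissionAbsorptionRaymarcher, FoVPerspectiveCameras,
    ImplicitRenderer, NDCMultinomialRaysampler,
)

def dummy_volumetric_function(ray_bundle, **kwargs):
    # Constant density 0.1 and white features at every ray point.
    spatial = ray_bundle.lengths.shape           # (minibatch, ..., n_pts_per_ray)
    rays_densities = torch.full(spatial + (1,), 0.1)
    rays_features = torch.ones(spatial + (3,))
    return rays_densities, rays_features

renderer = ImplicitRenderer(
    raysampler=NDCMultinomialRaysampler(
        image_width=32, image_height=32,
        n_pts_per_ray=16, min_depth=0.1, max_depth=3.0,
    ),
    raymarcher=EmissionAbsorptionRaymarcher(),
)
images, ray_bundle = renderer(
    cameras=FoVPerspectiveCameras(),
    volumetric_function=dummy_volumetric_function,
)
print(images.shape)  # (1, 32, 32, 4) = RGB features + opacity
```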
This class holds pointers to the fine and coarse renderer objects, which are
instances of `pytorch3d.renderer.ImplicitRenderer`, and pointers to the
neural networks representing the fine and coarse Neural Radiance Fields,
which are instances of `NeuralRadianceField`.
The rendering forward pass proceeds as follows:
1) For a given input camera, rendering rays are generated with the
`NeRFRaysampler` object of `self._renderer['coarse']`.
In the training mode (`self.training==True`), the rays are a set
of `n_rays_per_image` random 2D locations of the image grid.
In the evaluation mode (`self.training==False`), the rays correspond
to the full image grid. The rays are further split to
`chunk_size_test`-sized chunks to prevent out-of-memory errors.
2) For each ray point, the coarse `NeuralRadianceField` MLP is evaluated.
The pointer to this MLP is stored in `self._implicit_function['coarse']`
3) The coarse radiance field is rendered with the
`EmissionAbsorptionNeRFRaymarcher` object of `self._renderer['coarse']`.
4) The coarse raymarcher outputs a probability distribution that guides
the importance raysampling of the fine rendering pass. The
`ProbabilisticRaysampler` stored in `self._renderer['fine'].raysampler`
implements the importance ray-sampling.
5) Similar to 2) the fine MLP in `self._implicit_function['fine']`
labels the ray points with occupancies and colors.
6) `self._renderer['fine'].raymarcher` generates the final fine render.
7) The fine and coarse renders are compared to the ground truth input image
with PSNR and MSE metrics.
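For step 7, the metrics are straightforward; a small sketch assuming image values in [0, 1] (the function name is mine):
```
import torch

def mse_psnr(render: torch.Tensor, target: torch.Tensor):
    # Mean squared error between the render and the ground truth image,
    # and the corresponding PSNR in dB (peak value 1.0 assumed).
    mse = torch.mean((render - target) ** 2)
    psnr = -10.0 * torch.log10(mse)
    return mse, psnr
```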