5 View Selection
5.1 Global View Selection
For each reference view R, global view selection seeks a set N of neighboring views that are good candidates for stereo matching in terms of scene content, appearance, and scale. In addition, the neighboring views should provide sufficient parallax with respect to R and each other in order to enable a stable match. Here we describe a scoring function designed to measure the quality of each candidate neighboring view based on these desiderata.

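First, the authors call the image to be matched the reference view R, and they design a scoring function that rates the other images against it. This process is called global view selection. Scoring is not done for each image in isolation: a view V is scored in the context of a candidate neighborhood N, a set of nearby views that includes R. A good neighboring view has to satisfy three conditions:

  1. It has some parallax relative to the reference view
  2. It shows similar scene content
  3. It has a similar resolution (scale)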


To first order, the number of shared feature points reconstructed in the SfM phase is a good indicator of the compatibility of a given view V with the reference view R. Indeed, images with many shared features generally cover a similar portion of the scene. Moreover, success in SIFT matching is a good predictor that pixel-level matching will also succeed across much of the image. In particular, SIFT selects features with similar appearance, and thus images with many shared features tend to have similar appearance to each other, overall.



However, the number of shared feature points is not sufficient to ensure good reconstructions. First, the views with the most shared feature points tend to be nearly collocated and as such do not provide a large enough baseline for accurate reconstruction. Second, the scale invariance of the SIFT feature detector causes images of substantially different resolutions to match well, but such resolution differences are problematic for stereo matching.


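But! If we select purely by the number of matched feature points, that is, if we pick the view V that shares the most features with R for depth computation, things go wrong. Such views will probably fail the parallax condition, because they look almost like the same picture. In the authors' words, they tend to be "nearly collocated", so they cannot provide enough baseline for reconstruction.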

Thus, we compute a global score g_R for each view V within a candidate neighborhood N (which includes R) as a weighted sum over features shared with R:
$$ g_R(V) = \sum_{f \in F_V \cap F_R} w_N(f) \cdot w_s(f) $$
where F_X is the set of feature points observed in view X, and the weight functions are described below.
To encourage a good range of parallax within a neighborhood, the weight function w_N(f) is defined as a product over all pairs of views in N:
$$ w_N(f) = \prod_{\substack{V_i, V_j \in N \\ i \neq j,\ f \in F_{V_i} \cap F_{V_j}}} w_\alpha(f, V_i, V_j) $$
where w_α(f, V_i, V_j) = min((α/α_max)^2, 1) and α is the angle between the lines of sight from V_i and V_j to f. The function w_α(f, V_i, V_j) downweights triangulation angles below α_max, which we set to 10 degrees in all of our experiments. The quadratic weight function serves to counteract the trend of greater numbers of features in common with decreasing angle. At the same time, excessively large triangulation angles are automatically discouraged by the associated scarcity of shared SIFT features.
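To make the angle weighting concrete, here is a minimal Python sketch. It assumes each view is represented only by its camera center as an (x, y, z) tuple and that every view passed in actually observes f. The helper names angle_deg, w_alpha and w_N are mine, not the paper's.

```python
import math
from itertools import combinations

ALPHA_MAX_DEG = 10.0  # the alpha_max value quoted in the text


def angle_deg(f, c_i, c_j):
    """Angle (degrees) at 3D point f between the rays to camera centers c_i and c_j."""
    u = [a - b for a, b in zip(c_i, f)]
    v = [a - b for a, b in zip(c_j, f)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp for acos
    return math.degrees(math.acos(cos))


def w_alpha(alpha, alpha_max=ALPHA_MAX_DEG):
    """Quadratic penalty below alpha_max, no penalty at or above it."""
    return min((alpha / alpha_max) ** 2, 1.0)


def w_N(f, centers):
    """Product of w_alpha over all pairs of views (given by their centers) that see f."""
    weight = 1.0
    for c_i, c_j in combinations(centers, 2):
        weight *= w_alpha(angle_deg(f, c_i, c_j))
    return weight
```

With this weighting a 5 degree triangulation angle contributes a factor of 0.25, while anything at or beyond 10 degrees contributes 1, so one nearly collocated pair is enough to pull the whole product down.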




The weighting function w_s(f) measures similarity in resolution of images R and V at feature f. To estimate the 3D sampling rate of V in the vicinity of the feature f, we compute the diameter s_V(f) of a sphere centered at f whose projected diameter in V equals the pixel spacing in V. We similarly compute s_R(f) for R and define the scale weight w_s based on the ratio r = s_R(f)/s_V(f) using
$$ w_s(f) = \begin{cases} 2/r & 2 \le r \\ 1 & 1 \le r < 2 \\ r & r < 1 \end{cases} $$
This weight function favors views with equal or higher resolution than the reference view.

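w_s (the scale weight) measures how similar the resolutions of R and V are at a shared point f. Computing it requires s_R(f) and s_V(f), which are defined in the same way; s_V(f), for instance, is the resolution of V at f. What does resolution at f mean? If the point moves by one pixel spacing in the image, the corresponding 3D point f moves some distance in space, and that distance is s_V(f).
[Figure: s_V(f), the distance the 3D point f moves for a one-pixel shift in V]
Finally, let r = s_R(f)/s_V(f). The weight is w_s = 1 for 1 ≤ r < 2, w_s = 2/r for r ≥ 2, and w_s = r for r < 1.
In other words, V may sample the scene up to twice as finely as R with no penalty; a V that is coarser than R (r < 1), or at least twice as fine (r ≥ 2), is downweighted. My guess at the intent: V should resolve the scene at least as well as R (r ≥ 1), so that detail visible in R can also be found in V, since the goal is still to reconstruct R; but the gap should not grow too large either (r < 2). Overall, this weight function favors views whose resolution is equal to or higher than that of the reference view.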
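A small Python sketch of the scale weight follows. The piecewise function mirrors the three cases described above. The sampling_diameter helper is my own simplification: it assumes an ideal pinhole camera with the focal length given in pixels, which is not something the text specifies.

```python
def sampling_diameter(depth, focal_px):
    """Approximate s_X(f) under an assumed pinhole model.

    A sphere at distance `depth` whose projection spans one pixel has a
    diameter of roughly depth / focal_px (in scene units).
    """
    return depth / focal_px


def w_s(r):
    """Scale weight for the ratio r = s_R(f) / s_V(f)."""
    if r >= 2.0:
        return 2.0 / r  # V is much finer than R: penalized
    if r >= 1.0:
        return 1.0      # V is equal or up to twice as fine: full weight
    return r            # V is coarser than R: penalized


# Toy example: same focal length, V twice as far from f as R.
s_R = sampling_diameter(depth=10.0, focal_px=1000.0)
s_V = sampling_diameter(depth=20.0, focal_px=1000.0)
weight = w_s(s_R / s_V)  # V is coarser than R here, so the weight drops below 1
```

A view with r = 4 gets the same weight, 0.5, as a view with r = 0.5, so the penalty is symmetric in ratio terms around the plateau from 1 to 2.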

Having defined the global score for a view V and neighborhood N, we could now find the best N of a given size (usually |N| = 10), in terms of the sum of view scores Σ_{V∈N} g_R(V). For efficiency, we take a greedy approach and grow the neighborhood incrementally by iteratively adding to N the highest scoring view given the current N (which initially contains only R).
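The greedy growth can be sketched in a few lines of Python. This is only an illustration: score(view, neighborhood) is a hypothetical callable standing in for g_R(V) evaluated with V added to the current N, and I assume the target size counts R itself.

```python
def grow_neighborhood(reference, candidates, score, size=10):
    """Greedily grow N starting from [reference].

    `score(view, neighborhood)` is a hypothetical stand-in for g_R(V),
    evaluated on the neighborhood that would result from adding `view`.
    `size` is assumed to include the reference view.
    """
    neighborhood = [reference]
    remaining = [v for v in candidates if v != reference]
    while remaining and len(neighborhood) < size:
        # Pick the view that scores highest given the current neighborhood.
        best = max(remaining, key=lambda v: score(v, neighborhood + [v]))
        neighborhood.append(best)
        remaining.remove(best)
    return neighborhood


# Toy usage with fixed per-view scores that ignore the neighborhood.
toy_scores = {"a": 3.0, "b": 1.0, "c": 2.0}
result = grow_neighborhood("R", ["a", "b", "c"],
                           lambda v, n: toy_scores[v], size=3)
```

In a real scorer the value depends on the neighborhood through w_N, so a view that scores well early can become less attractive once a nearly collocated view has already been added. That dependence is why the score is re-evaluated at every step.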


Rescaling Views. Although global view selection tries to select neighboring views with compatible scale, some amount of scale mismatch is unavoidable due to variability in resolution within CPCs, and can adversely affect stereo matching. We therefore seek to adapt, through proper filtering, the scale of all views to a common, narrow range either globally or on a per-pixel basis. We chose the former to avoid varying the size of the matching window in different areas of the depth map and to improve efficiency. Our approach is to find the lowest-resolution view V_min ∈ N relative to R, resample R to approximately match that lower resolution, and then resample images of higher resolution to match R.

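Global view selection does its best to admit only neighboring views with compatible scale into N, but sizeable resolution gaps are sometimes unavoidable, and the resulting mismatch hurts stereo matching considerably. The remedy is to filter the views so that their scales are normalized into a narrow range. The authors first find V_min, the view in N with the lowest resolution relative to R, then resample R according to V_min, and finally resample the remaining higher-resolution images to match the resampled R.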


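Specifically, we estimate the resolution scale of a view V relative to R based on their shared features:
$$ \mathrm{scale}_R(V) = \frac{1}{|F_V \cap F_R|} \sum_{f \in F_V \cap F_R} \frac{s_R(f)}{s_V(f)} $$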
V_min is then simply equal to arg min_{V∈N} scale_R(V). If scale_R(V_min) is smaller than a threshold t (in our case t = 0.6, which corresponds to mapping a 5×5 reference window on a 3×3 window in the neighboring view with the lowest relative scale), we rescale the reference view so that, after rescaling, scale_R(V_min) = t. We then rescale all neighboring views with scale_R(V) > 1.2 to match the scale of the reference view (which has possibly itself been rescaled in the previous step). Note that all rescaled versions of images are discarded when moving on to compute a depth map for the next reference view.


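The term inside the sum in scale_R(V) is exactly the ratio r computed earlier, so scale_R(V) is simply the average of r over all points shared by V and R.
First we find the view in N with the smallest scale, V_min, and look at scale_R(V_min). If scale_R(V_min) < 0.6, we rescale R. After that, we rescale every V whose scale_R(V) is greater than 1.2 so that it matches the reference view. Note that the rescaled images are only used for the current reference view; they are all discarded when moving on to compute the depth map for the next one.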
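The rescaling rule can be sketched as follows. This is my own reading rather than the authors' code: I assume that shrinking an image by a factor k multiplies its sampling diameter by 1/k, and that the 1.2 threshold is tested against the scale relative to the possibly rescaled R. The function names and the dictionary input are hypothetical.

```python
def scale_R(ratios):
    """Mean of r = s_R(f) / s_V(f) over the features shared by V and R."""
    return sum(ratios) / len(ratios)


def rescale_factors(scales, t=0.6, upper=1.2):
    """Return (factor for R, {view: factor}) as image-size multipliers.

    `scales` maps each neighboring view to scale_R(V). Assumes that resizing
    an image by k changes its sampling diameter by 1/k.
    """
    v_min = min(scales, key=scales.get)
    # Shrink R only if the coarsest neighbor falls below the threshold t,
    # so that scale_R(V_min) becomes exactly t afterwards.
    ref_factor = min(1.0, scales[v_min] / t)
    view_factors = {}
    for view, s in scales.items():
        s_new = s / ref_factor  # scale relative to the rescaled R
        if s_new > upper:
            view_factors[view] = 1.0 / s_new  # shrink V to match R
    return ref_factor, view_factors


ref_factor, view_factors = rescale_factors({"A": 0.3, "B": 0.9, "C": 2.4})
```

In this toy input, view A is far coarser than R, so R is halved; relative to the halved R, views B and C are then both too fine and get shrunk as well, while A is left untouched.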

