Point-Based Multi-View Stereo Network
ICCV 2019 oral → TPAMI
cost volume → directly process point clouds
predict the depth in a coarse-to-fine manner
leverage 3D geometry priors and 2D texture information jointly → feature-augmented point cloud
key insight: lower memory usage, higher reconstruction quality, plus a potential application (foveated depth inference)
a small 3D cost volume generates an initial coarse depth map, which is then converted to a point cloud
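The depth-map-to-point-cloud conversion can be sketched as a standard pinhole unprojection. This is a minimal sketch, not the paper's code: the function name `unproject_depth`, the convention x_cam = R @ x_world + t, and the (u, v) = (column, row) pixel layout are all assumptions made here for illustration.

```python
import numpy as np

def unproject_depth(depth, K, R, t):
    """Lift an (H, W) depth map to an (H*W, 3) world-space point cloud.

    Assumed conventions (not from the notes): x_cam = R @ x_world + t,
    and pixel coordinates are (u, v) = (column, row).
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T        # camera-space rays with z = 1
    cam = rays * depth.reshape(-1, 1)      # scale each ray by its depth
    return (cam - t) @ R                   # row-vector form of R^T (cam - t)
```

Points come out in row-major pixel order, so the point for pixel (u, v) sits at index v * W + u.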
iteratively refine the point cloud, starting from the initial one, using PointFlow
this paper: PointFlow module, which estimates the 3D flow based on joint 2D-3D features of point hypotheses
use MVSNet to predict a relatively low-resolution cost volume
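images are downsampled to 1/8 and the number of depth hypothesis planes is reduced from 256 to 48 or 96; memory consumption is about 1/20 of MVSNet's
3 scales: the layer right before each downsampling step is extracted to build the final feature pyramid [Fi1, Fi2, Fi3]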
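A rough back-of-the-envelope check of the 1/20 figure, counting only cost volume cells. The assumptions here are labeled: MVSNet is taken to build its volume at 1/4 resolution with 256 depth planes, the image size 1600x1200 is hypothetical, and feature channels and network activations are ignored.

```python
def volume_cells(height, width, downsample, depth_planes):
    """Number of cells in a cost volume built at 1/downsample resolution."""
    return (height // downsample) * (width // downsample) * depth_planes

H, W = 1200, 1600                           # hypothetical input size
mvsnet = volume_cells(H, W, 4, 256)         # assumed MVSNet setting: 1/4 res, 256 planes
coarse48 = volume_cells(H, W, 8, 48)        # coarse stage from the notes, 48 planes
coarse96 = volume_cells(H, W, 8, 96)        # coarse stage from the notes, 96 planes

ratio48 = coarse48 / mvsnet                 # 0.046875, roughly 1/21
ratio96 = coarse96 / mvsnet                 # 0.09375, roughly 1/11
```

Under these assumptions the 1/20 figure matches the 48-plane setting; with 96 planes the ratio is closer to 1/10.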
fetched multi-view image feature variance + normalized 3D coordinates in world space
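fetched image feature: the feature of a 3D point is built by projecting the point into the multi-view feature maps with the camera projection; since the image features computed above come at different sizes, the camera intrinsics must be scaled to each level accordingly; variance is again used to aggregate the multiple feature maps
(why do I read this step as just taking the variance of the features from the previous step to build C? I don't see the projection transform in Eq. 1)?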
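One possible reading of the fetch step: the projection happens when each view's feature is looked up, and Eq. 1 then only takes the variance of the already-fetched features. The sketch below follows that reading. It is not the paper's code: `project` and `fetch_variance_feature` are hypothetical names, nearest-neighbor lookup stands in for whatever interpolation the authors use, and scaling the intrinsics by diag(scale, scale, 1) ignores half-pixel offsets.

```python
import numpy as np

def project(points, K, R, t):
    """Project (P, 3) world points to (P, 2) pixels, assuming x_cam = R @ x_world + t."""
    cam = points @ R.T + t
    uvw = cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def fetch_variance_feature(points, feats, Ks, Rs, ts, scale):
    """feats: list of (C, H, W) feature maps, one per view, at resolution `scale`.

    Returns a (P, C) variance feature across views.
    """
    S = np.diag([scale, scale, 1.0])           # rescale intrinsics to the feature map
    fetched = []
    for F, K, R, t in zip(feats, Ks, Rs, ts):
        uv = project(points, S @ K, R, t)
        C, H, W = F.shape
        u = np.clip(np.rint(uv[:, 0]).astype(int), 0, W - 1)
        v = np.clip(np.rint(uv[:, 1]).astype(int), 0, H - 1)
        fetched.append(F[:, v, u].T)           # nearest-neighbor lookup, (P, C)
    fetched = np.stack(fetched)                # (N_views, P, C)
    return ((fetched - fetched.mean(axis=0)) ** 2).mean(axis=0)
```

Because the lookup depends on `points`, calling this again after the points move returns features from different image locations.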
normalized point coordinate: Eq. (2) combines the image feature above with the point's spatial position
this yields the feature-augmented point, which is the input to PointFlow in the next step
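The combination in Eq. (2) amounts to a per-point concatenation. A minimal sketch follows; the normalization used here (center on the mean, divide by the largest absolute offset) is an assumption, since the notes only say the coordinates are normalized, and `augment_points` is a hypothetical name.

```python
import numpy as np

def augment_points(variance_feat, points):
    """Concatenate (P, C) image features with (P, 3) normalized coordinates."""
    center = points.mean(axis=0)
    scale = np.abs(points - center).max() + 1e-8   # assumed normalization scheme
    normalized = (points - center) / scale          # roughly within [-1, 1]
    return np.concatenate([variance_feat, normalized], axis=1)
```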
at each iteration of depth residual prediction the point position Xp is updated, so different point features are fetched → fetch features from different areas of images dynamically according to the updated point position
since the camera parameters are known, the depth map can be un-projected to a point cloud; for each point, estimate its displacement to the ground truth surface along the reference camera direction by observing its neighboring points from all views
【point hypotheses generation】
gather neighborhood image feature information at different depths
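Hypothesis generation can be sketched as placing 2m + 1 candidates around each point at a fixed spacing. The names `step` and `m` are introduced here for illustration, and taking the direction as the per-point viewing ray from the reference camera center is one interpretation of "along the reference camera direction", not something the notes state.

```python
import numpy as np

def point_hypotheses(points, cam_center, step, m):
    """Return (P, 2m+1, 3) hypotheses p + k * step * dir for k = -m..m.

    `dir` is the unit ray from the reference camera center through each point
    (an assumed interpretation of the reference camera direction).
    """
    points = np.asarray(points, dtype=float)
    dirs = points - cam_center
    dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    k = np.arange(-m, m + 1).reshape(1, -1, 1)
    return points[:, None, :] + k * step * dirs[:, None, :]
```

Each hypothesis can then fetch its own image features, which is how information at different depths gets gathered.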
【edge convolution】
use DGCNN to enrich feature aggregation between neighboring points
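given the set of neighboring points around each feature-augmented point obtained above, edge convolution applies a nonlinear function to these points, followed by an aggregation operation (max pooling, average pooling, …)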
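A single edge convolution layer in the DGCNN style can be sketched as follows: build edge features from each point and its k nearest neighbors, apply a shared nonlinear map, then pool over the neighbors. This is an illustrative sketch only. The single linear layer with ReLU, max pooling, and kNN in coordinate space are choices made here; `W` and `b` are hypothetical weights.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of each point, excluding itself."""
    points = np.asarray(points, dtype=float)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def edge_conv(feats, points, W, b, k):
    """feats: (P, C), W: (2C, C_out), b: (C_out,). Returns (P, C_out)."""
    idx = knn_indices(points, k)                        # (P, k)
    center = np.repeat(feats[:, None, :], k, axis=1)    # (P, k, C)
    neighbor = feats[idx]                               # (P, k, C)
    edge = np.concatenate([center, neighbor - center], axis=-1)  # (P, k, 2C)
    hidden = np.maximum(edge @ W + b, 0.0)              # shared linear map + ReLU
    return hidden.max(axis=1)                           # max pooling over neighbors
```

Swapping `hidden.max(axis=1)` for `hidden.mean(axis=1)` gives the average pooling variant.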
【flow prediction】
input: feature-augmented points; output: flow → from which the depth residual map is obtained
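One way to turn per-hypothesis scores into a depth residual is a softmax over the hypotheses followed by the expected displacement. The sketch below assumes the network outputs one logit per hypothesis and treats the displacement as a direct change in depth, which is a simplification; `expected_displacement` and `refine_depth` are hypothetical names.

```python
import numpy as np

def expected_displacement(logits, step):
    """logits: (P, 2m+1) scores over hypotheses k = -m..m. Returns (P,) displacements."""
    m = (logits.shape[1] - 1) // 2
    k = np.arange(-m, m + 1)
    z = logits - logits.max(axis=1, keepdims=True)      # numerically stable softmax
    prob = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return (prob * k * step).sum(axis=1)                # expectation of k * step

def refine_depth(depth, logits, step):
    """Add the per-pixel residual to an (H, W) depth map; logits is (H*W, 2m+1)."""
    return depth + expected_displacement(logits, step).reshape(depth.shape)
```

Taking an expectation rather than an argmax lets the residual fall between hypotheses, so the result is not limited to multiples of `step`.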
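this work can be viewed as data-driven point cloud upsampling (using information from the reference view)
a region of interest (ROI) can be densified on its own; the paper calls this foveated depth inference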
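A rough sketch of how ROI-only densification could be set up: upsample the current depth map, then pick out only the pixels inside the ROI as the points handed to the next refinement pass. Nearest-neighbor upsampling, the rectangular ROI format (r0, r1, c0, c1), and both function names are assumptions for illustration.

```python
import numpy as np

def upsample_nearest(depth, factor=2):
    """Nearest-neighbor upsampling of an (H, W) depth map (assumed scheme)."""
    return np.repeat(np.repeat(depth, factor, axis=0), factor, axis=1)

def roi_pixels(depth, roi):
    """Return (rows, cols, depths) for pixels inside roi = (r0, r1, c0, c1)."""
    r0, r1, c0, c1 = roi
    rows, cols = np.mgrid[r0:r1, c0:c1]
    return rows.ravel(), cols.ravel(), depth[r0:r1, c0:c1].ravel()
```

Only the returned pixels would be unprojected and refined, so the cost scales with the ROI size rather than the full image.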