Contents
Understanding binocular stereo vision:
Epipolar geometry of parallel views (a second approach to producing a disparity map)
Image rectification (stereo rectification)
Implementation: similarity matching and disparity computation
Key influencing parameters
Discussion section of the experiment report
SGBM algorithm example, which gives better results and runs faster.
[Binocular Vision] SGBM Algorithm Application (Python version), 落叶随峰's blog on CSDN
Task: generate a disparity map
Keywords: disparity principle (epipolar geometry of parallel views), image rectification, similarity matching, disparity computation and matching
Image dataset: vision.middlebury.edu/stereo
From the human perspective, binocular stereoscopic vision is the ability of our brain to create a sense of depth and three-dimensional space from the different images captured by our two eyes.
From the machine perspective, binocular stereo vision involves using two cameras to photograph an object from different angles, then calculating the positional offset between corresponding points in the two images to recover the object's three-dimensional geometry from the principle of parallax.
Binocular stereo vision primarily involves four steps: camera calibration, stereo rectification, stereo matching, and disparity calculation.
The figure in the upper-left shows the epipolar geometry, which is the key to the stereo matching that follows.
Triangulation on parallel views turns disparity into depth, but without the camera focal length f and the baseline B between the two viewpoints, absolute depth cannot be recovered here; if they were available, normalized cross-correlation matching would make a complete binocular stereo system straightforward.
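For reference, the standard parallel-view triangulation relation, with f the focal length, B the baseline, and d the disparity of a matched point pair:

$$Z = \frac{fB}{d}, \qquad d = x_l - x_r$$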
In essence, rectification is a series of transformations that make the two images "parallel", so that corresponding points end up on the same scanline.
The figure below details the algorithm; translating it into code completes rectification for arbitrary image pairs. In this implementation I use the Middlebury dataset, whose images are already rectified, so no rectification step is needed.
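For completeness, here is a minimal rectification sketch using OpenCV, assuming the calibration results are already known; the variable names K1, D1, K2, D2 (intrinsics and distortion coefficients), R, T (pose of the right camera relative to the left), left_image, and right_image are all illustrative:

import cv2
import numpy as np

# Hypothetical calibration outputs; in practice they come from cv2.stereoCalibrate
h, w = left_image.shape[:2]
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2, (w, h), R, T)
# Per-camera remapping tables that warp each view onto the common rectified plane
map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (w, h), cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (w, h), cv2.CV_32FC1)
left_rect = cv2.remap(left_image, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right_image, map2x, map2y, cv2.INTER_LINEAR)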
Code implementation:
The following key parameters can be modified in the main function of the script:
import cv2
import numpy as np
import time
def read_images(left_image_path, right_image_path):
    # Read both images in grayscale mode
    left_image = cv2.imread(left_image_path, 0)
    right_image = cv2.imread(right_image_path, 0)
    if left_image is None or right_image is None:
        raise FileNotFoundError("Could not read one or both input images")
    return left_image, right_image
def ncc(left_block, right_block):
    # Calculate the Normalized Cross-Correlation (NCC) between two blocks
    product = np.mean((left_block - left_block.mean()) * (right_block - right_block.mean()))
    stds = left_block.std() * right_block.std()
    if stds == 0:
        return 0
    return product / stds
def ssd(left_block, right_block):
    # Calculate the Sum of Squared Differences (SSD) between two blocks;
    # cast to a signed type first so the uint8 subtraction cannot wrap around
    diff = left_block.astype(np.int32) - right_block.astype(np.int32)
    return np.sum(diff * diff)

def sad(left_block, right_block):
    # Calculate the Sum of Absolute Differences (SAD), with the same signed cast
    diff = left_block.astype(np.int32) - right_block.astype(np.int32)
    return np.sum(np.abs(diff))
def select_similarity_function(method):
    # Select the similarity measure function based on the method name
    if method == 'ncc':
        return ncc
    elif method == 'ssd':
        return ssd
    elif method == 'sad':
        return sad
    else:
        raise ValueError("Unknown method: " + method)
def compute_disparity_map(left_image, right_image, block_size, disparity_range, method='ncc'):
    # Initialize the disparity map
    height, width = left_image.shape
    disparity_map = np.zeros((height, width), np.uint8)
    half_block_size = block_size // 2
    similarity_function = select_similarity_function(method)
    # Loop over each pixel in the image (skipping the half-window border)
    for row in range(half_block_size, height - half_block_size):
        for col in range(half_block_size, width - half_block_size):
            best_disparity = 0
            # SSD/SAD are minimized, NCC is maximized
            best_similarity = float('inf') if method in ['ssd', 'sad'] else float('-inf')
            # Reference block around the current pixel in the left image
            left_block = left_image[row - half_block_size:row + half_block_size + 1,
                                    col - half_block_size:col + half_block_size + 1]
            # Loop over candidate disparities
            for d in range(disparity_range):
                if col - d < half_block_size:
                    continue  # candidate block would fall outside the right image
                # Candidate block in the right image, shifted left by d
                right_block = right_image[row - half_block_size:row + half_block_size + 1,
                                          col - d - half_block_size:col - d + half_block_size + 1]
                # Compute the similarity measure
                similarity = similarity_function(left_block, right_block)
                # Keep the best score seen so far (minimum for SSD/SAD, maximum for NCC)
                if method in ['ssd', 'sad']:
                    if similarity < best_similarity:
                        best_similarity = similarity
                        best_disparity = d
                else:
                    if similarity > best_similarity:
                        best_similarity = similarity
                        best_disparity = d
            # Scale the winning disparity into [0, 255] for display
            disparity_map[row, col] = best_disparity * (256. / disparity_range)
    return disparity_map
def main():
    # Paths of the input stereo pair
    left_image_path = 'img1.png'
    right_image_path = 'img2.png'
    # Load images
    left_image, right_image = read_images(left_image_path, right_image_path)
    # Record the start time
    tic_start = time.time()
    # Define the block size and disparity search range
    block_size = 15
    disparity_range = 64  # adjust to the expected maximum disparity of the pair
    # Specify the similarity measurement method ('ncc', 'ssd', or 'sad')
    method = 'ssd'  # change this string to switch between methods
    # Compute the disparity map using the selected method
    disparity_map = compute_disparity_map(left_image, right_image, block_size, disparity_range, method=method)
    # Resize the disparity map for display
    scale_factor = 2.0  # scale the image up 2x for easier viewing
    resized_image = cv2.resize(disparity_map, (0, 0), fx=scale_factor, fy=scale_factor)
    # Display the result
    cv2.imshow('disparity_map_resized', resized_image)
    print('Time elapsed:', time.time() - tic_start)
    # Wait for a key press, then close all windows
    cv2.waitKey(0)
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
Effect of window size:
A smaller window
A larger window
To examine how different window sizes affect the depth map, I fixed the search range at 64 and set the window size to 3, 5, 7, 15, and 21, producing the five depth maps in Fig 3.1 through Fig 3.5. (Although SSD is used to answer this question, SAD and NCC lead to the same conclusions.) (The script calls this parameter block_size rather than window_size.)
Tips: for typographical convenience, the depth maps in this part may appear small; please zoom in within the file to view details.
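The timings and maps below were gathered with a sweep of this kind, sketched here; it reuses read_images and compute_disparity_map from the script above, and the output file names are illustrative:

import time
import cv2

left, right = read_images('img1.png', 'img2.png')
for ws in (3, 5, 7, 15, 21):
    t0 = time.time()
    disparity = compute_disparity_map(left, right, ws, 64, method='ssd')
    print(f'window_size={ws}: {time.time() - t0:.2f} s')  # compare with table 3.1
    cv2.imwrite(f'disparity_ssd_ws{ws}.png', disparity)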
The time taken by each run is shown in table 3.1:
table 3.1
window_size | Time (s)
3           | 45.30
5           | 44.65
7           | 45.48
15          | 51.96
21          | 60.54
Fig 3.1, window_size=3    Fig 3.2, window_size=5
Fig 3.3, window_size=7    Fig 3.4, window_size=15
Fig 3.5, window_size=21
Experimental Observations and Analysis:
Effect of Window Size:
Starting with a very small window size such as 3, object contours were clearly visible but the depth map was heavily degraded by noise. Increasing the window size slightly, to 5 or 7, noticeably reduced the noise. Expanding it further, to 15 or 21, reduced the noise to nearly imperceptible levels, yielding a smoother map with a clear sense of depth in which near and far objects are explicitly separated.
Distortions at Larger Window Sizes:
However, this scaling had an unintended consequence. At the larger window sizes of 15 and 21, object shapes began to distort; close objects, notably the triangles in Fig 3.5, were affected most, and finer details, such as the gaps in the distant fence, were blurred or lost entirely.
Timing Observations:
Runtimes ranged between roughly 45 and 60 seconds: window sizes 3 through 7 all took about 45 s, and only the larger windows (15 and 21) showed a clear increase (table 3.1).
Storage Space
In the experiment, smaller window sizes produce noisier maps with more detail and larger files (e.g., 126 KB at size 3). Increasing the window reduces both noise and file size (109 KB at size 5, 85.9 KB at size 7) but blurs object edges and expands shadowed areas. Larger windows improve depth continuity and suppress noise, yet they also widen the black border at the right edge of the depth map (the half-window margin that the matching loop never visits stays at zero and grows with block_size), a trade-off between clarity and information retention.
Rationale and Conclusions:
Small Window Drawbacks:
An extremely small window means each point is matched almost independently, close to attempting stereo matching with only the grayscale value of a single pixel in the left and right images, which is known to be unreliable.
Large Window Complications:
Conversely, an oversized window covers so many pixels that the contribution of any single pixel is diluted. As the right-image window shifts, its content barely changes, so the algorithm wrongly assigns nearly uniform disparity to neighboring points, producing a depth map that is smooth but stripped of critical detail.
Optimal Window Size:
The experiment demonstrates the strong influence of window size on the fidelity of the resulting depth map: moving from smaller to larger sizes, noise diminishes while smoothness and layering improve, at the cost of detail. For this experiment, with a disparity search range of [0, 64], an optimal window size falls between roughly 7 and 15, balancing detail against noise and overall image quality.
To examine how different similarity metrics affect the depth map, I repeated the experiment at window sizes 3, 7, 15, and 21, running each similarity metric at every size. This produced four sets of results:
Tips: for typographical convenience, the depth maps in this part may appear small; please zoom in within the file to view details.
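The metric comparison below was gathered with the same kind of sweep, sketched here over the three similarity functions at one fixed window size (again reusing read_images and compute_disparity_map from the script above; output names are illustrative):

import time
import cv2

left, right = read_images('img1.png', 'img2.png')
for method in ('sad', 'ssd', 'ncc'):
    t0 = time.time()
    disparity = compute_disparity_map(left, right, 7, 64, method=method)
    print(f'{method}: {time.time() - t0:.2f} s')  # compare with table 4.2
    cv2.imwrite(f'disparity_{method}_ws7.png', disparity)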
Fig 4.1, SAD    Fig 4.2, SSD    Fig 4.3, NCC
The time taken by each algorithm is shown in table 4.1:
table 4.1, window_size=3
Similarity metric | Time (s)
SAD               | 41.99
SSD               | 45.30
NCC               | 379.97
Fig 4.4, SAD    Fig 4.5, SSD    Fig 4.6, NCC
The time taken by each algorithm is shown in table 4.2:
table 4.2, window_size=7
Similarity metric | Time (s)
SAD               | 43.87
SSD               | 45.48
NCC               | 401.16
Fig 4.7, SAD    Fig 4.8, SSD    Fig 4.9, NCC
The time taken by each algorithm is shown in table 4.3:
table 4.3, window_size=15
Similarity metric | Time (s)
SAD               | 48.12
SSD               | 51.96
NCC               | 373.81
Fig 4.10, SAD    Fig 4.11, SSD    Fig 4.12, NCC
The time taken by each algorithm is shown in table 4.4:
table 4.4, window_size=21
Similarity metric | Time (s)
SAD               | 50.77
SSD               | 60.54
NCC               | 483.80
Experimental Observations and Analysis:
Efficacy of the SAD Algorithm:
The SAD (Sum of Absolute Differences) algorithm performs poorly. With a small window size such as 3x3, all three measures (SAD, SSD, and NCC) generate a significant amount of noise; SSD shows slightly fewer noise points than SAD, while NCC produces the most, noticeably degrading the clarity of distant objects such as the fence.
Error Manifestations:
In SAD and SSD results, depth errors typically appear as isolated points or clustered patches. In NCC results, errors are more uniformly distributed, appearing as dense scattered points with fewer aggregations.
Block Size Variations:
Increasing the window size to 7x7 or 15x15, SSD shows fewer errors, particularly on objects close to the viewpoint, although some regions still exhibit exaggerated errors. NCC renders the distances of farther objects more smoothly (Fig 4.6), despite a more pronounced granularity in the disparity map. At 21x21, SAD yields fewer noise points than SSD and NCC, though all three algorithms distort object shapes to some degree.
Computational Time:
NCC consistently requires by far the most computation time, roughly eight to ten times longer than the others (tables 4.1 through 4.4); SAD and SSD complete much faster in every tested scenario.
Rationale and Conclusions:
Algorithmic Complexity:
Comparing formulas 1.3 through 1.5, NCC's computational demands are considerably higher than those of SAD and SSD: for every candidate window it must compute two means, two standard deviations, and a division on top of the per-pixel products, which accounts for its much longer processing times.
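For reference, the three measures as implemented in the script above (the formulas cited as 1.3 through 1.5 earlier in the report), with W the matching window and d the candidate disparity:

$$\mathrm{SAD}(d) = \sum_{(i,j) \in W} \left| I_L(i,j) - I_R(i, j-d) \right|$$

$$\mathrm{SSD}(d) = \sum_{(i,j) \in W} \left( I_L(i,j) - I_R(i, j-d) \right)^2$$

$$\mathrm{NCC}(d) = \frac{\frac{1}{|W|} \sum_{(i,j) \in W} \bigl(I_L(i,j) - \bar{I}_L\bigr)\bigl(I_R(i,j-d) - \bar{I}_R\bigr)}{\sigma_{I_L}\, \sigma_{I_R}}$$

SAD and SSD cost one subtraction (plus one multiplication for SSD) per pixel, while NCC additionally needs the means, standard deviations, and a division per window, which matches the runtime gap in the tables.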
Inaccuracies in NCC:
The notable error region in the NCC map, especially around the large triangle, may stem from an insufficient search range. As the algorithm scans the left image from left to right, it first reaches the leftmost point of the green triangle, but the limited search range prevents it from reaching the true correspondence, so a premature match point is selected. The resulting disparity understates the true one, which appears as darker points (smaller disparity).
Optimal Algorithm Selection:
Based on repeated trials, SSD or NCC with a moderate 7x7 to 15x15 block size is recommended for more precise results, while SAD with a small 3x3 window gives acceptable results at a lower computational cost.
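Finally, as noted in the contents, OpenCV's built-in semi-global matcher (SGBM) produces better disparity maps far faster than this exhaustive block matcher; a minimal sketch, assuming the same rectified pair img1.png and img2.png:

import cv2

left = cv2.imread('img1.png', 0)
right = cv2.imread('img2.png', 0)
block_size = 5
stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,       # must be a multiple of 16
    blockSize=block_size,
    P1=8 * block_size ** 2,  # smoothness penalty for small disparity steps
    P2=32 * block_size ** 2, # larger penalty for big disparity jumps
)
# compute() returns fixed-point disparities scaled by 16
disparity = stereo.compute(left, right).astype('float32') / 16.0
disparity_vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype('uint8')
cv2.imwrite('disparity_sgbm.png', disparity_vis)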