Binocular Stereo Vision — Disparity Maps (Stereo Matching) with Three Similarity Metrics

Contents

Understanding binocular stereo vision

Epipolar geometry of parallel views (a second route to disparity maps)

Image rectification (stereo rectification)

Implementation: similarity matching and disparity computation

Key parameters

Discussion


For an SGBM-based implementation, which produces better results and runs faster, see:

【双目视觉】 SGBM算法应用(Python版)_落叶随峰的博客-CSDN博客

Task: generate a disparity map

Keywords: disparity principle (epipolar geometry of parallel views), image rectification, similarity matching, disparity computation and matching

Image dataset: vision.middlebury.edu/stereo

Understanding binocular stereo vision

From the human perspective, binocular stereoscopic vision is the ability of our brain to create a sense of depth and three-dimensional space from the different images captured by our two eyes.

From the machine perspective, binocular stereo vision uses two cameras to photograph an object from different angles, then computes the positional offset between corresponding points in the two images to recover the object's three-dimensional geometry via the principle of parallax.

Binocular stereo vision primarily involves four steps: camera calibration, stereo rectification, stereo matching, and disparity computation.

Epipolar geometry of parallel views (a second route to disparity maps)

[Figure 1]

The top-left diagram shows the epipolar geometry, which is the key to the stereo matching that follows.

Triangulation with parallel views can recover depth from disparity via Z = f·B/d, but without the camera focal length f and the baseline B between the two camera centers, that route is closed here. If both were available, normalized cross-correlation matching would make a complete binocular stereo system straightforward to build.
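If f and B were known, depth would follow directly from the triangulation relation Z = f·B/d. A minimal helper, with purely illustrative numbers:

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    # Z = f * B / d: larger disparity means a closer point
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical values: 700 px focal length, 10 cm baseline, 4 px disparity
print(depth_from_disparity(4.0, 700.0, 0.10))  # 17.5 m
```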

[Figure 2]

Image rectification (stereo rectification)

In essence: a sequence of transforms that makes the two image planes "parallel", so that corresponding points lie on the same image row.

The figure below details the algorithm; translating it into code is enough to rectify arbitrary image pairs. In this implementation I use the Middlebury dataset, whose images are already rectified, so no rectification step is needed.

[Figure 3]

Implementation: similarity matching and disparity computation

  1. Read both views as grayscale images
  2. Slide each block from the left image along the corresponding row of the right image
  3. Use three different similarity metrics to pick the best disparity value
  4. Output the best disparity map and the computation time

Code implementation:

The following key parameters can be changed in the main function of the script:

  • window size
  • similarity metric
  • disparity range (not examined in this experiment)
  • display scale factor for the output image
import cv2
import numpy as np
import time

def read_images(left_image_path, right_image_path):
    # Read images in grayscale mode and convert to float32 so that the
    # block differences below cannot wrap around in uint8 arithmetic
    left_image = cv2.imread(left_image_path, cv2.IMREAD_GRAYSCALE)
    right_image = cv2.imread(right_image_path, cv2.IMREAD_GRAYSCALE)
    if left_image is None or right_image is None:
        raise FileNotFoundError("Could not read one of the input images")
    return left_image.astype(np.float32), right_image.astype(np.float32)

def ncc(left_block, right_block):
    # Calculate the Normalized Cross-Correlation (NCC) between two blocks
    product = np.mean((left_block - left_block.mean()) * (right_block - right_block.mean()))
    stds = left_block.std() * right_block.std()

    if stds == 0:
        return 0
    else:
        return product / stds

def ssd(left_block, right_block):
    # Calculate the Sum of Squared Differences (SSD) between two blocks
    return np.sum(np.square(np.subtract(left_block, right_block)))

def sad(left_block, right_block):
    # Calculate the Sum of Absolute Differences (SAD) between two blocks
    return np.sum(np.abs(np.subtract(left_block, right_block)))

def select_similarity_function(method):
    # Select the similarity measure function based on the method name
    if method == 'ncc':
        return ncc
    elif method == 'ssd':
        return ssd
    elif method == 'sad':
        return sad
    else:
        raise ValueError("Unknown method")

def compute_disparity_map(left_image, right_image, block_size, disparity_range, method='ncc'):
    # Initialize disparity map
    height, width = left_image.shape
    disparity_map = np.zeros((height, width), np.uint8)
    half_block_size = block_size // 2
    similarity_function = select_similarity_function(method)

    # Loop over each pixel in the image
    for row in range(half_block_size, height - half_block_size):
        for col in range(half_block_size, width - half_block_size):
            best_disparity = 0
            best_similarity = float('inf') if method in ['ssd', 'sad'] else float('-inf')

            # Define one block for comparison based on the current pixel
            left_block = left_image[row - half_block_size:row + half_block_size + 1,
                                     col - half_block_size:col + half_block_size + 1]

            # Loop over different disparities
            for d in range(disparity_range):
                if col - d < half_block_size:
                    continue

                # Define the second block for comparison
                right_block = right_image[row - half_block_size:row + half_block_size + 1,
                                          col - d - half_block_size:col - d + half_block_size + 1]

                # Compute the similarity measure
                similarity = similarity_function(left_block, right_block)

                # Update the best similarity and disparity if necessary
                if method in ['ssd', 'sad']:
                    # For SSD and SAD, we are interested in the minimum value
                    if similarity < best_similarity:
                        best_similarity = similarity
                        best_disparity = d
                else:
                    # For NCC, we are interested in the maximum value
                    if similarity > best_similarity:
                        best_similarity = similarity
                        best_disparity = d

            # Assign the best disparity to the disparity map
            disparity_map[row, col] = best_disparity * (256. / disparity_range)

    return disparity_map

def main():
    # Define paths for input images
    left_image_path = 'img1.png'
    right_image_path = 'img2.png'

    # Load images
    left_image, right_image = read_images(left_image_path, right_image_path)

    # Record the start time
    tic_start = time.time()

    # Define the block size and disparity range
    block_size = 15
    disparity_range = 64  # This can be adjusted based on your specific context

    # Specify the similarity measurement method ('ncc', 'ssd', or 'sad')
    method = 'ssd'  # Change this string to switch between methods

    # Compute the disparity map using the selected method
    disparity_map = compute_disparity_map(left_image, right_image, block_size, disparity_range, method=method)

    # Resize the disparity map for display
    scale_factor = 2.0  # Scale factor for the displayed image
    resized_image = cv2.resize(disparity_map, (0,0), fx=scale_factor, fy=scale_factor)

    # Display the result
    cv2.imshow('disparity_map_resized', resized_image)
    print('Time elapsed:', time.time() - tic_start)

    # Wait for key press and close all windows
    cv2.waitKey(0)
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()

Key parameters

Effect of window size:

Smaller windows

  • richer detail
  • more noise

Larger windows

  • smoother disparity map with less noise
  • loss of fine detail

Discussion

  1. A discussion on how different window sizes affect the results, and why. (300–800 words; any number of figures and tables may be added)

To discuss how different window sizes affect the depth map, I fixed the search range at 64 and set the window size to 3, 5, 7, 15, and 21, producing the five images Fig 3.1 through Fig 3.5. (Although SSD is used to answer this question, SAD and NCC lead to the same conclusions.) (The script uses block_size rather than window_size.)

Note: for layout reasons, the depth maps in this section may not be sharp; please zoom in on the file to view details.

The time each experiment took is shown in table 3.1:

table 3.1

window_size    Time (s)
3              45.30
5              44.65
7              45.48
15             51.96
21             60.54

[Fig 3.1, window_size=3]    [Fig 3.2, window_size=5]
[Fig 3.3, window_size=7]    [Fig 3.4, window_size=15]
[Fig 3.5, window_size=21]

Experimental Observations and Analysis:

Effect of Window Size:

Starting with a very small window size such as 3, object contours were clearly visible but the depth map was heavily contaminated by noise. Increasing the window size to 5 or 7 noticeably reduced the noise. Expanding it further to 15 or 21 brought the noise down to nearly imperceptible levels, yielding a smoother image with a clear sense of depth and an explicit separation between near and far objects.

Distortions at Larger Window Sizes:

However, this scaling had an unintended cost. At the larger window sizes of 15 and 21, object shapes began to distort, with closer objects (notably the triangles in Fig 3.5) affected most, and finer details, such as the gaps in a distant fence, were blurred or lost entirely.

Timing Observations:

Processing times ranged from roughly 45 to 60 seconds, rising modestly with window size at the larger settings (table 3.1) but showing no clear pattern among the smaller ones.

Storage Space:

In the experiment, smaller window sizes produce noisier images with more details, using more storage (e.g., 126KB at size 3). Increasing the size reduces noise and storage use (109KB at size 5, 85.9KB at size 7) but blurs object edges and expands shadowed areas. Despite this, larger windows enhance depth continuity and lessen noise impact. However, they also introduce a growing black space at the depth map's right edge, indicating a trade-off between clarity and information retention.

Rationale and Conclusions:

Small Window Drawbacks:

An extremely small window size creates a situation where each point essentially functions independently, similar to attempting stereo matching using only the grayscale value of individual pixels across the left and right images, which is known for its inaccuracy.

Large Window Complications:

Conversely, an oversized window encapsulates an excessive number of pixels, diluting the impact of any single pixel's shift. Consequently, as the right window shifts slightly, the minimal variation in the window's content leads the algorithm to erroneously assume uniform disparity across neighboring points. This results in a depth map that, while smooth, is bereft of critical detail.

Optimal Window Size:

The experiment unequivocally demonstrates the profound impact of window size on the fidelity of the resulting depth map. Transitioning from smaller to larger sizes reveals a trend of diminishing noise and enhanced smoothness and layering in the image. For this experiment, given a disparity search range of [0,64], an optimal window size would likely fall between 7 and 15, balancing detail with noise reduction and overall image quality.

  2. A discussion on the different similarity metrics you have used, explaining how they affect the results, and why. (300–800 words; any number of figures and tables may be added)

To discuss how different similarity metrics affect the depth map, I set the window size to 3, 7, 15, and 21 and ran each setting with each similarity metric, producing four sets of results:

Note: for layout reasons, the depth maps in this section may not be sharp; please zoom in on the file to view details.

  1. window_size=3

[Fig 4.1, SAD]    [Fig 4.2, SSD]    [Fig 4.3, NCC]

The time each algorithm took is shown in table 4.1:

table 4.1, window_size=3

Similarity metric    Time (s)
SAD                  41.99
SSD                  45.30
NCC                  379.97

  2. window_size=7

[Fig 4.4, SAD]    [Fig 4.5, SSD]    [Fig 4.6, NCC]

The time each algorithm took is shown in table 4.2:

table 4.2, window_size=7

Similarity metric    Time (s)
SAD                  43.87
SSD                  45.48
NCC                  401.16

  3. window_size=15

[Fig 4.7, SAD]    [Fig 4.8, SSD]    [Fig 4.9, NCC]

The time each algorithm took is shown in table 4.3:

table 4.3, window_size=15

Similarity metric    Time (s)
SAD                  48.12
SSD                  51.96
NCC                  373.81

  4. window_size=21

[Fig 4.10, SAD]    [Fig 4.11, SSD]    [Fig 4.12, NCC]

The time each algorithm took is shown in table 4.4:

table 4.4, window_size=21

Similarity metric    Time (s)
SAD                  50.77
SSD                  60.54
NCC                  483.80

Experimental Observations and Analysis:


Efficacy of the SAD Algorithm:

The performance of the SAD (Sum of Absolute Differences) algorithm is found to be subpar. When utilizing a small window size, such as 3x3, all three algorithms (SAD, SSD, and NCC) tend to generate a significant amount of noise within the results. SSD (Sum of Squared Differences) manifests slightly fewer noise points compared to SAD, while NCC (Normalized Cross-Correlation) creates the highest number of noise points, significantly impacting the clarity of distant objects, such as fences.

Error Manifestations:

In the results from SAD and SSD, depth inaccuracies are typically presented as discrete points or clustered areas. In contrast, errors within NCC outputs are more uniformly distributed, appearing as dense, scattered points with fewer aggregations.

Block Size Variations:

Upon increasing the window size to dimensions like 7x7 or 15x15, SSD displays fewer inaccuracies, particularly on objects closer to the viewpoint, although some regions demonstrate exaggerated errors. NCC, in comparison, delivers a smoother representation of distances for farther objects, as seen in Fig 4.6, despite a more pronounced granularity in the disparity map. When the window size is expanded to 21x21, SAD yields fewer noise points than SSD and NCC, though all three algorithms introduce some distortion into the object shapes.

Computational Time:

NCC consistently demands the most substantial computational time, with SAD and SSD completing more quickly across all tested scenarios.

Rationale and Conclusions:

Algorithmic Complexity:

Comparing formulas 1.3 through 1.5, NCC's computational cost is evidently much higher than that of SAD and SSD, owing to its mean-subtraction and normalization steps, which explains its far longer processing times.

Inaccuracies in NCC:

The notable error area within the NCC map, especially around the large triangle, could be due to an inadequate search range. As the algorithm processes the left image from left to right, it initially locates the leftmost point of the green triangle. However, the search range's limitations prevent it from identifying the optimal match, resulting in a premature selection of a match point. This miscalculation translates into a disparity that understates the actual one, evidenced by the darker points indicating lesser disparity.

Optimal Algorithm Selection:

Based on repetitive trials, the SSD or NCC algorithms with a moderate 7x7 to 15x15 block size are recommended for more precise outcomes, or the SAD algorithm with a smaller 3x3 window size for acceptable results with less computational demand.
