局部立体匹配算法BM中的匹配代价聚合方法

Block Matching methods

The following matching costs are defined for patches centered at the pixel positions {mathbf p} and {mathbf q}.

The matching costs described below are trivially extended to color images by not making distinctions between the channels. This is equivalent to consider larger patches (three times the number of pixels) containing the pixel values from all the channels.

SSD (sum of squared differences) and SAD (sum of absolute differences)

By taking p={1,2} in the following equation, we have the Sum of Absolute Differences (SAD) and the Sum of Squared Differences (SSD) respectively:

mbox{SSD}(mathbf p,mathbf q) := sum_{mathbf t in B(0,r)} | u({mathbf p} + mathbf  t), v({mathbf q} + mathbf  t)|^p. 

But any value of p will work as well.

ZSSD (Zero mean SSD, or SSD-mean)

The Zero-mean SSD adds invariance with respect to additive contrast changes to the SSD. It is defined by

mbox{ZSSD}(mathbf p, mathbf q) :=  sum_{mathbf t in B(0,r)}     left| u({mathbf p} + mathbf  t) -  v({mathbf q} + mathbf t)  +       mu_v({mathbf q})-  mu_u({mathbf p}) right|^2 , 

where textstyle  mu_u({mathbf p})= frac{1}{| B(0,r)|} sum_{mathbf t in B(0,r)} u(mathbf p+mathbf t) is the precomputed average of pixel values in u over the block centered at mathbf p. The ZSSD cannot be implemented with integral images if expressed in this form, because the pixel value distances are dependent on the position of the pixels. Indeed, depending on the position, a different pair of means mu_{v} and mu_{u} are being subtracted. However, expanding the square we get

mbox{ZSSD}(mathbf p,mathbf q) :=    mbox{SSD}(mathbf p,mathbf q) -       {|B(0,r)|} left(   mu_v({mathbf q})-  mu_u({mathbf p})  right)^2 

and since mu_v(cdot) and mu_u(cdot) can be also precomputed by integral images, the overall cost of ZSSD is comparable to SSD.

SSD/Norm

Normalizing the patches by their L^2-norms renders the comparison robust to multiplicative contrast changes. The SSD/Norm is defined as

mbox{SSD/Norm}(mathbf p, mathbf q) :=    sum_{ mathbf t in B(0,r)}     left| frac{ u({mathbf p} + mathbf  t) } { | u|_{B({mathbf p},r)} | }   -     frac{ v({mathbf q} + mathbf t) } { | v|_{B({mathbf q},r)} | }    right|^2 , 

where textstyle | u|_{B({mathbf p},r)} |  =  sqrt{sum_{mathbf t in B(0,r)} left( u(mathbf p+mathbf t) right)^2}. Expanding the above expression we get to a correlation formula

mbox{SSD/Norm}(mathbf p, mathbf q) := 2 -    2     frac{ mbox{Prod}(mathbf p ,mathbf q) } {  | u|_{B({mathbf p},r)} | ,, | v|_{B({mathbf q},r)} | } , 

where mbox{Prod}(mathbf p ,mathbf q):=  sum_{mathbf t in B(0,r)}  u({mathbf p} + mathbf  t)  v({mathbf q} + mathbf t), which can be computed using integral images by taking d(a,b) = ab, and the terms | u|_{B({mathbf p},r)} | and | v|_{B({mathbf q},r)} | can be precomputed.

The implementation of this cost should prevent dividing by zero. Under this circumstance, the cost should be set to 2.

NCC (normalized cross correlation)

The Normalized Cross Correlation combines the benefits of ZSSD and SSDNorm as it is invariant to affine contrast changes. It is defined as

mbox{NCC}(mathbf p, mathbf q) :=  1 - mbox{Corr}(mathbf p, mathbf q) , 

with

mbox{Corr}(mathbf p, mathbf q) :=     frac{       frac1{|B(0,r)|}  sum_{mathbf t in B(0,r)}       left( u(mathbf p + mathbf  t) -  mu_u{(mathbf p)}right)        left( v(mathbf q + mathbf  t) -  mu_v{(mathbf q)}right)    }{    sqrt{ sigma^2left(  u|_{B({mathbf p},r)} right)   ,  sigma^2left(  v|_{B({mathbf q},r)} right)  }     } , 

and where sigma^2left( u|_{B({mathbf p},r)}right) is the sample variance of the block centered at mathbf p.

mbox{Corr} takes values in [-1,1], where 1 indicates the maximum correspondence. For a consistent notation across the different methods mbox{NCC} is defined as 1 - mbox{Corr}. The above expression can be written as

mbox{Corr}(mathbf p, mathbf q) :=    frac{   frac1{|B(0,r)|}  mbox{Prod}(mathbf p,mathbf q) -  mu_u{(mathbf p)}  ,cdot , mu_v{(mathbf q)}     }{     sqrt{ sigma^2left(  u|_{B({mathbf p},r)} right)   ,  sigma^2left(  v|_{B({mathbf q},r)} right)  }   } , 

which can be computed using one integral image (for mbox{Prod}) for each offset value (mathbf p-mathbf q), and where the terms mu and sigma^2 can be precomputed by integral images too!

The implementation of this cost must prevent dividing by zero. Under that circumstance the cost should be set to 1.

AFF (“affine” similarity measure)

The “affine” similarity measure proposed by Delon and Desolneux (2010) is also invariant to affine contrast changes, but differently from NCC it can distinguish flat patches from those containing edges. It can be seen from the definition of mbox{Corr} that if one of the patches is flat, then the correlation will be zero independently of the content of the second patch. In contrast, the “affine” similarity measure defined below can be non zero under the same circumstances:

mbox{AFF}(mathbf p, mathbf q) :=  max left( min_{alphage0, beta} left| U_{{mathbf p}} - alpha V_{{mathbf q}}  - beta right| ,                  min_{alphage0, beta} left|  V_{{mathbf q}} - alpha U_{{mathbf p}}  - beta right|  right)   , 

where U_{{mathbf p}}=u|_{B({mathbf p},r)}V_{{mathbf q}}=v|_{B({mathbf q},r)} and | cdot | denotes the L^2-norm. It can be explicitly computed by the formula

mbox{AFF}^2(mathbf p, mathbf q) :=  max left( sigma^2left(  u|_{B({mathbf p},r)} right)   ,  sigma^2left(  v|_{B({mathbf q},r)} right)  right)  cdot min left(1, 1- mbox{Corr}( {mathbf p}, {mathbf q} ) |    mbox{Corr}( {mathbf p}, {mathbf q} ) | right), 

which can be implemented with integral images after pre-computing sigma^2left(  u|_{B({mathbf p},r)} right) and sigma^2left(  v|_{B({mathbf q},r)} right), which in turn can be done with two integral images, one for the square of u and one for u.

LIN is a simpler variant of the AFF cost that drops the invariance to additive changes, but has a similar performance

mbox{LIN}(mathbf p, mathbf q) :=  max left( min_{alphage0} left| U_{{mathbf p}} - alpha V_{{mathbf q}}  right| ,                  min_{alphage0} left|  V_{{mathbf q}} - alpha U_{{mathbf p}} right|  right). 

It can also be implemented with integral images by re-writing it as

mbox{LIN}^2(mathbf p, mathbf q) :=  max(| U_{{mathbf p}}|^2 , | V_{{mathbf q}}|^2)     left( 1- frac{left[   mbox{Prod}( {mathbf p}, {mathbf q} )right]^2}{| U_{{mathbf p}}|^2  | V_{{mathbf q}}|^2} right). 

The implementation of these costs must prevent dividing by zero. Under that circumstance, the cost should be set respectively to the maximum variance or norm of the two patches.

Delon,J. and Desolneux,A. (2010). Stabilization of Flicker-like effects in image sequences through local contrast correction. SIAM Journal on Imaging Sciences, 3(4):703–734.

BTSAD and BTSSD (Birchfield & Tomasi sampling insensitive pixel dissimilarities)

By BTSAD or BTSSD, we mean an adaptation to SAD or SSD of the Birchfield and Tomasi (1998) sampling insensitive pixel dissimilarity. This matching cost, originally proposed for stereo matching, is designed to be insensitive to image sampling. For smooth (non aliased) images this cost is proven to be stable with respect to subpixel translations of the patches, while still evaluating the costs at integer positions.

This cost is particularly useful in combination with global block matching methods (Dynamic Programming for instance), where the dissimilarity is accumulated along scanlines, and where the subpixel computations are unaffordable. The insensitivity usually prevents the misclassification of some pixels as occlusions.

The usefulness of this matching cost for a ‘‘local’’ block matching method is less clear, since the minimum SAD/SSD cost may not change. Nevertheless it is worth comparing its performance with subpixel matching.

The matching costs proposed by Birchfield & Tomasi replace the definition of the pixel value distances with d^{BT}. For the case of the SSD we get

BTSSD(mathbf p,mathbf q) := sum_{mathbf t in B(0,r)} |d^{BT}( I_{L}({mathbf p} + mathbf  t), I_{R}({mathbf q} + mathbf  t))|^2, 

where the distance d^{BT} is defined as a symmetrization of the distance by

d^{BT}(I_{L}(mathbf p) , I_{R}(mathbf q)) =  min ( bar d ( I_{L}(mathbf p), I_{R}(mathbf q)), bar d ( I_{R}(mathbf q), I_{L}(mathbf p)) ), 

and where bar d is

bar d ( I_{L}(mathbf p), I_{R}(mathbf q))= max (0,I_{L}(mathbf p) - I^{max}_{R}(mathbf q),I^{min}_{R}(mathbf q) - I_{L}(mathbf p)), 
bar d ( I_{R}(mathbf q), I_{L}(mathbf p))= max (0,I_{R}(mathbf q) - I^{max}_{L}(mathbf p),I^{min}_{L}(mathbf p) - I_{R}(mathbf q)). 

The four precomputed images I^{max}_{R}, I^{min}_{R},I^{max}_{L}, I^{min}_{L} contain the maximum and minimum interpolated values of the image in a half pixel neighbor. For instance, with I_{R} we have

I^{min}_{R}(mathbf q) = min(I^-_{R}(mathbf q), I^+_{R}(mathbf q),I_{R}(mathbf q)) quad mbox{and} quad  I^{max}_{R}(mathbf q) = max(I^-_{R}(mathbf q), I^+_{R}(mathbf q),I_{R}(mathbf q)), 

where the interpolated values are computed by bilinear interpolation:

I^-_{R}(mathbf q) = frac12(I_{R}(mathbf q)+I_{R}(mathbf q-(1,0)^T)), quad I^+_{R}(mathbf q)  =frac12(I_{R}(mathbf q)+I_{R}(mathbf q+(1,0)^T)). 

Note that the maximum (and minimum) of the interpolated pixels I^{max} (resp. I^{min}) are only computed along the horizontal axis (definitions of I^+ and I^-). This is because the cost was originally proposed for stereo matching, so the subpixel differences are supposed to occur only along the horizontal axis. A straightforward extension of this cost for two dimensional block matching replaces the definition of I^{max} (resp. I^{min}) to consider subpixel offsets also in the vertical direction

I^{max}(mathbf q) = max  left{ hat I(mathbf q+ mathbf V) right}  quad mbox{with} quad mathbf V = left[-1/2,1/2right]^2, 

where hat I denotes the bilinear interpolation of the image I. For this interpolation the maximum (resp. minimum) will occur at one of nine possible positions, therefore

I^{max}(mathbf q) = max  left{ hat I(mathbf q+ mathbf s) : s in {-frac12, 0,frac12} times {-frac12, 0,frac12} right}. 

Birchfield,S. and Tomasi,C. (1998). A Pixel Dissimilarity Measure that is Insensitive To Image Sampling, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(4):401-406.

Subpixel

This option permits to compute a subpixel disparity map. It is important to note that this is not a subpixel refinement step, the algorithm will consider all the subpixel disparities within the disparity range. This option is ignored when computing 2D displacement fields.

Output, Evaluation and Statistics

The output of the block matching methods always contain the following images:

  • disparity: Represents the computed disparity map using a grayscale coding. Next to this disparity map, another image is displayed for those stereo pairs that provide a ground truth. This second image (titled METHOD/error ) shows the differences between the computed disparity and the ground truth.

  • matching cost: Represents the minimum matching cost for each pixel, using a grayscale coding (black is 0).

  • back-projection: Is obtained by warping the secondary image using the displacement field varepsilon: mathbf Z^2 to mathbf R^2 computed from the first image to the second image (denoted u:mathbf Z^2 to mathbf R and v:mathbf Z^2 to mathbf R respectively). That is, the backprojection is

    widetilde v(mathbf x) = v( mathbf x + mathbf varepsilon(mathbf x)). 
    Warping v using the ground truth displacement field, produces an image that matches (except at the occlusions) the first image. Thus, comparing widetilde v with u permits to assess the quality of the displacement field. To facilitate the comparison the image back-projection/error shows the pixel-wise difference |u(mathbf x) - widetilde v(mathbf x)|, shown using a grayscale coding.

  • first/second image: The color range of poorly contrasted images is stretched for visualization.

When the ground truth for the stereo pair is available, the errors with respect to it are computed and shown. The errors are shown as images with fixed range from -4 to 4. In this case the following images are also shown as part of the output:

  • ground truth: Represents the ground truth using a grayscale coding.

  • evaluation mask: Is an optional input image that indicates which pixels are considered (white) in the computation of the statistics (usually the boundaries are not considered). The pixels discarded by the evaluation mask are not shown in the error displays, are not considered in the statistics and are also removed from the non-occluded mask (see below).

局部立体匹配算法BM中的匹配代价聚合方法_第1张图片 

Evaluation mask example

  • occlusion mask: Is a binary mask indicating which pixels are occluded (black), and therefore not considered for the computation of the statistics on Non Occluded areas. This mask is only present in the default input datasets of the demo.

局部立体匹配算法BM中的匹配代价聚合方法_第2张图片 

Occlusion mask example

  • Show |Err| > 1: This binary mask indicates the points where the ground truth and the computed disparity differ more than 1 pixel |mbox{GT}-varepsilon| > 1. These pixels are superimposed on the disparity map and painted in RED.

Statistics

The statistics consider two aspects of the results: the density and the precision with respect to the ground truth disparity. The errors are computed as the absolute difference between computed disparity values and the ground truth.

  • Density: It is the percentage of pixels that for which the algorithm has returned a disparity. This quantity does not consider the pixels removed by the evaluation mask.

  • Percentage of pixels with error > 1 in the Evaluation Mask: This quantity is computed considering the pixels in the evaluation mask.

  • Percentage of pixels with error > 1 in Occluded areas: Considers the occluded pixels (according to the mask) that are in the evaluation mask. Since the occluded regions cannot be seen in one of the images, this quantity evaluates how well the algorithm extrapolates the disparity map.

  • Percentage of pixels with error > 1 in NON Occluded areas: Considers the non-occluded pixels that are in the evaluation mask. This quantity is similar to Eval. Mask but is not contaminated by the errors on occluded regions.

你可能感兴趣的:(Stereo,Vision)