Lin Yao
Study notes on "Hierarchical Image Saliency Detection on Extended CSSD" (J. Shi, Q. Yan, Li Xu, and J. Jia, IEEE Transactions on Pattern Analysis and Machine Intelligence, Apr. 2016, pp. 717-729)
Background
When an object contains salient small-scale patterns, saliency detection is easily misled by this complexity. To address this well-known and common problem, the paper proposes a hierarchical framework that analyzes saliency cues at multiple levels of structure and then integrates them into the final saliency map through hierarchical inference.
Model
The gradient magnitude is used as the segmentation function, because it is high at object borders and (mostly) low inside objects:
$$\|\nabla I\| = \sqrt{I_x^2 + I_y^2},$$
where $I_x$ is the gradient in the horizontal direction and $I_y$ is the gradient in the vertical direction. Applying the watershed transform to this map segments the image into an initial region map.
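As a minimal sketch of the segmentation function, the gradient magnitude can be computed with finite differences. The function name `gradient_magnitude`, the list-of-rows image format, and the central-difference filter are assumptions of this example rather than details from the paper, and the watershed transform itself is left out.

```python
import math

def gradient_magnitude(img):
    """Return sqrt(Ix^2 + Iy^2) for a 2D grayscale image given as a list of rows.

    Ix and Iy use central differences in the interior and one-sided
    differences at the borders (an assumption, not the paper's filter).
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Horizontal gradient Ix
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            ix = (img[y][x1] - img[y][x0]) / max(x1 - x0, 1)
            # Vertical gradient Iy
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            iy = (img[y1][x] - img[y0][x]) / max(y1 - y0, 1)
            mag[y][x] = math.hypot(ix, iy)
    return mag

# A vertical step edge: the magnitude is high at the border and zero elsewhere.
img = [[0, 0, 10, 10] for _ in range(3)]
g = gradient_magnitude(img)
```

Each row of `g` is `[0.0, 5.0, 5.0, 0.0]`: the two pixels on either side of the step carry the edge, which is where watershed lines would form.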
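To measure region scale, let $M$ be the region label map and compute
$$D_t(y) = M(y) - (k_t \star M)(y),$$
where $k_t$ is a mean filter of size $t \times t$ and $y$ indexes pixels. This is based on the observation that if all label values of region $R_i$ in $M$ are altered by the convolution (that is, $D_t(y) \neq 0$ for every pixel $y$ in $R_i$), then $R_i$ cannot encompass $k_t$, so the scale of the region is smaller than $t$.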
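The scale test can be sketched as follows, assuming integer region labels in a list-of-rows map and a $t \times t$ window anchored at its top-left pixel, where a window that leaves the image counts as "altered". The helpers `box_mean`, `difference_map`, and `region_scale` are hypothetical names; the returned scale is the largest $t$ for which at least one pixel of the region is unchanged by the mean filter.

```python
def box_mean(M, t, y, x):
    """Mean label inside the t x t window whose top-left corner is (y, x)."""
    total = sum(M[y + dy][x + dx] for dy in range(t) for dx in range(t))
    return total / (t * t)

def difference_map(M, t):
    """D_t(y) = M(y) - (k_t * M)(y); windows that leave the image count as altered."""
    h, w = len(M), len(M[0])
    D = [[float("inf")] * w for _ in range(h)]
    for y in range(h - t + 1):
        for x in range(w - t + 1):
            D[y][x] = M[y][x] - box_mean(M, t, y, x)
    return D

def region_scale(M, label):
    """Largest t such that some pixel of the region is unaltered in D_t.

    Note: a mixed window can average to the same label by coincidence;
    this sketch does not guard against that case.
    """
    h, w = len(M), len(M[0])
    region_pixels = [(y, x) for y in range(h) for x in range(w) if M[y][x] == label]
    scale = 0
    for t in range(1, min(h, w) + 1):
        D = difference_map(M, t)
        if all(D[y][x] != 0 for y, x in region_pixels):
            break  # every label value changed: the region cannot encompass k_t
        scale = t
    return scale

# Region 1 is a 3x3 block, region 2 a 3x1 column, region 3 a 1x4 row.
M = [
    [1, 1, 1, 2],
    [1, 1, 1, 2],
    [1, 1, 1, 2],
    [3, 3, 3, 3],
]
```

Here `region_scale(M, 1)` is 3, while the thin regions 2 and 3 both have scale 1 even though region 3 has more pixels than region 2, which is the point of an encompassment-based scale rather than a pixel count.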
The local contrast of region $R_i$ is
$$C_i = \sum_{j} w(R_j)\,\phi(i,j)\,\|c_i - c_j\|_2,$$
where $c_i$ and $c_j$ are the Lab colors of regions $R_i$ and $R_j$, and $w(R_j)$ counts the number of pixels in $R_j$, so regions with more pixels contribute larger local-contrast weights than regions containing only a few pixels. The factor $\phi(i,j) = \exp\{-D(R_i,R_j)/\sigma^2\}$ controls the influence of the spatial distance $D(R_i,R_j)$ between the two regions: close regions have a larger impact than distant ones.
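A direct sketch of the local-contrast sum follows. It assumes each region is already summarized by its mean Lab color, pixel count, and center in normalized image coordinates. Taking $D(R_i,R_j)$ as the squared distance between region centers and $\sigma^2 = 0.2$ are placeholder choices for this example, and `local_contrast` is a hypothetical name.

```python
import math

def local_contrast(colors, counts, centers, sigma2=0.2):
    """C_i = sum_j w(R_j) * phi(i, j) * ||c_i - c_j||_2 for every region i.

    colors:  mean Lab color per region
    counts:  pixel count w(R_j) per region
    centers: region centers in normalized image coordinates
    Assumptions: D(R_i, R_j) is the squared distance between centers,
    and sigma2 = 0.2 is a placeholder value.
    """
    n = len(colors)
    C = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if j == i:
                continue  # a region has zero color distance to itself
            d = sum((a - b) ** 2 for a, b in zip(centers[i], centers[j]))
            phi = math.exp(-d / sigma2)
            total += counts[j] * phi * math.dist(colors[i], colors[j])
        C.append(total)
    return C

# Regions 1 and 2 share a color that differs from region 0;
# region 1 lies close to region 0, region 2 lies far away.
colors = [(20.0, 0.0, 0.0), (23.0, 4.0, 0.0), (23.0, 4.0, 0.0)]
counts = [100, 100, 100]
centers = [(0.0, 0.0), (0.1, 0.0), (0.9, 0.0)]
C = local_contrast(colors, counts, centers)
```

Region 1 scores much higher than region 2 although both have the same color difference from region 0, because $\phi$ discounts the distant pair.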
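The location heuristic is
$$H_i = \frac{1}{w(R_i)} \sum_{x_i \in R_i} \exp\{-\lambda \|x_i - x_c\|^2\},$$
where $\{x_0, x_1, \dots\}$ is the set of pixel coordinates in region $R_i$ and $x_c$ is the coordinate of the image center. $H_i$ gives larger weights to regions close to the image center.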
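The location heuristic can be sketched the same way. This example assumes pixel coordinates normalized to [0, 1], so the image center is (0.5, 0.5), and uses $\lambda = 9$, the value given in these notes; `location_prior` is a hypothetical name.

```python
import math

def location_prior(pixels, center=(0.5, 0.5), lam=9.0):
    """H_i = (1 / w(R_i)) * sum over pixels x in R_i of exp(-lam * ||x - x_c||^2).

    pixels: coordinates of the region's pixels, normalized to [0, 1]
    (the normalization is an assumption of this sketch).
    """
    total = sum(math.exp(-lam * ((x - center[0]) ** 2 + (y - center[1]) ** 2))
                for x, y in pixels)
    return total / len(pixels)  # divide by w(R_i), the pixel count

center_region = [(0.45, 0.5), (0.5, 0.5), (0.55, 0.5)]
corner_region = [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05)]
```

Averaging over pixels (dividing by $w(R_i)$) keeps $H_i$ in (0, 1], so a large region is not favored merely for its size.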
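The initial saliency of region $R_i$ combines the two cues:
$$\bar{s}_i = C_i \cdot H_i.$$
Since the local contrast and location cues are normalized to the range [0, 1), their relative importance is balanced by $\lambda$, which is set to 9 in general. An initial saliency map is computed in this way separately for each layer.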
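Combining the two cues for one layer could look like the sketch below. The min-max normalization of $C_i$ and the helper names `normalize` and `initial_saliency` are assumptions; the notes only state that both cues are normalized.

```python
def normalize(values):
    """Min-max normalize a list to [0, 1] (the normalization scheme is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def initial_saliency(C, H):
    """s_bar_i = C_i * H_i, computed independently for the regions of one layer."""
    return [c * h for c, h in zip(normalize(C), H)]

s_bar = initial_saliency([2.0, 4.0, 6.0], [1.0, 0.5, 0.5])
```

The product acts like a soft AND: a region needs both high contrast and a central location to receive high initial saliency, so the high-contrast but off-center third region ends up at 0.5.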
For the node corresponding to region $i$ in layer $L_k$, we define a saliency variable $s_i^k$. Hierarchical inference minimizes the following energy function:
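$$E(S) = \sum_k \sum_i E_D(s_i^k) + \sum_k \sum_{i,j:\,R_i^k \subseteq R_j^{k+1}} E_H(s_i^k, s_j^{k+1}) + \sum_k \sum_{i,j:\,R_j^k \in A(R_i^k)} E_C(s_i^k, s_j^k),$$
where $A(R_i^k)$ denotes the set of regions adjacent to $R_i^k$ in layer $L_k$. The energy consists of three terms.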
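The first is the data term, $E_D(s_i^k) = \beta^k \|s_i^k - \bar{s}_i^k\|_2^2$, where $\beta^k$ controls the confidence of layer $L_k$ and $\bar{s}_i^k$ is the initial saliency value of region $i$ in that layer.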
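The second is the hierarchy term, $E_H(s_i^k, s_j^{k+1}) = \lambda^k \|s_i^k - s_j^{k+1}\|_2^2$, defined for every pair with $R_i^k \subseteq R_j^{k+1}$, where $\lambda^k$ controls the strength of consistency between layers.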
The last term is a local consistency term, which enforces intra-layer smoothness: it makes the saliency assignment smooth between adjacent regions that are similar.
In $E_C(s_i^k, s_j^k)$, $c_i^k$ and $c_j^k$ denote the mean colors of the respective regions.
Because the energy function combines saliency cues from all layers, the final result is less affected by the errors that occur at any single scale.
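To make the three terms concrete, the sketch below evaluates $E(S)$ for a given assignment. It assumes quadratic data and hierarchy terms, $\beta^k (s_i^k - \bar{s}_i^k)^2$ and $\lambda^k (s_i^k - s_j^{k+1})^2$, and a hypothetical color-weighted form $\gamma^k \exp(-\|c_i^k - c_j^k\|^2/\sigma_c)(s_i^k - s_j^k)^2$ for $E_C$; $\gamma^k$, $\sigma_c$, and the function name `energy` are not from the paper. Each adjacent pair is counted once.

```python
import math

def energy(s, s_bar, parent, adjacent, colors, beta, lam, gamma, sigma_c=100.0):
    """Evaluate E(S) = data + hierarchy + local-consistency terms.

    s[k][i], s_bar[k][i]: current and initial saliency of region i in layer k
    parent[k][i]: index of the region in layer k+1 that contains region i
    adjacent[k]: unordered adjacent pairs (i, j) within layer k
    colors[k][i]: mean color of region i in layer k
    beta, lam, gamma: per-layer weights
    The form of E_C and sigma_c are assumptions of this sketch.
    """
    E = 0.0
    K = len(s)
    for k in range(K):
        for i, si in enumerate(s[k]):
            E += beta[k] * (si - s_bar[k][i]) ** 2              # E_D
            if k + 1 < K:
                E += lam[k] * (si - s[k + 1][parent[k][i]]) ** 2  # E_H
        for i, j in adjacent[k]:
            w = math.exp(-math.dist(colors[k][i], colors[k][j]) ** 2 / sigma_c)
            E += gamma[k] * w * (s[k][i] - s[k][j]) ** 2          # E_C (assumed form)
    return E

# Two same-colored regions in layer 0 lie inside the single region of layer 1.
s_bar = [[1.0, 0.0], [0.5]]
parent = [[0, 0], []]
adjacent = [[(0, 1)], []]
colors = [[(50.0, 0.0, 0.0), (50.0, 0.0, 0.0)], [(50.0, 0.0, 0.0)]]
ones = [1.0, 1.0]
E0 = energy(s_bar, s_bar, parent, adjacent, colors, ones, ones, ones)
E_flat = energy([[0.5, 0.5], [0.5]], s_bar, parent, adjacent, colors, ones, ones, ones)
```

Keeping the initial values gives `E0 == 1.5`, while the uniform assignment gives `E_flat == 0.5`: the smoothness and hierarchy penalties outweigh the extra data cost, which is how the energy suppresses single-scale errors.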
Optimization
Standard loopy belief propagation is adopted for optimization. It starts from an initial set of messages and then iterates over the nodes, passing messages until convergence.
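If $R_i$ and $R_j$ are adjacent regions in the same layer, the message passed from $R_i$ to $R_j$ at iteration $\tau$ is
$$m^{\tau}_{i \to j}(s_j) = \min_{s_i} \Big( E_D(s_i) + E_C(s_i, s_j) + \sum_{p \in N(R_i) \setminus \{j\}} m^{\tau-1}_{p \to i}(s_i) \Big),$$
where the set $N(R_i)$ contains the region nodes connected to $R_i$, including both inter-layer and intra-layer ones.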
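If $R_i$ and $R_j$ are regions in different layers, the message uses the hierarchy term instead:
$$m^{\tau}_{i \to j}(s_j) = \min_{s_i} \Big( E_D(s_i) + E_H(s_i, s_j) + \sum_{p \in N(R_i) \setminus \{j\}} m^{\tau-1}_{p \to i}(s_i) \Big).$$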
After message passing converges at iteration $\tau$, the optimal value of each saliency variable is obtained by minimizing its belief function:
$$s_j^* = \arg\min_{s_j} \Big( E_D(s_j) + \sum_{p \in N(R_j)} m^{\tau}_{p \to j}(s_j) \Big).$$
Finally, the saliency variables in layer $L_1$ are collected to form the final saliency map.
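The message-passing scheme can be sketched as min-sum loopy belief propagation over a small set of discrete saliency levels. Discretizing $s$ into `labels`, subtracting each message's minimum to keep values bounded, and the function name `loopy_bp` are assumptions of this sketch. The pairwise cost attached to each edge plays the role of $E_C$ for intra-layer edges or $E_H$ for inter-layer edges, and each node's unary cost plays the role of $E_D$.

```python
def loopy_bp(labels, unary, edges, max_iter=50, tol=1e-9):
    """Min-sum loopy belief propagation over discretized saliency levels.

    labels: candidate saliency values, e.g. [0.0, 0.5, 1.0]
    unary:  unary[i][a] = E_D cost of node i taking labels[a]
    edges:  dict mapping an undirected pair (i, j) to cost(s_i, s_j)
    Returns, for each node, the label that minimizes its belief.
    """
    n, L = len(unary), len(labels)
    nbrs = {i: [] for i in range(n)}
    pair = {}
    for (i, j), f in edges.items():
        nbrs[i].append(j)
        nbrs[j].append(i)
        pair[(i, j)] = f
        pair[(j, i)] = lambda a, b, f=f: f(b, a)  # same cost, arguments swapped
    msgs = {e: [0.0] * L for e in pair}           # m_{i->j}, initialized to zero

    for _ in range(max_iter):
        new = {}
        for (i, j), f in pair.items():
            # E_D(s_i) plus messages into i from every neighbor except j
            h = [unary[i][a] + sum(msgs[(p, i)][a] for p in nbrs[i] if p != j)
                 for a in range(L)]
            m = [min(h[a] + f(labels[a], labels[b]) for a in range(L))
                 for b in range(L)]
            base = min(m)
            new[(i, j)] = [v - base for v in m]
        delta = max((abs(x - y) for e in msgs for x, y in zip(msgs[e], new[e])),
                    default=0.0)
        msgs = new
        if delta < tol:
            break

    result = []
    for j in range(n):
        belief = [unary[j][b] + sum(msgs[(p, j)][b] for p in nbrs[j])
                  for b in range(L)]
        result.append(labels[min(range(L), key=lambda b: belief[b])])
    return result

# Hypothetical two-layer example: regions 0 and 1 (layer L1) lie inside region 2.
labels = [0.0, 0.5, 1.0]
s_bar = [1.0, 0.0, 0.5]
unary = [[(v - sb) ** 2 for v in labels] for sb in s_bar]  # E_D with beta = 1

def hier(a, b):
    return 2.0 * (a - b) ** 2  # E_H with lambda = 2 (hypothetical weight)

s_star = loopy_bp(labels, unary, {(0, 2): hier, (1, 2): hier})
layer1 = s_star[:2]  # the layer-L1 values form the final map
```

On a tree-structured graph such as this example, min-sum propagation is exact, so `s_star` equals `[0.5, 0.5, 0.5]`, the result of a brute-force search over all 27 assignments. Once intra-layer edges create cycles the result is only approximate, which is why the loop checks a tolerance and is capped by `max_iter`.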
Analysis
The proposed method achieves high performance and broadens the range of applications in which saliency detection can handle diverse natural images. To a certain extent it copes with foreground and background clutter, but it still does not give good results when the foreground and background look similar. Moreover, when regions are merged with too large a scale threshold, a large amount of information is easily lost.