Semantic Soft Segmentation
Introduction
Selection and composition are at the core of the image editing process
- Disadvantages of traditional methods
- Tools such as the magnetic lasso and the magic wand exist to assist users, but they exploit only low-level cues and rely heavily on the users’ skill and interpretation of the image content to produce good results.
- They also produce only binary selections that need further refinement to account for soft boundaries.
- Several criteria for a useful pre-segmentation: an accurate pre-segmentation of the image can speed up the editing process by providing an intermediate image representation.
- Such a segmentation should provide distinct segments of the image, while also representing the soft transitions between them accurately.
- Each segment should be limited to the extent of a semantically meaningful region in the image.
- The segmentation should be done fully automatically, so that it neither adds a point of interaction nor requires expertise from the artist.
- Project core methods
- We approach the semantic soft segmentation problem from a spectral decomposition angle. We combine the texture and color information from the input image together with high-level semantic cues that we generate using a convolutional neural network trained for scene analysis.
- We design a graph structure that reveals the semantic objects as well as the soft transitions between them in the eigenvectors of the corresponding Laplacian matrix.
- We introduce a spatially varying model of layer sparsity that generates high-quality layers from the eigenvectors that can be utilized for image editing.
- Conclusion
- We demonstrate that our algorithm successfully decomposes images into a small number of layers.
- We later show that our algorithm can successfully process images that are challenging for other techniques, and we provide examples of editing operations.
Related Work
Soft segmentation
Soft segmentation is the decomposition of an image into two or more segments where each pixel may belong partially to more than one segment. The layer contents change depending on the specific goal of the corresponding method.
- Soft color segmentation methods extract soft layers of homogeneous colors using global optimization [Singaraju and Vidal 2011; Tai et al. 2007; Tan et al. 2016].
- Others extract such layers via per-pixel color unmixing [Aksoy et al. 2016, 2017b].
- To generate spatially connected soft segments, Singaraju and Vidal [2011] start from a set of user-defined regions and solve two-layer soft segmentation problems multiple times to generate multiple layers.
- Levin et al. [2008b] propose spectral matting, which estimates a set of spatially connected soft segments automatically via spectral decomposition.
- Singaraju and Vidal [2011] and Levin et al. [2008b] construct their algorithms around the matting Laplacian [Levin et al. 2008a].
Our work
We also make use of the matting Laplacian and spectral decomposition, following ideas from spectral matting. However, unlike previous work, we construct a graph that fuses higher-level information coming from a deep network with the local texture information in order to generate soft segments that correspond to semantically meaningful regions in the image.
Natural image matting
Natural image matting is the estimation of per-pixel opacities of a user-defined foreground region. The typical input to natural matting algorithms is a trimap, which defines the opaque foreground, the transparent background, and the unknown-opacity regions.
- The approaches most closely related to ours are categorized as affinity-based methods.
- They define inter-pixel affinities to construct a graph that reflects the opacity transitions in the image.
Our work
In contrast to natural image matting methods, we rely on automatically generated semantic features instead of a trimap to define our soft segments, and we generate multiple soft segments rather than a single foreground matte.
Targeted edit propagation
Several image editing methods rely on user-defined sparse edits on the image and propagate them to the whole image.
- ScribbleBoost [Li et al. 2008] proposes a pipeline that classifies the objects specified by the user scribbles to allow edits targeting specific object classes in the image.
- DeepProp [Endo et al. 2016] uses a deep network to propagate class-dependent color edits.
- Eynard et al. [2014] construct a graph and, in parallel to our method, analyze the eigendecomposition of the corresponding Laplacian matrix to create coherent recoloring results.
- An and Pellacini [2008] and Chen et al. [2012] also define inter-pixel affinities and make use of the properties of the Laplacian matrices to solve for plausible propagations of the user-defined edits.
Our work
While our results can also be used for targeted edits, rather than relying on edits defined a priori, we directly decompose the image into soft segments and let the artist use them as an intermediate image representation in various scenarios and with external image editing tools.
Semantic Segmentation
Semantic segmentation has improved significantly with the introduction of deep neural networks. State-of-the-art approaches include:
- Scene parsing by Zhao et al. [2017].
- Instance segmentation methods by He et al. [2017] and Fathi et al. [2017].
- Bertasius et al. [2016] enhance semantic segmentation with color boundary cues.
Our work
Our soft segmentation method is class-agnostic, i.e., we are interested in an accurate segmentation of the image that respects semantic boundaries, but we do not aim to classify or detect a selected set of classes.
Method
Additive image formation model:
$(R,G,B)_{\mathrm{input}} = \sum_i \alpha_i \, (R,G,B)_i$

$\sum_i \alpha_i = 1$
Each pixel of each layer is augmented with an opacity value α ∈ [0, 1] with α = 0 meaning fully transparent, α = 1 fully opaque. We express the input RGB pixels as the sum of the pixels in each layer i weighted by its corresponding α value. We also constrain the α values to sum up to 1 at each pixel, representing a fully opaque input image.
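As a minimal numerical sketch of this formation model (assuming NumPy arrays; the function and variable names below are illustrative, not from the paper), the layers can be recombined and the opacity constraint checked as follows:

```python
import numpy as np

def composite_layers(layer_colors, layer_alphas):
    """Recombine soft layers into a single RGB image.

    layer_colors: list of (H, W, 3) float arrays, one RGB image per layer.
    layer_alphas: list of (H, W) float arrays in [0, 1], one opacity map per layer.
    """
    alphas = np.stack(layer_alphas, axis=0)          # (N, H, W)
    colors = np.stack(layer_colors, axis=0)          # (N, H, W, 3)

    # A valid soft segmentation has opacities that sum to 1 at every pixel.
    assert np.allclose(alphas.sum(axis=0), 1.0, atol=1e-3)

    # The alpha-weighted sum of the layers reproduces the input image.
    return (alphas[..., None] * colors).sum(axis=0)  # (H, W, 3)
```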
Our approach uses the same formalism as spectral matting in formulating the soft segmentation task as an eigenvector estimation problem [Levin et al. 2008b]. The core component of this approach is the creation of a Laplacian matrix L that represents how likely each pair of pixels in the image is to belong to the same segment. We describe how to augment this approach with nonlocal cues and high-level semantic information.
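The specific affinity definitions come later; as a hedged sketch of the general recipe only (illustrative names, not the authors' exact graph), a sparse graph Laplacian can be assembled from pairwise affinities and its smallest eigenvectors extracted with SciPy:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def laplacian_from_affinities(rows, cols, weights, n_pixels):
    """Build L = D - W from a sparse list of pairwise affinities (i, j, w_ij)."""
    W = sp.coo_matrix((weights, (rows, cols)), shape=(n_pixels, n_pixels))
    W = (W + W.T) * 0.5                                # enforce symmetry
    D = sp.diags(np.asarray(W.sum(axis=1)).ravel())    # degree matrix
    return (D - W).tocsr()

def smallest_eigenvectors(L, k=100):
    """Eigenvectors of L with the smallest eigenvalues; pixels that are strongly
    connected in the graph vary smoothly within these vectors."""
    _, E = eigsh(L, k=k, which='SM')  # shift-invert near 0 is faster in practice
    return E                          # (n_pixels, k)
```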
Background
Spectral matting
$\underset{y_i}{\mathrm{argmin}} \; \sum\limits_{i,p} |\alpha_{ip}|^\gamma + |1 - \alpha_{ip}|^\gamma \quad \text{with} \quad \alpha_i = E y_i$

Subject to: $\sum\limits_i \alpha_{ip} = 1$
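As a hedged numerical sketch of this objective (hypothetical variable names, not the authors' optimization code), the sparsity energy and the sum-to-one constraint for a candidate set of weight vectors y_i can be evaluated as:

```python
import numpy as np

def sparsity_energy(E, Y, gamma=0.9):
    """Spectral-matting style layer sparsity cost.

    E:     (n_pixels, k) matrix of Laplacian eigenvectors.
    Y:     (k, n_layers) weights; column i defines the layer alpha_i = E @ Y[:, i].
    gamma: fractional exponent (< 1) that pushes alpha values toward 0 or 1.
    """
    alphas = E @ Y                                   # (n_pixels, n_layers)
    energy = (np.abs(alphas) ** gamma +
              np.abs(1.0 - alphas) ** gamma).sum()

    # Constraint: the layer opacities should sum to 1 at every pixel.
    max_violation = np.abs(alphas.sum(axis=1) - 1.0).max()
    return energy, max_violation
```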