GNN + Semantic Segmentation

1 Semantic Object Parsing with Graph LSTM

NUS, ECCV 2016 (Spotlight)

1.1 Task description

Semantic object parsing aims to segment an object within an image into multiple parts with finer-grained semantics, providing a fuller understanding of the image contents.

1.2 Model framework

Adaptive graph topology (every image gets its own graph topology)

(semantically consistent) node: arbitrary-shaped superpixel

edge: spatial neighborhood relations (spatially adjacent superpixels are connected by an edge)

LSTM starting node: the superpixel node with the highest predicted confidence across all foreground semantic labels, based on the initial features

Updating order: all nodes are ranked by their initial confidences on the foreground classes, in descending order

Graph construction

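Superpixel map: the image is over-segmented using SLIC (simple linear iterative clustering, a k-means-based method), giving about 1,000 superpixels per image on average.

The feature maps need to be upsampled to the original image size so that they align with the superpixel map.

Initial node feature f_i: the average of the features of all pixels within the same superpixel.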
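The graph construction steps above can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' implementation: the function name `build_superpixel_graph`, the nested-list inputs, and the use of 4-connectivity to decide spatial adjacency are all assumptions made for the example. It takes an existing superpixel label map (for instance one produced by SLIC) and a per-pixel feature map of the same size.

```python
def build_superpixel_graph(superpixel_map, feature_map):
    """Build node features and edges from a superpixel label map.

    superpixel_map: H x W nested list of integer superpixel ids.
    feature_map:    H x W nested list of per-pixel feature vectors
                    (lists of floats), already upsampled to image size.
    Returns (node_features, edges):
      node_features maps superpixel id -> mean feature vector f_i,
      edges is a set of undirected pairs (i, j) with i < j.
    """
    height, width = len(superpixel_map), len(superpixel_map[0])
    sums, counts, edges = {}, {}, set()
    for r in range(height):
        for c in range(width):
            sp = superpixel_map[r][c]
            feat = feature_map[r][c]
            acc = sums.setdefault(sp, [0.0] * len(feat))
            for k, value in enumerate(feat):
                acc[k] += value
            counts[sp] = counts.get(sp, 0) + 1
            # Assumed adjacency rule: two superpixels are neighbours if any
            # of their pixels touch horizontally or vertically.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < height and cc < width:
                    other = superpixel_map[rr][cc]
                    if other != sp:
                        edges.add((min(sp, other), max(sp, other)))
    node_features = {sp: [s / counts[sp] for s in acc]
                     for sp, acc in sums.items()}
    return node_features, edges


# Toy 2 x 3 image with three superpixels and 1-dimensional features.
toy_map = [[0, 0, 1],
           [0, 2, 1]]
toy_feat = [[[1.0], [3.0], [10.0]],
            [[2.0], [5.0], [20.0]]]
toy_nodes, toy_edges = build_superpixel_graph(toy_map, toy_feat)
```

Because the topology is derived from the label map, each image yields a different graph, which is the "adaptive" part of the design.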

Graph LSTM

initial confidence maps

Per-label superpixel confidence: the average of the confidences of the pixels the superpixel contains; the label with the highest confidence is assigned to the superpixel

Node updating sequence: Among all the foreground superpixels (i.e., assigned to any semantic part label), the node updating sequence can be determined by ranking all the superpixel nodes according to the confidences of their assigned labels
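The confidence averaging and ranking described above can be sketched as follows. This is an illustrative example under stated assumptions rather than the paper's code: the helper names, the dictionary-based data layout, and the convention that label index 0 is the background are all hypothetical.

```python
def superpixel_confidences(superpixel_map, pixel_conf):
    """Average the per-pixel label confidences inside each superpixel."""
    sums, counts = {}, {}
    for row_ids, row_conf in zip(superpixel_map, pixel_conf):
        for sp, conf in zip(row_ids, row_conf):
            acc = sums.setdefault(sp, [0.0] * len(conf))
            for k, value in enumerate(conf):
                acc[k] += value
            counts[sp] = counts.get(sp, 0) + 1
    return {sp: [s / counts[sp] for s in acc] for sp, acc in sums.items()}


def confidence_driven_order(superpixel_conf, background=0):
    """Return (start_node, order, assigned).

    superpixel_conf: dict superpixel id -> list of confidences per label.
    background:      label index treated as background (an assumption).
    assigned maps each superpixel to (label, confidence of that label).
    order lists foreground superpixels, most confident first.
    """
    assigned = {}
    for sp, conf in superpixel_conf.items():
        label = max(range(len(conf)), key=lambda k: conf[k])
        assigned[sp] = (label, conf[label])
    foreground = [sp for sp, (label, _) in assigned.items()
                  if label != background]
    order = sorted(foreground, key=lambda sp: assigned[sp][1], reverse=True)
    start = order[0] if order else None
    return start, order, assigned


toy_conf = {
    0: [0.90, 0.05, 0.05],  # background
    1: [0.10, 0.70, 0.20],
    2: [0.05, 0.05, 0.90],
    3: [0.30, 0.60, 0.10],
}
start_node, update_order, labels = confidence_driven_order(toy_conf)
```

In this sketch the starting node is simply the first entry of the ranked sequence, so the most confident foreground superpixel is updated first and the background superpixel never enters the sequence.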

Why Confidence-Driven Search (CDS) performs well: the CDS scheme can provide a relatively more reliable updating sequence for better semantic reasoning, since the earlier nodes in the updating sequence presumably have stronger semantic evidence (e.g., belonging to any important semantic parts with higher confidence) and their visual features may be more reliable for message passing.

Training

1. Train the convolutional layer with 1 × 1 filters to generate the initial confidence maps, which are used to determine the starting node and the update sequence for all nodes in Graph LSTM.

2. Fine-tune the whole network, starting from the pretrained model, to produce the final parsing results.

1.3 Experimental results

PASCAL-Person-Part dataset: Head, Torso, Upper/Lower Arms and Upper/Lower Legs

Horse-Cow Parsing dataset: head, leg, tail and body

ATR dataset & Fashionista dataset: 18 labels: face, sunglass, hat, scarf, hair, upper-clothes, left-arm, right-arm, belt, pants, left-leg, right-leg, skirt, left-shoe, right-shoe, bag, dress and null

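Each dataset covers a single object category, and the labels are the parts that make up that object, which fits medical image segmentation fairly well.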

2 Interpretable Structure-Evolving LSTM

CMU, CVPR 2017

An extension of paper 1.

2.1 Main idea

[Figure: hierarchical graph structures]

[Figure: stochastic structure-evolving step]

Metropolis-Hastings algorithm: some graph nodes are merged stochastically by sampling their merging probabilities, which produces a new graph structure.

The new structure is then examined and accepted or rejected according to a global reward, defined as an acceptance probability that combines: i) a state transition probability (i.e., a product of the merging probabilities); ii) a posterior probability representing the compatibility of the generated graph structure with task-specific observations.
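The accept/reject step can be illustrated with the generic Metropolis-Hastings acceptance ratio. The sketch below is an assumption-laden illustration, not the paper's exact formulation: the function names are hypothetical, the transition probability is taken as the product of the merging probabilities of the sampled edges (as stated above), and the backward transition probability and the two posterior values are passed in as plain numbers because the notes do not say how they are computed.

```python
import random


def propose_merges(merge_probs, rng):
    """Sample which edges to merge.

    merge_probs: dict edge -> merging probability in [0, 1].
    Returns (merged_edges, transition_prob), where transition_prob is the
    product of the merging probabilities of the sampled edges.
    """
    merged, transition_prob = [], 1.0
    for edge, p in merge_probs.items():
        if rng.random() < p:
            merged.append(edge)
            transition_prob *= p
    return merged, transition_prob


def acceptance_probability(q_forward, q_backward, posterior_new, posterior_old):
    """Generic Metropolis-Hastings acceptance probability.

    alpha = min(1, (q_backward * posterior_new) / (q_forward * posterior_old))
    All inputs are assumed to be strictly positive.
    """
    ratio = (q_backward * posterior_new) / (q_forward * posterior_old)
    return min(1.0, ratio)


def accept(alpha, rng):
    """Accept the proposed graph with probability alpha."""
    return rng.random() < alpha


rng = random.Random(0)
merged, q_fwd = propose_merges({("a", "b"): 1.0, ("b", "c"): 0.0}, rng)
alpha_up = acceptance_probability(0.5, 0.25, 0.4, 0.1)    # better posterior
alpha_down = acceptance_probability(0.5, 0.5, 0.1, 0.4)   # worse posterior
```

A proposal that improves the posterior enough is always accepted, while a worse one is still accepted occasionally, which lets the sampler escape poor graph structures.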


2.3 Experimental results

Comparison with paper 1.

3 3D Graph Neural Networks for RGBD Semantic Segmentation

CUHK, ICCV 2017

RGBD = RGB + Depth Map 

2D appearance + 3D geometric information

Graph construction*

Open question: is [x, y, z] the known quantity, or is [u, v]?

node: pixel

directed edges: K nearest neighbors (KNN) in the 3D space
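One way to read the graph construction is through the standard pinhole camera model: for an RGBD image, the pixel position [u, v] and its depth z are what the sensor provides, and [x, y, z] is recovered from them with the camera intrinsics. The sketch below assumes that reading. The function names, the intrinsics `fx, fy, cx, cy`, the brute-force neighbour search, and the convention that an edge (i, j) means "j is one of i's K nearest neighbours" are all assumptions for illustration.

```python
import math


def back_project(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with depth z to 3D."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


def knn_directed_edges(points, k):
    """Directed KNN edges in 3D, by brute force (O(n^2) distances).

    points: list of (x, y, z) tuples, one per node (pixel).
    Returns a list of (i, j) pairs where j is among the k nearest
    neighbours of i. The relation is not symmetric in general.
    """
    edges = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        edges.extend((i, j) for _, j in dists[:k])
    return edges


# Principal point maps to the optical axis.
centre = back_project(320, 240, 2.0, 500.0, 500.0, 320.0, 240.0)

# Three collinear points: node 2's nearest neighbour is node 1,
# but node 1's nearest neighbour is node 0, so edges are directed.
toy_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
toy_edges_3d = knn_directed_edges(toy_points, k=1)
```

Searching for neighbours in 3D rather than in the image plane means that two pixels adjacent in 2D but far apart in depth are not connected, which is the point of combining 2D appearance with 3D geometry. A real image would need a spatial index instead of the brute-force search used here.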

Propagation Model

Prediction Model

back-propagation through time (BPTT) algorithm

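4 Summary

This is a finer-grained form of semantic segmentation.

Judging from the datasets, each one covers a single object category, and the labels are the parts that make up that object.

In the Horse-Cow dataset, the head and the body have no clear color boundary, yet they are still separated, somewhat like the situation in multi-organ segmentation.

This fits medical image segmentation fairly well.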


