TITLE: Object Detection by Labelling Superpixels
AUTHOR: Yan, Junjie and Yu, Yinan and Zhu, Xiangyu and Lei, Zhen and Li, Stan Z.
FROM: CVPR2015
SOME DETAILS
The energy function is constructed as
E(L) = Σ_i D(li,pi) + Σ_{(pi,pj)∈N} V(li,lj,pi,pj) + C(L)
where D(li,pi) is the data cost, which captures the appearance of pi and measures the cost of assigning it label li; V(li,lj,pi,pj) is the pairwise smooth cost over the local neighborhood N; and C(L) is the label cost, which encourages compact detections and penalizes the number of labels used.
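As a rough illustration of how the three terms fit together, the sketch below evaluates the energy of one labelling, assuming the terms are simply summed. The function name `total_energy`, the table layout of `data_cost`, and the toy pairwise and label-cost callables in the usage note are hypothetical and are not taken from the paper.

```python
def total_energy(labels, data_cost, neighbors, pairwise, label_cost):
    """Energy of one labelling L (illustrative sketch only).

    labels     : labels[i] is the label l_i assigned to super-pixel p_i
    data_cost  : data_cost[i][l] is D(l, p_i), assumed precomputed
    neighbors  : iterable of index pairs (i, j) in the neighborhood N
    pairwise   : callable (i, j, l_i, l_j) -> smooth cost V
    label_cost : callable (labels) -> C(L)
    """
    # Sum of per-super-pixel data costs.
    unary = sum(data_cost[i][l] for i, l in enumerate(labels))
    # Sum of smooth costs over neighboring pairs.
    smooth = sum(pairwise(i, j, labels[i], labels[j]) for i, j in neighbors)
    return unary + smooth + label_cost(labels)
```

For example, with two super-pixels one might call `total_energy([0, 1], [[1.0, 2.0], [3.0, 4.0]], [(0, 1)], pairwise, label_cost)`, where `pairwise` and `label_cost` are whatever placeholder functions are being tested.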
Data Cost
Super-pixels usually do not carry enough semantic information on their own, so the regions that contain them are classified and the resulting costs are propagated to the super-pixels. In this work, R-CNN is used to generate and classify semantic regions. The region set of T elements is denoted as R={r1,..,rT} and the classifier score of region rt is st; the scores are mapped into (0,1) by
where α is set to 1.5 empirically. For each super-pixel, the data cost is the weighted sum of the T smallest costs,
where R(pi)t is the region containing pi that has the t-th smallest cost.
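The two steps above (squash scores into (0,1), then combine the smallest region costs for a super-pixel) can be sketched as follows. The exact mapping and weights are not reproduced in these notes, so the logistic form in `score_to_cost` and the caller-supplied `weights` are assumptions made for illustration; only α = 1.5 comes from the text.

```python
import math

ALPHA = 1.5  # value stated in the notes


def score_to_cost(score, alpha=ALPHA):
    # ASSUMED logistic mapping into (0, 1): a high classifier score
    # gives a cost near 0, a low score gives a cost near 1.
    return 1.0 / (1.0 + math.exp(alpha * score))


def data_cost(region_scores, weights):
    """Weighted sum of the smallest region costs for one super-pixel.

    region_scores : classifier scores of the regions containing the
                    super-pixel (for the label under consideration)
    weights       : hypothetical weights; weights[t] multiplies the
                    (t+1)-th smallest cost
    """
    costs = sorted(score_to_cost(s) for s in region_scores)
    # zip stops at the shorter sequence, so only the len(weights)
    # smallest costs contribute.
    return sum(w * c for w, c in zip(weights, costs))
```

Sorting the costs first means a super-pixel is scored mainly by the most confident regions that cover it, which matches the "smallest costs" wording above.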
Smooth Cost
The smooth cost is introduced because 1) adjacent super-pixels often have the same label and 2) super-pixels with the same label should have similar appearance. This property is measured by
where Vl is a boolean variable that is set to 1 when li=lj and (pi,pj)∈N. Va is defined as
where cqi and tqi are the values in the q-th bin of the color and texture histograms of super-pixel pi. In this work, a color histogram and a SIFT histogram are computed to describe color and texture information.
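To make the histogram comparison concrete, here is a small stand-in for an appearance term built from bin-wise differences of the color and texture histograms. The exact form of Va is not reproduced in these notes; the chi-square distance used here, and the names `chi_square` and `appearance_distance`, are assumptions chosen for illustration rather than the paper's definition.

```python
def chi_square(h1, h2, eps=1e-12):
    # Chi-square distance between two histograms with the same bins.
    # eps guards against division by zero when both bins are empty.
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def appearance_distance(color_i, texture_i, color_j, texture_j):
    # ASSUMED combination: color and texture distances are added
    # with equal weight.
    return chi_square(color_i, color_j) + chi_square(texture_i, texture_j)
```

With normalized histograms, identical super-pixels give a distance of 0 and super-pixels with non-overlapping histograms give a distance close to 2 under this stand-in.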
Label Cost
The label cost is used to encourage fewer labels; its definition is
where δ(⋅) is defined as
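The formulas for the label cost and δ(⋅) are not reproduced in these notes. A common way to penalize the number of labels is an indicator that fires when a label is used by at least one super-pixel; the sketch below follows that convention. The indicator form, the uniform `per_label_cost`, and both function names are assumptions, not the paper's definition.

```python
def label_used(label, labels):
    # ASSUMED indicator: 1 if any super-pixel carries `label`, else 0.
    return 1 if any(l == label for l in labels) else 0


def label_cost(labels, candidate_labels, per_label_cost=1.0):
    """Penalty that grows with the number of distinct labels in use.

    labels           : labels[i] is the label of super-pixel p_i
    candidate_labels : all labels that could be assigned
    per_label_cost   : hypothetical uniform cost per active label
    """
    return sum(per_label_cost * label_used(l, labels) for l in candidate_labels)
```

Under this sketch, a labelling that explains the image with two labels is cheaper than one that uses three, regardless of how many super-pixels each label covers.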