Weakly Supervised Deep Detection Networks: Reading Notes


Overall architecture

 

1. Existing network (such as AlexNet pre-trained on ImageNet)

2. SPP layer --> region-level descriptors

3. (1) class score --> recognition

(2) probability distribution over regions (which region contains the most salient image structure) --> detection

4. Aggregate the recognition and detection scores to predict the image-level class (image-level supervision)

Comparison with other methods

1. MIL: uses the appearance model itself to perform region selection

WSDDN: the detection branch is independent of the recognition branch

2. Bilinear architecture: the two streams are symmetric

WSDDN: the detection branch is explicitly designed for region selection

Method

1. Pre-trained network

2. Weakly supervised deep detection network

 

(1) Region-level descriptor:


Region proposals: Selective Search Windows (SSW), Edge Boxes (EB)
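A minimal sketch of how a region-level descriptor of fixed length can be pooled from a shared feature map, whatever the proposal size. This is a single-level, max-pooling simplification of SPP written for illustration; the function name `roi_max_pool`, the 2x2 grid, and the toy feature map are assumptions, not the paper's configuration.

```python
def roi_max_pool(feature_map, box, grid=2):
    """Max-pool one region of a 2D feature map into a grid x grid descriptor.

    feature_map: list of rows (2D list of floats).
    box: (x1, y1, x2, y2) in feature-map cells, x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    out = []
    for gy in range(grid):
        ys = y1 + (gy * h) // grid
        ye = max(ys + 1, y1 + ((gy + 1) * h) // grid)  # at least one row per bin
        for gx in range(grid):
            xs = x1 + (gx * w) // grid
            xe = max(xs + 1, x1 + ((gx + 1) * w) // grid)  # at least one column
            out.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
    return out

# Toy 4x4 feature map holding the values 0..15, row by row.
fmap = [[4 * r + c for c in range(4)] for r in range(4)]
whole = roi_max_pool(fmap, (0, 0, 4, 4))   # region covering the full map
small = roi_max_pool(fmap, (1, 1, 4, 3))   # smaller, non-square region
```

Both regions give a descriptor of length 4 even though their sizes differ, which is what lets the later fc layers be shared across all proposals.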

(2) Classification data stream: fc + softmax (normalized over classes for each region)

 

(3) Detection data stream: fc + softmax (defined differently: normalized over regions for each class, rather than over classes for each region)
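The difference between the two softmax operators in (2) and (3) is only the axis of normalization. A small sketch, assuming the fc outputs are stored as a classes x regions matrix; the function names and the toy scores are made up for illustration.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def classification_stream(x_c):
    """Softmax over classes, separately for each region (column-wise).

    x_c[k][r] is the raw fc score of class k for region r.
    """
    num_classes, num_regions = len(x_c), len(x_c[0])
    out = [[0.0] * num_regions for _ in range(num_classes)]
    for r in range(num_regions):
        col = softmax([x_c[k][r] for k in range(num_classes)])
        for k in range(num_classes):
            out[k][r] = col[k]
    return out

def detection_stream(x_d):
    """Softmax over regions, separately for each class (row-wise)."""
    return [softmax(row) for row in x_d]

# Toy scores: 2 classes x 3 regions (made-up numbers).
x_c = [[2.0, 0.5, 0.1], [0.0, 0.5, 1.1]]
x_d = [[1.0, 3.0, 0.0], [0.2, 0.2, 0.2]]
s_class = classification_stream(x_c)  # each column sums to 1
s_det = detection_stream(x_d)         # each row sums to 1
```

The classification output answers "which class is this region?", while the detection output answers "which regions matter most for this class?".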

 

(4) Combined region scores and detection

Final score of each region: x^R = σ_class(x^c) ⊙ σ_det(x^d) (element-wise product of the two stream outputs)

Then rank regions for each class independently.

Then apply non-maximum suppression (NMS) with an IoU threshold of 0.4.
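The combine, rank, and NMS steps in (4) can be sketched as follows. The sketch assumes the final region score is the element-wise product of the two stream outputs (my reading of the paper) and uses a plain greedy NMS; the helper names, boxes, and scores are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def combine(s_class, s_det):
    """Element-wise product of classification and detection scores."""
    return [[c * d for c, d in zip(row_c, row_d)]
            for row_c, row_d in zip(s_class, s_det)]

def nms(boxes, scores, iou_threshold=0.4):
    """Greedy NMS for one class: visit boxes by descending score and drop
    any box overlapping an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# One class, three regions (toy values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
s_class = [[0.9, 0.8, 0.3]]
s_det = [[0.5, 0.4, 0.1]]
final = combine(s_class, s_det)
kept = nms(boxes, final[0])  # indices of the surviving regions for class 0
```

Here the second box overlaps the top-scoring box with IoU of about 0.68, so it is suppressed, while the distant third box survives.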

(5) Image-level classification scores

Image-level class score: y_c = Σ_r x^R_cr (sum of the final region scores over all regions r)

(y_c ∈ (0, 1))

Softmax is not applied at this stage because one image can have multiple labels.
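A short sketch of why the image-level score stays in (0, 1) without a softmax. It assumes the image-level score of a class is the sum of its combined region scores, with the detection scores of each class summing to 1 over regions; the numbers are toy values.

```python
def image_level_scores(region_scores):
    """Sum the combined region scores over regions, one total per class."""
    return [sum(row) for row in region_scores]

# Two classes, two regions (toy values).
s_class = [[0.9, 0.2], [0.1, 0.8]]   # columns sum to 1 (softmax over classes)
s_det = [[0.7, 0.3], [0.5, 0.5]]     # rows sum to 1 (softmax over regions)
combined = [[c * d for c, d in zip(rc, rd)] for rc, rd in zip(s_class, s_det)]
y = image_level_scores(combined)     # approximately [0.69, 0.45]
```

Each y_c is a weighted average of classification scores, so it lies in (0, 1), but the scores of different classes do not need to sum to 1. That is what allows several classes to be present in the same image.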

3. Training WSDDN

A collection of images x_i, i = 1, 2, …, n

Image-level labels y_i ∈ {-1, 1}^C
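With labels in {-1, 1} and image-level scores in (0, 1), a natural training signal is a per-class binary log-loss. The sketch below assumes the per-class term log(y_k (s_k - 1/2) + 1/2), which reduces to log(s_k) for a positive label and log(1 - s_k) for a negative one. It is written as a quantity to minimize and leaves out the weight-decay term; both the sign convention and the function name are assumptions.

```python
import math

def image_log_loss(scores, labels):
    """Negative log-likelihood of one image.

    scores: image-level class scores, each in (0, 1).
    labels: image-level labels, each +1 (present) or -1 (absent).
    """
    return -sum(math.log(y * (s - 0.5) + 0.5) for s, y in zip(scores, labels))

scores = [0.9, 0.2]
good = image_log_loss(scores, [1, -1])   # labels agree with the scores
bad = image_log_loss(scores, [-1, 1])    # labels contradict the scores
```

The loss is small when confident scores match the labels and grows quickly when they contradict them.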

 

4. Spatial regulariser

During training, penalize feature discrepancies between the highest-scoring region and the regions that overlap it by at least 60% IoU.
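A rough sketch of the spatial regulariser for one class in one positive image. It assumes the penalty is half the squared feature distance between the top-scoring region and each strongly overlapping region, weighted by the squared score of the top region (as I recall from the paper; treat the weighting as an assumption). Normalization over images and classes is omitted, and all names and values are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def spatial_penalty(boxes, scores, features, iou_threshold=0.6):
    """Penalize feature differences between the top region and its neighbours."""
    top = max(range(len(boxes)), key=lambda i: scores[i])
    penalty = 0.0
    for r in range(len(boxes)):
        if r != top and iou(boxes[top], boxes[r]) >= iou_threshold:
            sq_dist = sum((a - b) ** 2
                          for a, b in zip(features[top], features[r]))
            penalty += 0.5 * scores[top] ** 2 * sq_dist
    return penalty

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.5, 0.3, 0.2]
features = [[1.0, 0.0], [0.0, 0.0], [5.0, 5.0]]  # toy region features
penalty = spatial_penalty(boxes, scores, features)
```

Only the second box overlaps the top box by at least 0.6 IoU, so only its feature difference contributes; the distant third box is ignored regardless of how different its features are.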

 

Experiments

CorLoc: the percentage of images containing at least one instance of the target object class for which the most confident detected bounding box overlaps one of those instances by at least 0.5 IoU.
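The CorLoc definition above can be sketched for a single class as follows, assuming every listed image contains at least one ground-truth instance of that class. The function names and boxes are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def corloc(top_boxes, gt_boxes_per_image, iou_threshold=0.5):
    """Fraction of images whose most confident box hits any ground-truth box.

    top_boxes: the single most confident detection per image.
    gt_boxes_per_image: list of ground-truth boxes for each image.
    """
    hits = sum(
        1 for top, gts in zip(top_boxes, gt_boxes_per_image)
        if any(iou(top, gt) >= iou_threshold for gt in gts)
    )
    return hits / len(top_boxes)

top_boxes = [(0, 0, 10, 10), (0, 0, 10, 10)]
gt_boxes = [[(1, 1, 11, 11)], [(5, 5, 15, 15)]]
score = corloc(top_boxes, gt_boxes)  # first image hits, second misses
```

Only the top-scoring box per image counts, so CorLoc measures localization on images known to contain the class rather than full detection performance.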

 

Failure modes: (1) grouping multiple object instances into a single bounding box

(2) focusing on parts rather than the whole object

 

Result:

 

 





