[Deep Learning Paper Notes][Scene Classification] Learning Deep Features for Scene Recognition using Places Database

Zhou, Bolei, et al. “Learning deep features for scene recognition using places database.” Advances in neural information processing systems. 2014.


1 Places Dataset

7 million images, 476 place categories. Two benchmark subsets:
• Places205: 205 place categories, each with at least 5,000 images.
• Places88: the 88 common categories that also appear in ImageNet.


2 Density and Diversity

[Density] A good dataset should have a high degree of data concentration. Given two databases A and B, let a1 be a random image from set A and b1 a random image from set B. Take their respective nearest neighbors (computed with GIST descriptors) within each set: a2 from A and b2 from B. If A is denser than B, then a1 and a2 are more likely to be close to each other than b1 and b2 are.


[Diversity] A good dataset should also have high variability of appearances and viewpoints. If set A is more diverse than set B, then two random images from B are more likely to be visually similar than two random samples from A.


The Places dataset has density similar to ImageNet's, but Places is more diverse.
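The two relative measures above can be sketched as follows. This is a toy illustration, not the paper's code: random Gaussian vectors stand in for GIST descriptors, and set B is deliberately generated as a tighter cluster so it comes out denser but less diverse than A.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_density(feats_a, feats_b, n_trials=500):
    """Pick a random image from each set and find its nearest neighbor
    within the same set. Return the fraction of trials in which A's pair
    is closer than B's pair; values above 0.5 suggest A is denser."""
    wins = 0
    for _ in range(n_trials):
        ia = rng.integers(len(feats_a))
        ib = rng.integers(len(feats_b))
        da = np.linalg.norm(feats_a - feats_a[ia], axis=1)
        db = np.linalg.norm(feats_b - feats_b[ib], axis=1)
        da[ia] = np.inf  # exclude the query itself
        db[ib] = np.inf
        if da.min() < db.min():
            wins += 1
    return wins / n_trials

def relative_diversity(feats_a, feats_b, n_pairs=500):
    """Mean distance between two random images from each set; a larger
    mean distance suggests a more diverse set."""
    pa = feats_a[rng.integers(len(feats_a), size=(n_pairs, 2))]
    pb = feats_b[rng.integers(len(feats_b), size=(n_pairs, 2))]
    da = np.linalg.norm(pa[:, 0] - pa[:, 1], axis=1).mean()
    db = np.linalg.norm(pb[:, 0] - pb[:, 1], axis=1).mean()
    return da, db

# Toy stand-ins for GIST descriptors: B is tightly clustered,
# i.e. denser but less diverse than A.
A = rng.normal(scale=2.0, size=(400, 32))
B = rng.normal(scale=0.5, size=(400, 32))
```

With these stand-ins, `relative_density(A, B)` comes out well below 0.5 (B is denser) while `relative_diversity(A, B)` reports a larger mean distance for A (A is more diverse), mirroring the Places-vs-ImageNet comparison in spirit.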


3 Motivation
A model pre-trained on ImageNet and then transferred to scene recognition does not work well.
• The higher-level features learned by object-centric and scene-centric CNNs are different.
• Iconic images of objects do not contain the richness and diversity of visual information that pictures of scenes and environments provide for learning to recognize them.


4 Visualization of the Deep Features

Train AlexNet on ImageNet and on Places separately. The activations of each layer are shown in the figure below.

[Figure 1: layer-by-layer visualizations of ImageNet-CNN vs. Places-CNN]

[CONV1 Visualization] Visualize the conv1 filters directly.
• Both networks capture oriented edges and opponent colors.
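Direct conv1 visualization just treats each first-layer filter as a small RGB image and tiles them into a grid. A minimal sketch, using random weights as a stand-in for AlexNet's actual 96 conv1 filters of shape 3x11x11:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for AlexNet conv1 weights: 96 filters, 3 channels, 11x11.
filters = rng.normal(size=(96, 3, 11, 11))

def filter_grid(w, cols=12):
    """Normalize each filter to [0, 1] independently and tile all filters
    into a single HxWx3 image grid, the standard way conv1 is visualized."""
    n, c, h, wd = w.shape
    rows = int(np.ceil(n / cols))
    grid = np.zeros((rows * h, cols * wd, c))
    for i in range(n):
        f = w[i].transpose(1, 2, 0)  # CHW -> HWC for display
        f = (f - f.min()) / (f.max() - f.min() + 1e-8)
        r, col = divmod(i, cols)
        grid[r * h:(r + 1) * h, col * wd:(col + 1) * wd] = f
    return grid

grid = filter_grid(filters)  # shape (8*11, 12*11, 3), values in [0, 1]
```

With trained weights loaded in place of the random tensor, the grid shows the oriented-edge and opponent-color patterns the notes mention for both networks.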


[Higher-Layer Units Visualization] Forward-pass the test images through the network. For each unit, sort the images by its activation response, then average the 100 images with the largest responses as a visualization of that unit.
• Units in ImageNet-CNN look like object blobs.
• Units in Places-CNN look like landscapes with more spatial structure.
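The sort-and-average procedure above is simple enough to sketch directly. This toy version uses random arrays in place of real test images and recorded unit activations; the shapes and unit count are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 1000 "test images" (8x8 grayscale) and, per image,
# the activations of 64 hypothetical units recorded from a forward pass.
images = rng.random((1000, 8, 8))
activations = rng.random((1000, 64))

def unit_average_image(unit, k=100):
    """Average the k images that most strongly activate a given unit,
    giving a crude picture of what that unit responds to."""
    top = np.argsort(activations[:, unit])[-k:]  # indices of top-k responses
    return images[top].mean(axis=0)

avg = unit_average_image(unit=0)  # one 8x8 average per unit
```

Applied to real images and real activations, these per-unit averages are what look like object blobs for ImageNet-CNN and landscape-like layouts for Places-CNN.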

5 Experiments
The Places-CNN feature shows impressive performance on scene-classification benchmarks, outperforming the previous state-of-the-art methods. On the other hand, the ImageNet-CNN feature shows better performance on object-centric databases.


Additionally, the authors train a Hybrid-CNN on the combination of the Places and ImageNet training sets. Combining the two datasets yields an additional performance increase on some benchmarks, but not all of them.
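Mechanically, combining the two training sets amounts to concatenating the images and keeping the two label spaces disjoint. A minimal sketch with toy arrays (the feature dimension, sample counts, and the 1000-class offset here are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two training sets (features + integer labels).
imagenet_x = rng.random((6, 10))
imagenet_y = rng.integers(0, 1000, size=6)   # ImageNet labels 0..999
places_x = rng.random((4, 10))
places_y = rng.integers(0, 205, size=4)      # Places205 labels 0..204

# Hybrid training set: concatenate the samples and shift Places labels
# past the ImageNet label range so the two label spaces do not collide.
hybrid_x = np.concatenate([imagenet_x, places_x])
hybrid_y = np.concatenate([imagenet_y, places_y + 1000])
```

A single softmax classifier over the combined label space can then be trained on `hybrid_x`/`hybrid_y`, which is the spirit of the Hybrid-CNN experiment.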
