1 Places Dataset
7M images, 476 place categories.
2 Density and Diversity
[Density] A good dataset should have a high degree of data concentration. Given two databases A and B, let a1 be a random image from set A and b1 from set B. Take their respective nearest neighbors (computed with the Gist descriptor) within each set: a2 from A and b2 from B. If A is denser than B, then it is more likely that a1 and a2 are closer to each other than b1 and b2 are.
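This density comparison can be sketched with toy data. The arrays below are hypothetical stand-ins for Gist descriptors (one row per image); the function names and set sizes are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Gist descriptors: B is spread twice as wide as A,
# so A is the denser set.
A = rng.normal(size=(1000, 512))
B = rng.normal(size=(1000, 512)) * 2.0

def mean_nn_distance(X, n_probes=200, rng=rng):
    """Average distance from a random image to its nearest
    neighbor within the same set (excluding itself)."""
    idx = rng.choice(len(X), size=n_probes, replace=False)
    dists = []
    for i in idx:
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf  # exclude the probe image itself
        dists.append(d.min())
    return float(np.mean(dists))

# The denser set has smaller nearest-neighbor distances on average.
print(mean_nn_distance(A) < mean_nn_distance(B))  # True
```

With real data, X would hold precomputed Gist descriptors of the two datasets instead of random vectors.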
[Diversity] A good dataset should also have high variability of appearances and viewpoints. If set A is more diverse than set B, then two random images from set B are more likely to be visually similar than two random samples from A.
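The diversity comparison reduces to distances between random pairs within each set. A minimal sketch, again with hypothetical descriptor arrays standing in for real Gist features:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Gist descriptors: A is more spread out (more diverse).
A = rng.normal(size=(1000, 512)) * 2.0
B = rng.normal(size=(1000, 512))

def mean_random_pair_distance(X, n_pairs=500, rng=rng):
    """Average distance between two randomly drawn images
    of the same set."""
    i = rng.integers(len(X), size=n_pairs)
    j = rng.integers(len(X), size=n_pairs)
    return float(np.mean(np.linalg.norm(X[i] - X[j], axis=1)))

# Random pairs from the less diverse set B are closer on average.
print(mean_random_pair_distance(B) < mean_random_pair_distance(A))  # True
```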
The Places dataset has a density similar to ImageNet's, but Places is more diverse.
4 Visualization of the Deep Features
Train AlexNet on ImageNet and on Places separately. The activations of each layer can be seen in Fig.
[CONV1 Visualization] Directly visualize the conv1 filters.
• Both networks capture oriented edges and opponent colors.
[Higher Layer Units Visualization] Forward-pass the test images through the network. Then, for each unit, sort all the images by their activation responses. Finally, average the top 100 images with the largest responses for each unit as a kind of visualization.
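The averaging step above can be sketched as follows. The activations and images here are random placeholders; in practice the activations would come from a forward pass of a real CNN over the test set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one scalar response of a single unit over
# 10,000 test images, plus the images (grayscale 64x64 for simplicity).
n_images = 10_000
activations = rng.random(n_images)
images = rng.random((n_images, 64, 64))

# Sort images by this unit's response, descending, and average the top 100.
top = np.argsort(activations)[::-1][:100]
unit_visualization = images[top].mean(axis=0)

print(unit_visualization.shape)  # (64, 64)
```

For a full layer, the same averaging is repeated per unit, yielding one average image per unit.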
• Units in ImageNet-CNN look like object-blobs.
• Units in Places-CNN look like landscapes with more spatial structures.
5 Experiments
The Places-CNN feature shows impressive performance on scene classification benchmarks, outperforming the current state-of-the-art methods. On the other hand, the ImageNet-CNN feature shows better performance on object-related databases.
Additionally, a Hybrid-CNN is trained by combining the training sets of Places-CNN and ImageNet-CNN. Combining the two datasets yields an additional increase in performance on a few benchmarks, but not all of them.