SCAN: Learning to Classify Images Without Labels (Reading Notes)


  • Overview
  • Method
  • Experimental Setup

(I have not included the paper's figures and equations here.)

Overview

Approach: A two-step approach where feature learning and clustering
are decoupled.

Step 1:
Solve a pretext task + mine K nearest neighbors.
The nearest neighbors of each image are mined based on feature similarity. The paper finds that these nearest neighbors belong to the same semantic class with high probability, which suggests they can be used for semantic clustering.
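A minimal sketch of this neighbor-mining step (not the authors' code; the helper name `mine_nearest_neighbors`, the cosine-similarity metric, and the default K are my assumptions):

```python
import torch
import torch.nn.functional as F

# Hypothetical helper: mine the K nearest neighbors of every image in the
# L2-normalized feature space produced by the pretext model.
def mine_nearest_neighbors(features: torch.Tensor, k: int = 20) -> torch.Tensor:
    """features: (N, D) embeddings of the whole dataset from the pretext task."""
    features = F.normalize(features, dim=1)
    similarity = features @ features.t()       # cosine similarity, (N, N)
    # Take k + 1 because every sample is its own nearest neighbor, then drop it.
    _, indices = similarity.topk(k + 1, dim=1)
    return indices[:, 1:]                      # (N, k) neighbor indices
```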

Step 2:
Train a clustering model by imposing consistent predictions among neighbors.
A loss function maximizes the dot product between the prediction for each image and the predictions for its mined neighbors, pushing the network to produce consistent and discriminative one-hot predictions, so that an image and its mined neighbors are classified into the same cluster.

Method

  • First, we show how mining nearest neighbors from a pretext task can be used as a prior for semantic clustering.
  • Second, we integrate the obtained prior into a novel loss function to classify each image and its nearest neighbors together.
  • Finally, a self-labeling step is used to mitigate the noise introduced by neighbor selection.

Step 1. Define the pretext task: minimize the distance between an image and its augmented versions to obtain semantically meaningful features. Minimizing this distance forces the learned representation to be invariant to the image transformations used as augmentations, i.e. feature representations ought to be invariant to image transformations.
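As a concrete example of such an invariance objective, a SimCLR-style contrastive (NT-Xent) loss can serve as the pretext task. The sketch below is my own minimal version, assuming `z1` and `z2` are the projected embeddings of two random augmentations of the same batch of images:

```python
import torch
import torch.nn.functional as F

# Minimal NT-Xent (SimCLR-style) loss: pulls the two augmented views of the
# same image together and pushes all other samples in the batch apart.
def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D)
    sim = z @ z.t() / temperature                         # (2B, 2B) similarity logits
    sim.fill_diagonal_(float('-inf'))                     # a sample is never its own negative
    n = z1.size(0)
    # The positive for row i is row i + n (and vice versa).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```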

Step 2. A semantic clustering loss
Mining nearest neighbors

  • Applying standard K-means to the learned features can lead to cluster degeneracy. Degeneracy here means roughly the following: a discriminative model can assign all its probability mass to the same cluster when learning the decision boundary, so one cluster ends up dominating the others.
  • Loss function: learn a clustering function that classifies an image xi together with its set of mined neighbors. The function terminates in a softmax to perform a soft assignment over the clusters. See the sketch after this list.
  • Setting the number of clusters: the exact number of clusters C is generally unknown. However, similar to prior work, C is chosen equal to the number of ground-truth classes for the purpose of evaluation. It is also possible to overcluster to a larger number of clusters and enforce a uniform class distribution.
  • Implementation details:
· The loss function requires large batches, i.e. enough samples must be drawn to approximate the dataset statistics.
· During training, the sample Xi and its neighbors are randomly augmented.
· For K = 0, only consistency between samples and their augmentations is imposed.
· For K ≥ 1, more neighbors are taken into account; since not all neighbors belong to the same class, this also introduces noise.
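The sketch below is my reading of the semantic clustering loss described above: a consistency term that maximizes the dot product between a sample's soft assignment and that of one of its mined neighbors, plus an entropy term that pushes the average assignment towards a uniform distribution to avoid cluster degeneracy. Names such as `scan_loss`, `anchor_logits`, `neighbor_logits`, and the `entropy_weight` default are assumptions, not the authors' code:

```python
import torch
import torch.nn.functional as F

# Sketch of the semantic clustering loss: consistency between neighbors plus
# an entropy regularizer over the mean cluster assignment.
def scan_loss(anchor_logits, neighbor_logits, entropy_weight: float = 5.0):
    p_anchor = F.softmax(anchor_logits, dim=1)     # soft assignments, (B, C)
    p_neighbor = F.softmax(neighbor_logits, dim=1)

    # Consistency term: maximize the dot product between a sample and its neighbor.
    dot = (p_anchor * p_neighbor).sum(dim=1)
    consistency = -torch.log(dot + 1e-8).mean()

    # Entropy term: push the mean assignment towards a uniform distribution
    # so that all samples do not collapse into a single cluster.
    mean_p = p_anchor.mean(dim=0)
    entropy = -(mean_p * torch.log(mean_p + 1e-8)).sum()

    return consistency - entropy_weight * entropy
```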

Step 3. Fine-tuning through self-labeling

The semantic clustering loss imposed consistency between a sample and
its neighbors. More specifically, each sample was combined with K ≥ 1
neighbors, some of which inevitably do not belong to the same semantic
cluster. These false positive examples lead to predictions for which
the network is less certain.

At the same time, we experimentally observed that samples with highly confident predictions (pmax ≈ 1) tend to be classified to the proper cluster. In fact, the highly confident predictions that the network forms during clustering can be regarded as “prototypes” for each class.

In particular, during training confident samples are selected by thresholding the probability at the output, i.e. pmax > threshold. For every confident sample, a pseudo label is obtained by assigning the sample to its predicted cluster. A cross-entropy loss is used to update the weights for the obtained pseudo labels. To avoid overfitting, we calculate the cross-entropy loss on strongly augmented versions of the confident samples.
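A rough sketch of one such self-labeling update under the rules above; `weak_aug`, `strong_aug`, the 0.99 threshold, and the helper name are my assumptions rather than the paper's implementation:

```python
import torch
import torch.nn.functional as F

# One self-labeling step: pseudo-label confident samples and train on their
# strongly augmented versions with a cross-entropy loss.
def self_label_step(model, images, weak_aug, strong_aug, threshold: float = 0.99):
    with torch.no_grad():
        probs = F.softmax(model(weak_aug(images)), dim=1)
        max_prob, pseudo_label = probs.max(dim=1)
        confident = max_prob > threshold           # keep only confident samples

    if confident.sum() == 0:
        return None                                # nothing confident in this batch

    # Cross-entropy on strongly augmented versions of the confident samples.
    logits = model(strong_aug(images[confident]))
    return F.cross_entropy(logits, pseudo_label[confident])
```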

Experimental Setup

Datasets
Small datasets: CIFAR10, CIFAR100, STL10
Large dataset: ImageNet

Training setup:

  1. ResNet-18
  2. for every sample, 20 nearest neighbors are determined through an instance discrimination task based on noise contrastive estimation (NCE).
  3. SimCLR is used for the instance discrimination task on the small datasets; MoCo on ImageNet.
