Noisy Student: A Semi-Supervised Learning Method


Differences from knowledge distillation:

(Noisy Student can also be seen as knowledge expansion.)

  1. The student network is no smaller than the teacher.

  2. Noise is injected into the student (see the sketch after this list):

    Input noise:

    data augmentation with RandAugment [18]. (Data augmentation is an important noising method.)

    Model noise:

    dropout [76] and stochastic depth [37].
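Below is a minimal PyTorch sketch of the two noise sources applied to the student. The block structure, layer sizes, and rates are hypothetical stand-ins for illustration, not the paper's EfficientNet configuration; the teacher, by contrast, predicts on clean, un-noised images.

```python
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision.ops import StochasticDepth

# Input noise: RandAugment on the images fed to the student.
student_transform = transforms.Compose([
    transforms.RandAugment(num_ops=2, magnitude=9),  # data augmentation
    transforms.ToTensor(),
])

class NoisyBlock(nn.Module):
    """Toy residual block showing model noise: dropout + stochastic depth."""
    def __init__(self, channels: int, drop_path_rate: float = 0.2):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.dropout = nn.Dropout(p=0.5)                         # model noise
        self.drop_path = StochasticDepth(drop_path_rate, "row")  # model noise

    def forward(self, x):
        out = self.dropout(torch.relu(self.conv(x)))
        # Stochastic depth randomly zeroes the residual branch per sample
        # during training (rescaled to preserve the expectation); it is the
        # identity at eval time.
        return x + self.drop_path(out)

block = NoisyBlock(channels=64)
block.train()  # both noise sources are active only in training mode
```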

Consistency training:

In consistency training, the teacher model generates the pseudo labels before it has converged, so the labels cannot be highly accurate. Consistency training also regularizes the model towards high-entropy predictions, which prevents it from achieving good accuracy. To mitigate this, additional hyperparameters are introduced, which makes the method difficult to use at scale.
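In contrast, Noisy Student trains the teacher to convergence before generating pseudo labels. A minimal sketch of the iterative loop, where `build_model` and `train` are hypothetical caller-supplied helpers (not functions from the paper's code):

```python
def noisy_student_training(build_model, train, labeled_data,
                           unlabeled_data, num_rounds=3):
    """Sketch of the Noisy Student self-training loop."""
    # 1. Train the teacher to convergence on labeled data, without noise.
    teacher = train(build_model(), labeled_data, noised=False)
    for _ in range(num_rounds):
        # 2. The converged teacher predicts pseudo labels on clean
        #    (un-augmented) unlabeled images.
        pseudo = [(x, teacher(x)) for x in unlabeled_data]
        # 3. Train an equal-or-larger student on labeled + pseudo-labeled
        #    data, with input noise (RandAugment) and model noise
        #    (dropout, stochastic depth) enabled.
        student = train(build_model(larger=True),
                        labeled_data + pseudo, noised=True)
        # 4. The student becomes the teacher for the next round.
        teacher = student
    return teacher
```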

The experiments rely on a Cloud TPU v3 Pod, which has 2048 cores. The largest model, EfficientNet-L2, needs to be trained for 6 days when the unlabeled batch size is 14x the labeled batch size.
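For a concrete sense of that ratio, a hypothetical per-step batch breakdown (the labeled batch size here is illustrative, not the paper's exact value):

```python
labeled_batch_size = 512
unlabeled_batch_size = 14 * labeled_batch_size            # 7168 pseudo-labeled images
images_per_step = labeled_batch_size + unlabeled_batch_size  # 7680 images total
```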

To be continued…
