Paper title: "An Adversarial Approach to Discriminative Modality Distillation for Remote Sensing Image Classification"
Citation: Shivam Pande, Avinandan Banerjee, Saurabh Kumar, Biplab Banerjee, Subhasis Chaudhuri. "An Adversarial Approach to Discriminative Modality Distillation for Remote Sensing Image Classification." International Conference on Computer Vision (ICCV), 2019.
We deal with the problem of modality distillation for the purpose of remote sensing (RS) image classification by exploring the deep generative models. From the remote sensing perspective, this problem can also be considered in line with the missing bands problem frequently encountered due to sensor abnormality. It is expected that different modalities provide useful complementary information regarding a given task, thus leading to the training of a robust prediction model. Although training data may be collected from different sensor modalities, it is many a time possible that not all the information are readily available during the model inference phase. This paper tackles the problem by proposing a novel adversarial training driven hallucination architecture which is capable of learning discriminative feature representations corresponding to the missing modalities from the available ones during the test time. To this end, we follow a teacher-student model where the teacher is trained on the multimodal data (learning with privileged information) and the student model learns to subsequently distill the feature descriptors corresponding to the missing modality. Experimental results obtained on the benchmark hyperspectral (HSI) datasets and another dataset of multispectral (MS)-panchromatic (PAN) image pairs confirm the efficacy of the proposed approach. In particular, we find that the student model is consistently able to surpass the performance of the teacher model for HSI datasets.
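Conventional multimodal models can fail when the modalities available at test time do not match those seen during training, a situation that is very common in remote sensing. There are two remedies: either make sure the same modalities are present in both the training and test sets, or use privileged information to approximate the data missing from the test set. The latter idea is called modality distillation through hallucination. The paper therefore proposes a distillation architecture for two remote sensing image classification scenarios: MS-PAN and HSI.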
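To the authors' knowledge, only one prior RS paper has applied knowledge distillation, and the reconstruction loss used in that work does not learn the data distribution well across different modalities.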
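The main problem the paper addresses is discriminative modality distillation in a teacher-student model. The teacher network uses a multi-layer classifier and is designed as a multi-stream network in which each stream learns discriminative features for one specific modality. It is followed by a C-GAN-based hallucination model that generates the missing modality from the available ones. Finally, the authors design a student network that takes both the available modality and the generated (hallucinated) modality as input and performs classification. Besides knowledge distillation (KD), training also has to deal with mode collapse.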
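The KD term mentioned above can be illustrated with a minimal sketch. It assumes the common temperature-softened KL-divergence formulation of knowledge distillation; these notes do not give the paper's exact loss, so the function names `softmax` and `kd_loss` and the `temperature` parameter are illustrative assumptions rather than the authors' implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class probabilities.

    Assumption: a standard KD formulation, not necessarily the paper's exact loss.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    # Clamp q away from zero so the logarithm stays finite.
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)

# The loss is zero when the student matches the teacher and positive otherwise.
print(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(kd_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

A higher temperature flattens both distributions, so the student is also penalized for mismatches on the less likely classes.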
• We introduce a novel teacher-student based modality distillation framework for RS image classification where a novel C-GAN based cross-modality mapping module is proposed. We also consider the KD technique to ensure that the student’s classifier does not diverge too much from the teacher’s classifier. In short: a teacher-student modality distillation network for remote sensing image classification, with KD keeping the student's classifier close to the teacher's.
• We perform data augmentation through noise perturbation on the teacher’s training samples in order to train the hallucination and student models. In short: the hallucination and student models are trained on data augmented by noise perturbation.
• We perform extensive experiments on HSI classification and RS scene classification using MS-PAN image pairs where improved results can be observed. In short: experiments on HSI classification and MS-PAN scene classification demonstrate the effectiveness of the method.
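The noise-perturbation augmentation in the second contribution can be sketched as follows. The notes do not specify the noise model, so additive zero-mean Gaussian noise is an assumption here, and the function name `augment_with_noise` and its parameters `copies`, `sigma`, and `seed` are hypothetical.

```python
import random

def augment_with_noise(samples, labels, copies=2, sigma=0.01, seed=0):
    """Return the original samples plus `copies` noisy versions of each one.

    Assumption: additive zero-mean Gaussian noise with standard deviation `sigma`.
    Each noisy copy keeps the label of the sample it was derived from.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    aug_x, aug_y = [], []
    for x, y in zip(samples, labels):
        aug_x.append(list(x))  # keep the clean sample
        aug_y.append(y)
        for _ in range(copies):
            aug_x.append([v + rng.gauss(0.0, sigma) for v in x])
            aug_y.append(y)
    return aug_x, aug_y

# Two feature vectors with labels 0 and 1, three noisy copies each.
xs, ys = augment_with_noise([[1.0, 2.0], [3.0, 4.0]], [0, 1], copies=3)
print(len(xs), ys)
```

Keeping `sigma` small relative to the feature scale keeps each perturbed sample close to its source, so the label remains valid.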
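Remote sensing image classification (RS image classification): The literature is large and covers multispectral (MS), SAR, LiDAR, HSI, and PAN data. CNN-based methods already exist, although they work on a single modality. Multimodal methods exist as well, for example Pan-Sharpening GAN (PSGAN), CNN+CGAN, OrthoSeg (three modalities: RGB, DSM, NIR), and modified Squeeze and Excitation Networks.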
Learning under privileged information (LUPI): LUPI has been applied in many areas, such as unsupervised learning, metric learning, object localization, face detection, and expression recognition.
LUPI with modality distillation: Previous work has combined LUPI with CNNs for tasks such as classification and pose recognition, all using multimodal data.
How are we different: The two closest works are as follows:
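Assume two modalities x1 and x2 with corresponding labels y, and C land-use classes. The teacher network is quite complex and is trained with privileged information. Its architecture is shown in the figure below: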
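To make the setup concrete, here is a toy sketch of a two-stream teacher forward pass over x1 and x2 that outputs probabilities for C classes. The layer sizes, the single linear-plus-ReLU layer per stream, fusion by concatenation, and all names (`teacher_forward`, `init_layer`, `params`) are assumptions chosen for illustration; the actual architecture is the one in the paper's figure.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (toy initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def teacher_forward(x1, x2, params):
    """One stream per modality, then a classifier on the concatenated features."""
    f1 = relu(linear(x1, *params["stream1"]))  # features of modality x1
    f2 = relu(linear(x2, *params["stream2"]))  # features of modality x2
    fused = f1 + f2                            # list concatenation
    return softmax(linear(fused, *params["classifier"]))

rng = random.Random(0)
C = 5  # number of land-use classes (toy value)
params = {
    "stream1": init_layer(8, 4, rng),    # x1 has 8 features here
    "stream2": init_layer(3, 4, rng),    # x2 has 3 features here
    "classifier": init_layer(8, C, rng), # 4 + 4 fused features
}
probs = teacher_forward([0.5] * 8, [0.2] * 3, params)
print(probs)
```

In this simplified picture, a student would reuse the same fusion step but replace the second stream's features with hallucinated ones when x2 is unavailable at test time.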
(2) Modality hallucination using C-GAN
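For reference, the standard conditional GAN objective is written below in the notation of these notes. This is the generic C-GAN formulation, not necessarily the exact loss used in the paper: here x1 is the available modality used as the condition, x2 stands for the missing modality (or its teacher feature representation), and z is an assumed noise input to the generator G.

```latex
\min_{G} \max_{D} \;
\mathbb{E}_{x_1, x_2}\big[\log D(x_2 \mid x_1)\big]
+ \mathbb{E}_{x_1, z}\big[\log\big(1 - D(G(z \mid x_1) \mid x_1)\big)\big]
```

The discriminator D learns to tell real x2 from hallucinated samples given the same x1, while G learns to produce hallucinations that D cannot distinguish from real ones.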
Multispectral-panchromatic dataset: 4 MS bands plus 1 PAN band, 80,000 images in total, acquired by GF-1. Each MS image is 64×64×4 and each PAN image is 256×256, as shown below:
Indian Pines hyperspectral dataset: 145×145×200, with a spatial resolution of 20 m, as shown below:
Houston hyperspectral dataset: 1905×349×144 (bands), with a spatial resolution of 2.5 m and 15 classes, as shown below: