CVPR2020 The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks

Paper download link:
https://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_The_Secret_Revealer_Generative_Model-Inversion_Attacks_Against_Deep_Neural_Networks_CVPR_2020_paper.pdf

Abstract:
This paper studies model-inversion attacks, in which the access to a model is abused to infer information about the training data. Since its first introduction by [7], such attacks have raised serious concerns given that training data usually contain privacy-sensitive information. Thus far, successful model-inversion attacks have only been demonstrated on simple models, such as linear regression and logistic regression. Previous attempts to invert neural networks, even the ones with simple architectures, have failed to produce convincing results. We present a novel attack method, termed the generative model-inversion attack, which can invert deep neural networks with high success rates. Rather than reconstructing private training data from scratch, we leverage partial public information, which can be very generic, to learn a distributional prior via generative adversarial networks (GANs) and use it to guide the inversion process. Moreover, we theoretically prove that a model's predictive power and its vulnerability to inversion attacks are indeed two sides of the same coin—highly predictive models are able to establish a strong correlation between features and labels, which coincides exactly with what an adversary exploits to mount the attacks. Our extensive experiments demonstrate that the proposed attack improves identification accuracy over the existing work by about 75% for reconstructing face images from a state-of-the-art face recognition classifier. We also show that differential privacy, in its canonical form, is of little avail to defend against our attacks.

Model inversion studies how access to a trained model can be abused to infer its training data. In prior work, the models that were successfully inverted were simple ones such as linear regression and logistic regression; for neural networks, even those with simple architectures, inversion produced poor results. This paper targets a face recognition model: it uses public datasets as prior information about the private training data to be recovered, trains a generative adversarial network (GAN) on this prior, and then uses the trained GAN to guide the inversion process. Concretely, the problem is: given some public data (and, in some settings, blurred or occluded versions of the images to be inferred), learn to infer, for a given label y, the image data x corresponding to that label.
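
Stated a bit more formally (my notation, as a sketch of the setup described above; the paper's own formalization may differ in detail):

```latex
% Given a target classifier f and a class label y, model inversion seeks an input
% that f strongly associates with y while remaining a realistic image
\hat{x} \;=\; \arg\max_{x} \; P_f(y \mid x),
\qquad \text{with } x \text{ constrained to the image prior learned from public data.}
```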



The method is illustrated in the paper's pipeline figure (not reproduced here). The overall training process has two stages. In the first stage, a GAN is trained on public data. The GAN training objective consists of a Wasserstein GAN loss and a diversity loss:


[Equation: Wasserstein GAN loss]

[Equation: diversity loss]
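
As a rough sketch of what these two terms typically look like (notation is mine; the first term is the standard Wasserstein GAN objective, and the diversity term shown is one common instantiation that rewards mapping distinct latent codes to distinct images, which may differ in detail from the paper's exact formulation):

```latex
% Stage 1 sketch: generator G, critic D, public images x, latent codes z, z_1, z_2
% Standard Wasserstein GAN objective
\min_G \max_D \; L_{\mathrm{wgan}} \;=\; \mathbb{E}_{x}\!\left[D(x)\right] \;-\; \mathbb{E}_{z}\!\left[D(G(z))\right]

% Diversity term (one common form): encourage different latents to yield different images
\max_G \; L_{\mathrm{div}} \;=\; \mathbb{E}_{z_1, z_2}\!\left[\frac{\lVert G(z_1) - G(z_2)\rVert_1}{\lVert z_1 - z_2\rVert_1}\right]
```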

Besides the public data (e.g., face datasets or other data related to the task of the model being inverted), the paper assumes that a blurred or corrupted version of the private data may also be available. This auxiliary information is fed into the GAN as well when training the inversion model.

The second stage is the secret-revelation stage, which is essentially where the private training data are reconstructed. The problem becomes an optimization problem: find the latent noise z fed into the GAN and iteratively optimize it so that the image x generated from z (1) fools the discriminator, i.e., looks realistic, and (2) obtains a high probability for the specified label y under the model being inverted (here, a face classifier).
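
A minimal sketch of this second-stage optimization, assuming PyTorch and pretrained modules G (Stage-1 generator), D (Stage-1 Wasserstein critic), and T (the target face classifier); all names and hyperparameter values below are illustrative, not the paper's released code:

```python
import torch
import torch.nn.functional as F

def invert(G, D, T, target_label, latent_dim=100, steps=1500, lr=0.02, lam=100.0):
    """Recover an image for `target_label` by optimizing the GAN latent code z."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    optimizer = torch.optim.SGD([z], lr=lr, momentum=0.9)
    label = torch.tensor([target_label])
    for _ in range(steps):
        optimizer.zero_grad()
        x = G(z)                                       # candidate reconstruction
        prior_loss = -D(x).mean()                      # realism: score highly under the critic
        identity_loss = F.cross_entropy(T(x), label)   # high probability on label y
        loss = prior_loss + lam * identity_loss
        loss.backward()
        optimizer.step()
    return G(z).detach()                               # reconstructed image for label y
```

The weight lam trades off the realism term against the identity term: a larger value pushes the reconstruction toward images the target classifier confidently assigns to y, at the cost of looking less natural.
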
The paper also investigates the relationship between a model's predictive power and its vulnerability to model-inversion attacks, mainly by examining how sensitive and non-sensitive features contribute differently to predictive power.

Most of the experiments focus on the setting where sensitive regions are occluded: the proposed method exploits this auxiliary information (images with the sensitive parts masked out) to reconstruct the data in the private training set.

