【Paper Notes 11】 Deep Domain Adaptation With Differential Privacy, IEEE TIFS 2020

Table of Contents

  • Series Navigation
    • Abstract
    • Intro & Illustration
      • Basic Intro
      • Intro of their work
    • DPDA Model
      • Problem Statement
      • Framework
    • Experiments
  • Reference

Series Navigation

My paper-notes channel

【Active Learning】
【Paper Notes 01】Learning Loss for Active Learning, CVPR 2019
【Paper Notes 02】Active Learning for Convolutional Neural Networks: A Core-Set Approach, ICLR 2018
【Paper Notes 03】Variational Adversarial Active Learning, ICCV 2019
【Paper Notes 04】Ranked Batch-Mode Active Learning, ICCV 2016

【Transfer Learning】
【Paper Notes 05】Active Transfer Learning, IEEE T CIRC SYST VID 2020
【Paper Notes 06】Domain-Adversarial Training of Neural Networks, JMLR 2016
【Paper Notes 10】A unified framework of active transfer learning for cross-system recommendation, AI 2017
【Paper Notes 14】Transfer Learning via Minimizing the Performance Gap Between Domains, NIPS 2019

【Differential Privacy】
【Paper Notes 07】A Survey on Differentially Private Machine Learning, IEEE CIM 2020
【Paper Notes 09】Differentially Private Hypothesis Transfer Learning, ECML&PKDD 2018
【Paper Notes 11】Deep Domain Adaptation With Differential Privacy, IEEE TIFS 2020
【Paper Notes 12】Differential privacy based on importance weighting, Mach Learn 2013
【Paper Notes 13】Differentially Private Optimal Transport: Application to Domain Adaptation, IJCAI 2019

【Model inversion attack】
【论文笔记08】Model inversion attacks that exploit confidence information and basic countermeasures, SIGSAC 2015

Abstract

The abstract proceeds from two starting points: first, why transfer learning is needed at all; second, naturally, why differential privacy has to be brought in.

Intro & Illustration

Basic Intro

  • To achieve better generalization and higher performance, large-scale labeled datasets are required.
  • But labeling is (1) labor-intensive and time-consuming, and (2) it is still difficult to apply a trained model to unseen data.
  • This motivates the development of unsupervised domain adaptation.

DA aims to map both the source-domain data and target-domain data into a common feature space and make the classifier trained from the source domain capable of classifying the unlabeled target-domain data.
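In a generic formulation (my own notation for context, not necessarily the paper's), this amounts to learning a feature extractor $G_f$ and a classifier $G_y$ by minimizing the source classification loss plus a domain-discrepancy term:

$$\min_{G_f,\,G_y}\ \frac{1}{n_s}\sum_{i=1}^{n_s}\mathcal{L}\big(G_y(G_f(x_i^s)),\,y_i^s\big)\;+\;\lambda\, d\big(G_f(X^s),\,G_f(X^t)\big)$$

where $d(\cdot,\cdot)$ measures how far apart the source and target feature distributions are (e.g., an adversarial domain-discriminator loss as in DANN), and $\lambda$ balances the two terms.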

  • Privacy concern ← hospitals and schools (the owners of the target data)
    (Figure 1) This figure illustrates a practical scenario: a hospital receives unlabeled medical data from different users and expects to make classifications or predictions on these data. Since the hospital only has a few labeled data (so does this count as unsupervised DA or semi-supervised DA?), it needs the help of a cloud server which has a large amount of labeled medical data.
  • The hospital sends its unlabeled medical data to the server.
  • The cloud server trains the model on the hospital's unlabeled target-domain data together with its own labeled source-domain data using a domain adaptation method.
  • The trained model is sent back to the hospital and accessible to users.

Under attack, trained models may leak information about the training data and, consequently, about the data owners.

[Whose privacy is protected?] The privacy of the data uploaded by the hospital, i.e., the target data. Compare this with my later note on DPOT; the two papers actually protect different parties.

Differential privacy has been used to address this problem. Its key idea is to add noise to the data for privacy protection. Two earlier works on privacy-preserving transfer learning are mentioned here: one is DP hypothesis transfer learning; the other, by Yao et al., exploited feature-wise partitioned stacking to enhance privacy-preserving logistic regression and combined it with hypothesis transfer learning to enable learning across different organizations. A minimal sketch of the noise-adding idea is given below.
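To make the "add noise" idea concrete, here is a minimal sketch of the classical Gaussian mechanism (the function name `gaussian_mechanism` and the toy data are my own illustration, not the paper's code); it uses the standard calibration $\sigma \ge \sqrt{2\ln(1.25/\delta)}\,\Delta_2/\epsilon$ for $(\epsilon,\delta)$-DP with $\epsilon < 1$:

```python
import numpy as np

def gaussian_mechanism(query_result, l2_sensitivity, epsilon, delta, rng=None):
    """Return an (epsilon, delta)-DP release of `query_result` (a numpy array)
    by adding Gaussian noise calibrated to its L2 sensitivity."""
    rng = rng or np.random.default_rng()
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    noise = rng.normal(loc=0.0, scale=sigma, size=np.shape(query_result))
    return np.asarray(query_result) + noise

# Example: privately release the mean of a bounded feature matrix.
data = np.clip(np.random.rand(1000, 8), 0.0, 1.0)   # each record bounded in [0, 1]^8
true_mean = data.mean(axis=0)
# Replacing one record changes the mean by at most sqrt(8)/1000 in L2 norm.
sensitivity = np.sqrt(8) / len(data)
private_mean = gaussian_mechanism(true_mean, sensitivity, epsilon=1.0, delta=1e-5)
```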

[Also, if I remember correctly, the two works above as well as the instance-weighting one are all non-deep models, whereas this paper uses a deep model.]

Intro of their work

In this paper, they propose a novel framework for deep domain adaptation with a strong privacy guarantee and low utility loss. Specifically, the DPDA task mainly includes two phases:

  • First, the model is trained on the labeled data using a traditional optimization algorithm.
  • Then the unlabeled target data are exploited to obtain domain-invariant features via an adversarial learning strategy. [The NNs are trained in a fine-tuning manner, with the first few (shallow) layers frozen to preserve the efficacy of DA.] A minimal sketch of the two phases follows after this list.
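Here is that sketch, in PyTorch-style pseudising code under my own assumptions (toy fully-connected networks, a DANN-style domain discriminator, hypothetical layer sizes); the paper's actual architecture and losses may differ:

```python
import torch
import torch.nn as nn

# Hypothetical toy networks; the paper uses deeper backbones (e.g., for Office-31).
feature_extractor = nn.Sequential(
    nn.Linear(256, 128), nn.ReLU(),   # shallow block: frozen in phase 2
    nn.Linear(128, 64), nn.ReLU(),    # deeper block: fine-tuned in phase 2
)
label_classifier = nn.Linear(64, 31)          # 31 classes, as in Office-31
domain_discriminator = nn.Linear(64, 2)       # source (0) vs. target (1)
ce = nn.CrossEntropyLoss()

# --- Phase 1: standard supervised training on the labeled source data ---
opt1 = torch.optim.SGD(
    list(feature_extractor.parameters()) + list(label_classifier.parameters()), lr=0.01)
xs, ys = torch.randn(32, 256), torch.randint(0, 31, (32,))   # stand-in source batch
loss_cls = ce(label_classifier(feature_extractor(xs)), ys)
opt1.zero_grad(); loss_cls.backward(); opt1.step()

# --- Phase 2: adversarial alignment using the unlabeled target data ---
# Freeze the first (shallow) layers; only the deeper layers keep learning.
for p in feature_extractor[0].parameters():
    p.requires_grad = False

opt_d = torch.optim.SGD(domain_discriminator.parameters(), lr=0.01)
opt_f = torch.optim.SGD(
    [p for p in feature_extractor.parameters() if p.requires_grad], lr=0.01)

xt = torch.randn(32, 256)                      # stand-in unlabeled target batch
fs, ft = feature_extractor(xs), feature_extractor(xt)
src_lbl, tgt_lbl = torch.zeros(32, dtype=torch.long), torch.ones(32, dtype=torch.long)

# (a) train the discriminator to distinguish source from target features
d_loss = ce(domain_discriminator(fs.detach()), src_lbl) + \
         ce(domain_discriminator(ft.detach()), tgt_lbl)
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# (b) train the (unfrozen) feature layers to fool the discriminator
g_loss = ce(domain_discriminator(ft), src_lbl)   # target features labeled as "source"
opt_f.zero_grad(); g_loss.backward(); opt_f.step()
```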

How to achieve DP?

DPDA: They add Gaussian noise only to the specified gradients in the model, i.e., during the training process on the unlabeled target data, which reduces the privacy cost. (The view is that labeled data are usually collected publicly and need not be protected.) Meanwhile, to reduce the privacy loss caused by running a large number of epochs, they use the RDP accountant to track the detailed privacy loss and train their model within a modest privacy budget. A DP-SGD-style sketch of the gradient noising is given below.
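A minimal, self-contained sketch of how such gradient noising might look (my own assumptions: per-example clipping to bound sensitivity and Gaussian noise scaled by a noise multiplier, as in DP-SGD; the paper's exact mechanism may differ, and the RDP accounting of the cumulative privacy loss is omitted here):

```python
import torch

def dp_noisy_grad(per_example_grads, clip_norm=1.0, noise_multiplier=1.1):
    """Clip each per-example gradient to L2 norm `clip_norm`, average them,
    and add Gaussian noise with std `noise_multiplier * clip_norm / batch_size`.
    `per_example_grads` has shape (batch_size, num_params)."""
    norms = per_example_grads.norm(dim=1, keepdim=True).clamp(min=1e-12)
    scale = (clip_norm / norms).clamp(max=1.0)
    clipped = per_example_grads * scale
    batch_size = per_example_grads.shape[0]
    noise = torch.randn_like(clipped[0]) * noise_multiplier * clip_norm / batch_size
    return clipped.mean(dim=0) + noise

# Usage: apply only to the update computed from the target-domain batch,
# leaving the (assumed non-sensitive) source-domain gradients noise-free.
grads_target = torch.randn(32, 1000)          # stand-in per-example gradients
private_update = dp_noisy_grad(grads_target)
```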

G-DPDA: Global Differentially Private Domain Adaptation. They really covered every variant they could... this version protects the privacy of both the target and the source domains at the same time.

[An aside on the "first" claims made in the paper]

  • They cite many differentially private deep learning methods and argue that these only focus on supervised learning, while domain adaptation involves unsupervised learning. (end of Related Work)
  • They propose a new and novel framework to protect sensitive private training data in domain adaptation. To the best of their knowledge, they are the first to provide a feasible solution for privacy-assured domain adaptation. (the first item of the formal Contributions)

DPDA Model

Problem Statement

Target domain: unlabeled. Source domain: adequate and labeled.
Ultimate goal: train a model that can predict the labels of the target-domain data while preventing privacy leakage of the sensitive training data (formalized below).
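In symbols (my own notation, not copied from the paper): given a labeled source domain $\mathcal{D}_s=\{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ and an unlabeled target domain $\mathcal{D}_t=\{x_j^t\}_{j=1}^{n_t}$, the goal is to learn a hypothesis $h$ with low target risk while the training algorithm $\mathcal{M}$ is $(\epsilon,\delta)$-differentially private with respect to the sensitive records:

$$\min_{h}\ \mathbb{E}_{(x,y)\sim \mathcal{D}_t}\big[\mathcal{L}(h(x),y)\big] \quad \text{s.t.} \quad \Pr[\mathcal{M}(\mathcal{D})\in S]\le e^{\epsilon}\Pr[\mathcal{M}(\mathcal{D}')\in S]+\delta$$

for every measurable set $S$ and every pair of neighboring datasets $\mathcal{D},\mathcal{D}'$ differing in a single sensitive record.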

Framework

(Figure 2: overview of the DPDA framework)

Experiments

What this paper proposes is a network framework; the baselines compared against include CNN, GFK, TCA, D-CORAL, DDC, DAN, and DANN.
They also compare DPDA against G-DPDA.
In addition, they test different training strategies by adjusting the number of frozen layers (a small sketch of this knob is given below).
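For illustration only, sweeping the number of frozen layers could look like the following (the toy backbone and the helper `freeze_first_k_layers` are hypothetical, just to show the experimental knob being varied):

```python
import torch.nn as nn

def freeze_first_k_layers(model: nn.Sequential, k: int) -> None:
    """Disable gradients for the first k child modules of a Sequential model."""
    for layer in list(model.children())[:k]:
        for p in layer.parameters():
            p.requires_grad = False

backbone = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                         nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 31))

for k in range(0, 5):                    # sweep the number of frozen layers
    freeze_first_k_layers(backbone, k)
    trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
    print(f"frozen layers = {k}, trainable parameters = {trainable}")
    # ... train and evaluate DPDA with this setting ...
    for p in backbone.parameters():      # reset before the next setting
        p.requires_grad = True
```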

Test datasets:

  • Office-31
  • Amazon review dataset

Reference

[1] Wang, Qian, et al. “Deep domain adaptation with differential privacy.” IEEE Transactions on Information Forensics and Security 15 (2020): 3093-3106.
