As a survey, this paper offers plenty of material to draw on when writing.
Original title: A Comprehensive Survey of Privacy-preserving Federated Learning: A Taxonomy, Review, and Future Directions
The paper covers, but is not limited to, the following:
(1) Efficient learning: improving the efficiency and effectiveness of FL [24, 68, 87, 100, 120, 211, 213], (2) Defending against attacks: improving the security of FL to attacks that aim to undermine the integrity of FL models and degrade the model performance [9, 13, 185, 198], and (3) Protecting training-set privacy: improving the privacy preservation of FL on private user data and avoiding privacy leakages [116, 137, 144, 193, 199].
The FL training process consists of server initialization, local training on the clients, and global aggregation.
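The three steps above can be sketched with a FedAvg-style loop. This is only a minimal illustration, not the paper's notation: the one-parameter model y = w * x, the helper names `local_train` and `fed_avg`, and the synthetic data are all assumptions made for the example.

```python
import random

def local_train(w, data, lr=0.1, epochs=5):
    """Client side: a few epochs of SGD on y = w * x with squared loss."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

def fed_avg(client_data, rounds=20, w0=0.0):
    """Server side: broadcast w, collect local models, average by data size."""
    w = w0  # server initialization
    total = sum(len(d) for d in client_data)
    for _ in range(rounds):
        # client local training, starting from the current global model
        local_models = [local_train(w, d) for d in client_data]
        # global aggregation, weighted by each client's sample count
        w = sum(len(d) * wi for d, wi in zip(client_data, local_models)) / total
    return w

# Hypothetical data: 4 clients, each holding 10 noise-free samples of y = 3x.
rng = random.Random(0)
clients = [[(x, 3.0 * x) for x in (rng.uniform(-1, 1) for _ in range(10))]
           for _ in range(4)]
w_final = fed_avg(clients)
print(w_final)  # should end up close to the true slope 3.0
```

Raw data never leaves `local_train`; only the scalar model travels between clients and server.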
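By data partitioning, FL can be divided into vertical FL, horizontal FL, and hybrid FL (the last corresponds to federated transfer learning, FTL).
The paper's mathematical notation is well done.
Categorization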
Based on the categorization of transfer learning [136], FTL methods can be divided into three categories [205]: instance-based FTL, feature-based FTL, and parameter-based FTL.
Aggregate gradients or weights? Pros and cons:
gradient: converges more easily
weight: does not need to be aggregated every round, and is slightly more secure
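A small worked example may help separate the two options. With exactly one local SGD step per round, averaging gradients and averaging weights give the same global model; the difference only appears when clients take several local steps before sharing weights. The function names and numbers below are illustrative assumptions.

```python
def aggregate_gradients(w, grads, lr):
    """Server averages client gradients, then takes one step itself."""
    g_avg = sum(grads) / len(grads)
    return w - lr * g_avg

def aggregate_weights(w, grads, lr):
    """Each client takes one local step; server averages the resulting weights."""
    local_weights = [w - lr * g for g in grads]
    return sum(local_weights) / len(local_weights)

w, lr = 1.0, 0.1
grads = [0.5, -0.2, 0.9]  # hypothetical per-client gradients
print(aggregate_gradients(w, grads, lr))
print(aggregate_weights(w, grads, lr))  # same value up to floating-point rounding
```

Because weights can absorb many local steps, weight aggregation can run less often than once per step, which matches the note above.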
The paper introduces the mainstream protection mechanisms and compares their pros and cons.
The main ones are homomorphic encryption, secret sharing, and secure multi-party computation.
Homomorphic encryption can be further divided into partially homomorphic encryption [134] (supports either additive [88] or multiplicative homomorphic operations, but not both) and fully homomorphic encryption (supports both additive and multiplicative operations).
Comparison: Compared with partially homomorphic encryption, fully homomorphic encryption provides stronger encryption but suffers from computation costs.
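To make "additively homomorphic" concrete, here is a toy sketch of the Paillier scheme, a well-known partially homomorphic cryptosystem; the notes do not say which scheme the paper's reference [88] uses, so treating Paillier as the example is an assumption. The primes are tiny and insecure, and the function names are made up for illustration.

```python
import math
import secrets

def keygen(p=293, q=433):
    """Toy key generation; real deployments use primes of 1024+ bits."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # this shortcut is valid because we fix g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n (with g = n + 1)."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return c1 * c2 % (n * n)

n, priv = keygen()
c_sum = add_encrypted(n, encrypt(n, 5), encrypt(n, 7))
print(decrypt(n, priv, c_sum))  # 12
```

The scheme offers no matching operation that multiplies two plaintexts, which is the "cannot have both" limitation of partially homomorphic encryption.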
Secret sharing [156] is a cryptographic scheme where a secret key consisting of n shares can be reconstructed only if a sufficient number of shares are combined.
Drawback:
However, these methods are vulnerable to the dishonest dealer or malicious participants.
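The threshold idea can be sketched with Shamir's scheme, the classic construction; whether reference [156] is exactly this scheme is not stated in these notes, so take that as an assumption. The field prime, function names, and parameters are illustrative. Note that `split` is run by a single dealer, which is precisely where the dishonest-dealer weakness comes from.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime; all arithmetic is done modulo this

def split(secret, n, t):
    """Dealer: hide `secret` as f(0) of a random degree-(t-1) polynomial."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 using any t (or more) shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split(123456789, n=5, t=3)
print(reconstruct(shares[:3]))   # 123456789
print(reconstruct(shares[1:4]))  # 123456789, any 3 of the 5 shares work
```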
Pros and cons:
As its advantages, (1) there is no requirement of trusted third-parties, (2) the tradeoff between data utility and data privacy is eliminated, and (3) a high accuracy is achieved.
The disadvantages are computational overhead and high communication costs.
In summary, perturbation techniques are simple, efficient, and usually do not require knowledge of the data distribution. However, perturbed data may be vulnerable to probabilistic attacks, and it is difficult to mitigate such risk without reducing the data utility.
Divided into global differential privacy and local differential privacy.
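The difference between the two variants is mainly where the noise is added. The sketch below uses the Laplace mechanism on a bounded mean as an assumed example; the function names, the clipping bounds, and the choice of Laplace noise are illustrative and not taken from the paper.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def clip(v, lo, hi):
    return min(max(v, lo), hi)

def global_dp_mean(values, epsilon, lo, hi, rng):
    """Global DP: a trusted server sees raw values and noises the aggregate once."""
    clipped = [clip(v, lo, hi) for v in values]
    sensitivity = (hi - lo) / len(clipped)  # one record can shift the mean this much
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon, rng)

def local_dp_mean(values, epsilon, lo, hi, rng):
    """Local DP: every client noises its own value; the server only averages."""
    sensitivity = hi - lo  # each report must hide the whole value range
    reports = [clip(v, lo, hi) + laplace_noise(sensitivity / epsilon, rng)
               for v in values]
    return sum(reports) / len(reports)

rng = random.Random(42)
values = [0.2, 0.4, 0.9, 0.5, 0.7]  # hypothetical client statistics in [0, 1]
print(global_dp_mean(values, epsilon=1.0, lo=0.0, hi=1.0, rng=rng))
print(local_dp_mean(values, epsilon=1.0, lo=0.0, hi=1.0, rng=rng))
```

At the same epsilon the local variant injects far more noise per report, which is the price of not trusting the server.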
This technique is simple and can preserve the statistical properties [86]. However, it may degrade the data utility and may be vulnerable to a noise reduction.
Compared with the additive perturbation, multiplicative perturbation is more effective, because reconstructing the original data from the perturbed data of the multiplicative perturbation is more difficult [50].
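As a rough illustration of the two perturbation styles: additive perturbation adds independent noise to each value, while one common form of multiplicative perturbation multiplies the data by a secret rotation matrix. Using rotation as the multiplicative example is an assumption here, as are the function names and sample points. A rotation keeps pairwise distances intact, so distance-based analysis still works on the perturbed data.

```python
import math
import random

def additive_perturb(points, sigma, rng):
    """Add independent Gaussian noise to every coordinate."""
    return [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)) for x, y in points]

def rotation_perturb(points, theta):
    """Multiply every 2-D point by a rotation matrix with secret angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

rng = random.Random(7)
points = [(1.0, 2.0), (4.0, 6.0), (0.0, -1.0)]  # hypothetical records
theta = rng.uniform(0, 2 * math.pi)             # kept secret by the data owner

noisy = additive_perturb(points, sigma=0.5, rng=rng)
rotated = rotation_perturb(points, theta)

print(dist(points[0], points[1]))    # 5.0
print(dist(rotated[0], rotated[1]))  # still 5.0 up to rounding
print(dist(noisy[0], noisy[1]))      # generally differs from 5.0
```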
The main techniques are k-anonymity, l-diversity, and t-closeness.
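Of the three, k-anonymity is the simplest to check: every combination of quasi-identifier values must be shared by at least k records. A minimal sketch follows, with a made-up table and a hypothetical helper name.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical table whose ages and ZIP codes have already been generalized.
records = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))  # True
print(is_k_anonymous(records, ["age", "zip"], k=3))  # False
```

The second group shares a single disease value, so the table is 2-anonymous yet still leaks the sensitive attribute for that group; l-diversity and t-closeness target that gap.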
The metrics are divided into: (1) privacy metrics for measuring the loss of privacy of a dataset, and
(2) utility metrics for measuring the data utility of the protected data for data analysis purposes.
The main protection methods are: encryption-based, perturbation-based, anonymization-based, and hybrid PPFL.
This chapter first analyzes the risks, then proposes solutions.
internal actors (participating clients and the central server) and external actors (model consumers and eavesdroppers).
With FL, passive attackers only observe computations (e.g., the weights, gradients, and final model) during the training and inference phases [123, 228], whereas active attackers can influence the FL system by manipulating the model parameters to achieve adversarial goals [129, 165].
Training phase and inference phase
Weight update, gradient update, and the final model
Model-parameter-based attacks leak much more sensitive information than query-based attacks.
Inference attacks include inference of class representatives, memberships, properties of training data, and training samples and labels.
The paper then goes through each class of attack.
This part is somewhat similar to Chapter 3, but is specific to the FL setting.
In general, each participant calculates its local gradient using its local dataset, then encrypts the local gradient and sends the encrypted gradient to the server for aggregation.
The specific methods, however, each make different improvements.
However, the homomorphic encryption utilized in these studies brings about a large communication and computational overhead.
However, the secure aggregation protocol used in this approach incurs significant communication costs. A challenge of SMC-based FL methods is to improve the computational efficiency, because significant computational resources are required to complete a training round in FL frameworks [147].
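The core trick in many secure aggregation protocols is pairwise masking: every pair of clients agrees on a random mask that one adds and the other subtracts, so the masks cancel in the sum while hiding each individual update. The sketch below shows only that cancellation; whether the approach discussed in the paper works exactly this way is an assumption. A real protocol derives the masks from a key agreement and must handle dropped clients, which is where much of the communication cost arises. Names and the seed-based mask derivation are purely illustrative.

```python
import random

MOD = 2**32  # updates are assumed to be quantized to integers modulo 2^32

def mask_updates(updates, seed=0):
    """Return each client's update hidden under pairwise-cancelling masks."""
    n = len(updates)
    masked = [u % MOD for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            # Stand-in for a mask that clients i and j would derive from a shared key.
            m = random.Random(f"{seed}-{i}-{j}").randrange(MOD)
            masked[i] = (masked[i] + m) % MOD  # client i adds the mask
            masked[j] = (masked[j] - m) % MOD  # client j subtracts the same mask
    return masked

def aggregate(masked):
    """Server sums the masked updates; the pairwise masks cancel out."""
    return sum(masked) % MOD

updates = [5, 17, 42]  # hypothetical quantized client updates
masked = mask_updates(updates)
print(masked)             # individual values look random
print(aggregate(masked))  # 64, the true sum
```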
(1) global differential privacy-based, (2) local differential privacy-based, (3) additive perturbation-based, and (4) multiplicative perturbation-based PPFL methods.
The paper introduces all four of the methods above.
Hybrid PPFL combines several of the methods above into one complete framework.