Paper link: https://arxiv.org/pdf/1812.03337.pdf
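Machine learning relies on the availability of large amounts of data for training. In practice, however, most data is scattered across different organizations and is hard to aggregate under many legal and practical constraints. In this paper, we introduce a new technique and framework, called federated transfer learning (FTL), to improve statistical models under a data federation. The federation allows knowledge to be shared without compromising user privacy and allows complementary knowledge to be transferred across the network. As a result, a target-domain party can build more flexible and more powerful models by leveraging the rich labels of a source-domain party. A secure transfer cross-validation approach is also proposed to safeguard FTL performance under the federation. The framework requires only minor modifications to the existing model structure and provides the same level of accuracy as non-privacy-preserving approaches. It is very flexible and can be effectively adapted to a variety of secure multi-party machine learning tasks.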
□ Background:
☆ Recent Artificial Intelligence (AI) achievements have depended on the availability of massive amounts of labeled data.
☆ AlphaGo (Silver et al. 2016) uses 30 million moves from 160,000 actual games. The ImageNet dataset (Deng et al. 2009) has over 14 million images.
□ Challenges:
However, across various industries, most fields of application have only small or poor-quality data. Labeling data is very expensive, especially in fields that require human expertise and domain knowledge. In addition, the data needed for a specific task may not be kept in one place. Many organizations may have only unlabeled data, and others may have a very limited number of labels. It has also become increasingly difficult for organizations to combine their data.
□ Responses:
①
Google first introduced a federated learning (FL) system (McMahan et al. 2016) in which a global machine learning model is updated by a federation of distributed participants while keeping their data locally.
△ Limitation:
These existing approaches are only applicable to either common features or common samples under a federation.
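(In reality, the set of common entities may be small, which makes a federation less attractive and leaves most of the non-overlapping data underused.)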
②
In this paper, we propose a possible solution to these challenges: Federated Transfer Learning (FTL), which leverages transfer learning technique (Pan et al. 2010) to provide solutions for the entire sample and feature space under a federation.
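△ Main contributions:
① We introduce federated transfer learning in a privacy-preserving setting to provide privacy-preserving solutions for federation problems beyond the scope of existing federated learning approaches;
② We provide an end-to-end solution to the proposed FTL problem and show that the convergence and accuracy of the proposed approach are comparable to those of the non-privacy-preserving approach;
③ We provide a novel approach for combining additively homomorphic encryption (HE) with multi-party computation (MPC) for neural networks, such that only minimal modifications to the neural network are required and the accuracy is almost lossless, whereas most existing secure deep learning frameworks lose accuracy when they adopt privacy-preserving techniques.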
(1) Federated Learning and Secure Deep Learning
Server-end models (applicable for inference only):
① Google: a secure aggregation scheme
② CryptoNets: adapts neural network computations to work with data encrypted with homomorphic encryption
③ CryptoDL: approximates the activation functions in neural networks with low-degree polynomials
④ DeepSecure: uses Yao’s Garbled Circuit protocol for data encryption instead of HE
Also discussed in this paper: SecureML (a multi-party computing scheme from prior work)
Encryption: uses secret sharing and Yao’s Garbled Circuit
Training: supports collaborative training for linear regression, logistic regression and neural networks
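(2) Transfer Learning
Application scenarios: small datasets or weak supervision
Practical applications: image classification, sentiment analysis, ...
Requirement: the parties should (preferably) be in the same industry -> knowledge transfer
Datasets: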
☆ Without loss of generality, we assume all labels are in party A, but all the deductions here can be adapted to the case where labels exist in party B. One can find the commonly shared sample ID set in a privacy-preserving setting by masking data IDs with encryption techniques such as the RSA scheme. Here we assume that A and B have already found, or both know, their commonly shared sample IDs. Given the above setting, the objective is for the two parties to build a transfer learning model that predicts labels for the target-domain party as accurately as possible without exposing data to each other.
△Security Definition:
① all parties are honest-but-curious
② assume a threat model with a semi-honest adversary D
③ Protocol P
Purpose: controls what information is disclosed
① Transfer Learning Model and Federated Framework
② Deep Neural Networks
③ Hidden representation layer (dimension d)
④ Prediction function
⑤ Translator function
⑥ Loss function (logistic loss)
⑦ minimize the alignment loss
⑧ Final objective function
⑨ Gradients (used to update the parameters)
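Background: additively homomorphic encryption is widely used in privacy-preserving machine learning
This paper: uses a second-order Taylor series approximation to compute the loss and gradients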
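To see why a second-order Taylor series helps, note that a polynomial in φ needs only additions and multiplications, which fit additively homomorphic encryption much better than the logarithm and exponential of the logistic loss. The sketch below is not the paper's code: it assumes labels y in {-1, +1} and the logistic loss log(1 + exp(-yφ)), expands it around φ = 0, and compares it with the exact value. The function names are chosen here for illustration.

```python
import math

def logistic_loss(y, phi):
    """Exact logistic loss log(1 + exp(-y * phi)) for a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * phi))

def taylor_loss(y, phi):
    """Second-order Taylor expansion of the logistic loss around phi = 0.

    l(y, phi) ~ log 2 - (1/2) * y * phi + (1/8) * y^2 * phi^2
    """
    return math.log(2.0) - 0.5 * y * phi + 0.125 * (y * phi) ** 2

def taylor_grad(y, phi):
    """Derivative of taylor_loss with respect to phi; it is linear in phi."""
    return -0.5 * y + 0.25 * y * y * phi

if __name__ == "__main__":
    # Compare the exact loss and its approximation for a positive label.
    for phi in (0.0, 0.5, 1.0, 2.0):
        exact = logistic_loss(1, phi)
        approx = taylor_loss(1, phi)
        print(f"phi={phi:.1f} exact={exact:.4f} approx={approx:.4f}")
```

The approximation is tight near φ = 0 and drifts as |φ| grows, so how well it works in practice depends on the range of the model outputs.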
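① Initialize the neural networks Net(A) and Net(B) and run them locally and independently to obtain the hidden representations u(i,A) and u(i,B)
② Party A computes and encrypts its intermediate results and sends them to B to help B compute the gradients of Net(B)
③ Party B does the same as in ②
Risk: indirect leakage (through the gradients)
Countermeasure: add random masks to the encrypted values before transmission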
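The masking idea can be illustrated with a toy additively homomorphic scheme. The sketch below is an assumption-laden demo, not the paper's implementation: it uses a textbook Paillier construction with two small Mersenne primes (far too small for real use), a hypothetical fixed-point `SCALE` for encoding real numbers, and helper names invented here. Party B holds a gradient encrypted under party A's public key, adds a random mask homomorphically, lets A decrypt the masked value, and then removes the mask itself.

```python
import math
import secrets

SCALE = 10**6  # fixed-point scale for real numbers (an assumption of this sketch)

def keygen():
    """Toy Paillier key pair; the primes are far too small to be secure."""
    p, q = 2**31 - 1, 2**61 - 1
    n = p * q
    phi = (p - 1) * (q - 1)
    return n, (phi, pow(phi, -1, n))  # public key n, private key (phi, phi^-1 mod n)

def encode(x, n):
    """Map a real number to a residue mod n (negatives wrap around)."""
    return round(x * SCALE) % n

def decode(v, n):
    """Inverse of encode: residues above n/2 are read as negative numbers."""
    if v > n // 2:
        v -= n
    return v / SCALE

def encrypt(n, m):
    """Paillier encryption with generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    phi, mu = priv
    x = pow(c, phi, n * n)
    return (x - 1) // n * mu % n

def add(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % (n * n)

# Party A owns the key pair; party B only knows the public key n.
n, priv = keygen()
grad = -0.137                                  # B's gradient entry, never sent in the clear
enc_grad = encrypt(n, encode(grad, n))         # in the protocol B obtains this homomorphically
mask = secrets.randbelow(n)                    # B's random mask
masked = add(n, enc_grad, encrypt(n, mask))    # [[grad + mask]]
seen_by_a = decrypt(n, priv, masked)           # A learns only grad + mask (mod n)
recovered = decode((seen_by_a - mask) % n, n)  # B removes the mask
print(recovered)
```

Because the mask is uniform over the residues mod n, the value A decrypts is itself uniformly distributed and reveals nothing about the gradient in this toy setting.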
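Once the model is trained, it can be used to predict labels for party B's unlabeled data and to evaluate them:
① Party B computes u(j,B) with the trained network parameters Θ(B) and sends the encrypted result [[G(u(j,B))]] to party A
② A evaluates the prediction, masks it with a random value, and sends the encrypted and masked φ(u(j,B)) to B; B decrypts it and sends it back to A
③ A obtains the decrypted φ(u(j,B)), derives the label from it, and sends the label to party B
The only source of performance loss: the second-order Taylor approximation of the final loss function (rather than an approximation at every nonlinear activation layer of the neural network)
Advantage: as the experiments show, the errors in the loss and gradient computations, and the accuracy loss caused by adopting this approach, are small. The approach is therefore scalable and adapts flexibly to changes in the neural network structure.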
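Method: a secure transfer cross-validation approach (TrCV)
① Split the labeled source-domain dataset into k folds. In each of the k rounds, hold out one fold as the test set, build a model on the remaining k-1 folds with Algorithm 1, and predict labels with Algorithm 2
② Combine the predicted labels with the existing dataset, retrain the model with Algorithm 1, and evaluate it on the held-out fold:
③ Finally, obtain the final model:
Note: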
TrCV performs validation using source-domain labels, which can be advantageous when target labels are difficult to obtain. A self-learning supervised model MF,Dc is also built with Dc to provide a safeguard against negative transfer (Kuzborskij and Orabona 2013; Zhong et al. 2010). When the labels are in the source-domain party, self-learning reduces to a feature-based federated learning problem; otherwise the target-domain party builds the self-learning model itself. When the transfer learning model is inferior to the self-learning model, knowledge need not be transferred.
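The TrCV rounds described above can be sketched as a plain, single-machine loop. This is only an illustration of the control flow under strong simplifications: there is no encryption, both domains share one feature space, and `train` and `predict` are hypothetical stand-ins for Algorithm 1 and Algorithm 2 (here a one-dimensional threshold classifier on labels in {-1, +1}).

```python
from statistics import mean

def k_folds(n_items, k):
    """Split indices 0..n_items-1 into k contiguous folds of near-equal size."""
    bounds = [round(i * n_items / k) for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]

def trcv(src_x, src_y, tgt_x, k, train, predict):
    """Return the mean held-out accuracy over k rounds of transfer cross validation."""
    scores = []
    for fold in k_folds(len(src_x), k):
        held_out = set(fold)
        keep = [i for i in range(len(src_x)) if i not in held_out]
        tr_x = [src_x[i] for i in keep]
        tr_y = [src_y[i] for i in keep]
        model = train(tr_x, tr_y)                   # stands in for Algorithm 1
        tgt_y = predict(model, tgt_x)               # stands in for Algorithm 2
        model = train(tr_x + tgt_x, tr_y + tgt_y)   # retrain with the predicted labels
        pred = predict(model, [src_x[i] for i in fold])
        scores.append(mean(int(p == src_y[i]) for p, i in zip(pred, fold)))
    return mean(scores)

def train(xs, ys):
    """Toy model: threshold halfway between the two class means."""
    pos = mean(x for x, y in zip(xs, ys) if y == 1)
    neg = mean(x for x, y in zip(xs, ys) if y == -1)
    return (pos + neg) / 2

def predict(threshold, xs):
    return [1 if x > threshold else -1 for x in xs]

src_x = [0.1, 0.9, 0.2, 0.8, 0.15, 0.85]   # labeled source-domain samples (made up)
src_y = [-1, 1, -1, 1, -1, 1]
tgt_x = [0.05, 0.95, 0.3, 0.7]             # unlabeled target-domain samples (made up)
score = trcv(src_x, src_y, tgt_x, 3, train, predict)
print(score)
```

The same score can be computed for a self-learning model trained without the target-side pseudo-labels; comparing the two is the safeguard against negative transfer mentioned in the note.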
Theorem 1. The protocol in Algorithm 1 and 2 is secure under our security definition, provided that the underlying additively homomorphic encryption scheme is secure.
Proof:
① The training protocol in Algorithm 1 and 2 does not reveal any information, because all that A and B learn are the masked gradients.
As long as the encryption scheme is considered secure, the protocol is secure.
② At inference time, the two parties need to collaboratively compute the prediction results. Note that the protocol does not deal with a malicious party. If party A fakes its inputs and submits only one non-zero input, it may learn the value of u(i,B) at that input’s position. It still cannot learn x(i,B) or Θ(B), and neither party will get correct results.
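(1) Procedure: (omitted)
(2) Impact of Taylor approximation
① As the depth of the neural networks increases, the convergence and the performance of the model do not decay.
② Most existing secure deep learning frameworks lose accuracy when they adopt privacy-preserving techniques, whereas the approach in this paper adapts well to deeper networks
(3) Transfer learning vs self-learning
① With only a small number of samples, the transfer learning approach outperforms self-learning
② Performance improves as the number of samples increases
③ Performance increases with the number of overlapping samples
(4) Scalability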
As expected from the above analysis, as the dimension d of the hidden representation increases, the running time grows at an accelerating rate for every tested number of overlapping samples. On the other hand, the running time grows linearly with the number of target-domain features and with the number of shared samples.
① The proposed framework is a complete privacy-preserving solution that includes training, evaluation and cross validation.
② The current framework is not limited to any specific learning model; it is a general framework for privacy-preserving transfer learning.
③ Future work on FTL may include exploring and adapting the methodology to other deep learning systems where privacy-preserving data collaboration is needed, further improving the efficiency of the algorithms with distributed computing techniques, and finding less expensive encryption schemes.