PySyft is a framework for bringing privacy to deep learning models.

Trust is a key factor in the implementation of deep learning applications. From training to optimization, the lifecycle of a deep learning model is tied to trusted data exchanges between different parties. That dynamic is certainly effective in a lab environment, but it proves vulnerable to all sorts of security attacks that manipulate the trusted relationships between the different participants in a model. Take the example of a credit scoring model that uses financial transactions to classify the credit risk of a specific customer. The traditional mechanisms for training or optimizing a model assume that the entities performing those actions have full access to those financial datasets, which opens the door to all sorts of privacy risks. As deep learning evolves, mechanisms that enforce privacy constraints during the lifecycle of the datasets and the model are becoming increasingly important. Among the technologies trying to address this monumental challenge, PySyft is a recent framework that has been steadily gaining traction within the deep learning community.

The importance of privacy in deep learning applications is directly tied to the emergence of distributed, multi-party models. The traditional approach to deep learning solutions relies on centralized parties that control the entire lifecycle of a model, even when using large distributed computing infrastructure: think of an organization that creates a prediction model to manage the preferences of customers visiting its website. However, centralized deep learning topologies have proven impractical in scenarios such as mobile or internet of things (IoT) systems that rely on large numbers of devices producing data and executing models. In those scenarios, the distributed parties not only often produce sensitive datasets but also execute and evaluate the performance of deep learning models. That dynamic requires a bidirectional privacy relationship between the different parties responsible for creating, training and executing deep learning models.

The transition towards more distributed architectures is one of the primary forces behind the need for strong privacy mechanisms in deep learning models. That's the challenge PySyft set out to address, but it wouldn't have been possible without the evolution of several areas of research in machine learning and distributed programming.

The Enablers

Privacy in deep learning models has been a well-known problem for years, but the technologies that can provide a solution are only now achieving certain levels of viability. In the case of PySyft, the framework leverages three of the most fascinating techniques in machine learning and cryptography of the last decade:

· Secured Multi-Party Computations

· Federated Learning

· Differential Privacy

Secured Multi-Party Computations

Secured Multi-Party Computations (sMPC) is a cryptographic technique that allows different parties to perform computations over their inputs while keeping those inputs private. In computer science theory, sMPC is often seen as a solution to the famous Yao's Millionaires' Problem, introduced in the 1980s by computer scientist Andrew Yao. The problem describes a setting in which multiple millionaires would like to know which of them is richest without disclosing their actual wealth. The millionaires' problem appears in many real-world scenarios such as auctions, elections or online gaming.

Conceptually, sMPC replaces the need for a trusted intermediary with secured computations. In the sMPC model, a set of parties with private inputs compute a distributed function such that security properties such as fairness, privacy and correctness are preserved.

Source: https://www.semanticscholar.org/paper/Collaborative-network-outage-troubleshooting-with-Djatmiko-Schatzmann/e932792557c785e7084e16691512d1866a6264d5/figure/0
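To make the idea concrete, the sketch below implements additive secret sharing, one of the simplest building blocks used in sMPC protocols. The modulus Q, the helper functions and the three-party split are illustrative assumptions for this example, not the internals of any particular protocol or of PySyft:

```python
import random

Q = 2 ** 62  # public modulus; all share arithmetic is done mod Q

def share(secret, n_parties=3):
    """Split a secret into n additive shares that sum to the secret mod Q."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    """Only the sum of all shares reveals the secret."""
    return sum(shares) % Q

def add(shares_x, shares_y):
    """Each party adds its two shares locally; no secrets are exchanged."""
    return [(sx + sy) % Q for sx, sy in zip(shares_x, shares_y)]

# Two private inputs are shared among three parties and summed
# without any single party ever seeing either input.
x_shares = share(25)
y_shares = share(17)
assert reconstruct(add(x_shares, y_shares)) == 42
```

Each individual share is indistinguishable from a random number, so no single party learns anything about the inputs; only the final combined result is revealed.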

Federated Learning

Federated learning is a new learning architecture for AI systems that operate in highly distributed topologies such as mobile or internet of things (IoT) systems. Initially proposed by Google research labs, federated learning represents an alternative to centralized AI training, in which a shared global model is trained under the coordination of a central server from a federation of participating devices. In that model, the different devices can contribute to the training and knowledge of the model while keeping most of the data on the device.

In a federated learning model, a party downloads the current deep learning model, improves it by learning from data on a given device, and then summarizes the changes as a small, focused update. Only this update to the model is sent to the cloud, using encrypted communication, where it is immediately averaged with other user updates to improve the shared model. All the training data remains on the original device, and no individual updates are stored in the cloud.

Source: https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
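As a rough illustration of that loop, the following sketch simulates federated averaging (the FedAvg scheme behind Google's proposal) for a linear model in plain NumPy. The device count, learning rate and synthetic data are made-up values for the example; a production system would also encrypt and compress the updates:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One device's training round: plain gradient descent on its private data."""
    w = weights.copy()
    for _ in range(epochs):
        w -= lr * 2 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
devices = []
for _ in range(5):  # five devices, each holding a small private dataset
    X = rng.normal(size=(20, 2))
    devices.append((X, X @ true_w + 0.01 * rng.normal(size=20)))

global_w = np.zeros(2)
for _ in range(10):  # each round: broadcast weights, train locally, average updates
    updates = [local_update(global_w, X, y) for X, y in devices]
    global_w = np.mean(updates, axis=0)  # only weight updates reach the server

print(global_w)  # converges toward true_w; raw data never leaves the devices
```

The key design choice is that the coordinating server only ever sees aggregated weight vectors, never the per-device training examples.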

Differential Privacy

Differential privacy is a technique used to limit the impact that statistical algorithms can have on the privacy of subjects whose information is part of a larger dataset. Roughly, an algorithm is differentially private if an observer seeing its output cannot tell whether a particular individual's information was used in the computation. Differential privacy is often discussed in the context of identifying individuals whose information may be in a database. Although it does not directly refer to identification and reidentification attacks, differentially private algorithms provably resist such attacks.

Source: https://towardsdatascience.com/understanding-differential-privacy-85ce191e198a
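The standard way to obtain this guarantee is to perturb a query's answer with calibrated noise. The sketch below applies the Laplace mechanism to a counting query, whose sensitivity is 1 because adding or removing a single individual changes the true count by at most 1; the dataset and the epsilon value are made up for illustration:

```python
import numpy as np

def private_count(data, predicate, epsilon):
    """Answer a counting query with epsilon-differential privacy.

    Counting queries have sensitivity 1, so Laplace noise with
    scale 1/epsilon suffices; a smaller epsilon means stronger
    privacy and a noisier answer.
    """
    true_count = sum(1 for row in data if predicate(row))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [34, 45, 29, 61, 52, 38]
print(private_count(ages, lambda age: age > 40, epsilon=0.5))
# prints something near the true count of 3, without exposing any single record
```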

PySyft

PySyft is a framework that enables secured, private computations in deep learning models. PySyft combines federated learning, secured multi-party computations and differential privacy in a single programming model integrated into different deep learning frameworks such as PyTorch, Keras or TensorFlow. The principles of PySyft were originally outlined in a research paper, and its first implementation was led by OpenMined, one of the leading decentralized AI platforms.

The core component of PySyft is an abstraction called the SyftTensor. SyftTensors are meant to represent a state or transformation of the data and can be chained together. The chain structure always has the PyTorch tensor at its head, and the transformations or states embodied by the SyftTensors are accessed downward using the child attribute and upward using the parent attribute.

Source: https://arxiv.org/pdf/1811.04017.pdf
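A small, hedged illustration of that chain, written against the PySyft 0.2.x API (later releases reorganized this interface): sending a tensor to a remote worker leaves behind a wrapper whose child attribute points down the chain to a pointer.

```python
import torch
import syft as sy

hook = sy.TorchHook(torch)              # extends torch.Tensor with PySyft methods
bob = sy.VirtualWorker(hook, id="bob")  # a simulated remote worker

x = torch.tensor([1, 2, 3]).send(bob)   # the data now lives on bob's worker
print(x)        # the local head of the chain: a wrapper PyTorch tensor
print(x.child)  # one level down: a PointerTensor referencing bob's data

y = x.get()     # walk the chain back and retrieve the plain tensor
```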

Using PySyft is relatively simple and not very different from a standard PyTorch or Keras program.
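The original post illustrates this with an animation of a simple classification model. In the same spirit, here is a minimal federated training sketch adapted from the PySyft 0.2.x tutorials; the virtual workers, toy dataset and hyperparameters are illustrative, and newer versions of the library expose a different API:

```python
import torch
import syft as sy

hook = sy.TorchHook(torch)
bob = sy.VirtualWorker(hook, id="bob")
alice = sy.VirtualWorker(hook, id="alice")

# A toy dataset, split across two workers so neither sees all of it
data = torch.tensor([[0., 0], [0, 1], [1, 0], [1, 1]], requires_grad=True)
target = torch.tensor([[0.], [0], [1], [1]], requires_grad=True)
datasets = [
    (data[:2].send(bob), target[:2].send(bob)),
    (data[2:].send(alice), target[2:].send(alice)),
]

model = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(10):
    for X, y in datasets:
        model.send(X.location)          # ship the model to the data, not vice versa
        opt.zero_grad()
        loss = ((model(X) - y) ** 2).sum()
        loss.backward()
        opt.step()
        model.get()                     # bring the updated weights back
```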

PySyft represents one of the first attempts to enable robust privacy models in deep learning programs. As the space evolves, privacy is likely to become one of the foundational building blocks of the next generation of deep learning frameworks.

Translated from: https://medium.com/swlh/pysyft-is-a-framework-for-bringing-privacy-to-deep-learning-models-5ac23131954
