Are there any drawbacks to homomorphic encryption?
The hype is dead, long live the hype. After deep learning, a new act is about ready to go on stage. The usual journalists are warming up their keyboards for blogs, news feeds and tweets: in one word, hype. This time it’s all about privacy and data confidentiality. The new buzzword: homomorphic encryption.
For the record, I am not against the technology itself. Quite the opposite: I think it is very powerful and clever. What I object to are the misleading claims that usually attract more followers than the technology itself. The purpose of this post is to shed light on homomorphic encryption, its benefits and its limitations, while being as impartial as possible.
What is Homomorphic Encryption?
Homomorphic encryption (HE) is an encryption scheme that allows one to compute something like Encrypted(2) + Encrypted(3) = Encrypted(5). While such an operation does not shock anyone per se, the point is that the operands 2 and 3 and the result 5 are never disclosed. Boom!
The mathematics behind HE refers to the concept of homomorphism in algebra, a structure-preserving map between two algebraic structures of the same type. Basically, both encryption and decryption functions can be thought of as homomorphisms between plaintext and ciphertext spaces. This definition explains why the sum of two operands in the plaintext space is preserved in the ciphertext space too.
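To make the sum-preserving property concrete, here is a minimal sketch using the Paillier cryptosystem, a well-known partially (additively) homomorphic scheme. The post does not name a specific scheme, so the choice of Paillier, the tiny hard-coded primes, and the function names are all assumptions made for illustration; nothing here is secure.

```python
import math
import secrets

# Toy Paillier cryptosystem (additively homomorphic).
# ASSUMPTION: the tiny primes below are for readability only and give no security.
P, Q = 1789, 1867
N = P * Q
N2 = N * N
G = N + 1                                            # common simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)    # lcm(P-1, Q-1), private
MU = pow(LAM, -1, N)                                 # inverse of LAM mod N (valid for G = N + 1)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N using fresh randomness."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Recover the plaintext with the private values LAM and MU."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying two ciphertexts adds the plaintexts they hide."""
    return (c1 * c2) % N2


total = add_encrypted(encrypt(2), encrypt(3))
print(decrypt(total))  # 5
```

Whoever runs add_encrypted only ever sees ciphertexts; the values 2, 3 and 5 appear in the clear only for the holder of the private key.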
All computations within an HE setting are represented as either Boolean or arithmetic circuits, depending on the type of computation to represent and the types of gates allowed in each one. Feel free to expand on this at the links provided above.
The theory behind HE is not new (much like the theory behind deep learning): the first HE scheme was proposed in the late 70s. Since then, many flavors of homomorphic encryption have been proposed, and the research community recognizes three major categories of schemes:
partially HE, supporting the evaluation of circuits of only one type of gate (e.g. only additions or only multiplications)
somewhat HE, capable of evaluating two types of gates, but only for circuits of limited depth
fully HE, capable of evaluating arbitrary circuits (which makes it the most valuable and powerful of the three)
From a practical perspective, somewhat homomorphic encryption schemes are limited to evaluating low-degree polynomials over encrypted data. The limitation comes from noise: each ciphertext is noisy, and the noise grows as one keeps performing additions and multiplications in the ciphertext space. At some point the accumulated noise becomes so high that the resulting ciphertext can no longer be decrypted correctly. The noise is nevertheless essential, since it guarantees a certain degree of security and protects the secret operands and the results of intermediate computations. As in classic encryption, the value to be protected is spread across a large noisy space, and the role of the secret key is to securely decipher and reconstruct the encrypted components of a computation. In practice, the elements of that space are very large polynomials with a degree of 10K or more, in which each coefficient is represented by more than 600 bits. A single polynomial with 10,000 coefficients of 600 bits each already takes 6,000,000 bits, or about 750 KB, which explains how large a dataset becomes after it is encrypted with HE schemes at industrial-grade security. Obviously, one could trade security for a lower-degree polynomial and a smaller ciphertext.
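The noise growth described above can be observed in a toy scheme. The sketch below is an assumption on my part: it uses a simple integer-based construction (in the style of the DGHV scheme) rather than the polynomial-based schemes the text refers to, because it fits in a few lines. The parameter sizes and function names are illustrative and far too small to be secure.

```python
import secrets

# Toy symmetric scheme over the integers, encrypting single bits.
# ASSUMPTION: parameter sizes are illustrative only, not secure.
P_BITS, Q_BITS, NOISE_BOUND = 64, 128, 16


def keygen() -> int:
    """Secret key: a random odd integer with exactly P_BITS bits."""
    return secrets.randbits(P_BITS) | (1 << (P_BITS - 1)) | 1


def encrypt(p: int, bit: int) -> int:
    """Hide the bit under a multiple of p plus a small even noise term."""
    q = secrets.randbits(Q_BITS)
    r = secrets.randbelow(NOISE_BOUND) + 1
    return p * q + 2 * r + bit


def noise(p: int, c: int) -> int:
    """Centered remainder of c modulo p: the noise that carries the bit."""
    x = c % p
    return x - p if x > p // 2 else x


def decrypt(p: int, c: int) -> int:
    """Correct only while the true noise stays below p / 2 in magnitude."""
    return noise(p, c) % 2


p = keygen()
c = encrypt(p, 1)
for depth in range(1, 6):
    c = c * encrypt(p, 1)  # homomorphic AND: the noise terms multiply
    print(depth, noise(p, c).bit_length(), decrypt(p, c))
```

Adding two ciphertexts adds their noise, while multiplying them multiplies it, so the bit length printed in the second column grows at every step. Once the noise exceeds half the size of the key, decryption is no longer reliable, which is why such a scheme can only evaluate circuits of limited depth.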
For many realistic use cases, the only flavor of HE that is considered is fully HE. Addition and multiplication in the encrypted state are the two primitive operations from which all the other, non-primitive operations needed to represent any type of computation can be built. The ultimate goal of such a scheme is to evaluate arbitrary circuits and essentially provide a Turing-complete framework.
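As a rough illustration of why addition and multiplication are enough, the sketch below builds the usual Boolean gates and a one-bit full adder from those two operations alone. It works on plaintext bits modulo 2 for readability; the assumption is that an FHE scheme would supply the same two operations on encrypted bits. All function names are hypothetical.

```python
# Plaintext model of the two primitive gates, working on bits modulo 2.
# ASSUMPTION: an FHE scheme would offer the same two operations on ciphertexts.
def add(a: int, b: int) -> int:
    return (a + b) % 2  # addition modulo 2 behaves like XOR


def mul(a: int, b: int) -> int:
    return (a * b) % 2  # multiplication modulo 2 behaves like AND


def not_(a: int) -> int:
    return add(a, 1)


def or_(a: int, b: int) -> int:
    return add(add(a, b), mul(a, b))


def full_adder(a: int, b: int, carry_in: int) -> tuple:
    """One-bit adder made only of add and mul gates."""
    partial = add(a, b)
    total = add(partial, carry_in)
    carry_out = add(mul(a, b), mul(carry_in, partial))
    return total, carry_out


print(full_adder(1, 1, 1))  # (1, 1), i.e. 1 + 1 + 1 = binary 11
```

Chaining full adders gives integer addition, and from there comparisons, multipliers and the rest of a circuit follow, all without ever leaving the two primitive gates.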
Why should you care about FHE?
Because under an FHE scheme no operand and no result is ever disclosed, highly regulated environments are the first to benefit from such a powerful property. Banks, social media platforms and pharmaceutical companies running their business on Amazon Web Services are in fact giving away their secrets, because “There’s no cloud. It’s just someone else’s computer”. As a matter of fact, doing computation on a system that is managed by third parties means that the BOFH (the operator who administers that system) has access to your data.
While technologies like AMD’s Secure Encrypted Virtualization (SEV), implemented at the level of the CPU, encrypt a guest’s memory and, with the SEV-ES extension, the entire CPU state (basically all the registers) with keys that are not accessible to the other guests, many other use cases require more sophisticated approaches to the problem of data confidentiality. For instance, the requirements for private machine learning go beyond protecting the CPU state and register set.
Homomorphic encryption in the real world
Some practical (but non-exhaustive) use cases that FHE schemes would make possible are listed below:
secure cloud outsourcing (e.g. sending both data and computation to Amazon EC2 instances and being sure nobody can access them); this is already possible with SEV-ES, which is orders of magnitude more efficient
computing the intersection of private sets without disclosing the results (e.g. finding the common records in two private datasets without revealing either dataset or the common records, if any)
training machine learning models on private data without revealing the models
searching without revealing the query (or the results)
Performance and accuracy
Security and confidentiality usually come at a very high cost in terms of performance. Highly regulated environments that are willing to trade performance for maximum security might be interested in adopting a technology like FHE. But what are these costs?
Rough benchmarks (by no means scientific) for the most widely used family of models in data science, linear regression with a bit more than a dozen variables, are provided below.
For the data scientist who is used to training a model in about 1 hour, the same model under the FHE setting would take about 2 days. The same model, which requires 2 GB of memory in plaintext, would need about 30 GB. Still doable, if that model is logistic regression. Let’s see how long training a neural network would take.
A relatively small neural network that converges in approximately 12 hours in plaintext would take about 15 days, and its memory requirements would grow from 10 GB to 150 GB. However, despite the large gap in computational cost and memory usage, the accuracy degradation from computing in plaintext to computing in ciphertext is negligible. The slight drop in accuracy is usually due to float-to-integer conversions: each operation performed on a converted floating-point value decreases its accuracy, a little for additions and a bit more for multiplications.
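The float-to-integer conversion mentioned above can be sketched with a plain fixed-point encoding. This is only an illustration: real HE libraries ship their own encoders, and the scale factor and function names below are assumptions chosen for the example.

```python
SCALE = 2 ** 10  # ASSUMPTION: 10 fractional bits of fixed-point precision


def encode(x: float) -> int:
    """Convert a float to a scaled integer, rounding to the nearest step."""
    return round(x * SCALE)


def decode(v: int) -> float:
    return v / SCALE


def fx_add(a: int, b: int) -> int:
    """Both operands share the same scale, so addition is exact."""
    return a + b


def fx_mul(a: int, b: int) -> int:
    """The raw product has scale SCALE**2; rescaling rounds away low bits."""
    return (a * b + SCALE // 2) // SCALE


x, y = 3.14159, 2.71828
add_error = abs(decode(fx_add(encode(x), encode(y))) - (x + y))
mul_error = abs(decode(fx_mul(encode(x), encode(y))) - (x * y))
print(f"addition error:       {add_error:.6f}")
print(f"multiplication error: {mul_error:.6f}")
```

An addition only carries over the rounding error already made when encoding the operands, whereas a multiplication scales those errors by the size of the other operand and then rounds once more.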
Those overheads are acceptable in some domains, e.g. analyzing financial data that are siloed among retail, loan, and investment banking. In many other domains (especially those in which a near real-time prediction is necessary), FHE could not even remotely be considered.
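To put the figures quoted in this section in perspective, the overhead factors can be worked out directly. The numbers are the ones given in the text; the helper names are hypothetical.

```python
HOURS_PER_DAY = 24


def slowdown(plain_hours: float, encrypted_days: float) -> float:
    """How many times longer training takes under FHE."""
    return encrypted_days * HOURS_PER_DAY / plain_hours


def memory_blowup(plain_gb: float, encrypted_gb: float) -> float:
    """How many times more memory the encrypted run needs."""
    return encrypted_gb / plain_gb


# Regression: 1 hour -> 2 days, 2 GB -> 30 GB
print(slowdown(1, 2), memory_blowup(2, 30))      # 48.0 15.0
# Neural network: 12 hours -> 15 days, 10 GB -> 150 GB
print(slowdown(12, 15), memory_blowup(10, 150))  # 30.0 15.0
```

In both cases the memory footprint grows about 15 times, while training runs roughly 30 to 50 times slower.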
Conclusion
Homomorphic encryption is a very interesting field of research that, like deep learning, has been overlooked in the past years. Different market conditions and innovative findings in tech are making some of the FHE assumptions possible today. I am personally observing many analogies with what happened with back-propagation and neural networks in the 90s. An analogy I would definitely not like to see is the hype around a technology that not only is not ready but is also limited by mathematical theory.
Feel free to follow me on GitHub or listen to my podcast Data Science at Home
Originally published at https://codingossip.github.io on August 11, 2020.
Translated from: https://medium.com/swlh/why-you-care-about-homomorphic-encryption-c8615676f5d1