拜占庭容错的三个基本理论(CAP/FLP/DLS)

拜占庭容错的三个基本理论

1) CAP理论 - "如果网路发生阻断(partition)时,你只能选择资料的一致性(consistency)或可用性(availability),无法两者兼得"。论点比较真观:如果网路因阻断而分隔为二,在其中一边我送出一笔交易:"将我的十元给A";在另一半我送出另一笔交易:"将我的十元给B "。则此时系统要不是,a)无可用性,即这两笔交易至少会有一笔交易不会被接受;要不就是,b)无一致性,一半看到的是A多了十元而另一半则看到B多了十元。要注意的是,CAP理论和扩展性(scalability)是无关的,在分片(sharded)或非分片的系统皆适用。

2)FLP impossibility (O网页链接)-在异步的环境中,如果节点间的网路延迟没有上限,只要有一个恶意的节点存在,就没有算法能在有限的时间内达成共识。但值得注意的是, "Las Vegas" algorithms(这个算法又叫撞大运算法,其保证结果正确,只是在运算时所用资源上进行赌博。一个简单的例子是随机快速排序,他的pivot是随机选的,但排序结果永远一致)在每一轮皆有一定机率达成共识,随着时间增加,机率会越趋近于1。而这也是许多成功的共识演算法会采用的解决办法。

3)容错的上限-由DLS论文(O网页链接)我们可以得到以下结论: (1)在部分同步(partially synchronous)的网路环境中(即网路延迟有一定的上限,但我们无法事先知道上限是多少),协议可以容忍最多1/3的拜占庭故障(Byzantine fault)。(2)在异步(asynchronous)的网路环境中,具确定性质的协议无法容忍任何错误,但这篇论文并没有提及randomized algorithms在这种情况可以容忍最多1/3的拜占庭故障。(3)在同步(synchronous)的网路环境中(即网路延迟有上限且上限是已知的),协议可以容忍100%的拜占庭故障,但当超过1/2的节点为恶意节点时,会有一些限制条件。要注意的是,我们考虑的是"具认证特性的拜占庭模型(authenticated Byzantine)",而不是"一般的拜占庭模型";具认证特性指的是将如今已经过大量研究且成本低廉的公私钥加密机制应用在我们的演算法中。


以上描述来自以太坊的官方Wiki - Proof of Stake FAQ (https://github.com/ethereum/wiki/wiki/Proof-of-Stake-FAQ)

 

  • CAP theorem - "in the cases that a network partition takes place, you have to choose either consistency or availability, you cannot have both". The intuitive argument is simple: if the network splits in half, and in one half I send a transaction "send my 10 coins to A" and in the other I send a transaction "send my 10 coins to B", then either the system is unavailable, as one or both transactions will not be processed, or it becomes inconsistent, as one half of the network will see the first transaction completed and the other half will see the second transaction completed. Note that the CAP theorem has nothing to do with scalability; it applies to sharded and non-sharded systems equally. See also https://github.com/ethereum/wiki/wiki/Sharding-FAQs#but-doesnt-the-cap-theorem-mean-that-fully-secure-distributed-systems-are-impossible-and-so-sharding-is-futile.
  • FLP impossibility - in an asynchronous setting (i.e. there are no guaranteed bounds on network latency even between correctly functioning nodes), it is not possible to create an algorithm which is guaranteed to reach consensus in any specific finite amount of time if even a single faulty/dishonest node is present. Note that this does NOT rule out "Las Vegas" algorithms that have some probability each round of achieving consensus and thus will achieve consensus within T seconds with probability exponentially approaching 1 as T grows; this is in fact the "escape hatch" that many successful consensus algorithms use.
  • Bounds on fault tolerance - from the DLS paper we have: (i) protocols running in a partially synchronous network model (i.e. there is a bound on network latency but we do not know ahead of time what it is) can tolerate up to 1/3 arbitrary (i.e. "Byzantine") faults, (ii) deterministic protocols in an asynchronous model (i.e. no bounds on network latency) cannot tolerate faults (although their paper fails to mention that randomized algorithms can with up to 1/3 fault tolerance), (iii) protocols in a synchronous model (i.e. network latency is guaranteed to be less than a known d) can, surprisingly, tolerate up to 100% fault tolerance, although there are restrictions on what can happen when more than or equal to 1/2 of nodes are faulty. Note that the "authenticated Byzantine" model is the one worth considering, not the "Byzantine" one; the "authenticated" part essentially means that we can use public key cryptography in our algorithms, which is in modern times very well-researched and very cheap.

 


=>更多文章请参考:《中国互联网业务研发体系架构指南》

=>更多行业权威架构案例及领域标准、技术趋势请关注微信公众号 '软件真理与光':

公众号:关注更多实时动态 更多权威内容关注公众号:软件真理与光

你可能感兴趣的:(拜占庭容错的三个基本理论(CAP/FLP/DLS))