Cockroach Design Translation (Part 5): Lock-Free Distributed Transactions

6  Lock-Free Distributed Transactions

Cockroach provides distributed transactions without locks. Cockroach transactions support two isolation levels:

- snapshot isolation (SI) and
- serializable snapshot isolation (SSI).




SI is simple to implement, highly performant, and correct for all but a handful of anomalous conditions (e.g. write skew). SSI requires just a touch more complexity, is still highly performant (less so with contention), and has no anomalous conditions. Cockroach’s SSI implementation is based on ideas from the literature and some possibly novel insights.

SSI is the default level, with SI provided for application developers who are certain enough of their need for performance and the absence of write skew conditions to consciously elect to use it. In a lightly contended system, our implementation of SSI is just as performant as SI, requiring no locking or additional writes. With contention, our implementation of SSI still requires no locking, but will end up aborting more transactions. Cockroach’s SI and SSI implementations prevent starvation scenarios even for arbitrarily long transactions.


See the Cahill paper for one possible implementation of SSI. This is another great paper. For a discussion of SSI implemented by preventing read-write conflicts (in contrast to detecting them, called write-snapshot isolation), see the Yabandeh paper, which is the source of much inspiration for Cockroach’s SSI.

(Translator's note: the Cahill paper is "Serializable Isolation for Snapshot Databases" by Michael James Cahill. The Yabandeh paper, which introduces write-snapshot isolation, is "A Critique of Snapshot Isolation".)

Both SI and SSI require that the outcome of reads must be preserved, i.e. a write of a key at a lower timestamp than a previous read must not succeed. To this end, each range maintains a bounded in-memory cache from key range to the latest timestamp at which it was read.


Most updates to this timestamp cache correspond to keys being read, though the timestamp cache also protects the outcome of some writes (notably range deletions) which consequently must also populate the cache. The cache’s entries are evicted oldest timestamp first, updating the low water mark of the cache appropriately.
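
The timestamp cache described above can be sketched roughly as follows. This is a minimal illustration, not Cockroach's implementation: the class name `TimestampCache`, its methods, and the integer timestamps are all hypothetical, and it tracks single keys rather than key ranges. It also rejects a write at exactly the read timestamp, which is an assumption (the text only says "lower").

```python
class TimestampCache:
    """Hypothetical sketch: bounded map from key to latest read timestamp."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.low_water = 0   # lower bound assumed for any key not in the cache
        self.entries = {}    # key -> latest timestamp at which it was read

    def record_read(self, key, ts):
        # A read at or below the low water mark is already covered by it.
        if ts <= self.low_water:
            return
        self.entries[key] = max(ts, self.entries.get(key, 0))
        while len(self.entries) > self.capacity:
            self._evict_oldest()

    def _evict_oldest(self):
        # Evict the entry with the oldest timestamp and raise the low water
        # mark so the evicted read is still protected.
        key = min(self.entries, key=self.entries.get)
        self.low_water = max(self.low_water, self.entries.pop(key))

    def latest_read(self, key):
        # Unknown keys are treated as read at the low water mark.
        return self.entries.get(key, self.low_water)

    def can_write(self, key, ts):
        # Assumption: equal timestamps are treated as a conflict too.
        return ts > self.latest_read(key)


cache = TimestampCache(capacity=2)
cache.record_read("a", 10)
cache.record_read("b", 20)
cache.record_read("c", 30)   # over capacity: "a" is evicted
print(cache.low_water, cache.can_write("a", 5), cache.can_write("a", 11))
```

Because eviction only ever raises the low water mark, forgetting an entry makes the cache more conservative (some writes are rejected that a full history would allow) but never lets a write slip in beneath an earlier read.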


Each Cockroach transaction is assigned a random priority and a "candidate timestamp" at start. The candidate timestamp is the provisional timestamp at which the transaction will commit, and is chosen as the current clock time of the node coordinating the transaction. This means that a transaction without conflicts will usually commit with a timestamp that, in absolute time, precedes the actual work done by that transaction.


In the course of coordinating a transaction between one or more distributed nodes, the candidate timestamp may be increased, but will never be decreased. The core difference between the two isolation levels SI and SSI is that the former allows the transaction's candidate timestamp to increase and the latter does not.
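
As a rough sketch of the candidate-timestamp rules above: the `Transaction` class, `push_timestamp`, and `can_commit` below are hypothetical names chosen for illustration. The sketch assumes that "does not allow the candidate timestamp to increase" means an SSI transaction whose timestamp was pushed cannot commit as-is; what happens next (for example a restart) is not covered here.

```python
import random
import time


class Transaction:
    """Hypothetical sketch of a transaction's priority and candidate timestamp."""

    def __init__(self, isolation, now=None):
        self.isolation = isolation         # "SI" or "SSI"
        self.priority = random.random()    # random priority assigned at start
        # Candidate timestamp starts at the coordinating node's clock time.
        self.orig_ts = time.time() if now is None else now
        self.candidate_ts = self.orig_ts

    def push_timestamp(self, ts):
        # The candidate timestamp may increase but never decreases.
        self.candidate_ts = max(self.candidate_ts, ts)

    def can_commit(self):
        # SI tolerates a pushed candidate timestamp; SSI (in this sketch)
        # only commits if the timestamp never moved.
        if self.isolation == "SI":
            return True
        return self.candidate_ts == self.orig_ts


si = Transaction("SI", now=100)
ssi = Transaction("SSI", now=100)
for txn in (si, ssi):
    txn.push_timestamp(150)   # a conflict pushes the timestamp forward
    txn.push_timestamp(120)   # a lower value is ignored
    print(txn.isolation, txn.candidate_ts, txn.can_commit())
```

Passing `now` explicitly keeps the example deterministic; a real coordinator would read its own clock.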

