A Brief Review of Anna, the New KV Store from UC Berkeley

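Key points:

  • Takes full advantage of multicore machines
  • Actor model: no shared memory, lock-free
  • Each actor is pinned to one thread on one core
  • Keys are assigned to different actors on different servers via consistent hashing
  • Hot keys use multi-master replication and are served by multiple actors in parallel; the number of replicas has to be chosen according to the workload
  • Actors (both local and across the network) synchronize through periodic broadcasts, and only the final state of a local update is sent (if a key changes several times within one broadcast period)
  • Concurrent modifications are merged using lattice composition; an actor merges into its local state whenever it receives a broadcast message
  • Guarantees eventual consistency
  • Compared with Redis, single-thread performance is about the same; the advantages are mainly scalability and concurrent multi-replica handling of hot keys

For more detail, read the paper linked at the end of this post. Below are some excerpts from the paper.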

Anna is a new key-value store: a partitioned, multi-mastered system that achieves high performance and elasticity via wait-free execution and coordination-free consistency.

Our design rests on a simple architecture of coordination-free actors that perform state update via merge of lattice-based composite data structures.

Goal

Providing excellent performance on a single multicore machine, while scaling up elastically to geo-distributed cloud deployment.

Requirements

  • partition (shard) the key space, not only across nodes at cloud scale but also across cores for high performance
  • workload scaling, employ multi-master replication to concurrently serve puts and gets against a single key from multiple threads
  • wait-free execution, meaning that each thread is always doing useful work (serving requests), and never waiting for other threads for reasons of consistency or semantics
  • coordination-free consistency models

Design

Coordination-free Actors

besting state-of-the-art lock-free shared memory implementations while scaling smoothly and making repartitioning for elasticity extremely responsive.

uses lattice composition to maintain the consistency of replicated state. Lattices are resilient to message re-ordering and duplication, allowing Anna to employ asynchronous multi-master replication without need for any waiting

Anna combines asynchronous multi-master replication with lattice-based state management to remain scalable across both low and high conflict workloads while still guaranteeing consistency
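To make "lattice composition" a bit more concrete, here is a minimal Python sketch. It is not Anna's actual code: the class names `MaxInt` and `LWW` and the helper `merge_maps` are hypothetical, chosen only for illustration. The idea it shows is that a composite structure (a map of timestamped values) gets a merge that is commutative, associative, and idempotent because each of its parts does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaxInt:
    """Simplest lattice: merge is max(), which is order-insensitive."""
    value: int

    def merge(self, other: "MaxInt") -> "MaxInt":
        return MaxInt(max(self.value, other.value))


@dataclass(frozen=True)
class LWW:
    """Last-writer-wins pair built on top of a MaxInt timestamp."""
    ts: MaxInt
    value: str

    def merge(self, other: "LWW") -> "LWW":
        if self.ts == other.ts:
            # Assumption: break timestamp ties by value so that
            # merge(a, b) == merge(b, a) still holds.
            return self if self.value >= other.value else other
        # The side whose timestamp survives the MaxInt merge wins.
        return self if self.ts.merge(other.ts) == self.ts else other


def merge_maps(a: dict, b: dict) -> dict:
    """Map lattice: key-wise merge of two maps whose values are lattices."""
    out = dict(a)
    for key, val in b.items():
        out[key] = out[key].merge(val) if key in out else val
    return out


replica_1 = {"x": LWW(MaxInt(1), "a")}
replica_2 = {"x": LWW(MaxInt(2), "b"), "y": LWW(MaxInt(1), "c")}

# Same result in either order, and merging twice changes nothing.
merged = merge_maps(replica_1, replica_2)
assert merged == merge_maps(replica_2, replica_1)
assert merge_maps(merged, replica_2) == merged
```

Because the merge does not care about order or duplicates, a replica can apply incoming state whenever it arrives, which is what lets the replication stay asynchronous.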

Multi Master Replication

In multi-master replication, a key is replicated on multiple actors, each of which can read and update its own local copy.

In a coordination-free approach, on the other hand, each actor can process a request locally without introducing any inter-actor communication on the critical path. Updates are periodically communicated to other actors when a timer is triggered or when the actor experiences a reduction in request load.

Unlike synchronous multi-master and single-master replication, a coordination-free multi-master scheme could lead to inconsistencies between replicas, because replicas may observe and process messages in different orders.
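The ordering problem is easy to reproduce in a few lines. The sketch below is only an illustration under assumed names (`apply_overwrite`, `apply_max` are hypothetical): two replicas receive the same two updates in opposite orders. With plain overwrite they end up different; with an order-insensitive merge (here, keeping the update with the larger timestamp) they agree.

```python
def apply_overwrite(updates):
    """Naive replica: each incoming update simply replaces the state."""
    state = None
    for _ts, value in updates:
        state = value
    return state


def apply_max(updates):
    """Replica that keeps the (timestamp, value) pair that compares largest."""
    state = None
    for update in updates:
        state = update if state is None else max(state, update)
    return state[1]


u1 = (1, "a")  # (timestamp, value), both hypothetical
u2 = (2, "b")

# Same messages, different arrival order on two replicas.
order_on_replica_1 = [u1, u2]
order_on_replica_2 = [u2, u1]

diverged = apply_overwrite(order_on_replica_1) != apply_overwrite(order_on_replica_2)
converged = apply_max(order_on_replica_1) == apply_max(order_on_replica_2)
```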

Rader: Keys are distributed to different servers and actors via consistent hashing.
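As a rough picture of how keys could be mapped to actors, here is a small consistent-hash ring in Python. This is a generic sketch, not Anna's implementation: the `HashRing` class, the virtual-node count, and the actor naming scheme (`"server:actor"`) are all assumptions made for the example. The `replicas` argument shows how a hot key could be given several owners.

```python
import bisect
import hashlib


def _hash(text: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, actors, vnodes=64):
        # Each actor gets `vnodes` points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{actor}#{i}"), actor) for actor in actors for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def owners(self, key: str, replicas: int = 1):
        """Walk clockwise from the key's position, collecting distinct actors."""
        start = bisect.bisect_right(self._points, _hash(key))
        found = []
        for step in range(len(self._ring)):
            actor = self._ring[(start + step) % len(self._ring)][1]
            if actor not in found:
                found.append(actor)
                if len(found) == replicas:
                    break
        return found


actors = ["s1:a0", "s1:a1", "s2:a0", "s2:a1"]  # hypothetical server:actor ids
ring = HashRing(actors)
cold_owner = ring.owners("user:42")             # one owner for an ordinary key
hot_owners = ring.owners("user:42", replicas=3) # three owners for a hot key
```

The useful property is that removing one actor only moves the keys that actor owned; every other key keeps its owner.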

Periodic Multicast

Anna's actors perform updates against their local state in parallel without synchronizing, and periodically exchange state via multicast.

Anna employs simple eventual consistency, and threads are set to multicast every 100 milliseconds.

On a single machine, actors update their local state, then write the updates to a shared buffer and multicast the addresses of the updates in the buffer to the other actors.

Across different machines, updates need to be serialized (e.g. with Protobuf) and then broadcast over TCP.

Rader: Anna is eventually consistent, which means there is a time window during which the local states of the actors are out of sync.
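The periodic exchange and the resulting staleness window can be simulated in-process. The sketch below is a toy model under stated assumptions: the `Actor` class and its methods are hypothetical, peers are plain Python objects instead of threads or remote machines, and `multicast()` is called by hand where a real system would use a timer (100 ms in the paper's setup). Values are `(timestamp, value)` tuples merged with `max`.

```python
class Actor:
    def __init__(self, name):
        self.name = name
        self.store = {}     # key -> (timestamp, value)
        self.dirty = set()  # keys changed locally since the last multicast
        self.peers = []

    def put(self, key, ts, value):
        self._merge(key, (ts, value))
        self.dirty.add(key)

    def get(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

    def _merge(self, key, entry):
        current = self.store.get(key)
        self.store[key] = entry if current is None else max(current, entry)

    def multicast(self):
        """Send only the latest state of each dirty key; return the batch size."""
        batch = {key: self.store[key] for key in self.dirty}
        self.dirty.clear()
        for peer in self.peers:
            for key, entry in batch.items():
                peer._merge(key, entry)
        return len(batch)


a, b = Actor("a"), Actor("b")
a.peers, b.peers = [b], [a]

# Three local writes to the same key inside one multicast period.
a.put("x", 1, "v1")
a.put("x", 2, "v2")
a.put("x", 3, "v3")

stale_read = b.get("x")   # b has not heard about "x" yet
sent = a.multicast()      # the three writes collapse into one entry
fresh_read = b.get("x")
```

Here `stale_read` is `None` and `fresh_read` is `"v3"`: between the writes and the next multicast, the two actors disagree, and only the final state of `"x"` ever crosses the wire.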

Results

Better performance than shared-memory models

Anna indeed achieves wait-free execution: the vast majority of CPU time (90%) is spent processing requests without many cache misses, while overheads of lattice merge and multicast are small. In short, Anna’s Coordination-free actor model addresses the heart of the scalability limitations of multi-core KVS systems.

TBB and Masstree spend 92% - 95% of the CPU time on atomic instructions under high contention, and only 4% - 7% of the CPU time is devoted to request handling. As a result, the TBB hash map and Masstree perform 50× slower than Anna (rep= 1) and 700× slower than Anna (full replication).

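Rader: Under heavy update contention, the shared-memory model spends the vast majority of its CPU time on atomic operations. Whether the implementation is lock-based or lock-free, maintaining cache coherence is the bottleneck. Redis is single-threaded, so it does not have this problem.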

Be careful with replication

For systems that support multi-master replication, having a high replication factor under low contention workloads can hurt performance. Instead, we want to dynamically monitor the data’s contention level and selectively replicate the highly contended keys across threads.

Rader: Under low contention, choose the number of replicas carefully; too many replicas hurt performance.
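One way to picture "selectively replicate the contended keys" is a small planning function that looks at recent accesses and assigns a replication factor per key. This is purely a sketch of the idea, not something taken from the paper: the function `plan_replication`, the `hot_share` threshold, and the scaling rule are all assumptions made for illustration.

```python
from collections import Counter


def plan_replication(access_log, num_actors, hot_share=0.10):
    """Return {key: replication_factor} from a list of accessed keys.

    Assumed rule: a key taking at least `hot_share` of all accesses is
    "hot" and gets replicas in proportion to its share of the traffic
    (at least 2, at most one per actor). Every other key keeps one copy.
    """
    counts = Counter(access_log)
    total = sum(counts.values())
    plan = {}
    for key, hits in counts.items():
        share = hits / total
        if share >= hot_share:
            plan[key] = min(num_actors, max(2, round(share * num_actors)))
        else:
            plan[key] = 1
    return plan


# Hypothetical window: one key takes 90% of the traffic.
log = ["hot"] * 90 + [f"k{i}" for i in range(10)]
plan = plan_replication(log, num_actors=8)
```

In this window the hot key is spread over most of the actors, while each cold key stays on a single actor and avoids the merge and multicast overhead that extra replicas would add.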

Comparison with Redis Cluster

Anna can significantly outperform Redis Cluster by replicating hot keys under high contention, and can match the performance of Redis Cluster under low contention.

Rader: Under low contention, performance is about the same as Redis, but under high contention hot-key replication can improve it.

Refs

  • Paper about Anna
  • Blog post announcing Anna
