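Key points:
- Take full advantage of multi-core hardware
- Actor model: no shared memory, lock-free
- Each actor is pinned to one thread on one core
- Keys are assigned to different actors on different servers via consistent hashing
- Hot keys use multi-master replication and are served by multiple actors in parallel; the number of replicas has to be chosen to fit the workload
- Actors (both on the same machine and across the network) synchronize through periodic broadcasts, sending only the final state of each local update (if a key changes several times within one broadcast period)
- Concurrent modifications are merged using lattice composition; an actor merges into its local state whenever it receives a broadcast message
- Eventual consistency is guaranteed
- Compared with Redis, single-thread performance is about the same; the main advantages are scalability and concurrent serving of hot keys from multiple replicas
For more detail, read the paper linked at the end of this post; below are some excerpts from it.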
Anna is a new key-value store: a partitioned, multi-mastered system that achieves high performance and elasticity via wait-free execution and coordination-free consistency.
Our design rests on a simple architecture of coordination-free actors that perform state update via merge of lattice-based composite data structures.
Goal
providing excellent performance on a single multicore machine, while scaling up elastically to geo-distributed cloud deployment.
Requirements
- partition (shard) the key space, not only across nodes at cloud scale but also across cores for high performance
- for workload scaling, employ multi-master replication to concurrently serve puts and gets against a single key from multiple threads
- wait-free execution, meaning that each thread is always doing useful work (serving requests), and never waiting for other threads for reasons of consistency or semantics
- coordination-free consistency models
Design
Coordination-free Actors
besting state-of-the-art lock-free shared memory implementations while scaling smoothly and making repartitioning for elasticity extremely responsive.
Anna uses lattice composition to maintain the consistency of replicated state. Lattices are resilient to message re-ordering and duplication, allowing Anna to employ asynchronous multi-master replication without need for any waiting.
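To make "resilient to message re-ordering and duplication" concrete, here is a minimal sketch of lattice composition. The class names (`MaxInt`, `SetUnion`, `MapLattice`) are chosen for illustration and are not taken from Anna's code base. Each lattice has a merge that is associative, commutative, and idempotent, and the map lattice composes other lattices by merging key by key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaxInt:
    """Lattice over integers whose merge is max()."""
    value: int = 0

    def merge(self, other):
        return MaxInt(max(self.value, other.value))


@dataclass(frozen=True)
class SetUnion:
    """Lattice over sets whose merge is set union."""
    items: frozenset = frozenset()

    def merge(self, other):
        return SetUnion(self.items | other.items)


class MapLattice:
    """Composite lattice: a dict whose values are lattices, merged key by key."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})

    def merge(self, other):
        merged = dict(self.entries)
        for key, value in other.entries.items():
            merged[key] = merged[key].merge(value) if key in merged else value
        return MapLattice(merged)

    def __eq__(self, other):
        return isinstance(other, MapLattice) and self.entries == other.entries


# Three updates produced by different replicas (illustrative data).
a = MapLattice({"views": MaxInt(3), "tags": SetUnion(frozenset({"x"}))})
b = MapLattice({"views": MaxInt(7)})
c = MapLattice({"tags": SetUnion(frozenset({"y"})), "likes": MaxInt(1)})

in_order = a.merge(b).merge(c)
reordered = c.merge(a).merge(b)                      # different arrival order
duplicated = a.merge(b).merge(b).merge(c).merge(a)   # messages delivered twice

print(in_order == reordered == duplicated)
```

Because the result does not depend on arrival order or on how many times a message is delivered, a replica can merge whatever it receives without waiting for anyone else.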
Anna combines asynchronous multi-master replication with lattice-based state management to remain scalable across both low and high conflict workloads while still guaranteeing consistency.
Multi-Master Replication
In multi-master replication, a key is replicated on multiple actors, each of which can read and update its own local copy.
In a coordination-free approach, on the other hand, each actor can process a request locally without introducing any inter-actor communication on the critical path. Updates are periodically communicated to other actors when a timer is triggered or when the actor experiences a reduction in request load.
Unlike synchronous multi-master and single-master replication, a coordination-free multi-master scheme could lead to inconsistencies between replicas, because replicas may observe and process messages in different orders.
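The divergence described above can be reproduced in a few lines. This sketch is illustrative only: it assumes every put carries a unique timestamp, and it uses a last-writer-wins pair as the merge rule, which is one possible lattice and not necessarily the exact one Anna uses. The function names are hypothetical.

```python
def apply_overwrite(messages):
    """Naive replica: every incoming put overwrites the local value."""
    state = None
    for _timestamp, value in messages:
        state = value
    return state


def apply_lww_merge(messages):
    """Lattice-style replica: keep the (timestamp, value) pair that compares largest."""
    state = (-1, None)
    for message in messages:
        # Tuples compare by timestamp first; timestamps are assumed unique.
        state = max(state, message)
    return state[1]


puts = [(1, "a"), (2, "b"), (3, "c")]
order_seen_by_r1 = puts
order_seen_by_r2 = [puts[2], puts[0], puts[1]]  # same messages, different order

print(apply_overwrite(order_seen_by_r1), apply_overwrite(order_seen_by_r2))
print(apply_lww_merge(order_seen_by_r1), apply_lww_merge(order_seen_by_r2))
```

With plain overwrites the two replicas end up with different values; with the merge rule both settle on the value carrying the largest timestamp, regardless of arrival order.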
Rader: keys are distributed across different servers and actors via consistent hashing.
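As a rough illustration of that key placement, the sketch below builds a consistent hash ring whose members are (server, actor) pairs. Everything here is an assumption made for the example: the member naming scheme, the use of MD5 as the ring hash, and the number of virtual nodes. The `replicas` argument shows how a hot key could be given more than one owner.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a point on the ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, members, virtual_nodes=64):
        # Each member appears at many points so that load spreads evenly.
        self._points = sorted(
            (_hash(f"{member}#{i}"), member)
            for member in members
            for i in range(virtual_nodes)
        )
        self._hashes = [point for point, _ in self._points]

    def owners(self, key, replicas=1):
        """Walk clockwise from the key's position, collecting distinct members."""
        start = bisect.bisect_right(self._hashes, _hash(key))
        found = []
        for offset in range(len(self._points)):
            member = self._points[(start + offset) % len(self._points)][1]
            if member not in found:
                found.append(member)
                if len(found) == replicas:
                    break
        return found


# Two servers with four actors each (hypothetical names).
members = [f"server-{s}/actor-{a}" for s in range(2) for a in range(4)]
ring = HashRing(members)

print(ring.owners("user:42"))              # a single owner for an ordinary key
print(ring.owners("hot:key", replicas=3))  # three owners for a hot key
```

Adding or removing a member only moves the keys adjacent to that member's points on the ring, which is what keeps repartitioning cheap.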
Multicast Periodically
Anna's actors perform updates against their local state in parallel without synchronizing, and periodically exchange state via multicast.
Anna employs simple eventual consistency, and threads are set to multicast every 100 milliseconds.
On a single machine, actors update their local state, then write the updates to a shared buffer and multicast the address of the updates in the buffer to other actors.
Across machines, updates need to be serialized (e.g. through protobuf) and then broadcast over TCP.
Rader: Anna is eventually consistent, which means there is a time window during which the local states of the actors are out of sync.
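The exchange can be simulated without threads or timers. In this sketch one call to `multicast_round` stands in for one timer tick (100 ms in the paper's setup), values are last-writer-wins pairs with unique timestamps, and the `Actor` class and its methods are hypothetical names. It shows both the stale window and the fact that only the final state of a key is sent, however many times it changed during the period.

```python
class Actor:
    def __init__(self, name):
        self.name = name
        self.store = {}     # key -> (timestamp, value)
        self.dirty = set()  # keys changed locally since the last multicast

    def put(self, key, timestamp, value):
        self._merge(key, (timestamp, value))
        self.dirty.add(key)

    def get(self, key):
        pair = self.store.get(key)
        return None if pair is None else pair[1]

    def _merge(self, key, pair):
        # Last-writer-wins merge: keep the pair with the larger timestamp.
        current = self.store.get(key)
        if current is None or pair > current:
            self.store[key] = pair

    def flush(self):
        """Return the final state of every locally changed key, then reset."""
        updates = {key: self.store[key] for key in self.dirty}
        self.dirty.clear()
        return updates

    def receive(self, updates):
        for key, pair in updates.items():
            self._merge(key, pair)


def multicast_round(actors):
    """One timer tick: every actor sends its change set to every peer."""
    outgoing = [(actor, actor.flush()) for actor in actors]
    for sender, updates in outgoing:
        for peer in actors:
            if peer is not sender and updates:
                peer.receive(updates)
    return sum(len(updates) for _, updates in outgoing)


a1, a2 = Actor("a1"), Actor("a2")
a1.put("k", 1, "v1")
a1.put("k", 2, "v2")
a1.put("k", 3, "v3")   # three local changes to the same key
a2.put("other", 1, "x")

stale_read = a2.get("k")          # a2 has not heard about "k" yet
sent = multicast_round([a1, a2])  # one entry per changed key, not per put
fresh_read = a2.get("k")
print(stale_read, sent, fresh_read)
```

Received updates are merged but not marked dirty, so a value is not rebroadcast by every actor that hears about it.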
Results
Better performance than shared-memory models
Anna indeed achieves wait-free execution: the vast majority of CPU time (90%) is spent processing requests without many cache misses, while overheads of lattice merge and multicast are small. In short, Anna's coordination-free actor model addresses the heart of the scalability limitations of multi-core KVS systems.
TBB and Masstree spend 92% - 95% of the CPU time on atomic instructions under high contention, and only 4% - 7% of the CPU time is devoted to request handling. As a result, the TBB hash map and Masstree perform 50× slower than Anna (rep = 1) and 700× slower than Anna (full replication).
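Rader: when update conflicts are frequent, the shared-memory model spends the vast majority of its CPU time on atomic operations; whether the implementation is lock-based or lock-free, maintaining cache coherence is the bottleneck. Redis is single-threaded, so it does not have this problem.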
Be careful with replication
For systems that support multi-master replication, having a high replication factor under low contention workloads can hurt performance. Instead, we want to dynamically monitor the data's contention level and selectively replicate the highly contended keys across threads.
Rader: when contention is low, choose the number of replicas carefully; too many replicas hurt performance.
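The excerpt does not say how the contention level is turned into a replication factor, so the following is only a hypothetical policy to make the idea tangible. The function name, the per-window request counts, and the `hot_threshold` parameter are all assumptions: keys below the threshold keep a single copy, and hotter keys get more replicas, capped at the number of threads.

```python
from collections import Counter


def choose_replication(access_counts, num_threads, hot_threshold):
    """Map each key to a replica count based on how often it was accessed."""
    plan = {}
    for key, count in access_counts.items():
        if count < hot_threshold:
            plan[key] = 1  # low contention: extra replicas only add overhead
        else:
            # One extra replica per multiple of the threshold, capped by threads.
            plan[key] = min(num_threads, 1 + count // hot_threshold)
    return plan


# Requests observed during one monitoring window (illustrative data).
window = Counter(["hot"] * 900 + ["warm"] * 250 + ["cold"] * 5)
plan = choose_replication(window, num_threads=8, hot_threshold=100)
print(plan)
```

A real system would re-evaluate such a plan periodically and shrink the replica count again once a key cools down.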
Comparison with Redis Cluster
Anna can significantly outperform Redis Cluster by replicating hot keys under high contention, and can match the performance of Redis Cluster under low contention.
Rader: under low contention Anna performs about the same as Redis, but under high contention it can improve performance by replicating hot keys.
Refs
- Paper about Anna
- Blog post announcing Anna