Neo4j Causal Cluster Overview

Causal Cluster

The following is excerpted from the Causal Clustering chapter of the Neo4j Operations Manual.

Operational view

A Causal Cluster is composed of servers with two different roles: Core Servers and Read Replicas.

Core Servers

  • Core Servers’ main responsibility is to safeguard data, which they replicate among themselves via the Raft protocol.
  • Once a majority of Core Servers in a cluster (N/2 + 1) have accepted a transaction, it is safe to acknowledge the commit to the end-user application (see the worked example below).
  • Core Servers also participate in decision making about cluster topology.
  • Note that should the Core Server cluster suffer enough failures that it can no longer process writes, it becomes read-only to preserve data safety.
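
As a worked example of the N/2 + 1 rule: a cluster of 3 Core Servers needs 2 acknowledgements to commit and tolerates 1 failure, while a cluster of 5 needs 3 and tolerates 2. A minimal sketch of the arithmetic (illustrative only, not Neo4j code):

```java
// Majority (quorum) arithmetic for a Core cluster of size n.
// Neo4j performs this internally as part of Raft; shown here only to illustrate the rule.
public class Quorum {
    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            int majority = n / 2 + 1;       // acknowledgements needed to commit
            int tolerated = n - majority;   // failures the cluster can survive and still accept writes
            System.out.printf("cores=%d majority=%d tolerated=%d%n", n, majority, tolerated);
        }
    }
}
```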

Read Replicas

  • Read Replicas’ main responsibility is to scale out graph workloads (Cypher queries, procedures, and so on).
  • Each Read Replica is a fully fledged Neo4j database, capable of fulfilling arbitrary (read-only) graph queries and procedures.
  • Read Replicas are asynchronously replicated from Core Servers via transaction log shipping.
  • They are typically run in relatively large numbers and should be treated as disposable, since they do not participate in safeguarding data (a minimal configuration sketch follows this list).
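
A minimal sketch of how an instance is configured as a Read Replica, assuming Neo4j 3.x/4.x causal clustering setting names; the hostnames are placeholders:

```
# neo4j.conf — illustrative Read Replica settings (hostnames are examples)
dbms.mode=READ_REPLICA
causal_clustering.initial_discovery_members=core1:5000,core2:5000,core3:5000
```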

Causal consistency

  • Causal consistency ensures that causally related operations are seen by every instance in the system in the same order.

On executing a transaction, the client can ask for a bookmark which it then presents as a parameter to subsequent transactions. Only servers which have processed the client’s bookmarked transaction will run its next transaction.
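
A minimal sketch of bookmark chaining with the Neo4j Java driver (4.x API); the URI, credentials and queries are placeholders:

```java
import org.neo4j.driver.*;

public class CausalChainingExample {
    public static void main(String[] args) {
        // A neo4j:// routing URI lets the driver send writes to the Leader
        // and reads to Followers or Read Replicas. Host and credentials are placeholders.
        try (Driver driver = GraphDatabase.driver("neo4j://core1:7687",
                AuthTokens.basic("neo4j", "secret"))) {

            Bookmark bookmark = null;
            try (Session writeSession = driver.session()) {
                writeSession.writeTransaction(tx -> tx.run(
                        "MERGE (p:Person {name: $name})",
                        Values.parameters("name", "Alice")).consume());
                bookmark = writeSession.lastBookmark();   // causal reference point for later reads
            }

            // The read session waits until its server has applied the bookmarked transaction,
            // so this read is guaranteed to observe the write above.
            try (Session readSession = driver.session(SessionConfig.builder()
                    .withBookmarks(bookmark)
                    .withDefaultAccessMode(AccessMode.READ)
                    .build())) {
                long count = readSession.readTransaction(tx -> tx.run(
                        "MATCH (p:Person) RETURN count(p) AS c").single().get("c").asLong());
                System.out.println("persons = " + count);
            }
        }
    }
}
```

If the server handling the read has not yet applied the bookmarked transaction, it waits (up to a timeout) rather than returning stale data.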

Causal Cluster Lifecycle

This section describes the lifecycle of a Causal Cluster: discovery, joining the cluster, Core and Read Replica membership, the protocols for polling, catchup and backup, and finally leaving the cluster on shutdown.
A server joining the cluster takes in some hints about existing Core Servers and uses these hints to initiate a network join protocol (discovery).

Discovery protocol


The discovery protocol operates Core-to-Core or Read Replica-to-Core only.

On a successful handshake with another server (or servers), the current server discovers the whole current topology. The discovery information is then used to:
- maintain the current state of available servers;
- help clients route queries to an appropriate server via the client-side drivers.
A minimal configuration sketch for supplying the initial discovery hints follows.
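
Assuming the LIST discovery type and Neo4j 3.x/4.x setting names (the hostnames are examples; DNS, SRV and Kubernetes-based discovery are also available in recent versions):

```
# neo4j.conf — illustrative Core Server discovery settings
dbms.mode=CORE
causal_clustering.discovery_type=LIST
causal_clustering.initial_discovery_members=core1:5000,core2:5000,core3:5000
causal_clustering.minimum_core_cluster_size_at_formation=3
```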

Core membership

Raft handles Core cluster membership by making membership changes a normal part of keeping the distributed log in sync: a change in membership is itself replicated as a log entry.

Read replica membership

When a Read Replica performs discovery, once it has made a connection to any of the available Core Servers it proceeds to add itself to a shared whiteboard.

This shared whiteboard holds the registrations of all Read Replicas in the cluster.
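
The registered members, Cores and Read Replicas alike, can be inspected at runtime with the dbms.cluster.overview() procedure. A small sketch using the Java driver (URI and credentials are placeholders; the exact columns returned vary slightly between Neo4j versions):

```java
import org.neo4j.driver.*;

public class ClusterOverviewExample {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("neo4j://core1:7687",
                AuthTokens.basic("neo4j", "secret"));
             Session session = driver.session()) {
            // Each record describes one cluster member: its id, addresses and role(s).
            session.run("CALL dbms.cluster.overview()")
                   .list()
                   .forEach(record -> System.out.println(record.asMap()));
        }
    }
}
```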

Transacting via the Raft protocol

Once bootstrapped, each Core Server spends its time processing database transactions.
Updates are reliably replicated around Core Servers via the Raft protocol.
Updates appear in the form of a (committed) Raft log entry containing transaction commands which is subsequently applied to the graph model.
One of Raft’s primary design goals is to be easily understandable so that there are fewer places for tricky bugs to hide in implementations.

  • The Raft Leader for the current term (a logical clock) appends the transaction (an ‘entry’ in Raft terminology) to the head of its local log and asks the other instances to do the same.
  • When the Leader can see that a majority of instances have appended the entry, it can be considered committed into the Raft log (a simplified commit-flow sketch follows this list).
  • The client application can now be informed that the transaction has safely committed, since there is sufficient redundancy in the system to tolerate any (non-pathological) faults.
  • Only one Leader is able to make forward progress in any given term.
  • The Leader bears the responsibility for imposing order on Raft log entries and driving the log forward with respect to the Followers.
  • Followers maintain their logs with respect to the current Leader’s log.
  • Should any participant in the cluster suspect that the Leader has failed, it can instigate a leadership election by entering the Candidate state. In Neo4j Core Servers this happens on a millisecond timescale, around 500 ms by default.
  • During an election, the “best” candidate for Leader is decided by highest term, then by longest log, then by highest committed entry.
  • The protocol can rapidly piece together which of the remaining instances is best placed to take over from the failed instance (or instances) without data loss. This is the essence of a non-blocking consensus protocol, which allows Neo4j Causal Clustering to provide continuous availability to applications.
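
A much-simplified sketch of the leader-side commit rule described above. It is illustrative only, not Neo4j's implementation: terms, log matching, retries and persistence are all ignored.

```java
import java.util.List;

// Toy model of the Raft commit rule: an entry is committed once a majority of
// cluster members (the Leader included) have appended it to their logs.
public class ToyRaftLeader {

    interface Follower {
        boolean appendEntry(String entry);   // true means the follower acknowledged the append
    }

    private final List<Follower> followers;

    ToyRaftLeader(List<Follower> followers) {
        this.followers = followers;
    }

    /** Returns true once the entry is safely committed, i.e. a majority has appended it. */
    boolean replicate(String entry) {
        int clusterSize = followers.size() + 1;   // followers plus this Leader
        int majority = clusterSize / 2 + 1;
        int acks = 1;                             // the Leader appends to its own log first
        for (Follower follower : followers) {
            if (acks >= majority) {
                break;                            // already committed; no need for more acks
            }
            if (follower.appendEntry(entry)) {
                acks++;
            }
        }
        return acks >= majority;                  // majority reached: safe to acknowledge the client
    }

    public static void main(String[] args) {
        // Three-member cluster: this Leader plus two followers, one of which is unreachable.
        ToyRaftLeader leader = new ToyRaftLeader(List.of(entry -> true, entry -> false));
        System.out.println(leader.replicate("tx-42"));   // true: 2 of 3 members appended the entry
    }
}
```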

Catchup protocol


Transaction shipping is instigated by Read Replicas frequently polling any of the Core Servers, specifying the ID of the last transaction they received and processed.
If the difference between a Read Replica’s transaction history and that of a Core Server is too large, the catchup protocol falls back to copying the database store directly from the Core Server to the Read Replica.
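
A rough sketch of that polling decision; all type names and the threshold are invented purely to illustrate the flow, not Neo4j's actual classes:

```java
// Toy model of a Read Replica's catchup loop: poll a Core Server with the ID of the last
// transaction applied locally, pull newer transactions if the gap is small, and fall back
// to a full store copy otherwise.
public class ToyCatchupClient {

    interface CoreServer {
        long latestTransactionId();
        long pullTransactions(long lastAppliedTxId); // stream and apply newer transactions, return new last-applied ID
        long copyStore();                            // copy the whole store, return the transaction ID it is consistent with
    }

    private static final long STORE_COPY_THRESHOLD = 10_000; // invented lag threshold

    private long lastAppliedTxId;

    long poll(CoreServer core) {
        long upstream = core.latestTransactionId();
        if (upstream - lastAppliedTxId > STORE_COPY_THRESHOLD) {
            lastAppliedTxId = core.copyStore();                       // too far behind: replace the store
        } else if (upstream > lastAppliedTxId) {
            lastAppliedTxId = core.pullTransactions(lastAppliedTxId); // ship only the missing transactions
        }
        return lastAppliedTxId;
    }
}
```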
