Kafka provides a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. Kafka differs from traditional messaging systems in that:
It's designed as a distributed system that's easy to scale out.
It persists messages on disk and thus can be used for batched consumption.
It offers high throughput for both publishing and subscribing.
It supports multiple subscribers and automatically rebalances the consumers when a failure occurs.
With replication, Kafka clients will get the following benefits:
A producer can continue to publish messages during a failure, and it can trade off latency against durability, depending on the application.
A consumer continues to receive the correct messages in real time, even when there is a failure.
All distributed systems must make trade-offs between guaranteeing consistency, availability, and partition tolerance (CAP Theorem).
Our goal was to support replication in a Kafka cluster within a single datacenter, where network partitioning is rare, so our design focuses on maintaining highly available and strongly consistent replicas.
Strong consistency means that all replicas are byte-for-byte identical, which simplifies the job of an application developer.
There are two typical approaches to maintaining strongly consistent replicas.
Both require one of the replicas to be designated as the leader, to which all writes are issued.
The leader is responsible for ordering all incoming writes, and for propagating those writes to other replicas (followers), in the same order.
The first approach is quorum-based. The leader waits until a majority of replicas have received the data before the data is considered safe (i.e., committed). On leader failure, a new leader is elected through the coordination of a majority of the followers. This approach is used in Apache ZooKeeper and Google's Spanner.
The second approach is for the leader to wait for "all" (to be clarified later) replicas to receive the data. When the leader fails, any other replica can then take over as the new leader.
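The two commit rules can be contrasted in a few lines. This is only an illustrative sketch: the function names are hypothetical, and it reads "all" literally as every replica of the partition, whereas the text defers the precise meaning of "all" to later.

```python
def is_committed_quorum(acks: int, replicas: int) -> bool:
    """Quorum rule: committed once a strict majority of replicas hold the write."""
    return acks > replicas // 2


def is_committed_all(acks: int, replicas: int) -> bool:
    """All-replicas rule: committed only once every replica holds the write.

    Assumption: "all" is taken literally here as the full replica set.
    """
    return acks == replicas


# With 3 replicas and 2 acknowledgements (the leader's own copy counts as one),
# the quorum rule commits while the all-replicas rule keeps waiting.
quorum_ok = is_committed_quorum(acks=2, replicas=3)
all_ok = is_committed_all(acks=2, replicas=3)
```

Under the second rule every replica that acknowledged a committed write already holds it, which is why any surviving replica can take over as leader without a majority vote.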
We selected the second approach for Kafka replication for two primary reasons:
The second approach can tolerate more failures with the same number of replicas. That is, it can tolerate f failures with f+1 replicas, while the first approach often only tolerates f failures with 2f+1 replicas. For example, with only 2 replicas a majority is 2, so the first approach can't tolerate any failures.
While the first approach generally has better latency, as it hides the delay from a slow replica, our replication is designed for a cluster within the same datacenter, so variance due to network delay is small.
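The replica-count arithmetic behind the first reason can be checked directly. The helper names below are hypothetical and the functions simply encode the f+1 and 2f+1 relationships stated above.

```python
def failures_tolerated_quorum(replicas: int) -> int:
    """Quorum approach: n = 2f + 1 replicas tolerate f failures, so f = (n - 1) // 2."""
    return (replicas - 1) // 2


def failures_tolerated_all(replicas: int) -> int:
    """All-replicas approach: n = f + 1 replicas tolerate f failures, so f = n - 1."""
    return replicas - 1


# Compare both approaches for a few small replica counts.
comparison = {
    n: (failures_tolerated_quorum(n), failures_tolerated_all(n))
    for n in (2, 3, 5)
}
for n, (quorum_f, all_f) in comparison.items():
    print(f"replicas={n}: quorum tolerates {quorum_f}, all-replicas tolerates {all_f}")
```

For 2 replicas the quorum approach tolerates 0 failures and the all-replicas approach tolerates 1, which matches the example in the text.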
To understand how replication is implemented in Kafka, we need to first introduce some basic concepts.
In Kafka, a message stream is defined by a topic, which is divided into one or more partitions.
Replication happens at the partition level and each partition has one or more replicas.
The replicas are assigned evenly to different servers (called brokers) in a Kafka cluster. Each replica maintains a log on disk. Published messages are appended sequentially in the log and each message is identified by a monotonically increasing offset within the log.
The offset is a logical concept within a partition. Given an offset, the same message can be identified in each replica of the partition. When a consumer subscribes to a topic, it keeps track of an offset in each partition for consumption and uses it to issue fetch requests to the broker.
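The relationship between a replica's log, offsets, and a consumer's position can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: `ReplicaLog`, `append`, and `fetch` are hypothetical names rather than Kafka's API, the log lives in a Python list instead of on disk, and the offset is modelled as a simple message index.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ReplicaLog:
    """Toy append-only log for one replica of one partition (in memory, not on disk)."""
    messages: List[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        """Append a message and return its offset (here: its index in the log)."""
        offset = len(self.messages)
        self.messages.append(message)
        return offset

    def fetch(self, offset: int, max_messages: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_messages (offset, message) pairs starting at offset."""
        batch = self.messages[offset:offset + max_messages]
        return [(offset + i, m) for i, m in enumerate(batch)]


# Two replicas apply the same writes in the same order, so a given offset
# identifies the same message in both logs.
leader, follower = ReplicaLog(), ReplicaLog()
for payload in (b"a", b"b", b"c"):
    leader.append(payload)
    follower.append(payload)

# A consumer tracks the next offset to read and advances it after each fetch.
next_offset = 0
batch = leader.fetch(next_offset, max_messages=2)
next_offset = batch[-1][0] + 1
```

Because the consumer's position is just an offset, it could issue the next fetch against any replica holding the same log and receive the same messages.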