An Introduction to Kafka, a Distributed Publish-Subscribe Messaging System



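The official Kafka website defines Kafka as "A distributed publish-subscribe messaging system". Publish-subscribe means publishing messages and subscribing to them, so, more precisely, Kafka is a system for publishing and subscribing to messages. The publish-subscribe concept matters because it is the natural starting point for explaining Kafka's design.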

What features make Kafka attractive to programmers?
The Apache website describes them as follows:
1. Fast (high throughput: a single Kafka broker can read and write hundreds of MB of data per second)
A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.
2. Scalable (can be expanded without downtime)
Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers.
3. Durable (messages are persisted)
Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
4. Distributed by Design (built as a distributed system)
Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
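The "Scalable" item says data streams are partitioned and spread over a cluster of machines. The following minimal Python sketch illustrates that idea, assuming a topic with a fixed number of partitions placed on brokers round-robin. `partition_for` and `assign_partitions` are hypothetical helpers, and CRC32 is used only for illustration (Kafka's Java client hashes keys with murmur2 by default, and real partition placement also accounts for replicas).

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Hypothetical helper: map a message key to a partition number.

    CRC32 is an illustrative choice, not Kafka's actual partitioner.
    The same key always lands on the same partition.
    """
    return zlib.crc32(key) % num_partitions


def assign_partitions(num_partitions: int, brokers: list[str]) -> dict[int, str]:
    """Hypothetical helper: spread partitions over brokers round-robin.

    Simplified: real Kafka also places replicas of each partition.
    """
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


brokers = ["broker-0", "broker-1", "broker-2"]
layout = assign_partitions(6, brokers)
for key in [b"user-1", b"user-2", b"user-1"]:
    p = partition_for(key, 6)
    print(key.decode(), "-> partition", p, "on", layout[p])
```

Because the same key always maps to the same partition in this sketch, messages sharing a key stay in order relative to each other, while different keys spread the load across brokers.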


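For now, let's call the side that publishes messages the producer, the side that subscribes to them the consumer, and the storage cluster in between the broker. This gives us a rough picture like this: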
[Figure 1: producer → broker → consumer]

The producer generates data and hands it to the broker for storage. When the consumer needs data, it fetches the data from the broker and then processes it.
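The producer, broker, consumer flow just described can be sketched with a toy in-memory broker. This is an illustration only, not Kafka's API: `Broker`, `store`, `fetch`, `produce` and `consume` are hypothetical names, and "processing" is simply uppercasing each record.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker: keeps each topic's messages in arrival order."""

    def __init__(self) -> None:
        self.topics: dict[str, list[str]] = defaultdict(list)

    def store(self, topic: str, message: str) -> None:
        self.topics[topic].append(message)

    def fetch(self, topic: str, start: int) -> list[str]:
        # Return every stored message from position `start` onward.
        return self.topics[topic][start:]


def produce(broker: Broker, topic: str, records: list[str]) -> None:
    """Producer side: hand each record to the broker for storage."""
    for record in records:
        broker.store(topic, record)


def consume(broker: Broker, topic: str) -> list[str]:
    """Consumer side: fetch the records from the broker, then process them."""
    return [record.upper() for record in broker.fetch(topic, 0)]


broker = Broker()
produce(broker, "page-views", ["home", "cart", "checkout"])
print(consume(broker, "page-views"))  # ['HOME', 'CART', 'CHECKOUT']
```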

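At first glance this seems far too simple. Isn't Kafka supposed to be distributed? Surely putting the producer, broker, and consumer on three different machines doesn't by itself make the system distributed. Let's look at the diagram from the official Kafka documentation: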

[Figure 2: Kafka architecture diagram from the official documentation]

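Multiple brokers work together, producers and consumers are embedded in the various business applications that call them frequently, and ZooKeeper coordinates all three, managing requests and forwarding. Together these make up a high-performance distributed publish-subscribe messaging system. One detail in the diagram deserves attention: data moves from producer to broker by push, meaning the producer sends data to the broker as soon as it has any, whereas data moves from broker to consumer by pull, meaning the consumer actively fetches data instead of the broker sending it to the consumer.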
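The push/pull asymmetry can be illustrated with a small simulation. In this sketch, assuming one topic modeled as an append-only list, the producer pushes messages into the log, while each consumer pulls batches whenever it chooses and remembers its own read position (offset). `Log`, `PullConsumer`, `push`, `read` and `poll` are hypothetical names, not Kafka client APIs.

```python
class Log:
    """Toy append-only log standing in for one topic on a broker."""

    def __init__(self) -> None:
        self.messages: list[str] = []

    def push(self, message: str) -> None:
        # Producer side: data is pushed to the broker as soon as it exists.
        self.messages.append(message)

    def read(self, offset: int, max_count: int) -> list[str]:
        # The broker only answers read requests; it never sends data on its own.
        return self.messages[offset:offset + max_count]


class PullConsumer:
    """Consumer that pulls batches at its own pace and tracks its own offset."""

    def __init__(self, log: Log, batch_size: int) -> None:
        self.log = log
        self.batch_size = batch_size
        self.offset = 0  # index of the next message this consumer will read

    def poll(self) -> list[str]:
        batch = self.log.read(self.offset, self.batch_size)
        self.offset += len(batch)
        return batch


log = Log()
for i in range(5):
    log.push(f"msg-{i}")

fast = PullConsumer(log, batch_size=5)
slow = PullConsumer(log, batch_size=2)
print(fast.poll())  # ['msg-0', 'msg-1', 'msg-2', 'msg-3', 'msg-4']
print(slow.poll())  # ['msg-0', 'msg-1']
print(slow.poll())  # ['msg-2', 'msg-3']
```

In this model a slow consumer is never flooded with data it cannot handle; it simply falls behind and catches up later, and several consumers can read the same messages independently.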


The overall sequence in which the system runs:

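1. Start the ZooKeeper server.

2. Start the Kafka server (broker), which registers itself with ZooKeeper.

3. When a producer has produced data, it first locates a broker through ZooKeeper and then stores the data on that broker.

4. When a consumer wants to consume data, it first locates the corresponding broker through ZooKeeper and then consumes the data from it.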
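The four steps can be mimicked with a toy registry standing in for ZooKeeper. This is a simplified sketch, not ZooKeeper's or Kafka's API: `Registry`, `register_broker`, `lookup_broker`, `start_broker`, `produce` and `consume` are hypothetical names, and the registry is just a dictionary whose keys imitate the `/brokers/ids/<id>` paths used by ZooKeeper-based Kafka. Newer Kafka clients are instead configured with a `bootstrap.servers` list and obtain broker metadata from the brokers themselves.

```python
class Registry:
    """Stand-in for ZooKeeper: maps registration paths to broker addresses."""

    def __init__(self) -> None:
        self.nodes: dict[str, str] = {}

    def register_broker(self, broker_id: int, address: str) -> None:
        # Step 2: a broker registers itself when it starts.
        self.nodes[f"/brokers/ids/{broker_id}"] = address

    def lookup_broker(self) -> str:
        # Steps 3 and 4: clients ask the registry where a broker is.
        if not self.nodes:
            raise RuntimeError("no broker has registered yet")
        return self.nodes[min(self.nodes)]  # pick one broker deterministically


def start_broker(registry: Registry, storage: dict[str, list[str]],
                 broker_id: int, address: str) -> None:
    storage[address] = []  # the broker's local message store
    registry.register_broker(broker_id, address)


def produce(registry: Registry, storage: dict[str, list[str]], message: str) -> str:
    address = registry.lookup_broker()  # find a broker first
    storage[address].append(message)    # then store the data there
    return address


def consume(registry: Registry, storage: dict[str, list[str]]) -> list[str]:
    address = registry.lookup_broker()  # find the broker first
    return list(storage[address])       # then read its data


registry = Registry()                               # step 1: start the "ZooKeeper" server
storage: dict[str, list[str]] = {}
start_broker(registry, storage, 0, "host-a:9092")   # step 2: start and register a broker
produce(registry, storage, "hello")                 # step 3: producer stores data
print(consume(registry, storage))                   # step 4: ['hello']
```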



