Kafka开发实战(一)-入门篇

概述

1.简介

Kafka官网介绍如下:

Apache Kafka is publish-subscribe messaging rethought as a distributed
commit log.

Apache Kafka 是一个高吞吐量分布式消息系统,由LinkedIn开源。

“publish-subscribe” 是Kafka设计的核心思想,也是Kafka最具特色的地方。在Kafka的生态系统中 publish是一个producer的角色,subscribe是consumer的角色,就像我们生活中的一样,生产商生产出来的产品,消费者一般不能够直接去工厂购买,还需要一个代理经销商,所以同样的在Kafka的生态系统中,有一个broker的角色。所以kafka的生态系统大致可以表述如下:
“producer —> broker <— consumer”

摘录一段Kafka官网Introduction中的一段话

Kafka is a distributed, partitioned, replicated commit log service. It
provides the functionality of a messaging system, but with a unique
design. What does all that mean?

First let’s review some basic messaging terminology:

  • Kafka maintains feeds of messages in categories called topics.
  • We’ll call processes that publish messages to a Kafka topic producers.
  • We’ll call processes that subscribe to topics and process the feed of published messages consumers.
  • Kafka is run as a cluster comprised of one or more servers each of which is called a broker.

So, at a high level, producers send messages over the network to the
Kafka cluster which in turn serves them up to consumers like this:

Kafka开发实战(一)-入门篇_第1张图片

Kafka提供了类似于JMS的特性,但是在设计实现上完全不同。kafka对消息流根据topic进行归类,发送消息者称为Producer,订阅消息者称为Consumer,此外Kafka集群有一个或多个server组成,每个server称为broker。

2.Topics and Logs
A topic is a category or feed name to which messages are published. For each topic, the Kafka cluster maintains a partitioned log that looks like this:
Kafka开发实战(一)-入门篇_第2张图片

Each partition is an ordered, immutable sequence of messages that is continually appended to—a commit log. The messages in the partitions are each assigned a sequential id number called the offset that uniquely identifies each message within the partition.

3.Distribution

The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance.

4.Producers

Producers publish data to the topics of their choice. The producer is responsible for choosing which message to assign to which partition within the topic. This can be done in a round-robin fashion simply to balance load or it can be done according to some semantic partition function (say based on some key in the message). More on the use of partitioning in a second.

5.Consumers

Messaging traditionally has two models: queuing and publish-subscribe. In a queue, a pool of consumers may read from a server and each message goes to one of them; in publish-subscribe the message is broadcast to all consumers. Kafka offers a single consumer abstraction that generalizes both of these—the consumer group.

优缺点

1.优点

  • 高性能和高吞吐率
  • 支持消息无限堆积,消息存储在磁盘上,存取复杂度为O(1)
  • 支持多分区,同一个partition内保证消息顺序
  • 分布式,易于扩展

2.缺点

  • Kafka不保证严格的消息顺序。
  • 消息可能有丢失,broker没有副本机制,一旦broker宕机,该broker的消息将都不可用。
  • 没有消息确认机制,哪些消息已消费需由consumer维护

使用场景

Kafka作为一个优秀的消息中间件对于一些常规的消息系统是一个不错的选择,另外partitons/replication和容错,可以使Kafka具有良好的扩展性和性能优势。Kafka广泛应用在日志分析系统、网站浏览数据追踪等等。不过到目前为止,我们应该很清楚认识到,Kafka并没有提供JMS中的”事务性”、”消息传输担保(消息确认机制)”等企业级特性;在这方面RabbitMQ和RocketMQ 更胜一筹。

关于Kafka详细介绍请阅读Kafka官网introduction。

参考资料:
kafka入门:简介、使用场景、设计原理、主要配置及集群搭建
RocketMQ与Kafka对比(18项差异)

你可能感兴趣的:(kafka,MQ)