RabbitMQ vs Kafka

Original source: http://www.quora.com/RabbitMQ-vs-Kafka-which-one-for-durable-messaging-with-good-query-features

RabbitMQ vs Kafka: which one for durable messaging with good query features?


Options which I've already dropped:
  -- Redis
  -- Kestrel
  -- HornetQ or any other JMS-based message queues

Why have I ended up with these two options, RabbitMQ and Kafka?
  1. Cross-platform  
  2. Mature and production ready  
  3. High throughput and officially supported clustering  
  4. Strict ordering of messages (minus Kestrel here)  
  5. Native durability and persistence for messages  
  6. Advanced filtering and querying for messages

Bottom line: I have many different messaging scenarios where I will need durability, persistence and batch processing for the stored messages. Which one should I choose?

Kafka is a general purpose message broker, like RabbitMQ, with similar distributed deployment goals, but with very different assumptions on message model semantics.   I would be skeptical of the "AMQP is more mature" argument and look at the facts of how either solution solves your problem.

TL;DR, 

a) Use Kafka if you have a fire hose of events (100k+/sec) you need delivered in partitioned order 'at least once' with a mix of online and batch consumers, you want to be able to re-read messages, you can deal with current limitations around node-level HA (or can use trunk code), and/or you don't mind supporting incubator-level software yourself via forums/IRC.  

b) Use Rabbit if you have messages (20k+/sec) that need to be routed in complex ways to consumers, you want per-message delivery guarantees, you don't care about ordered delivery, you need HA at the cluster-node level now, and/or you need 24x7 paid support in addition to forums/IRC.

Neither offers great "filter/query" capabilities - if you need that, consider using Storm on top of one of these solutions to add computation, filtering, querying, on your streams.   Or use something like Cassandra as your queryable cache.   Kafka is also definitely not "mature" even though it is "production ready". 

Details (caveat - my opinion, I've not used either in great anger, and I have more exposure to RabbitMQ)

Firstly, on RabbitMQ vs. Kafka.   Both are excellent solutions, RabbitMQ being the more mature, but they have very different design philosophies.    Fundamentally, I'd say RabbitMQ is broker-centric, focused around delivery guarantees between producers and consumers, with transient messages preferred over durable ones.   Kafka, by contrast, is producer-centric, based around partitioning a fire hose of event data into durable message brokers with cursors, supporting batch consumers that may be offline, or online consumers that want messages at low latency.  

RabbitMQ uses the broker itself to maintain state of what's consumed (via message acknowledgements) - it uses Erlang's Mnesia to maintain delivery state around the broker cluster.  Kafka doesn't have message acknowledgements; it assumes the consumer keeps track of what's been consumed so far.   Both Kafka brokers & consumers use Zookeeper to reliably maintain their state across a cluster.
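
The difference in who holds the consumption state can be sketched with two toy in-memory classes. This is only an illustration in plain Python, not the RabbitMQ or Kafka client API; the names `AckQueue` and `Log` and all their methods are made up for this sketch.

```python
from collections import deque


class AckQueue:
    """Toy broker-tracked queue: the broker remembers what is still unacked."""

    def __init__(self):
        self._ready = deque()
        self._unacked = {}
        self._next_tag = 0

    def publish(self, msg):
        self._ready.append(msg)

    def get(self):
        # Hand out a message and remember it until the consumer acks it.
        msg = self._ready.popleft()
        self._next_tag += 1
        self._unacked[self._next_tag] = msg
        return self._next_tag, msg

    def ack(self, tag):
        # Once acked, the broker forgets the message entirely.
        del self._unacked[tag]

    def requeue_unacked(self):
        # E.g. the consumer died: unacked messages become ready again,
        # put back at the front in their original order.
        for tag in sorted(self._unacked, reverse=True):
            self._ready.appendleft(self._unacked.pop(tag))


class Log:
    """Toy append-only log: the broker keeps messages, consumers keep cursors."""

    def __init__(self):
        self._entries = []

    def append(self, msg):
        self._entries.append(msg)

    def read(self, offset, max_messages=10):
        # The caller supplies the offset; the log holds no per-consumer state.
        return self._entries[offset:offset + max_messages]


# Usage: with the log, re-reading is just a matter of resetting the cursor.
log = Log()
for event in ["x", "y", "z"]:
    log.append(event)
offset = 0
batch = log.read(offset)
offset += len(batch)      # the consumer, not the broker, advances the cursor
replay = log.read(0)      # same messages again, nothing was deleted
```

With the queue model a message that has been acked is gone, whereas with the log model any consumer can rewind, which is what makes the "re-read messages" point in the TL;DR possible.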

RabbitMQ presumes that consumers are mostly online, and any messages "in wait" (persistent or not) are held opaquely (i.e. no cursor).  RabbitMQ pre-2.0 (2010) would fall over if your consumers were too slow, but now it's robust for online and batch consumers - but clearly large amounts of persistent messages sitting in the broker were not the main design case for AMQP in general.   Kafka was designed from the beginning around both online and batch consumers, and also has producer message batching - it's designed for holding and distributing large volumes of messages.  

RabbitMQ provides rich routing capabilities with AMQP 0.9.1's exchange, binding and queuing model.   Kafka has a very simple routing approach - in AMQP parlance it uses topic exchanges only.  
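
To give a feel for what the exchange/binding model offers, here is a small sketch of how the three common AMQP 0.9.1 exchange types decide which queues receive a message. It is a simplified model written for this comparison, not broker code: the function names are hypothetical, and a real broker would also deliver a message only once to a queue that matches through several bindings. In topic patterns, `*` stands for exactly one dot-separated word and `#` for zero or more words.

```python
def topic_matches(pattern, routing_key):
    """Return True if an AMQP-style topic pattern matches a routing key."""
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i, j):
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more words.
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j == len(k):
            return False
        return (p[i] == "*" or p[i] == k[j]) and match(i + 1, j + 1)

    return match(0, 0)


def route(exchange_type, bindings, routing_key):
    """Return the queues reached; bindings is a list of (binding_key, queue)."""
    if exchange_type == "fanout":
        return [queue for _, queue in bindings]
    if exchange_type == "direct":
        return [queue for key, queue in bindings if key == routing_key]
    if exchange_type == "topic":
        return [queue for key, queue in bindings
                if topic_matches(key, routing_key)]
    raise ValueError("unknown exchange type: " + exchange_type)


# Usage with made-up binding keys and queue names.
bindings = [
    ("orders.*.created", "billing"),
    ("orders.#", "audit"),
    ("payments.#", "ledger"),
]
reached = route("topic", bindings, "orders.eu.created")
```

Here `reached` contains `billing` and `audit` but not `ledger`; swapping the exchange type changes the delivery pattern without touching producers or consumers.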

Both solutions run as distributed clusters, but RabbitMQ's philosophy is to make the cluster transparent, as if it were a virtual broker.   Kafka makes it explicit, by forcing the producer to know it is partitioning a topic's messages across several nodes.   This has the benefit of preserving ordered delivery within a partition, which is richer than what RabbitMQ exposes, which is almost always unordered delivery (the AMQP 0.9.1 model says "one producer channel, one exchange, one queue, one consumer channel" is required for in-order delivery).   
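
The ordering guarantee within a partition can be illustrated with a key-based partitioner. The sketch below is an assumption-laden toy: it hashes a key with CRC32 to pick a partition, which is one simple stable choice and not necessarily what Kafka's own partitioner does, and `partition_for` and `produce` are hypothetical names.

```python
import zlib


def partition_for(key, num_partitions):
    # A stable hash, so the same key always maps to the same partition.
    # (Python's built-in hash() is randomized per process for strings.)
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(events, num_partitions):
    """Append (key, value) events to per-partition lists in arrival order."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


# Usage: interleaved events for two hypothetical users.
events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "add-to-cart"),
    ("user-2", "logout"),
    ("user-1", "checkout"),
]
partitions = produce(events, num_partitions=4)
```

All events for a given key end up in a single partition in the order they were produced, so a consumer of that partition sees them in order. There is no ordering guarantee between different partitions.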

Put another way, Kafka presumes that producers generate a massive stream of events on their own timetable - there's no room for throttling producers because consumers are slow, since the data is too massive.  The whole job of Kafka is to provide the "shock absorber" between the flood of events and those who want to consume them in their own way -- some online, others offline - only batch consuming on an hourly or even daily basis.   Kafka can deliver "at least once" semantics per partition (since it maintains delivery order), just like RabbitMQ, but it does it in a very different way.  
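
One common way to get "at least once" behaviour with a consumer-held cursor is to advance the committed offset only after a message has been handled. The following is a toy sketch under that assumption; `consume` and its `crash_at` parameter are hypothetical, a plain list stands in for a partition, and none of this is Kafka client code.

```python
def consume(log, committed, handler, crash_at=None):
    """Handle entries starting at `committed`; return the new committed offset.

    `crash_at` simulates a consumer dying right after handling the entry at
    that offset but before committing it.
    """
    while committed < len(log):
        handler(log[committed])      # the side effect happens first
        if committed == crash_at:
            return committed         # crashed: handled but never committed
        committed += 1               # commit only after successful handling
    return committed


# Usage: a crash after handling "e1" causes "e1" to be handled again.
partition = ["e0", "e1", "e2"]
handled = []
committed = consume(partition, 0, handled.append, crash_at=1)
committed = consume(partition, committed, handled.append)  # restart
```

After the restart `handled` is `["e0", "e1", "e1", "e2"]`: nothing is lost and partition order is kept, but "e1" is seen twice, so handlers need to tolerate duplicates.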

Performance-wise, if you require ordered durable message delivery, currently it looks like there's no comparison:  Kafka currently blows away RabbitMQ in terms of performance on synthetic benchmarks.   This paper indicates 500,000 messages published per second and 22,000 messages consumed per second on a 2-node cluster with 6-disk RAID 10.  
http://research.microsoft.com/en...

Of course this was written by the LinkedIn guys without necessarily expert RabbitMQ input, so YMMV.

Finally, a reminder:  Kafka is an early Apache incubator project.  It doesn't necessarily have all the hard-learned aspects in RabbitMQ. 

Now, a word on AMQP.   Frankly, it seems the standard is a mess.   Officially there is a 1.0 proposed specification that is going through the OASIS standards process.  In practice it is a forked standard, one (0.9.1) supported by vendors, the other (1.0) supported by the working group.    A set of generally available, widely-adopted, production-quality AMQP 1.0 implementations across the major releases (Qpid from Red Hat, RabbitMQ, etc.) won't exist until 2013, if ever.

As an external observer with no inside knowledge, here is what it looks like:   the working group spent 5 years on a spec, from 2003 to 2008, culminating in a widely adopted release (0.9.1).   Then a subset of more powerful working group members rewrote the spec by late 2011, completely shifting the focus of the spec from a messaging model to a transport protocol (sort of like TCP++), and declared it 1.0.    So, we have the strange case where the "mature" AMQP is the non-standard 0.9.1 specification and the "immature" AMQP is the actual 1.0 standard.    

This isn't to suggest 1.0 isn't good technology, it likely is, but that it's a much lower-level spec than AMQP was intended to be for most of its published life, and is not widely supported yet beyond prototypes and one GA implementation that I know of (IIT SwiftMQ). The RabbitMQ folks have a prototype that layers the 0.9.1 model on top of 1.0 but have not committed to a GA timeframe.

So, in my opinion, AMQP has lost some of its sheen: while there's ample evidence it is interoperable from the various connect-fests over the years, the standards politics have delayed the official standard and called into question its widespread support.   On the bright side, one can argue that AMQP has already succeeded in its goal of helping to break the hold TIBCO had on high performance, low latency messaging through 2007 or so.   Now there are many options.  Bet on the broker you choose to use, and don't expect bug-free interoperability for a few years (if ever).
