Kafka and Samza: Real-time stream processing

As we know, we have already learned two approaches to big data analysis [1]:


[Image 1]



Batch processing is MapReduce, and iterative processing is Spark. The two have one thing in common: they process a fixed data set. Once the processing job starts, the input data cannot change at all. This is a disadvantage for real-time data analysis.


For real-time analysis, we now introduce stream processing. Here is the concept of stream processing [1]:


[Image 2]
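To make the difference concrete, here is a minimal pure-Python sketch. It does not use Kafka, Samza, MapReduce, or Spark; the function names batch_average and streaming_average are made up for illustration. The batch version needs the whole input before it can answer, while the streaming version updates its answer after every message.

```python
from typing import Iterable, Iterator, List


def batch_average(values: List[float]) -> float:
    """Batch style: the whole input is fixed and known before the job starts."""
    return sum(values) / len(values)


def streaming_average(values: Iterable[float]) -> Iterator[float]:
    """Stream style: emit an updated result after every arriving message."""
    count = 0
    total = 0.0
    for value in values:  # the source may never end
        count += 1
        total += value
        yield total / count


if __name__ == "__main__":
    readings = [10.0, 20.0, 60.0]
    print(batch_average(readings))            # one answer, after all input is read
    print(list(streaming_average(readings)))  # one answer per message
```

The streaming version only keeps a running count and total, so it can run over an input that never ends.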


In our Kafka + Samza setup, Samza is the processing framework. Kafka only serves as the source, organising streams as topics and messages. Now let's look at the details.


[Image 3]


Here are some concepts in Kafka:


[Image 4]
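Since the original figure is not available here, the following toy model may help with the vocabulary. It is a sketch under my own assumptions, not the Kafka client API: the Topic class and its send and read methods are hypothetical, and zlib.crc32 is only a stand-in for the hash a real producer would use. It shows a topic as a set of append-only partitions, where every message gets an offset inside its partition.

```python
import zlib
from typing import List, Tuple


class Topic:
    """Toy model of a topic: a fixed number of append-only partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Tuple[str, str]]] = [
            [] for _ in range(num_partitions)
        ]

    def send(self, key: str, value: str) -> Tuple[int, int]:
        """Append a message and return its (partition, offset)."""
        # Assumption: a stable hash of the key picks the partition, so
        # messages with the same key stay in order inside one partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition: int, offset: int) -> List[Tuple[str, str]]:
        """Return every message in a partition starting at the given offset."""
        return self.partitions[partition][offset:]


if __name__ == "__main__":
    clicks = Topic("clicks", num_partitions=3)
    print(clicks.send("user-1", "page A"))
    print(clicks.send("user-1", "page B"))
```

A consumer in this model only has to remember one number per partition, the next offset to read, in order to resume where it left off.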


Here are some basic concepts about Samza: 


[Image 5]


NM = NodeManager; RM = ResourceManager (both are YARN components).


Here is a typical Samza job:


[Image 6]


In general, one task in Samza corresponds to one consumer in Kafka, and each of the task's input streams is one partition of a Kafka topic.
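The task-to-partition mapping can be sketched in plain Python. This is an illustration only, not the Samza API: WordCountTask and run_job are hypothetical names, and the process method is only loosely modelled on the idea of a per-message callback. The sketch creates one task per partition and feeds each task only the messages of its own partition.

```python
from collections import Counter
from typing import Dict, List


class WordCountTask:
    """Hypothetical task: owns one partition and keeps its own local state."""

    def __init__(self, partition: int) -> None:
        self.partition = partition
        self.counts: Counter = Counter()

    def process(self, message: str) -> None:
        # Called once per message, in partition order.
        self.counts.update(message.split())


def run_job(partitions: List[List[str]]) -> Dict[int, WordCountTask]:
    """Create one task per partition and feed each task only its own partition."""
    tasks = {p: WordCountTask(p) for p in range(len(partitions))}
    for p, messages in enumerate(partitions):
        for message in messages:
            tasks[p].process(message)
    return tasks


if __name__ == "__main__":
    topic_partitions = [["a b a"], ["b c"], []]
    for p, task in run_job(topic_partitions).items():
        print(p, dict(task.counts))
```

Note that the count for "b" ends up split between task 0 and task 1, because each task sees only its own partition. If all occurrences of a key must be counted in one place, the producer has to send messages with the same key to the same partition.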


Reference:


[1] 15-619 Cloud Computing, Carnegie Mellon University (CMU)

