Samza: Relevant Concepts

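For Kafka:

A topic is divided into partitions; a record with a key is assigned to a partition based on a hash of that key.

Partitions live on brokers; a single broker can host partitions from several different topics.

A consumer group contains one or more consumers. Within a group, each partition is read by exactly one consumer, while one consumer may read from several partitions.

A producer writing to a topic can spread its records across multiple partitions of that topic.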
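
To make the partitioning and consumer-group rules concrete, here is a minimal Python sketch. It is a simulation, not Kafka's client API: `partition_for` and `assign_partitions` are hypothetical helpers, `zlib.crc32` stands in for Kafka's murmur2-based default partitioner, and round-robin is just one possible assignment strategy inside a group.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index.

    Assumption: crc32 stands in for Kafka's murmur2-based default partitioner;
    what matters is that equal keys always map to the same partition.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Round-robin assignment inside one consumer group (one possible strategy).

    Each partition goes to exactly one consumer; a consumer may own several.
    Consumers left without a partition (more consumers than partitions) are omitted.
    """
    assignment: Dict[str, List[int]] = defaultdict(list)
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return dict(assignment)


if __name__ == "__main__":
    for key in ["user-1", "user-2", "user-1"]:
        print(key, "-> partition", partition_for(key, 4))
    print(assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"]))
```

Because the same key always lands in the same partition, all records for `user-1` end up in one partition and are therefore read by a single member of the group.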

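For YARN:

NodeManager: runs on each machine in the cluster and is responsible for launching processes on that machine.

ResourceManager: talks to all of the NodeManagers to tell them what to run.

ApplicationMaster: application-specific code that runs inside the YARN cluster; it manages the application's workload, requests containers, and handles notifications when one of its containers fails.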
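
The division of labour between the three YARN roles can be sketched as a toy simulation. Everything here is hypothetical (class names, the `capacity` field, the least-loaded placement rule), and it simplifies real YARN by letting the ResourceManager both allocate and launch a container in one step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeManager:
    """One per machine: launches processes (containers) on that machine."""
    host: str
    capacity: int  # hypothetical: max containers this node may run
    running: List[str] = field(default_factory=list)

    def launch(self, container_id: str) -> None:
        self.running.append(container_id)


@dataclass
class ResourceManager:
    """Cluster-wide: decides which NodeManager runs what."""
    nodes: List[NodeManager]

    def allocate(self, container_id: str) -> NodeManager:
        # Pick the least-loaded node that still has spare capacity (ties: first node).
        candidates = [n for n in self.nodes if len(n.running) < n.capacity]
        if not candidates:
            raise RuntimeError("no capacity left in the cluster")
        node = min(candidates, key=lambda n: len(n.running))
        node.launch(container_id)
        return node


class ApplicationMaster:
    """Application-specific code: asks the ResourceManager for containers."""

    def __init__(self, app: str, rm: ResourceManager) -> None:
        self.app = app
        self.rm = rm

    def start(self, num_containers: int) -> Dict[str, str]:
        placement: Dict[str, str] = {}
        for i in range(num_containers):
            cid = f"{self.app}-c{i}"
            placement[cid] = self.rm.allocate(cid).host
        return placement


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("host-1", 2), NodeManager("host-2", 2)])
    print(ApplicationMaster("samza-job", rm).start(3))
```

In this sketch the application only talks to the ResourceManager; it never decides which machine runs a container, which mirrors the separation described above.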

Samza supports two kinds of processing:

Stateless processing: does not retain any state associated with the current message after it has been processed (for example, filtering or transforming individual messages).

Stateful processing: requires recording some state about a message even after it has been processed (for example, counting events per user).
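
A minimal Python sketch of the difference, using plain functions rather than Samza's task API (`stateless_filter` and `EventCounter` are hypothetical names): the filter decides from the current message alone, while the counter must remember something after each message is done.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def stateless_filter(events: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Stateless: each decision depends only on the current message."""
    for e in events:
        if e["type"] == "click":
            yield e


class EventCounter:
    """Stateful: keeps a per-user count that outlives each processed message."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, event: Dict[str, str]) -> int:
        self.counts[event["user"]] += 1
        return self.counts[event["user"]]
```

In a real stateful job that count would need to survive restarts, which is why stateful processing is harder than stateless processing.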

Samza supports two notions of time: processing time (when the message is processed) and event time (a timestamp embedded in the message by its source).
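
The practical difference shows up when grouping messages into time windows. This sketch assumes a hypothetical `Message` type with an `event_time_ms` field set by the source; it is not Samza's windowing API.

```python
import time
from dataclasses import dataclass


@dataclass
class Message:
    payload: str
    event_time_ms: int  # hypothetical field: timestamp embedded by the source


def window_start(ts_ms: int, size_ms: int) -> int:
    """Start of the tumbling window that contains ts_ms."""
    return ts_ms - ts_ms % size_ms


def window_by_event_time(msg: Message, size_ms: int) -> int:
    # Deterministic: replaying the same message always yields the same window.
    return window_start(msg.event_time_ms, size_ms)


def window_by_processing_time(msg: Message, size_ms: int) -> int:
    # Depends on the wall clock at the moment the message is processed.
    return window_start(int(time.time() * 1000), size_ms)
```

With event time, a late or replayed message still lands in the window it belongs to; with processing time, the same message can land in a different window each time it is processed.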

Samza guarantees that each record is processed at least once, which means a record may be processed more than once after a failure.
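
Why "at least once" rather than "exactly once": the side effect of processing a record happens before the position (offset) is checkpointed. This toy consumer checkpoints after every record, which is a simplification (Samza checkpoints periodically), and `consume` with its `crash_at` parameter is a hypothetical helper used to simulate a failure between the two steps.

```python
from typing import Callable, List, Optional


def consume(log: List[str], start_offset: int, handle: Callable[[str], None],
            crash_at: Optional[int] = None) -> int:
    """Process log[start_offset:], checkpointing after each record.

    If crash_at is set, simulate a crash after handling that offset but
    before checkpointing it. Returns the checkpointed offset (next to read).
    """
    committed = start_offset
    for offset in range(start_offset, len(log)):
        handle(log[offset])      # side effect happens first...
        if offset == crash_at:
            return committed     # ...then the crash: the checkpoint below never runs
        committed = offset + 1   # checkpoint: next offset to read on restart
    return committed


if __name__ == "__main__":
    processed: List[str] = []
    resume = consume(["a", "b", "c"], 0, processed.append, crash_at=1)
    consume(["a", "b", "c"], resume, processed.append)
    print(processed)  # "b" appears twice: handled, crashed, replayed
```

Nothing is lost, but "b" is handled twice, so downstream logic should tolerate duplicates (for example by making updates idempotent).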

Samza's coordinator supports both the embedded library model (as in Kafka Streams) and the framework model (as in Flink).

Samza supports both in-order and out-of-order processing.

Each thread runs one or more tasks, and tasks do not share state with each other.
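
A small simulation of the threading model: tasks are spread over a fixed pool of threads, each thread interleaves the tasks it owns, and each task's messages are still handled in order. The helper names are hypothetical. In Samza itself this is controlled by configuration (for example `task.max.concurrency`, where 1 means in-order processing per task); check the configuration reference for your version before relying on exact keys.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def assign_tasks(task_names: List[str], num_threads: int) -> List[List[str]]:
    """Round-robin: each thread gets zero or more tasks, each task exactly one thread."""
    buckets: List[List[str]] = [[] for _ in range(num_threads)]
    for i, name in enumerate(task_names):
        buckets[i % num_threads].append(name)
    return buckets


def run(inputs: Dict[str, List[int]], num_threads: int) -> Dict[str, List[int]]:
    """Process every task's messages in order; tasks share no state."""
    outputs: Dict[str, List[int]] = {name: [] for name in inputs}

    def worker(tasks: List[str]) -> None:
        cursors = {name: 0 for name in tasks}
        while cursors:  # interleave the tasks owned by this thread
            for name in list(cursors):
                i = cursors[name]
                if i == len(inputs[name]):
                    del cursors[name]  # this task is drained
                    continue
                # Only this thread ever writes outputs[name], so no lock is needed.
                outputs[name].append(inputs[name][i] * 10)
                cursors[name] = i + 1

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(worker, assign_tasks(list(inputs), num_threads)))  # re-raises worker errors
    return outputs
```

Because a task is pinned to one thread and tasks share nothing, the threads never need to coordinate with each other.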

Reference: http://samza.apache.org/learn/documentation/latest/core-concepts/core-concepts.html