How Storm Trident marks a batch as processed: the coordinator spout

Splitting a stream has no effect on the batch. If you join the stream back together, then yes, it will be the same batch.

Tuples are passed between partitions in the order they're emitted (repartitioning happens on groupBy, partitioning operations, and global aggregations). These are the same semantics you get from Storm.

State updates are ordered among batches.

Each batch has both a "txid" and an "attempt id". The attempt id is a random long. This ensures that Storm can distinguish between multiple attempts for the same batch.
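To make the txid/attempt id distinction concrete, here is a minimal sketch in Java. The class BatchAttempt and its methods are hypothetical names chosen for illustration, not Storm's own classes; the only assumption taken from the text above is that the txid identifies the batch and the attempt id is a random long that changes on every replay.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration only; this is not Storm's own class.
final class BatchAttempt {
    final long txid;       // identifies the batch; stays the same across replays
    final long attemptId;  // random per attempt; differs on every replay

    BatchAttempt(long txid, long attemptId) {
        this.txid = txid;
        this.attemptId = attemptId;
    }

    // Create a fresh attempt for the given batch with a random attempt id.
    static BatchAttempt newAttempt(long txid) {
        return new BatchAttempt(txid, ThreadLocalRandom.current().nextLong());
    }

    // True when both objects refer to the same batch, regardless of attempt.
    boolean sameBatch(BatchAttempt other) {
        return txid == other.txid;
    }

    // True only when both the batch and the attempt match.
    boolean sameAttempt(BatchAttempt other) {
        return txid == other.txid && attemptId == other.attemptId;
    }
}
```

With this shape, a component that receives tuples tagged with an older attempt of a batch can tell they belong to a stale attempt (same txid, different attempt id) and ignore them.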

Batches are controlled by a single coordinator thread (which is a regular Storm spout) that determines when batches get processed and when they get committed (commits are when state updates happen).

The coordinator also enforces the ordering of state updates across batches.
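The ordering guarantee can be pictured with a small single-threaded model. This is only a sketch under stated assumptions, not Storm's implementation: the class OrderedCommitCoordinator and its methods are hypothetical, txids are assumed to start at 1 and increase by 1, and batches are assumed to finish their processing phase in any order while commits must happen strictly in txid order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical single-threaded model; not Storm's implementation.
final class OrderedCommitCoordinator {
    private long nextToCommit = 1;                            // lowest txid not yet committed
    private final TreeSet<Long> processed = new TreeSet<>();  // processed, waiting to commit
    private final List<Long> commitLog = new ArrayList<>();   // txids in the order committed

    // Called when a batch finishes its processing phase (possibly out of order).
    void batchProcessed(long txid) {
        if (txid >= nextToCommit) {
            processed.add(txid);
        }
        // Commit every consecutive batch that is ready, starting at nextToCommit.
        while (processed.remove(nextToCommit)) {
            commitLog.add(nextToCommit);
            nextToCommit++;
        }
    }

    List<Long> commitLog() {
        return commitLog;
    }
}
```

In this model, if batches 3 and 2 finish processing before batch 1, nothing is committed until batch 1 finishes; at that point batches 1, 2, and 3 are committed in that order.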

The coordinator abstraction is actually quite elegant. It builds upon the primitives that the tuple tree/acking framework provides to implement a relatively sophisticated distributed coordination algorithm.

Also, is there any way to turn off acking in Trident? Not tagging tuples with message IDs and setting ackers to 0 don't seem to work (the latter causes a stack overflow).

No, you can't. Acking isn't really expensive in Trident as long as your batches are of non-trivial size.

https://groups.google.com/forum/#!topic/storm-user/AUajG72kxmo
