Trident is a higher-level abstraction on top of Storm. Compared with plain Storm, it offers three main benefits:
(1) A higher level of abstraction: common operations such as count and sum are packaged as ready-made methods that can be called directly, with no need to implement them yourself (see the sketch below).
(2) Batches instead of single tuples: data is processed one batch at a time.
(3) Transaction support, guaranteeing that all data is processed exactly once.
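As an example of benefit (1), a word count that would need a hand-written counting bolt in plain Storm reduces to a few built-in calls in Trident. A minimal sketch, assuming a spout (here called wordSpout) that emits a single "word" field; MemoryMapState is Trident's in-memory state from the testing package:
import backtype.storm.tuple.Fields;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Count;
import storm.trident.testing.MemoryMapState;

TridentTopology topology = new TridentTopology();
// "wordSpout" is assumed to be any spout emitting a single "word" field.
topology.newStream("wordcount", wordSpout)
        .groupBy(new Fields("word"))
        // Count and persistentAggregate are built-ins; no hand-written counting logic needed.
        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"));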
This article describes how a spout is created and invoked in a Trident topology. It first explains how a user creates a Spout and the principles behind it, then walks through the Spout's actual data flow, and finally shows how a Spout is set when the topology is built.
MasterBatchCoordinator —————> ITridentSpout.BatchCoordinator#isReady
|
|
v
TridentSpoutCoordinator —————> ITridentSpout.BatchCoordinator#[initializeTransaction, success, close]
|
|
v
TridentSpoutExecutor —————> ITridentSpout.Emitter#[emitBatch, success, close]
Two groups of classes are involved in a Spout. The first group defines how a user writes a Spout; that user code is then invoked by the second group. The second group defines how the actual data flow is initiated and propagated.
The first group consists of three types: ITridentSpout, BatchCoordinator and Emitter, the latter two being inner interfaces of the first.
To create a Spout, a user implements these three interfaces. For example, the Spout in storm-kafka implements exactly these three interfaces (or their sub-interfaces).
The second group also consists of three classes: MasterBatchCoordinator, TridentSpoutCoordinator and TridentSpoutExecutor. Besides their own fixed logic, they call into the user code, i.e. the Spout code described above.
Their declarations are:
MasterBatchCoordinator extends BaseRichSpout
TridentSpoutCoordinator implements IBasicBolt
TridentSpoutExecutor implements ITridentBatchBolt
As you can see, MasterBatchCoordinator is the only real spout; the other two are bolts.
MasterBatchCoordinator calls the isReady() method of the user-defined BatchCoordinator; if it returns true, MBC emits a tuple on the stream whose id is $batch, which starts one round of data flow. When TridentSpoutCoordinator receives MBC's $batch stream, it calls BatchCoordinator#initializeTransaction() to initialize a transaction and re-emits on the $batch stream. When TridentSpoutExecutor receives the $batch stream, it calls the user-defined Emitter#emitBatch() method, which starts emitting the actual business data.
(1) MasterBatchCoordinator is the real Spout in Trident; it can manage multiple TridentSpoutCoordinator nodes. MBC emits the stream whose id is $batch, which is the starting point of the whole data flow.
if(!_activeTx.containsKey(curr) && isReady(curr)) {
..........
_collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt);
..........
}
(2) When the whole batch has been processed successfully, MBC's ack() method is called; ack changes the transaction status from PROCESSING to PROCESSED:
if(status.status==AttemptStatus.PROCESSING) {
status.status = AttemptStatus.PROCESSED;
}
If the batch fails, the fail() method is called instead.
When sync() sees that a transaction's status is PROCESSED, it changes it to COMMITTING and emits on the stream whose id is $commit:
if(maybeCommit!=null && maybeCommit.status == AttemptStatus.PROCESSED) {
maybeCommit.status = AttemptStatus.COMMITTING;
_collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt);
}
(3) Once the $commit stream has been fully processed, MBC's ack method is called again, and MBC emits the $success stream:
else if(status.status==AttemptStatus.COMMITTING) {
//If the status is COMMITTING, remove the transaction from _activeTx and _attemptIds, and emit the $success stream.
_activeTx.remove(tx.getTransactionId());
_attemptIds.remove(tx.getTransactionId());
_collector.emit(SUCCESS_STREAM_ID, new Values(tx));
_currTransaction = nextTransactionId(tx.getTransactionId());
for(TransactionalState state: _states) {
state.setData(CURRENT_TX, _currTransaction);
}
}
From the analysis above, MBC emits the $batch, $commit and $success streams, in that order.
TridentSpoutCoordinator (TSC) handles only the $batch and $success streams, while TridentSpoutExecutor (TSE) handles all three.
TSC's handling of the $success stream:
if(tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
_state.cleanupBefore(attempt.getTransactionId());
_coord.success(attempt.getTransactionId());
}
This mainly calls the success method the user defined in the Coordinator.
TSE's handling of the $commit and $success streams:
if(input.getSourceStreamId().equals(MasterBatchCoordinator.COMMIT_STREAM_ID)) {
if(attempt.equals(_activeBatches.get(attempt.getTransactionId()))) {
((ICommitterTridentSpout.Emitter) _emitter).commit(attempt);
_activeBatches.remove(attempt.getTransactionId());
} else {
throw new FailedException("Received commit for different transaction attempt");
}
} else if(input.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
// valid to delete before what's been committed since
// those batches will never be accessed again
_activeBatches.headMap(attempt.getTransactionId()).clear();
_emitter.success(attempt);
}
To summarize: messages start from MasterBatchCoordinator, which is the one real spout, while TridentSpoutCoordinator and TridentSpoutExecutor are both bolts. MasterBatchCoordinator initiates the coordination messages, and the end result is that TridentSpoutExecutor emits the business messages. Both the coordination and the business messages are produced by calling the code the user defined in the Spout's BatchCoordinator and Emitter.
See also the flow chart on page 458 of《storm源码分析》.
(1) TridentTopologyBuilder's buildTopology method sets up the topology-level information.
(2) TridentTopology's newStream method adds the spout node to the topology.
In Trident, a user-defined Spout implements the ITridentSpout interface. Let's first look at the definition of ITridentSpout:
package storm.trident.spout;
import backtype.storm.task.TopologyContext;
import storm.trident.topology.TransactionAttempt;
import backtype.storm.tuple.Fields;
import java.io.Serializable;
import java.util.Map;
import storm.trident.operation.TridentCollector;
public interface ITridentSpout<T> extends Serializable {
public interface BatchCoordinator<X> {
X initializeTransaction(long txid, X prevMetadata, X currMetadata);
void success(long txid);
boolean isReady(long txid);
void close();
}
public interface Emitter<X> {
void emitBatch(TransactionAttempt tx, X coordinatorMeta, TridentCollector collector);
void success(TransactionAttempt tx);
void close();
}
BatchCoordinator<T> getCoordinator(String txStateId, Map conf, TopologyContext context);
Emitter<T> getEmitter(String txStateId, Map conf, TopologyContext context);
Map getComponentConfiguration();
Fields getOutputFields();
}
It has two inner interfaces, BatchCoordinator and Emitter: the interface for the coordinating spout and the interface for the message-emitting bolt, respectively. The main work of implementing a Spout is implementing these two interfaces, i.e. writing the Coordinator and Emitter that do the actual work. The Spout provides two get methods to specify which Coordinator and Emitter classes to use; these classes are user-defined. We will analyze Coordinator and Emitter shortly.
In addition, getComponentConfiguration returns the component configuration, and getOutputFields returns the output fields.
Now let's look at the code of the two inner interfaces.
public interface BatchCoordinator<X> {
X initializeTransaction(long txid, X prevMetadata, X currMetadata);
void success(long txid);
boolean isReady(long txid);
void close();
}
(1) initializeTransaction returns user-defined transaction metadata. X is the user-defined, transaction-related data type; the returned value is stored in ZooKeeper.
Here txid is the transaction sequence number, and prevMetadata is the metadata of the previous transaction (null if the current transaction is the first one). currMetadata is the metadata of the current transaction: null on the transaction's first attempt, otherwise the metadata produced by the transaction's previous attempt.
(2) isReady determines whether the data for a transaction is ready; when it returns true, a new transaction can start. Its parameter is the current transaction id.
The methods of BatchCoordinator are deployed to different nodes: isReady runs inside the real Spout (MasterBatchCoordinator), while the remaining methods run inside TridentSpoutCoordinator.
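For instance, isReady can be used to throttle how often new batches start. A minimal sketch, assuming a one-second interval; the class name, the interval, and the lastBatchTime field are all illustrative, not part of any real coordinator:
import storm.trident.spout.ITridentSpout;

// Hypothetical throttling coordinator: allow a new batch at most once per second.
public class ThrottledCoordinator implements ITridentSpout.BatchCoordinator<Long> {
    private long lastBatchTime = 0;

    @Override
    public boolean isReady(long txid) {
        long now = System.currentTimeMillis();
        if (now - lastBatchTime >= 1000) {   // assumed interval: 1s
            lastBatchTime = now;
            return true;
        }
        return false;
    }

    @Override
    public Long initializeTransaction(long txid, Long prevMetadata, Long currMetadata) {
        return null;   // no metadata needed for this sketch
    }

    @Override
    public void success(long txid) { }

    @Override
    public void close() { }
}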
public interface Emitter<X> {
void emitBatch(TransactionAttempt tx, X coordinatorMeta, TridentCollector collector);
void success(TransactionAttempt tx);
void close();
}
The message-emitting node receives the coordinating spout's $batch and $success streams.
(1) When a $batch tuple arrives, the node calls emitBatch to emit messages.
(2) When a $success tuple arrives, it calls success to post-process the transaction.
As a concrete example, consider DiagnosisEventSpout.
(1) The Spout code:
package com.packtpub.storm.trident.spout;
import backtype.storm.task.TopologyContext;
import backtype.storm.tuple.Fields;
import storm.trident.spout.ITridentSpout;
import java.util.Map;
@SuppressWarnings("rawtypes")
public class DiagnosisEventSpout implements ITridentSpout<Long> {
private static final long serialVersionUID = 1L;
BatchCoordinator<Long> coordinator = new DefaultCoordinator();
Emitter<Long> emitter = new DiagnosisEventEmitter();
@Override
public BatchCoordinator<Long> getCoordinator(String txStateId, Map conf, TopologyContext context) {
return coordinator;
}
@Override
public Emitter<Long> getEmitter(String txStateId, Map conf, TopologyContext context) {
return emitter;
}
@Override
public Map getComponentConfiguration() {
return null;
}
@Override
public Fields getOutputFields() {
return new Fields("event");
}
}
(2) The BatchCoordinator code:
package com.packtpub.storm.trident.spout;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import storm.trident.spout.ITridentSpout.BatchCoordinator;
import java.io.Serializable;
public class DefaultCoordinator implements BatchCoordinator<Long>, Serializable {
private static final long serialVersionUID = 1L;
private static final Logger LOG = LoggerFactory.getLogger(DefaultCoordinator.class);
@Override
public boolean isReady(long txid) {
return true;
}
@Override
public void close() {
}
@Override
public Long initializeTransaction(long txid, Long prevMetadata, Long currMetadata) {
LOG.info("Initializing Transaction [" + txid + "]");
return null;
}
@Override
public void success(long txid) {
LOG.info("Successful Transaction [" + txid + "]");
}
}
(3) The Emitter code:
package com.packtpub.storm.trident.spout;
import com.packtpub.storm.trident.model.DiagnosisEvent;
import storm.trident.operation.TridentCollector;
import storm.trident.spout.ITridentSpout.Emitter;
import storm.trident.topology.TransactionAttempt;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
public class DiagnosisEventEmitter implements Emitter<Long>, Serializable {
private static final long serialVersionUID = 1L;
AtomicInteger successfulTransactions = new AtomicInteger(0);
@Override
public void emitBatch(TransactionAttempt tx, Long coordinatorMeta, TridentCollector collector) {
for (int i = 0; i < 10000; i++) {
List<Object> events = new ArrayList<Object>();
double lat = new Double(-30 + (int) (Math.random() * 75));
double lng = new Double(-120 + (int) (Math.random() * 70));
long time = System.currentTimeMillis();
String diag = new Integer(320 + (int) (Math.random() * 7)).toString();
DiagnosisEvent event = new DiagnosisEvent(lat, lng, time, diag);
events.add(event);
collector.emit(events);
}
}
@Override
public void success(TransactionAttempt tx) {
successfulTransactions.incrementAndGet();
}
@Override
public void close() {
}
}
(4) Finally, specify the spout when creating the topology:
TridentTopology topology = new TridentTopology();
DiagnosisEventSpout spout = new DiagnosisEventSpout();
Stream inputStream = topology.newStream("event", spout);
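Once the stream exists, ordinary Trident operations can be chained onto it. A minimal, hypothetical continuation of the snippet above; the inline filter and the lat field access are illustrative assumptions, not part of the original example (BaseFilter lives in storm.trident.operation, TridentTuple in storm.trident.tuple):
// Hypothetical downstream operation: keep only events with a positive latitude.
inputStream.each(new Fields("event"), new BaseFilter() {
    @Override
    public boolean isKeep(TridentTuple tuple) {
        DiagnosisEvent event = (DiagnosisEvent) tuple.getValue(0);
        return event.lat > 0;   // assumes DiagnosisEvent exposes a public lat field
    }
});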
The above shows how to create a Spout in user code, and the principles behind it. But once the Spout is created, how does it get loaded into the topology's real Spout? Let's keep digging into Trident's implementation.
Overall, MasterBatchCoordinator is the true starting point of the data flow:
* First, its open method performs initialization, including reading the transaction id the topology had previously processed up to, the maximum number of simultaneously pending tuples, the attempt count of each transaction, and so on.
* Then nextTuple advances a transaction's status, or creates a transaction and emits the $batch stream.
* Finally, ack either emits the $commit stream, depending on the transaction's status, or calls sync again to start creating new transactions.
In short, MasterBatchCoordinator, the real origin of the topology's data flow, keeps data flowing by emitting coordination messages in a loop. Its essential role is to originate coordination messages; all of its maps, such as _activeTx and _attemptIds, merely record what is currently in flight.
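The snippets below refer to AttemptStatus and TransactionStatus; their shape inside MasterBatchCoordinator is essentially the following (lightly simplified reproduction, consistent with the fields used in the code that follows):
import storm.trident.topology.TransactionAttempt;

// The lifecycle of one transaction inside MasterBatchCoordinator.
private static enum AttemptStatus {
    PROCESSING,   // $batch emitted, waiting for ack
    PROCESSED,    // $batch acked, waiting for its turn to commit
    COMMITTING    // $commit emitted, waiting for ack
}

private static class TransactionStatus {
    TransactionAttempt attempt;  // transaction id + attempt id
    AttemptStatus status;

    public TransactionStatus(TransactionAttempt attempt) {
        this.attempt = attempt;
        this.status = AttemptStatus.PROCESSING;
    }
}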
(1) MasterBatchCoordinator is a real spout:
public class MasterBatchCoordinator extends BaseRichSpout
The real logic of a Trident topology starts from MasterBatchCoordinator: open performs some initialization, and then nextTuple emits the $batch and $commit streams.
(2) A look at the open method:
@Override
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
_throttler = new WindowedTimeThrottler((Number)conf.get(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS), 1);
for(String spoutId: _managedSpoutIds) {
//Each MasterBatchCoordinator can manage multiple ITridentSpouts; the metadata of every spout is put into the _states map. We will see later what goes in there.
_states.add(TransactionalState.newCoordinatorState(conf, spoutId));
}
//Read the current transaction id from zk; when the topology starts up, it must recover its previous state from zk. In other words, zk stores the id of the next transaction to commit, not the id of the last committed transaction.
_currTransaction = getStoredCurrTransaction();
_collector = collector;
//The maximum number of tuples a spout task may process at the same time, i.e. tuples that have been emitted but not yet acked.
Number active = (Number) conf.get(Config.TOPOLOGY_MAX_SPOUT_PENDING);
if(active==null) {
_maxTransactionActive = 1;
} else {
_maxTransactionActive = active.intValue();
}
//The current attempt id of each transaction, i.e. the attempt counts of the transactions starting at _currTransaction.
_attemptIds = getStoredCurrAttempts(_currTransaction, _maxTransactionActive);
for(int i=0; i<_spouts.size(); i++) {
//Store each Spout's Coordinator in the _coordinators list.
String txId = _managedSpoutIds.get(i);
_coordinators.add(_spouts.get(i).getCoordinator(txId, conf, context));
}
}
(3) Next, the nextTuple() method. It simply calls sync(), which does the following:
* If the current transaction's status is PROCESSED, change it to COMMITTING and emit the $commit stream. Nodes that receive the $commit stream call finishBatch to commit and post-process the transaction.
* If _activeTx.size() is less than _maxTransactionActive, create a new transaction, put it into _activeTx, and emit the $batch stream for the Coordinator to handle. (The transaction is removed from _activeTx when ack is called.)
Note: the currently active transactions are those with ids in [_currTransaction, _currTransaction + _maxTransactionActive - 1]; for example, with _maxTransactionActive = 3 and _currTransaction = 5, only transactions 5, 6 and 7 can be active.
private void sync() {
// note that sometimes the tuples active may be less than max_spout_pending, e.g.
// max_spout_pending = 3
// tx 1, 2, 3 active, tx 2 is acked. there won't be a commit for tx 2 (because tx 1 isn't committed yet),
// and there won't be a batch for tx 4 because there's max_spout_pending tx active
//Check whether the current transaction _currTransaction is in the PROCESSED state; if so, change it to COMMITTING and emit the $commit stream. Nodes that receive the $commit stream call finishBatch to commit and post-process the transaction.
TransactionStatus maybeCommit = _activeTx.get(_currTransaction);
if(maybeCommit!=null && maybeCommit.status == AttemptStatus.PROCESSED) {
maybeCommit.status = AttemptStatus.COMMITTING;
_collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt);
}
//Create new transactions. At most _maxTransactionActive transactions run at the same time; the ids of the currently active transactions lie in [_currTransaction, _currTransaction+_maxTransactionActive-1]. Note that a new transaction is only initialized after the current one finishes, so the number of actually active transactions may be less than _maxTransactionActive.
if(_active) {
if(_activeTx.size() < _maxTransactionActive) {
Long curr = _currTransaction;
//Try to create up to _maxTransactionActive transactions.
for(int i=0; i<_maxTransactionActive; i++) {
//If this transaction id is not in _activeTx, create a new transaction and emit the $batch stream. The id is removed from _activeTx when ack is called; see the ack method.
if(!_activeTx.containsKey(curr) && isReady(curr)) {
// by using a monotonically increasing attempt id, downstream tasks
// can be memory efficient by clearing out state for old attempts
// as soon as they see a higher attempt id for a transaction
Integer attemptId = _attemptIds.get(curr);
if(attemptId==null) {
attemptId = 0;
} else {
attemptId++;
}
//_activeTx maps transaction id to transaction status, while _attemptIds maps transaction id to attempt count.
_attemptIds.put(curr, attemptId);
for(TransactionalState state: _states) {
state.setData(CURRENT_ATTEMPTS, _attemptIds);
}
//A TransactionAttempt holds a transaction id and an attempt id, identifying one concrete attempt of a transaction.
TransactionAttempt attempt = new TransactionAttempt(curr, attemptId);
_activeTx.put(curr, new TransactionStatus(attempt));
_collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt);
_throttler.markEvent();
}
//If the id is already in _activeTx, advance curr and check the next one.
curr = nextTransactionId(curr);
}
}
}
}
The full code is listed at the end of this article.
(4) Next, the ack method:
@Override
public void ack(Object msgId) {
//Look up the status of this transaction.
TransactionAttempt tx = (TransactionAttempt) msgId;
TransactionStatus status = _activeTx.get(tx.getTransactionId());
if(status!=null && tx.equals(status.attempt)) {
//If the status is PROCESSING, change it to PROCESSED.
if(status.status==AttemptStatus.PROCESSING) {
status.status = AttemptStatus.PROCESSED;
} else if(status.status==AttemptStatus.COMMITTING) {
//If the status is COMMITTING, remove the transaction from _activeTx and _attemptIds, and emit the $success stream.
_activeTx.remove(tx.getTransactionId());
_attemptIds.remove(tx.getTransactionId());
_collector.emit(SUCCESS_STREAM_ID, new Values(tx));
_currTransaction = nextTransactionId(tx.getTransactionId());
for(TransactionalState state: _states) {
state.setData(CURRENT_TX, _currTransaction);
}
}
//Some transaction statuses have changed, so call sync() again to continue processing or to emit new tuples.
sync();
}
}
(5) Finally, the fail method and the declareOutputFields method:
@Override
public void fail(Object msgId) {
TransactionAttempt tx = (TransactionAttempt) msgId;
TransactionStatus stored = _activeTx.remove(tx.getTransactionId());
if(stored!=null && tx.equals(stored.attempt)) {
_activeTx.tailMap(tx.getTransactionId()).clear();
sync();
}
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
// in partitioned example, in case an emitter task receives a later transaction than it's emitted so far,
// when it sees the earlier txid it should know to emit nothing
declarer.declareStream(BATCH_STREAM_ID, new Fields("tx"));
declarer.declareStream(COMMIT_STREAM_ID, new Fields("tx"));
declarer.declareStream(SUCCESS_STREAM_ID, new Fields("tx"));
}
TridentSpoutCoordinator receives the $success and $batch streams from MasterBatchCoordinator and implements the real logic by calling the user code. It also sends the $batch stream on to TridentSpoutExecutor, triggering the latter to start emitting the actual business data.
(1) TridentSpoutCoordinator is a bolt:
public class TridentSpoutCoordinator implements IBasicBolt
(2) Creating a TridentSpoutCoordinator requires passing in an ITridentSpout object,
public TridentSpoutCoordinator(String id, ITridentSpout spout) {
_spout = spout;
_id = id;
}
which is then used to obtain the user-defined Coordinator:
_coord = _spout.getCoordinator(_id, conf, context);
(3) _state and _underlyingState hold the metadata stored in ZooKeeper:
_underlyingState = TransactionalState.newCoordinatorState(conf, _id);
_state = new RotatingTransactionalState(_underlyingState, META_DIR);
(4) In its execute method, TridentSpoutCoordinator handles the $success and $batch streams. First the $success stream:
if(tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
_state.cleanupBefore(attempt.getTransactionId());
_coord.success(attempt.getTransactionId());
}
That is, on receiving a $success tuple, it calls the success method of the user-defined Coordinator, and also cleans up the corresponding data in ZooKeeper.
(5) Now the $batch stream:
else {
long txid = attempt.getTransactionId();
Object prevMeta = _state.getPreviousState(txid);
Object meta = _coord.initializeTransaction(txid, prevMeta, _state.getState(txid));
_state.overrideState(txid, meta);
collector.emit(MasterBatchCoordinator.BATCH_STREAM_ID, new Values(attempt, meta));
}
When a $batch tuple arrives, a transaction is initialized and emitted downstream. Because messages in Trident may be replayed, prevMeta is needed. Note that Trident initializes a transaction inside a bolt.
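To see why the previous metadata matters, consider a hypothetical coordinator whose metadata is simply the start offset of a fixed-size batch read from some ordered source; the class name, BATCH_SIZE, and the offset scheme are all illustrative assumptions:
import storm.trident.spout.ITridentSpout;

// Hypothetical offset-based coordinator: the metadata is the batch's start offset.
public class OffsetCoordinator implements ITridentSpout.BatchCoordinator<Long> {
    private static final long BATCH_SIZE = 1000;  // assumed batch size

    @Override
    public Long initializeTransaction(long txid, Long prevMetadata, Long currMetadata) {
        if (currMetadata != null) {
            // Replayed attempt: reuse the stored offset so the emitter
            // re-reads exactly the same batch.
            return currMetadata;
        }
        // New transaction: start right after the previous transaction's batch.
        return prevMetadata == null ? 0L : prevMetadata + BATCH_SIZE;
    }

    @Override
    public boolean isReady(long txid) { return true; }

    @Override
    public void success(long txid) { }

    @Override
    public void close() { }
}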
TridentSpoutExecutor receives three streams: the $batch stream from TridentSpoutCoordinator, plus the $commit and $success streams from MasterBatchCoordinator (see the groupings in buildTopology below). $commit and $success invoke the emitter's commit and success methods, respectively, while $batch invokes the emitter's emitBatch method, which starts emitting the business data.
(1) TridentSpoutExecutor is a bolt:
public class TridentSpoutExecutor implements ITridentBatchBolt
(2) The core execute method:
@Override
public void execute(BatchInfo info, Tuple input) {
// there won't be a BatchInfo for the success stream
TransactionAttempt attempt = (TransactionAttempt) input.getValue(0);
if(input.getSourceStreamId().equals(MasterBatchCoordinator.COMMIT_STREAM_ID)) {
if(attempt.equals(_activeBatches.get(attempt.getTransactionId()))) {
((ICommitterTridentSpout.Emitter) _emitter).commit(attempt);
_activeBatches.remove(attempt.getTransactionId());
} else {
throw new FailedException("Received commit for different transaction attempt");
}
} else if(input.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
// valid to delete before what's been committed since
// those batches will never be accessed again
_activeBatches.headMap(attempt.getTransactionId()).clear();
_emitter.success(attempt);
} else {
_collector.setBatch(info.batchId);
//Emit the business messages.
_emitter.emitBatch(attempt, input.getValue(1), _collector);
_activeBatches.put(attempt.getTransactionId(), attempt);
}
}
From the analysis above, a Spout is now ready; but how does it get loaded into the topology so that the real data flow starts?
(1) TridentTopologyBuilder's buildTopology method sets up the topology-level information.
(2) TridentTopology's newStream method adds the spout node to the topology.
The first half of buildTopology in TridentTopologyBuilder sets up the Spout-related information, and the second half sets up the bolts. Here we only look at the spout-related part:
TopologyBuilder builder = new TopologyBuilder();
Map<GlobalStreamId, String> batchIdsForSpouts = fleshOutStreamBatchIds(false);
Map<GlobalStreamId, String> batchIdsForBolts = fleshOutStreamBatchIds(true);
Map<String, List<String>> batchesToCommitIds = new HashMap<String, List<String>>();
Map<String, List<ITridentSpout>> batchesToSpouts = new HashMap<String, List<ITridentSpout>>();
for(String id: _spouts.keySet()) {
TransactionalSpoutComponent c = _spouts.get(id);
if(c.spout instanceof IRichSpout) {
//TODO: wrap this to set the stream name
builder.setSpout(id, (IRichSpout) c.spout, c.parallelism);
} else {
String batchGroup = c.batchGroupId;
if(!batchesToCommitIds.containsKey(batchGroup)) {
batchesToCommitIds.put(batchGroup, new ArrayList<String>());
}
batchesToCommitIds.get(batchGroup).add(c.commitStateId);
if(!batchesToSpouts.containsKey(batchGroup)) {
batchesToSpouts.put(batchGroup, new ArrayList<ITridentSpout>());
}
batchesToSpouts.get(batchGroup).add((ITridentSpout) c.spout);
BoltDeclarer scd =
builder.setBolt(spoutCoordinator(id), new TridentSpoutCoordinator(c.commitStateId, (ITridentSpout) c.spout))
.globalGrouping(masterCoordinator(c.batchGroupId), MasterBatchCoordinator.BATCH_STREAM_ID)
.globalGrouping(masterCoordinator(c.batchGroupId), MasterBatchCoordinator.SUCCESS_STREAM_ID);
for(Map m: c.componentConfs) {
scd.addConfigurations(m);
}
Map<String, TridentBoltExecutor.CoordSpec> specs = new HashMap();
specs.put(c.batchGroupId, new CoordSpec());
BoltDeclarer bd = builder.setBolt(id,
new TridentBoltExecutor(
new TridentSpoutExecutor(
c.commitStateId,
c.streamName,
((ITridentSpout) c.spout)),
batchIdsForSpouts,
specs),
c.parallelism);
bd.allGrouping(spoutCoordinator(id), MasterBatchCoordinator.BATCH_STREAM_ID);
bd.allGrouping(masterCoordinator(batchGroup), MasterBatchCoordinator.SUCCESS_STREAM_ID);
if(c.spout instanceof ICommitterTridentSpout) {
bd.allGrouping(masterCoordinator(batchGroup), MasterBatchCoordinator.COMMIT_STREAM_ID);
}
for(Map m: c.componentConfs) {
bd.addConfigurations(m);
}
}
}
for(String id: _batchPerTupleSpouts.keySet()) {
SpoutComponent c = _batchPerTupleSpouts.get(id);
SpoutDeclarer d = builder.setSpout(id, new RichSpoutBatchTriggerer((IRichSpout) c.spout, c.streamName, c.batchGroupId), c.parallelism);
for(Map conf: c.componentConfs) {
d.addConfigurations(conf);
}
}
for(String batch: batchesToCommitIds.keySet()) {
List<String> commitIds = batchesToCommitIds.get(batch);
builder.setSpout(masterCoordinator(batch), new MasterBatchCoordinator(commitIds, batchesToSpouts.get(batch)));
}
newStream creates a spout node and adds it to the topology:
public Stream newStream(String txId, ITridentSpout spout) {
Node n = new SpoutNode(getUniqueStreamId(), spout.getOutputFields(), txId, spout, SpoutNode.SpoutType.BATCH);
return addNode(n);
}