Understanding the Spout Source Code in Storm Trident

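    • (I) Overview
      • 1. Introduction
      • 2. Key classes
        • (1) Creating a Spout
        • (2) The spout's message flow
      • 3. Overall flow of spout invocation
      • 4. TSC and TSE
      • 5. How the spout is loaded into the topology
    • (II) Creating a Spout
      • 1. ITridentSpout
      • 2. BatchCoordinator
      • 3. Emitter
      • 4. An example
    • (III) The spout's actual message flow
      • 1. MasterBatchCoordinator
      • 2. TridentSpoutCoordinator
      • 3. TridentSpoutExecutor
    • (IV) Setting the Spout in TridentTopologyBuilder
      • 1. TridentTopologyBuilder
      • 2. TridentTopology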

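(I) Overview

1. Introduction

Trident is a higher-level abstraction on top of Storm. Compared with plain Storm, it offers three main benefits:
(1) It provides higher-level operations: common aggregations such as count and sum are packaged as methods that can be called directly, with no need to implement them yourself.
(2) It works on batches instead of individual tuples, processing one batch of data at a time.
(3) It supports transactions, which makes it possible to guarantee that every piece of data is processed, and processed exactly once.

This article describes how a spout is created and invoked in a Trident topology. It first explains how a user creates a Spout and the basic principles behind it, then walks through the Spout's actual data flow, and finally explains how a Spout is set when the topology is built.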

2. Key classes

MasterBatchCoordinator -----> ITridentSpout.BatchCoordinator#isReady
|
|
v
TridentSpoutCoordinator -----> ITridentSpout.BatchCoordinator#[initializeTransaction, success, close]
|
|
v
TridentSpoutExecutor -----> ITridentSpout.Emitter#[emitBatch, success, close]

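A Spout involves two groups of classes. The first group defines how a user creates a Spout; this user code is called by the classes in the second group. The second group defines how the actual data flow is started and transmitted.

(1) Creating a Spout

Three types are involved: ITridentSpout, BatchCoordinator and Emitter. The latter two are nested interfaces of the first.
To create a Spout, the user implements these three interfaces. The Spout in storm-kafka, for example, implements these three interfaces or their sub-interfaces.

(2) The spout's message flow

Three classes are involved here as well: MasterBatchCoordinator, TridentSpoutCoordinator and TridentSpoutExecutor. Besides their own fixed logic, they call the user's code, that is, the Spout code described above.
They are declared as follows: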

MasterBatchCoordinator extends BaseRichSpout
TridentSpoutCoordinator implements IBasicBolt
TridentSpoutExecutor implements ITridentBatchBolt

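As the declarations show, MasterBatchCoordinator is the only real spout; the other two are bolts.
MasterBatchCoordinator (MBC) calls the isReady() method of the user-defined BatchCoordinator; if it returns true, MBC emits a stream with the id $batch. When TridentSpoutCoordinator receives the $batch stream from MBC, it calls the BatchCoordinator's initializeTransaction() to initialize the transaction and then emits a $batch stream of its own. When TridentSpoutExecutor receives that $batch stream, it calls the emitBatch() method of the user's Emitter, which starts emitting the actual business data.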

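3. Overall flow of spout invocation

(1) MasterBatchCoordinator is the real Spout in Trident, and one MBC can drive several TridentSpoutCoordinator nodes. MBC emits the stream with the id $batch, which is the starting point of the whole data flow.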

if(!_activeTx.containsKey(curr) && isReady(curr)) {
       ..........
      _collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt);
         ..........
                }

(2) Once the whole message has been processed successfully, MBC's ack() method is called; ack changes the transaction's status from PROCESSING to PROCESSED:

if(status.status==AttemptStatus.PROCESSING) {
     status.status = AttemptStatus.PROCESSED;
}

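If processing fails, of course, the fail() method is called instead.
When sync() finds the transaction in the PROCESSED state, it changes the state to COMMITTING and emits the stream with the id $commit.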

if(maybeCommit!=null && maybeCommit.status == AttemptStatus.PROCESSED) {
            maybeCommit.status = AttemptStatus.COMMITTING;
            _collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt);
        }

(3) Once the $commit stream has been processed, MBC's ack method is called again, and this time it emits the $success stream:

else if(status.status==AttemptStatus.COMMITTING) {
                // If the status is COMMITTING, remove the transaction from _activeTx and _attemptIds and emit the $success stream.
                _activeTx.remove(tx.getTransactionId());
                _attemptIds.remove(tx.getTransactionId());
                _collector.emit(SUCCESS_STREAM_ID, new Values(tx));
                _currTransaction = nextTransactionId(tx.getTransactionId());
                for(TransactionalState state: _states) {
                    state.setData(CURRENT_TX, _currTransaction);                    
                }
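
The three steps above amount to a small state machine for a single transaction. The following is a minimal stand-alone sketch of it, not the Storm code itself: the class name TxLifecycleSketch, the emitted list and the finished flag are made up for illustration, and attempt ids, ZooKeeper state and multiple concurrent transactions are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one transaction's life cycle inside MasterBatchCoordinator.
class TxLifecycleSketch {
    // The same three states that the excerpts above refer to.
    enum AttemptStatus { PROCESSING, PROCESSED, COMMITTING }

    AttemptStatus status = AttemptStatus.PROCESSING;
    boolean finished = false;                       // stands in for removal from _activeTx
    final List<String> emitted = new ArrayList<>(); // stands in for _collector.emit(streamId, ...)

    // A new transaction starts by emitting the $batch stream.
    void start() {
        emitted.add("$batch");
    }

    // Mirrors the quoted part of sync(): PROCESSED -> COMMITTING, then emit $commit.
    void sync() {
        if (!finished && status == AttemptStatus.PROCESSED) {
            status = AttemptStatus.COMMITTING;
            emitted.add("$commit");
        }
    }

    // Mirrors ack(): the first ack ends processing, the second ack ends the commit.
    void ack() {
        if (finished) {
            return;
        }
        if (status == AttemptStatus.PROCESSING) {
            status = AttemptStatus.PROCESSED;
        } else if (status == AttemptStatus.COMMITTING) {
            finished = true;
            emitted.add("$success");
        }
        sync();
    }

    static List<String> run() {
        TxLifecycleSketch tx = new TxLifecycleSketch();
        tx.start();
        tx.ack(); // the $batch tuple tree was fully processed
        tx.ack(); // the $commit tuple tree was fully processed
        return tx.emitted;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [$batch, $commit, $success]
    }
}
```

Each ack ends with a call to sync(), which is why the first ack is immediately followed by the $commit emission in this single-transaction model.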

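4. TSC and TSE

As the analysis above shows, MBC emits the $batch, $commit and $success streams in turn.
TridentSpoutCoordinator (TSC) handles only two of them, $batch and $success, while TridentSpoutExecutor (TSE) handles all three.

TSC handling the $success stream: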

if(tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
        _state.cleanupBefore(attempt.getTransactionId());
        _coord.success(attempt.getTransactionId());
    }

This mainly calls the success method that the user defined in the coordinator.

TSE handling the $commit and $success streams:

if(input.getSourceStreamId().equals(MasterBatchCoordinator.COMMIT_STREAM_ID)) {
        if(attempt.equals(_activeBatches.get(attempt.getTransactionId()))) {
            ((ICommitterTridentSpout.Emitter) _emitter).commit(attempt);
            _activeBatches.remove(attempt.getTransactionId());
        } else {
             throw new FailedException("Received commit for different transaction attempt");
        }
    } else if(input.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
        // valid to delete before what's been committed since 
        // those batches will never be accessed again
        _activeBatches.headMap(attempt.getTransactionId()).clear();
        _emitter.success(attempt);
    }

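To summarize: messages start at MasterBatchCoordinator, which is a real spout, whereas TridentSpoutCoordinator and TridentSpoutExecutor are both bolts. MasterBatchCoordinator initiates the coordination messages, and the end result is that TridentSpoutExecutor emits the business messages. Both the coordination messages and the business messages are produced by calling the code defined in the BatchCoordinator and Emitter of the user's Spout.

See also the flow chart on page 458 of 《storm源码分析》 (Storm Source Code Analysis).

5. How the spout is loaded into the topology

(1) The buildTopology method of TridentTopologyBuilder sets up the spout-related parts of the topology.
(2) The newStream method of TridentTopology adds the spout node to the topology.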

(II) Creating a Spout

1. ITridentSpout

In Trident, a user-defined Spout has to implement the ITridentSpout interface. Let's look at its definition first:

package storm.trident.spout;

import backtype.storm.task.TopologyContext;
import storm.trident.topology.TransactionAttempt;
import backtype.storm.tuple.Fields;
import java.io.Serializable;
import java.util.Map;
import storm.trident.operation.TridentCollector;


public interface ITridentSpout<T> extends Serializable {
    public interface BatchCoordinator<X> {
        X initializeTransaction(long txid, X prevMetadata, X currMetadata);       
        void success(long txid);  
        boolean isReady(long txid);
        void close();
    }

    public interface Emitter<X> {
        void emitBatch(TransactionAttempt tx, X coordinatorMeta, TridentCollector collector);
        void success(TransactionAttempt tx);
        void close();
    }

    BatchCoordinator<T> getCoordinator(String txStateId, Map conf, TopologyContext context);
    Emitter<T> getEmitter(String txStateId, Map conf, TopologyContext context); 

    Map getComponentConfiguration();
    Fields getOutputFields();
}

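It has two nested interfaces, BatchCoordinator and Emitter: the former is the interface for the coordinating side of the Spout, and the latter is the interface for the bolt side that emits messages. Most of the work of implementing a Spout lies in implementing these two interfaces, that is, in creating the Coordinator and the Emitter that do the actual work. The Spout provides two get methods that specify which Coordinator and which Emitter class to use; these classes are defined by the user. The Coordinator and the Emitter are analyzed further below.
In addition, getComponentConfiguration returns the configuration, and getOutputFields returns the output fields.

Now let's look at the code of the two nested interfaces.

2. BatchCoordinator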

public interface BatchCoordinator<X> {
     X initializeTransaction(long txid, X prevMetadata, X currMetadata);
     void success(long txid);
     boolean isReady(long txid);
     void close();
}

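(1) initializeTransaction returns user-defined transaction metadata. X is a user-defined type for transaction-related data, and the returned value is stored in ZooKeeper.
Here txid is the transaction sequence number and prevMetadata is the metadata of the previous transaction; it is null if the current transaction is the first one. currMetadata is the metadata of the current transaction: it is null on the first attempt of this transaction, and otherwise it is the metadata produced by the previous attempt.
(2) isReady decides whether the data for the transaction is ready; returning true means a new transaction can be started. Its argument is the current transaction id.
The methods implemented in a BatchCoordinator are deployed to and run on several nodes: isReady runs in the real Spout (MasterBatchCoordinator), and the other methods run in TridentSpoutCoordinator.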
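
To make the roles of prevMetadata and currMetadata more concrete, here is a small sketch of a coordinator whose metadata is the start offset of a batch. Everything in it is an illustrative assumption: the interface is a local copy of the BatchCoordinator shape so that the example compiles without Storm, OffsetCoordinator and BATCH_SIZE are invented names, and a TreeMap stands in for the metadata that Trident keeps in ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OffsetCoordinatorSketch {
    // Local copy of the ITridentSpout.BatchCoordinator shape (no Storm dependency).
    interface BatchCoordinator<X> {
        X initializeTransaction(long txid, X prevMetadata, X currMetadata);
        void success(long txid);
        boolean isReady(long txid);
        void close();
    }

    static final long BATCH_SIZE = 100; // hypothetical number of records per batch

    // Hypothetical coordinator: the metadata is the offset at which the batch starts.
    static class OffsetCoordinator implements BatchCoordinator<Long> {
        @Override
        public Long initializeTransaction(long txid, Long prevMetadata, Long currMetadata) {
            if (currMetadata != null) {
                return currMetadata;          // replayed attempt: keep the same offset
            }
            if (prevMetadata == null) {
                return 0L;                    // very first transaction
            }
            return prevMetadata + BATCH_SIZE; // continue after the previous batch
        }
        @Override public void success(long txid) { }
        @Override public boolean isReady(long txid) { return true; }
        @Override public void close() { }
    }

    // Imitates what the coordinator bolt does for a $batch tuple, with an in-memory store.
    static Long onBatch(BatchCoordinator<Long> coord, TreeMap<Long, Long> store, long txid) {
        Map.Entry<Long, Long> prev = store.lowerEntry(txid);
        Long prevMeta = (prev == null) ? null : prev.getValue();
        Long meta = coord.initializeTransaction(txid, prevMeta, store.get(txid));
        store.put(txid, meta);
        return meta;
    }

    static List<Long> demo() {
        OffsetCoordinator coord = new OffsetCoordinator();
        TreeMap<Long, Long> store = new TreeMap<>();
        List<Long> metas = new ArrayList<>();
        metas.add(onBatch(coord, store, 1)); // first transaction
        metas.add(onBatch(coord, store, 2)); // next transaction
        metas.add(onBatch(coord, store, 2)); // replay of transaction 2
        metas.add(onBatch(coord, store, 3));
        return metas;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [0, 100, 100, 200]
    }
}
```

Returning currMetadata unchanged on a replay is one possible design: it keeps a retried transaction on exactly the same range of data, which is what a transactional spout needs.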

3. Emitter

public interface Emitter<X> {
     void emitBatch(TransactionAttempt tx, X coordinatorMeta, TridentCollector collector);
     void success(TransactionAttempt tx);
     void close();
}

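The emitting node receives the $batch and $success coordination streams.
(1) When it receives a $batch message, the node calls emitBatch to emit messages.
(2) When it receives a $success message, it calls success to do the post-processing for the transaction.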
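
As a sketch of how an Emitter can use the coordinator metadata, the example below emits a fixed range of numbers that depends only on that metadata, so a replayed attempt produces the same batch. The types Attempt and Collector are simplified stand-ins for TransactionAttempt and TridentCollector, and RangeEmitter and BATCH_SIZE are invented for illustration; none of this is Storm code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RangeEmitterSketch {
    // Simplified stand-in for storm.trident.topology.TransactionAttempt.
    static final class Attempt {
        final long txid;
        final int attemptId;
        Attempt(long txid, int attemptId) {
            this.txid = txid;
            this.attemptId = attemptId;
        }
    }

    // Simplified stand-in for TridentCollector.
    interface Collector {
        void emit(List<Object> values);
    }

    // Local copy of the Emitter shape, using the stand-in types above.
    interface Emitter<X> {
        void emitBatch(Attempt tx, X coordinatorMeta, Collector collector);
        void success(Attempt tx);
        void close();
    }

    // Hypothetical emitter: the batch is fully determined by the start offset in the metadata.
    static class RangeEmitter implements Emitter<Long> {
        static final int BATCH_SIZE = 3;
        long lastSucceededTxid = -1;

        @Override
        public void emitBatch(Attempt tx, Long startOffset, Collector collector) {
            for (long i = startOffset; i < startOffset + BATCH_SIZE; i++) {
                collector.emit(Arrays.<Object>asList(i));
            }
        }

        @Override
        public void success(Attempt tx) {
            lastSucceededTxid = tx.txid; // post-processing after the transaction succeeded
        }

        @Override
        public void close() { }
    }

    // Runs one emitBatch call and collects what was emitted.
    static List<List<Object>> emitOnce(Emitter<Long> emitter, Attempt tx, long meta) {
        List<List<Object>> out = new ArrayList<>();
        emitter.emitBatch(tx, meta, out::add);
        return out;
    }

    // Two attempts of the same transaction with the same metadata emit the same tuples.
    static boolean replayIsIdentical() {
        RangeEmitter emitter = new RangeEmitter();
        List<List<Object>> first = emitOnce(emitter, new Attempt(7, 0), 100);
        List<List<Object>> retry = emitOnce(emitter, new Attempt(7, 1), 100);
        return first.equals(retry);
    }

    public static void main(String[] args) {
        System.out.println(emitOnce(new RangeEmitter(), new Attempt(7, 0), 100)); // [[100], [101], [102]]
        System.out.println(replayIsIdentical()); // true
    }
}
```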

4. An example

See DiagnosisEventSpout.

(1) The Spout code

package com.packtpub.storm.trident.spout;

import backtype.storm.task.TopologyContext;
import backtype.storm.tuple.Fields;
import storm.trident.spout.ITridentSpout;

import java.util.Map;

@SuppressWarnings("rawtypes")
public class DiagnosisEventSpout implements ITridentSpout<Long> {
    private static final long serialVersionUID = 1L;
    BatchCoordinator<Long> coordinator = new DefaultCoordinator();
    Emitter<Long> emitter = new DiagnosisEventEmitter();

    @Override
    public BatchCoordinator<Long> getCoordinator(String txStateId, Map conf, TopologyContext context) {
        return coordinator;
    }

    @Override
    public Emitter<Long> getEmitter(String txStateId, Map conf, TopologyContext context) {
        return emitter;
    }

    @Override
    public Map getComponentConfiguration() {
        return null;
    }

    @Override
    public Fields getOutputFields() {
        return new Fields("event");
    }
}

(2) The BatchCoordinator code

package com.packtpub.storm.trident.spout;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import storm.trident.spout.ITridentSpout.BatchCoordinator;

import java.io.Serializable;

public class DefaultCoordinator implements BatchCoordinator<Long>, Serializable {
    private static final long serialVersionUID = 1L;
    private static final Logger LOG = LoggerFactory.getLogger(DefaultCoordinator.class);

    @Override
    public boolean isReady(long txid) {
        return true;
    }

    @Override
    public void close() {
    }

    @Override
    public Long initializeTransaction(long txid, Long prevMetadata, Long currMetadata) {
        LOG.info("Initializing Transaction [" + txid + "]");
        return null;
    }

    @Override
    public void success(long txid) {
        LOG.info("Successful Transaction [" + txid + "]");
    }
}

(3) The Emitter code

package com.packtpub.storm.trident.spout;

import com.packtpub.storm.trident.model.DiagnosisEvent;
import storm.trident.operation.TridentCollector;
import storm.trident.spout.ITridentSpout.Emitter;
import storm.trident.topology.TransactionAttempt;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class DiagnosisEventEmitter implements Emitter<Long>, Serializable {
    private static final long serialVersionUID = 1L;
    AtomicInteger successfulTransactions = new AtomicInteger(0);

    @Override
    public void emitBatch(TransactionAttempt tx, Long coordinatorMeta, TridentCollector collector) {
        for (int i = 0; i < 10000; i++) {
            List<Object> events = new ArrayList<Object>();
            double lat = new Double(-30 + (int) (Math.random() * 75));
            double lng = new Double(-120 + (int) (Math.random() * 70));
            long time = System.currentTimeMillis();

            String diag = new Integer(320 + (int) (Math.random() * 7)).toString();
            DiagnosisEvent event = new DiagnosisEvent(lat, lng, time, diag);
            events.add(event);
            collector.emit(events);
        }
    }

    @Override
    public void success(TransactionAttempt tx) {
        successfulTransactions.incrementAndGet();
    }

    @Override
    public void close() {
    }

}

(4) Finally, specify the spout when creating the topology

    TridentTopology topology = new TridentTopology();
    DiagnosisEventSpout spout = new DiagnosisEventSpout();
    Stream inputStream = topology.newStream("event", spout);

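(III) The spout's actual message flow

The sections above showed how to create a Spout in user code and the basic principles behind it. But once the Spout has been created, how is it hooked up to the topology's real spout? Let's continue with Trident's implementation.

1. MasterBatchCoordinator

Overall, MasterBatchCoordinator acts as the real starting point of a data flow:
* First, open is called to initialize it: this includes reading the transaction sequence number the topology had previously reached, the maximum number of tuples (transactions) in flight at once, the attempt number of each transaction, and so on.
* Then nextTuple changes transaction states, or creates transactions and emits the $batch stream.
* Finally, ack updates the transaction state and calls sync again, which emits the $commit stream or starts creating new transactions.

In short, MasterBatchCoordinator is the real starting point of the topology's data flow, and it keeps the data flowing by emitting coordination messages in a loop. Its real role is to be the origin of the coordination messages; all of its maps, such as _activeTx and _attemptIds, exist only to record what is currently being processed.

(1) MasterBatchCoordinator is a real spout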

  public class MasterBatchCoordinator extends BaseRichSpout 

The real logic of a Trident topology starts at MasterBatchCoordinator: open is called first to do some initialization, and then the $batch and $commit streams are emitted from nextTuple.

(2) The open method

   @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _throttler = new WindowedTimeThrottler((Number)conf.get(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS), 1);
        for(String spoutId: _managedSpoutIds) {
            // Each MasterBatchCoordinator can handle several ITridentSpouts; the metadata state of each spout is added to the _states list here. What exactly is stored is examined later.
            _states.add(TransactionalState.newCoordinatorState(conf, spoutId));
        }
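        // Read the current transaction id from ZooKeeper. When the topology is restarted, the previous state has to be restored from ZooKeeper. In other words, ZooKeeper stores the id of the next transaction to commit, not of one already committed.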
        _currTransaction = getStoredCurrTransaction();

        _collector = collector;

        // The maximum number of tuples a spout task may have in flight at any time, i.e. tuples that have been emitted but not yet acked.
        Number active = (Number) conf.get(Config.TOPOLOGY_MAX_SPOUT_PENDING);
        if(active==null) {
            _maxTransactionActive = 1;
        } else {
            _maxTransactionActive = active.intValue();
        }
        // The current attempt id of each transaction, for the transactions in the window that starts at _currTransaction.
        _attemptIds = getStoredCurrAttempts(_currTransaction, _maxTransactionActive);


        for(int i=0; i<_spouts.size(); i++) {
            // Store the Coordinator of each Spout in the _coordinators list.
            String txId = _managedSpoutIds.get(i);
            _coordinators.add(_spouts.get(i).getCoordinator(txId, conf, context));
        }
    }

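(3) Next, the nextTuple() method. It only calls sync(), which does the following:
* If the transaction's status is PROCESSED, change it to COMMITTING and emit the $commit stream. Nodes that receive the $commit stream call finishBatch to commit the transaction and do the post-processing.
* If _activeTx.size() is less than _maxTransactionActive, create new transactions, put them into _activeTx, and emit the $batch stream for the Coordinator to process. (When ack is called for the commit, the transaction is removed from _activeTx.)
Note: the transactions that may be active at any moment are those with sequence numbers in [_currTransaction, _currTransaction+_maxTransactionActive-1].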

    private void sync() {
    // note that sometimes the tuples active may be less than max_spout_pending, e.g.
    // max_spout_pending = 3
    // tx 1, 2, 3 active, tx 2 is acked. there won't be a commit for tx 2 (because tx 1 isn't committed yet),
    // and there won't be a batch for tx 4 because there's max_spout_pending tx active
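    // Check whether the current transaction _currTransaction is in the PROCESSED state; if so, change it to COMMITTING and emit the $commit stream. Nodes that receive the $commit stream call finishBatch to commit the transaction and do the post-processing.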
    TransactionStatus maybeCommit = _activeTx.get(_currTransaction);
    if(maybeCommit!=null && maybeCommit.status == AttemptStatus.PROCESSED) {
        maybeCommit.status = AttemptStatus.COMMITTING;
        _collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt);
    }
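    // Create new transactions. At most _maxTransactionActive transactions run at the same time, with ids in [_currTransaction, _currTransaction+_maxTransactionActive-1]. Note that the window
    // only advances after the current transaction has finished, so the number of transactions actually active can be smaller than _maxTransactionActive.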
    if(_active) {
        if(_activeTx.size() < _maxTransactionActive) {
            Long curr = _currTransaction;
            // Try to create up to _maxTransactionActive transactions.
            for(int i=0; i<_maxTransactionActive; i++) {
                // If the transaction id is not in _activeTx, create a new transaction and emit the $batch stream. The id is removed again when ack is called; see the ack method.
                if(!_activeTx.containsKey(curr) && isReady(curr)) {
                    // by using a monotonically increasing attempt id, downstream tasks
                    // can be memory efficient by clearing out state for old attempts
                    // as soon as they see a higher attempt id for a transaction
                    Integer attemptId = _attemptIds.get(curr);
                    if(attemptId==null) {
                        attemptId = 0;
                    } else {
                        attemptId++;
                    }
                    // _activeTx maps transaction ids to transaction status, whereas _attemptIds maps transaction ids to attempt ids.
                    _attemptIds.put(curr, attemptId);
                    for(TransactionalState state: _states) {
                        state.setData(CURRENT_ATTEMPTS, _attemptIds);
                    }
                    // A TransactionAttempt holds a transaction id and an attempt id, and identifies one concrete attempt of a transaction.
                    TransactionAttempt attempt = new TransactionAttempt(curr, attemptId);
                    _activeTx.put(curr, new TransactionStatus(attempt));
                    _collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt);
                    _throttler.markEvent();
                }
                // Whether or not the id was already in _activeTx, advance curr and check the next id in the next iteration.
                curr = nextTransactionId(curr);
            }
        }
    }
}
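
The scenario in the English comment at the top of sync() (three active transactions, transaction 2 acked before transaction 1) can be replayed with a small stand-alone model. This is only a sketch under simplifying assumptions: attempt ids, isReady, the throttler and the ZooKeeper state are left out, transaction ids are plain longs, and the class SyncWindowSketch and its log list are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of the sync()/ack() interplay with several active transactions.
class SyncWindowSketch {
    enum Status { PROCESSING, PROCESSED, COMMITTING }

    final TreeMap<Long, Status> activeTx = new TreeMap<>(); // plays the role of _activeTx
    final List<String> log = new ArrayList<>();             // records what would be emitted
    final int maxActive;                                    // plays the role of _maxTransactionActive
    long currTransaction = 1;                               // plays the role of _currTransaction

    SyncWindowSketch(int maxActive) {
        this.maxActive = maxActive;
    }

    void sync() {
        // Only the oldest transaction (currTransaction) may move on to the commit phase.
        if (activeTx.get(currTransaction) == Status.PROCESSED) {
            activeTx.put(currTransaction, Status.COMMITTING);
            log.add("$commit " + currTransaction);
        }
        // Fill the window [currTransaction, currTransaction + maxActive - 1].
        if (activeTx.size() < maxActive) {
            long curr = currTransaction;
            for (int i = 0; i < maxActive; i++) {
                if (!activeTx.containsKey(curr)) {
                    activeTx.put(curr, Status.PROCESSING);
                    log.add("$batch " + curr);
                }
                curr++;
            }
        }
    }

    void ack(long txid) {
        Status status = activeTx.get(txid);
        if (status == null) {
            return;
        }
        if (status == Status.PROCESSING) {
            activeTx.put(txid, Status.PROCESSED);
        } else if (status == Status.COMMITTING) {
            activeTx.remove(txid);
            log.add("$success " + txid);
            currTransaction = txid + 1;
        }
        sync();
    }

    static List<String> demo() {
        SyncWindowSketch mbc = new SyncWindowSketch(3);
        mbc.sync(); // starts transactions 1, 2 and 3
        mbc.ack(2); // tx 2 is processed, but tx 1 is not committed yet: nothing is emitted
        mbc.ack(1); // tx 1 is processed: $commit 1
        mbc.ack(1); // commit of tx 1 is acked: $success 1, then $commit 2 and $batch 4
        return mbc.log;
    }

    public static void main(String[] args) {
        // prints [$batch 1, $batch 2, $batch 3, $commit 1, $success 1, $commit 2, $batch 4]
        System.out.println(demo());
    }
}
```

The log shows the two properties the comments describe: commits are issued strictly in transaction order, and a new batch is only started once a slot in the window has been freed.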

The complete code is given at the end.

(4) Next, the ack method.

@Override
public void ack(Object msgId) {
    // Look up the status of the transaction
    TransactionAttempt tx = (TransactionAttempt) msgId;
    TransactionStatus status = _activeTx.get(tx.getTransactionId());

    if(status!=null && tx.equals(status.attempt)) {
        // If the status is PROCESSING, change it to PROCESSED
        if(status.status==AttemptStatus.PROCESSING) {
            status.status = AttemptStatus.PROCESSED;
        } else if(status.status==AttemptStatus.COMMITTING) {
            // If the status is COMMITTING, remove the transaction from _activeTx and _attemptIds and emit the $success stream.
            _activeTx.remove(tx.getTransactionId());
            _attemptIds.remove(tx.getTransactionId());
            _collector.emit(SUCCESS_STREAM_ID, new Values(tx));
            _currTransaction = nextTransactionId(tx.getTransactionId());
            for(TransactionalState state: _states) {
                state.setData(CURRENT_TX, _currTransaction);                    
            }
        }
        // Some transaction states have changed, so call sync() again to continue processing or to emit new tuples.
        sync();
    }
}

(5) There are also the fail and declareOutputFields methods.

@Override
public void fail(Object msgId) {
    TransactionAttempt tx = (TransactionAttempt) msgId;
    TransactionStatus stored = _activeTx.remove(tx.getTransactionId());
    if(stored!=null && tx.equals(stored.attempt)) {
        _activeTx.tailMap(tx.getTransactionId()).clear();
        sync();
    }
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // in partitioned example, in case an emitter task receives a later transaction than it's emitted so far,
    // when it sees the earlier txid it should know to emit nothing
    declarer.declareStream(BATCH_STREAM_ID, new Fields("tx"));
    declarer.declareStream(COMMIT_STREAM_ID, new Fields("tx"));
    declarer.declareStream(SUCCESS_STREAM_ID, new Fields("tx"));
}
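
The effect of fail() is easiest to see on the sorted map itself: the failed transaction and every later one are dropped from the active set, and because _attemptIds is not touched, the next sync() starts them again with a higher attempt id. The sketch below models just that with two TreeMaps; the class FailReplaySketch and its start method are invented for illustration, and start stands in for the part of sync() that creates a transaction.

```java
import java.util.TreeMap;

// Hypothetical model of how fail() interacts with the attempt ids.
class FailReplaySketch {
    final TreeMap<Long, Integer> activeTx = new TreeMap<>();   // txid -> attempt id in flight
    final TreeMap<Long, Integer> attemptIds = new TreeMap<>(); // txid -> last attempt id used

    // Stands in for the branch of sync() that creates a transaction attempt.
    void start(long txid) {
        Integer attempt = attemptIds.get(txid);
        attempt = (attempt == null) ? 0 : attempt + 1;
        attemptIds.put(txid, attempt);
        activeTx.put(txid, attempt);
    }

    // Mirrors fail(): remove the failed transaction, then clear every later one.
    void fail(long txid, int attemptId) {
        Integer stored = activeTx.remove(txid);
        if (stored != null && stored == attemptId) {
            activeTx.tailMap(txid).clear();
        }
    }

    static String demo() {
        FailReplaySketch mbc = new FailReplaySketch();
        mbc.start(1);
        mbc.start(2);
        mbc.start(3);
        mbc.fail(2, 0); // drops tx 2 and tx 3, keeps tx 1
        mbc.start(2);   // what the next sync() would re-create ...
        mbc.start(3);   // ... now with attempt id 1
        return mbc.activeTx.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {1=0, 2=1, 3=1}
    }
}
```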

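2. TridentSpoutCoordinator

TridentSpoutCoordinator receives the $success and $batch streams from MasterBatchCoordinator and implements the real logic by calling the user's code. It also emits a $batch stream to TridentSpoutExecutor, which triggers the latter to start emitting the actual business data.

(1) TridentSpoutCoordinator is a bolt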

 public class TridentSpoutCoordinator implements IBasicBolt

(2) An ITridentSpout object has to be passed in when a TridentSpoutCoordinator is created:

 public TridentSpoutCoordinator(String id, ITridentSpout spout) {
        _spout = spout;
        _id = id;
    }

This object is then used to obtain the user-defined Coordinator:

_coord = _spout.getCoordinator(_id, conf, context);

(3) _state and _underlyingState hold the metadata stored in ZooKeeper

_underlyingState = TransactionalState.newCoordinatorState(conf, _id);
_state = new RotatingTransactionalState(_underlyingState, META_DIR);

(4) In execute, TridentSpoutCoordinator receives the $success and $batch streams. First, the $success stream:

if(tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
_state.cleanupBefore(attempt.getTransactionId());
_coord.success(attempt.getTransactionId());
}

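That is, when the $success stream is received, the success method of the user-defined Coordinator is called, and the data in ZooKeeper is cleaned up as well.
(5) Now the $batch stream: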

else {
        long txid = attempt.getTransactionId();
        Object prevMeta = _state.getPreviousState(txid);
        Object meta = _coord.initializeTransaction(txid, prevMeta, _state.getState(txid));
        _state.overrideState(txid, meta);
        collector.emit(MasterBatchCoordinator.BATCH_STREAM_ID, new Values(attempt, meta));
    }

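When the $batch stream is received, a transaction is initialized and sent on. Because messages may be replayed in Trident, prevMeta is needed. Note that Trident initializes the transaction in a bolt.

3. TridentSpoutExecutor

TridentSpoutExecutor receives the $batch stream from TridentSpoutCoordinator and the $commit and $success streams from MasterBatchCoordinator. The latter two call the emitter's commit and success methods respectively, while $batch calls the emitter's emitBatch method, which starts emitting the business data.

(1) TridentSpoutExecutor is a bolt

 public class TridentSpoutExecutor implements ITridentBatchBolt

(2) The core execute method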

@Override
public void execute(BatchInfo info, Tuple input) {
    // there won't be a BatchInfo for the success stream
    TransactionAttempt attempt = (TransactionAttempt) input.getValue(0);
    if(input.getSourceStreamId().equals(MasterBatchCoordinator.COMMIT_STREAM_ID)) {
        if(attempt.equals(_activeBatches.get(attempt.getTransactionId()))) {
            ((ICommitterTridentSpout.Emitter) _emitter).commit(attempt);
            _activeBatches.remove(attempt.getTransactionId());
        } else {
             throw new FailedException("Received commit for different transaction attempt");
        }
    } else if(input.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
        // valid to delete before what's been committed since 
        // those batches will never be accessed again
        _activeBatches.headMap(attempt.getTransactionId()).clear();
        _emitter.success(attempt);
    } else {            
        _collector.setBatch(info.batchId);
        // Emit the business messages
        _emitter.emitBatch(attempt, input.getValue(1), _collector);
        _activeBatches.put(attempt.getTransactionId(), attempt);
    }
}
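
The bookkeeping that execute does with _activeBatches can be modelled in isolation. In the sketch below a TreeMap from transaction id to attempt id plays that role; the class ActiveBatchesSketch and its method names are invented, and a boolean return value replaces the FailedException thrown by the real code.

```java
import java.util.TreeMap;

// Hypothetical model of the _activeBatches bookkeeping in TridentSpoutExecutor.
class ActiveBatchesSketch {
    // txid -> attempt id of the batch most recently emitted for that txid
    final TreeMap<Long, Integer> activeBatches = new TreeMap<>();

    // $batch: remember which attempt was emitted (a replay overwrites the older attempt).
    void onBatch(long txid, int attemptId) {
        activeBatches.put(txid, attemptId);
    }

    // $commit: only the attempt that was emitted last may be committed.
    // Returns false where the real code throws FailedException.
    boolean onCommit(long txid, int attemptId) {
        Integer active = activeBatches.get(txid);
        if (active != null && active == attemptId) {
            activeBatches.remove(txid);
            return true;
        }
        return false;
    }

    // $success: forget every batch with a strictly smaller txid.
    void onSuccess(long txid) {
        activeBatches.headMap(txid).clear();
    }

    static String demo() {
        ActiveBatchesSketch tse = new ActiveBatchesSketch();
        tse.onBatch(1, 0);
        tse.onBatch(1, 1);                      // tx 1 was replayed as attempt 1
        boolean stale = tse.onCommit(1, 0);     // commit for the old attempt is rejected
        boolean current = tse.onCommit(1, 1);   // commit for the current attempt is accepted
        tse.onBatch(2, 0);
        tse.onBatch(3, 0);
        tse.onSuccess(3);                       // clears tx 2, keeps tx 3
        return stale + " " + current + " " + tse.activeBatches;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints false true {3=0}
    }
}
```

Because headMap excludes its upper bound, the entry for the succeeded transaction itself stays in the map until a later transaction succeeds, unless a commit removed it earlier.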

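(IV) Setting the Spout in TridentTopologyBuilder

After the analysis above the Spout is ready, but how is it loaded into the topology so that the real data flow starts?
(1) The buildTopology method of TridentTopologyBuilder sets up the spout-related parts of the topology.
(2) The newStream method of TridentTopology adds the spout node to the topology.

1. TridentTopologyBuilder

The first half of buildTopology in TridentTopologyBuilder sets up the Spout-related components, and the second half sets up the bolts. Only the spout-related part is shown here: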

TopologyBuilder builder = new TopologyBuilder();
        Map<GlobalStreamId, String> batchIdsForSpouts = fleshOutStreamBatchIds(false);
        Map<GlobalStreamId, String> batchIdsForBolts = fleshOutStreamBatchIds(true);

        Map<String, List<String>> batchesToCommitIds = new HashMap<String, List<String>>();
        Map<String, List<ITridentSpout>> batchesToSpouts = new HashMap<String, List<ITridentSpout>>();

        for(String id: _spouts.keySet()) {
            TransactionalSpoutComponent c = _spouts.get(id);
            if(c.spout instanceof IRichSpout) {

                //TODO: wrap this to set the stream name
                builder.setSpout(id, (IRichSpout) c.spout, c.parallelism);
            } else {
                String batchGroup = c.batchGroupId;
                if(!batchesToCommitIds.containsKey(batchGroup)) {
                    batchesToCommitIds.put(batchGroup, new ArrayList<String>());
                }
                batchesToCommitIds.get(batchGroup).add(c.commitStateId);

                if(!batchesToSpouts.containsKey(batchGroup)) {
                    batchesToSpouts.put(batchGroup, new ArrayList<ITridentSpout>());
                }
                batchesToSpouts.get(batchGroup).add((ITridentSpout) c.spout);


                BoltDeclarer scd =
                      builder.setBolt(spoutCoordinator(id), new TridentSpoutCoordinator(c.commitStateId, (ITridentSpout) c.spout))
                        .globalGrouping(masterCoordinator(c.batchGroupId), MasterBatchCoordinator.BATCH_STREAM_ID)
                        .globalGrouping(masterCoordinator(c.batchGroupId), MasterBatchCoordinator.SUCCESS_STREAM_ID);

                for(Map m: c.componentConfs) {
                    scd.addConfigurations(m);
                }

                Map<String, TridentBoltExecutor.CoordSpec> specs = new HashMap();
                specs.put(c.batchGroupId, new CoordSpec());
                BoltDeclarer bd = builder.setBolt(id,
                        new TridentBoltExecutor(
                          new TridentSpoutExecutor(
                            c.commitStateId,
                            c.streamName,
                            ((ITridentSpout) c.spout)),
                            batchIdsForSpouts,
                            specs),
                        c.parallelism);
                bd.allGrouping(spoutCoordinator(id), MasterBatchCoordinator.BATCH_STREAM_ID);
                bd.allGrouping(masterCoordinator(batchGroup), MasterBatchCoordinator.SUCCESS_STREAM_ID);
                if(c.spout instanceof ICommitterTridentSpout) {
                    bd.allGrouping(masterCoordinator(batchGroup), MasterBatchCoordinator.COMMIT_STREAM_ID);
                }
                for(Map m: c.componentConfs) {
                    bd.addConfigurations(m);
                }
            }
        }

        for(String id: _batchPerTupleSpouts.keySet()) {
            SpoutComponent c = _batchPerTupleSpouts.get(id);
            SpoutDeclarer d = builder.setSpout(id, new RichSpoutBatchTriggerer((IRichSpout) c.spout, c.streamName, c.batchGroupId), c.parallelism);

            for(Map conf: c.componentConfs) {
                d.addConfigurations(conf);
            }
        }

        for(String batch: batchesToCommitIds.keySet()) {
            List<String> commitIds = batchesToCommitIds.get(batch);
            builder.setSpout(masterCoordinator(batch), new MasterBatchCoordinator(commitIds, batchesToSpouts.get(batch)));
        }

2. TridentTopology

This creates a spout node and adds it to the topology.

public Stream newStream(String txId, ITridentSpout spout) {
    Node n = new SpoutNode(getUniqueStreamId(), spout.getOutputFields(), txId, spout, SpoutNode.SpoutType.BATCH);
    return addNode(n);
}
