http://blog.csdn.net/suifeng3051/article/details/41682441
Reliability in Storm
Storm's ISpout interface defines three reliability-related methods: nextTuple, ack, and fail.
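public interface ISpout extends Serializable {
    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
    void close();
    // activate() and deactivate() are also declared here; omitted because they are unrelated to reliability.
    void nextTuple();
    void ack(Object msgId);
    void fail(Object msgId);
}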
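Storm repeatedly calls the spout's nextTuple() method, and the spout emits tuples from inside it. The first step toward reliable processing is to assign each emitted tuple a unique message ID and pass that ID to emit():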
collector.emit(new Values("value1", "value2"), msgId);
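Assigning a unique ID tells Storm that the spout wants to be notified when the tuple tree produced from this tuple has finished processing or has failed. If the tuple is processed successfully, the spout's ack() method is called; if processing fails, its fail() method is called. In both cases the tuple's message ID is passed in.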
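Note that although spouts support reliability, whether it is used is up to us: a spout tuple emitted without a message ID is not tracked at all. IBasicBolt acks the input tuple automatically once execute() returns, which suits simple computations. With IRichBolt, anchoring is manual: you must pass the input tuple to emit() as the anchor and call ack() yourself.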
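To make the IBasicBolt/IRichBolt difference concrete, here is a minimal plain-Java simulation of the auto-ack wrapper, with no Storm dependency. As we understand it, Storm wraps an IBasicBolt in an executor (BasicBoltExecutor) that acks the input after execute() returns and fails it when execute() throws FailedException. The types SimpleBasicBolt, FailedProcessing and AutoAckExecutor below are hypothetical stand-ins for illustration, not Storm's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a basic bolt: returns the values to emit,
// or throws FailedProcessing to signal that the input should be failed.
interface SimpleBasicBolt {
    List<String> execute(String input);
}

// Hypothetical analogue of Storm's FailedException.
class FailedProcessing extends RuntimeException {
    FailedProcessing(String message) { super(message); }
}

// Simulates the wrapper around a basic bolt: the bolt never calls ack/fail
// itself; the wrapper acks after execute() returns normally and fails the
// input if execute() throws FailedProcessing.
class AutoAckExecutor {
    final List<String> emitted = new ArrayList<>();
    final List<String> acked = new ArrayList<>();
    final List<String> failed = new ArrayList<>();
    private final SimpleBasicBolt bolt;

    AutoAckExecutor(SimpleBasicBolt bolt) { this.bolt = bolt; }

    void run(String input) {
        try {
            // In Storm these emits would also be anchored to the input automatically.
            emitted.addAll(bolt.execute(input));
            acked.add(input);
        } catch (FailedProcessing e) {
            failed.add(input);
        }
    }
}

class AutoAckDemo {
    public static void main(String[] args) {
        AutoAckExecutor executor = new AutoAckExecutor(sentence -> {
            if (sentence.isEmpty()) {
                throw new FailedProcessing("empty sentence");
            }
            return Arrays.asList(sentence.split(" "));
        });
        executor.run("my dog has fleas");
        executor.run("");
        System.out.println("emitted=" + executor.emitted.size()
                + " acked=" + executor.acked.size()
                + " failed=" + executor.failed.size()); // emitted=4 acked=1 failed=1
    }
}
```

The bolt body only computes results; success and failure are reported by the wrapper. With IRichBolt that bookkeeping moves into your own execute(), which is why a forgotten ack() leaves the tuple tree to time out.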
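Reliability in Storm is determined jointly by the spout and the bolts, and Storm uses an anchoring mechanism to guarantee processing. In a bolt, a new tuple is anchored to its input by emitting it as emit(inputTuple, newValues); on success the bolt calls collector.ack(inputTuple), and on failure collector.fail(inputTuple). A spout tuple and all of its descendants form a tuple tree: the spout's ack() is called only when every tuple in the tree has been acked within the allotted time, whereas if any tuple in the tree fails, the spout's fail() is called.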
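Storm's documentation describes how the acker follows a tuple tree without storing it: each tuple gets a random 64-bit ID, and the acker keeps one value per spout tuple into which every ID is XORed once when the tuple joins the tree and once when it is acked, so the value reaches 0 only when the whole tree is done. The sketch below models that idea in plain Java using this article's split-sentence flow; XorAcker and its methods are hypothetical names, and the real acker also handles timeouts and runs as a separate task.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical, simplified model of Storm's acker (not the real Storm class).
// Each spout tuple (the tree root) gets one 64-bit "ack val". Every tuple ID is
// XORed in once when the tuple joins the tree and once when it is acked, so the
// value returns to 0 once every tuple in the tree has been acked.
class XorAcker {
    private final Map<Long, Long> ackVals = new HashMap<>(); // root ID -> running XOR
    private final Random rand = new Random(42);

    long newId() { return rand.nextLong(); }

    // Spout emits a root tuple with a message ID.
    void spoutEmit(long rootId) { ackVals.put(rootId, rootId); }

    // Bolt calls emit(input, values): the child joins the root's tree.
    void anchoredEmit(long rootId, long childId) { xor(rootId, childId); }

    // Bolt calls ack(tuple) for a tuple in the root's tree.
    void ack(long rootId, long tupleId) { xor(rootId, tupleId); }

    // True when the whole tree is done (the spout's ack() would be called now).
    boolean isComplete(long rootId) { return ackVals.get(rootId) == 0L; }

    private void xor(long rootId, long id) { ackVals.merge(rootId, id, (a, b) -> a ^ b); }
}

class XorAckerDemo {
    public static void main(String[] args) {
        XorAcker acker = new XorAcker();
        long root = acker.newId();
        acker.spoutEmit(root);                      // spout emits "my dog has fleas"
        long[] words = new long[4];
        for (int i = 0; i < words.length; i++) {
            words[i] = acker.newId();
            acker.anchoredEmit(root, words[i]);     // split bolt: emit(tuple, new Values(word))
        }
        acker.ack(root, root);                      // split bolt: ack(tuple)
        System.out.println(acker.isComplete(root)); // false: four words still unacked
        for (long w : words) {
            acker.ack(root, w);                     // downstream bolt acks each word
        }
        System.out.println(acker.isComplete(root)); // true
    }
}
```

Because IDs are random 64-bit numbers, a tree could in principle appear complete too early, but the probability is negligible, and the acker needs only a fixed amount of memory per spout tuple regardless of tree size.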
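IBasicBolt calls ack/fail automatically (it fails the input when execute() throws a FailedException), while IRichBolt requires us to call ack/fail manually. The TOPOLOGY_MESSAGE_TIMEOUT_SECS setting (30 seconds by default) limits how long a tuple tree may take; if the tree is not fully processed within that time, the spout's fail() is called as well.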
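In a real topology the timeout is set through the topology configuration, e.g. conf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 60). The simulation below only illustrates the semantics: TimeoutTracker is a hypothetical class, the clock is passed in explicitly so the example is deterministic, and Storm's internal bookkeeping works differently.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified timeout tracker (Storm's internal mechanism differs).
// Each pending message ID remembers when it was emitted; a periodic sweep
// fails every ID whose tree is still incomplete after the timeout.
class TimeoutTracker {
    private final long timeoutMillis;
    private final Map<Object, Long> emittedAt = new LinkedHashMap<>();

    TimeoutTracker(int timeoutSecs) { this.timeoutMillis = timeoutSecs * 1000L; }

    void emitted(Object msgId, long nowMillis) { emittedAt.put(msgId, nowMillis); }

    // Tree completed in time: stop tracking (the spout's ack() would be called).
    void acked(Object msgId) { emittedAt.remove(msgId); }

    // Returns the IDs that timed out; the spout's fail() would be called for each.
    List<Object> sweep(long nowMillis) {
        List<Object> expired = new ArrayList<>();
        Iterator<Map.Entry<Object, Long>> it = emittedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Object, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= timeoutMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}

class TimeoutDemo {
    public static void main(String[] args) {
        TimeoutTracker tracker = new TimeoutTracker(30); // 30 s, Storm's default
        tracker.emitted("a", 0);
        tracker.emitted("b", 1_000);
        tracker.emitted("c", 20_000);
        tracker.acked("b");                               // "b" finished in time
        System.out.println(tracker.sweep(30_000));        // [a]: "c" is only 10 s old
    }
}
```

Note that a timeout does not stop the tuple tree from continuing downstream; it only means the spout is told to treat the message as failed, so a replay may overlap with the still-running original.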
A spout that implements reliability:
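// Storm 1.x+ package names; pre-1.0 releases use backtype.storm.* instead.
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class ReliableSentenceSpout extends BaseRichSpout {
    private static final long serialVersionUID = 1L;
    // Tuples emitted but not yet acked, keyed by message ID, so they can be replayed on fail().
    private ConcurrentHashMap<UUID, Values> pending;
    private SpoutOutputCollector collector;
    private String[] sentences = { "my dog has fleas", "i like cold beverages", "the dog ate my homework", "don't have a cow man", "i don't think i like fleas" };
    private int index = 0;

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }

    @Override
    public void open(Map config, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.pending = new ConcurrentHashMap<UUID, Values>();
    }

    @Override
    public void nextTuple() {
        Values values = new Values(sentences[index]);
        UUID msgId = UUID.randomUUID();
        this.pending.put(msgId, values);
        // Passing msgId is what makes Storm track this tuple's tree.
        this.collector.emit(values, msgId);
        index++;
        if (index >= sentences.length) {
            index = 0;
        }
        // Utils.sleep(1); // optional throttle
    }

    @Override
    public void ack(Object msgId) {
        // Fully processed: stop tracking it.
        this.pending.remove(msgId);
    }

    @Override
    public void fail(Object msgId) {
        // Failed or timed out: replay the same values under the same msgId.
        Values values = this.pending.get(msgId);
        if (values != null) {
            this.collector.emit(values, msgId);
        }
    }
}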
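ReliableSentenceSpout replays a failed sentence every time fail() is called, so a tuple that can never be processed (for example, malformed input) would be replayed forever. One common refinement is to cap the number of replays per message ID. The sketch below models only the spout's bookkeeping in plain Java; RetryingPending and maxRetries are hypothetical, and "emitting" is modeled as appending to a list instead of calling SpoutOutputCollector.emit().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical variant of ReliableSentenceSpout's bookkeeping with a retry cap.
class RetryingPending {
    private final int maxRetries;
    private final Map<UUID, String> pending = new ConcurrentHashMap<>();
    private final Map<UUID, Integer> retries = new ConcurrentHashMap<>();
    final List<String> emitted = new ArrayList<>(); // stands in for collector.emit()
    final List<String> dropped = new ArrayList<>(); // gave up after too many failures

    RetryingPending(int maxRetries) { this.maxRetries = maxRetries; }

    UUID emit(String sentence) {
        UUID id = UUID.randomUUID();
        pending.put(id, sentence);
        emitted.add(sentence);
        return id;
    }

    void ack(UUID id) {
        pending.remove(id);
        retries.remove(id);
    }

    void fail(UUID id) {
        String sentence = pending.get(id);
        if (sentence == null) {
            return; // already acked or dropped
        }
        int attempts = retries.merge(id, 1, Integer::sum);
        if (attempts > maxRetries) {
            // Over the cap: stop tracking instead of replaying again.
            pending.remove(id);
            retries.remove(id);
            dropped.add(sentence);
        } else {
            emitted.add(sentence); // replay under the same message ID
        }
    }
}

class RetryDemo {
    public static void main(String[] args) {
        RetryingPending spout = new RetryingPending(2); // allow at most 2 replays
        UUID id = spout.emit("my dog has fleas");
        spout.fail(id); // replay 1
        spout.fail(id); // replay 2
        spout.fail(id); // over the cap: dropped
        System.out.println(spout.emitted.size() + " " + spout.dropped); // 3 [my dog has fleas]
    }
}
```

Where dropped messages go (a log, a dead-letter queue) is an application decision; the point is only that fail() no longer replays unconditionally.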
Example 2 (a spout that uses an incrementing counter as the message ID and only logs ack/fail):
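// Storm 1.x+ package names; pre-1.0 releases use backtype.storm.* instead.
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class RandomSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private Random rand;
    // Monotonically increasing counter used as the message ID.
    private AtomicInteger counter;
    private static String[] sentences = new String[]{"edi:I'm happy", "marry:I'm angry", "john:I'm sad", "ted:I'm excited", "laden:I'm dangerous"};

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.rand = new Random();
        this.counter = new AtomicInteger();
    }

    @Override
    public void nextTuple() {
        Utils.sleep(5000); // slow the spout down so the log is easy to follow
        String toSay = sentences[rand.nextInt(sentences.length)];
        int msgId = this.counter.getAndIncrement();
        toSay = "[" + msgId + "]" + toSay;
        System.out.println("Send " + toSay);
        this.collector.emit(new Values(toSay), msgId);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }

    @Override
    public void ack(Object msgId) {
        System.out.println("ack " + msgId);
    }

    @Override
    public void fail(Object msgId) {
        // No copy of the tuple is kept, so a failure can only be logged, not replayed.
        System.out.println("fail " + msgId);
    }
}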
A bolt that implements reliability:
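// Storm 1.x+ package names; pre-1.0 releases use backtype.storm.* instead.
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class ReliableSplitSentenceBolt extends BaseRichBolt {
    private OutputCollector collector;
    @Override
    public void prepare(Map config, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }
    @Override
    public void execute(Tuple tuple) {
        String sentence = tuple.getStringByField("sentence");
        String[] words = sentence.split(" ");
        for (String word : words) {
            // Anchor each word to the input tuple so it joins the same tuple tree.
            this.collector.emit(tuple, new Values(word));
        }
        // Ack the input only after all anchored children have been emitted.
        this.collector.ack(tuple);
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}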