A spout is a source of streams in a Storm topology. It usually reads from an external data source, such as Kafka, MySQL, or Redis, and emits the data into the topology. Spouts come in two flavors: reliable spouts, which can replay a tuple if it fails downstream, and unreliable spouts, which may lose messages. A spout can also declare more than one output stream via the declareStream method of OutputFieldsDeclarer, provided the corresponding emit calls on SpoutOutputCollector name the stream to emit to (a sketch follows the related-classes list below).
The main method of a spout is nextTuple, which either emits a new tuple into the topology or simply returns when there is nothing to emit. Crucially, nextTuple must not block, because Storm calls all of a spout's methods on a single thread. The other important methods are ack and fail: on a reliable spout, Storm calls them to report whether a tuple was fully processed or failed.
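To make the reliable path concrete, here is a minimal sketch of a reliable spout. The class name, the in-memory queue (standing in for a real source such as Kafka), and the pending map are all illustrative; the Storm calls themselves (emit with a message ID, ack, fail) are the real API.
package storm;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
public class ReliableSentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    // In-flight tuples, kept around so a failed tuple can be replayed
    private Map<String, String> pending;
    // Illustrative source; in a real spout a consumer thread would fill this
    private LinkedBlockingQueue<String> queue;
    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.pending = new ConcurrentHashMap<>();
        this.queue = new LinkedBlockingQueue<>();
    }
    @Override
    public void nextTuple() {
        String sentence = queue.poll(); // non-blocking: nextTuple must not block
        if (sentence == null) {
            return; // nothing to emit right now
        }
        String msgId = UUID.randomUUID().toString();
        pending.put(msgId, sentence);
        // Passing a message ID is what makes this emit reliable
        collector.emit(new Values(sentence), msgId);
    }
    @Override
    public void ack(Object msgId) {
        pending.remove(msgId); // fully processed downstream; forget it
    }
    @Override
    public void fail(Object msgId) {
        // Timed out or explicitly failed downstream: replay with the same ID
        String sentence = pending.get(msgId);
        if (sentence != null) {
            collector.emit(new Values(sentence), msgId);
        }
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}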
Related classes:
IRichSpout: the interface every spout must implement
BaseRichSpout: a convenience base class that provides empty default implementations (e.g. of ack and fail) so you only override what you need
Note that reliability is a property of how you emit, not of which base class you extend: a spout that emits tuples with message IDs is reliable; one that emits without them is not.
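And here is the multi-stream sketch promised above: a spout that declares two named output streams. The stream names "valid" and "invalid" are made up for illustration.
package storm;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;
import java.util.Map;
import java.util.Random;
public class TwoStreamSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private final Random random = new Random();
    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }
    @Override
    public void nextTuple() {
        Utils.sleep(1000);
        int n = random.nextInt(100);
        if (n % 2 == 0) {
            collector.emit("valid", new Values(n));   // routed to the "valid" stream
        } else {
            collector.emit("invalid", new Values(n)); // routed to the "invalid" stream
        }
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // One declareStream call per stream; downstream bolts subscribe by name,
        // e.g. builder.setBolt(...).shuffleGrouping("spoutId", "valid")
        declarer.declareStream("valid", new Fields("number"));
        declarer.declareStream("invalid", new Fields("number"));
    }
}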
WordCountSpout.java
package storm;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;
import java.util.Map;
import java.util.Random;
public class WordCountSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    // Simulated input data
    private String[] data = {"I I love Beijing", "I love love love China", "Beijing is id is is the the capital of China"};
    /**
     * open initializes the collector, which is used to send
     * the collected data on to the next component.
     */
    @Override
    public void open(Map map, TopologyContext topologyContext, SpoutOutputCollector collector) {
        this.collector = collector;
    }
    @Override
    public void nextTuple() {
        Utils.sleep(3000); // throttle the demo; avoid long sleeps in nextTuple in production
        int random = (new Random()).nextInt(3);
        String value = data[random];
        System.out.println("Randomly picked sentence: " + value);
        // Send it to the next component
        collector.emit(new Values(value));
    }
    // Declare the schema of the tuples sent to the next component
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}
Bolts: the business-logic processing units
All processing in a topology happens in bolts. A bolt can do any kind of ETL work: filtering, functions, aggregations, joins, writing to databases or caches, and so on. A single bolt can perform a simple stream transformation; complex transformations usually require several bolts chained together, which is the essence of stream computation, with each bolt handling one piece of the business logic. Like spouts, bolts can emit multiple streams to downstream components, declaring each stream's schema with declareStream.
The main method of a bolt is execute, which processes one input tuple at a time. A bolt emits new tuples through the OutputCollector class, and for every tuple it processes it should call ack so Storm knows when that tuple has been fully handled. Storm's IBasicBolt interface takes care of acking automatically (see the sketch just below).
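A minimal sketch of that route, using the BaseBasicBolt convenience class. The class name is made up; the auto-anchor and auto-ack behavior is what BasicOutputCollector provides.
package storm;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
public class SplitSentenceBasicBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        for (String word : tuple.getStringByField("sentence").split(" ")) {
            collector.emit(new Values(word)); // automatically anchored to the input tuple
        }
        // Returning normally acks the input tuple; throwing
        // org.apache.storm.topology.FailedException fails it.
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}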
Related classes:
IRichBolt: the general interface for bolts
IBasicBolt: a convenience interface that handles acking automatically
OutputCollector: bolts use an instance of this class to emit tuples to downstream bolts
WordCountSplitBolt.java
package storm;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
public class WordCountSplitBolt extends BaseRichBolt {
    // The bolt's collector, used to send data to the next bolt
    private OutputCollector collector;
    // Initialization
    @Override
    public void prepare(Map map, TopologyContext topologyContext, OutputCollector collector) {
        this.collector = collector;
    }
    @Override
    public void execute(Tuple tuple) {
        // Process the data sent by the upstream component
        String value = tuple.getStringByField("sentence");
        String[] data = value.split(" ");
        // Emit one (word, 1) pair per word
        for (String word : data) {
            collector.emit(new Values(word, 1));
        }
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declare the schema of the tuples sent to the next component
        declarer.declare(new Fields("word", "count"));
    }
}
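WordCountSplitBolt never anchors its emits or acks its input, which is fine here because WordCountSpout emits without message IDs. In a reliable topology the same bolt would anchor and ack, roughly as in this sketch (class name illustrative):
package storm;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
public class ReliableSplitBolt extends BaseRichBolt {
    private OutputCollector collector;
    @Override
    public void prepare(Map map, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }
    @Override
    public void execute(Tuple tuple) {
        for (String word : tuple.getStringByField("sentence").split(" ")) {
            // Anchoring: pass the input tuple as the first argument
            collector.emit(tuple, new Values(word, 1));
        }
        collector.ack(tuple); // tell Storm this input has been fully handled
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}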
WordCountTotalBolt.java
package storm;
import com.google.common.collect.Maps;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
public class WordCountTotalBolt extends BaseRichBolt {
    private OutputCollector collector;
    // Running totals: word -> count
    private Map<String, Integer> result = Maps.newHashMap();
    @Override
    public void prepare(Map map, TopologyContext topologyContext, OutputCollector collector) {
        this.collector = collector;
    }
    @Override
    public void execute(Tuple tuple) {
        String word = tuple.getStringByField("word");
        Integer count = tuple.getIntegerByField("count");
        if (result.get(word) == null) {
            result.put(word, count);
        } else {
            result.put(word, count + result.get(word));
        }
        result.entrySet().forEach(entry -> System.out.println("word: " + entry.getKey() + " count: " + entry.getValue()));
        collector.emit(new Values(word, result.get(word)));
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "total"));
    }
}
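Two design points worth noting. First, the per-task HashMap is only correct because the topology below routes tuples with fieldsGrouping on "word": every occurrence of a given word is guaranteed to reach the same WordCountTotalBolt task, so no two tasks hold counts for the same word. Second, printing the entire map on every tuple is purely for demonstration; a real topology would emit or persist the updated count and skip the full dump.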
WordCountTopology.java
package storm;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
public class WordCountTopology {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // 1. Set the topology's spout component
        builder.setSpout("1", new WordCountSpout());
        // 2. Set the topology's first bolt component
        builder.setBolt("2", new WordCountSplitBolt()).shuffleGrouping("1");
        // 3. Set the topology's second bolt component
        builder.setBolt("3", new WordCountTotalBolt()).fieldsGrouping("2", new Fields("word"));
        // Create the topology
        StormTopology job = builder.createTopology();
        Config config = new Config();
        // The topology can run in two modes:
        // 1. local mode  2. cluster mode
        // 1. Local mode
        LocalCluster localCluster = new LocalCluster();
        localCluster.submitTopology("MyWordCount", config, job);
        // 2. Cluster mode: package a jar and submit it to the Storm cluster
        // (requires import org.apache.storm.StormSubmitter)
        // StormSubmitter.submitTopology(args[0], config, job);
    }
}
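As written, the local cluster keeps running until the process is killed. A common pattern for demos is to bound the run and shut down cleanly, replacing the local-mode lines above with the following sketch (the 10-second window is arbitrary, and it requires import org.apache.storm.utils.Utils):
LocalCluster localCluster = new LocalCluster();
localCluster.submitTopology("MyWordCount", config, job);
Utils.sleep(10000);                        // let the topology run for ~10 s
localCluster.killTopology("MyWordCount");  // stop the topology
localCluster.shutdown();                   // tear down the in-process cluster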
pom.xml (dependencies)
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>1.0.3</version>
</dependency>
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-rename-hack</artifactId>
    <version>1.0.3</version>
</dependency>
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-hbase</artifactId>
    <version>1.0.3</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-redis</artifactId>
    <version>1.0.3</version>
</dependency>
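Of these, only storm-core is actually needed to compile and run this word-count example locally; storm-hbase and storm-redis matter only if you extend the topology to write results to HBase or Redis, and storm-rename-hack exists for backwards compatibility with the old backtype.storm package names.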