Storm WordCount Example: Spouts and Bolts in Detail, and the Difference Between the IRich and IBasic/Base Interfaces

Introduction to Spouts

A spout is a source of streams in a Storm topology. It usually reads from an external system such as Kafka, MySQL, or Redis and emits the data into the topology. Spouts come in two flavors: a reliable spout replays a tuple when processing fails, while an unreliable spout fires and forgets, so messages may be lost. A spout can declare more than one output stream via the declareStream method of OutputFieldsDeclarer, provided the matching emit calls on SpoutOutputCollector also specify the stream.

The main method of a spout is nextTuple, which either emits a new tuple into the topology or simply returns when there is nothing to emit. Note that this method must not block, because Storm calls all of a spout's methods on a single thread. The other important methods are ack and fail; with a reliable spout, Storm calls them to report whether a tuple was fully processed or failed.
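To make this concrete, here is a minimal sketch of a reliable spout that also declares two named streams. It is not part of the WordCount example below; the stream names and the fetchMessage() helper are hypothetical stand-ins for a real source such as Kafka or Redis:

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import java.util.Map;

public class ReliableSpout extends BaseRichSpout {

    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        String msg = fetchMessage(); // hypothetical: fetch one message, or null if none
        if (msg == null) {
            return; // nothing to emit; never block in this method
        }
        // Passing a message ID as the last argument makes the tuple reliable:
        // Storm will later call ack(msgId) or fail(msgId) for it.
        collector.emit("valid", new Values(msg), msg);
    }

    @Override
    public void ack(Object msgId) {
        // the tuple tree completed; drop msgId from any pending/retry buffer
    }

    @Override
    public void fail(Object msgId) {
        // the tuple tree failed or timed out; re-queue msgId for a retry
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // declareStream lets one spout declare several named output streams
        declarer.declareStream("valid", new Fields("msg"));
        declarer.declareStream("invalid", new Fields("msg"));
    }

    // hypothetical stand-in for an external source such as Kafka or Redis
    private String fetchMessage() {
        return null;
    }
}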

Related classes:
IRichSpout: the interface every spout must implement
BaseRichSpout: a convenience base class that provides empty implementations of ack, fail, and the other optional methods; a spout becomes reliable by emitting tuples with message IDs so that ack and fail are actually invoked
(Unlike bolts, there is no "basic" spout variant that handles reliability automatically; a spout that emits without message IDs is simply unreliable.)

1. Spout component: create the Spout (WordCountSpout) that generates the data and serves as the data source for the whole topology

WordCountSpout.java

package storm;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;
import java.util.Map;
import java.util.Random;

public class WordCountSpout extends BaseRichSpout {

    private SpoutOutputCollector collector;
    // Mock sentences used to simulate an external data source
    private String[] data = {"I I love Beijing","I love love love  China","Beijing is id is is the the capital of China"};

    /**
     * open initializes the collector.
     * The collector is used to send the collected data to the next component.
     */
    @Override
    public void open(Map map, TopologyContext topologyContext, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        Utils.sleep(3000);
        // Pick one of the three mock sentences at random
        int random = (new Random()).nextInt(3);
        String value = data[random];
        System.out.println("Generated sentence: " + value);
        // Send the sentence to the next component
        collector.emit(new Values(value));
    }


    // Declare the schema (structure) of the tuples sent to the next component
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}
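Note that this spout emits without a message ID (collector.emit(new Values(value)) has no third argument), so it is an unreliable spout in the sense described above: Storm will never call ack or fail for these tuples, and a lost tuple is not replayed. That is acceptable for this demo.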


Introduction to Bolts

Bolts are the processing units of a topology.
All of a topology's processing happens in bolts. A bolt can perform any ETL step: filtering, functions, aggregations, joins, writing to a database or cache, and so on. A single bolt can handle a simple stream transformation; complex transformations usually require chaining several bolts, each implementing one piece of the business logic. This is stream computing. A bolt can also emit several streams to downstream components, declaring the schema of each output stream via declareStream.

The main method of a bolt is execute, which processes one input tuple at a time; a bolt can emit new tuples through the OutputCollector class. A bolt must call ack for every tuple it processes so that Storm knows when each tuple has been fully handled. Storm's IBasicBolt interface calls ack automatically (see the sketch after the list below).

Related classes:
IRichBolt: the general interface for bolts
IBasicBolt: a convenience interface that handles acking automatically
OutputCollector: used by a bolt to emit tuples to downstream bolts
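To illustrate the auto-ack behavior, here is a sketch of the split bolt from step 2 rewritten against BaseBasicBolt (the convenience base class for IBasicBolt). Nothing below is required by the WordCount example; it only shows the contrast with BaseRichBolt:

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitBasicBolt extends BaseBasicBolt {

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // Emits are automatically anchored to the input tuple, and the input
        // is acked when execute returns; throwing FailedException fails it.
        for (String word : tuple.getStringByField("sentence").split(" ")) {
            collector.emit(new Values(word, 1));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}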

2. Bolt component 1: create the Bolt (WordCountSplitBolt) that splits each sentence into words

WordCountSplitBolt.java

package storm;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

import java.util.Map;

public class WordCountSplitBolt extends BaseRichBolt{

    // The bolt's collector, used to send data to the next bolt
    private OutputCollector collector;


    // Initialization: save the collector
    @Override
    public void prepare(Map map, TopologyContext topologyContext, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // Process the sentence sent by the previous component
        String value = tuple.getStringByField("sentence");
        String[] data = value.split(" ");
        // Emit each word with an initial count of 1
        for (String word : data){
            collector.emit(new Values(word,1));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declare the schema of the tuples sent to the next component
        declarer.declare(new Fields("word","count"));
    }
}
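Because WordCountSplitBolt extends BaseRichBolt rather than BaseBasicBolt, acking is manual. As written it emits unanchored tuples and never calls ack, so processing is unreliable; for reliable processing you would anchor each emit to the input (collector.emit(tuple, new Values(word, 1))) and call collector.ack(tuple) at the end of execute.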


3. Bolt component 2: create the Bolt (WordCountTotalBolt) that tallies the word counts

WordCountTotalBolt.java

package storm;

import com.google.common.collect.Maps;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

import java.util.Map;

public class WordCountTotalBolt extends BaseRichBolt{

    private OutputCollector collector;
    // Per-task, in-memory word counts
    private Map<String, Integer> result = Maps.newHashMap();
    @Override
    public void prepare(Map map, TopologyContext topologyContext, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        String word = tuple.getStringByField("word");
        Integer count = tuple.getIntegerByField("count");

        // Accumulate the running total for this word
        if (result.get(word) == null){
            result.put(word, count);
        } else {
            result.put(word, count + result.get(word));
        }
        result.entrySet().forEach(entry -> System.out.println("word: " + entry.getKey() + " count: " + entry.getValue()));

        // Emit the updated total downstream
        collector.emit(new Values(word, result.get(word)));

    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word","total"));
    }
}
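Note that this bolt keeps its totals in an in-memory HashMap, so the counts are per-task and disappear if the worker restarts. It only produces correct totals because the topology below uses fieldsGrouping on "word", which guarantees that every occurrence of a given word is routed to the same task.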


4. Topology main program: WordCountTopology

WordCountTopology.java

package storm;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class WordCountTopology {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();

        // 1. Register the spout component of the topology
        builder.setSpout("1",new WordCountSpout());

        // 2. Register the first bolt; it subscribes to the spout with a shuffle grouping
        builder.setBolt("2",new WordCountSplitBolt()).shuffleGrouping("1");

        // 3. Register the second bolt, grouped by the "word" field so all
        //    occurrences of a word go to the same task
        builder.setBolt("3",new WordCountTotalBolt()).fieldsGrouping("2",new Fields("word"));

        // Build the topology
        StormTopology job = builder.createTopology();

        Config config = new Config();

        // A topology can run in two modes:
        // 1. local mode   2. cluster mode

        // 1. Local mode: run inside the current JVM
        LocalCluster localCluster = new LocalCluster();
        localCluster.submitTopology("MyWordCount",config,job);

        // 2. Cluster mode: package the jar and submit it to a Storm cluster
//        StormSubmitter.submitTopology(args[0], config, job);
    }
}
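In local mode the topology runs inside the current JVM, which is convenient for debugging; note that as written the program never calls localCluster.shutdown(), so it keeps running until killed. For cluster mode, uncomment the StormSubmitter line (and import org.apache.storm.StormSubmitter), package the project into a jar, and submit it with the storm client, for example (the jar name is illustrative):

storm jar wordcount.jar storm.WordCountTopology MyWordCount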


pom.xml


<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>1.0.3</version>
</dependency>

<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-rename-hack</artifactId>
    <version>1.0.3</version>
</dependency>

<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-hbase</artifactId>
    <version>1.0.3</version>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-redis</artifactId>
    <version>1.0.3</version>
</dependency>