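This article walks through examples of working with Storm from Java, including how to define and submit a topology.
Storm guarantees that every tuple is fully processed. When a tuple has been fully processed, Storm calls the ack method of the spout that emitted it; otherwise it calls that spout's fail method, so the tuple can be replayed from the message source. This is the basis of Storm's reliability guarantee.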
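To make "fully processed" more concrete: Storm's documentation describes acker tasks that keep a single 64-bit value per spout tuple, XOR-ing in each tuple id once when the tuple is created and once when it is acked, so the value returns to zero when the whole tuple tree has been acked. The following is a minimal plain-Java sketch of that idea only. The class AckValSketch and its methods are hypothetical names for illustration and are not part of Storm's API.

```java
// Hypothetical model of XOR-based ack tracking for the tuple tree of one spout tuple.
final class AckValSketch {
    private long ackVal = 0L;

    // A tuple with this id was created (anchored) somewhere in the tree.
    void emitted(long tupleId) {
        ackVal ^= tupleId;
    }

    // The bolt that processed the tuple with this id acked it.
    void acked(long tupleId) {
        ackVal ^= tupleId;
    }

    // Zero means every id that was XOR-ed in has been XOR-ed out again.
    boolean fullyProcessed() {
        return ackVal == 0L;
    }

    public static void main(String[] args) {
        AckValSketch tree = new AckValSketch();
        tree.emitted(11L);   // spout tuple
        tree.emitted(22L);   // two tuples emitted downstream
        tree.emitted(33L);
        tree.acked(11L);
        tree.acked(22L);
        System.out.println(tree.fullyProcessed()); // false, tuple 33 is still pending
        tree.acked(33L);
        System.out.println(tree.fullyProcessed()); // true
    }
}
```

The fixed ids here are only for readability; the scheme relies on ids being random 64-bit values so that unrelated ids are very unlikely to cancel each other out.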
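Storm is designed to be stateless and fail-fast. If a worker dies, Storm restarts it; if a node fails, Storm restarts its workers on another node. In other words, killing a Storm process directly does not take down a running topology.
Storm's core is built on Thrift interfaces, so a topology can be defined and submitted from any language.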
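IComponent is the common interface for all components. Its methods are:
// Declares the output schema of the component: output stream ids, output fields, and so on.
void declareOutputFields(OutputFieldsDeclarer declarer);
// Declares configuration specific to this component.
Map<String, Object> getComponentConfiguration();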
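ISpout is the core interface for spouts. Its methods are:
// Called when the spout is initialized. conf is the Storm configuration for this spout, context provides information about the spout's place in the topology, and collector is used to emit tuples.
void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector);
// The source of messages. Storm calls this method repeatedly to request the next tuple.
void nextTuple();
// Called when a tuple has been fully processed (only for tuples emitted with a message id).
void ack(Object msgId);
// Called when a tuple was not fully processed, either because it failed or because the configured timeout expired. The spout can re-emit the tuple here so that it is processed again.
void fail(Object msgId);
// Called when the spout is shut down.
void close();
// Called when the spout is activated out of deactivated mode; nextTuple will be called again afterwards.
void activate();
// Called when the spout is deactivated; nextTuple will not be called while it is deactivated.
void deactivate();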
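The ack and fail callbacks only help if the spout remembers what it emitted. A common pattern is to keep emitted messages in a map keyed by message id, drop an entry on ack, and queue it for re-emission on fail. The sketch below models just that bookkeeping in plain Java, with no Storm dependency. ReplayBufferSketch and its method names are hypothetical; in a real spout the same logic would live inside nextTuple, ack and fail.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical bookkeeping a reliable spout could keep for its in-flight messages.
final class ReplayBufferSketch {
    private final Map<Long, String> pending = new HashMap<>();
    private final Queue<Long> retry = new ArrayDeque<>();
    private long nextId = 0L;

    // Remember a newly emitted message and return the message id to emit it with.
    long emit(String message) {
        long id = nextId++;
        pending.put(id, message);
        return id;
    }

    // The tuple tree completed: the message no longer needs to be kept.
    void ack(long msgId) {
        pending.remove(msgId);
    }

    // The tuple failed or timed out: schedule it for re-emission if it is still pending.
    void fail(long msgId) {
        if (pending.containsKey(msgId)) {
            retry.add(msgId);
        }
    }

    // Next message that should be emitted again, or null if there is none.
    String nextRetry() {
        Long id;
        while ((id = retry.poll()) != null) {
            String message = pending.get(id);
            if (message != null) {
                return message;
            }
        }
        return null;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Note that the StrSpout example later in this article emits tuples without a message id, so Storm does not track them and never calls ack or fail for them.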
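IBolt is the core interface for bolts. Its methods are:
// Called when the component is initialized. topoConf is the Storm configuration for this bolt, context provides information about the bolt's place in the topology, and collector is used to emit tuples.
void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector);
// Processes a single input tuple.
void execute(Tuple input);
// Called when the bolt is shut down (not called on abnormal termination).
void cleanup();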
The IRichSpout interface extends ISpout and IComponent and is defined as follows:
public interface IRichSpout extends ISpout, IComponent {}
The IRichBolt interface extends IBolt and IComponent and is defined as follows:
public interface IRichBolt extends IBolt, IComponent {}
// Basic abstract implementation of the IComponent interface
public abstract class BaseComponent implements IComponent{}
BaseRichSpout is the base class normally used when developing a spout. It is defined as follows:
public abstract class BaseRichSpout extends BaseComponent implements IRichSpout{}
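BaseBasicBolt is a base class commonly used when developing a bolt. It acks each processed tuple automatically, unlike BaseRichBolt or a direct implementation of IRichBolt, where you have to call collector.ack(tuple) yourself. It is defined as follows:
// Tuples are acked automatically
public abstract class BaseBasicBolt extends BaseComponent implements IBasicBolt{}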
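The difference between the two styles comes down to who calls ack. The sketch below models the automatic variant in plain Java: a wrapper runs the bolt logic, acks the tuple when the logic returns normally, and fails it when the logic throws. All types here (TupleHandler, AutoAckSketch) are hypothetical stand-ins rather than Storm classes, and catching every RuntimeException is a simplification made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper that acks or fails on behalf of the bolt logic.
final class AutoAckSketch {
    // Stand-in for the user-written execute logic of a bolt.
    interface TupleHandler {
        void execute(String tuple);
    }

    final List<String> acked = new ArrayList<>();
    final List<String> failed = new ArrayList<>();
    private final TupleHandler handler;

    AutoAckSketch(TupleHandler handler) {
        this.handler = handler;
    }

    void process(String tuple) {
        try {
            handler.execute(tuple);
            acked.add(tuple);    // normal return: ack without the handler asking for it
        } catch (RuntimeException e) {
            failed.add(tuple);   // handler threw: report the tuple as failed
        }
    }
}
```

With manual acking, the handler itself would have to record the ack, and forgetting to do so would leave the tuple pending until it times out.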
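org.apache.storm.topology.TopologyBuilder is the class used to build a topology. An instance is created as follows:
TopologyBuilder builder = new TopologyBuilder();
Commonly used methods:
// Declares a spout in the topology (other overloads exist). id is the id of this component; spout is the spout implementation; parallelism_hint is the number of executors (threads) to allocate. By default each executor runs one task.
public SpoutDeclarer setSpout(String id, IRichSpout spout, Number parallelism_hint){}
// Declares a bolt in the topology (other overloads exist). id is the id of this component, which components that want to consume its output refer to; bolt is the bolt implementation; parallelism_hint is the number of executors (threads) to allocate. By default each executor runs one task.
public BoltDeclarer setBolt(String id, IBasicBolt bolt, Number parallelism_hint){}
// Creates the topology from the declarations made so far
public StormTopology createTopology(){}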
Adding spouts and bolts to a TopologyBuilder involves the SpoutDeclarer and BoltDeclarer interfaces, which are defined as follows:
// The SpoutDeclarer interface
public interface SpoutDeclarer extends ComponentConfigurationDeclarer<SpoutDeclarer> {}
// SpoutGetter, the implementation of SpoutDeclarer (an inner class of TopologyBuilder)
protected class SpoutGetter extends ConfigGetter<SpoutDeclarer> implements SpoutDeclarer {}
// The BoltDeclarer interface
public interface BoltDeclarer extends InputDeclarer<BoltDeclarer>, ComponentConfigurationDeclarer<BoltDeclarer> {}
// BoltGetter, the implementation of BoltDeclarer (an inner class of TopologyBuilder)
protected class BoltGetter extends ConfigGetter<BoltDeclarer> implements BoltDeclarer {}
Both SpoutDeclarer and BoltDeclarer extend the org.apache.storm.topology.ComponentConfigurationDeclarer interface. The method most often used when defining a topology is:
// Sets the number of tasks for the component
T setNumTasks(Number val);
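The parallelism hint and setNumTasks control two different things: the hint sets how many executor threads a component gets, while setNumTasks sets how many task instances are spread over those threads. The sketch below shows one even way of spreading tasks over executors in plain Java. It illustrates the relationship only and is not Storm's scheduler; TaskLayoutSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of tasks being shared among executor threads.
final class TaskLayoutSketch {
    // Spread task ids 0..numTasks-1 over numExecutors threads as evenly as possible.
    static List<List<Integer>> layout(int numTasks, int numExecutors) {
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        for (int task = 0; task < numTasks; task++) {
            executors.get(task % numExecutors).add(task);
        }
        return executors;
    }

    public static void main(String[] args) {
        // Four tasks on two executors: each thread runs two task instances.
        System.out.println(layout(4, 2)); // [[0, 2], [1, 3]]
    }
}
```

In TopologyBuilder terms this corresponds to something like builder.setBolt(id, bolt, 2).setNumTasks(4): two threads, each running two instances of the bolt.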
org.apache.storm.generated.StormTopology is the topology class. It is created through TopologyBuilder, as follows:
TopologyBuilder builder = new TopologyBuilder();
StormTopology topology = builder.createTopology();
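A stream grouping defines how a stream (of tuples) is routed between bolts, that is, which streams each bolt receives as input and how they are partitioned among its tasks.
org.apache.storm.topology.InputDeclarer is the stream grouping interface. Commonly used methods:
// Shuffle grouping: tuples are distributed randomly across the tasks so that each task receives roughly the same number of tuples. componentId is the id of the component being consumed from (the producer); streamId is the id of the stream.
public T shuffleGrouping(String componentId, String streamId);
public T shuffleGrouping(String componentId);
// Fields grouping: the stream is partitioned by the given fields, so tuples with the same field values go to the same task. componentId is the id of the component being consumed from (the producer); streamId is the id of the stream; fields are the fields to group by.
public T fieldsGrouping(String componentId, String streamId, Fields fields);
public T fieldsGrouping(String componentId, Fields fields);
// Global grouping: the entire stream goes to a single task of the bolt (the task with the lowest id).
public T globalGrouping(String componentId, String streamId);
public T globalGrouping(String componentId);
// Direct grouping: the producer decides which task of the consumer receives each tuple.
public T directGrouping(String componentId, String streamId);
public T directGrouping(String componentId);
// Local or shuffle grouping: if the target bolt has tasks in the same worker process, tuples are shuffled among those tasks only; otherwise it behaves like a normal shuffle grouping.
public T localOrShuffleGrouping(String componentId, String streamId);
public T localOrShuffleGrouping(String componentId);
// All grouping (broadcast): the stream is replicated to every task of the bolt.
public T allGrouping(String componentId, String streamId);
public T allGrouping(String componentId);
// None grouping: you do not care how the stream is grouped; currently equivalent to shuffle grouping.
public T noneGrouping(String componentId, String streamId);
public T noneGrouping(String componentId);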
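The routing rules above can be modeled in a few lines of plain Java. The sketch below is a simplified model, not Storm's implementation: GroupingSketch is a hypothetical class, shuffle is modeled as a round-robin rotation (which gives the same even split that random shuffling aims for), and fields grouping is modeled as a hash of the field values modulo the number of tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of how different groupings choose target task indexes.
final class GroupingSketch {
    private int next = 0; // rotation cursor for the shuffle model

    // Shuffle model: rotate through the tasks so each one gets an equal share.
    int shuffle(int numTasks) {
        int task = next;
        next = (next + 1) % numTasks;
        return task;
    }

    // Fields model: equal field values always map to the same task index.
    static int fields(int numTasks, Object... fieldValues) {
        return Math.floorMod(Arrays.hashCode(fieldValues), numTasks);
    }

    // Global model: everything goes to the lowest task index.
    static int global() {
        return 0;
    }

    // All model: every task receives a copy of the tuple.
    static List<Integer> all(int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            targets.add(task);
        }
        return targets;
    }
}
```

This matters for the character-count example later in the article: with a single CharCountBolt task, shuffleGrouping is fine, but with several tasks the count for one character would be split across them, and fieldsGrouping("strSplitBolt", new Fields("char")) would be the grouping that keeps each character on one task.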
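Storm has two run modes: LocalCluster (local mode) and StormSubmitter (cluster mode).
org.apache.storm.LocalCluster is local mode, used for development and testing.
// Create the cluster
LocalCluster cluster = new LocalCluster();
LocalCluster implements the org.apache.storm.ILocalCluster interface, which is defined as follows:
// The AutoCloseable interface defines void close(), which shuts the cluster down
public interface ILocalCluster extends AutoCloseable{}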
Commonly used ILocalCluster methods:
// Submits a topology to run. topologyName is the name of the topology, conf is the topology configuration, topology is the topology itself, and submitOpts holds additional submit options.
ILocalTopology submitTopologyWithOpts(String topologyName, Map<String, Object> conf, StormTopology topology, SubmitOptions submitOpts);
ILocalTopology submitTopology(String topologyName, Map<String, Object> conf, StormTopology topology);
// Kills a topology
void killTopologyWithOpts(String name, KillOptions options);
void killTopology(String topologyName);
// Activates a topology
void activate(String topologyName);
// Deactivates a topology
void deactivate(String topologyName);
// Gets a topology. id is the topology id (not the topology name).
StormTopology getTopology(String id);
// Gets cluster information
ClusterSummary getClusterInfo();
// Gets topology information
TopologyInfo getTopologyInfo(String id);
org.apache.storm.StormSubmitter is cluster mode. Commonly used methods:
// Submits a topology to run. name is the name of the topology, topoConf is the topology configuration, topology is the topology itself, and opts holds additional submit options.
public static void submitTopologyAs(String name, Map<String, Object> topoConf, StormTopology topology, SubmitOptions opts, ProgressListener progressListener, String asUser){}
public static void submitTopology(String name, Map<String, Object> topoConf, StormTopology topology, SubmitOptions opts){}
public static void submitTopology(String name, Map<String, Object> topoConf, StormTopology topology){}
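Using cluster mode:
# Step 1: write the topology code and package it into a jar that contains all dependencies, using Maven as an example
mvn clean package -Dmaven.test.skip=true
# Step 2: run the jar on the cluster, passing the jar path, the topology class to execute, and arguments (the topology name can be passed here)
bin/storm jar ~/storm_study/study20190618-1.0-SNAPSHOT.jar com.dragon.study.storm.CharCountTopology charCountTopology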
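The example below uses a data source that emits random strings and counts the total occurrences of each character in those strings. The StrSpout class is the data source, the StrSplitBolt class splits each string into characters, the CharCountBolt class counts the characters, and the CharCountTopology class launches the topology.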
<dependencies>
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-core</artifactId>
        <version>2.0.0</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <mainClass>com.dragon.study.StormMain</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
    </plugins>
</build>
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Random;
// Data source
public class StrSpout extends BaseRichSpout {
    // Collector used to emit tuples
    private SpoutOutputCollector collector;
    // Source data
    private List<String> dataList = Arrays.asList("study", "hard", "and", "make");
    private Random random = new Random();

    // Initialization
    @Override
    public void open(Map<String, Object> map, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
        this.collector = spoutOutputCollector;
    }

    // Produces the next tuple
    @Override
    public void nextTuple() {
        // Pick a random string from the source data
        String str = dataList.get(random.nextInt(dataList.size()));
        this.collector.emit(new Values(str));
        System.out.println("StrSpout emit: " + str);
        Utils.sleep(1000);
    }

    // Declares the output fields
    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("str"));
    }
}
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
import java.util.stream.Stream;
// Splits strings into characters
public class StrSplitBolt extends BaseBasicBolt {
    // Initialization
    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context) {
    }

    // Processes a tuple; the ack is sent automatically
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // Read the value by field name
        String str = input.getStringByField("str");
        // Split the string into single-character strings and emit each one
        Stream.iterate(0, n -> n + 1).limit(str.length()).map(n -> "" + str.charAt(n)).forEach(t -> collector.emit(new Values(t)));
    }

    // Declares the output fields
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("char"));
    }
}
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;
import java.util.HashMap;
import java.util.Map;
// Counts characters
public class CharCountBolt extends BaseBasicBolt {
    // Number of occurrences of each character
    private Map<String, Integer> countMap = new HashMap<>();

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context) {
    }

    // Processes a tuple
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String data = input.getStringByField("char");
        countMap.merge(data, 1, Integer::sum);
        System.out.println("CharCountBolt result: " + countMap);
    }

    // This bolt is the end of the pipeline and emits nothing, so it declares no output fields
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }
}
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;
import java.util.concurrent.TimeUnit;
public class CharCountTopology {
    public static void main(String[] args) throws Exception {
        // Configure the topology
        TopologyBuilder builder = new TopologyBuilder();
        // Add a spout with id strSpout and a parallelism hint of 1
        builder.setSpout("strSpout", new StrSpout(), 1);
        // Add a bolt with id strSplitBolt and a parallelism hint of 2 that consumes from the component with id strSpout
        builder.setBolt("strSplitBolt", new StrSplitBolt(), 2).shuffleGrouping("strSpout");
        // Add a bolt with id charCountBolt and a parallelism hint of 1 that consumes from strSplitBolt
        builder.setBolt("charCountBolt", new CharCountBolt(), 1).shuffleGrouping("strSplitBolt");
        // Create the topology
        StormTopology topology = builder.createTopology();
        // Storm configuration
        Config conf = new Config();
        conf.setDebug(false); // Do not log emitted messages and system messages
        // Run in cluster mode when a topology name is passed as an argument
        if (args != null && args.length > 0) {
            conf.setNumWorkers(3); // Use 3 workers
            StormSubmitter.submitTopology(args[0], conf, topology);
            return;
        }
        // Otherwise run in local mode
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("storm_study", conf, topology);
        TimeUnit.MINUTES.sleep(30);
        // Kill the topology under the same name it was submitted with
        cluster.killTopology("storm_study");
        cluster.close();
    }
}
Running the CharCountTopology class produces output similar to the following:
StrSpout emit: make
CharCountBolt result: {m=1}
CharCountBolt result: {a=1, m=1}
CharCountBolt result: {a=1, k=1, m=1}
CharCountBolt result: {a=1, e=1, k=1, m=1}
StrSpout emit: study
CharCountBolt result: {a=1, s=1, e=1, k=1, m=1}
CharCountBolt result: {a=1, s=1, t=1, e=1, k=1, m=1}
CharCountBolt result: {a=1, s=1, t=1, u=1, e=1, k=1, m=1}
CharCountBolt result: {a=1, s=1, d=1, t=1, u=1, e=1, k=1, m=1}
CharCountBolt result: {a=1, s=1, d=1, t=1, u=1, e=1, y=1, k=1, m=1}
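The split-and-count logic can also be checked without starting a cluster. The following plain-Java sketch repeats what StrSplitBolt and CharCountBolt do for the two strings emitted in the output above. CharCountSketch is a hypothetical helper written for this check, and it uses a TreeMap so that the printed order is stable, whereas the bolt uses a HashMap.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-alone version of the split and count steps.
final class CharCountSketch {
    // Same role as StrSplitBolt: one single-character string per character.
    static List<String> split(String str) {
        return str.chars()
                .mapToObj(c -> String.valueOf((char) c))
                .collect(Collectors.toList());
    }

    // Same role as CharCountBolt: a running count per character.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            for (String ch : split(word)) {
                counts.merge(ch, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {a=1, d=1, e=1, k=1, m=1, s=1, t=1, u=1, y=1}
        System.out.println(count(Arrays.asList("make", "study")));
    }
}
```

The final counts match the last line of the topology output; only the order of the entries differs because of the map type.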