(I) Aggregator performs batch aggregation and is best combined with groupBy: tuples are grouped by the given fields (with word splitting, identical words end up in the same group) and each group is aggregated. Note that a parallelismHint placed after the aggregate is what makes the aggregation run in parallel; without it the stream always falls back to a single global aggregation.
Usage:
.partitionBy(new Fields("word")) // partition by word
.each(new Fields("word"), new Filter1()).parallelismHint(2) // number of partitions
.groupBy(new Fields("word"))
.aggregate(new Fields("word"), new Agg1(), new Fields("aggr1"))
.parallelismHint(2); // run the per-group aggregation in two parallel tasks
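The Agg1 used above is not shown in these notes. As a rough, Storm-free mimic (assuming Agg1 simply concatenates the words of a group, which is consistent with the log below; the 119/123 in the real log is presumably per-batch state created in init and is omitted here), the Aggregator lifecycle is: init once per group/batch, aggregate once per tuple, complete at the end. The `ConcatAgg` and `runPartition` names are invented for illustration:

```java
import java.util.List;

public class Main {
    // Minimal stand-in for Storm's Aggregator<T> lifecycle: init -> aggregate* -> complete.
    static class ConcatAgg {
        // init(): called once when the batch/group starts; returns the mutable state.
        StringBuilder init() { return new StringBuilder(); }
        // aggregate(): called once per tuple in the partition/group.
        void aggregate(StringBuilder state, String word) { state.append(word); }
        // complete(): called when the batch ends; produces the final value.
        String complete(StringBuilder state) { return state.toString() + " end"; }
    }

    // Drives one partition's worth of tuples through the lifecycle.
    static String runPartition(List<String> words) {
        ConcatAgg agg = new ConcatAgg();
        StringBuilder state = agg.init();
        for (String w : words) agg.aggregate(state, w);
        return agg.complete(state);
    }

    public static void main(String[] args) {
        // Partition 0 saw "the", "cow", "the"; partition 1 saw "man".
        System.out.println(runPartition(List.of("the", "cow", "the"))); // thecowthe end
        System.out.println(runPartition(List.of("man")));               // man end
    }
}
```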
Output of the three aggregator interfaces when used with aggregate:
===============Aggregator==============
[0]the
[0]cow
[0]the
[1]man
[partitionId1]119 man
man119 end
[partitionId0]123 the
[partitionId0]123 cow
[partitionId0]123 the
thecowthe123 end
=========CombinerAggregator============
[0]the
[1]man
[0]cow
[0]the
combine:man
combine:the
combine:cow
combine:thethe // local combine first
combine:man
combine:thethe
combine:cow // then the global combine
=========ReducerAggregator=============
[1]man
[0]the
[0]cow
[0]the
str:man
str:the
str:cow
str:thethe
=======================================
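The difference between the CombinerAggregator and ReducerAggregator logs above can be modeled without Storm: a CombinerAggregator performs a partial combine inside each partition and then combines the partial results, while a ReducerAggregator folds every tuple into one running value. A plain-Java sketch, with string concatenation standing in for the real combine/reduce functions (all names here are invented for illustration):

```java
import java.util.List;

public class Main {
    // CombinerAggregator semantics: combine tuples within each partition first,
    // then combine the per-partition partials into one result.
    static String combinerStyle(List<List<String>> partitions) {
        StringBuilder globalResult = new StringBuilder();
        for (List<String> partition : partitions) {
            // Local phase: combine(curr, init(tuple)) inside one partition.
            String partial = "";
            for (String word : partition) partial = partial + word;
            // Global phase: combine the per-partition partials.
            globalResult.append(partial);
        }
        return globalResult.toString();
    }

    // ReducerAggregator semantics: a single fold over every tuple,
    // seeded by init(); there is no local pre-aggregation.
    static String reducerStyle(List<String> allTuples) {
        String acc = ""; // init()
        for (String word : allTuples) acc = acc + word; // reduce(acc, tuple)
        return acc;
    }

    public static void main(String[] args) {
        // groupBy("word") put both "the" tuples in one group, "man" and "cow" in their own.
        System.out.println(combinerStyle(List.of(List.of("the", "the"), List.of("man"), List.of("cow"))));
        System.out.println(reducerStyle(List.of("man", "the", "cow", "the")));
    }
}
```

This is why the CombinerAggregator log shows "thethe" appearing twice (once as a local partial, once in the global phase), while the ReducerAggregator log is a single running fold.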
(II) partitionAggregate aggregates within each partition. It needs neither groupBy nor a trailing parallelismHint: as long as the stream is partitioned upstream, every partition is aggregated independently.
Usage:
.partitionBy(new Fields("word")) // partition by word
.each(new Fields("word"), new Filter1()).parallelismHint(2) // number of partitions
.partitionAggregate(new Fields("word"), new Agg1(), new Fields("aggr1")); // per-partition aggregation
Output of the three aggregator interfaces with partitionAggregate:
===============Aggregator==============
[1]man
[partitionId1]127 man
man127 end
[0]cow
[0]the
[0]the
[partitionId0]131 cow
[partitionId0]131 the
[partitionId0]131 the
thecowthe131 end
=========CombinerAggregator============
[1]man
combine:man
[0]the
[0]cow
[0]the
combine:the
combine:thecow
combine:thecowthe
=========ReducerAggregator=============
[1]man
str:man
[0]the
[0]cow
[0]the
str:the
str:thecow
str:thecowthe
=======================================
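The key contrast with the grouped aggregate in part (I) is that partitionAggregate emits one result per partition and never merges across partitions. A plain-Java sketch of that behavior (the `perPartition` name is invented; concatenation again stands in for Agg1):

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // partitionAggregate-style: each partition is aggregated independently
    // and emits its own result; there is no cross-partition merge step.
    static List<String> perPartition(List<List<String>> partitions) {
        List<String> out = new ArrayList<>();
        for (List<String> partition : partitions) {
            String acc = "";
            for (String word : partition) acc = acc + word;
            out.add(acc); // one emitted tuple per partition
        }
        return out;
    }

    public static void main(String[] args) {
        // partitionBy("word") sent the/cow/the to one partition and man to another.
        System.out.println(perPartition(List.of(List.of("the", "cow", "the"), List.of("man"))));
    }
}
```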
(III) Persistent aggregation and partition persistence: stream aggregation used to persist data.
(1) stateQuery
stateQuery reads from a State object, which can wrap a database connection, once per batch. The steps are as follows.
1. Create the State
import org.apache.storm.trident.state.State;
public class TestState implements State {
    @Override
    public void beginCommit(Long txid) {
        // no-op for this example
    }
    @Override
    public void commit(Long txid) {
        // no-op for this example
    }
    // Stand-in for a real database lookup.
    public String getDBOption(int i) {
        return "success" + i;
    }
}
2. Create the StateFactory
import java.util.Map;
import org.apache.storm.task.IMetricsContext;
import org.apache.storm.trident.state.State;
import org.apache.storm.trident.state.StateFactory;
public class TestStateFactory implements StateFactory {
    @Override
    public State makeState(Map conf, IMetricsContext metrics, int partitionIndex, int numPartitions) {
        return new TestState();
    }
}
3. Create the query function
import java.util.ArrayList;
import java.util.List;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.state.BaseQueryFunction;
import org.apache.storm.trident.tuple.TridentTuple;
public class TestQueryLocation extends BaseQueryFunction<TestState, String> {
    @Override
    public List<String> batchRetrieve(TestState state, List<TridentTuple> inputs) {
        // Called once per batch: return exactly one value per input tuple.
        List<String> list = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) {
            list.add(state.getDBOption(i));
        }
        return list;
    }
    @Override
    public void execute(TridentTuple tuple, String result, TridentCollector collector) {
        // Called once per tuple with the value batchRetrieve returned for it.
        System.out.println(tuple.getString(0));
        System.out.println(result);
    }
}
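The contract worth noticing here: batchRetrieve is called once per batch with the whole tuple list and must return one value per tuple; Trident then calls execute once per tuple with its matching value. A Storm-free mimic of that pairing (`getDBOption` stands in for a real DB lookup, names invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Stand-in for TestState.getDBOption(): one "DB" lookup per tuple index.
    static String getDBOption(int i) { return "success" + i; }

    // batchRetrieve-style: one call per batch, one result per input tuple.
    static List<String> batchRetrieve(List<String> tuples) {
        List<String> results = new ArrayList<>();
        for (int i = 0; i < tuples.size(); i++) results.add(getDBOption(i));
        return results;
    }

    public static void main(String[] args) {
        List<String> tuples = List.of("the", "cow", "the", "man");
        List<String> results = batchRetrieve(tuples);
        // execute-style: each tuple is printed alongside its retrieved value.
        for (int i = 0; i < tuples.size(); i++) {
            System.out.println(tuples.get(i));
            System.out.println(results.get(i));
        }
    }
}
```

Because the index is per batch, this also explains why the "successN" counter in the output below restarts at 0 when the next batch begins.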
4. Pass it to stateQuery in the topology
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
public class TestStateTopology {
    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(
                new Fields("sentence"), 2,
                new Values("the cow"),
                new Values("the man"),
                new Values("four score"),
                new Values("many apples"));
        spout.setCycle(false);
        TridentTopology topology = new TridentTopology();
        topology.newStream("spout", spout)
                .each(new Fields("sentence"), new Split(), new Fields("word"))
                .stateQuery(topology.newStaticState(new TestStateFactory()),
                        new Fields("word"), new TestQueryLocation(), new Fields("test"));
        StormTopology stormTopology = topology.build();
        LocalCluster cluster = new LocalCluster();
        Config conf = new Config();
        conf.setDebug(false);
        cluster.submitTopology("test", conf, stormTopology);
    }

    public static class Split extends BaseFunction {
        public void execute(TridentTuple tuple, TridentCollector collector) {
            String sentence = tuple.getString(0);
            for (String word : sentence.split(" ")) {
                collector.emit(new Values(word));
            }
        }
    }
}
5. Output
the
success0
cow
success1
the
success2
man
success3
four
success0
score
success1
many
success2
apples
success3
(2) StateUpdater
Typical use case: writing batches into a store; other scenarios are not covered here.
1. With the StateFactory and State in place, create the StateUpdater
import java.util.List;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.state.BaseStateUpdater;
import org.apache.storm.trident.tuple.TridentTuple;
public class TestLocationUpdater extends BaseStateUpdater<TestState> {
    @Override
    public void updateState(TestState state, List<TridentTuple> tuples,
            TridentCollector collector) {
        // Hand the whole batch to the State for a single bulk write.
        state.getBatch(tuples);
    }
}
2. The State, extended with a batch-write method
import java.util.List;
import org.apache.storm.trident.state.State;
import org.apache.storm.trident.tuple.TridentTuple;
public class TestState implements State {
    @Override
    public void beginCommit(Long txid) {
        System.out.println("beginCommit");
    }
    @Override
    public void commit(Long txid) {
        System.out.println("commit");
    }
    public String getDBOption(int i) {
        return "success" + i;
    }
    // Stand-in for a real bulk insert.
    public void getBatch(List<TridentTuple> tuples) {
        for (int i = 0; i < tuples.size(); i++) {
            System.out.println(tuples.get(i).getString(0));
        }
        System.out.println("insert batch over");
    }
}
3. Wiring it into the topology
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
public class TestStateTopology {
    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(
                new Fields("sentence"), 2,
                new Values("the cow"),
                new Values("the man"),
                new Values("four score"),
                new Values("many apples"));
        spout.setCycle(false);
        TridentTopology topology = new TridentTopology();
        topology.newStream("spout", spout)
                .each(new Fields("sentence"), new Split(), new Fields("word"))
                .stateQuery(topology.newStaticState(new TestStateFactory()),
                        new Fields("word"), new TestQueryLocation(), new Fields("test"))
                .parallelismHint(2)
                .partitionPersist(new TestStateFactory(), new Fields("test"), new TestLocationUpdater());
        StormTopology stormTopology = topology.build();
        LocalCluster cluster = new LocalCluster();
        Config conf = new Config();
        conf.setDebug(false);
        cluster.submitTopology("test", conf, stormTopology);
    }

    public static class Split extends BaseFunction {
        public void execute(TridentTuple tuple, TridentCollector collector) {
            String sentence = tuple.getString(0);
            for (String word : sentence.split(" ")) {
                collector.emit(new Values(word));
            }
        }
    }
}
4. Output
the
success0
cow
success1
the
success2
man
success3
beginCommit
the
cow
the
man
insert batch over
commit
four
success0
score
success1
many
success2
apples
success3
beginCommit
four
score
many
apples
insert batch over
commit
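The ordering in the log above is the partitionPersist lifecycle: for every batch Trident calls State.beginCommit(txid), then the StateUpdater's updateState (the bulk write), then State.commit(txid). A Storm-free mimic of that per-batch sequence (the `persistBatch` name is invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Records the order in which Trident drives a State during partitionPersist:
    // beginCommit -> updateState (the batch write) -> commit, once per batch.
    static List<String> persistBatch(List<String> batch) {
        List<String> log = new ArrayList<>();
        log.add("beginCommit");      // State.beginCommit(txid)
        log.addAll(batch);           // StateUpdater.updateState writes the tuples
        log.add("insert batch over");
        log.add("commit");           // State.commit(txid)
        return log;
    }

    public static void main(String[] args) {
        for (String line : persistBatch(List.of("the", "cow", "the", "man"))) {
            System.out.println(line);
        }
    }
}
```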
(3) partitionPersist (partition persistence)
To be continued.
(4) persistAggregate (persistent aggregation)
To be continued.