A stream join merges two streams into one according to a join condition between them, and downstream processing then continues on the joined stream. Writing this with the raw Spout/Bolt programming model is difficult and tedious, because you must buffer the tuples that arrive first and later compare them against tuples from the other stream. Trident abstracts away that low-level model and makes joins far more convenient. Code example:
Requirements:
Two spouts: spout1 emits name, id, and tel;
spout2 emits sex and id.
First filter spout1, dropping every phone number that does not start with 186, and display the result.
Then join the filtered stream with spout2.
The code is as follows:
The filter:

import org.apache.storm.trident.operation.BaseFilter;
import org.apache.storm.trident.tuple.TridentTuple;

public class FilelterTest extends BaseFilter {
    @Override
    public boolean isKeep(TridentTuple tuple) {
        // keep only tuples whose tel (field index 2) starts with "186"
        String tel = tuple.getString(2);
        return tel.startsWith("186");
    }
}

The display function:
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Values;

public class FunctionTest extends BaseFunction {
    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        String name = tuple.getString(0);
        String id = tuple.getString(1);
        String tel = tuple.getString(2);
        System.out.println("Spout1 data:" + tuple.getValues());
        collector.emit(new Values(name, id, tel));
    }
}
The final display function:

import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;

public class FunctionTest3 extends BaseFunction {
    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        System.out.println(tuple.getValues());
    }
}
The topology:

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.trident.Stream;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class TestTrident {

    static FixedBatchSpout spout = new FixedBatchSpout(new Fields("name", "idName", "tel"), 3,
            new Values("Jack", "1", "186107"),
            new Values("Tome", "2", "1514697"),
            new Values("Lay", "3", "186745"),
            new Values("Lucy", "4", "1396478"));

    static FixedBatchSpout spout2 = new FixedBatchSpout(new Fields("sex", "idSex"), 3,
            new Values("Boy", "1"),
            new Values("Boy", "2"),
            new Values("Gril", "3"),
            new Values("Gril", "4"));

    public static void main(String[] args) throws Exception {
        // do not replay the batches in a loop
        spout.setCycle(false);

        // build the Trident topology
        TridentTopology topology = new TridentTopology();

        // the filter: drop phone numbers that do not start with 186
        FilelterTest ft = new FilelterTest();
        // the function used to display the data
        FunctionTest function = new FunctionTest();

        // build the first stream from spout1
        Stream st = topology.newStream("sp1", spout);
        // display the data of the first stream
        Stream st_1 = st.each(new Fields("name", "idName", "tel"), function,
                new Fields("out_name", "out_idName", "out_tel"));

        // build the second stream from spout2, used for the join
        Stream st2 = topology.newStream("sp2", spout2);

        /**
         * Join st and st2. In SQL terms this is roughly:
         *   st JOIN st2 ON st.idName = st2.idSex
         * Note that st is the base of the join:
         *   topology.join(st, new Fields("idName"), st2, new Fields("idSex"),
         *                 new Fields("id", "name", "tel", "sex"))
         * The result takes st as its base, and all four output fields
         * above are emitted for every matching pair of tuples.
         */
        Stream st3 = topology.join(st, new Fields("idName"), st2, new Fields("idSex"),
                new Fields("Res_id", "Res_name", "Res_tel", "Res_sex"));

        // a function to display the joined and filtered result
        FunctionTest3 t3 = new FunctionTest3();
        st3.each(new Fields("Res_id", "Res_name", "Res_tel", "Res_sex"), ft)
           .each(new Fields("Res_id", "Res_name", "Res_tel", "Res_sex"), t3,
                 new Fields("out1_id", "out1_name", "out1_tel", "out1_sex"));

        Config cf = new Config();
        cf.setNumWorkers(2);
        cf.setNumAckers(0);
        cf.setDebug(false);

        LocalCluster lc = new LocalCluster();
        lc.submitTopology("TestTopo", cf, topology.build());
    }
}

The result:
Spout1 data:[Jack, 1, 186107]
Spout1 data:[Tome, 2, 1514697]
Spout1 data:[Lay, 3, 186745]
Spout1 data:[Lucy, 4, 1396478]
[1, Jack, 186107, Boy]
[3, Lay, 186745, Gril]
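To see why only ids 1 and 3 appear in the joined output, the filter-then-inner-join semantics can be simulated in plain Java, with no Storm dependency. This is only an illustrative sketch (the class name JoinSketch and the in-memory map are my own, not part of the topology above): a tuple survives only if its tel starts with "186" and its id also exists in spout2.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class JoinSketch {
    public static void main(String[] args) {
        // spout1 rows: name, id, tel
        String[][] spout1 = {
            {"Jack", "1", "186107"}, {"Tome", "2", "1514697"},
            {"Lay", "3", "186745"}, {"Lucy", "4", "1396478"}
        };
        // spout2 rows, keyed by id: id -> sex
        Map<String, String> sexById = new HashMap<>();
        sexById.put("1", "Boy");
        sexById.put("2", "Boy");
        sexById.put("3", "Gril");
        sexById.put("4", "Gril");

        for (String[] row : spout1) {
            // the filter: keep only tel numbers starting with "186"
            if (!row[2].startsWith("186")) continue;
            // the inner join on id: emit only when spout2 has a match
            String sex = sexById.get(row[1]);
            if (sex != null) {
                // field order mirrors the join output: id, name, tel, sex
                System.out.println(Arrays.asList(row[1], row[0], row[2], sex));
            }
        }
        // prints: [1, Jack, 186107, Boy] then [3, Lay, 186745, Gril]
    }
}
```

Tome and Lucy are dropped by the filter before the join is even considered, which matches the two result rows above.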