1. Why DRPC?
The main reason Storm includes DRPC is to use Storm's real-time computation power to parallelize CPU-intensive computations. A DRPC Storm topology takes a stream of function arguments as input and emits the return values of those function calls as its output stream. DRPC is not really a feature of Storm itself: it is a pattern composed from Storm's primitives of spouts, bolts, and topologies. DRPC could have been packaged separately from Storm, but it is so useful that it ships bundled with Storm.
2. How DRPC works

Distributed RPC is coordinated by a "DRPC server" (Storm ships with an implementation). The DRPC server coordinates:
(1) receiving an RPC request;
(2) sending the request to the Storm topology;
(3) receiving the results from the Storm topology;
(4) sending the results back to the waiting client.
From the client's point of view, a DRPC call looks just like a regular RPC call. For example, here is how a client invokes the "reach" function with the argument http://twitter.com:

    DRPCClient client = new DRPCClient("drpc-host", 3772);
    String result = client.execute("reach", "http://twitter.com");
The client sends the DRPC server the name of the function to execute along with the function's arguments. The topology implementing that function uses a DRPCSpout to receive the stream of function invocations from the DRPC server. The DRPC server tags each function invocation with a unique id. The topology then computes the result, and a final bolt called ReturnResults connects back to the DRPC server and hands over the result for that invocation, identified by its unique id. The DRPC server uses the id to match the result with the waiting client, wakes that client up, and delivers the result.
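To make the id-matching concrete, here is a small single-process sketch. This is not Storm's actual implementation; the class and method names are invented for illustration. It shows how a coordinator can pair a topology-side (id, result) with the blocked client that issued the request:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the DRPC id-matching idea (not Storm code).
public class DrpcIdMatching {
    private final AtomicLong nextId = new AtomicLong();
    private final ConcurrentMap<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Client side: register the request under a fresh unique id, then block.
    public String execute(String arg) throws Exception {
        long id = nextId.incrementAndGet();
        CompletableFuture<String> f = new CompletableFuture<>();
        pending.put(id, f);
        topologySide(id, arg);              // in real Storm: DRPCSpout -> topology -> ReturnResults
        return f.get(1, TimeUnit.SECONDS);  // woken up when the result arrives
    }

    // Topology side: the ReturnResults-style step reports (id, result).
    private void topologySide(long id, String arg) {
        String result = arg + "!";          // stand-in for the topology's computation
        CompletableFuture<String> f = pending.remove(id);
        if (f != null) f.complete(result);  // match by id and wake the waiting client
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new DrpcIdMatching().execute("hello")); // prints hello!
    }
}
```

The unique id is what lets many clients share one server: results can arrive in any order and still reach the right caller.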
3. LinearDRPCTopologyBuilder

Storm comes with a topology builder called LinearDRPCTopologyBuilder that automates almost all the steps of implementing DRPC, including setting up the spout and returning the results to the DRPC server. A simple example:
    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.LocalDRPC;
    import backtype.storm.StormSubmitter;
    import backtype.storm.drpc.LinearDRPCTopologyBuilder;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class BasicDRPCTopology {
        // A DRPC bolt mainly needs to override execute and declareOutputFields.
        public static class ExclaimBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple tuple, BasicOutputCollector collector) {
                String input = tuple.getString(1);
                collector.emit(new Values(tuple.getValue(0), input + "!"));
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("id", "result"));
            }
        }

        public static void main(String[] args) throws Exception {
            // LinearDRPCTopologyBuilder implements the DRPC pattern.
            LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("exclamation");
            builder.addBolt(new ExclaimBolt(), 3);

            Config conf = new Config();
            if (args == null || args.length == 0) {
                // Local mode: LocalDRPC simulates a DRPC server, LocalCluster a Storm cluster.
                LocalDRPC drpc = new LocalDRPC();
                LocalCluster cluster = new LocalCluster();
                cluster.submitTopology("drpc-demo", conf, builder.createLocalTopology(drpc));
                for (String word : new String[]{ "hello", "goodbye" }) {
                    System.out.println("Result for \"" + word + "\": " + drpc.execute("exclamation", word));
                }
                cluster.shutdown();
                drpc.shutdown();
            } else {
                // Cluster mode.
                conf.setNumWorkers(3);
                StormSubmitter.submitTopology(args[0], conf, builder.createRemoteTopology());
            }
        }
    }

The first bolt you declare receives two-element tuples: the first field is the request-id and the second field is the argument for that request. LinearDRPCTopologyBuilder also requires the last bolt of the topology to emit a two-element tuple: the first field is the request-id and the second is the result of the function. Finally, every intermediate tuple must have the request-id as its first field.
First, what is reach? The reach of a URL is the number of unique people exposed to it on Twitter. To compute the reach of a URL, we need to: get all the people who tweeted the URL, get all the followers of those people, take the unique set of those followers, and count that unique set. The topology is wired up like this:
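Before parallelizing it, the reach computation can be stated in a few lines of ordinary Java. This sketch (a hypothetical NaiveReach class, using the url3 slice of the toy databases from the ReachTopology code below) computes the same value the topology computes in parallel:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Single-machine sketch of reach: |union of followers of everyone who tweeted the url|.
public class NaiveReach {
    static Map<String, List<String>> TWEETERS = Map.of(
        "url3", List.of("tim", "mike", "john"));
    static Map<String, List<String>> FOLLOWERS = Map.of(
        "tim", List.of("alex"),
        "mike", List.of("john", "bob"),
        "john", List.of("alice", "nathan", "jim", "mike", "bob"));

    static int reach(String url) {
        Set<String> unique = new HashSet<>();                       // the "unique the set" step
        for (String tweeter : TWEETERS.getOrDefault(url, List.of()))
            unique.addAll(FOLLOWERS.getOrDefault(tweeter, List.of()));
        return unique.size();                                       // the "count" step
    }

    public static void main(String[] args) {
        System.out.println(reach("url3")); // 7: {alex, john, bob, alice, nathan, jim, mike}
    }
}
```

For a URL with millions of tweeters and followers this sequential loop becomes far too slow, which is exactly the part the DRPC topology below parallelizes.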
    LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("reach");
    builder.addBolt(new GetTweeters(), 4);
    builder.addBolt(new GetFollowers(), 12).shuffleGrouping();
    builder.addBolt(new PartialUniquer(), 6).fieldsGrouping(new Fields("id", "follower"));
    builder.addBolt(new CountAggregator(), 3).fieldsGrouping(new Fields("id"));
This topology executes in four steps:

(1) GetTweeters gets the users whose tweets contain the given URL. It receives the input stream [id, url] and emits [id, tweeter]. Each url tuple can map to many tweeter tuples.
(2) GetFollowers gets the followers of those tweeters. It receives the input stream [id, tweeter] and emits [id, follower].
(3) PartialUniquer groups the followers by (id, follower), so different tasks receive different followers, which is what deduplicates them. It emits [id, partial-count], the number of unique followers counted on that task.
(4) CountAggregator receives all the partial counts and sums them to produce the reach value we want.

PartialUniquer is implemented as a batch bolt:

    public static class PartialUniquer extends BaseBatchBolt {
        BatchOutputCollector _collector;
        Object _id;
        Set<String> _followers = new HashSet<String>();

        @Override
        public void prepare(Map conf, TopologyContext context, BatchOutputCollector collector, Object id) {
            _collector = collector;
            // The id identifies the batch; it is in fact a TransactionAttempt
            // object holding two values (see transactional topologies).
            _id = id;
        }

        @Override
        public void execute(Tuple tuple) {
            // Called once for every tuple in the batch.
            _followers.add(tuple.getString(1));
        }

        @Override
        public void finishBatch() {
            // Called after all tuples of the batch have been received.
            _collector.emit(new Values(_id, _followers.size()));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("id", "partial-count"));
        }
    }

A new instance of the batch bolt is created for every request id. Under the hood, CoordinatedBolt is used to detect when a given bolt has received all the tuples for a given request id, that is, to guarantee that every tuple of the batch has been processed.
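The reason summing per-task set sizes yields an exact unique count can be shown in plain Java. This sketch (invented names, not Storm code) simulates a fields grouping by follower within a single request id: the same follower string always hashes to the same task, so duplicates collapse inside exactly one task's set and the partial counts can simply be added:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the PartialUniquer/CountAggregator split for one request id.
public class PartialCountSketch {
    static int reachViaPartialCounts(List<String> followers, int numTasks) {
        // One HashSet per simulated PartialUniquer task.
        List<Set<String>> taskSets = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) taskSets.add(new HashSet<>());
        for (String f : followers) {
            int task = Math.floorMod(f.hashCode(), numTasks); // fields grouping by follower
            taskSets.get(task).add(f);                        // duplicates collapse in one task
        }
        // CountAggregator step: the sets are disjoint, so summing sizes is exact.
        int total = 0;
        for (Set<String> s : taskSets) total += s.size();
        return total;
    }

    public static void main(String[] args) {
        List<String> followers = List.of("alex", "john", "bob", "john", "bob", "alice");
        System.out.println(reachViaPartialCounts(followers, 6)); // 4 unique followers
    }
}
```

In the real topology the grouping key is (id, follower) rather than follower alone, so followers are also partitioned per request and concurrent requests never mix.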
The complete ReachTopology:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.LocalDRPC;
    import backtype.storm.StormSubmitter;
    import backtype.storm.coordination.BatchOutputCollector;
    import backtype.storm.drpc.LinearDRPCTopologyBuilder;
    import backtype.storm.generated.AlreadyAliveException;
    import backtype.storm.generated.InvalidTopologyException;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.topology.base.BaseBatchBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Computes the reach of a URL.
    public class ReachTopology {
        public static Map<String, List<String>> TWEETERS_DB = new HashMap<String, List<String>>() {{
            put("url1", Arrays.asList("sally", "bob", "tim", "george", "nathan"));
            put("url2", Arrays.asList("adam", "david", "sally", "nathan"));
            put("url3", Arrays.asList("tim", "mike", "john"));
        }};

        public static Map<String, List<String>> FOLLOWERS_DB = new HashMap<String, List<String>>() {{
            put("sally", Arrays.asList("bob", "tim", "alice", "adam", "jim", "chris", "jai"));
            put("bob", Arrays.asList("sally", "nathan", "jim", "mary", "david", "vivian"));
            put("tim", Arrays.asList("alex"));
            put("nathan", Arrays.asList("sally", "bob", "adam", "harry", "chris", "vivian", "emily", "jordan"));
            put("adam", Arrays.asList("david", "carissa"));
            put("mike", Arrays.asList("john", "bob"));
            put("john", Arrays.asList("alice", "nathan", "jim", "mike", "bob"));
        }};

        public static class GetTweeters extends BaseBasicBolt {
            @Override
            public void execute(Tuple input, BasicOutputCollector collector) {
                Object id = input.getValue(0);
                String url = input.getString(1);
                List<String> tweeters = TWEETERS_DB.get(url);
                if (tweeters != null) {
                    for (String tweeter : tweeters) {
                        collector.emit(new Values(id, tweeter));
                    }
                }
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("id", "tweeter"));
            }
        }

        public static class GetFollowers extends BaseBasicBolt {
            @Override
            public void execute(Tuple input, BasicOutputCollector collector) {
                Object id = input.getValue(0);
                String tweeter = input.getString(1);
                List<String> followers = FOLLOWERS_DB.get(tweeter);
                if (followers != null) {
                    for (String follower : followers) {
                        collector.emit(new Values(id, follower));
                    }
                }
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("id", "follower"));
            }
        }

        public static class PartialUniquer extends BaseBatchBolt {
            BatchOutputCollector _collector;
            Object _id;
            Set<String> _followers = new HashSet<String>();

            @Override
            public void prepare(Map conf, TopologyContext context, BatchOutputCollector collector, Object id) {
                _collector = collector;
                _id = id;
            }

            @Override
            public void execute(Tuple tuple) {
                // Called once for every tuple in the batch.
                _followers.add(tuple.getString(1));
            }

            @Override
            public void finishBatch() {
                _collector.emit(new Values(_id, _followers.size()));
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("id", "partial-count"));
            }
        }

        public static class CountAggregator extends BaseBatchBolt {
            BatchOutputCollector _collector;
            Object _id;
            int _count = 0;

            @Override
            public void prepare(Map conf, TopologyContext context, BatchOutputCollector collector, Object id) {
                _collector = collector;
                _id = id;
            }

            @Override
            public void execute(Tuple tuple) {
                _count += tuple.getInteger(1);
            }

            @Override
            public void finishBatch() {
                _collector.emit(new Values(_id, _count));
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("id", "reach"));
            }
        }

        public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException {
            LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("reach");
            builder.addBolt(new GetTweeters(), 4);
            builder.addBolt(new GetFollowers(), 12).shuffleGrouping();
            builder.addBolt(new PartialUniquer(), 6).fieldsGrouping(new Fields("id", "follower"));
            builder.addBolt(new CountAggregator(), 3).fieldsGrouping(new Fields("id"));

            Config conf = new Config();
            if (args == null || args.length == 0) {
                conf.setMaxTaskParallelism(3);
                LocalDRPC drpc = new LocalDRPC();
                LocalCluster cluster = new LocalCluster();
                cluster.submitTopology("reach-drpc", conf, builder.createLocalTopology(drpc));
                String[] urls = new String[]{ "url3", "url2", "url1" };
                for (String url : urls) {
                    System.out.println("Reach of " + url + ": " + drpc.execute("reach", url));
                }
                cluster.shutdown();
                drpc.shutdown();
            } else {
                conf.setNumWorkers(6);
                StormSubmitter.submitTopology(args[0], conf, builder.createRemoteTopology());
            }
        }
    }