Distributed systems in practice (hands-on)
Spring Cloud in practice (hands-on)
MyBatis in practice (hands-on)
Spring Boot in practice (hands-on)
Getting started with React (hands-on)
Building a small-to-medium internet company architecture (hands-on)
Python notes, continuously updated
ElasticSearch notes
Kafka and Storm in practice (hands-on)
Scala notes, continuously updated
RPC
Deep learning
Go language, continuously updated
Storm is a real-time big-data computation system: a distributed, real-time computation system open-sourced by Twitter.
For full-batch processing, the best-known tools are Hadoop and Hive. As a batch system, Hadoop is widely used on massive data sets thanks to its high throughput and automatic fault tolerance, but it is not good at real-time computation: it was built for batch processing, which is the industry consensus. Systems such as S4, Storm, Puma and Spark target real-time computation instead.
Use cases: real-time and continuous computation over data, distributed RPC, and so on.
Storm is widely used for real-time analytics, online machine learning, continuous computation, distributed RPC and similar areas.
Spout (message source)
public void nextTuple() {
    Utils.sleep(100);                       // throttle emission a little
    final String[] words = new String[] {"nathan", "mike", "jackson", "golda", "bertels"};
    final Random rand = new Random();
    final String word = words[rand.nextInt(words.length)];
    _collector.emit(new Values(word));      // emit a random word as a one-field tuple
}
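The fragment above shows only nextTuple. For a bolt to consume the emitted value by name, the spout also has to declare its output fields; a minimal sketch, assuming the single field is called "word" to match the emit above:
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // declare one output field so downstream bolts can refer to it as "word"
    declarer.declare(new Fields("word"));
}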
Bolt (message processor)
public static class ExclamationBolt implements IRichBolt {
    OutputCollector _collector;

    public void prepare(Map conf, TopologyContext context,
                        OutputCollector collector) {
        _collector = collector;
    }

    public void execute(Tuple tuple) {
        // append "!!!" to the incoming word and emit it anchored to the input tuple
        _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
        // acknowledge the input so its tuple tree can complete
        _collector.ack(tuple);
    }

    public void cleanup() {
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
Stream grouping (how tuples are distributed between components)
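Besides the shuffle and fields groupings used in the topology example below, Storm ships several other built-in groupings. A quick sketch of how they are declared (MonitorBolt, SummaryBolt and the component ids here are placeholders, not part of this article's examples):
// shuffleGrouping: tuples are distributed randomly but evenly across the bolt's tasks
builder.setBolt("splitter", new SplitSentence(), 4).shuffleGrouping("sentences");
// fieldsGrouping: tuples with the same "word" value always go to the same task (needed for counting)
builder.setBolt("counter", new WordCount(), 4).fieldsGrouping("splitter", new Fields("word"));
// allGrouping: every task of the bolt receives a copy of every tuple
builder.setBolt("monitor", new MonitorBolt(), 2).allGrouping("splitter");
// globalGrouping: all tuples go to a single task (the one with the lowest id)
builder.setBolt("summary", new SummaryBolt(), 1).globalGrouping("counter");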
Topology
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout(1, new RandomSentenceSpout(), 5 );
builder.setBolt(2, new SplitSentence(), 8 ).shuffleGrouping(1);
builder.setBolt(3, new WordCount(), 12).fieldsGrouping(2, new Fields("word"));
Worker (worker process)
Task (a task executing the concrete logic)
Executor (a thread that runs tasks)
Configuration
Config conf = new Config();
conf.setNumWorkers(2); // use two worker processes
topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
.setNumTasks(4) //set tasks number to 4
.shuffleGrouping("blue-spout");
topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
.shuffleGrouping("green-bolt");
StormSubmitter.submitTopology(
"mytopology",
conf,
topologyBuilder.createTopology()
);
Rebalancing: changing the parallelism at runtime
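Parallelism can be adjusted while a topology is running, either from the Storm UI or with the storm rebalance command; for example, reusing the names from the configuration snippet above:
# redistribute "mytopology" across 5 workers, give blue-spout 3 executors and yellow-bolt 10
storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
Note that the number of executors for a component can only grow up to the number of tasks fixed when the topology was submitted; worker and executor counts can change, the task count cannot.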
Commonly used classes
BaseRichSpout (message producer)
BaseBasicBolt (message processor)
TopologyBuilder (builds the topology)
Values (wraps the data handed to the next component)
Tuple
Config (configuration)
StormSubmitter / LocalCluster (topology submitters)
How the main objects in Storm fit together:
Computation topology: Topology
Message source: Spout
Message processor: Bolt
Task: a unit of execution
Configuration: Config
Message stream: Stream
Message distribution strategy: Stream groupings
First, every tuple has a unique identifier, a msgId, which is fixed when the tuple is emitted:
_collector.emit(new Values("field1", "field2", 3) , msgId);
Next, look at the ISpout interface below: besides nextTuple, which produces tuples,
it also has ack and fail. When Storm detects that a tuple has been fully processed it calls ack; if the tuple times out or is detected as failed, it calls fail.
Note that a tuple can only be acked or failed on the spout task that produced it; the implementation described below makes the reason clear.
public interface ISpout extends Serializable {
void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
void close();
void nextTuple();
void ack(Object msgId);
void fail(Object msgId);
}
Finally, how is this implemented on the spout side? It is actually quite simple.
For the spout's queue, getting a message only opens it rather than popping it, and the tuple's state is changed to pending so the same tuple is not emitted twice.
Only when the tuple is acked is it really popped from the queue; if it fails, its state simply goes back to the initial state.
This also explains why a tuple can only be acked or failed on the spout task that emitted it: only that task's queue contains the tuple.
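A minimal sketch of what a reliable spout built on such a queue might look like in user code (the queue, the pending map and the field names are illustrative, not Storm APIs): emitted messages are parked in a pending map keyed by msgId, removed on ack, and put back on the queue on fail.
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;
import java.util.*;

public class ReliableQueueSpout extends BaseRichSpout {
    private SpoutOutputCollector _collector;
    private Queue<String> _queue = new LinkedList<String>();              // message source, illustrative
    private Map<Object, String> _pending = new HashMap<Object, String>(); // msgId -> message awaiting ack/fail

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
    }

    public void nextTuple() {
        String msg = _queue.poll();
        if (msg == null) { Utils.sleep(10); return; }
        Object msgId = UUID.randomUUID().toString();   // unique id lets Storm track the tuple tree
        _pending.put(msgId, msg);                      // "pending", not gone: only an ack really removes it
        _collector.emit(new Values(msg), msgId);       // emitting with a msgId turns on reliability
    }

    public void ack(Object msgId) {
        _pending.remove(msgId);                        // fully processed, safe to forget
    }

    public void fail(Object msgId) {
        _queue.offer(_pending.remove(msgId));          // timed out or failed: re-queue for another emit
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("msg"));
    }
}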
One question has been left open so far: what mechanism does Storm itself use to decide whether a tuple has been fully processed?
It splits into two sub-questions: how does Storm learn the structure of the tuple tree, and how does it learn the processing status of each node in that tree?
For the first, the answer is simple: you have to tell it. How?
_collector.emit(tuple, new Values(word));
The first argument to emit is the input tuple; this is what establishes the anchoring (the edge in the tuple tree).
You can also call the unanchored version of emit when you do not need reliability; it is more efficient:
_collector.emit(new Values(word));
Also, as mentioned earlier, a tuple may depend on several input tuples (multi-anchoring):
List<Tuple> anchors = new ArrayList<Tuple>();
anchors.add(tuple1);
anchors.add(tuple2);
_collector.emit(anchors, new Values(1, 2, 3));
With multi-anchoring the tuple tree becomes a tuple DAG; current versions of Storm handle DAGs just fine.
As for the processing status of each node in the tuple tree, you must explicitly call the OutputCollector's ack or fail once the bolt's logic for that tuple is done.
In the example below, execute ends with:
_collector.ack(tuple);
I find it a little puzzling that ack is a method of the OutputCollector rather than of the Tuple.
And what gets acked is really the bolt's input, so why does it live on the output collector? Perhaps because every input is just some other bolt's output... the design feels a bit odd to me.
public class SplitSentence extends BaseRichBolt {
OutputCollector _collector;
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
_collector = collector;
}
public void execute(Tuple tuple) {
String sentence = tuple.getString(0);
for(String word: sentence.split(" ")) {
_collector.emit(tuple, new Values(word));
}
_collector.ack(tuple);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
}
To be reliable, Storm necessarily sacrifices some efficiency: it keeps the structure and processing status of the tuple tree you report in task memory.
A tuple node is removed from memory only after it has been acked or failed, so if you never ack or fail, the task will eventually run out of memory.
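Both the timeout and the number of acker tasks can be tuned through the Config object shown earlier; a small sketch (the values are illustrative):
Config conf = new Config();
conf.setMessageTimeoutSecs(30);   // a tuple tree not fully acked within 30s is failed and can be replayed by the spout
conf.setNumAckers(2);             // number of acker tasks; 0 disables tracking (tuples are acked immediately)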
The simpler version: BasicBolt
The mechanism above is a burden on the programmer; for many simple cases such as filters, explicitly anchoring and acking every single tuple is tedious.
So Storm provides a simpler version that anchors automatically and calls ack automatically once the bolt's execute method returns.
public class SplitSentence extends BaseBasicBolt {
public void execute(Tuple tuple, BasicOutputCollector collector) {
String sentence = tuple.getString(0);
for(String word: sentence.split(" ")) {
collector.emit(new Values(word));
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
}
Record-level fault tolerance
First, what is record-level fault tolerance? Storm lets the user attach a message id to each new source tuple emitted from a spout; the message id can be an arbitrary object. Several source tuples may share one message id, meaning that to the user they form a single message unit. Record-level fault tolerance in Storm means that Storm tells the user whether each message unit was fully processed within a given time. "Fully processed" means that the source tuples bound to that message id, and every tuple subsequently generated from them, have been processed by every bolt they were supposed to reach.
For example: in the figure, tuple1 and tuple2, bound to message 1 at the spout, are processed by bolt1 and bolt2, which generate two new tuples that flow on to bolt3. Only when this whole chain completes is message 1 considered fully processed.
A Storm topology contains a system-level component called the acker. Its job is to track the processing path of the tuples bound to every message id flowing out of the spout. If those tuples are not fully processed within the user-configured timeout, the acker tells the spout that the message failed; otherwise it tells the spout that it succeeded. The "tracking of the processing path" just mentioned relies on a simple piece of math: XOR (x ^ x = 0).
Storm's trick is built on exactly that property. Concretely: for each user-supplied message id the system generates a corresponding 64-bit integer as a root id. The root id is passed to the acker and to downstream bolts as the unique identifier of that message unit. In addition, whenever a spout or bolt creates a new tuple, that tuple is given its own 64-bit integer id. After the spout emits the source tuples for a message id, it tells the acker the root id and the ids of those source tuples.
A bolt, after processing an input tuple, likewise tells the acker the id of the input tuple it processed and the ids of any new tuples it generated. The acker only needs to XOR all of these ids together to decide whether the message unit for that root id has been completely processed. The figure below illustrates this.
Careful readers may notice a theoretical hole in this scheme: if the generated tuple ids are not all distinct, the acker's XOR could reach 0 before the message unit is really finished. The error is indeed possible in theory, but in practice its probability is so low that it can be ignored.
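A tiny sketch of the XOR bookkeeping (a simplified model, not actual Storm source): every tuple id is XORed into an "ack val" once when the tuple is created and once when it is acked; because x ^ x = 0, the value returns to 0 exactly when everything that was created has also been acked.
// simplified model of the acker's state for one root id
long ackVal = 0L;

// the spout emits two source tuples with (illustrative) 64-bit ids t1 and t2
long t1 = 0x1BL, t2 = 0x2CL;
ackVal ^= t1;
ackVal ^= t2;

// a bolt acks t1 and, while processing it, emits a new anchored tuple t3
long t3 = 0x3DL;
ackVal ^= t1 ^ t3;      // report to the acker: input acked, one child created

// downstream bolts ack t2 and t3 without creating new tuples
ackVal ^= t2;
ackVal ^= t3;

// everything created has been acked, so the XOR collapses back to 0
assert ackVal == 0L;    // the acker now tells the spout the message unit is fully processed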
DRPC (Distributed Remote Procedure Call)
The interaction flow is shown in the figure below.
public static class ExclaimBolt implements IBasicBolt {
public void prepare(Map conf, TopologyContext context) {
}
public void execute(Tuple tuple, BasicOutputCollector collector) {
String input = tuple.getString(1);
collector.emit(new Values(tuple.getValue(0), input + "!"));
}
public void cleanup() {
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("id", "result"));
}
}
public static void main(String[] args) throws Exception {
LinearDRPCTopologyBuilder builder
= new LinearDRPCTopologyBuilder("exclamation");
builder.addBolt(new ExclaimBolt(), 3);
// ...
}
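To actually invoke the function, the topology has to be run against a DRPC server; in local mode a LocalDRPC instance plays that role. A sketch that reuses the builder from the snippet above:
Config conf = new Config();
LocalDRPC drpc = new LocalDRPC();
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("drpc-demo", conf, builder.createLocalTopology(drpc));

// the client side is one blocking call: function name and argument in, result out
System.out.println("Result for 'hello': " + drpc.execute("exclamation", "hello"));

cluster.shutdown();
drpc.shutdown();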
Using LinearDRPCTopologyBuilder
The previous example does not really show why Storm should be used for RPC: that operation could just as well be done directly on an RPC server.
Storm DRPC only pays off when the call requires parallel computation over a non-trivial amount of data.
ReachTopology: every URL a person tweets is seen by all of their followers, so computing the reach of a URL takes just a few steps:
find everyone who tweeted the URL, collect their followers, de-duplicate them, and count.
public class ReachTopology {
public static Map<String, List<String>> TWEETERS_DB = new HashMap<String, List<String>>() {
{
put("foo.com/blog/1", Arrays.asList("sally", "bob", "tim", "george", "nathan"));
put("engineering.twitter.com/blog/5", Arrays.asList("admam", "david", "sally", "nathan"));
}
};
public static Map<String, List<String>> FOLLOWERS_DB = new HashMap<String, List<String>>() {
{
put("sally", Arrays.asList("bob", "time", "alice", "adam", "jai"));
put("bob", Arrays.asList("admam", "david", "vivian", "nathan"));
}
};
public static class GetTweeters extends BaseBasicBolt {
@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
Object id = tuple.getValue(0);
String url = tuple.getString(1);
List<String> tweeters = TWEETERS_DB.get(url);
if (tweeters != null) {
for (String tweeter: tweeters) {
collector.emit(new Values(id, tweeter));
}
}
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("id", "tweeter"));
}
}
public static class GetFollowers extends BaseBasicBolt {
@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
Object id = tuple.getValue(0);
String tweeter = tuple.getString(1);
List<String> followers = FOLLOWERS_DB.get(tweeter);
if (followers != null) {
for (String follower: followers) {
collector.emit(new Values(id, follower));
}
}
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("id", "follower"));
}
}
public static class PartialUniquer extends BaseBatchBolt {
BatchOutputCollector _collector;
Object _id;
Set<String> _followers = new HashSet<String>();
@Override
public void prepare(Map conf, TopologyContext context, BatchOutputCollector collector, Object id) {
_collector = collector;
_id = id;
}
@Override
public void execute(Tuple tuple) {
_followers.add(tuple.getString(1));
}
@Override
public void finishBatch() {
_collector.emit(new Values(_id, _followers.size()));
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("id", "partial-count"));
}
}
public static class CountAggregator extends BaseBatchBolt {
BatchOutputCollector _collector;
Object _id;
int _count = 0;
@Override
public void prepare(Map conf, TopologyContext context, BatchOutputCollector collector, Object id) {
_collector = collector;
_id = id;
}
@Override
public void execute(Tuple tuple) {
_count += tuple.getInteger(1);
}
@Override
public void finishBatch() {
    // emit the final reach count for this request id once all partial counts are aggregated
    _collector.emit(new Values(_id, _count));
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("id", "reach"));
}
}
public static LinearDRPCTopologyBuilder construct() {
LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("reach");
builder.addBolt(new GetTweeters(), 4);
builder.addBolt(new GetFollowers(), 12).shuffleGrouping();
builder.addBolt(new PartialUniquer(), 6).fieldsGrouping(new Fields("id", "follower"));
builder.addBolt(new CountAggregator(), 3).fieldsGrouping(new Fields("id"));
return builder;
}
public static void main(String[] args) throws Exception {
LinearDRPCTopologyBuilder builder = construct();
Config conf = new Config();
if (args == null || args.length ==0) {
LocalDRPC drpc = new LocalDRPC();
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("reach-drpc",conf, builder.createLocalTopology(drpc));
String[] urlsToTry = new String[] { "foo.com/blog/1", "engineering.twitter.com/blog/5" };
for (String url : urlsToTry) {
    System.out.println("Reach of " + url + ": " + drpc.execute("reach", url));
}
cluster.shutdown();
drpc.shutdown();
} else {
conf.setNumWorkers(6);
StormSubmitter.submitTopology(args[0], conf, builder.createRemoteTopology());
}
}
}
Nimbus and Supervisor
Installing Storm requires the following software:
JDK
Zeromq
jzmq-master
Zookeeper
Python
storm
Install ZeroMQ
wget http://download.zeromq.org/zeromq-2.2.0.tar.gz
tar zxf zeromq-2.2.0.tar.gz
cd zeromq-2.2.0
./configure (this may require: yum install libuuid-devel)
make
make install
The ZeroMQ build may fail if g++ is missing.
Install g++:
yum install gcc gcc-c++
Note: if ./configure or make fails, install util-linux-ng-2.17 first:
#unzip util-linux-ng-2.17-rc1.zip
#cd util-linux-ng-2.17
#./configure
#make
#mv /sbin/hwclock /sbin/hwclock.old
#cp hwclock/hwclock /sbin/
#hwclock --show
#hwclock -w
#make install
Note: if ./configure reports configure: error: ncurses or ncursesw selected, but library not found (--without-ncurses to disable), add the --without-ncurses option.
Install jzmq
#yum install git
git clone git://github.com/nathanmarz/jzmq.git
cd jzmq
./autogen.sh
./configure
make
make install
If libtool is missing, install it first:
yum install libtool
Install Python
wget http://www.python.org/ftp/python/2.7.2/python-2.7.2.tgz
tar zxvf python-2.7.2.tgz
cd python-2.7.2
./configure
make
make install
Install Storm
wget http://cloud.github.com/downloads/nathanmarz/storm/storm-0.8.1.zip
unzip storm-0.8.1.zip
vim /etc/profile
export STORM_HOME=/usr/local/storm-0.8.1
If unzip is not available:
yum install unzip
Configure Storm
Edit the storm/conf/storm.yaml file:
storm.zookeeper.servers:
    - "zk1"
    - "zk2"
    - "zk3"
nimbus.host: "zk1"
storm.local.dir: "/usr/tmp/storm"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
(Note: set up the ZooKeeper cluster first.)
storm.local.dir is the local directory Storm uses for its own state.
nimbus.host says which machine is the master, i.e. runs Nimbus.
storm.zookeeper.servers lists the ZooKeeper servers.
storm.zookeeper.port is the ZooKeeper port; it must match the port configured in ZooKeeper, otherwise you get communication errors.
supervisor.slots.ports defines the slots of a supervisor node, i.e. the maximum number of worker processes it can run (a topology runs in a single worker by default, but this can be raised via conf.setNumWorkers).
java.library.path is where Storm loads its native dependencies (ZeroMQ and JZMQ). The default is /usr/local/lib:/opt/local/lib:/usr/lib, which is correct in most cases, so you usually do not need to change it.
Note: every entry must start with whitespace (two spaces is safest) and every colon must be followed by a space, otherwise Storm cannot parse the file.
Create the storm directory under /usr/tmp.
Starting Storm:
Start the ZooKeeper ensemble (if it does not start properly, run service iptables stop to disable the firewall).
Run storm nimbus to start Nimbus.
Run storm supervisor to start the slave nodes.
Run storm ui to start the UI (the UI must run on the same machine as Nimbus).
Notes:
Once the Storm daemons are up, each process writes its log file under the logs/ subdirectory of the Storm installation directory.
For convenience, add bin/storm to the system PATH.
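For example, appended to /etc/profile (the path matches the STORM_HOME exported earlier):
export STORM_HOME=/usr/local/storm-0.8.1
export PATH=$STORM_HOME/bin:$PATH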
When everything is up, the UI is reachable at http://ip:8080/.
#./storm nimbus
#Jps
nimbus
quorumPeerMain
Start the UI
#./storm ui > /dev/null 2>&1 &
#Jps
Nimbus
Core
QuorumPeerMain
Configuring the master and slave nodes
1. Edit the configuration file on the slave node:
#cd /cloud/storm-0.8.2/conf/
#vi storm.yaml
storm.zookeeper.servers:
    - "master"
nimbus.host: "master"
2. Start ZooKeeper on the master node:
#./zkServer.sh start
#jps
QuorumPeerMain
3. Start Storm on the master node:
# ./storm nimbus > ../logs/info 2>&1 &
4. Start the slave node:
#./storm supervisor > /dev/null 2>&1 &
#jps
supervisor
5. Start the monitoring UI on the master node:
#./storm ui > /dev/null 2>&1 &
6. Run the example on the master node:
#./storm jar /home/lifeCycle.jar cn.topology.TopoMain
#Jps
QuorumPeerMain
Jps
Core
Nimbus
#./storm list
storm list shows the topologies currently running.
7. Check jps on the slave node:
#jps
Worker
supervisor
public static void main(String[] args)
    throws Exception
{
    TopologyBuilder builder = new TopologyBuilder();
    // spout parallelism: 2
    builder.setSpout("random", new RandomWordSpout(), Integer.valueOf(2));
    // bolt parallelism: 4
    builder.setBolt("transfer", new TransferBolt(), Integer.valueOf(4)).shuffleGrouping("random");
    builder.setBolt("writer", new WriterBolt(), Integer.valueOf(4)).fieldsGrouping("transfer", new Fields(new String[] { "word" }));
    Config conf = new Config();
    // number of worker processes
    conf.setNumWorkers(2);
    conf.setDebug(true);
    // log is the class's logger (its declaration is not shown in this fragment)
    log.warn("$$$$$$$$$$$ submitting topology...");
    StormSubmitter.submitTopology("life-cycle", conf, builder.createTopology());
    log.warn("$$$$$$$4$$$ topology submitted !");
}
8. Kill the topology:
#./storm kill life-cycle
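storm kill first deactivates the topology's spouts and waits a while before tearing the workers down; the wait can be set explicitly (the 10 seconds here are illustrative):
#./storm kill life-cycle -w 10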
Environment preparation (Maven dependencies):
<dependency>
    <groupId>org.twitter4j</groupId>
    <artifactId>twitter4j-core</artifactId>
    <version>[2.2,)</version>
</dependency>
<dependency>
    <groupId>org.twitter4j</groupId>
    <artifactId>twitter4j-stream</artifactId>
    <version>[2.2,)</version>
</dependency>
Example: count the occurrences of each word in a text file
WordCountTopo:
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import cn.storm.bolt.WordCounter;
import cn.storm.bolt.WordSpliter;
import cn.storm.spout.WordReader;

public class WordCountTopo
{
    public static void main(String[] args)
    {
        if (args.length != 2) {
            System.err.println("Usage: inputPath timeOffset");
            System.err.println("such as : java -jar WordCount.jar D://input/ 2");
            System.exit(2);
        }
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("word-reader", new WordReader());
        builder.setBolt("word-spilter", new WordSpliter()).shuffleGrouping("word-reader");
        builder.setBolt("word-counter", new WordCounter()).shuffleGrouping("word-spilter");
        String inputPath = args[0];
        String timeOffset = args[1];
        Config conf = new Config();
        conf.put("INPUT_PATH", inputPath);
        conf.put("TIME_OFFSET", timeOffset);
        conf.setDebug(false);
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("WordCount", conf, builder.createTopology());
    }
}
WordReader
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.filefilter.FileFilterUtils;

public class WordReader extends BaseRichSpout
{
    private static final long serialVersionUID = 2197521792014017918L;
    private String inputPath;
    private SpoutOutputCollector collector;

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector)
    {
        this.collector = collector;
        this.inputPath = ((String)conf.get("INPUT_PATH"));
    }

    public void nextTuple()
    {
        // pick up every file in the input directory that has not yet been processed (i.e. renamed to .bak)
        Collection<File> files = FileUtils.listFiles(new File(this.inputPath),
            FileFilterUtils.notFileFilter(FileFilterUtils.suffixFileFilter(".bak")), null);
        for (File f : files)
            try {
                List<String> lines = FileUtils.readLines(f, "UTF-8");
                for (String line : lines) {
                    this.collector.emit(new Values(new Object[] { line }));
                }
                // mark the file as processed by renaming it with a .bak suffix
                FileUtils.moveFile(f, new File(f.getPath() + System.currentTimeMillis() + ".bak"));
            } catch (IOException e) {
                e.printStackTrace();
            }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer)
    {
        declarer.declare(new Fields(new String[] { "line" }));
    }
}
WordSpliter
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import org.apache.commons.lang.StringUtils;
public class WordSpliter extends BaseBasicBolt
{
private static final long serialVersionUID = -5653803832498574866L;
public void execute(Tuple input,BasicOutputCollector collector)
{
String line = input.getString(0);
String[] words = line.split(" ");
for (String word : words) {
word = word.trim();
if (StringUtils.isNotBlank(word)) {
word = word.toLowerCase();
collector.emit(new Values(new Object[] {word }));
}
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer)
{
declarer.declare(new Fields(new String[] {"word" }));
}
}
WordCounter
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;
import java.util.HashMap;
import java.util.Map;

public class WordCounter extends BaseBasicBolt
{
    private static final long serialVersionUID = 5683648523524179434L;
    private HashMap<String, Integer> counters = new HashMap<String, Integer>();

    public void prepare(Map stormConf, TopologyContext context)
    {
        // print the current counts every TIME_OFFSET seconds from a background thread
        final long timeOffset = Long.parseLong(stormConf.get("TIME_OFFSET").toString());
        new Thread(new Runnable()
        {
            public void run() {
                while (true) {
                    for (Map.Entry<String, Integer> entry : WordCounter.this.counters.entrySet()) {
                        System.out.println(entry.getKey() + " : " + entry.getValue());
                    }
                    System.out.println("---------------------------------------");
                    try {
                        Thread.sleep(timeOffset * 1000L);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }
        }).start();
    }

    public void execute(Tuple input, BasicOutputCollector collector)
    {
        String str = input.getString(0);
        if (!this.counters.containsKey(str)) {
            this.counters.put(str, Integer.valueOf(1));
        } else {
            Integer c = Integer.valueOf(((Integer)this.counters.get(str)).intValue() + 1);
            this.counters.put(str, c);
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer)
    {
        // terminal bolt: nothing is emitted downstream
    }
}
Running it
For example, if nick.txt contains:
Storm storm hive hadoop
the counter prints:
storm : 2
hive : 1
hadoop : 1