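Big data processing most commonly follows one of two models: batch processing and stream computing. In the open source world, the best-known batch component is of course Hadoop MapReduce, and for stream computing it is Storm. Storm is a distributed, fault-tolerant real-time computation system, currently an Apache incubator project (http://storm.incubator.apache.org/). Plenty of articles already introduce Storm's basic concepts, so this post does not repeat them. It only records the steps I actually followed to install Storm.
Detailed steps:
1. Install Zookeeper 3.4.5
2. Install zeromq 2.1.4 (http://download.zeromq.org/zeromq-2.1.4.tar.gz)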
To build on UNIX-like systems
If you have free choice, the most comfortable OS for developing with ZeroMQ is probably Ubuntu.
Make sure that libtool, autoconf, automake are installed.
Check whether uuid-dev package, uuid/e2fsprogs RPM or equivalent on your system is installed.
Debian/Ubuntu
sudo apt-get install uuid-dev
Redhat/Fedora
yum install uuid-devel
or
yum install libuuid-devel.x86_64
Unpack the .tar.gz source archive.
Run ./configure, followed by make.
To install ZeroMQ system-wide run sudo make install.
On Linux, run sudo ldconfig after installing ZeroMQ.
To see configuration options, run ./configure --help. Read INSTALL for more details.
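The build steps above can be collected into one small script. This is only a sketch: the `run` helper, the `build_zeromq` function, and the `DRY_RUN` variable are hypothetical names added here for previewing the commands, and the script assumes wget, tar, and sudo are available.

```shell
#!/bin/sh
# Sketch of the ZeroMQ 2.1.4 build described above.
# `run` and DRY_RUN are hypothetical helpers, not part of ZeroMQ:
# with DRY_RUN set, each command is printed instead of executed.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Download, unpack, build, and install; stops at the first failing step.
build_zeromq() {
    run wget http://download.zeromq.org/zeromq-2.1.4.tar.gz &&
    run tar -xzf zeromq-2.1.4.tar.gz &&
    run cd zeromq-2.1.4 &&
    run ./configure &&
    run make &&
    run sudo make install &&
    run sudo ldconfig   # refresh the linker cache (Linux only)
}
```

Calling `DRY_RUN=1 build_zeromq` only prints the seven commands, which is a cheap way to review them first. Calling `build_zeromq` on its own runs them, and leaves the shell inside the zeromq-2.1.4 directory.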
3. Install JZMQ (https://github.com/nathanmarz/jzmq)
./autogen.sh
./configure
make
make install
Running make failed with this error:
Socket.cpp:178:80: error: invalid conversion from 'const jbyte* {aka const signed char*}' to 'jbyte* {aka signed char*}' [-fpermissive]
env->SetByteArrayRegion(array, 0, optvallen, (const jbyte*) optval);
Fix:
On line 178 of ./src/Socket.cpp, remove the "const"
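The same edit can be scripted rather than made by hand. The helper below is a hypothetical sketch, not part of JZMQ: it assumes the offending cast is written exactly as in the error message, and it touches only the given line (178 by default).

```shell
#!/bin/sh
# Hypothetical helper: drop the "const" from the cast on one line of Socket.cpp.
# Usage: patch_jzmq_socket FILE [LINE]   (LINE defaults to 178, as in the error above)
patch_jzmq_socket() {
    file=$1
    line=${2:-178}
    # Rewrite "(const jbyte*)" to "(jbyte*)" on that line only. A temp file is
    # used so the script does not rely on any sed's in-place editing option.
    sed "$line"'s/(const jbyte\*)/(jbyte*)/' "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```

Running `patch_jzmq_socket ./src/Socket.cpp` and then `make` again should get past this error, provided the line number matches the checked-out JZMQ revision.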
4. Install Java 6
5. Install Python 2.6.6
6. Install unzip
7. Install Storm
wget https://github.com/downloads/nathanmarz/storm/storm-0.8.1.zip
Configuration under conf/:
########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
    - "host1"
#storm.zookeeper.port: 2181
storm.local.dir: "/opt/XXX/storm-0.8.1/workdir"
nimbus.host: "host1"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
#
#
###### These may optionally be filled in:
#
##List of custom serializations
#topology.kryo.register:
# - org.mycompany.MyType
# - org.mycompany.MyType2: org.mycompany.MyType2Serializer
#
##List of custom kryo decorators
#topology.kryo.decorators:
# - org.mycompany.MyDecorator
#
##Locations of the drpc servers
#drpc.servers:
# - "server1"
# - "server2"
ui.port: 18091
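When several nodes need the same file, the required settings above can be generated instead of typed. This is a minimal sketch: `write_storm_yaml` is a hypothetical helper, the four slot ports are hard-coded to the values used above, and the optional commented-out settings are left out.

```shell
#!/bin/sh
# Hypothetical helper that prints a minimal storm.yaml like the one above.
# Usage: write_storm_yaml ZK_HOST NIMBUS_HOST LOCAL_DIR [UI_PORT]
write_storm_yaml() {
    zk_host=$1
    nimbus_host=$2
    local_dir=$3
    ui_port=${4:-18091}   # default matches the ui.port used in this post
    cat <<EOF
storm.zookeeper.servers:
    - "$zk_host"
storm.local.dir: "$local_dir"
nimbus.host: "$nimbus_host"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
ui.port: $ui_port
EOF
}
```

For the setup in this post, `write_storm_yaml host1 host1 /opt/XXX/storm-0.8.1/workdir` prints the same required values, and the output can be redirected into conf/storm.yaml on each node.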
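Startup:
Nimbus: on the Storm master node, run "bin/storm nimbus >/dev/null 2>&1 &" to start the Nimbus daemon in the background; the log is in logs/nimbus.log
Supervisor: on each Storm worker node, run "bin/storm supervisor >/dev/null 2>&1 &" to start the Supervisor daemon in the background; the log is in logs/supervisor.log
UI: on the Storm master node, run "bin/storm ui >/dev/null 2>&1 &" to start the UI daemon in the background. Once it is up, http://{nimbushost}:18091 shows the cluster's worker resource usage, the running state of the topologies, and similar information. The log is in logs/ui.log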
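The three start commands differ only in the daemon name, so they can be wrapped in one function. This is a sketch under assumptions: `start_storm_daemon`, `STORM_HOME`, and `DRY_RUN` are hypothetical names introduced here, and `STORM_HOME` is taken to be the unpacked storm-0.8.1 directory.

```shell
#!/bin/sh
# Hypothetical wrapper around the start commands shown above.
# STORM_HOME is an assumed variable pointing at the unpacked Storm directory.
STORM_HOME="${STORM_HOME:-/opt/XXX/storm-0.8.1}"

# Usage: start_storm_daemon nimbus|supervisor|ui
start_storm_daemon() {
    case "$1" in
        nimbus|supervisor|ui) ;;
        *) echo "unknown daemon: $1" >&2; return 1 ;;
    esac
    if [ -n "$DRY_RUN" ]; then
        # Preview only: print the command that would be run.
        echo "$STORM_HOME/bin/storm $1 >/dev/null 2>&1 &"
    else
        # Discard console output and detach; the daemon writes its own file under logs/.
        "$STORM_HOME/bin/storm" "$1" >/dev/null 2>&1 &
    fi
}
```

On the master node one would call `start_storm_daemon nimbus` and `start_storm_daemon ui`, and on each worker node `start_storm_daemon supervisor`. Rejecting unknown names guards against a typo silently starting nothing.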
Web GUI:
8. Build and run the incubator-storm project
1. git clone git://git.apache.org/incubator-storm.git
2. Modify ./examples/storm-starter/src/jvm/storm/starter/WordCountTopology.java so that the topology is submitted to the cluster rather than to a LocalCluster; that way the topology also shows up in the web GUI
3. Build
mvn clean install -DskipTests=true
4. Edit bin/storm and set CONF_DIR to the correct path; otherwise submitting the topology may throw an error
5. Run:
Command: bin/storm jar storm-starter-0.9.3-incubating-SNAPSHOT-jar-with-dependencies.jar storm.starter.WordCountTopology
Summaries in the console:
Details:
The topology's output can be seen in logs/worker-6700.log. RandomSentenceSpout() keeps generating random input for the topology, which drives the whole stream computation.
6. Kill the topology:
The command "bin/storm list" shows the status of the topologies
The command "bin/storm kill wordcount" kills the topology named wordcount
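A killed topology does not disappear at once, so the two commands above are often used together: kill, then list until the name is gone. The sketch below does that. `kill_and_wait` and `STORM_BIN` are hypothetical names, and it assumes that "storm list" prints the topology name for as long as the topology exists.

```shell
#!/bin/sh
# Hypothetical helper: kill a topology, then poll "storm list" until it is gone.
# STORM_BIN is an assumed variable naming the storm launcher script.
STORM_BIN="${STORM_BIN:-bin/storm}"

# Usage: kill_and_wait NAME [MAX_POLLS]
kill_and_wait() {
    name=$1
    tries=${2:-30}          # maximum number of polls, one per second
    "$STORM_BIN" kill "$name" || return 1
    while [ "$tries" -gt 0 ]; do
        # The topology is gone once its name no longer appears in the listing.
        if ! "$STORM_BIN" list | grep -qw "$name"; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    echo "topology $name still listed" >&2
    return 1
}
```

For the example in this post the call would be `kill_and_wait wordcount`. A non-zero return means the name was still listed after the last poll.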