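1. Background
The simplest way to run Storm on YARN without modifying any Storm source code is to run each Storm service component (Nimbus and the Supervisors) as a separate task on YARN. The best-known "Storm On YARN" implementation is the one open-sourced by Yahoo!, which largely implements this approach. Its two parts are described below: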
(1) YARN-Storm Client
The client provides a set of shell commands for controlling the Storm service on YARN. For example, the following command creates a Storm cluster:
storm-yarn launch <storm-yarn-config>
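Here <storm-yarn-config> is the Storm configuration, including the number of Supervisors to start, the memory used by the Storm ApplicationMaster, and so on.
Once Storm is running, it can be controlled with the following command: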
storm-yarn [command] -appId [appId] -output [file] [-supervisors [n]]
"-supervisors" is the number of Supervisor services to add; it only takes effect with the "addSupervisors" command.
Command | Description
setStormConfig | Reset the cluster configuration; the cluster restarts
getStormConfig | Get the current cluster configuration, in JSON format
addSupervisors | Add Supervisors
startNimbus/stopNimbus | Start or stop Nimbus
startUI/stopUI | Start or stop the Web UI
startSupervisors/stopSupervisors | Start or stop all Supervisors
shutdown | Shut down the cluster
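To illustrate how the command template and the table fit together, here is a minimal sketch that builds (but does not run) a storm-yarn command line. The helper name storm_yarn_cmd is hypothetical and not part of storm-yarn; it uses only the flags listed above, and the application ID is a placeholder value.

```shell
#!/bin/sh
# Hypothetical helper: prints a storm-yarn command line instead of running it,
# so the result can be reviewed first. Flags follow the template shown above.
storm_yarn_cmd() {
    command_name=$1
    app_id=$2
    supervisors=${3:-}   # optional; only meaningful for addSupervisors

    line="storm-yarn $command_name -appId $app_id"
    if [ "$command_name" = "addSupervisors" ] && [ -n "$supervisors" ]; then
        line="$line -supervisors $supervisors"
    fi
    printf '%s\n' "$line"
}

# Placeholder application ID, for illustration only.
storm_yarn_cmd addSupervisors application_0000000000000_0001 2
storm_yarn_cmd stopUI application_0000000000000_0001
```

The supervisor count is dropped for every command other than addSupervisors, matching the note that the parameter has no effect elsewhere.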
(2) YARN-Storm ApplicationMaster
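When the Storm ApplicationMaster initializes, it starts two services, Storm Nimbus and the Storm Web UI, in the same container. It then requests resources from the ResourceManager according to the number of Supervisors to start. In the current implementation, the ApplicationMaster requests all the resources of a node and then starts a Supervisor on it. In other words, each Supervisor currently has a node to itself and does not share node resources with other services, which keeps other services from interfering with the Storm cluster.
Besides running Storm Nimbus and the Web UI, the Storm ApplicationMaster also starts a Thrift server to handle requests from the YARN-Storm Client.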
2. Installation environment
a. Hadoop 2.2.0
b. jdk1.7.0_60
c. apache-maven-3.0.5
3. Preparing to install Storm on YARN
Note: Storm must be installed on every node; the Storm on YARN client only needs to be installed on one.
a. Download Storm on YARN from GitHub
wget -O storm-yarn-master.zip https://github.com/yahoo/storm-yarn/archive/master.zip
b. Storm on YARN must be built from source, so unpack it first
unzip storm-yarn-master.zip
cd storm-yarn-master
c. Edit pom.xml and set the Hadoop version to match your cluster
<properties>
  <storm.version>0.9.0-wip21</storm.version>
  <hadoop.version>2.2.0</hadoop.version>
  <!--hadoop.version>2.1.0.2.0.5.0-67</hadoop.version-->
</properties>
d. Build with Maven
mvn package -DskipTests
After the build finishes, unzip storm-yarn-master/lib/storm-0.9.0-wip21.zip to get the storm-0.9.0-wip21 directory.
Move the storm-0.9.0-wip21 directory so that it sits at the same level as storm-yarn-master:
/home/ebupt/storm/
|-- storm-0.9.0-wip21
`-- storm-yarn-master
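The pom.xml edit in step c can also be scripted. The following is a minimal sketch, assuming the property appears on a single line as in the snippet above; the function name set_hadoop_version is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the active <hadoop.version> property in a pom.xml.
# The commented-out variant (<!--hadoop.version>...) is left alone because it
# does not contain the literal opening tag "<hadoop.version>".
set_hadoop_version() {
    pom=$1
    version=$2
    tmp="$pom.tmp"
    # Write to a temporary file first so a failed sed never truncates the pom.
    sed "s|<hadoop\.version>[^<]*</hadoop\.version>|<hadoop.version>$version</hadoop.version>|" "$pom" > "$tmp" &&
        mv "$tmp" "$pom"
}

# Usage (from inside storm-yarn-master):
#   set_hadoop_version pom.xml 2.2.0
```

Writing through a temporary file avoids relying on sed -i, whose syntax differs between GNU and BSD sed.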
4. Configuring the Storm working environment
a. Add the bin directories of storm-0.9.0-wip21 and storm-yarn-master to the PATH environment variable
vi ~/.bash_profile
export STORM_HOME=$HOME/storm
export PATH=$PATH:$STORM_HOME/storm-yarn-master/bin:$STORM_HOME/storm-0.9.0-wip21/bin
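b. Package the extra JARs that Storm needs (storm.zip) and upload the archive to the designated HDFS directory. This step is essential: nodes in the cluster obtain their working environment by reading storm.zip from HDFS.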
zip -r storm.zip storm-0.9.0-wip21/
hadoop fs -put storm.zip /lib/storm/0.9.0-wip21/
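After editing ~/.bash_profile in step a, it can help to confirm that both launchers actually resolve before going further. This is a small sketch using only standard shell features; the function name check_on_path is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 when all are found, 1 otherwise.
check_on_path() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# After re-reading ~/.bash_profile, both launchers should resolve.
check_on_path storm storm-yarn || echo "check STORM_HOME and PATH"
```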
5. Installing and running Storm
a. Edit storm.yaml
vi storm-0.9.0-wip21/conf/storm.yaml
storm.zookeeper.servers:
- "eb170"
- "eb171"
storm.zookeeper.port: 2182
storm.local.dir: "/home/ebupt/storm/localstorm"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
- 6704
b. Submit Storm on YARN and obtain an ApplicationId
storm-yarn launch ~/storm/storm-0.9.0-wip21/conf/storm.yaml
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ebupt/eb/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ebupt/eb/storm-yarn/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/07/04 15:37:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/04 15:37:45 INFO client.RMProxy: Connecting to ResourceManager at eb170/10.1.69.170:8032
14/07/04 15:37:46 INFO yarn.StormOnYarn: Copy App Master jar from local filesystem and add to local environment
14/07/04 15:37:47 INFO yarn.StormOnYarn: Set the environment for the application master
14/07/04 15:37:47 INFO yarn.StormOnYarn: YARN CLASSPATH COMMAND = [[yarn, classpath]]
14/07/04 15:37:47 INFO yarn.StormOnYarn: YARN CLASSPATH = [/home/ebupt/eb/hadoop-2.2.0/etc/hadoop:/home/ebupt/eb/hadoop-2.2.0/etc/hadoop:/home/ebupt/eb/hadoop-2.2.0/etc/hadoop:/home/ebupt/eb/hadoop-2.2.0/share/hadoop/common/lib/*:/home/ebupt/eb/hadoop-2.2.0/share/hadoop/common/*:/home/ebupt/eb/hadoop-2.2.0/share/hadoop/hdfs:/home/ebupt/eb/hadoop-2.2.0/share/hadoop/hdfs/lib/*:/home/ebupt/eb/hadoop-2.2.0/share/hadoop/hdfs/*:/home/ebupt/eb/hadoop-2.2.0/share/hadoop/yarn/lib/*:/home/ebupt/eb/hadoop-2.2.0/share/hadoop/yarn/*:/home/ebupt/eb/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:/home/ebupt/eb/hadoop-2.2.0/share/hadoop/mapreduce/*:/home/ebupt/hadoop/contrib/capacity-scheduler/*.jar:/home/ebupt/eb/hadoop-2.2.0/share/hadoop/yarn/*:/home/ebupt/eb/hadoop-2.2.0/share/hadoop/yarn/lib/*]
14/07/04 15:37:47 INFO yarn.StormOnYarn: Using JAVA_HOME = [/home/ebupt/eb/jdk1.7.0_60]
14/07/04 15:37:47 INFO yarn.StormOnYarn: Setting up app master command:[/home/ebupt/eb/jdk1.7.0_60/bin/java, -Dstorm.home=./storm/storm-0.9.0-wip21/, -Dlogfile.name=<LOG_DIR>/master.log, com.yahoo.storm.yarn.MasterServer, 1><LOG_DIR>/stdout, 2><LOG_DIR>/stderr]
14/07/04 15:37:47 INFO impl.YarnClientImpl: Submitted application application_1402648970753_0025 to ResourceManager at eb170/10.1.69.170:8032
application_1402648970753_0025
Note: because Storm runs on the cluster as a YARN application, it is assigned an application ID, shown at the end of the output above: application_1402648970753_0025.
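Since the application ID is needed by every later storm-yarn command, it can be convenient to capture it from the launch output rather than copy it by hand. This is a minimal sketch, assuming the ID always has the application_<digits>_<digits> shape seen above; the function name extract_app_id is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pull the last YARN application ID out of text on stdin.
# IDs look like application_<cluster timestamp>_<sequence number>.
extract_app_id() {
    grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' | tail -n 1
}

# Example with one line copied from the launch output above.
sample='14/07/04 15:37:47 INFO impl.YarnClientImpl: Submitted application application_1402648970753_0025 to ResourceManager at eb170/10.1.69.170:8032'
APP_ID=$(printf '%s\n' "$sample" | extract_app_id)
echo "$APP_ID"
```

In practice the real launch output would be piped into the function in place of the sample line; redirecting with 2>&1 covers the case where the log lines are written to stderr.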
6. Submitting a topology to Storm
a. Get the cluster configuration
storm-yarn getStormConfig -appId application_1402648970753_0025 -output ~/.storm/storm.yaml
b. Get the Nimbus host with the following command
cat ~/.storm/storm.yaml | grep nimbus.host
c. Submit a topology
storm jar ~/storm/storm-yarn-master/lib/storm-starter-0.0.1-SNAPSHOT.jar storm.starter.WordCountTopology WordCountTopology -c nimbus.host=<your nimbus host>
d. Monitor the topology
Open the Storm UI at http://<your nimbus host>:7070
e. Kill the topology
storm kill [Topology_name]
f. Shut down the Storm on YARN cluster
storm-yarn shutdown -appId [applicationId]
g. Check the Storm processes: nimbus, supervisor, core, logviewer, worker
[ebupt@eb171 ~]$ jps
8700 JournalNode
8939 NodeManager
8805 DFSZKFailoverController
31802 worker
8501 NameNode
5189 Jps
31616 supervisor
8592 DataNode
28865 logviewer
31793 worker
31475 MasterServer
31795 worker
5841 HRegionServer
31509 nimbus
31510 core
31577 QuorumPeerMain
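When checking many nodes, counting the Storm processes in the jps listing is quicker than reading it by eye. This is a small sketch, assuming the two-column "pid name" layout shown above; the function name count_jps is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count processes whose jps name (second column)
# matches the given name exactly. Reads jps-style output on stdin.
count_jps() {
    awk -v name="$1" '$2 == name { n++ } END { print n + 0 }'
}

# Example using a few lines from the listing above.
sample='31802 worker
31616 supervisor
31793 worker
31509 nimbus
31510 core'
printf '%s\n' "$sample" | count_jps worker      # prints 2
printf '%s\n' "$sample" | count_jps supervisor  # prints 1
```

On a live node the same function can read from jps directly, for example jps | count_jps supervisor.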
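7. Lessons learned
a. If a Supervisor fails to start, the cause is often insufficient memory. Take particular care with this in virtual machine environments (the same applies on physical hardware).
b. nimbus.host is the address that YARN assigns after Storm is submitted to YARN; you have to look it up yourself.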
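Looking up nimbus.host can be scripted once the configuration has been fetched with getStormConfig. This is a minimal sketch, assuming the fetched file contains a single-line "nimbus.host: value" entry (quoted or unquoted); the function name nimbus_host is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of nimbus.host from a storm.yaml file.
# Assumes a single-line "key: value" entry; surrounding quotes are stripped.
nimbus_host() {
    awk -F': *' '$1 == "nimbus.host" { gsub(/[" \t]/, "", $2); print $2; exit }' "$1"
}

# Use the file written earlier by "storm-yarn getStormConfig ... -output".
CONF="$HOME/.storm/storm.yaml"
if [ -f "$CONF" ]; then
    echo "Nimbus host: $(nimbus_host "$CONF")"
fi
```

The result can then be passed to storm jar through -c nimbus.host=... as in the topology submission step.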
8. References
https://github.com/yahoo/storm-yarn
http://dongxicheng.org/mapreduce-nextgen/storm-on-yarn/
http://blog.csdn.net/jiushuai/article/details/26693311
http://ghost-face.iteye.com/blog/2017374
9. Problems encountered and solutions
① The Storm cluster fails to launch
[ebupt@eb170 conf]$ storm-yarn launch storm.yaml
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ebupt/eb/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ebupt/eb/storm-yarn/storm-0.9.0-wip21/lib/logback-classic-1.0.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.UnsupportedClassVersionError: backtype/storm/utils/Utils : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at com.yahoo.storm.yarn.Config.readStormConfig(Config.java:48)
at com.yahoo.storm.yarn.LaunchCommand.process(LaunchCommand.java:59)
at com.yahoo.storm.yarn.Client.execute(Client.java:142)
at com.yahoo.storm.yarn.Client.main(Client.java:148)
Cause: a Java version mismatch (class-file major version 51 = J2SE 7). The lab environment ran JDK 1.6, but Storm on YARN requires JDK 1.7.
Solution: upgrade the JDK to 1.7.
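The "major.minor version 51.0" in the error refers to the class-file format version stored in the header of every .class file. As a sketch of how to inspect it, the hypothetical helpers below read that header with od and map the major version to a Java release; the mapping (major minus 44) holds for Java 5 and later.

```shell
#!/bin/sh
# Hypothetical helper: print the class-file major version of a .class file.
# The header is 4 magic bytes, 2 bytes of minor version, then 2 bytes of
# major version (big-endian), so the major version sits at offsets 6 and 7.
class_major() {
    od -An -t u1 -j 6 -N 2 "$1" | awk '{ print $1 * 256 + $2 }'
}

# Map a major version to the Java release that introduced it
# (49 = Java 5, 50 = Java 6, 51 = Java 7, 52 = Java 8).
major_to_java() {
    echo $(( $1 - 44 ))
}

major_to_java 51   # prints 7
```

Running class_major on a class extracted from the Storm JAR shows the minimum Java version that the JVM on each node must support.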
② Storm on YARN is submitted to YARN but the application fails, with the following log.
User: huangq
Name: Storm-on-Yarn
Application Type: YARN
State: FAILED
FinalStatus: FAILED
Started: 4-Jul-2014 10:14:15
Elapsed: 4 sec
Tracking URL: History
Diagnostics:
Application application_1402648970753_0013 failed 2 times due to AM Container for appattempt_1402648970753_0013_000002 exited with exitCode: 126 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
.Failing this attempt.. Failing the application.
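Cause: the JDK on the machine that YARN assigned to run Storm on YARN was not 1.7. After changing the JDK version, the error went away.
Solution: every node in the YARN cluster must run the same JDK version, 1.7.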
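To compare JDK versions across nodes, the first line of java -version can be reduced to a single number on each machine. This is a rough sketch, assuming the usual banner with the version in double quotes; the function name java_major_from_banner is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reduce a `java -version` banner line to a major number.
# 'java version "1.7.0_60"' gives 7; 'openjdk version "11.0.2"' gives 11.
java_major_from_banner() {
    printf '%s\n' "$1" | awk -F'"' '{
        split($2, parts, /[._]/)
        if (parts[1] == "1") print parts[2]; else print parts[1]
    }'
}

# java -version writes its banner to stderr, hence the 2>&1.
if command -v java >/dev/null 2>&1; then
    banner=$(java -version 2>&1 | head -n 1)
    echo "this node runs Java $(java_major_from_banner "$banner")"
fi
```

Running this on every NodeManager host (for example over ssh) would show any node still on JDK 1.6.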
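③ After launching the Storm cluster with storm-yarn launch ~/storm/storm-0.9.0-wip21/conf/storm.yaml, no Supervisor process appears in the Storm UI.
Cause: after Storm started, the nimbus and core processes were running but the supervisor process was not. This is because "each Supervisor has a node to itself and does not share node resources with other services". The lab's Hadoop 2.0 test cluster has only 2 machines, so there was no node left for a Supervisor to occupy exclusively, and it could not start.
Solution: add more nodes to the cluster. A compromise for testing is to start a Supervisor by hand with storm supervisor &. The drawback is that you must manage the Supervisor process yourself and kill it to release its resources.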
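④ During the mvn build, a domain name could not be resolved and downloading the POM failed. The build passes with Maven on machine 240 but fails on the 170 cluster.
Investigation: the repository domain was suspected at first, but changing the Maven repository domain did not help. A second attempt, copying the local repository from 240 to 170, did not help either.
Conclusion: the eb170 machine cannot reach the external network, so it cannot download external resources. No fix has been applied yet.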
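⑤ The Supervisor fails to start
Cause:
1. The YARN cluster is misconfigured or short on memory, so the Supervisor cannot start.
2. Not yet resolved; insufficient memory is the likely cause. Needs further investigation.
Solution: TODO.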