Official documentation
http://spark.apache.org/docs/latest/monitoring.html
Download link: http://flume.apache.org/download.html
Hadoop HA installation steps
http://blog.csdn.net/haoxiaoyan/article/details/52623393
ZooKeeper installation steps
http://blog.csdn.net/haoxiaoyan/article/details/52523866
Install Scala (on the master)
[hadoop@masternode1 hadoop]# tar -xvzf scala-2.11.8.tgz
[hadoop@masternode1 hadoop]# mv scala-2.11.8 scala
1. Set the environment variables
[hadoop@masternode1 hadoop]# vi /etc/profile
#set scala
export SCALA_HOME=/opt/hadoop/scala
export PATH=$PATH:$SCALA_HOME/bin
"/etc/profile" 96L, 2426C written
[hadoop@masternode1 hadoop]# source /etc/profile
[hadoop@masternode1 hadoop]# scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
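To confirm the variables took effect for the hadoop user, a quick sanity check such as the following can be run (a minimal sketch; the expected values are simply the paths set above):
[hadoop@masternode1 hadoop]$ echo $SCALA_HOME
/opt/hadoop/scala
[hadoop@masternode1 hadoop]$ which scala
/opt/hadoop/scala/bin/scala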
2. Copy /opt/hadoop/scala from the master to the other nodes.
[hadoop@masternode1 hadoop]$ for i in {31,32,33,34,35,36,37,38,39};do scp -r scala hadoop@192.168.231.2$i:/opt/hadoop/ ; done
3. Copy the environment variable file to the other machines
[hadoop@masternode1 hadoop]$ for i in {31,32,33,34,35,36,37,38,39};do scp ~/.bash_profile hadoop@192.168.231.2$i:~/.bash_profile ; done
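After the copy, the same style of loop can be used to check that Scala runs on every node; this sketch assumes passwordless SSH from the master to each node (the scp loops above already rely on it):
[hadoop@masternode1 hadoop]$ for i in {31,32,33,34,35,36,37,38,39};do ssh hadoop@192.168.231.2$i "source ~/.bash_profile; scala -version" ; done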
4. Install Spark
① Unpack the software
[hadoop@masternode1 hadoop]# tar -xzvf spark-1.6.1-bin-hadoop2.6.tgz
[hadoop@masternode1 hadoop]# mv spark-1.6.1-bin-hadoop2.6 spark
[hadoop@masternode1 conf]# pwd
/opt/hadoop/spark/conf
[hadoop@masternode1 conf]# cp spark-env.sh.template spark-env.sh
[hadoop@masternode1 conf]# cp spark-defaults.conf.template spark-defaults.conf
② Configure spark-env.sh
vi spark-env.sh
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.2
export SCALA_HOME=/opt/hadoop/scala
export JAVA_HOME=/usr/java/jdk1.7.0_79
### the maximum memory each Worker can allocate to executors
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_WORKER_DIR=/opt/hadoop/spark/work
export SPARK_LOCAL_DIRS=/opt/hadoop/spark/tmp
export SPARK_DAEMON_JAVA_OPTS="-Dsun.io.serialization.extendedDebugInfo=true -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=slavenode1:2181,slavenode2:2181,slavenode3:2181,slavenode4:2181,slavenode5:2181,slavenode6:2181,slavenode7:2181 -Dspark.deploy.zookeeper.dir=/spark"
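A couple of other worker-related settings are commonly placed in spark-env.sh next to SPARK_WORKER_MEMORY; the values here are only an illustration, not something this cluster requires:
export SPARK_WORKER_CORES=2       # cores each Worker offers to executors
export SPARK_WORKER_INSTANCES=1   # number of Worker processes per node (1 is the default)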
③ Edit the slaves file
[hadoop@masternode1 conf]# cp slaves.template slaves
vi slaves
Add the following hostnames (a quick reachability check follows the list):
slavenode1
slavenode2
slavenode3
slavenode4
slavenode5
slavenode6
slavenode7
slavenode8
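Before starting the cluster it is worth confirming that every hostname in slaves resolves and accepts SSH; a minimal sketch, assuming these hostnames are already in /etc/hosts:
[hadoop@masternode1 conf]$ for h in slavenode1 slavenode2 slavenode3 slavenode4 slavenode5 slavenode6 slavenode7 slavenode8; do ssh hadoop@$h hostname ; done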
④ Edit spark-defaults.conf
vi spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://cluster-ha/spark/logs
[hadoop@masternode1 sbin]$ hdfs dfs -mkdir hdfs://cluster-ha/spark
[hadoop@masternode1 sbin]$ hdfs dfs -mkdir hdfs://cluster-ha/spark/logs
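With spark.eventLog.dir pointing at HDFS, the Spark history server can be pointed at the same directory so finished applications stay visible; a sketch of the extra spark-defaults.conf line and the start command (the history server UI listens on port 18080 by default):
spark.history.fs.logDirectory    hdfs://cluster-ha/spark/logs
[hadoop@masternode1 sbin]$ ./start-history-server.sh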
⑤ Copy the spark directory and the environment variable file to the other machines
[hadoop@masternode1 conf]$ for i in {31,32,33,34,35,36,37,38,39};do scp -r spark 192.168.231.2$i:/opt/hadoop/ ; done
[hadoop@masternode1 hadoop]$ for i in {31,32,33,34,35,36,37,38,39};do scp ~/.bash_profile hadoop@192.168.231.2$i:~/.bash_profile ; done
⑥ Start Spark
[hadoop@masternode1 hadoop]# cd spark
[hadoop@masternode1 spark]# cd sbin/
[hadoop@masternode1 sbin]$ ./start-all.sh
[hadoop@masternode1 sbin]$ jps
9836 RunJar
3428 DFSZKFailoverController
2693 NameNode
5851 RunJar
3635 ResourceManager
14061 Jps
13859 Master
3924 HMaster
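jps on masternode1 only shows the Master; to confirm that a Worker came up on every slave node, a loop such as this can be run from the master (assuming jps is on the PATH of a non-interactive shell on the slaves):
[hadoop@masternode1 sbin]$ for h in slavenode1 slavenode2 slavenode3 slavenode4 slavenode5 slavenode6 slavenode7 slavenode8; do echo $h; ssh hadoop@$h "jps | grep Worker" ; done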
Go to the masternode2 node (192.168.237.231) and run start-master.sh there, so that when masternode1 (192.168.237.230) goes down, masternode2 takes over as the Master.
[hadoop@masternode2 sbin]$ ./start-master.sh
[hadoop@masternode2 sbin]$ jps
11867 Master
3704 DFSZKFailoverController
2683 NameNode
3923 ResourceManager
12064 Jps
4500 HMaster
[hadoop@masternode1 bin]# ./spark-shell
SQL context available as sqlContext.
scala>
When the scala> prompt appears, the shell has started successfully.
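Besides the interactive shell, the bundled SparkPi example is a quick way to exercise the whole cluster; a sketch, assuming the examples shipped with the 1.6.1 binary distribution and that the run-example script honours the MASTER variable:
[hadoop@masternode1 spark]$ MASTER=spark://masternode1:7077 ./bin/run-example SparkPi 10
If everything is wired up, the job finishes with an approximation of Pi in its output.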
⑦ Test whether HA failover works
(1) First check how the two nodes are running: masternode1 is currently the active Master and masternode2 is in standby.
Open the Spark web management UI for both nodes:
http://192.168.237.231:8080/
http://192.168.237.230:8080/
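The standalone Master also serves a JSON view of its state at /json, which is convenient for scripted checks; a sketch (field names may vary slightly between Spark versions):
[hadoop@masternode1 ~]$ curl -s http://192.168.237.230:8080/json | grep status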
(2) Stop the Master service on masternode1
[hadoop@masternode1 sbin]$ ./stop-master.sh
Open port 8080 on masternode1 in a browser to see whether it is still alive; it is not, since the Master there has been stopped.
Then check masternode2 in a browser: it has been switched over and is now the active Master.
Note: if port 8080 is already taken by another program, Spark automatically tries the next port up.
Start spark-shell against the cluster
[hadoop@masternode1 sbin]$ spark-shell --master spark://masternode1:7077 &
http://192.168.237.230:4040/jobs/
http://spark.apache.org/docs/latest/configuration.html
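Since the masters are ZooKeeper-backed, clients can also list both of them in the master URL, so a running shell or application survives a failover; a sketch:
[hadoop@masternode1 sbin]$ spark-shell --master spark://masternode1:7077,masternode2:7077 &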