Spark Cluster HA Deployment Based on ZooKeeper

Official documentation

http://spark.apache.org/docs/latest/monitoring.html

Download: http://spark.apache.org/downloads.html

Hadoop HA installation steps

http://blog.csdn.net/haoxiaoyan/article/details/52623393

ZooKeeper installation steps

http://blog.csdn.net/haoxiaoyan/article/details/52523866

Install Scala (on the master)

[hadoop@masternode1 hadoop]# tar -xvzf scala-2.11.8.tgz 

[hadoop@masternode1 hadoop]# mv scala-2.11.8 scala

1. Set the environment variables

[hadoop@masternode1 hadoop]# vi /etc/profile

#set scala

export SCALA_HOME=/opt/hadoop/scala

export PATH=$PATH:$SCALA_HOME/bin

Save the file, then reload it:

[hadoop@masternode1 hadoop]# source /etc/profile

[hadoop@masternode1 hadoop]# scala -version

Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

2. Copy /opt/hadoop/scala from the master to the other nodes.

[hadoop@masternode1 hadoop]$ for i in {31,32,33,34,35,36,37,38,39};do scp -r scala hadoop@192.168.231.2$i:/opt/hadoop/ ; done

3. Copy the environment variable file to the other machines. (The exports above went into /etc/profile on the master; make sure the same SCALA_HOME and PATH lines are also present in ~/.bash_profile, since that is the file being distributed.)

[hadoop@masternode1 hadoop]$ for i in {31,32,33,34,35,36,37,38,39};do scp ~/.bash_profile hadoop@192.168.231.2$i:~/.bash_profile ; done
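To confirm the copies worked, a loop like the one below can check that every node reports the same Scala version. This is just a sketch, assuming passwordless SSH for the hadoop user (which the scp loops above already rely on); scala -version prints to stderr, hence the 2>&1, and ~/.bash_profile is sourced explicitly because non-interactive SSH sessions do not load it.

[hadoop@masternode1 hadoop]$ for i in {31,32,33,34,35,36,37,38,39};do ssh hadoop@192.168.231.2$i 'source ~/.bash_profile; scala -version' 2>&1 ; done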

4. Install Spark

① Unpack the software

[hadoop@masternode1 hadoop]# tar -xzvf spark-1.6.1-bin-hadoop2.6.tgz 

[hadoop@masternode1 hadoop]# mv spark-1.6.1-bin-hadoop2.6 spark

[hadoop@masternode1 conf]# pwd

/opt/hadoop/spark/conf

[hadoop@masternode1 conf]# cp spark-env.sh.template spark-env.sh

[hadoop@masternode1 conf]# cp spark-defaults.conf.template spark-defaults.conf

② Configure spark-env.sh

vi spark-env.sh

export HADOOP_HOME=/opt/hadoop/hadoop-2.7.2

export SCALA_HOME=/opt/hadoop/scala

export JAVA_HOME=/usr/java/jdk1.7.0_79

### maximum total memory this worker may allocate to executors

export SPARK_WORKER_MEMORY=1g

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export SPARK_WORKER_DIR=/opt/hadoop/spark/work

export SPARK_LOCAL_DIRS=/opt/hadoop/spark/tmp

export SPARK_DAEMON_JAVA_OPTS="-Dsun.io.serialization.extendedDebugInfo=true -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=slavenode1:2181,slavenode2:2181,slavenode3:2181,slavenode4:2181,slavenode5:2181,slavenode6:2181,slavenode7:2181 -Dspark.deploy.zookeeper.dir=/spark"
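The SPARK_DAEMON_JAVA_OPTS line is what turns on HA: spark.deploy.recoveryMode=ZOOKEEPER makes the masters persist recovery state in the ensemble listed in spark.deploy.zookeeper.url, under the znode named by spark.deploy.zookeeper.dir. Once the masters are running, you can confirm the znode was created with zkCli.sh; the path below assumes ZooKeeper is installed at /opt/hadoop/zookeeper, so adjust it to your layout.

[hadoop@masternode1 conf]$ /opt/hadoop/zookeeper/bin/zkCli.sh -server slavenode1:2181 ls /spark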

③ Edit the slaves file

[hadoop@masternode1 conf]# cp slaves.template slaves

vi slaves 

Add the following entries (a reachability check for them follows the list):

slavenode1

slavenode2

slavenode3

slavenode4

slavenode5

slavenode6

slavenode7

slavenode8
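Because start-all.sh simply SSHes into every host listed in slaves, it is worth confirming beforehand that each entry resolves and is reachable. A minimal sketch:

[hadoop@masternode1 conf]$ for h in $(grep -v '^#' slaves); do ssh -o ConnectTimeout=5 $h hostname || echo "$h unreachable"; done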

 

④ Edit spark-defaults.conf

vi spark-defaults.conf

spark.eventLog.enabled           true

spark.eventLog.dir               hdfs://cluster-ha/spark/logs

 

[hadoop@masternode1 sbin]$ hdfs dfs -mkdir hdfs://cluster-ha/spark

[hadoop@masternode1 sbin]$ hdfs dfs -mkdir hdfs://cluster-ha/spark/logs
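With the event logs landing in HDFS, the same directory can also feed the Spark history server. The two commands below are an optional sketch: spark.history.fs.logDirectory is the standard key the history server reads, and start-history-server.sh ships in sbin/.

[hadoop@masternode1 conf]$ echo "spark.history.fs.logDirectory    hdfs://cluster-ha/spark/logs" >> spark-defaults.conf

[hadoop@masternode1 conf]$ ../sbin/start-history-server.sh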

⑤ Copy the spark directory and the environment variable file to the other machines

[hadoop@masternode1 conf]$ for i in {31,32,33,34,35,36,37,38,39};do scp -r spark 192.168.231.2$i:/opt/hadoop/ ; done

[hadoop@masternode1 hadoop]$ for i in {31,32,33,34,35,36,37,38,39};do scp ~/.bash_profile hadoop@192.168.231.2$i:~/.bash_profile ; done

⑥ Start Spark

[hadoop@masternode1 hadoop]# cd spark

[hadoop@masternode1 spark]# cd sbin/

[hadoop@masternode1 sbin]$ ./start-all.sh 

[hadoop@masternode1 sbin]$ jps

9836 RunJar

3428 DFSZKFailoverController

2693 NameNode

5851 RunJar

3635 ResourceManager

14061 Jps

13859 Master

3924 HMaster

 

On masternode2 (192.168.237.231), also run start-master.sh; when masternode1 (192.168.237.230) goes down, masternode2 takes over as the master.

[hadoop@masternode2 sbin]$ ./start-master.sh

[hadoop@masternode2 sbin]$ jps

11867 Master

3704 DFSZKFailoverController

2683 NameNode

3923 ResourceManager

12064 Jps

4500 HMaster

 

[hadoop@masternode1 bin]# ./spark-shell 

SQL context available as sqlContext.

scala> 

When the scala> prompt appears, the shell has started successfully.
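As a quick smoke test, a one-line job can be piped into the shell non-interactively; assuming the cluster is healthy, the sum below prints 500500.0.

[hadoop@masternode1 bin]$ echo 'println(sc.parallelize(1 to 1000).sum)' | ./spark-shell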

⑦ Test that HA works

(1) First check the state of both nodes: masternode1 is currently running the active master, and masternode2 is on standby.

Open the Spark web UI on each master:

http://192.168.237.231:8080/

 

 

http://192.168.237.230:8080/
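Each standalone master also serves its state as JSON at /json on the same port, which makes the ALIVE/STANDBY check easy to script:

[hadoop@masternode1 sbin]$ curl -s http://192.168.237.230:8080/json | grep -o '"status" *: *"[A-Z]*"'

[hadoop@masternode1 sbin]$ curl -s http://192.168.237.231:8080/json | grep -o '"status" *: *"[A-Z]*"'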

 

(2) Stop the master service on masternode1

[hadoop@masternode1 sbin]$ ./stop-master.sh 
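Failover is not instantaneous: ZooKeeper first has to notice the lost master and elect the standby. You can watch masternode2's master log until the election message appears; the file name below follows Spark's default log naming and assumes the default log directory.

[hadoop@masternode2 sbin]$ tail -f ../logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-masternode2.out

Look for a line like: INFO Master: I have been elected leader! New state: ALIVE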

Visit masternode1's port 8080 in a browser to check whether the master is still alive; the page no longer responds, so the master is down.

 

Then check masternode2's status in the browser: masternode2 has been promoted and is now the active master.

 

Note: if port 8080 is occupied by another program, Spark automatically tries the next port up.
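If you would rather pin the port than rely on that probing, set the master web UI port explicitly in spark-env.sh (SPARK_MASTER_WEBUI_PORT is the standard variable):

export SPARK_MASTER_WEBUI_PORT=8080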

Starting spark-shell against the cluster

[hadoop@masternode1 sbin]$ spark-shell --master spark://masternode1:7077 &
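Note that pointing the shell at a single master works only while masternode1 is alive; new sessions cannot connect after a failover. For HA, standalone mode accepts a comma-separated list of masters, and the client registers with whichever one is active:

[hadoop@masternode1 sbin]$ spark-shell --master spark://masternode1:7077,masternode2:7077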

 

http://192.168.237.230:4040/jobs/

 

Configuration reference:

http://spark.apache.org/docs/latest/configuration.html

