Deploying a Spark Cluster with CDH

1. Cluster plan:

master: brain01

worker: brain02、brain03、brain04

2. Preparation:

2.1 Install Java 1.7

2.2 Set each node's hostname and edit /etc/hosts
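For example, /etc/hosts on every node could contain entries like the following (the IP addresses below are placeholders, not from the original setup; substitute your own subnet):

```
192.168.0.101   brain01
192.168.0.102   brain02
192.168.0.103   brain03
192.168.0.104   brain04
```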

2.3 Stop and disable iptables:

service iptables stop

chkconfig iptables off

2.4 Disable SELinux by editing /etc/selinux/config as below, then reboot the OS

SELINUX=disabled
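The edit can also be scripted. A minimal sed sketch, shown here against a sample copy in /tmp so it can be dry-run without root; on the real hosts, point the same sed at /etc/selinux/config and reboot:

```shell
# Create a sample of the relevant config lines (illustrative content only).
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > /tmp/selinux-config
# Flip whatever SELINUX mode is set to "disabled".
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /tmp/selinux-config
grep '^SELINUX=' /tmp/selinux-config   # prints: SELINUX=disabled
```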

3. Deployment

3.1 On brain01 through brain04, install the Spark packages:

yum install spark-core spark-master spark-worker spark-history-server spark-python -y

3.2 On brain01, edit /etc/spark/conf/spark-env.sh

vi /etc/spark/conf/spark-env.sh

export STANDALONE_SPARK_MASTER_HOST=brain01
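Two related variables are commonly set in the same file. The values below are assumptions, not part of the original setup: 7077 is the upstream standalone default and matches the spark://brain01:7077 URL used in spark-defaults.conf, and 18080 is the CDH packaging convention for the master web UI; verify both against your own spark-env.sh template:

```shell
export SPARK_MASTER_PORT=7077          # must match the port in spark.master
export SPARK_MASTER_WEBUI_PORT=18080   # standalone master web UI port
```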

3.3 On brain01, edit /etc/spark/conf/spark-defaults.conf

spark.master                     spark://brain01:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://brain01:8020/user/spark/eventlog
spark.yarn.historyServer.address brain01:18081
spark.executor.memory            12g
spark.logConf                    true
spark.yarn.jar hdfs://brain01:8020/user/spark/share/lib/spark-assembly.jar

3.4 On brain01, edit /etc/default/spark
export SPARK_HISTORY_SERVER_LOG_DIR=hdfs://brain01:8020/user/spark/eventlog

3.5 Use scp to copy each of the files edited above to brain02/3/4, keeping the same paths (including directories)
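A dry-run sketch of that copy step: the loop below prints the scp commands it would run (remove the leading "echo" to actually copy; root SSH access to the workers is assumed):

```shell
# The three files edited in steps 3.2-3.4, pushed to each worker at the same path.
FILES="/etc/spark/conf/spark-env.sh /etc/spark/conf/spark-defaults.conf /etc/default/spark"
for h in brain02 brain03 brain04; do
  for f in $FILES; do
    echo scp "$f" "$h:$f"
  done
done
```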

3.6 HDFS setup:

sudo -u hdfs hadoop fs -mkdir /user/spark 
sudo -u hdfs hadoop fs -mkdir /user/spark/applicationHistory 
sudo -u hdfs hadoop fs -mkdir /user/spark/eventlog 
sudo -u hdfs hadoop fs -chown -R spark:spark /user/spark 
sudo -u hdfs hadoop fs -chmod 1777 /user/spark/applicationHistory
sudo -u hdfs hadoop fs -chmod 1777 /user/spark/eventlog

3.7 Optimization: upload spark-assembly.jar to HDFS so that the cluster loads this dependency faster; spark-examples.jar is uploaded to speed up application loading in cluster mode

vi /etc/spark/conf/spark-defaults.conf

spark.yarn.jar hdfs://brain01:8020/user/spark/share/lib/spark-assembly.jar

Then run the following commands:

sudo -u hdfs hadoop fs -mkdir -p /user/spark/share/lib 
sudo -u hdfs hadoop fs -put /usr/lib/spark/lib/spark-assembly.jar /user/spark/share/lib/spark-assembly.jar
sudo -u hdfs hadoop fs -put /usr/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.8.0-hadoop2.6.0-cdh5.8.0.jar /user/spark/share/lib/spark-examples.jar
sudo -u hdfs hadoop fs -chown -R root:spark /user/spark/share/lib

4. Start the services

brain01 (note that this also starts a worker on the master node, in addition to the workers planned in section 1):

sudo service spark-master start
sudo service spark-history-server start
sudo service spark-worker start

brain02/3/4:

sudo service spark-worker start

5. Testing

5.1 yarn-client mode: connects to the YARN cluster; the driver runs on the client while the executors run in the cluster

spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode client --master yarn --driver-library-path /usr/lib/hadoop/lib/native/ --driver-class-path /usr/lib/hadoop/lib/ /usr/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.8.0-hadoop2.6.0-cdh5.8.0.jar 10


5.2 yarn-cluster mode: connects to the YARN cluster; both the driver and the executors run in the cluster
spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode cluster --master yarn --driver-library-path /usr/lib/hadoop/lib/native/ --driver-class-path /usr/lib/hadoop/lib/ hdfs://brain01:8020/user/spark/share/lib/spark-examples.jar 10
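On success SparkPi prints a line of the form "Pi is roughly <value>" to the driver's stdout; in yarn-cluster mode that line lands in the YARN container logs (retrievable with `yarn logs -applicationId <appId>`). A small sketch of pulling the estimate out of a saved log with awk; the sample line is illustrative, not real cluster output:

```shell
# Extract the fourth field from the "Pi is roughly <value>" result line.
echo "Pi is roughly 3.141592" | awk '/Pi is roughly/ {print $4}'   # prints: 3.141592
```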
