1. Planning:
master: brain01
workers: brain02, brain03, brain04
2. Prerequisites:
2.1 Install Java 1.7 on every node
2.2 Set each node's hostname and map all nodes in /etc/hosts
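A minimal /etc/hosts sketch for the hostname mapping above. The IP addresses are placeholders (assumptions, not part of the original plan); substitute your nodes' real addresses:

```
# /etc/hosts (identical on every node); IPs below are illustrative only
192.168.1.101   brain01
192.168.1.102   brain02
192.168.1.103   brain03
192.168.1.104   brain04
```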
2.3 Stop iptables and disable it at boot:
service iptables stop
chkconfig iptables off
2.4 Disable SELinux: edit /etc/selinux/config as follows, then reboot the OS
SELINUX=disabled
3. Deployment
3.1 On brain01 through brain04, install the Spark packages:
yum install spark-core spark-master spark-worker spark-history-server spark-python -y
3.2 On brain01, edit /etc/spark/conf/spark-env.sh:
vi /etc/spark/conf/spark-env.sh
export STANDALONE_SPARK_MASTER_HOST=brain01
3.3 On brain01, edit /etc/spark/conf/spark-defaults.conf:
spark.master spark://brain01:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://brain01:8020/user/spark/eventlog
spark.yarn.historyServer.address brain01:18081
spark.executor.memory 12g
spark.logConf true
spark.yarn.jar hdfs://brain01:8020/user/spark/share/lib/spark-assembly.jar
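The step 3.3 settings can be generated with a heredoc. This sketch writes to a temp file so it is safe to try anywhere; on brain01 the real target would be /etc/spark/conf/spark-defaults.conf. Note that spark.yarn.historyServer.address expects host:port with no http:// scheme.

```shell
# Sketch: generate the spark-defaults.conf contents from step 3.3.
# CONF points at a temp file here; use /etc/spark/conf/spark-defaults.conf for real.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
spark.master                     spark://brain01:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://brain01:8020/user/spark/eventlog
spark.yarn.historyServer.address brain01:18081
spark.executor.memory            12g
spark.logConf                    true
spark.yarn.jar                   hdfs://brain01:8020/user/spark/share/lib/spark-assembly.jar
EOF
grep -c '^spark\.' "$CONF"   # 7 properties written
```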
3.4 On brain01, edit /etc/default/spark:
export SPARK_HISTORY_SERVER_LOG_DIR=hdfs://brain01:8020/user/spark/eventlog
3.5 Use scp to copy each of the files edited above (spark-env.sh, spark-defaults.conf, and /etc/default/spark) to the same paths on brain02, brain03, and brain04
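The scp fan-out in 3.5 can be scripted as below. This is a dry run: each command is echoed rather than executed, and the root@ login is an assumption; remove the echo (and adjust the user) to actually copy.

```shell
# Sketch of step 3.5: push the three edited config files to every worker.
# Hostnames come from the plan in section 1.
FILES="/etc/spark/conf/spark-env.sh /etc/spark/conf/spark-defaults.conf /etc/default/spark"
for host in brain02 brain03 brain04; do
  for f in $FILES; do
    echo scp "$f" "root@${host}:${f}"   # drop 'echo' to really copy
  done
done
```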
3.6 Set up the Spark directories in HDFS:
sudo -u hdfs hadoop fs -mkdir /user/spark
sudo -u hdfs hadoop fs -mkdir /user/spark/applicationHistory
sudo -u hdfs hadoop fs -mkdir /user/spark/eventlog
sudo -u hdfs hadoop fs -chown -R spark:spark /user/spark
sudo -u hdfs hadoop fs -chmod 1777 /user/spark/applicationHistory
sudo -u hdfs hadoop fs -chmod 1777 /user/spark/eventlog
3.7 Optimization: upload spark-assembly.jar to HDFS so the cluster can load this dependency faster; spark-examples.jar is uploaded to speed up loading the application jar in cluster deploy mode
vi /etc/spark/conf/spark-defaults.conf
spark.yarn.jar hdfs://brain01:8020/user/spark/share/lib/spark-assembly.jar
Then run the following commands:
sudo -u hdfs hadoop fs -mkdir -p /user/spark/share/lib
sudo -u hdfs hadoop fs -put /usr/lib/spark/lib/spark-assembly.jar /user/spark/share/lib/spark-assembly.jar
sudo -u hdfs hadoop fs -put /usr/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.8.0-hadoop2.6.0-cdh5.8.0.jar /user/spark/share/lib/spark-examples.jar
sudo -u hdfs hadoop fs -chown -R root:spark /user/spark/share/lib
4. Start the services
On brain01:
sudo service spark-master start
sudo service spark-history-server start
sudo service spark-worker start
On brain02, brain03, and brain04:
sudo service spark-worker start
5. Testing
5.1 yarn-client deploy mode: connects to the YARN cluster; the driver runs on the client while the executors run in the cluster
spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode client --master yarn --driver-library-path /usr/lib/hadoop/lib/native/ --driver-class-path /usr/lib/hadoop/lib/ /usr/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.8.0-hadoop2.6.0-cdh5.8.0.jar 10
5.2 yarn-cluster deploy mode: connects to the YARN cluster; both the driver and the executors run in the cluster
spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode cluster --master yarn --driver-library-path /usr/lib/hadoop/lib/native/ --driver-class-path /usr/lib/hadoop/lib/ hdfs://brain01:8020/user/spark/share/lib/spark-examples.jar 10