(1) Local
Mostly used for local testing, e.g. running and debugging jobs from an IDE such as Eclipse or IDEA (a minimal local-mode run is sketched after this list).
(2) Standalone
Standalone is Spark's built-in resource scheduling framework; it supports a fully distributed deployment.
(3) Yarn
A resource scheduling framework from the Hadoop ecosystem; Spark can run its computation on top of Yarn, and this is the most widely used mode.
(4) Mesos
Another resource scheduling framework; it supports Docker and is considered to have strong prospects.
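For the Local mode above, a minimal sketch of running the bundled SparkPi example with two local threads; the /opt/bigdata/spark-2.2.0 path assumes the install layout set up later in this guide:

cd /opt/bigdata/spark-2.2.0
./bin/spark-submit --master local[2] \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.11-2.2.0.jar 10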
Here I use 5 machines: 1 Master for resource scheduling, 3 Workers to process tasks, and 1 Client to submit jobs.
| | NameNode | DataNode | Zookeeper | DFSZKFC | JournalNode | Master | Worker | Client |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| node01 | 1 | | | 1 | | 1 | | |
| node02 | 1 | 1 | | 1 | | | 1 | |
| node03 | | 1 | 1 | | 1 | | 1 | |
| node04 | | 1 | 1 | | 1 | | 1 | |
| node05 | | | 1 | | 1 | | | 1 |
(1) Download and extract
Download: http://spark.apache.org/downloads.html
Extract: tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz
Rename: mv spark-2.2.0-bin-hadoop2.7 spark-2.2.0
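If you fetch the tarball from the command line instead of a browser, the whole sequence looks roughly like this (the archive URL is an assumption; /opt/bigdata is the install root used throughout this guide):

cd /opt/bigdata
wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz
mv spark-2.2.0-bin-hadoop2.7 spark-2.2.0
ls spark-2.2.0    # expect bin, conf, jars, sbin, examples, ...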
(2) Configuration
Using node01 as the example.
Go to /opt/bigdata/spark-2.2.0/conf/
Copy the Spark environment template: cp spark-env.sh.template spark-env.sh
vim spark-env.sh and set the following; everything else can keep its default values.
# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers
export SPARK_MASTER_HOST=node01    # master node
export SPARK_MASTER_PORT=7077    # port the master accepts job submissions on, default 7077
export SPARK_MASTER_WEBUI_PORT=8080    # master web UI port, default 8080
export SPARK_WORKER_CORES=2    # number of cores each worker may use
export SPARK_WORKER_MEMORY=1g    # total memory each worker can give to executors
export HADOOP_CONF_DIR=/opt/bigdata/hadoop-2.7.4/etc/hadoop    # Hadoop config path; optional on master/worker nodes, but required on the client
export JAVA_HOME=/usr/local/jdk1.8
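A quick sanity check that the paths referenced above actually exist on this node (both values are the ones assumed in this guide):

ls /usr/local/jdk1.8/bin/java
ls /opt/bigdata/hadoop-2.7.4/etc/hadoop/hdfs-site.xml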
Copy the slaves template: cp slaves.template slaves
vim slaves and add the following:
# A Spark Worker will be started on each of the machines listed below.
#localhost
node02
node03
node04
Copy the defaults template: cp spark-defaults.conf.template spark-defaults.conf
vim spark-defaults.conf and add the following:
spark.yarn.jars = hdfs://mycluster/spark/jars/*
Create the HDFS directory for the jars: hdfs dfs -mkdir -p /spark/jars
Upload the jars: hdfs dfs -put /opt/bigdata/spark-2.2.0/jars/* /spark/jars
This item is optional: uploading the jars once means each job submission no longer has to ship the Spark jars to the cluster, which saves time and resources.
This only needs to be configured on the client node; the other nodes do not need it.
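To confirm the upload worked and that the path in spark-defaults.conf resolves (mycluster is the HA nameservice assumed in this guide):

hdfs dfs -ls hdfs://mycluster/spark/jars | head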
(3) Distribute the installation
From /opt/bigdata, copy the spark-2.2.0 directory to nodes 2-5 (run the following from /opt/bigdata so that `pwd` resolves to the right target path):
scp -r spark-2.2.0 node02:`pwd`
scp -r spark-2.2.0 node03:`pwd`
scp -r spark-2.2.0 node04:`pwd`
scp -r spark-2.2.0 node05:`pwd`
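The same distribution can be written as a short loop, which also makes it easy to add nodes later (run from /opt/bigdata; host names as in the plan above):

for n in node02 node03 node04 node05; do
  scp -r spark-2.2.0 ${n}:/opt/bigdata/
done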
In /opt/bigdata/spark-2.2.0/sbin:
(1) ./start-all.sh to start
(2) ./stop-all.sh to stop
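After start-all.sh it is worth a quick check that the daemons are up (the ports are the defaults set in spark-env.sh):

jps                                 # should show Master on node01 and Worker on node02-node04
curl -s http://node01:8080 | head   # the master web UI should list 3 workers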
Note: the client is not part of the cluster and consumes no cluster resources, so jobs must be submitted from the client machine.
(1) Standalone-client submission (suited to testing); the commands below are run from the bin directory of the Spark installation on the client.
nohup ./spark-submit --master spark://node01:7077 --executor-memory 1G --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
or
nohup ./spark-submit --master spark://node01:7077 --deploy-mode client --executor-memory 1G --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
(2) Standalone-cluster submission (suited to production)
nohup ./spark-submit --master spark://node01:7077 --deploy-mode cluster --executor-memory 1G --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
(3) YARN-client submission (suited to testing)
Submitting to YARN hands the job to the Hadoop cluster's YARN for management, so the Hadoop cluster must be started first.
At this point the job no longer depends on the Spark standalone cluster, so that cluster can be stopped; you only need to submit from the client machine (the same applies below).
nohup ./spark-submit --master yarn --executor-memory 1G --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
or
nohup ./spark-submit --master yarn-client --executor-memory 1G --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
or, equivalently
nohup ./spark-submit --master yarn --deploy-mode client --executor-memory 1G --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
(4) YARN-cluster submission (suited to production)
nohup ./spark-submit --master yarn-cluster --executor-memory 1G --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
or
nohup ./spark-submit --master yarn --deploy-mode cluster --executor-memory 1G --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
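Because the submissions above run under nohup, the client-mode driver output ends up in nohup.out on the client, so a rough way to check the SparkPi result is:

grep "Pi is roughly" nohup.out

For cluster-mode submissions the driver runs inside the cluster, so the result line appears in the driver's stdout there instead; on YARN it can be pulled back with yarn logs -applicationId <appId>.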