Spark 2.2.0 Cluster Configuration

1. Cluster Modes

 

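(1) Local

Mostly used for local testing, for example when writing and running programs in Eclipse or IDEA.

(2) Standalone

Standalone is the resource scheduling framework bundled with Spark; it supports fully distributed deployment.

(3) YARN

The resource scheduling framework of the Hadoop ecosystem. Spark can run on YARN, and this is the most popular choice.

(4) Mesos

A general-purpose resource scheduling framework with Docker support; arguably the one with the most promising outlook.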
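
In practice the four modes differ mainly in the value passed to spark-submit's --master option. The following is a minimal sketch of that mapping, not part of the original setup: master_url is a helper invented for illustration, node01:7077 is the standalone master used later in this article, and mesos-host:5050 is a hypothetical Mesos master address.

```shell
# master_url MODE: print the --master value spark-submit expects for MODE.
# Hosts and ports are placeholders; adjust them to your own cluster.
master_url() {
    case "$1" in
        local)      echo 'local[*]' ;;                 # run locally on all cores
        standalone) echo 'spark://node01:7077' ;;      # Spark's own master
        yarn)       echo 'yarn' ;;                     # cluster located via HADOOP_CONF_DIR
        mesos)      echo 'mesos://mesos-host:5050' ;;  # hypothetical Mesos master
        *)          echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

master_url standalone
```

The printed value can then be used as, for example, ./spark-submit --master "$(master_url yarn)" followed by the usual arguments.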

 

2. Resource Allocation

 

I use five machines here: one Master for resource scheduling, three Workers that run tasks, and one Client that submits jobs.

Node    NameNode  DataNode  Zookeeper  DFSZKFC  JournalNode  Master  Worker  Client
node01  1         -         -          1        -            1       -       -
node02  -         1         1          -        1            -       1       -
node03  -         1         1          -        1            -       1       -
node04  -         1         1          -        1            -       1       -
node05  1         -         -          1        -            -       -       1


3. Cluster Configuration

(1) Download and extract

    Download: http://spark.apache.org/downloads.html

    Extract: tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz

    Rename: mv spark-2.2.0-bin-hadoop2.7 spark-2.2.0

(2) Configure

    node01 is used as the example.

    Change to /opt/bigdata/spark-2.2.0/conf/

 

  • Configure spark-env.sh

    Copy the environment template: cp spark-env.sh.template spark-env.sh

    vim spark-env.sh and add the settings below; leave everything else at its default.

# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers
export SPARK_MASTER_HOST=node01  # master node
export SPARK_MASTER_PORT=7077    # port the master accepts job submissions on (default 7077)
export SPARK_MASTER_WEBUI_PORT=8080 # master web UI (HTTP) port (default 8080)
export SPARK_WORKER_CORES=2      # number of cores each worker may use
export SPARK_WORKER_MEMORY=1g    # memory each worker may hand out to executors
export HADOOP_CONF_DIR=/opt/bigdata/hadoop-2.7.4/etc/hadoop  # Hadoop config directory; optional on the master and workers, required on the client
export JAVA_HOME=/usr/local/jdk1.8
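
A misspelled keyword or variable name in spark-env.sh does not stop the daemons from starting; the setting is simply ignored. As a rough sanity check, the file can be sourced in a subshell and each expected variable tested. This is only a sketch: check_spark_env is a helper invented here, and the variable list is the one configured in this article.

```shell
# check_spark_env FILE: source FILE in a subshell and list any of the
# variables configured in this article that are still empty afterwards.
check_spark_env() {
    (
        vars="SPARK_MASTER_HOST SPARK_MASTER_PORT SPARK_MASTER_WEBUI_PORT
              SPARK_WORKER_CORES SPARK_WORKER_MEMORY HADOOP_CONF_DIR JAVA_HOME"
        # Drop inherited values so that only FILE's own settings count.
        for name in $vars; do unset "$name"; done
        . "$1" >/dev/null 2>&1
        missing=0
        for name in $vars; do
            eval "value=\${$name:-}"
            if [ -z "$value" ]; then
                echo "not set: $name"
                missing=1
            fi
        done
        exit "$missing"
    )
}

# Example: check_spark_env /opt/bigdata/spark-2.2.0/conf/spark-env.sh
```

The function returns a non-zero status when anything is missing, so it can gate a start script.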
  • Configure slaves

    Copy the template: cp slaves.template slaves

    vim slaves and add the following

# A Spark Worker will be started on each of the machines listed below.
#localhost
node02
node03
node04
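
The start scripts reach every host listed in slaves over SSH, so passwordless login from the master to each worker should work before the cluster is started. The sketch below tries a non-interactive login to each listed host. check_workers and the overridable SSH variable are names invented for this example; BatchMode and ConnectTimeout are standard OpenSSH options.

```shell
# SSH can be overridden (e.g. SSH=true) to exercise the loop without a cluster.
SSH=${SSH:-ssh}

# check_workers FILE: attempt a non-interactive SSH login to every host in FILE.
check_workers() {
    failed=0
    while read -r host || [ -n "$host" ]; do
        # Skip blank lines and comment lines, as the slaves file allows.
        case "$host" in ''|'#'*) continue ;; esac
        if $SSH -o BatchMode=yes -o ConnectTimeout=5 "$host" true < /dev/null; then
            echo "ok: $host"
        else
            echo "unreachable: $host"
            failed=1
        fi
    done < "$1"
    return "$failed"
}

# Example: check_workers /opt/bigdata/spark-2.2.0/conf/slaves
```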
  • Configure spark-defaults.conf

    Copy the template: cp spark-defaults.conf.template spark-defaults.conf

    vim spark-defaults.conf and add the following

spark.yarn.jars = hdfs://mycluster/spark/jars/*

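    Create the HDFS directory for the jars: hdfs dfs -mkdir -p /spark/jars

    Upload the jars: hdfs dfs -put /opt/bigdata/spark-2.2.0/jars/* /spark/jars

    This setting is optional. Uploading the jars once means they are not uploaded again on every job submission, which saves time and resources.

    It only needs to be set on the client node; the other nodes do not need it.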

(3) Distribute the configuration

    From /opt/bigdata, copy the spark-2.2.0 directory to nodes 2, 3, 4 and 5

scp -r spark-2.2.0 node02:`pwd`
scp -r spark-2.2.0 node03:`pwd`
scp -r spark-2.2.0 node04:`pwd`
scp -r spark-2.2.0 node05:`pwd`
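
With more nodes the same copy is easier to maintain as a loop. A minimal sketch, assuming it is run from the parent directory of the Spark installation and that the target path is identical on every node. distribute and the DRY_RUN switch are conveniences invented for this example.

```shell
# distribute DIR HOST...: copy DIR recursively to the current path on each HOST.
# With DRY_RUN=1 the scp commands are only printed, not executed.
distribute() {
    dir=$1
    shift
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "scp -r $dir $host:$PWD"
        else
            scp -r "$dir" "$host:$PWD"
        fi
    done
}

# Example: distribute spark-2.2.0 node02 node03 node04 node05
```

Running it once with DRY_RUN=1 shows exactly which paths will be written before anything is copied.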

4. Starting and Stopping the Cluster

 

In /opt/bigdata/spark-2.2.0/sbin:

(1) ./start-all.sh starts the cluster

(2) ./stop-all.sh stops the cluster
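
After start-all.sh returns, the JDK's jps tool should list a Master process on node01 and a Worker process on each worker node. The helper below is a small sketch for checking that from a script. has_daemon is a name invented here, and it assumes the usual "pid name" layout of jps output.

```shell
# has_daemon NAME: succeed if the jps output on stdin lists a JVM named NAME.
has_daemon() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Examples (run after start-all.sh):
#   jps | has_daemon Master && echo "master is up"
#   ssh node02 jps | has_daemon Worker && echo "worker on node02 is up"
```

The ssh variant assumes jps is on the PATH of non-interactive shells on the worker, which may not hold on every setup. The master web UI on port 8080 of node01 gives the same information in a browser.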

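5. Submitting Jobs

Note: the client is not part of the cluster and uses none of its resources, so submit jobs from the client. The commands below use relative paths and are run from the bin directory of the Spark installation.

(1) Standalone client mode (suitable for testing)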

nohup ./spark-submit --master spark://node01:7077 --executor-memory 1G  --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
or
nohup ./spark-submit --master spark://node01:7077 --deploy-mode client --executor-memory 1G --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &

(2) Standalone cluster mode (suitable for production)

nohup ./spark-submit --master spark://node01:7077 --deploy-mode cluster --executor-memory 1G --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &

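(3) YARN client mode (suitable for testing)

    A YARN submission hands the job to YARN on the Hadoop cluster, so the Hadoop cluster must be running.

    The Spark standalone cluster is no longer involved, so it can be stopped; just submit from the client machine (the same applies below).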

nohup ./spark-submit --master yarn --executor-memory 1G  --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
or (the yarn-client master value is deprecated since Spark 2.0 but still accepted)
nohup ./spark-submit --master yarn-client --executor-memory 1G  --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
or, equivalently
nohup ./spark-submit --master yarn --deploy-mode client --executor-memory 1G  --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &

(4) YARN cluster mode (suitable for production)

nohup ./spark-submit --master yarn-cluster --executor-memory 1G  --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
or, in the form recommended since Spark 2.0
nohup ./spark-submit --master yarn --deploy-mode cluster --executor-memory 1G  --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
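
Where the result ends up depends on the deploy mode. In client mode the driver runs inside spark-submit, so SparkPi's "Pi is roughly ..." line lands in nohup.out. In YARN cluster mode the driver runs in a YARN container, and nohup.out only carries the application ID, which can then be passed to yarn logs. The helpers below are a sketch for pulling both out of a log file. pi_result and yarn_app_id are names invented here, and nohup.out is assumed to be in the current directory.

```shell
# pi_result FILE: print the SparkPi result line from a client-mode log.
pi_result() {
    grep 'Pi is roughly' "$1"
}

# yarn_app_id FILE: print the first YARN application ID mentioned in FILE.
yarn_app_id() {
    grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' "$1" | head -n 1
}

# Examples:
#   pi_result nohup.out
#   yarn logs -applicationId "$(yarn_app_id nohup.out)" | grep 'Pi is roughly'
```

The yarn logs call only returns output once the application has finished and if log aggregation is enabled on the Hadoop cluster.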

 

 

 
