1. Spark Submit Task Submission

This article is adapted from http://www.louisvv.com/archives/1340.html; thanks to the original author.
Spark source version: 2.1.0


[Figure: How Spark runs on a cluster]

How Spark Runs on a Cluster and Related Concepts

A Spark application runs on the cluster as an independent set of processes. The overall execution flow is as follows:

  1. The user submits the program they wrote (the Driver Program). The Driver initializes a SparkContext object, and the SparkContext is responsible for launching the application on the cluster;

  2. The SparkContext connects to the Cluster Manager, requests resources, and registers the Application;

  3. Once the SparkContext is connected, the Cluster Manager allocates the requested resources and creates and starts Executors on the cluster's Worker nodes;

  4. After starting, each Executor sends its registration information to the Driver, notifying the Driver that the Executor has been added;

  5. During SparkContext initialization, a DAGScheduler is created and started. The DAGScheduler converts the user's program into Tasks, which are dispatched (via the TaskScheduler) to the designated Executors for computation;

  6. The Executors return the Task results to the Driver; once the computation finishes, a series of cleanup steps shuts the Spark job down.
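The Driver Program in step 1 can be just a few lines. A minimal sketch in Scala against the Spark 2.1.0 API (the object name and the job logic are illustrative, not from the original article):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal driver program. Creating the SparkContext triggers steps 1-4 above:
// connecting to the cluster manager, registering the application, and
// launching executors. The reduce action triggers step 5: the DAGScheduler
// turns the job into tasks and sends them to the executors.
object MinimalDriver {
  def main(args: Array[String]): Unit = {
    // The master URL is normally supplied by spark-submit via --master,
    // so it is not hard-coded here.
    val conf = new SparkConf().setAppName("MinimalDriver")
    val sc = new SparkContext(conf)

    // Action: results come back to the driver (step 6).
    val sum = sc.parallelize(1 to 100).reduce(_ + _)
    println(s"sum = $sum")

    sc.stop() // release executors and shut the application down
  }
}
```

Packaged into a jar, this class would be submitted with the spark-submit script described next.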


Submitting a Spark program:

Starting from the first step of a Spark job: how is the user's program submitted?
Spark programs are submitted with the spark-submit script in the $SPARK_HOME/bin directory:

# Run application locally on 8 cores
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  # --deploy-mode can be client for client mode
  --num-executors 50 \
  /path/to/examples.jar \
  1000

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000

# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000
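Most of the flags shown above can also be set programmatically on a SparkConf instead of on the spark-submit command line. A sketch (the values mirror the standalone-cluster example; they are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Programmatic equivalent of --master, --executor-memory and
// --total-executor-cores from the standalone example above.
val conf = new SparkConf()
  .setAppName("SparkPi")
  .setMaster("spark://207.184.161.138:7077")
  .set("spark.executor.memory", "20g")
  .set("spark.cores.max", "100") // corresponds to --total-executor-cores

val sc = new SparkContext(conf)
```

Note that properties set directly on the SparkConf take the highest precedence, overriding both spark-submit flags and values in spark-defaults.conf, so hard-coding them reduces deployment flexibility; flags on spark-submit are usually preferred.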

To be continued...
