Running Spark Programs on Linux

Reference: the official Spark documentation.

spark-submit

The bin directory under the Spark installation contains a spark-submit script, which is used to submit and run Spark applications.

If Spark's bin directory has been added to your PATH, you can invoke the spark-submit command directly.

Building the Spark application

Build the application with sbt or Maven to produce a jar (a minimal sbt sketch is shown below).
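For sbt, a minimal build.sbt might look like the following sketch. The project name, the Scala/Spark versions, and the "provided" scope are assumptions; adjust them to your environment.

// build.sbt -- minimal sketch; name and versions are assumptions
name := "test"
version := "1.0"
scalaVersion := "2.11.8"

// Spark itself is supplied by the cluster at runtime, so mark it "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"

Running sbt package then produces a jar under target/scala-2.11/ that can be passed to spark-submit.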

Using spark-submit

spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

--class: the entry-point class inside the jar to run, e.g. test.spark.examples

--master: the master URL, e.g. spark://23.195.26.187:7077

--deploy-mode: the deploy mode (client or cluster)

--conf: runtime configuration, given as "key=value" pairs

application-jar: path to the jar to run; it may start with hdfs:// or file://, e.g. /root/program/spark/test.jar

application-arguments: arguments passed to the main method of the main class; omit if there are none

Examples

# Run locally on 8 cores, passing 100 as the argument
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \  # can be client for client mode
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000

# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000

A complete worked example:

Program:

(screenshot of the program's source code)
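The source is only available as a screenshot, so it is not reproduced here. As a rough sketch, a program matching the class name used in the command below and the output further down (a DataFrame read from a JSON file and printed with show()) could look like this; the input path is an assumption.

import org.apache.spark.sql.SparkSession

object SparkSQLExample {
  def main(args: Array[String]): Unit = {
    // The master is supplied by spark-submit, so it is not set here
    val spark = SparkSession.builder()
      .appName("SparkSQLExample")
      .getOrCreate()

    // Read a JSON file of people; lines that fail to parse end up in _corrupt_record.
    // The input path is an assumption for illustration only.
    val df = spark.read.json("/root/worspace/people.json")
    df.show()

    spark.stop()
  }
}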

Jar path:

/root/worspace/test-1.0.jar

Command:

spark-submit --class SparkSQLExample --master local /root/worspace/test-1.0.jar

Result:

Part of the output:

17/10/09 17:58:20 INFO DAGScheduler: ResultStage 9 (show at SparkSQLExample.scala:104) finished in 0.027 s
17/10/09 17:58:20 INFO DAGScheduler: Job 7 finished: show at SparkSQLExample.scala:104, took 0.044894 s
+--------------------+----+-------+
|     _corrupt_record| age|   name|
+--------------------+----+-------+
|                null|null|Michael|
|                null|  30|   Andy|
|                null|  19| Justin|
|spark-submit --cl...|null|   null|
|                 100|null|   null|
+--------------------+----+-------+
