Two ways to run Spark jobs: spark-submit and spark-shell

1. The spark-submit approach: upload the jar to the cluster, then go to the bin/ directory and submit the Spark job with spark-submit.
Format:

spark-submit --master <Spark master URL> --class <fully qualified main class> <jar path> <arguments>

Example: run the SparkPi test program that ships with Spark to estimate the value of pi.

./spark-submit --master spark://node3:7077 --class org.apache.spark.examples.SparkPi /usr/local/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar 500

Result:

Pi is roughly 3.1414508628290174
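
Besides --master and --class, spark-submit also takes resource options such as --executor-memory and --total-executor-cores. A sketch of the same SparkPi run with explicit resources (the 512m / 2 values are illustrative placeholders, not recommendations):

./spark-submit --master spark://node3:7077 \
  --class org.apache.spark.examples.SparkPi \
  --executor-memory 512m \
  --total-executor-cores 2 \
  /usr/local/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar 500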

2. The spark-shell approach: a REPL (interactive command-line) tool, which is itself a Spark application.
2.1 Local mode: runs locally without connecting to a Spark cluster; used for testing.

Launch command: running bin/spark-shell with no arguments starts local mode:

[root@bigdata111 bin]# ./spark-shell 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/06/18 17:52:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/06/18 17:52:27 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
19/06/18 17:52:27 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
19/06/18 17:52:29 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.226.111:4040
Spark context available as 'sc' (master = local[*], app id = local-1560851538355).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 
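
Once the shell is up, the prebuilt sc can be used right away. A minimal sanity check (a sketch; the sum of 1..100 should come back as 5050.0):

scala> val rdd = sc.parallelize(1 to 100)   // distribute a local collection as an RDD
scala> rdd.sum                              // sum of 1..100
res0: Double = 5050.0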

2.2 Cluster mode
Launch command: bin/spark-shell --master spark://.....

After startup:

[root@bigdata111 spark-2.1.0-bin-hadoop2.7]# ./bin/spark-shell --master spark://bigdata111:7077
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/06/18 22:47:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/06/18 22:48:07 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.226.111:4040
Spark context available as 'sc' (master = spark://bigdata111:7077, app id = app-20190618224755-0000).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 

Notes:
Spark context available as 'sc' (master = spark://bigdata111:7077, app id = app-20190618224755-0000).
Spark session available as 'spark'.

Spark session: introduced in Spark 2.0; through the session you can access all Spark components (Core, SQL, and so on).

The 'spark' and 'sc' objects are created for you and can be used directly.
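
For instance, a minimal sketch using both handles (nothing assumed beyond what spark-shell itself provides):

scala> spark.range(1, 6).show()   // build and print a small Dataset through the session
scala> spark.sparkContext eq sc   // true: the session wraps the same SparkContext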

Example: develop a WordCount program in the Spark shell.
(*) Read a local file and print the result to the screen.
Note: this example assumes a single worker, with the local file on the same server as that worker; in general, a local path passed to textFile() must be readable at that same path on every executor node.

scala> sc.textFile("/usr/local/tmp_files/test_WordCount.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect

Result:

res0: Array[(String, Int)] = Array((is,1), (love,2), (capital,1), (Beijing,2), (China,2), (hehehehehe,1), (I,2), (of,1), (the,1))
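
The one-liner above chains four transformations; the same logic split into named steps is easier to follow in the shell:

scala> val lines  = sc.textFile("/usr/local/tmp_files/test_WordCount.txt")  // one RDD element per line
scala> val words  = lines.flatMap(_.split(" "))                             // split each line into words
scala> val pairs  = words.map((_, 1))                                       // pair every word with a count of 1
scala> val counts = pairs.reduceByKey(_ + _)                                // sum the counts per word
scala> counts.collect                                                       // bring the result to the driver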

(*) Read a file from HDFS, run WordCount on it, and write the result back to HDFS.

scala> sc.textFile("hdfs://bigdata111:9000/word.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).saveAsTextFile("hdfs://bigdata111:9000/result")

Note: the path passed to textFile() here is an HDFS path.
When the Spark job completes, the results are stored in the result directory on HDFS:


Inspect the output:

[root@bigdata111 opt]# hdfs dfs -ls /result/
Found 3 items
-rw-r--r--   3 root supergroup          0 2019-06-18 23:02 /result/_SUCCESS
-rw-r--r--   3 root supergroup         73 2019-06-18 23:02 /result/part-00000
-rw-r--r--   3 root supergroup         22 2019-06-18 23:02 /result/part-00001
[root@bigdata111 opt]# hdfs dfs -cat /result/*
(shuai,1)
(are,1)
(b,1)
(best,1)
(zouzou,1)
(word,1)
(hello,1)
(world,1)
(you,1)
(a,1)
(the,1)
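
The two part-* files mirror the RDD's two partitions. A variation of the same job (result_sorted is a hypothetical output path; saveAsTextFile fails if the target directory already exists): sort by count first, then coalesce to one partition so everything lands in a single part file:

scala> sc.textFile("hdfs://bigdata111:9000/word.txt").
     |   flatMap(_.split(" ")).
     |   map((_, 1)).
     |   reduceByKey(_ + _).
     |   sortBy(_._2, ascending = false).   // highest counts first
     |   coalesce(1).                       // one partition => a single part file
     |   saveAsTextFile("hdfs://bigdata111:9000/result_sorted")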
