Official docs:
http://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/launcher/package-summary.html
Following that example I wrote a launcher of my own, which lets me run a Spark-based business program from the java command line.
Today I searched around again and found an article; below is the original text as posted online:
Sometimes we need to start our Spark application from another Scala/Java application, and for that we can use SparkLauncher. Here we have an example in which we build a Spark application and then run it from another Scala application.
Let's look at the Spark application code first.
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SparkApp extends App {
  // local[*] runs Spark locally with as many worker threads as there are cores
  val conf = new SparkConf().setMaster("local[*]").setAppName("spark-app")
  val sc = new SparkContext(conf)
  val rdd = sc.parallelize(Array(2, 3, 2, 1))
  // write the RDD out to the "result" directory as text files
  rdd.saveAsTextFile("result")
  sc.stop()
}
This is our simple Spark application. We build a jar of it with sbt assembly and then write a Scala application through which we start this Spark application.
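The article does not show its build files, so the following is only a sketch of what the sbt-assembly setup might look like; the plugin version, Scala version and project name are assumptions to adjust to your own environment.

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")

// build.sbt
name := "spark_launcher"
version := "1.0"
scalaVersion := "2.10.4"
// "provided" keeps Spark classes out of the assembled jar; spark-submit supplies them at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0" % "provided"

Running sbt assembly then produces a jar such as spark_launcher-assembly-1.0.jar, which is what the launcher application below points at: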
import org.apache.spark.launcher.SparkLauncher

object Launcher extends App {
  val spark = new SparkLauncher()
    // Spark home is used internally to locate and call spark-submit
    .setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6")
    // the assembled jar of our Spark application
    .setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar")
    // the driver class inside that jar
    .setMainClass("SparkApp")
    .setMaster("local[*]")
    .launch()

  // launch() returns a java.lang.Process; wait for the spark-submit child process to finish
  spark.waitFor()
}
In the code above we create a SparkLauncher and set values on it:
.setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6") sets the Spark home, which is used internally to call spark-submit.
.setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar") specifies the jar of our Spark application.
.setMainClass("SparkApp") sets the entry point of the Spark program, i.e. the driver program.
.setMaster("local[*]") sets the master address the application starts against; here we run it on the local machine.
.launch() simply starts our Spark application.
That is the minimal requirement; you can also set many other configurations, such as passing arguments, adding jars, or setting Spark properties, as sketched below.
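As a rough sketch of what those extra settings can look like (this is my own illustration, not part of the article; the argument values, the extra jar path and the memory figure are made-up placeholders):

import org.apache.spark.launcher.SparkLauncher

object LauncherWithExtras extends App {
  val process = new SparkLauncher()
    .setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar")
    .setMainClass("SparkApp")
    .setMaster("local[*]")
    // pass command-line arguments through to SparkApp's main method
    .addAppArgs("input-path", "output-path")
    // ship an additional jar with the application
    .addJar("/home/knoldus/extra-lib.jar")
    // set any Spark property; EXECUTOR_MEMORY is a constant for "spark.executor.memory"
    .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
    .launch()
  process.waitFor()
}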
For the source code you can check out the following git repo:
Spark_laucher is our Spark application.
launcher_app is our Scala application that starts the Spark application.
Change the paths according to your setup, build the jar of Spark_laucher, then run launcher_app; you will find the saved RDD in the result directory, since the Spark application simply saves it as a text file.
https://github.com/phalodi/Spark-launcher
Roughly, the idea is this: you write something like myspark.jar, developed following the normal Spark workflow.
Then you write a launcher; my understanding is that it is a program similar to spark-class that starts or invokes the myspark.jar you wrote above.
The key calls are setAppResource, setMainClass and setMaster, which set your myspark.jar, the class inside that jar to run, and the run mode respectively. In my testing setMaster only seemed to work with "yarn-client". (That is just what I observed in my experiments; I suspect it is because my code has some interactive parts, so only yarn-client works for me. In theory yarn-cluster mode should be supported as well; it may simply be a problem with my program.)
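As an illustration of the yarn-client point, a minimal launcher configured for that mode might look like the following; this is my own sketch, not from the article, and the Hadoop configuration directory passed through the environment is an assumption that has to match your cluster:

import scala.collection.JavaConverters._
import org.apache.spark.launcher.SparkLauncher

object YarnClientLauncher extends App {
  // environment for the spawned spark-submit process; the path is a placeholder
  val env = Map("HADOOP_CONF_DIR" -> "/etc/hadoop/conf").asJava
  val process = new SparkLauncher(env)
    .setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar")
    .setMainClass("SparkApp")
    .setMaster("yarn-client")   // yarn-cluster should work in principle as well
    .launch()
  process.waitFor()
}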
How to run it with plain java:
With java -jar spark-launcher.jar you no longer need a script (the spark-submit style of invocation) to run your myspark.jar. (Run this way you do not see the output printed to the screen; the official site actually has examples of this, including one that prints the output to the screen, which I think used an output stream, but I forget the details...)
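I do not remember the exact official example, but a rough sketch of reading the child process output from the launcher itself could look like this (my own illustration; a real application would read stdout and stderr on separate threads to avoid blocking):

import java.io.{BufferedReader, InputStreamReader}
import org.apache.spark.launcher.SparkLauncher

object LauncherWithOutput extends App {
  val process = new SparkLauncher()
    .setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar")
    .setMainClass("SparkApp")
    .setMaster("local[*]")
    .launch()

  // spark-submit writes most of its log output to stderr; echo it to our own console
  val reader = new BufferedReader(new InputStreamReader(process.getErrorStream))
  var line = reader.readLine()
  while (line != null) {
    println(line)
    line = reader.readLine()
  }
  process.waitFor()
}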
Why use this approach: because the myspark business program needs to run together with a web container, and if it can run in a plain Java environment, then Spark can run inside the web container. (Not tested yet...)