Spark Study Notes: Spark Commands and Program Entry Points

Spark ships with several different scripts for submitting work. The following article covers them clearly and concisely:
http://blog.csdn.net/lovehuangjiaju/article/details/48768371

Looking at the scripts, spark-shell, spark-sql and the other entry scripts are all implemented on top of the spark-submit script, which in turn delegates to the spark-class script. spark-class finally executes org.apache.spark.launcher.Main, the entry point of the whole launch sequence.
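For example, in Spark 1.x the tail of bin/spark-submit is essentially a single line that hands everything over to spark-class:

exec "$SPARK_HOME"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"

and spark-class then runs org.apache.spark.launcher.Main to assemble the final java command.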

org.apache.spark.launcher.Main checks which class it has been asked to launch and picks a command builder accordingly:
try {
  if (className.equals("org.apache.spark.deploy.SparkSubmit")) {
    builder = new SparkSubmitCommandBuilder(args);  // launching org.apache.spark.deploy.SparkSubmit (the case we follow here)
  } else {
    builder = new SparkClassCommandBuilder(className, args);  // any other class (master, worker, history server, ...)
  }
  printLaunchCommand = !isEmpty(System.getenv("SPARK_PRINT_LAUNCH_COMMAND"));
  printUsage = false;
} catch (IllegalArgumentException e) {
  builder = new UsageCommandBuilder(e.getMessage());
  printLaunchCommand = false;
  printUsage = true;
}
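To see what the builder actually produces, set the environment variable checked above before running any Spark script, e.g.:

SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/spark-shell

The fully assembled java command line is then printed (prefixed with "Spark Command:") before it is executed.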
For a spark-submit invocation, this takes us into org.apache.spark.deploy.SparkSubmit.
Its entry function:
def main(args: Array[String]): Unit = {
  val appArgs = new SparkSubmitArguments(args)
  if (appArgs.verbose) {
    printStream.println(appArgs)
  }
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs)  // the usual entry path
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
  }
}
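appArgs.action defaults to SUBMIT; it becomes KILL or REQUEST_STATUS when spark-submit is invoked with --kill or --status (supported for applications submitted to a standalone or Mesos cluster). For example, with an illustrative submission id:

./bin/spark-submit --master spark://host:7077 --kill driver-20150101000000-0000
./bin/spark-submit --master spark://host:7077 --status driver-20150101000000-0000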
The core of submit: prepareSubmitEnvironment first computes the child arguments, classpath, system properties, and the child main class to run, then doRunMain launches that class (as a proxy user if one was given):
/**
 * Submit the application using the provided parameters.
 *
 * This runs in two steps. First, we prepare the launch environment by setting up
 * the appropriate classpath, system properties, and application arguments for
 * running the child main class based on the cluster manager and the deploy mode.
 * Second, we use this launch environment to invoke the main method of the child
 * main class.
 */
private def submit(args: SparkSubmitArguments): Unit = {
  val (childArgs, childClasspath, sysProps, childMainClass) = prepareSubmitEnvironment(args)

  def doRunMain(): Unit = {
    if (args.proxyUser != null) {
      val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
        UserGroupInformation.getCurrentUser())
      try {
        proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
          override def run(): Unit = {
            runMain(childArgs, childClasspath, sysProps, childMainClass, args.verbose)  // entry into runMain
          }
        })
      } catch {
        case e: Exception =>
          // Hadoop's AuthorizationException suppresses the exception's stack trace, which
          // makes the message printed to the output by the JVM not very helpful. Instead,
          // detect exceptions with empty stack traces here, and treat them differently.
          if (e.getStackTrace().length == 0) {
            printStream.println(s"ERROR: ${e.getClass().getName()}: ${e.getMessage()}")
            exitFn()
          } else {
            throw e
          }
      }
    } else {
      runMain(childArgs, childClasspath, sysProps, childMainClass, args.verbose)
    }
  }
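The excerpt stops at the end of doRunMain; in the real source, submit then decides how to invoke it. Condensed from the remainder of submit in Spark 1.x (lightly abridged, not verbatim):

  // In standalone cluster mode, try the REST submission gateway introduced in
  // Spark 1.3 first, and fall back to the legacy gateway if the master does
  // not support it. In all other modes, just run the prepared main class.
  if (args.isStandaloneCluster && args.useRest) {
    try {
      printStream.println("Running Spark using the REST application submission protocol.")
      doRunMain()
    } catch {
      case e: SubmitRestConnectionException =>
        printWarning(s"Master endpoint ${args.master} was not a REST server. " +
          "Falling back to legacy submission gateway instead.")
        args.useRest = false
        submit(args)
    }
  } else {
    doRunMain()
  }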



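Either way, everything funnels into runMain, which is where the child main class actually starts. A condensed sketch of its core logic (Spark 1.x; child-first classloading and error handling are omitted, so treat the details as approximate rather than verbatim):

private def runMain(
    childArgs: Seq[String],
    childClasspath: Seq[String],
    sysProps: Map[String, String],
    childMainClass: String,
    verbose: Boolean): Unit = {
  // Install the child classpath on a fresh classloader and make it current.
  val loader = new MutableURLClassLoader(new Array[URL](0),
    Thread.currentThread.getContextClassLoader)
  Thread.currentThread.setContextClassLoader(loader)
  childClasspath.foreach { jar => loader.addURL(new File(jar).toURI.toURL) }

  // Propagate the prepared configuration as JVM system properties.
  sysProps.foreach { case (k, v) => System.setProperty(k, v) }

  // Load the child main class and reflectively invoke its main(Array[String]).
  val mainClass = Class.forName(childMainClass, true, loader)
  val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
  mainMethod.invoke(null, childArgs.toArray)
}

In client mode childMainClass is simply the class given with --class, so the user's driver runs inside this very JVM; in yarn-cluster mode it is org.apache.spark.deploy.yarn.Client, which in turn submits the user class to the cluster.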