Spark on YARN submit-mode problem encountered while deploying Zeppelin

While deploying Zeppelin I ran into a problem related to the Spark application submit mode.
The stack trace is as follows:

org.apache.spark.SparkException: Detected yarn-cluster mode, but isn't running on a cluster. 
Deployment to YARN is not supported directly by SparkContext. Please use spark-submit.
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:411)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:339)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:465)

Judging from the message, the application is being submitted in yarn-cluster mode, which is not supported when the SparkContext is created in-process.
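
To make the difference concrete, here is a minimal Java sketch (standard Spark 1.x API; the application name "demo" is made up): creating a SparkContext in-process with the master set to yarn-cluster triggers exactly this exception, while yarn-client works because the driver stays in the current JVM.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class MasterModeSketch {
      public static void main(String[] args) {
        // yarn-cluster: SparkContext refuses to start, because the driver itself would have
        // to be shipped into a YARN container -- only spark-submit can arrange that.
        // new JavaSparkContext(new SparkConf().setMaster("yarn-cluster").setAppName("demo"));
        //   -> SparkException: Detected yarn-cluster mode, but isn't running on a cluster...

        // yarn-client: the driver stays in this JVM and only the executors run on YARN,
        // so creating the context in-process (as Zeppelin does) is fine.
        // Requires HADOOP_CONF_DIR / YARN_CONF_DIR to point at the cluster configuration.
        JavaSparkContext sc = new JavaSparkContext(
            new SparkConf().setMaster("yarn-client").setAppName("demo"));
        sc.stop();
      }
    }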

Tracing into the createSparkContext() method of SparkInterpreter.java:

    try { // in case of Spark 1.1.x / 1.2.x
      // interpreter is an instance of a subclass of org.apache.spark.repl.SparkILoop
      Method classServer = interpreter.intp().getClass().getMethod("classServer");
      // the HttpServer that serves the classes compiled by the interactive REPL
      HttpServer httpServer = (HttpServer) classServer.invoke(interpreter.intp());
      classServerUri = httpServer.uri();
    } catch (NoSuchMethodException | SecurityException | IllegalAccessException
        | IllegalArgumentException | InvocationTargetException e) {
      // continue
    }

    if (classServerUri == null) {
      try { // for Spark 1.3.x
        Method classServer = interpreter.intp().getClass().getMethod("classServerUri");
        classServerUri = (String) classServer.invoke(interpreter.intp());
      } catch (NoSuchMethodException | SecurityException | IllegalAccessException
          | IllegalArgumentException | InvocationTargetException e) {
        // continue instead of: throw new InterpreterException(e);
        // Newer Spark versions (like the patched CDH5.7.0 one) don't contain this method
        logger.warn(String.format("Spark method classServerUri not available due to: [%s]",
          e.getMessage()));
      }
    }
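
For context, the URI recovered by the reflection above is wired into the SparkConf so that executors can download the classes compiled by the REPL. The snippet below is a simplified sketch of that idea, not the verbatim Zeppelin code; spark.repl.class.uri is the Spark 1.x property involved, and the master string (e.g. "yarn-client") is whatever the interpreter is configured with.

    import org.apache.spark.SparkConf;

    public class ReplClassServerConfSketch {
      // Simplified sketch: pass the REPL class-server URI to Spark so that executors
      // can fetch classes compiled interactively in the notebook.
      static SparkConf buildConf(String master, String classServerUri) {
        SparkConf conf = new SparkConf().setAppName("Zeppelin").setMaster(master); // e.g. "yarn-client"
        if (classServerUri != null) {
          conf.set("spark.repl.class.uri", classServerUri); // Spark 1.x REPL class-server property
        }
        return conf;
      }
    }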

A SparkContext created this way, embedded in the Zeppelin JVM, can only run against YARN in yarn-client mode.
A Stack Overflow answer gives a more detailed explanation of the submit modes for Spark embedded in a web app:

The reason yarn-cluster mode isn't supported is that yarn-cluster means bootstrapping 
the driver program itself (i.e. the program that creates and uses a SparkContext) onto a YARN container.
Guessing from your statement about submitting from a django web app, 
it sounds like you want the python code that contains the SparkContext to be embedded in the web app itself, 
rather than shipping the driver code onto a YARN container which then handles a separate spark job.

This means your case most closely fits with yarn-client mode instead of yarn-cluster; 
in yarn-client mode, you can run your SparkContext code anywhere (like inside your web app), 
while it talks to YARN for the actual mechanics of running jobs.

Fundamentally, if you're sharing any in-memory state between your web app and your Spark code,
that means you won't be able to chop off the Spark portion to run inside a YARN container, 
which is what yarn-cluster tries to do. If you're not sharing state, 
then you can simply invoke a subprocess which actually does call spark-submit to bundle an independent 
PySpark job to run in yarn-cluster mode.

To summarize:

- If you want to embed your Spark code directly in your web app, you need to use yarn-client mode instead: SparkConf().setMaster("yarn-client").
- If the Spark code is loosely coupled enough that yarn-cluster is actually viable, you can issue a Python subprocess to actually invoke spark-submit in yarn-cluster mode (see the sketch after this list).
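
For the second case, a sketch of the subprocess approach in Java (to stay consistent with the code above) might look like the following; the spark-submit path, main class, jar and log file are placeholders, not values from the original setup. On Spark 1.x the master string is yarn-cluster, while newer versions use --master yarn --deploy-mode cluster.

    import java.io.File;
    import java.io.IOException;

    public class SubmitFromSubprocess {
      public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholder paths -- adjust to the actual cluster layout.
        ProcessBuilder pb = new ProcessBuilder(
            "/opt/spark/bin/spark-submit",
            "--master", "yarn-cluster",          // Spark 1.x syntax; later: --master yarn --deploy-mode cluster
            "--class", "com.example.MyBatchJob", // hypothetical driver main class
            "/opt/jobs/my-batch-job.jar");
        pb.redirectErrorStream(true);            // merge stderr into stdout
        pb.redirectOutput(new File("/tmp/spark-submit.log"));
        int exitCode = pb.start().waitFor();
        System.out.println("spark-submit exited with " + exitCode);
      }
    }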
