Spark Learning 55 - Source Code: Creation of the SparkSession

1. First, we create a SparkSession in our own program:

  SparkSession spark = SparkSession.builder()
                    .appName("lcc_java_habase_local")
                    .master("local[4]")
                    .getOrCreate();

2. Let's see what this call actually does:

  /**
   * Creates a [[SparkSession.Builder]] for constructing a [[SparkSession]].
   *
   * @since 2.0.0
   */
  def builder(): Builder = new Builder

3. Now look at the Builder class. It is an inner class (class Builder extends Logging) defined inside the SparkSession companion object (object SparkSession).

It creates a container for external rules and extension points:

    // Spark's external extension points (analyzer rules, check-analysis rules,
    // optimizer rules, planning strategies, a custom parser, (external) catalog listeners)
    private[this] val extensions = new SparkSessionExtensions
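
These extension points are filled in either through the builder's withExtensions method (in recent versions) or, as shown later in getOrCreate(), through a configurator class named by the spark.sql.extensions static config. Below is a minimal sketch of such a configurator in Scala; the class name MyExtensions and the do-nothing rule are hypothetical:

    import org.apache.spark.sql.SparkSessionExtensions
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.catalyst.rules.Rule

    // Hypothetical configurator: a function from SparkSessionExtensions to Unit.
    // getOrCreate() instantiates it reflectively when spark.sql.extensions is set.
    class MyExtensions extends (SparkSessionExtensions => Unit) {
      override def apply(extensions: SparkSessionExtensions): Unit = {
        // Inject a do-nothing optimizer rule, purely to show the injection API.
        extensions.injectOptimizerRule { session =>
          new Rule[LogicalPlan] {
            override def apply(plan: LogicalPlan): LogicalPlan = plan
          }
        }
      }
    }

It would then be activated with something like spark.sql.extensions=com.example.MyExtensions (a hypothetical fully qualified class name).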

Key method 1: enabling Hive support

    /**
     * Enables Hive support, including connectivity to a persistent Hive metastore, support for
     * Hive serdes, and Hive user-defined functions.
     *
     * This is where Hive support is switched on for the session being built.
     *
     * @since 2.0.0
     */
    def enableHiveSupport(): Builder = synchronized {
      if (hiveClassesArePresent) {
        config(CATALOG_IMPLEMENTATION.key, "hive")
      } else {
        throw new IllegalArgumentException(
          "Unable to instantiate SparkSession with Hive support because " +
            "Hive classes are not found.")
      }
    }
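
For reference, a minimal usage sketch (the app name here is an arbitrary example). This requires the spark-hive module on the classpath, otherwise the IllegalArgumentException above is thrown:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hive-enabled-session")   // arbitrary example name
      .master("local[4]")
      .enableHiveSupport()               // records spark.sql.catalogImplementation=hive
      .getOrCreate()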

Key method 2: getting or creating the SparkSession

    /**
     * Gets an existing [[SparkSession]] or, if there is no existing one, creates a new
     * one based on the options set in this builder.
     *
     * This method first checks whether there is a valid thread-local SparkSession,
     * and if yes, return that one. It then checks whether there is a valid global
     * default SparkSession, and if yes, return that one. If no valid global default
     * SparkSession exists, the method creates a new SparkSession and assigns the
     * newly created SparkSession as the global default.
     *
     * In case an existing SparkSession is returned, the config options specified in
     * this builder will be applied to the existing SparkSession.
     *
     * @since 2.0.0
     */
    def getOrCreate(): SparkSession = synchronized {
      // Get the session from the current thread's active (thread-local) session.
      var session = activeThreadSession.get()
      // If that session exists and its SparkContext has not been stopped,
      // apply the builder options to it and return it directly.
      if ((session ne null) && !session.sparkContext.isStopped) {
        options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
        if (options.nonEmpty) {
          logWarning("Using an existing SparkSession; some configuration may not take effect.")
        }
        return session
      }

      // Global synchronization so we will only set the default session once.
      SparkSession.synchronized {
        // If the current thread does not have an active session, get it from the global session.
        session = defaultSession.get()
        if ((session ne null) && !session.sparkContext.isStopped) {
          options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
          if (options.nonEmpty) {
            logWarning("Using an existing SparkSession; some configuration may not take effect.")
          }
          return session
        }

        // No active nor global default session. Create a new one.
        val sparkContext = userSuppliedContext.getOrElse {
          // set app name if not given
          val randomAppName = java.util.UUID.randomUUID().toString
          // Initialize the Spark configuration and apply the builder options to it.
          val sparkConf = new SparkConf()
          options.foreach { case (k, v) => sparkConf.set(k, v) }
          // If no application name was specified, fall back to the random one.
          if (!sparkConf.contains("spark.app.name")) {
            sparkConf.setAppName(randomAppName)
          }

          // Create (or reuse) the SparkContext via its companion object.
          val sc = SparkContext.getOrCreate(sparkConf)

          // This may be an existing SparkContext; update its SparkConf, which may be
          // used by the SparkSession.
          options.foreach { case (k, v) => sc.conf.set(k, v) }
          if (!sc.conf.contains("spark.app.name")) {
            sc.conf.setAppName(randomAppName)
          }
          sc
        }

        // Initialize extensions if the user has defined a configurator class
        // (via the spark.sql.extensions static config).
        val extensionConfOption = sparkContext.conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)
        if (extensionConfOption.isDefined) {
          val extensionConfClassName = extensionConfOption.get
          try {
            val extensionConfClass = Utils.classForName(extensionConfClassName)
            val extensionConf = extensionConfClass.newInstance()
              .asInstanceOf[SparkSessionExtensions => Unit]
            extensionConf(extensions)
          } catch {
            // Ignore the error if we cannot find the class or when the class has the wrong type.
            case e @ (_: ClassCastException |
                      _: ClassNotFoundException |
                      _: NoClassDefFoundError) =>
              logWarning(s"Cannot use $extensionConfClassName to configure session extensions.", e)
          }
        }

        session = new SparkSession(sparkContext, None, None, extensions)
        options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
        defaultSession.set(session)

        // Register a successfully instantiated context to the singleton. This should be at the
        // end of the class definition so that the singleton is updated only if there is no
        // exception in the construction of the instance.
        sparkContext.addSparkListener(new SparkListener {
          override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
            defaultSession.set(null)
            sqlListener.set(null)
          }
        })
      }

      return session
    }
  }
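
To summarize the lookup order (thread-local active session, then global default, then a new session), here is a small sketch of what two consecutive getOrCreate() calls do; the SQL conf key used is just an illustrative example:

    import org.apache.spark.sql.SparkSession

    // No active or default session exists yet, so a new one is created and
    // registered as the global default.
    val first = SparkSession.builder()
      .master("local[2]")
      .appName("first")
      .getOrCreate()

    // The second builder call finds the existing default session and reuses it.
    // Its options are only applied as SQL conf entries on that session, which is
    // why the "Using an existing SparkSession" warning is logged.
    val second = SparkSession.builder()
      .config("spark.sql.shuffle.partitions", "8")
      .getOrCreate()

    assert(first eq second)
    assert(second.conf.get("spark.sql.shuffle.partitions") == "8")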
