Spark: format

The default file format that Spark SQL writes.
For a Hive table, the default is resolved here, in Spark's HiveSerDe.getDefaultStorage:
  def getDefaultStorage(conf: SQLConf): CatalogStorageFormat = {
    // To respect hive-site.xml, it peeks Hadoop configuration from existing Spark session,
    // as an easy workaround. See SPARK-27555.
    val defaultFormatKey = "hive.default.fileformat"
    val defaultValue = {
      val defaultFormatValue = "textfile"
      SparkSession.getActiveSession.map { session =>
        session.sessionState.newHadoopConf().get(defaultFormatKey, defaultFormatValue)
      }.getOrElse(defaultFormatValue)
    }
    val defaultStorageType = conf.getConfString("hive.default.fileformat", defaultValue)
    val defaultHiveSerde = sourceToSerDe(defaultStorageType)
    CatalogStorageFormat.empty.copy(
      inputFormat = defaultHiveSerde.flatMap(_.inputFormat)
        .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")),
      outputFormat = defaultHiveSerde.flatMap(_.outputFormat)
        .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")),
      serde = defaultHiveSerde.flatMap(_.serde)
        .orElse(Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe")))
  }
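The lookup order above means the session SQL conf wins over hive-site.xml, and "textfile" is the last resort. Below is a minimal sketch of that behavior, assuming a local session with Hive support (the table name t_demo is made up for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()

// conf.getConfString("hive.default.fileformat", ...) reads the session's
// SQL conf first, so a runtime SET overrides the hive-site.xml value.
spark.sql("SET hive.default.fileformat=orc")
spark.sql("CREATE TABLE t_demo (id INT)")

// Without the SET, DESCRIBE FORMATTED would show the text fallback
// (TextInputFormat + LazySimpleSerDe); with it, the table is ORC.
spark.sql("DESCRIBE FORMATTED t_demo").show(100, truncate = false)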
Hive configuration can also be passed through Spark by adding the prefix "spark."; SparkHadoopUtil strips the prefix and copies the entry into the Hadoop configuration:
  def appendSparkHiveConfigs(
      srcMap: Map[String, String],
      destMap: HashMap[String, String]): Unit = {
    // Copy any "spark.hive.foo=bar" system properties into destMap as "hive.foo=bar"
    for ((key, value) <- srcMap if key.startsWith("spark.hive.")) {
      destMap.put(key.substring("spark.".length), value)
    }
  }
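For example, a sketch assuming a local session (hive.exec.dynamic.partition stands in for any Hive property): setting spark.hive.exec.dynamic.partition on the Spark side should surface as hive.exec.dynamic.partition in the Hadoop conf.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  // "spark.hive.exec.dynamic.partition" on the Spark side ...
  .config("spark.hive.exec.dynamic.partition", "true")
  .getOrCreate()

// ... is copied into the Hadoop configuration with the "spark." prefix
// stripped, as appendSparkHiveConfigs above does.
val hadoopConf = spark.sessionState.newHadoopConf()
println(hadoopConf.get("hive.exec.dynamic.partition")) // expected: true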
