Spark-Avro Learning 5: Specifying the name and namespace when writing Avro files with AvroReadSpecifyName

For more Spark learning example code, see: https://github.com/xubo245/SparkLearning


1. Specifying the record name and namespace when writing Avro


2. Code:

/**
 * @author xubo
 * @time 20160502
 * ref https://github.com/databricks/spark-avro
 */
package org.apache.spark.avro.learning

import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import java.text.SimpleDateFormat
import java.util.Date
import com.databricks.spark.avro._

/**
 * specify the record name and namespace
 */
object AvroReadSpecifyName {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AvroReadSpecifyName").setMaster("local")
    val sc = new SparkContext(conf)
    // The com.databricks.spark.avro._ import above adds the .avro method
    // to DataFrameReader and DataFrameWriter.
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df = sqlContext.read.avro("file/data/avro/input/episodes.avro")
    df.show

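    // recordName and recordNamespace set the name and namespace of the
    // top-level record in the schema of the written Avro files.
    val name = "AvroTest"
    val namespace = "com.databricks.spark.avro"
    val parameters = Map("recordName" -> name, "recordNamespace" -> namespace)

    // A timestamp suffix gives each run its own output directory.
    val iString = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
    df.write.options(parameters).avro("file/data/avro/output/episodes/AvroReadSpecifyName" + iString)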

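    // Read the output back through the generic format/load API ...
    val dfread = sqlContext.read
      .format("com.databricks.spark.avro")
      .load("file/data/avro/output/episodes/AvroReadSpecifyName" + iString)
    dfread.show

    // ... and again through the .avro shortcut; both should show the same rows.
    val dfread2 = sqlContext.read.avro("file/data/avro/output/episodes/AvroReadSpecifyName" + iString)
    dfread2.show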
  }
}


3. Results:

+--------------------+----------------+------+
|               title|        air_date|doctor|
+--------------------+----------------+------+
|   The Eleventh Hour|    3 April 2010|    11|
|   The Doctor's Wife|     14 May 2011|    11|
| Horror of Fang Rock|3 September 1977|     4|
|  An Unearthly Child|23 November 1963|     1|
|The Mysterious Pl...|6 September 1986|     6|
|                Rose|   26 March 2005|     9|
|The Power of the ...| 5 November 1966|     2|
|          Castrolava|  4 January 1982|     5|
+--------------------+----------------+------+

+--------------------+----------------+------+
|               title|        air_date|doctor|
+--------------------+----------------+------+
|   The Eleventh Hour|    3 April 2010|    11|
|   The Doctor's Wife|     14 May 2011|    11|
| Horror of Fang Rock|3 September 1977|     4|
|  An Unearthly Child|23 November 1963|     1|
|The Mysterious Pl...|6 September 1986|     6|
|                Rose|   26 March 2005|     9|
|The Power of the ...| 5 November 1966|     2|
|          Castrolava|  4 January 1982|     5|
+--------------------+----------------+------+

+--------------------+----------------+------+
|               title|        air_date|doctor|
+--------------------+----------------+------+
|   The Eleventh Hour|    3 April 2010|    11|
|   The Doctor's Wife|     14 May 2011|    11|
| Horror of Fang Rock|3 September 1977|     4|
|  An Unearthly Child|23 November 1963|     1|
|The Mysterious Pl...|6 September 1986|     6|
|                Rose|   26 March 2005|     9|
|The Power of the ...| 5 November 1966|     2|
|          Castrolava|  4 January 1982|     5|
+--------------------+----------------+------+

4. File contents:

Objavro.codecsnappyavro.schema�{"type":"record","name":"AvroTest","namespace":"com.databricks.spark.avro","fields":[{"name":"title","type":["string","null"]},{"name":"air_date","type":["string","null"]},{"name":"doctor","type":["int","null"]}]}
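
The text above is the start of an Avro container file: the magic bytes Obj plus 0x01, followed by a metadata map whose entries include avro.codec and avro.schema. The following is a minimal sketch, not part of the original example, that decodes this header using only the Scala/Java standard library and returns the metadata as strings. The object name AvroHeaderReader and its methods are made up for illustration. In a project that already has the Avro library on the classpath, org.apache.avro.file.DataFileReader and its getSchema method would be the more usual way to get the schema.

```scala
import java.io.{DataInputStream, EOFException, FileInputStream, InputStream}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical helper (not from the original post): reads the metadata map
// at the start of an Avro object container file.
object AvroHeaderReader {

  // Decodes one Avro long: base-128 variable-length bytes, zigzag-encoded.
  private def readLong(in: InputStream): Long = {
    var result = 0L
    var shift = 0
    var more = true
    while (more) {
      val b = in.read()
      if (b < 0) throw new EOFException("unexpected end of Avro header")
      result |= (b & 0x7fL) << shift
      shift += 7
      more = (b & 0x80) != 0
    }
    (result >>> 1) ^ -(result & 1L)
  }

  // Reads a length-prefixed byte sequence (used for both map keys and values).
  private def readBytes(in: DataInputStream): Array[Byte] = {
    val buf = new Array[Byte](readLong(in).toInt)
    in.readFully(buf)
    buf
  }

  /** Returns the header metadata (for example avro.codec and avro.schema) as strings. */
  def readMetadata(stream: InputStream): Map[String, String] = {
    val in = new DataInputStream(stream)

    // Magic bytes: 'O', 'b', 'j', 0x01.
    val magic = new Array[Byte](4)
    in.readFully(magic)
    val expected = Array[Byte](0x4f, 0x62, 0x6a, 0x01)
    require(java.util.Arrays.equals(magic, expected), "not an Avro container file")

    // The metadata map is written in blocks; a block count of 0 ends the map.
    var meta = Map.empty[String, String]
    var count = readLong(in)
    while (count != 0) {
      if (count < 0) {
        readLong(in) // a negative count is followed by the block size in bytes
        count = -count
      }
      var i = 0L
      while (i < count) {
        val key = new String(readBytes(in), UTF_8)
        val value = new String(readBytes(in), UTF_8)
        meta += (key -> value)
        i += 1
      }
      count = readLong(in)
    }
    meta
  }

  def main(args: Array[String]): Unit = {
    // args(0): path to one part file in the output directory (its name differs per run).
    val in = new FileInputStream(args(0))
    try readMetadata(in).foreach { case (k, v) => println(s"$k = $v") }
    finally in.close()
  }
}
```

Run against one of the part files in the output directory, this should print an avro.codec entry and an avro.schema entry whose JSON carries the record name and namespace passed to the writer.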


The key point is that the schema stored in the file now specifies
"name":"AvroTest","namespace":"com.databricks.spark.avro"


