Spark RDD and DataFrame (DF) conversion

RDD -> DF





1. Inferring the Schema Using Reflection


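Map each element of the RDD[T] to a case class object, then call toDF(). Spark infers the column names and types from the case class fields by reflection.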


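// Requires: case class Person(name: String, age: Long) and import spark.implicits._
val peopleDF = spark.sparkContext.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDF()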
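The fragment above depends on names defined elsewhere. Below is a self-contained sketch of the same reflection-based conversion, assuming a local SparkSession. The `Person` fields, the object name `ReflectionExample` and the two sample lines are hypothetical, and `parallelize` stands in for reading people.txt.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type; Spark derives the schema (name: string, age: bigint)
// from these fields by reflection. It must be top-level so an Encoder can be derived.
case class Person(name: String, age: Long)

object ReflectionExample {
  // Turns "name, age" text lines into a DataFrame via the case class.
  def peopleFromLines(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._ // brings .toDF() into scope for RDD[Person]
    spark.sparkContext
      .parallelize(lines)
      .map(_.split(","))
      .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
      .toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("reflection").getOrCreate()
    val df = peopleFromLines(spark, Seq("Michael, 29", "Andy, 30"))
    df.printSchema()
    df.show()
    spark.stop()
  }
}
```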



An RDD can also be converted directly to a Dataset with toDS(); this requires importing the implicit conversions: import spark.implicits._
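A minimal sketch of `toDS()`, assuming a local SparkSession. The `Reading` case class, the object name and the sample values are hypothetical. The import inside the method is what makes `toDS()` resolve on the RDD.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical element type; spark.implicits._ derives an Encoder for it.
case class Reading(sensor: String, value: Double)

object RddToDatasetExample {
  def toReadings(spark: SparkSession, data: Seq[Reading]): Dataset[Reading] = {
    // Without this import, toDS() is not a member of RDD[Reading]
    import spark.implicits._
    spark.sparkContext.parallelize(data).toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("toDS").getOrCreate()
    val ds = toReadings(spark, Seq(Reading("a", 1.5), Reading("b", 2.0)))
    ds.printSchema() // columns: sensor (string), value (double)
    ds.show()
    spark.stop()
  }
}
```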

If the RDD elements are tuples, the resulting columns are named _1, _2, ...
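A small sketch of the tuple case, assuming a local SparkSession and hypothetical sample pairs. `toDF()` with no arguments keeps the default `_1`, `_2` names, while `toDF("name", "age")` assigns names by position.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object TupleRddExample {
  // Returns (DataFrame with default names, DataFrame with explicit names).
  def fromTuples(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._
    val rdd = spark.sparkContext.parallelize(Seq(("Michael", 29), ("Andy", 30)))
    val defaultNames = rdd.toDF()          // columns: _1, _2
    val renamed = rdd.toDF("name", "age")  // columns renamed by position
    (defaultNames, renamed)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("tuples").getOrCreate()
    val (defaultNames, renamed) = fromTuples(spark)
    defaultNames.printSchema()
    renamed.printSchema()
    spark.stop()
  }
}
```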




2. Programmatically Specifying the Schema


Describe the column metadata as a StructType, convert the RDD into an RDD[Row], and combine the two with createDataFrame.


import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val peopleRDD = spark.sparkContext.textFile("examples/src/main/resources/people.txt")

// The schema is encoded in a string
val schemaString = "name age"

// Generate the schema based on the string of schema
val fields = schemaString.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)


// Split each "name, age" line, then convert it to a Row
val rowRDD = peopleRDD
  .map(_.split(","))
  .map(attributes => Row(attributes(0), attributes(1).trim))

// Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)

// Creates a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")
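The schema above declares every column as `StringType`. As a hedged variant, assuming the same `name, age` line format, a column can be declared with another type as long as each Row carries a value of the matching JVM type (here `Int` for `IntegerType`). The object name and sample lines are hypothetical, and `parallelize` replaces `textFile`.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object TypedSchemaExample {
  // Same columns as "name age", but age is declared as an integer.
  val schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("age", IntegerType, nullable = true)
  ))

  def peopleFromLines(spark: SparkSession, lines: Seq[String]): DataFrame = {
    val rowRDD = spark.sparkContext
      .parallelize(lines)
      .map(_.split(","))
      // Row values must match the declared types: String for name, Int for age
      .map(attributes => Row(attributes(0), attributes(1).trim.toInt))
    spark.createDataFrame(rowRDD, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("schema").getOrCreate()
    val df = peopleFromLines(spark, Seq("Michael, 29", "Andy, 30"))
    df.printSchema()
    df.show()
    spark.stop()
  }
}
```

createDataFrame does not check the Row values against the schema up front; a mismatch such as a String in an IntegerType column only fails when the data is actually processed.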







DF -> RDD


// .rdd on a DataFrame returns an RDD[Row]
val tt = teenagersDF.rdd
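A sketch of two ways out of a DataFrame, assuming a local SparkSession and hypothetical sample rows: `df.rdd` gives untyped `Row`s, while converting with `as[(String, Int)]` first gives a typed RDD. For tuple types, `as` maps columns to `_1`, `_2` by position.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object DataFrameToRddExample {
  // Hypothetical sample data with columns name and age.
  def sampleDF(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("Justin", 19), ("Andy", 30)).toDF("name", "age")
  }

  // Untyped route: fields of each Row are read by name (or by position).
  def namesViaRows(df: DataFrame): Seq[String] = {
    val rows: RDD[Row] = df.rdd
    rows.map(row => row.getAs[String]("name")).collect().toSeq
  }

  // Typed route: convert to Dataset[(String, Int)] first, then take its RDD.
  def agesViaTuples(spark: SparkSession, df: DataFrame): Seq[Int] = {
    import spark.implicits._ // provides the Encoder for (String, Int)
    val pairs: RDD[(String, Int)] = df.as[(String, Int)].rdd
    pairs.map(_._2).collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("toRdd").getOrCreate()
    val df = sampleDF(spark)
    println(namesViaRows(df))
    println(agesViaTuples(spark, df))
    spark.stop()
  }
}
```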




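Converting an RDD to a DS can fail with an error saying that toDS is not a member of RDD[Object]: toDS() is only available when an implicit Encoder exists for the element type. A workaround is to define the schema explicitly and call createDataFrame on an RDD[Row]: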



import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schema = new StructType()
  .add(StructField("client_date", StringType, true))
  .add(StructField("client_time", StringType, true))
  .add(StructField("server_date", StringType, true))
  .add(StructField("server_time", StringType, true))

import spark.implicits._
// cubesRDD must be an RDD[Row] whose values match the schema (four strings)
val cubesDF = spark.createDataFrame(cubesRDD, schema)
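A self-contained sketch of this workaround, assuming each input line holds the four fields separated by commas. That line format, the `CubeTimes` case class, the object name and the sample line are all hypothetical. Once the DataFrame exists, `.as[CubeTimes]` can recover a typed Dataset when one is needed; columns are matched to the case class fields by name.

```scala
import org.apache.spark.sql.{Dataset, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical typed view of the four columns
case class CubeTimes(client_date: String, client_time: String,
                     server_date: String, server_time: String)

object RowSchemaWorkaround {
  val schema: StructType = new StructType()
    .add(StructField("client_date", StringType, true))
    .add(StructField("client_time", StringType, true))
    .add(StructField("server_date", StringType, true))
    .add(StructField("server_time", StringType, true))

  def build(spark: SparkSession, lines: Seq[String]): Dataset[CubeTimes] = {
    import spark.implicits._ // provides the Encoder[CubeTimes] used by .as
    val rowRDD = spark.sparkContext
      .parallelize(lines)
      .map(_.split(",", -1)) // -1 keeps trailing empty fields
      .map(f => Row(f(0), f(1), f(2), f(3)))
    spark.createDataFrame(rowRDD, schema).as[CubeTimes]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rowSchema").getOrCreate()
    build(spark, Seq("2020-01-01,10:00:00,2020-01-01,10:00:05")).show()
    spark.stop()
  }
}
```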

