RDD -> DF
There are two ways:
1. Inferring the Schema Using Reflection
Map each element of the RDD[T] to a case class instance, then call toDF():
// The case class defines the column names and types inferred by reflection
case class Person(name: String, age: Long)

import spark.implicits._

val peopleDF = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDF()
An RDD can also be converted directly to a Dataset; this requires importing the implicit conversions: import spark.implicits._
If the elements being converted are tuples, the resulting columns are named _1, _2, ...
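For example, a minimal sketch reusing the same input file, with `spark` being the SparkSession from the snippet above:

import spark.implicits._

// An RDD of tuples converts directly; the columns default to _1, _2, ...
val tupleDF = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => (attributes(0), attributes(1).trim.toInt))
  .toDF()
tupleDF.printSchema()   // _1: string, _2: int

// Column names can be passed to toDF() to replace the _1/_2 defaults
val namedDF = tupleDF.toDF("name", "age")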
2. Programmatically Specifying the Schema
Combine the column metadata (a StructType) with an RDD[Row] via createDataFrame:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val peopleRDD = spark.sparkContext.textFile("examples/src/main/resources/people.txt")

// The schema is encoded in a string
val schemaString = "name age"

// Generate the schema based on the string of schema
val fields = schemaString.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

// Convert records of the RDD to Rows
val rowRDD = peopleRDD
  .map(_.split(","))
  .map(attributes => Row(attributes(0), attributes(1).trim))

// Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)

// Creates a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")
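The registered view can then be queried with SQL, and the result comes back as a DataFrame of Rows (this follows the same Spark SQL getting-started example):

import spark.implicits._

// SQL can be run over the temporary view created above
val results = spark.sql("SELECT name FROM people")

// Row fields are accessible by index (or by field name)
results.map(attributes => "Name: " + attributes(0)).show()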
DF -> RDD
val tt = teenagersDF.rdd
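.rdd returns an RDD[org.apache.spark.sql.Row], so values have to be extracted from each Row. A small sketch, assuming `teenagersDF` has a string `name` column and a long `age` column:

import org.apache.spark.sql.Row

// Access fields by name...
val names = tt.map(row => row.getAs[String]("name"))

// ...or by pattern matching on the Row (types must match the schema exactly)
val described = tt.map { case Row(name: String, age: Long) => s"$name is $age" }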
RDD to DS: you may hit an exception that toDS is not a member of RDD[object].
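That error usually means the implicit encoders are not in scope, or Spark cannot derive an Encoder for the element type (typically because the case class is defined inside a method). A sketch of the pattern that works, with a hypothetical Event case class and input file:

// Define the case class at the top level (outside any method body),
// so that Spark can derive an Encoder for it
case class Event(date: String, time: String)

import spark.implicits._   // brings .toDS() into scope for RDDs

val eventsDS = spark.sparkContext
  .textFile("events.txt")          // hypothetical input
  .map(_.split(","))
  .map(a => Event(a(0), a(1)))
  .toDS()                          // Dataset[Event]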
A safer approach: build the schema explicitly and use createDataFrame.
val schema = new StructType()
  .add(StructField("client_date", StringType, true))
  .add(StructField("client_time", StringType, true))
  .add(StructField("server_date", StringType, true))
  .add(StructField("server_time", StringType, true))
......
Then:
import spark.implicits._

val cubesDF = spark.createDataFrame(cubesRDD, schema)
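If a typed Dataset is still needed from there, the DataFrame can be converted with as[...]. A sketch, where the Cube case class is hypothetical and its fields must match the column names and types in the schema above:

// Hypothetical case class matching (a subset of) the StringType columns above
case class Cube(client_date: String, client_time: String,
                server_date: String, server_time: String)

val cubesDS = cubesDF.as[Cube]   // Dataset[Cube], via the encoders from spark.implicits._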