Tips for creating a DataFrame in Scala

case1: Simple conversion from a List to a DataFrame
//step1: first, define a case class
case class resultset(masterhotel:Int,
quantity:Double,
date:String,
rank:Int,
frcst_cii:Double,
hotelid:Int)

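//step2
//Build instances of resultset. There are many ways to do this, e.g. filling
//resultset objects from rows read from a relational database, or defining a List of resultset directly as below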
val x1=List(resultset(1001,12,"2016-10-01", 1, 13.44,1001),
resultset(1002,12,"2016-10-01", 3, 13.44,1002),
resultset(1004,15,"2016-10-02", 10, 18.44,1004),
resultset(1005,5,"2016-10-02", 40, 5.67,1005)
)
val dataset1=sqlContext.createDataFrame(x1)
scala> dataset1.show()
+-----------+--------+----------+----+---------+-------+
|masterhotel|quantity|      date|rank|frcst_cii|hotelid|
+-----------+--------+----------+----+---------+-------+
|       1001|    12.0|2016-10-01|   1|    13.44|   1001|
|       1002|    12.0|2016-10-01|   3|    13.44|   1002|
|       1004|    15.0|2016-10-02|  10|    18.44|   1004|
|       1005|     5.0|2016-10-02|  40|     5.67|   1005|
+-----------+--------+----------+----+---------+-------+
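
The snippet above relies on the `sqlContext` value that spark-shell predefines in Spark 1.x. For Spark 2.0 or later, the same conversion can be sketched with a `SparkSession`. This is only a minimal sketch: the object name `Case1Sketch`, the method name `build`, and the `local[*]` master in the usage note are illustrative assumptions, not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Same fields as the case class from step1; kept at top level so Spark can derive its schema.
case class resultset(masterhotel: Int, quantity: Double, date: String,
                     rank: Int, frcst_cii: Double, hotelid: Int)

object Case1Sketch {
  // Builds the DataFrame twice: once with createDataFrame, once with toDF().
  def build(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._ // brings toDF() into scope for local collections

    val x1 = List(
      resultset(1001, 12, "2016-10-01", 1, 13.44, 1001),
      resultset(1002, 12, "2016-10-01", 3, 13.44, 1002))

    val viaCreate = spark.createDataFrame(x1) // same call as on sqlContext
    val viaToDF = x1.toDF()                   // shorthand via implicits
    (viaCreate, viaToDF)
  }
}
```

A session for local experiments could be obtained with `SparkSession.builder().appName("case1-sketch").master("local[*]").getOrCreate()`; inside spark-shell 2.x the predefined `spark` value can be passed in instead.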

case2: Converting an array of tuples to a list, then to a DataFrame via the case class
val x2=Array((1001,12,"2016-10-01", 1, 13.44,1001),
(1002,12,"2016-10-01", 3, 13.44,1002),
(1004,15,"2016-10-02", 10, 18.44,1004),
(1005,5,"2016-10-02", 40, 5.67,1005)
)
val x3=(0 until x2.length).map(i => resultset(x2(i)._1,x2(i)._2,x2(i)._3,x2(i)._4,x2(i)._5,x2(i)._6))//tuple fields are accessed as ._1, ._2, ...
val x4=x3.toList
val dataset2=sqlContext.createDataFrame(x4)
scala> dataset2.show()
+-----------+--------+----------+----+---------+-------+
|masterhotel|quantity|      date|rank|frcst_cii|hotelid|
+-----------+--------+----------+----+---------+-------+
|       1001|    12.0|2016-10-01|   1|    13.44|   1001|
|       1002|    12.0|2016-10-01|   3|    13.44|   1002|
|       1004|    15.0|2016-10-02|  10|    18.44|   1004|
|       1005|     5.0|2016-10-02|  40|     5.67|   1005|
+-----------+--------+----------+----+---------+-------+
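
The index-based loop in case2 can also be written as a pattern match on each tuple, which avoids the repeated `x2(i)._n` lookups. The following is a small sketch in plain Scala (no Spark needed for this step); `x2` and `resultset` mirror the post, while `Case2Sketch` and `rows` are illustrative names.

```scala
case class resultset(masterhotel: Int, quantity: Double, date: String,
                     rank: Int, frcst_cii: Double, hotelid: Int)

object Case2Sketch {
  val x2 = Array(
    (1001, 12, "2016-10-01", 1, 13.44, 1001),
    (1002, 12, "2016-10-01", 3, 13.44, 1002))

  // Destructure each tuple instead of indexing with ._1 ... ._6.
  // The second element is an Int in the tuple, so it is converted
  // explicitly to Double for the quantity field.
  val rows: List[resultset] = x2.toList.map {
    case (m, q, d, r, f, h) => resultset(m, q.toDouble, d, r, f, h)
  }
}
```

The resulting `rows` list can then be passed to `createDataFrame` in the same way as `x4`.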

case3: Arrays and lists can be converted to each other
scala> val a=Array(1,2).toList
a: List[Int] = List(1, 2)
scala> a.toArray
res69: Array[Int] = Array(1, 2)


case4: A DataFrame can be converted directly into an Array like the one in case2
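
The post gives no code for this case. One possible approach, sketched here under the assumption that the DataFrame has the same column order and types as the `resultset` case class: `collect()` returns an `Array[Row]`, and each `Row` can be mapped back to a tuple with its typed getters. The names `Case4Sketch` and `toTuples` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row}

object Case4Sketch {
  // Assumes columns: Int, Double, String, Int, Double, Int (as in resultset).
  // collect() pulls every row to the driver, so this suits small results only.
  def toTuples(df: DataFrame): Array[(Int, Double, String, Int, Double, Int)] =
    df.collect().map { (row: Row) =>
      (row.getInt(0), row.getDouble(1), row.getString(2),
       row.getInt(3), row.getDouble(4), row.getInt(5))
    }
}
```

With the DataFrame from case2 this would be called as `Case4Sketch.toTuples(dataset2)`. Note that the second tuple element comes back as a Double, because the `quantity` column is a Double, whereas the original `x2` array held it as an Int.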
