Implementing flatMap-like behavior in Spark SQL

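On an RDD, flatMap can flatten a column that packs several values into a single field, producing one record per value. How can Spark SQL do the same thing with a DataFrame?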

The following spark-shell session shows one way:

scala> val sentenceDataFrame = spark.createDataFrame(Seq(
     |   (0, "Hi I heard about Spark"),
     |   (1, "I wish Java could use case classes"),
     |   (2, "Logistic,regression,models,are,neat")
     | )).toDF("id", "sentence")
sentenceDataFrame: org.apache.spark.sql.DataFrame = [id: int, sentence: string]
// Goal: flatten the sentence column. First, the input data:
scala> sentenceDataFrame.show(false)
+---+-----------------------------------+
|id |sentence                           |
+---+-----------------------------------+
|0  |Hi I heard about Spark             |
|1  |I wish Java could use case classes |
|2  |Logistic,regression,models,are,neat|
+---+-----------------------------------+

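// Use explode, splitting each sentence on spaces
scala> val aaa = sentenceDataFrame.explode("sentence", "_sentence") { s: String => s.split(" ") }
scala> aaa.show(false) // the words land in the new _sentence column; row 2 contains no spaces, so it stays whole
+---+-----------------------------------+-----------------------------------+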
|id |sentence                           |_sentence                          |
+---+-----------------------------------+-----------------------------------+
|0  |Hi I heard about Spark             |Hi                                 |
|0  |Hi I heard about Spark             |I                                  |
|0  |Hi I heard about Spark             |heard                              |
|0  |Hi I heard about Spark             |about                              |
|0  |Hi I heard about Spark             |Spark                              |
|1  |I wish Java could use case classes |I                                  |
|1  |I wish Java could use case classes |wish                               |
|1  |I wish Java could use case classes |Java                               |
|1  |I wish Java could use case classes |could                              |
|1  |I wish Java could use case classes |use                                |
|1  |I wish Java could use case classes |case                               |
|1  |I wish Java could use case classes |classes                            |
|2  |Logistic,regression,models,are,neat|Logistic,regression,models,are,neat|
+---+-----------------------------------+-----------------------------------+
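
Note that `Dataset.explode` has been deprecated since Spark 2.0 in favor of `flatMap` or `select` with `functions.explode`. As a minimal sketch of the non-deprecated route, assuming Spark 2.x or later, the same flattening can be written with the built-in `split` and `explode` column functions. The `words` name and the `local[*]` session setup are illustrative choices, not part of the original session.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, split}

// In spark-shell a session named `spark` already exists; getOrCreate() reuses it.
val spark = SparkSession.builder()
  .master("local[*]")          // assumption: local run, for illustration only
  .appName("explode-sketch")   // hypothetical application name
  .getOrCreate()
import spark.implicits._

val sentenceDataFrame = Seq(
  (0, "Hi I heard about Spark"),
  (1, "I wish Java could use case classes"),
  (2, "Logistic,regression,models,are,neat")
).toDF("id", "sentence")

// split turns each sentence into an array column (the pattern is a regex);
// explode then emits one output row per array element.
val words = sentenceDataFrame
  .withColumn("_sentence", explode(split(col("sentence"), " ")))

words.show(false)
```

This should give the same rows as the table above. Because the second argument of `split` is a regular expression, a pattern such as `"[ ,]"` would also break up the comma-separated sentence in row 2.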

 
