Spark Streaming: Combining DStream with DataFrame and SQL Operations

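The SparkSession used for queries can be created from the SparkContext that the StreamingContext is running on. With that session you can run DataFrame and SQL operations on the streaming data. Because getOrCreate() returns the already existing session when there is one, the same SparkSession is reused for every batch.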

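import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.dstream.DStream

val words: DStream[String] = ...

words.foreachRDD { rdd =>

  // Get the singleton SparkSession (created on first use, reused afterwards)
  val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
  import spark.implicits._

  // Convert RDD[String] to a DataFrame with a single column named "word"
  val wordsDataFrame = rdd.toDF("word")

  // Register the DataFrame as a temporary view
  wordsDataFrame.createOrReplaceTempView("words")

  // Count the words with SQL and print the result
  val wordCountsDataFrame =
    spark.sql("select word, count(*) as total from words group by word")
  wordCountsDataFrame.show()
}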
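
The snippet above leaves the source of `words` open. As a minimal sketch of how it could sit inside a complete application, the program below builds `words` from a socket text stream. The object name `SqlNetworkWordCount`, the `local[2]` master, the 2-second batch interval, and the `localhost:9999` source are all assumptions chosen for illustration, not part of the original post.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical application name; any object with a main method works.
object SqlNetworkWordCount {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 2 threads (one for the receiver, one for
    // processing) and a 2-second batch interval.
    val conf = new SparkConf().setAppName("SqlNetworkWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Assumption: text lines arrive on localhost:9999 (for example from `nc -lk 9999`).
    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(_.split(" "))

    words.foreachRDD { rdd =>
      // Reuse one SparkSession across batches, built from the RDD's SparkContext.
      val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
      import spark.implicits._

      // Expose the words of this batch to SQL and print the per-word counts.
      rdd.toDF("word").createOrReplaceTempView("words")
      spark.sql("select word, count(*) as total from words group by word").show()
    }

    // Start the computation and block until it is stopped.
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that the temporary view is replaced on every batch, so each query only sees the words of the current batch interval rather than a running total.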
