10.4 Spark 2 Structured Streaming: real-time processing of an HDFS file input stream on CDH

This follows the previous post on Spark 2.4 on CDH.

[Figure 1]


Demo: monitoring an HDFS directory in real time

a. File 1

[Figure 2]

[Figure 3]

b. Add a file (a sketch of one way to do this follows the screenshots)

[Figure 4]

[Figure 5]
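
To add a file (step b), any HDFS client works, for example hdfs dfs -put. The sketch below is a minimal Scala alternative using the Hadoop FileSystem API; the local file /tmp/people2.csv and its contents (semicolon-separated name;age rows such as "tom;25") are hypothetical, and only the NameNode address and the monitored directory come from the streaming job shown later.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object AddDemoFile {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Same NameNode as the streaming job.
    conf.set("fs.defaultFS", "hdfs://192.168.50.135:8020")
    val fs = FileSystem.get(conf)

    // Copy a local CSV into the monitored directory; the running query
    // picks the new file up in its next micro-batch.
    fs.copyFromLocalFile(
      new Path("/tmp/people2.csv"),                        // hypothetical local file
      new Path("/user/hdfs/yanke_data/data3/people2.csv")  // monitored HDFS directory
    )
    fs.close()
  }
}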


Code

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

object FileInputStructuredStreaming {
  def main(args: Array[String]): Unit = {
    // A local SparkSession for the demo; on a CDH cluster this would be
    // submitted with spark2-submit and a cluster master instead.
    val spark = SparkSession
      .builder
      .master("local")
      .appName("FileInputStructuredStreaming")
      .getOrCreate()

    spark.sparkContext.setLogLevel("WARN")

    import spark.implicits._

    // Schema of the semicolon-separated CSV files that land in the directory.
    val userSchema = new StructType().add("name", "string").add("age", "integer")

    // File source: every new file that appears under this HDFS directory
    // is read as part of the next micro-batch.
    val lines = spark.readStream
      .option("sep", ";")
      .schema(userSchema)
      .csv("hdfs://192.168.50.135:8020/user/hdfs/yanke_data/data3/")

    // Console sink in append mode: newly read rows are printed to stdout.
    val query = lines.writeStream
      .outputMode("append")
      .format("console")
      .start()

    // Block until the streaming query is stopped.
    query.awaitTermination()
  }
}
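
As written, the file source reads every file that has arrived since the previous batch, and a new batch starts as soon as the previous one finishes. The variant below is a sketch of two commonly used knobs, maxFilesPerTrigger on the source and a processing-time trigger on the sink; both are standard Structured Streaming options, but the values (one file per batch, a 10-second interval) are only illustrative.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types.StructType

object FileInputWithTrigger {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("FileInputWithTrigger")
      .getOrCreate()
    spark.sparkContext.setLogLevel("WARN")

    val userSchema = new StructType().add("name", "string").add("age", "integer")

    val lines = spark.readStream
      .option("sep", ";")
      .option("maxFilesPerTrigger", "1")   // read at most one new file per micro-batch
      .schema(userSchema)
      .csv("hdfs://192.168.50.135:8020/user/hdfs/yanke_data/data3/")

    val query = lines.writeStream
      .outputMode("append")
      .format("console")
      .trigger(Trigger.ProcessingTime("10 seconds"))  // look for new files every 10 seconds
      .start()

    query.awaitTermination()
  }
}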

