Flink: six ways to load data sources, and sink output methods

Contents

I. Data source loading methods

1. fromElements

2. Loading from a collection

3. Loading from a file

4. Loading from a socket

5. Loading from Kafka

6. Custom data source

II. Sink output methods

1. Printing directly to the console

2. Writing out as CSV

3. Custom output with addSink


I. Data source loading methods
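All of the source examples below assume a StreamExecutionEnvironment has already been created, mirroring the environment code shown in the sink section:

import org.apache.flink.streaming.api.scala._

val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)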

1. fromElements

val stream1: DataStream[Any] = env.fromElements(1, 2, 3.5, "hello") 
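Because the elements mix Int, Double and String values, the inferred element type is Any. A minimal way to run and inspect the stream, assuming the environment set up above:

stream1.print("elements")
env.execute("fromElements demo")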

2. Loading from a collection

val dataList = List(
  SensorReading("sensor 1", 1684201960, 36.8),
  SensorReading("sensor 4", 1684202060, 35.8),
  SensorReading("sensor 4", 1684202160, 34.8),
  SensorReading("sensor 4", 1684202260, 33.8)
)
val stream2: DataStream[SensorReading] = env.fromCollection(dataList)

case class SensorReading(id: String, timestamp: Long, temperature: Double)

3. Loading from a file

    val path="X:\\Spark\\flinkstu\\resources\\sensor.txt"
    val stream3: DataStream[String] = env.readTextFile(path)
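The file is assumed to hold comma-separated records in the form id,timestamp,temperature; the sink section below parses the same file into SensorReading, e.g.:

val sensors: DataStream[SensorReading] = stream3.map(line => {
  val arr: Array[String] = line.split(",")
  SensorReading(arr(0).trim, arr(1).toLong, arr(2).trim.toDouble)
})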

4. Loading from a socket

 val stream4: DataStream[String] = env.socketTextStream("192.168.91.180", 7777)
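For a quick test, something like nc -lk 7777 on the 192.168.91.180 host will open the port, and whatever you type there arrives as records of the stream.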

5. Loading from Kafka

val properties = new Properties()
properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.91.180:9092")
properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "sensorgroup1")

val stream5: DataStream[String] = env.addSource(new FlinkKafkaConsumer[String]("sensor", new SimpleStringSchema(), properties))
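FlinkKafkaConsumer comes from the flink-connector-kafka dependency; assuming that connector is on the classpath, the imports behind this snippet would be:

import java.util.Properties
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer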

6. Custom data source

val stream6: DataStream[SensorReading] = env.addSource(new MySensorSource)


import scala.util.Random
import org.apache.flink.streaming.api.functions.source.SourceFunction

class MySensorSource() extends SourceFunction[SensorReading] {

  // Controls the generator loop; cancel() flips it to false to stop run().
  var flag = true

  override def run(sourceContext: SourceFunction.SourceContext[SensorReading]): Unit = {
    val random = new Random()
    while (flag) {
      val i: Int = random.nextInt()
      sourceContext.collect(SensorReading("generated:" + i, 1, 1))
      Thread.sleep(500)
    }
  }

  // cancel() is called by Flink when the job is stopped; setting flag to false ends the while loop in run().
  override def cancel(): Unit = {
    flag = false
  }
}
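A source defined with SourceFunction like this always runs with parallelism 1; if a parallel custom source is needed, the class would implement ParallelSourceFunction (or extend RichParallelSourceFunction) instead.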

II. Sink output methods

Environment code (shared by the sink examples below)

val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)

val path = "X:\\Spark\\flinkstu\\resources\\sensor.txt"
val stream1: DataStream[String] = env.readTextFile(path)

val datastream: DataStream[SensorReading] = stream1.map(data => {
  val arr: Array[String] = data.split(",")
  SensorReading(arr(0).trim, arr(1).toLong, arr(2).trim.toDouble)
})

1. Printing directly to the console

datastream.print()
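print() attaches a sink that writes each record to standard output; when the parallelism is greater than 1, each line is prefixed with the index of the subtask that produced it.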

2. Writing out as CSV

datastream.writeAsCsv("X:\\Spark\\flinkstu\\resources\\out.txt")
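Note that writeAsCsv is marked deprecated in recent Flink releases; the StreamingFileSink approach shown next is the recommended way to write files from a stream.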

3. Custom output with addSink

datastream.addSink(
  StreamingFileSink.forRowFormat(
    new Path("X:\\Spark\\flinkstu\\resources\\out1.txt"),
    new SimpleStringEncoder[SensorReading]()
  ).build()
)
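StreamingFileSink treats the given path as a base directory and writes rolling part files underneath it. Assuming the usual Flink packages, the imports behind this snippet and the final call needed to actually run the job would look like:

import org.apache.flink.core.fs.Path
import org.apache.flink.api.common.serialization.SimpleStringEncoder
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink

env.execute("sink demo")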


case class SensorReading(id: String, timestamp: Long, temperature: Double)
