Flink Source Explained

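Data processing can generally be split into three stages: where the data comes from, what business logic is applied to it, and where the results are written. Flink is no different.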

SourceFunction overview

A custom Flink data source implements SourceFunction. Built-in SourceFunction implementations include SocketTextStreamFunction, FromElementsFunction, FlinkKafkaConsumer, and others.

SourceFunction defines two methods: run and cancel.

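The body of run implements the data-producing logic, for example reading data from Redis or generating simulated data. Examples follow later in this article.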

cancel is called when the job is cancelled. Use it to set a stop flag, close connections, and do similar cleanup.
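The run/cancel contract is small enough to sketch without Flink on the classpath. The traits below are a simplified stand-in for the real org.apache.flink.streaming.api.functions.source.SourceFunction, not a copy of it; the names SimpleSourceContext, SimpleSourceFunction, CountingSource and SourceContractDemo are hypothetical and exist only for this illustration. The sketch shows the usual pattern: run loops and emits elements until cancel flips a flag.

```scala
import scala.collection.mutable.ListBuffer

// Simplified stand-ins for Flink's SourceContext / SourceFunction (hypothetical names).
trait SimpleSourceContext[T] {
  def collect(element: T): Unit
}

trait SimpleSourceFunction[T] {
  def run(ctx: SimpleSourceContext[T]): Unit
  def cancel(): Unit
}

// Emits 0, 1, 2, ... until cancel() is called.
class CountingSource extends SimpleSourceFunction[Int] {
  // In Flink, cancel() is invoked from a different thread than run(),
  // so the flag is marked @volatile here as well.
  @volatile private var running = true

  override def run(ctx: SimpleSourceContext[Int]): Unit = {
    var i = 0
    while (running) {
      ctx.collect(i)
      i += 1
    }
  }

  override def cancel(): Unit = {
    running = false
  }
}

object SourceContractDemo {
  // Runs the source and cancels it once `limit` elements have been collected.
  def collectFirst(limit: Int): List[Int] = {
    val out = ListBuffer.empty[Int]
    val source = new CountingSource
    source.run(new SimpleSourceContext[Int] {
      override def collect(element: Int): Unit = {
        out += element
        if (out.size >= limit) source.cancel()
      }
    })
    out.toList
  }

  def main(args: Array[String]): Unit =
    println(collectFirst(3)) // prints List(0, 1, 2)
}
```

In this demo the context itself calls cancel, on the same thread, to keep the example deterministic; in a real job the Flink runtime decides when cancel is called.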

Custom Flink sources

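Sources can first be classified by parallelism: single-parallelism (parallelism 1) and multi-parallelism sources. A single-parallelism source always runs with parallelism 1, and calling setParallelism() on it with a larger value is rejected. A multi-parallelism source defaults to the parallelism of the job.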

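Sources can also be classified by whether they are a RichFunction. The RichFunction interface provides open, close, getRuntimeContext and setRuntimeContext, which can be used to access state, cache data inside the function, and so on.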

Single-parallelism source: implement SourceFunction

import java.text.SimpleDateFormat
import java.util.Date

import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

class NoParalleSource extends SourceFunction[String] {

  // cancel() is called from a different thread than run(), hence @volatile.
  @volatile private var isrunning = true

  override def run(sourceContext: SourceFunction.SourceContext[String]): Unit = {
    // Emit "<thread id>_<HH:mm:ss>" once per second until the job is cancelled.
    while (isrunning) {
      val time = new SimpleDateFormat("HH:mm:ss").format(new Date())
      sourceContext.collect(s"${Thread.currentThread().getId}_$time")
      Thread.sleep(1000)
    }
  }

  override def cancel(): Unit = {
    isrunning = false
  }
}

object NoParalleSourceTest {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // A non-parallel source must keep parallelism 1, so setParallelism(2) stays commented out.
    val stream = env.addSource(new NoParalleSource()) /*.setParallelism(2)*/
    // Concatenate all elements received in each 5-second window.
    val reduce = stream.timeWindowAll(Time.seconds(5)).reduce(_ + "~" + _)
    reduce.print()
    env.execute(NoParalleSourceTest.getClass.getName)
  }
}

Multi-parallelism source: implement ParallelSourceFunction

import java.text.SimpleDateFormat
import java.util.Date

import org.apache.flink.streaming.api.functions.source.{ParallelSourceFunction, SourceFunction}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

/**
 * If no parallelism is set on the source, the job's default parallelism is used.
 */
class ParalleSource extends ParallelSourceFunction[String] {

  // cancel() is called from a different thread than run(), hence @volatile.
  @volatile private var isrunning = true

  override def run(sourceContext: SourceFunction.SourceContext[String]): Unit = {
    // Every parallel instance emits "<thread id>_<HH:mm:ss>" once per second.
    while (isrunning) {
      val time = new SimpleDateFormat("HH:mm:ss").format(new Date())
      sourceContext.collect(s"${Thread.currentThread().getId}_$time")
      Thread.sleep(1000)
    }
  }

  override def cancel(): Unit = {
    isrunning = false
  }
}

object ParalleSourceTest {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Run four parallel instances of the source.
    val stream = env.addSource(new ParalleSource()).setParallelism(4)
    // Concatenate all elements received in each 5-second window.
    val reduce = stream.timeWindowAll(Time.seconds(5)).reduce(_ + "~" + _)
    reduce.print()
    env.execute(ParalleSourceTest.getClass.getName)
  }
}

Rich single-parallelism source: implement RichSourceFunction
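The article gives no code for this variant, so here is a minimal sketch of what it might look like. It assumes a Flink 1.x release with the Scala DataStream API, where open still takes a Configuration argument. The class and object names RichNoParalleSource and RichNoParalleSourceTest are made up for illustration, and the counter simply stands in for whatever resource a real source would set up in open.

```scala
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.source.{RichSourceFunction, SourceFunction}
import org.apache.flink.streaming.api.scala._

// Hypothetical example class: a rich, non-parallel source.
class RichNoParalleSource extends RichSourceFunction[String] {

  // cancel() is called from a different thread than run(), hence @volatile.
  @volatile private var isRunning = true
  private var counter: Long = 0L

  // Called once before run(); a real source would open its connection here.
  override def open(parameters: Configuration): Unit = {
    counter = 0L
  }

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    // Emit "<task name>_<counter>" once per second until cancelled.
    while (isRunning) {
      ctx.collect(s"${getRuntimeContext.getTaskName}_$counter")
      counter += 1
      Thread.sleep(1000)
    }
  }

  override def cancel(): Unit = {
    isRunning = false
  }

  // Called once on shutdown; release whatever open() acquired.
  override def close(): Unit = {
    // Nothing to release in this sketch.
  }
}

object RichNoParalleSourceTest {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.addSource(new RichNoParalleSource()).print()
    env.execute("RichNoParalleSourceTest")
  }
}
```

Compared with the plain SourceFunction version, the only additions are the open and close lifecycle hooks and the access to getRuntimeContext.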

Rich multi-parallelism source: implement RichParallelSourceFunction
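Again the article stops at the heading, so the following is only a sketch under the same assumptions (Flink 1.x, Scala DataStream API, open taking a Configuration). The names RichParalleSource and RichParalleSourceTest are hypothetical. It illustrates one common reason to want the runtime context in a parallel source: each subtask reads its own index and the total subtask count, then emits a disjoint slice of a number sequence.

```scala
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.source.{RichParallelSourceFunction, SourceFunction}
import org.apache.flink.streaming.api.scala._

// Hypothetical example class: a rich, parallel source.
class RichParalleSource extends RichParallelSourceFunction[String] {

  // cancel() is called from a different thread than run(), hence @volatile.
  @volatile private var isRunning = true
  private var subtask = 0
  private var total = 1

  // Called once per parallel instance before run().
  override def open(parameters: Configuration): Unit = {
    subtask = getRuntimeContext.getIndexOfThisSubtask
    total = getRuntimeContext.getNumberOfParallelSubtasks
  }

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    // Subtask i emits i, i + total, i + 2 * total, ... so the slices never overlap.
    var next: Long = subtask
    while (isRunning) {
      ctx.collect(s"subtask-$subtask/$total -> $next")
      next += total
      Thread.sleep(1000)
    }
  }

  override def cancel(): Unit = {
    isRunning = false
  }
}

object RichParalleSourceTest {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Two parallel instances, each emitting its own slice.
    env.addSource(new RichParalleSource()).setParallelism(2).print()
    env.execute("RichParalleSourceTest")
  }
}
```

The same slicing idea is how a parallel source can split partitions or shards of an external system among its instances.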

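Inheritance relationships among the custom source interfaces

SourceFunction is the base interface. ParallelSourceFunction extends SourceFunction. RichSourceFunction extends AbstractRichFunction and implements SourceFunction. RichParallelSourceFunction extends AbstractRichFunction and implements ParallelSourceFunction.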
