Flink Part 2: IngestionTime vs. ProcessingTime

IngestionTime: the timestamp assigned when a record enters the Flink system (at the source).

ProcessingTime: the wall-clock time at which a Flink operator processes the record.

The following program illustrates the difference between them:

import java.text.SimpleDateFormat

import org.apache.flink.api.common.functions.AggregateFunction
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.WindowFunction
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

//Order object (userid, total amount spent)
case class Order(userid: Long, total: Long)
//Per-window summary emitted for each key
case class OrderSummary(startTime: String, endTime: String, userid: Long, total: Long)

object IngestionOrProcessTime extends App {

  val streamenv = StreamExecutionEnvironment.createLocalEnvironment()
  streamenv.setParallelism(1)
  //Option 1: IngestionTime. All four records enter the Flink system at (virtually) the same time,
  //so they are computed in the same window, aggregated with userid as the key.
//  streamenv.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime)

  //Option 2: ProcessingTime. Add a time-consuming operation (the sleep below) inside the aggregate
  //function; otherwise the job ends before the processing-time windows fire and no data is emitted.
  streamenv.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)

  //Aggregate per key over 10-second windows
  streamenv.fromElements(Order(100L, 1000L), Order(101L, 2000L), Order(100L, 3000L), Order(101L, 4000L))
    .keyBy(_.userid)
    .timeWindow(Time.seconds(10))
    .aggregate(new AggregateFunction[Order, (Long, Long), (Long, Long)] {
      //Create the accumulator (called once per key per window)
      override def createAccumulator() = {
        //Thread.sleep(5000)  //sleep 5 seconds per new accumulator; irrelevant with IngestionTime, but it affects windowing with ProcessingTime
        Thread.sleep(10000)  //10-second sleep: this is what spreads the records across windows
        (0L, 0L)
      }
      //Accumulate one record into the accumulator
      override def add(value: Order, accumulator: (Long, Long)) = { (value.userid, accumulator._2 + value.total)}
      override def getResult(accumulator: (Long, Long)) = accumulator
      //Merge two accumulators
      override def merge(a: (Long, Long), b: (Long, Long)) = { (a._1, a._2 + b._2)}
    }
      ,
      new WindowFunction[(Long, Long), OrderSummary, Long, TimeWindow]() {
        override def apply(key: Long, window: TimeWindow, inputs: Iterable[(Long, Long)], out: Collector[OrderSummary]): Unit = {
          val date1 = new java.util.Date(); date1.setTime(window.getStart)
          val date2 = new java.util.Date(); date2.setTime(window.getEnd)
          val simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
          val winStartTime = simpleDateFormat.format(date1); val winEndTime = simpleDateFormat.format(date2)
          for (value <- inputs) {
            //inputs holds the pre-aggregated result for this key and window (a single element when combined with aggregate)
            out.collect(OrderSummary("winStartTime :" +  winStartTime, "winEndTime :" + winEndTime, value._1, value._2))
          }
        }
      }
    )
    .print()

  streamenv.execute("IngestionOrProcessTime_starting")
}

Output with setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime):

All four records enter Flink at (virtually) the same time, so only one window is produced.
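With that setting the job emits one OrderSummary per key for the single window. The output has roughly the following shape; the window boundaries are placeholders (they depend on when the job happens to run), and only the per-key totals, 1000 + 3000 = 4000 and 2000 + 4000 = 6000, follow from the input data:

OrderSummary(winStartTime :2020-01-01 12:00:00.000,winEndTime :2020-01-01 12:00:10.000,100,4000)
OrderSummary(winStartTime :2020-01-01 12:00:00.000,winEndTime :2020-01-01 12:00:10.000,101,6000)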

Output with TimeCharacteristic.ProcessingTime and a 5-second sleep:

Two windows are produced, because processing each record takes about 5 seconds: the blocking sleep in createAccumulator delays the processing time at which the later records are assigned to windows, so the first and second records land in one window and the last two in the next.

Output with TimeCharacteristic.ProcessingTime and a 10-second sleep:

Four windows are produced: with a 10-second sleep each record's processing occupies an entire 10-second window, so every window holds exactly one record.

The example above shows how choosing ProcessingTime or IngestionTime affects which windows are generated. Thanks, everyone!

 

When ProcessingTime is set, the output can differ from run to run; from my tests this seems to be related to how fast the machine is. On a virtual machine the results were not stable.

So results based on ProcessingTime or IngestionTime are not deterministic; for reproducible windowing you need EventTime!
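Here is a minimal sketch of an event-time version, reusing the imports from the program above. The eventTime field and its values are made up for illustration; assignAscendingTimestamps is the simplest watermark assigner and assumes timestamps arrive in increasing order:

//Hypothetical order type that carries its own event timestamp (epoch milliseconds)
case class TimedOrder(userid: Long, total: Long, eventTime: Long)

object EventTimeOrders extends App {

  val streamenv = StreamExecutionEnvironment.createLocalEnvironment()
  streamenv.setParallelism(1)
  //Window assignment is driven by the timestamp carried in each record
  streamenv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

  streamenv.fromElements(
      TimedOrder(100L, 1000L, 1000L),   //made-up timestamps: first two orders fall in [0s, 10s)
      TimedOrder(101L, 2000L, 2000L),
      TimedOrder(100L, 3000L, 11000L),  //last two orders fall in [10s, 20s)
      TimedOrder(101L, 4000L, 12000L))
    .assignAscendingTimestamps(_.eventTime)
    .keyBy(_.userid)
    .timeWindow(Time.seconds(10))
    //per-key sum of the order totals within each 10-second event-time window
    .reduce((a, b) => TimedOrder(a.userid, a.total + b.total, b.eventTime))
    .print()

  streamenv.execute("EventTimeOrders")
}

Because the windows now depend only on the timestamps inside the records, the same input always produces the same summaries, no matter how slow the operators or the machine are.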
