Flink DataStream API: Window Aggregation Categories

 

Overview

Incremental aggregation

•A computation runs each time a record enters the window, so only the intermediate result is kept as state

•reduce(reduceFunction)

•aggregate(aggregateFunction)

•sum(), min(), max()

 

Full-window aggregation

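•Aggregation starts only after all data belonging to the window has arrived (this makes it possible to, for example, sort the data within the window)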

•apply(windowFunction)

•process(processWindowFunction)

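ProcessWindowFunction provides more context information than WindowFunction (for example, the window metadata, the current watermark, and per-window state, all reached through its Context object).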

State changes during incremental aggregation: running sum (a computation runs each time a record enters the window)

[Figure 1: state changes during incremental aggregation, running sum]
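The state changes for a running sum can be traced with a small plain-Java sketch. It does not use Flink; the class and method names (RunningSumTrace, stateAfterEachElement) are made up for illustration, and the sketch only assumes that the window keeps a single running value as its state.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no Flink dependency): the window keeps one running value as state.
class RunningSumTrace {

  // Returns the state after each arriving element; the last entry is what the window would emit.
  static List<Long> stateAfterEachElement(long[] elements) {
    List<Long> states = new ArrayList<>();
    Long state = null; // empty window: no state yet
    for (long element : elements) {
      // The first element becomes the state; later elements are folded into it.
      state = (state == null) ? element : state + element;
      states.add(state);
    }
    return states;
  }

  public static void main(String[] args) {
    System.out.println(stateAfterEachElement(new long[] {1, 2, 3, 4})); // prints [1, 3, 6, 10]
  }
}
```

However many elements arrive, the state stays a single number, which is why incremental aggregation is cheap in memory.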

ReduceFunction (incremental aggregation function)

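DataStream<Tuple2<String, Long>> input = ...;

input
    .keyBy(<key selector>)
    .window(<window assigner>)
    .reduce(new ReduceFunction<Tuple2<String, Long>>() {
      // Combines two elements of the same key into one; input and output types are identical.
      public Tuple2<String, Long> reduce(Tuple2<String, Long> v1, Tuple2<String, Long> v2) {
        return new Tuple2<>(v1.f0, v1.f1 + v2.f1);
      }
    });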
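To see what a keyed reduce produces for one window, the following plain-Java sketch folds (key, value) pairs into one state entry per key. It is only an approximation of the semantics and does not use Flink; KeyedReduceSketch and reducePerKey are hypothetical names, and the input is assumed to be the content of a single window.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java sketch (no Flink dependency) of a per-key reduce within one window.
class KeyedReduceSketch {

  // Folds each (key, value) pair into the per-key state with the given reduce function.
  static Map<String, Long> reducePerKey(String[] keys, long[] values, BinaryOperator<Long> reducer) {
    Map<String, Long> statePerKey = new LinkedHashMap<>();
    for (int i = 0; i < keys.length; i++) {
      // merge() stores the first value for a key and applies the reducer to later ones.
      statePerKey.merge(keys[i], values[i], reducer);
    }
    return statePerKey;
  }

  public static void main(String[] args) {
    String[] keys = {"a", "b", "a", "b", "a"};
    long[] values = {1, 10, 2, 20, 3};
    System.out.println(reducePerKey(keys, values, Long::sum)); // prints {a=6, b=30}
  }
}
```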

AggregateFunction (incremental aggregation function)

/**
 * The accumulator is used to keep a running sum and a count. The {@code getResult} method
 * computes the average.
 */
private static class AverageAggregate
    implements AggregateFunction<Tuple2<String, Long>, Tuple2<Long, Long>, Double> {
  @Override
  public Tuple2<Long, Long> createAccumulator() {
    // f0 = running sum, f1 = element count
    return new Tuple2<>(0L, 0L);
  }

  @Override
  public Tuple2<Long, Long> add(Tuple2<String, Long> value, Tuple2<Long, Long> accumulator) {
    return new Tuple2<>(accumulator.f0 + value.f1, accumulator.f1 + 1L);
  }

  @Override
  public Double getResult(Tuple2<Long, Long> accumulator) {
    return ((double) accumulator.f0) / accumulator.f1;
  }

  @Override
  public Tuple2<Long, Long> merge(Tuple2<Long, Long> a, Tuple2<Long, Long> b) {
    return new Tuple2<>(a.f0 + b.f0, a.f1 + b.f1);
  }
}

DataStream<Tuple2<String, Long>> input = ...;

input
    .keyBy(<key selector>)
    .window(<window assigner>)
    .aggregate(new AverageAggregate());
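The four accumulator steps can be followed by hand in a plain-Java sketch that mirrors AverageAggregate without depending on Flink. The class name AverageAccumulatorSketch and the use of a long[] {sum, count} pair as the accumulator are assumptions made for illustration; the merge step is shown on two partial accumulators, as would happen when two windows are combined (for example, merging session windows).

```java
// Plain-Java sketch (no Flink dependency) of the four accumulator steps for an average.
class AverageAccumulatorSketch {

  // Accumulator layout assumed here: {running sum, element count}.
  static long[] createAccumulator() {
    return new long[] {0L, 0L};
  }

  // Folds one value into the accumulator and returns the updated accumulator.
  static long[] add(long value, long[] acc) {
    return new long[] {acc[0] + value, acc[1] + 1L};
  }

  // Combines two partial accumulators into one.
  static long[] merge(long[] a, long[] b) {
    return new long[] {a[0] + b[0], a[1] + b[1]};
  }

  // Turns the final accumulator into the emitted result.
  static double getResult(long[] acc) {
    return ((double) acc[0]) / acc[1];
  }

  public static void main(String[] args) {
    long[] first = createAccumulator();
    for (long v : new long[] {2, 4}) first = add(v, first);
    long[] second = add(9, createAccumulator());
    System.out.println(getResult(merge(first, second))); // prints 5.0
  }
}
```

Unlike a reduce, the accumulator type (sum and count) differs from both the input type and the result type, which is what makes an average expressible incrementally.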

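State changes during full-window aggregation: finding the maximum (aggregation starts only after all data belonging to the window has arrived)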

[Figure 2: state changes during full-window aggregation, finding the maximum]
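The full-window behavior can be sketched in plain Java as a buffer that is only evaluated when the window fires. This is not Flink code; FullWindowSketch, onElement, maxOnFire and sortedOnFire are hypothetical names, and the sketch assumes the window contains at least one element when it fires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java sketch (no Flink dependency): every element is buffered until the window fires.
class FullWindowSketch {
  private final List<Long> buffer = new ArrayList<>();

  // On arrival the element is only stored; nothing is computed yet.
  void onElement(long value) {
    buffer.add(value);
  }

  // When the window fires, the whole buffer is available at once.
  long maxOnFire() {
    return Collections.max(buffer);
  }

  // Sorting needs all elements, so it is only possible with the full buffer.
  List<Long> sortedOnFire() {
    List<Long> sorted = new ArrayList<>(buffer);
    Collections.sort(sorted);
    return sorted;
  }

  public static void main(String[] args) {
    FullWindowSketch window = new FullWindowSketch();
    for (long v : new long[] {3, 9, 1, 7}) window.onElement(v);
    System.out.println(window.maxOnFire());    // prints 9
    System.out.println(window.sortedOnFire()); // prints [1, 3, 7, 9]
  }
}
```

The state here grows with the number of elements in the window, which is the price paid for being able to sort or otherwise inspect the complete window.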

apply

DataStream<Tuple2<String, Long>> input = ...;

input
    .keyBy(<key selector>)
    .window(<window assigner>)
    .apply(new MyWindowFunction());

process 

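private static class MyProcessWindowFunction
    extends ProcessWindowFunction<SensorReading, Tuple2<Long, SensorReading>, String, TimeWindow> {

  @Override
  public void process(String key,
                    Context context,
                    Iterable<SensorReading> minReadings,
                    Collector<Tuple2<Long, SensorReading>> out) {
      // Takes the first element of the window. This is the minimum only when the function is
      // combined with an incremental reduce()/aggregate() that pre-aggregates the window to a
      // single element; with a plain process() the Iterable holds every element of the window.
      SensorReading min = minReadings.iterator().next();
      // Emit the window start timestamp together with the reading.
      out.collect(new Tuple2<>(context.window().getStart(), min));
  }
}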
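The output of this function, a (window start, minimum reading) pair, can be reproduced in a plain-Java sketch. It does not use Flink: MinPerWindowSketch is a hypothetical name, a reading is simplified to a double, and the windows are assumed to be tumbling windows without an offset over non-negative timestamps, so the window start is the timestamp rounded down to a multiple of the window size.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Flink dependency): running minimum per tumbling window.
class MinPerWindowSketch {

  // Start of the tumbling window (no offset) containing a non-negative timestamp.
  static long windowStart(long timestamp, long windowSize) {
    return timestamp - (timestamp % windowSize);
  }

  // Keeps one running minimum per window, keyed by window start.
  static Map<Long, Double> minPerWindow(long[] timestamps, double[] readings, long windowSize) {
    Map<Long, Double> minByWindowStart = new TreeMap<>();
    for (int i = 0; i < timestamps.length; i++) {
      // The first reading of a window is stored; later ones are compared against it.
      minByWindowStart.merge(windowStart(timestamps[i], windowSize), readings[i], Math::min);
    }
    return minByWindowStart;
  }

  public static void main(String[] args) {
    long[] timestamps = {1_000, 4_000, 6_000, 9_000};
    double[] readings = {21.5, 19.0, 23.0, 22.5};
    System.out.println(minPerWindow(timestamps, readings, 5_000)); // prints {0=19.0, 5000=22.5}
  }
}
```

The minimum is maintained incrementally, and the window metadata is attached only when the result is emitted, which keeps the state small while still exposing the window start.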

 
