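The difference between min and minBy in Flink (and likewise max and maxBy)

The difference between min and minBy in Flink

  • I. Introduction to min and minBy
    • II. min and minBy code demo
      • III. Likewise, max and maxBy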

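I. Introduction to min and minBy

max, min, and sum compute the maximum, minimum, and sum of the specified field, while minBy and maxBy return the whole element that holds the minimum or maximum value.

Aggregations is the collective name for the aggregation functions; common ones include, but are not limited to, sum, max, and min. An aggregation is applied to a keyed stream, so a key must be specified first.

Example: the field to aggregate can be given by position or by name (in the calls below, "key" is simply a field name)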

keyedStream.sum(0);
keyedStream.sum("key");
keyedStream.min(0);
keyedStream.min("key");
keyedStream.max(0);
keyedStream.max("key");
keyedStream.minBy(0);
keyedStream.minBy("key");
keyedStream.maxBy(0);
keyedStream.maxBy("key");

II. min and minBy code demo

1. min example code

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.ArrayList;
import java.util.List;

/**
 * Author : Jackson
 * Version : 2020/4/24 & 1.0
 */

public class ReduceDemo {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Build the data source
        List<Tuple3<Integer, Integer, Integer>> data = new ArrayList<>();
        data.add(new Tuple3<>(0, 1, 0));
        data.add(new Tuple3<>(0, 1, 1));
        data.add(new Tuple3<>(0, 2, 2));
        data.add(new Tuple3<>(0, 1, 3));
        data.add(new Tuple3<>(1, 2, 5));
        data.add(new Tuple3<>(1, 2, 9));
        data.add(new Tuple3<>(1, 2, 11));
        data.add(new Tuple3<>(1, 2, 13));

        DataStreamSource<Tuple3<Integer, Integer, Integer>> items = env.fromCollection(data);
        // Key by field 0, then emit the rolling minimum of field 2
        items.keyBy(0).min(2).printToErr();

        // The job must be executed, otherwise nothing is printed
        env.execute();
    }
}

Printed output:

6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)

Process finished with exit code 0

2. minBy example code

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.ArrayList;
import java.util.List;

/**
 * Author : Jackson
 * Version : 2020/4/24 & 1.0
 */

public class ReduceDemo {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Build the data source
        List<Tuple3<Integer, Integer, Integer>> data = new ArrayList<>();
        data.add(new Tuple3<>(0, 1, 0));
        data.add(new Tuple3<>(0, 1, 1));
        data.add(new Tuple3<>(0, 2, 2));
        data.add(new Tuple3<>(0, 1, 3));
        data.add(new Tuple3<>(1, 2, 5));
        data.add(new Tuple3<>(1, 2, 9));
        data.add(new Tuple3<>(1, 2, 11));
        data.add(new Tuple3<>(1, 2, 13));

        DataStreamSource<Tuple3<Integer, Integer, Integer>> items = env.fromCollection(data);
        // Key by field 0, then emit the element holding the rolling minimum of field 2
        items.keyBy(0).minBy(2).printToErr();

        // The job must be executed, otherwise nothing is printed
        env.execute();
    }
}

Printed output:

6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)

Process finished with exit code 0

3. Analysis and comparison

Data source

    data.add(new Tuple3<>(0, 1, 0));
    data.add(new Tuple3<>(0, 1, 1));
    data.add(new Tuple3<>(0, 2, 2));
    data.add(new Tuple3<>(0, 1, 3));
    data.add(new Tuple3<>(1, 2, 5));
    data.add(new Tuple3<>(1, 2, 9));
    data.add(new Tuple3<>(1, 2, 11));
    data.add(new Tuple3<>(1, 2, 13));

min output:

6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)

minBy output:

6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (0,1,0)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)
6> (1,2,5)

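Result:

Both calls emit the running minimum of field 2. With this data the two outputs are identical, because the first element of each key already holds the minimum. In general, min only guarantees the aggregated field; the other non-key fields, such as the second field here, are not guaranteed to come from the element that holds the minimum.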
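The sample data cannot separate the two calls, so here is a minimal sketch in plain Java with no Flink dependency. The class and method names are hypothetical, and the HashMap only imitates the rolling behaviour described in this article; it is not Flink's implementation. The input is chosen so that the minimum arrives last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of rolling min / minBy. Each record is an int[3]:
// field 0 is the key, field `pos` is the aggregated field.
class MinVsMinBySketch {

    // Like min(pos): keep the first record seen for the key and overwrite only field `pos`.
    static List<String> rollingMin(List<int[]> input, int pos) {
        Map<Integer, int[]> state = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (int[] rec : input) {
            int[] cur = state.get(rec[0]);
            if (cur == null) {
                cur = rec.clone();
            } else {
                cur = cur.clone();
                cur[pos] = Math.min(cur[pos], rec[pos]);
            }
            state.put(rec[0], cur);
            out.add(Arrays.toString(cur));
        }
        return out;
    }

    // Like minBy(pos): keep the whole record that holds the smallest value of field `pos`.
    // On a tie the earlier record is kept (strict comparison).
    static List<String> rollingMinBy(List<int[]> input, int pos) {
        Map<Integer, int[]> state = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (int[] rec : input) {
            int[] cur = state.get(rec[0]);
            if (cur == null || rec[pos] < cur[pos]) {
                cur = rec.clone();
            }
            state.put(rec[0], cur);
            out.add(Arrays.toString(cur));
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> data = Arrays.asList(
                new int[]{0, 1, 3},
                new int[]{0, 2, 2},
                new int[]{0, 3, 1});
        System.out.println(rollingMin(data, 2));   // [[0, 1, 3], [0, 1, 2], [0, 1, 1]]
        System.out.println(rollingMinBy(data, 2)); // [[0, 1, 3], [0, 2, 2], [0, 3, 1]]
    }
}
```

In the first line the second field stays at 1 (taken from the first record) even though the minimum came from other records; in the second line every emitted record is one of the inputs.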

III. Likewise, max and maxBy

1. max example

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.ArrayList;
import java.util.List;

/**
 * Author : Jackson
 * Version : 2020/4/24 & 1.0
 */

public class ReduceDemo {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Build the data source
        List<Tuple3<Integer, Integer, Integer>> data = new ArrayList<>();
        data.add(new Tuple3<>(0, 1, 0));
        data.add(new Tuple3<>(0, 1, 1));
        data.add(new Tuple3<>(0, 2, 2));
        data.add(new Tuple3<>(0, 1, 3));
        data.add(new Tuple3<>(1, 2, 5));
        data.add(new Tuple3<>(1, 2, 9));
        data.add(new Tuple3<>(1, 2, 11));
        data.add(new Tuple3<>(1, 2, 13));

        DataStreamSource<Tuple3<Integer, Integer, Integer>> items = env.fromCollection(data);
        // Key by field 0, then emit the rolling maximum of field 2
        items.keyBy(0).max(2).printToErr();

        // The job must be executed, otherwise nothing is printed
        env.execute();
    }
}

Output:

6> (0,1,0)
6> (0,1,1)
6> (0,1,2)
6> (0,1,3)
6> (1,2,5)
6> (1,2,9)
6> (1,2,11)
6> (1,2,13)

Process finished with exit code 0

2. maxBy example

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.ArrayList;
import java.util.List;

/**
 * Author : Jackson
 * Version : 2020/4/24 & 1.0
 */

public class ReduceDemo {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Build the data source
        List<Tuple3<Integer, Integer, Integer>> data = new ArrayList<>();
        data.add(new Tuple3<>(0, 1, 0));
        data.add(new Tuple3<>(0, 1, 1));
        data.add(new Tuple3<>(0, 2, 2));
        data.add(new Tuple3<>(0, 1, 3));
        data.add(new Tuple3<>(1, 2, 5));
        data.add(new Tuple3<>(1, 2, 9));
        data.add(new Tuple3<>(1, 2, 11));
        data.add(new Tuple3<>(1, 2, 13));

        DataStreamSource<Tuple3<Integer, Integer, Integer>> items = env.fromCollection(data);
        // Key by field 0, then emit the element holding the rolling maximum of field 2
        items.keyBy(0).maxBy(2).printToErr();

        // The job must be executed, otherwise nothing is printed
        env.execute();
    }
}

Output:

6> (0,1,0)
6> (0,1,1)
6> (0,2,2)
6> (0,1,3)
6> (1,2,5)
6> (1,2,9)
6> (1,2,11)
6> (1,2,13)

Process finished with exit code 0

3. Analysis and comparison

Data source

        data.add(new Tuple3<>(0, 1, 0));
        data.add(new Tuple3<>(0, 1, 1));
        data.add(new Tuple3<>(0, 2, 2));
        data.add(new Tuple3<>(0, 1, 3));
        data.add(new Tuple3<>(1, 2, 5));
        data.add(new Tuple3<>(1, 2, 9));
        data.add(new Tuple3<>(1, 2, 11));
        data.add(new Tuple3<>(1, 2, 13));

max output:

6> (0,1,0)
6> (0,1,1)
6> (0,1,2)
6> (0,1,3)
6> (1,2,5)
6> (1,2,9)
6> (1,2,11)
6> (1,2,13)

maxBy output:

6> (0,1,0)
6> (0,1,1)
6> (0,2,2)
6> (0,1,3)
6> (1,2,5)
6> (1,2,9)
6> (1,2,11)
6> (1,2,13)

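Conclusion:

max, min, and sum compute the maximum, minimum, and sum of the specified field; minBy and maxBy return the whole element that holds the minimum or maximum.

Both min and minBy emit a full element. min takes the minimum of the specified field and writes it into that position, but the other fields are not guaranteed to be correct; minBy returns the element that actually holds the minimum. The same holds for max and maxBy: in the output above, max emits (0,1,2), which is not one of the input elements, while maxBy emits the actual element (0,2,2).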
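The max and maxBy outputs listed above can be reproduced with a small plain-Java sketch. It has no Flink dependency; the class name, the method name, and the byElement flag are hypothetical, and the code only imitates the behaviour described here rather than Flink's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Each record is an int[3]; field 0 is the key, field `pos` is the aggregated field.
class MaxVsMaxBySketch {

    // byElement == false imitates max(pos); byElement == true imitates maxBy(pos).
    static List<String> rollingMax(List<int[]> input, int pos, boolean byElement) {
        Map<Integer, int[]> state = new HashMap<>(); // one stored record per key
        List<String> out = new ArrayList<>();
        for (int[] rec : input) {
            int[] cur = state.get(rec[0]);
            if (cur == null) {
                cur = rec.clone();            // first record for this key
            } else if (rec[pos] > cur[pos]) {
                if (byElement) {
                    cur = rec.clone();        // maxBy: replace the whole record
                } else {
                    cur = cur.clone();
                    cur[pos] = rec[pos];      // max: overwrite one field only
                }
            }
            state.put(rec[0], cur);
            out.add("(" + cur[0] + "," + cur[1] + "," + cur[2] + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> data = Arrays.asList(
                new int[]{0, 1, 0}, new int[]{0, 1, 1}, new int[]{0, 2, 2}, new int[]{0, 1, 3},
                new int[]{1, 2, 5}, new int[]{1, 2, 9}, new int[]{1, 2, 11}, new int[]{1, 2, 13});
        // Third entry is (0,1,2): the maximum 2 with the first record's second field
        System.out.println(rollingMax(data, 2, false));
        // Third entry is (0,2,2): the input record that holds the maximum
        System.out.println(rollingMax(data, 2, true));
    }
}
```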

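In fact, Flink maintains the state behind these aggregation functions for us, and that state is never cleaned up, so in production you should avoid using Aggregations on an unbounded stream as far as possible. Also, an aggregation function can be called only once on a given keyedStream: the result is an ordinary DataStream rather than a KeyedStream, so a further aggregation requires another keyBy.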
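To illustrate why uncleaned state matters, the following plain-Java sketch uses a HashMap as a stand-in for the per-key state. That stand-in is an assumption made purely for illustration, and the class and method names are hypothetical. It shows that the number of retained entries follows the number of distinct keys, not the number of records.

```java
import java.util.HashMap;
import java.util.Map;

class KeyedStateGrowthSketch {

    // Feeds `records` values spread over `distinctKeys` keys through a rolling sum
    // and returns how many per-key entries are still held at the end.
    static int stateEntriesAfter(int records, int distinctKeys) {
        Map<Integer, Long> sumPerKey = new HashMap<>();
        for (int i = 0; i < records; i++) {
            // Nothing ever removes an entry, just as nothing clears the aggregation state
            sumPerKey.merge(i % distinctKeys, (long) i, Long::sum);
        }
        return sumPerKey.size();
    }

    public static void main(String[] args) {
        System.out.println(stateEntriesAfter(1000, 10));   // 10
        System.out.println(stateEntriesAfter(1000, 1000)); // 1000
    }
}
```

With a bounded set of keys the retained state stays small; when new keys keep arriving on an unbounded stream, it keeps growing.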

IV. reduce example

package flink42.day04;

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.ArrayList;
import java.util.List;

/**
 * Author : Jackson
 * Version : 2020/4/24 & 1.0
 *
 */

public class ReduceDemo {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Build the data source
        List<Tuple3<Integer, Integer, Integer>> data = new ArrayList<>();
        data.add(new Tuple3<>(0, 1, 0));
        data.add(new Tuple3<>(0, 1, 1));
        data.add(new Tuple3<>(0, 2, 2));
        data.add(new Tuple3<>(0, 1, 3));
        data.add(new Tuple3<>(1, 2, 5));
        data.add(new Tuple3<>(1, 2, 9));
        data.add(new Tuple3<>(1, 2, 11));
        data.add(new Tuple3<>(1, 2, 13));

        DataStreamSource<Tuple3<Integer, Integer, Integer>> items = env.fromCollection(data);
        // Rolling reduce: sum field 2 per key
        SingleOutputStreamOperator<Tuple3<Integer, Integer, Integer>> reduceres = items.keyBy(0).reduce(new ReduceFunction<Tuple3<Integer, Integer, Integer>>() {
            @Override
            public Tuple3<Integer, Integer, Integer> reduce(Tuple3<Integer, Integer, Integer> t1, Tuple3<Integer, Integer, Integer> t2) throws Exception {
                Tuple3<Integer, Integer, Integer> tuple3 = new Tuple3<>();
                // The first two fields are hard-coded to 0, so the key is not carried into the result
                tuple3.setFields(0, 0, t1.f2 + t2.f2);
                return tuple3;
            }
        });
        reduceres.printToErr().setParallelism(1);
        // The job must be executed, otherwise nothing is printed
        env.execute();
    }
}

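Output:

reduce emits a running result for every input element, so only the last line for each key is the result actually wanted: (0,0,6) for key 0 and (0,0,38) for key 1 (the reduce function sets the first two fields to 0).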

(0,1,0)
(0,0,1)
(0,0,3)
(0,0,6)
(1,2,5)
(0,0,14)
(0,0,25)
(0,0,38)

Process finished with exit code 0
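The rolling behaviour of reduce can be imitated in plain Java. This is a sketch with no Flink dependency; the class name, the method name, and the keepKey variant are hypothetical and are not part of the original demo. The first operator repeats the logic of the ReduceFunction above; the second carries the key and second field of the accumulated record forward instead of hard-coding 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Each record is an int[3]; field 0 is the key.
class RollingReduceSketch {

    static List<String> rollingReduce(List<int[]> input, BinaryOperator<int[]> fn) {
        Map<Integer, int[]> state = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (int[] rec : input) {
            int[] prev = state.get(rec[0]);                 // state is looked up by the incoming key
            int[] cur = (prev == null) ? rec : fn.apply(prev, rec);
            state.put(rec[0], cur);
            out.add("(" + cur[0] + "," + cur[1] + "," + cur[2] + ")"); // one output per input
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> data = Arrays.asList(
                new int[]{0, 1, 0}, new int[]{0, 1, 1}, new int[]{0, 2, 2}, new int[]{0, 1, 3},
                new int[]{1, 2, 5}, new int[]{1, 2, 9}, new int[]{1, 2, 11}, new int[]{1, 2, 13});

        // Same logic as the demo: first two fields fixed to 0
        BinaryOperator<int[]> asInDemo = (a, b) -> new int[]{0, 0, a[2] + b[2]};
        // Variant: keep the key and second field of the accumulated record
        BinaryOperator<int[]> keepKey = (a, b) -> new int[]{a[0], a[1], a[2] + b[2]};

        System.out.println(rollingReduce(data, asInDemo));
        // [(0,1,0), (0,0,1), (0,0,3), (0,0,6), (1,2,5), (0,0,14), (0,0,25), (0,0,38)]
        System.out.println(rollingReduce(data, keepKey));
        // [(0,1,0), (0,1,1), (0,1,3), (0,1,6), (1,2,5), (1,2,14), (1,2,25), (1,2,38)]
    }
}
```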
                   Stay hungry, keep learning
                         Jackson_MVP
