Flink Transformation Operators: Splitting Streams (Split/Side)

I. Splitting with Split...Select...

Splitting divides one stream into several streams. This first approach is based on Split...Select...

package com.lxk.test;

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.collector.selector.OutputSelector;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SplitStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.ArrayList;

public class TestSplitSelect {
    public static void main(String[] args) throws Exception{
        // Execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Input data source
        DataStreamSource<Tuple3<Integer, String, String>> source = env.fromElements(
                new Tuple3<>(1, "1", "AAA"),
                new Tuple3<>(2, "2", "AAA"),
                new Tuple3<>(3, "3", "AAA"),
                new Tuple3<>(1, "1", "BBB"),
                new Tuple3<>(2, "2", "BBB"),
                new Tuple3<>(3, "3", "BBB")
        );

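        // 1. Define the split logic
        SplitStream<Tuple3<Integer, String, String>> splitStream = source.split(new OutputSelector<Tuple3<Integer, String, String>>() {
            @Override
            public Iterable<String> select(Tuple3<Integer, String, String> value) {
                ArrayList<String> output = new ArrayList<>();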
                if (value.f2.equals("AAA")) {
                    output.add("A");

                } else if (value.f2.equals("BBB")) {
                    output.add("B");
                }
                return output;
            }
        });

        // 2. Actually extract the sub-streams
        //splitStream.print();
        //splitStream.select("A").print("Output A:");
        splitStream.select("B").print("Output B:");

        env.execute();
    }
}

Test result:

[Image 1: Flink Transformation Operators: Splitting Streams (Split/Side)]

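Notes:

  1. In Split...Select..., Split only tags the records in the stream; it does not actually divide the stream. The Select operator is what extracts the sub-streams.
  2. Split...Select... cannot be applied consecutively. Split...Select...Split is not allowed, but a chain such as Split...Select...Filter...Split is.
  3. Split...Select... is deprecated (and was removed from the DataStream API in Flink 1.12). The more flexible side output (Side-Output) mechanism, shown in the next section, is recommended instead.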
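Note 1 can be made concrete with a small library-free sketch. This is not Flink code: the class `SplitSelectModel` and its methods `tagsFor` and `select` are hypothetical names chosen for illustration, and the records are plain strings rather than tuples. It only models the idea that the selector attaches output names to each record, and that nothing is separated until a name is selected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java model of Split...Select (hypothetical names, not the Flink API).
class SplitSelectModel {

    // Mirrors the selector in the example above: name the outputs a record belongs to.
    static List<String> tagsFor(String category) {
        if (category.equals("AAA")) {
            return Collections.singletonList("A");
        } else if (category.equals("BBB")) {
            return Collections.singletonList("B");
        }
        return Collections.emptyList(); // untagged records match no select(...)
    }

    // "select": keep only the records whose tags contain the requested name.
    static List<String> select(List<String> records, String name) {
        List<String> selected = new ArrayList<>();
        for (String record : records) {
            if (tagsFor(record).contains(name)) {
                selected.add(record);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("AAA", "BBB", "AAA", "CCC");
        System.out.println(select(records, "A")); // [AAA, AAA]
        System.out.println(select(records, "B")); // [BBB]
    }
}
```

In this model the record "CCC" receives no tag, so it appears in neither selection, which matches the behavior of the selector in the example when neither branch applies.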

II. Splitting with Side-Output

package com.lxk.test;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class TestSideOutPut {
    public static void main(String[] args) throws Exception {

        // Execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Input data source
        DataStreamSource<Tuple3<Integer, String, String>> source = env.fromElements(
                new Tuple3<>(1, "1", "AAA"),
                new Tuple3<>(2, "2", "AAA"),
                new Tuple3<>(3, "3", "AAA"),
                new Tuple3<>(1, "1", "BBB"),
                new Tuple3<>(2, "2", "BBB"),
                new Tuple3<>(3, "3", "BBB")
        );

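        // 1. Define the OutputTags (the anonymous subclass {} preserves the generic type)
        OutputTag<Tuple3<Integer, String, String>> ATag = new OutputTag<Tuple3<Integer, String, String>>("A-tag") {};
        OutputTag<Tuple3<Integer, String, String>> BTag = new OutputTag<Tuple3<Integer, String, String>>("B-tag") {};

        // For other, non-tuple types, prefer passing the TypeInformation explicitly
        OutputTag<String> A_TAG = new OutputTag<>("A", TypeInformation.of(String.class));
        OutputTag<String> B_TAG = new OutputTag<>("B", TypeInformation.of(String.class));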

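        // 2. Handle the main stream and the side stream in a ProcessFunction
        SingleOutputStreamOperator<Tuple3<Integer, String, String>> processedStream =
                source.process(new ProcessFunction<Tuple3<Integer, String, String>, Tuple3<Integer, String, String>>() {
                    @Override
                    public void processElement(Tuple3<Integer, String, String> value, Context ctx, Collector<Tuple3<Integer, String, String>> out) throws Exception {

                        if (value.f2.equals("AAA")) {
                            // Side stream: emit only the matching records
                            ctx.output(ATag, value);
                        } else {
                            // Main stream
                            out.collect(value);
                        }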

                    }
                });

        // Read the main stream
        processedStream.print("Main stream output B:");
        // Read the side stream
        processedStream.getSideOutput(ATag).print("Side stream output A:");

        env.execute();
    }
}

Test result:

[Image 2: Flink Transformation Operators: Splitting Streams (Split/Side)]

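Notes:

  1. Side-Output has been available since Flink 1.3.0 and supports more flexible multi-way output.
  2. A side output can emit records to downstream operators as a side stream whose data type differs from that of the main stream, for example records matching a given condition, erroneous records, or late records.
  3. Side-Output sends records to a side OutputTag from inside a ProcessFunction, by calling ctx.output(tag, value).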
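Note 2 says that a side stream may carry a different type than the main stream. The following library-free sketch models that idea in plain Java. It is not the Flink API: `SideOutputModel`, `Tag`, `Outputs`, `REJECTED`, and `process` are hypothetical names for illustration, and the input data is made up. The main output holds parsed integers, while the side output holds error messages as strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of typed side outputs (hypothetical names, not the Flink API).
class SideOutputModel {

    // A tag carries an id plus a compile-time element type, similar in spirit to OutputTag<T>.
    static final class Tag<T> {
        final String id;
        Tag(String id) { this.id = id; }
    }

    // Collects the main output (Integer here) and any number of side outputs.
    static final class Outputs {
        final List<Integer> main = new ArrayList<>();
        private final Map<String, List<Object>> side = new HashMap<>();

        // Counterpart of ctx.output(tag, value) in the example above.
        <T> void output(Tag<T> tag, T value) {
            side.computeIfAbsent(tag.id, k -> new ArrayList<>()).add(value);
        }

        // Counterpart of getSideOutput(tag): returns the records emitted under this tag.
        @SuppressWarnings("unchecked")
        <T> List<T> getSideOutput(Tag<T> tag) {
            // Safe as long as each id is only ever used with one element type.
            return (List<T>) (List<?>) side.getOrDefault(tag.id, new ArrayList<>());
        }
    }

    static final Tag<String> REJECTED = new Tag<>("rejected");

    // Numbers go to the main output; unparsable input goes to the String side output.
    static Outputs process(List<String> inputs) {
        Outputs out = new Outputs();
        for (String input : inputs) {
            try {
                out.main.add(Integer.parseInt(input));
            } catch (NumberFormatException e) {
                out.output(REJECTED, "not a number: " + input);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Outputs out = process(Arrays.asList("1", "x", "3"));
        System.out.println(out.main);                    // [1, 3]
        System.out.println(out.getSideOutput(REJECTED)); // [not a number: x]
    }
}
```

The design point this sketch tries to capture is that the tag, not the operator, fixes the element type of each side stream, so one processing step can feed several differently typed outputs.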
     
