A Look at the connect Operation on Flink's DataStream

This article takes a look at the connect operation on Flink's DataStream.

DataStream.connect

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/datastream/DataStream.java

@Public
public class DataStream<T> {

    //......

    public <R> ConnectedStreams<T, R> connect(DataStream<R> dataStream) {
        return new ConnectedStreams<>(environment, this, dataStream);
    }

    @PublicEvolving
    public <R> BroadcastConnectedStream<T, R> connect(BroadcastStream<R> broadcastStream) {
        return new BroadcastConnectedStream<>(
                environment,
                this,
                Preconditions.checkNotNull(broadcastStream),
                broadcastStream.getBroadcastStateDescriptor());
    }

    //......
}
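  • DataStream's connect operation creates either a ConnectedStreams or a BroadcastConnectedStream. Both take two type parameters, so the two streams being connected are not required to have elements of the same type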

ConnectedStreams

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/datastream/ConnectedStreams.java

@Public
public class ConnectedStreams<IN1, IN2> {

    protected final StreamExecutionEnvironment environment;
    protected final DataStream<IN1> inputStream1;
    protected final DataStream<IN2> inputStream2;

    protected ConnectedStreams(StreamExecutionEnvironment env, DataStream<IN1> input1, DataStream<IN2> input2) {
        this.environment = requireNonNull(env);
        this.inputStream1 = requireNonNull(input1);
        this.inputStream2 = requireNonNull(input2);
    }

    public StreamExecutionEnvironment getExecutionEnvironment() {
        return environment;
    }

    public DataStream<IN1> getFirstInput() {
        return inputStream1;
    }

    public DataStream<IN2> getSecondInput() {
        return inputStream2;
    }

    public TypeInformation<IN1> getType1() {
        return inputStream1.getType();
    }

    public TypeInformation<IN2> getType2() {
        return inputStream2.getType();
    }

    public ConnectedStreams<IN1, IN2> keyBy(int keyPosition1, int keyPosition2) {
        return new ConnectedStreams<>(this.environment, inputStream1.keyBy(keyPosition1),
                inputStream2.keyBy(keyPosition2));
    }

    public ConnectedStreams<IN1, IN2> keyBy(int[] keyPositions1, int[] keyPositions2) {
        return new ConnectedStreams<>(environment, inputStream1.keyBy(keyPositions1),
                inputStream2.keyBy(keyPositions2));
    }

    public ConnectedStreams<IN1, IN2> keyBy(String field1, String field2) {
        return new ConnectedStreams<>(environment, inputStream1.keyBy(field1),
                inputStream2.keyBy(field2));
    }

    public ConnectedStreams<IN1, IN2> keyBy(String[] fields1, String[] fields2) {
        return new ConnectedStreams<>(environment, inputStream1.keyBy(fields1),
                inputStream2.keyBy(fields2));
    }

    public ConnectedStreams<IN1, IN2> keyBy(KeySelector<IN1, ?> keySelector1, KeySelector<IN2, ?> keySelector2) {
        return new ConnectedStreams<>(environment, inputStream1.keyBy(keySelector1),
                inputStream2.keyBy(keySelector2));
    }

    public <KEY> ConnectedStreams<IN1, IN2> keyBy(
            KeySelector<IN1, KEY> keySelector1,
            KeySelector<IN2, KEY> keySelector2,
            TypeInformation<KEY> keyType) {
        return new ConnectedStreams<>(
            environment,
            inputStream1.keyBy(keySelector1, keyType),
            inputStream2.keyBy(keySelector2, keyType));
    }

    public <R> SingleOutputStreamOperator<R> map(CoMapFunction<IN1, IN2, R> coMapper) {

        TypeInformation<R> outTypeInfo = TypeExtractor.getBinaryOperatorReturnType(
            coMapper,
            CoMapFunction.class,
            0,
            1,
            2,
            TypeExtractor.NO_INDEX,
            getType1(),
            getType2(),
            Utils.getCallLocationName(),
            true);

        return transform("Co-Map", outTypeInfo, new CoStreamMap<>(inputStream1.clean(coMapper)));

    }

    public <R> SingleOutputStreamOperator<R> flatMap(
            CoFlatMapFunction<IN1, IN2, R> coFlatMapper) {

        TypeInformation<R> outTypeInfo = TypeExtractor.getBinaryOperatorReturnType(
            coFlatMapper,
            CoFlatMapFunction.class,
            0,
            1,
            2,
            TypeExtractor.NO_INDEX,
            getType1(),
            getType2(),
            Utils.getCallLocationName(),
            true);

        return transform("Co-Flat Map", outTypeInfo, new CoStreamFlatMap<>(inputStream1.clean(coFlatMapper)));
    }

    @PublicEvolving
    public <R> SingleOutputStreamOperator<R> process(
            CoProcessFunction<IN1, IN2, R> coProcessFunction) {

        TypeInformation<R> outTypeInfo = TypeExtractor.getBinaryOperatorReturnType(
            coProcessFunction,
            CoProcessFunction.class,
            0,
            1,
            2,
            TypeExtractor.NO_INDEX,
            getType1(),
            getType2(),
            Utils.getCallLocationName(),
            true);

        return process(coProcessFunction, outTypeInfo);
    }

    @Internal
    public <R> SingleOutputStreamOperator<R> process(
            CoProcessFunction<IN1, IN2, R> coProcessFunction,
            TypeInformation<R> outputType) {

        TwoInputStreamOperator<IN1, IN2, R> operator;

        if ((inputStream1 instanceof KeyedStream) && (inputStream2 instanceof KeyedStream)) {
            operator = new KeyedCoProcessOperator<>(inputStream1.clean(coProcessFunction));
        } else {
            operator = new CoProcessOperator<>(inputStream1.clean(coProcessFunction));
        }

        return transform("Co-Process", outputType, operator);
    }

    @PublicEvolving
    public <R> SingleOutputStreamOperator<R> transform(String functionName,
            TypeInformation<R> outTypeInfo,
            TwoInputStreamOperator<IN1, IN2, R> operator) {

        // read the output type of the input Transforms to coax out errors about MissingTypeInfo
        inputStream1.getType();
        inputStream2.getType();

        TwoInputTransformation<IN1, IN2, R> transform = new TwoInputTransformation<>(
                inputStream1.getTransformation(),
                inputStream2.getTransformation(),
                functionName,
                operator,
                outTypeInfo,
                environment.getParallelism());

        if (inputStream1 instanceof KeyedStream && inputStream2 instanceof KeyedStream) {
            KeyedStream<IN1, ?> keyedInput1 = (KeyedStream<IN1, ?>) inputStream1;
            KeyedStream<IN2, ?> keyedInput2 = (KeyedStream<IN2, ?>) inputStream2;

            TypeInformation<?> keyType1 = keyedInput1.getKeyType();
            TypeInformation<?> keyType2 = keyedInput2.getKeyType();
            if (!(keyType1.canEqual(keyType2) && keyType1.equals(keyType2))) {
                throw new UnsupportedOperationException("Key types if input KeyedStreams " +
                        "don't match: " + keyType1 + " and " + keyType2 + ".");
            }

            transform.setStateKeySelectors(keyedInput1.getKeySelector(), keyedInput2.getKeySelector());
            transform.setStateKeyType(keyType1);
        }

        @SuppressWarnings({ "unchecked", "rawtypes" })
        SingleOutputStreamOperator<R> returnStream = new SingleOutputStreamOperator(environment, transform);

        getExecutionEnvironment().addOperator(transform);

        return returnStream;
    }
}
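  • ConnectedStreams provides keyBy methods for specifying the key selector of each of the two streams, along with the map, flatMap, process and transform operations; the first three all end up calling transform
  • transform takes an operator of type TwoInputStreamOperator, wraps it in a TwoInputTransformation, and returns a SingleOutputStreamOperator
  • map takes a CoMapFunction, flatMap takes a CoFlatMapFunction, and process takes a CoProcessFunction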

CoMapFunction

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/functions/co/CoMapFunction.java

@Public
public interface CoMapFunction<IN1, IN2, OUT> extends Function, Serializable {

    OUT map1(IN1 value) throws Exception;

    OUT map2(IN2 value) throws Exception;
}
  • CoMapFunction extends Function and defines two methods, map1 and map2, one for each input stream
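
To make the two-method shape concrete, here is a minimal sketch in plain Java with no Flink dependency. The nested CoMapFunction interface is a local stand-in that only mirrors the signatures quoted above, and the run driver is hypothetical: it drains the first input and then the second, whereas Flink calls map1 and map2 as records arrive on either input, so the real interleaving is not fixed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CoMapSketch {

    // Local stand-in mirroring the shape of Flink's CoMapFunction (not the Flink type).
    interface CoMapFunction<IN1, IN2, OUT> {
        OUT map1(IN1 value) throws Exception;
        OUT map2(IN2 value) throws Exception;
    }

    // Hypothetical driver: applies map1 to every element of the first input,
    // then map2 to every element of the second input.
    static <IN1, IN2, OUT> List<OUT> run(List<IN1> first, List<IN2> second,
                                         CoMapFunction<IN1, IN2, OUT> fn) throws Exception {
        List<OUT> out = new ArrayList<>();
        for (IN1 v : first) {
            out.add(fn.map1(v));
        }
        for (IN2 v : second) {
            out.add(fn.map2(v));
        }
        return out;
    }

    // Two inputs of different element types (Integer and String) mapped to one output type.
    static List<String> demo() {
        CoMapFunction<Integer, String, String> fn = new CoMapFunction<Integer, String, String>() {
            @Override public String map1(Integer value) { return "int:" + value; }
            @Override public String map2(String value) { return "str:" + value; }
        };
        try {
            return run(Arrays.asList(1, 2), Arrays.asList("a"), fn);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [int:1, int:2, str:a]
    }
}
```

The point of the sketch is that the two inputs keep their own element types and only the output type has to be shared.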

CoFlatMapFunction

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/functions/co/CoFlatMapFunction.java

@Public
public interface CoFlatMapFunction<IN1, IN2, OUT> extends Function, Serializable {

    void flatMap1(IN1 value, Collector<OUT> out) throws Exception;

    void flatMap2(IN2 value, Collector<OUT> out) throws Exception;
}
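  • CoFlatMapFunction extends Function and defines two methods, flatMap1 and flatMap2. Unlike CoMapFunction's map1 and map2, they take an additional Collector parameter and return void, so each input element can emit zero, one, or several output elements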
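
The effect of the extra Collector parameter can be sketched in plain Java as follows. Both Collector and CoFlatMapFunction here are local stand-ins that mirror the signatures quoted above (Flink's real Collector also declares close()), and SplitOrFilter is a made-up function used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class CoFlatMapSketch {

    // Local stand-in for Flink's Collector; only collect() is modelled here.
    interface Collector<T> {
        void collect(T record);
    }

    // Local stand-in mirroring the shape of Flink's CoFlatMapFunction.
    interface CoFlatMapFunction<IN1, IN2, OUT> {
        void flatMap1(IN1 value, Collector<OUT> out) throws Exception;
        void flatMap2(IN2 value, Collector<OUT> out) throws Exception;
    }

    // Hypothetical function: splits sentences from the first input into words,
    // and forwards only even numbers from the second input.
    static class SplitOrFilter implements CoFlatMapFunction<String, Integer, String> {
        @Override
        public void flatMap1(String value, Collector<String> out) {
            for (String word : value.split(" ")) {
                out.collect(word);              // one input element, several outputs
            }
        }

        @Override
        public void flatMap2(Integer value, Collector<String> out) {
            if (value % 2 == 0) {
                out.collect("even:" + value);   // odd values produce no output at all
            }
        }
    }

    static List<String> demo() {
        List<String> out = new ArrayList<>();
        SplitOrFilter fn = new SplitOrFilter();
        fn.flatMap1("hello flink", out::add);
        fn.flatMap2(3, out::add);
        fn.flatMap2(4, out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [hello, flink, even:4]
    }
}
```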

CoProcessFunction

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/functions/co/CoProcessFunction.java

@PublicEvolving
public abstract class CoProcessFunction<IN1, IN2, OUT> extends AbstractRichFunction {

    private static final long serialVersionUID = 1L;

    public abstract void processElement1(IN1 value, Context ctx, Collector<OUT> out) throws Exception;

    public abstract void processElement2(IN2 value, Context ctx, Collector<OUT> out) throws Exception;

    public void onTimer(long timestamp, OnTimerContext ctx, Collector<OUT> out) throws Exception {}

    public abstract class Context {

        public abstract Long timestamp();

        public abstract TimerService timerService();

        public abstract <X> void output(OutputTag<X> outputTag, X value);
    }

    public abstract class OnTimerContext extends Context {
        /**
         * The {@link TimeDomain} of the firing timer.
         */
        public abstract TimeDomain timeDomain();
    }
}
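  • CoProcessFunction extends AbstractRichFunction and defines the processElement1 and processElement2 methods; unlike the methods of CoFlatMapFunction, these two take an additional Context parameter
  • CoProcessFunction defines Context and OnTimerContext. processElement1 and processElement2 receive a Context, which provides the timestamp, timerService and output methods
  • Another difference from CoFlatMapFunction is that CoProcessFunction can register timers through the TimerService and implement the corresponding logic in the onTimer method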
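
The register-then-fire flow can be illustrated with a heavily simplified model in plain Java. Everything in it is an assumption made for illustration: Timers stands in for the TimerService and tracks only event-time timestamps, CoProcess is a cut-down shape of CoProcessFunction without Context, and advanceWatermark is a hypothetical driver that fires timers once the watermark reaches them. In Flink itself, as far as I know, registering timers additionally requires both connected streams to be keyed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class CoProcessSketch {

    // Hypothetical stand-in for a timer service: a sorted set of event-time timestamps.
    static class Timers {
        final TreeSet<Long> pending = new TreeSet<>();

        void registerEventTimeTimer(long time) {
            pending.add(time);
        }
    }

    // Cut-down shape of a CoProcessFunction: two element callbacks plus onTimer.
    abstract static class CoProcess<IN1, IN2, OUT> {
        abstract void processElement1(IN1 value, Timers timers, List<OUT> out);
        abstract void processElement2(IN2 value, Timers timers, List<OUT> out);
        void onTimer(long timestamp, List<OUT> out) {}
    }

    // Fires every pending timer whose timestamp is <= the watermark, in ascending order.
    static <O> void advanceWatermark(long watermark, Timers timers,
                                     CoProcess<?, ?, O> fn, List<O> out) {
        while (!timers.pending.isEmpty() && timers.pending.first() <= watermark) {
            fn.onTimer(timers.pending.pollFirst(), out);
        }
    }

    static List<String> demo() {
        List<String> out = new ArrayList<>();
        Timers timers = new Timers();
        CoProcess<String, Long, String> fn = new CoProcess<String, Long, String>() {
            @Override void processElement1(String value, Timers t, List<String> o) {
                o.add("data:" + value);               // first input: emit directly
            }
            @Override void processElement2(Long deadline, Timers t, List<String> o) {
                t.registerEventTimeTimer(deadline);   // second input: schedule a timer
            }
            @Override void onTimer(long timestamp, List<String> o) {
                o.add("timer@" + timestamp);
            }
        };
        fn.processElement1("a", timers, out);
        fn.processElement2(100L, timers, out);
        advanceWatermark(50L, timers, fn, out);    // nothing fires: 100 > 50
        fn.processElement1("b", timers, out);
        advanceWatermark(100L, timers, fn, out);   // the timer registered for 100 fires
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [data:a, data:b, timer@100]
    }
}
```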

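Summary

  • DataStream's connect operation creates either a ConnectedStreams or a BroadcastConnectedStream. Both take two type parameters, so the two streams being connected are not required to have elements of the same type
  • ConnectedStreams provides keyBy methods for specifying the key selector of each of the two streams, along with the map, flatMap, process and transform operations; the first three all end up calling transform. transform takes an operator of type TwoInputStreamOperator and returns a SingleOutputStreamOperator. map takes a CoMapFunction, flatMap takes a CoFlatMapFunction, and process takes a CoProcessFunction
  • Compared with CoMapFunction's map1 and map2, CoFlatMapFunction's flatMap1 and flatMap2 take an additional Collector parameter. CoProcessFunction defines processElement1 and processElement2, which, compared with CoFlatMapFunction's methods, take an additional Context parameter. CoProcessFunction can also register timers through the TimerService and implement the corresponding logic in the onTimer method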

doc

  • DataStream Transformations
