Flink DataStream Broadcast State Pattern


In the earlier article on State we covered Operator State: when a job is restored with a changed parallelism, Operator State can be redistributed across the new parallel tasks either by even-split redistribution or by union redistribution.

Operator State also comes in a third form: the broadcast state pattern (Broadcast State).

Broadcast State was introduced to support use cases where records from one stream need to be broadcast to all downstream tasks, where they are stored locally and used to process every incoming element on another stream. A natural fit for broadcast state is, for example, a low-throughput stream containing a set of rules that we want to evaluate against all elements coming from another stream.

With those use cases in mind, Broadcast State differs from the other kinds of Operator State in that:

  • It has a map format (it is a MapState)

  • It is only available to operators whose inputs are one broadcast stream and one non-broadcast stream

  • Such an operator can have multiple broadcast states with different names

A Keyed Stream or a Non-Keyed Stream is connected to a BroadcastStream by calling connect() on the non-broadcast stream, passing the BroadcastStream as the argument. This returns a BroadcastConnectedStream, whose process() method is where we implement our logic. If a Keyed Stream is connected to the broadcast stream, the argument to process() must be a KeyedBroadcastProcessFunction; if a Non-Keyed Stream is connected, the argument must be a BroadcastProcessFunction.
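Before the full Flink examples, the core mechanics of the two callbacks can be illustrated with a plain-Java sketch (no Flink involved): processBroadcastElement fills a map keyed by userId, and processElement looks the key up and joins. The class and method names here deliberately mimic the Flink callbacks, but everything in this sketch is ordinary Java, and the User/Order classes are simplified stand-ins for the POJOs used below.

```java
import java.util.HashMap;
import java.util.Map;

// Flink-free sketch of what a (Keyed)BroadcastProcessFunction does.
public class BroadcastJoinSketch {

    static class User {
        final String userId;
        final String name;
        User(String userId, String name) { this.userId = userId; this.name = name; }
    }

    static class Order {
        final String userId;
        final String orderId;
        Order(String userId, String orderId) { this.userId = userId; this.orderId = orderId; }
    }

    // Plays the role of ctx.getBroadcastState(descriptor): a map that
    // processBroadcastElement writes and processElement only reads.
    private final Map<String, User> broadcastState = new HashMap<>();

    // Mirrors processBroadcastElement: each task stores the broadcast record locally.
    void processBroadcastElement(User user) {
        broadcastState.put(user.userId, user);
    }

    // Mirrors processElement: join the order against the stored user, if present.
    String processElement(Order order) {
        User user = broadcastState.get(order.userId);
        return user == null ? null : order.orderId + " -> " + user.name;
    }

    public static void main(String[] args) {
        BroadcastJoinSketch fn = new BroadcastJoinSketch();
        fn.processBroadcastElement(new User("u1", "alice"));
        System.out.println(fn.processElement(new Order("u1", "o42"))); // prints "o42 -> alice"
        System.out.println(fn.processElement(new Order("u2", "o43"))); // no broadcast record yet, prints "null"
    }
}
```

In real Flink, the map is a fault-tolerant BroadcastState that every parallel task holds a copy of, and processElement only gets read-only access so that all tasks stay consistent; the sketch only shows the join logic.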

1. Keyed Stream connected to a broadcast stream:

public class KeyedBroadcastStream {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");

        SingleOutputStreamOperator<User> user = env
                .addSource(new FlinkKafkaConsumer010<String>("user", new SimpleStringSchema(), p))
                .map((MapFunction<String, User>) value -> new Gson().fromJson(value, User.class));

        user.print("user: ");

        KeyedStream<Order, String> order = env
                .addSource(new FlinkKafkaConsumer010<String>("order", new SimpleStringSchema(), p))
                .map((MapFunction<String, Order>) value -> new Gson().fromJson(value, Order.class))
                .keyBy((KeySelector<Order, String>) value -> value.userId);

        order.print("order: ");

        MapStateDescriptor<String, User> descriptor = new MapStateDescriptor<String, User>("user", String.class, User.class);
        BroadcastStream<User> broadcast = user.broadcast(descriptor);

        order
                .connect(broadcast)
                .process(new KeyedBroadcastProcessFunction<String, Order, User, String>() {
                    @Override
                    public void processElement(Order value, ReadOnlyContext ctx, Collector<String> out) throws Exception {
                        ReadOnlyBroadcastState<String, User> broadcastState = ctx.getBroadcastState(descriptor);
                        // look up the broadcast value for this order's userId
                        User user = broadcastState.get(value.userId);
                        if (user != null) {
                            Tuple8<String, String, String, Long, String, String, String, Long> result = new Tuple8<>(
                                    value.userId,
                                    value.orderId,
                                    value.price,
                                    value.timestamp,
                                    user.name,
                                    user.age,
                                    user.sex,
                                    user.createTime
                            );
                            String s = result.toString();
                            out.collect(s);
                        }
                    }

                    @Override
                    public void processBroadcastElement(User value, Context ctx, Collector<String> out) throws Exception {
                        BroadcastState<String, User> broadcastState = ctx.getBroadcastState(descriptor);
                        broadcastState.put(value.userId, value);
                    }
                })
                .print("");

        env.execute("broadcast: ");
    }
}

2. Non-Keyed Stream connected to a broadcast stream:

public class BroadcastStream {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");

        SingleOutputStreamOperator<User> user = env
                .addSource(new FlinkKafkaConsumer010<String>("user", new SimpleStringSchema(), p))
                .map(new MapFunction<String, User>() {
                    @Override
                    public User map(String value) throws Exception {
                        return new Gson().fromJson(value, User.class);
                    }
                });

        user.print("user: ");

        SingleOutputStreamOperator<Order> order = env
                .addSource(new FlinkKafkaConsumer010<String>("order", new SimpleStringSchema(), p))
                .map(new MapFunction<String, Order>() {
                    @Override
                    public Order map(String value) throws Exception {
                        return new Gson().fromJson(value, Order.class);
                    }
                });

        order.print("order: ");

        MapStateDescriptor<String, User> descriptor = new MapStateDescriptor<String, User>("user", String.class, User.class);
        // fully qualified because this example class is itself named BroadcastStream
        org.apache.flink.streaming.api.datastream.BroadcastStream<User> broadcast = user.broadcast(descriptor);
        BroadcastConnectedStream<Order, User> connect = order.connect(broadcast);

        connect
                .process(new BroadcastProcessFunction<Order, User, String>() {
                    @Override
                    public void processElement(Order value, ReadOnlyContext ctx, Collector<String> out) throws Exception {
                        ReadOnlyBroadcastState<String, User> broadcastState = ctx.getBroadcastState(descriptor);
                        // look up the broadcast value for this order's userId
                        User user = broadcastState.get(value.userId);
                        if (user != null) {
                            Tuple8<String, String, String, Long, String, String, String, Long> result = new Tuple8<>(
                                    value.userId,
                                    value.orderId,
                                    value.price,
                                    value.timestamp,
                                    user.name,
                                    user.age,
                                    user.sex,
                                    user.createTime
                            );
                            String s = result.toString();
                            out.collect(s);
                        }
                    }

                    @Override
                    public void processBroadcastElement(User value, Context ctx, Collector<String> out) throws Exception {
                        BroadcastState<String, User> broadcastState = ctx.getBroadcastState(descriptor);
                        broadcastState.put(value.userId, value);
                    }
                })
                .print("result: ");

        env.execute("broadcast: ");
    }
}
