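Earlier we described Operator State: on recovery, the parallelism can be changed and the Operator State redistributed, either by even-split redistribution or by union redistribution, to restore the parallel tasks.
Operator State also offers another mode, Broadcast State.
Broadcast state was introduced to support use cases in which some data from one stream must be broadcast to all downstream tasks, where it is stored locally and used to process every incoming element of another stream. A natural fit is a low-throughput stream carrying a set of rules that we want to evaluate against every element of another stream.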
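Given this kind of use case, broadcast state differs from other operator state in that:
It has a map format (it is a MapState).
It is available only to specific operators that take a broadcast stream and a non-broadcast stream as inputs.
Such an operator can have multiple broadcast states with different names.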
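To make these properties concrete, here is a plain-Java model of the pattern. It is not Flink code, and every name in it (`BroadcastStateModel`, `Task`, `broadcast`, `process`) is invented for illustration. Each simulated parallel task keeps its own map-shaped states, looked up by name. A broadcast element is delivered to every task, while a regular element goes to exactly one task and is evaluated against the rules stored there.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the broadcast state pattern (NOT the Flink API).
// All class and method names are hypothetical.
class BroadcastStateModel {

    // One simulated parallel task holding its own local copy of the state.
    static class Task {
        // Several broadcast states can coexist under different names;
        // each one is a map (here: rule id -> threshold).
        final Map<String, Map<String, Integer>> statesByName = new HashMap<>();

        void onBroadcast(String stateName, String ruleId, int threshold) {
            statesByName.computeIfAbsent(stateName, n -> new TreeMap<>()).put(ruleId, threshold);
        }

        // Evaluate a regular element against every rule stored locally.
        List<String> onElement(String stateName, int value) {
            List<String> matches = new ArrayList<>();
            Map<String, Integer> rules = statesByName.getOrDefault(stateName, new TreeMap<>());
            for (Map.Entry<String, Integer> rule : rules.entrySet()) {
                if (value > rule.getValue()) {
                    matches.add(rule.getKey());
                }
            }
            return matches;
        }
    }

    final List<Task> tasks = new ArrayList<>();

    BroadcastStateModel(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            tasks.add(new Task());
        }
    }

    // A broadcast element reaches every task.
    void broadcast(String stateName, String ruleId, int threshold) {
        for (Task task : tasks) {
            task.onBroadcast(stateName, ruleId, threshold);
        }
    }

    // A regular element is routed to exactly one task.
    List<String> process(int taskIndex, String stateName, int value) {
        return tasks.get(taskIndex).onElement(stateName, value);
    }

    public static void main(String[] args) {
        BroadcastStateModel model = new BroadcastStateModel(3);
        model.broadcast("rules", "high", 100);
        model.broadcast("rules", "low", 10);
        System.out.println(model.process(0, "rules", 50));  // [low]
        System.out.println(model.process(2, "rules", 500)); // [high, low]
    }
}
```

Any task index gives the same answer for the same value, because every task received the same rules. That is the property the broadcast pattern relies on.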
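To connect a Keyed Stream or a Non-Keyed Stream with a BroadcastStream, call connect() on the non-broadcast stream and pass the BroadcastStream as the argument. This returns a BroadcastConnectedStream, on which we call process() to implement our logic. If a Keyed Stream is connected to the broadcast stream, the argument to process() must be a KeyedBroadcastProcessFunction; if a Non-Keyed Stream is connected to the broadcast stream, the argument is a BroadcastProcessFunction.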
1. Example: connecting a Keyed Stream to a broadcast stream:
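import java.util.Properties;

import com.google.gson.Gson;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.state.BroadcastState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ReadOnlyBroadcastState;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple8;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.util.Collector;

// User and Order are POJOs defined elsewhere; they are not shown here.
public class KeyedBroadcastStream {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");

        // Broadcast side: user records parsed from JSON.
        SingleOutputStreamOperator<User> user = env
                .addSource(new FlinkKafkaConsumer010<String>("user", new SimpleStringSchema(), p))
                .map((MapFunction<String, User>) value -> new Gson().fromJson(value, User.class));
        user.print("user: ");

        // Non-broadcast side: orders parsed from JSON and keyed by userId.
        KeyedStream<Order, String> order = env
                .addSource(new FlinkKafkaConsumer010<String>("order", new SimpleStringSchema(), p))
                .map((MapFunction<String, Order>) value -> new Gson().fromJson(value, Order.class))
                .keyBy((KeySelector<Order, String>) value -> value.userId);
        order.print("order: ");

        // The descriptor names the broadcast state and fixes its key and value types.
        MapStateDescriptor<String, User> descriptor = new MapStateDescriptor<String, User>("user", String.class, User.class);
        org.apache.flink.streaming.api.datastream.BroadcastStream<User> broadcast = user.broadcast(descriptor);

        order
                .connect(broadcast)
                .process(new KeyedBroadcastProcessFunction<String, Order, User, String>() {
                    @Override
                    public void processElement(Order value, ReadOnlyContext ctx, Collector<String> out) throws Exception {
                        // The non-broadcast side gets read-only access to the broadcast state.
                        ReadOnlyBroadcastState<String, User> broadcastState = ctx.getBroadcastState(descriptor);
                        // Look up the value stored in the broadcast state for this key.
                        User user = broadcastState.get(value.userId);
                        if (user != null) {
                            Tuple8<String, String, String, Long, String, String, String, Long> result = new Tuple8<>(
                                    value.userId,
                                    value.orderId,
                                    value.price,
                                    value.timestamp,
                                    user.name,
                                    user.age,
                                    user.sex,
                                    user.createTime
                            );
                            String s = result.toString();
                            out.collect(s);
                        }
                    }

                    @Override
                    public void processBroadcastElement(User value, Context ctx, Collector<String> out) throws Exception {
                        // The broadcast side may write: store each user under its userId.
                        BroadcastState<String, User> broadcastState = ctx.getBroadcastState(descriptor);
                        broadcastState.put(value.userId, value);
                    }
                })
                .print("");

        env.execute("broadcast: ");
    }
}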
2. Example: connecting a Non-Keyed Stream to a broadcast stream:
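import java.util.Properties;

import com.google.gson.Gson;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.state.BroadcastState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ReadOnlyBroadcastState;
import org.apache.flink.api.java.tuple.Tuple8;
import org.apache.flink.streaming.api.datastream.BroadcastConnectedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.util.Collector;

// User and Order are POJOs defined elsewhere; they are not shown here.
// This class shares its simple name with Flink's BroadcastStream, so the Flink type
// is referenced by its fully qualified name below.
public class BroadcastStream {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");

        // Broadcast side: user records parsed from JSON.
        SingleOutputStreamOperator<User> user = env
                .addSource(new FlinkKafkaConsumer010<String>("user", new SimpleStringSchema(), p))
                .map(new MapFunction<String, User>() {
                    @Override
                    public User map(String value) throws Exception {
                        return new Gson().fromJson(value, User.class);
                    }
                });
        user.print("user: ");

        // Non-broadcast side: orders parsed from JSON, not keyed.
        SingleOutputStreamOperator<Order> order = env
                .addSource(new FlinkKafkaConsumer010<String>("order", new SimpleStringSchema(), p))
                .map(new MapFunction<String, Order>() {
                    @Override
                    public Order map(String value) throws Exception {
                        return new Gson().fromJson(value, Order.class);
                    }
                });
        order.print("order: ");

        // The descriptor names the broadcast state and fixes its key and value types.
        MapStateDescriptor<String, User> descriptor = new MapStateDescriptor<String, User>("user", String.class, User.class);
        org.apache.flink.streaming.api.datastream.BroadcastStream<User> broadcast = user.broadcast(descriptor);

        BroadcastConnectedStream<Order, User> connect = order.connect(broadcast);
        connect
                .process(new BroadcastProcessFunction<Order, User, String>() {
                    @Override
                    public void processElement(Order value, ReadOnlyContext ctx, Collector<String> out) throws Exception {
                        // The non-broadcast side gets read-only access to the broadcast state.
                        ReadOnlyBroadcastState<String, User> broadcastState = ctx.getBroadcastState(descriptor);
                        // Look up the value stored in the broadcast state for this key.
                        User user = broadcastState.get(value.userId);
                        if (user != null) {
                            Tuple8<String, String, String, Long, String, String, String, Long> result = new Tuple8<>(
                                    value.userId,
                                    value.orderId,
                                    value.price,
                                    value.timestamp,
                                    user.name,
                                    user.age,
                                    user.sex,
                                    user.createTime
                            );
                            String s = result.toString();
                            out.collect(s);
                        }
                    }

                    @Override
                    public void processBroadcastElement(User value, Context ctx, Collector<String> out) throws Exception {
                        // The broadcast side may write: store each user under its userId.
                        BroadcastState<String, User> broadcastState = ctx.getBroadcastState(descriptor);
                        broadcastState.put(value.userId, value);
                    }
                })
                .print("result: ");

        env.execute("broadcast: ");
    }
}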
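Both examples emit a result only when the matching user is already in the broadcast state (the `if (user != null)` check). An order that arrives before its user record therefore produces no output. The plain-Java sketch below models that behaviour without Flink; `ArrivalOrderSketch`, its methods, and the sample ids are hypothetical and only mirror the two callbacks above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the lookup logic in the examples above (NOT the Flink API).
class ArrivalOrderSketch {
    private final Map<String, String> usersById = new HashMap<>();
    private final List<String> output = new ArrayList<>();

    // Mirrors processBroadcastElement: store the user under its id.
    void onUser(String userId, String name) {
        usersById.put(userId, name);
    }

    // Mirrors processElement: emit only if the user is already known.
    void onOrder(String orderId, String userId) {
        String name = usersById.get(userId);
        if (name != null) {
            output.add(orderId + ":" + name);
        }
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        ArrivalOrderSketch sketch = new ArrivalOrderSketch();
        sketch.onOrder("o1", "u1");      // user not seen yet: nothing is emitted
        sketch.onUser("u1", "alice");
        sketch.onOrder("o2", "u1");      // user is known: a result is emitted
        System.out.println(sketch.output()); // [o2:alice]
    }
}
```

If early orders must not be lost, the job needs some additional handling for them; the examples in this post do not attempt that.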