This example integrates Flink with Kafka so that when the program fails, it restarts automatically and recovers from the last checkpoint.
A natural question: if the program crashes after a checkpoint and then restarts, won't the data read since that checkpoint be re-read and re-computed? It is indeed re-read, but operator state is rolled back to the same checkpoint, so re-computing it does not by itself inflate the result. With CheckpointingMode.AT_LEAST_ONCE (used below), however, checkpoint barriers are not aligned, so some records can still be reflected twice in the state; choose CheckpointingMode.EXACTLY_ONCE if that matters.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Take a checkpoint every 30 seconds
env.enableCheckpointing(30000);
// On failure, restart up to 3 times, waiting 5 seconds between attempts
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, 5000));
// Store checkpoint state in HDFS
env.setStateBackend(new FsStateBackend("hdfs://linux01:9000/it"));
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
// Keep the externalized checkpoint when the job is cancelled, so it can be resumed manually
env.getCheckpointConfig().enableExternalizedCheckpoints(
        CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
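Because of RETAIN_ON_CANCELLATION, the checkpoint directory survives even a manual cancel, so besides the automatic restart the job can also be resumed from it explicitly with flink run -s hdfs://linux01:9000/it/<job-id>/chk-<n> (the exact path depends on your job ID and checkpoint number under the configured HDFS directory).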
See the official documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/connectors/kafka.html
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_2.11</artifactId>
    <version>1.10.0</version>
</dependency>
// 2. Configure the Kafka parameters
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "linux01:9092,linux02:9092,linux03:9092"); // Kafka brokers
properties.setProperty("group.id", "g001");
// Where to start reading when the group has no committed offsets
properties.setProperty("auto.offset.reset", "earliest");
// Disable auto-commit of offsets (defaults to true if not set); with
// checkpointing enabled, Flink manages the offsets itself anyway
// properties.setProperty("enable.auto.commit", "false");
FlinkKafkaConsumer<String> flinkKafkaConsumer = new FlinkKafkaConsumer<>("wordCount", new SimpleStringSchema(), properties);
// Do not commit offsets back to Kafka on checkpoints; recovery uses the
// offsets stored in the checkpoint itself, so this only affects monitoring
flinkKafkaConsumer.setCommitOffsetsOnCheckpoints(false);
DataStreamSource<String> lines = env.addSource(flinkKafkaConsumer);
The Kafka source obtained this way is parallel and checkpointable: in the source code, FlinkKafkaConsumerBase extends RichParallelSourceFunction and implements CheckpointedFunction, which is how the partition offsets make it into every checkpoint.
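Since the checkpointed offsets are the source of truth on recovery, the consumer's explicit start-position setters only apply when the job starts fresh. A minimal sketch of the options on the flinkKafkaConsumer created above:

// Start from the group's committed offsets (the default); falls back to
// auto.offset.reset if the group has none
flinkKafkaConsumer.setStartFromGroupOffsets();
// Or ignore committed offsets entirely:
// flinkKafkaConsumer.setStartFromEarliest();
// flinkKafkaConsumer.setStartFromLatest();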
// Word count using a lambda; Java erases the lambda's generic output type,
// so Flink needs the explicit type hint supplied via returns(...)
SingleOutputStreamOperator<Tuple2<String, Integer>> wordAndOne = lines.flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
    Arrays.stream(line.split(" ")).forEach(w -> out.collect(Tuple2.of(w, 1)));
}).returns(Types.TUPLE(Types.STRING, Types.INT));
SingleOutputStreamOperator<Tuple2<String, Integer>> sum = wordAndOne.keyBy(0).sum(1);
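If the returns(...) hint feels opaque, an anonymous FlatMapFunction keeps its generic types available through reflection and needs no hint; a sketch of an equivalent drop-in replacement for the lambda version above:

SingleOutputStreamOperator<Tuple2<String, Integer>> wordAndOne = lines.flatMap(
        new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                for (String w : line.split(" ")) {
                    out.collect(Tuple2.of(w, 1));
                }
            }
        });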
See the Bahir documentation: https://bahir.apache.org/docs/flink/current/flink-streaming-redis/
The aggregated results in sum are written out to Redis by attaching a Redis sink.
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-redis_2.11</artifactId>
    <version>1.1.5</version>
</dependency>
// Implemented as a static inner class (or as a standalone class)
public static class RedisWordCountMapper implements RedisMapper<Tuple2<String, Integer>> {
    // Which Redis command to use, and the name of the outer hash key
    @Override
    public RedisCommandDescription getCommandDescription() {
        return new RedisCommandDescription(RedisCommand.HSET, "wc");
    }
    @Override
    public String getKeyFromData(Tuple2<String, Integer> data) {
        return data.f0; // the word becomes the hash field
    }
    @Override
    public String getValueFromData(Tuple2<String, Integer> data) {
        return data.f1.toString(); // the count becomes the hash value
    }
}
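With HSET plus the additional key "wc", every record becomes a field of one Redis hash: field = word, value = latest count. Since sum emits a running total and each write overwrites the previous value for that word, replayed records after a restart simply rewrite the same fields. The result can be inspected with redis-cli: SELECT 5 followed by HGETALL wc.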
// Connection settings for the Jedis pool used by the sink
FlinkJedisPoolConfig conf = new FlinkJedisPoolConfig.Builder()
        .setHost("192.168.23.103") // address of the Redis server
        .setDatabase(5)            // which Redis database to select
        .setPassword("123456")
        .build();
sum.addSink(new RedisSink<>(conf, new RedisWordCountMapper()));
env.execute("FlinkKafka");
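To see the recovery behavior end to end (an experiment, assuming the setup above): run the job, produce a few lines into the wordCount topic, then kill a TaskManager process. The job restarts under the fixed-delay strategy, rewinds the Kafka source to the offsets of the last checkpoint, and re-computes from there, which is exactly the scenario raised at the top.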