Flink Learning 10 --- DataStream Sinks: Overview and RichSinkFunction

A sink writes the data that Flink has processed out to an external system.

I. Flink ships with many ready-made sink implementations for DataStream, including:

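1. writeAsText(): writes each element as one line of text. The text is the value returned by the element's toString() method.

2. print() / printToErr(): prints the toString() value of each element to standard output or standard error.

3. Custom output: addSink() writes data to third-party storage systems through a SinkFunction.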

Flink supports sinks for external systems through its built-in connectors and the Apache Bahir project.

For details, see: https://blog.csdn.net/zhuzuwei/article/details/107137295
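print() and writeAsText() both turn each element into one line of text using its toString() value. For print(), there is one extra rule when the sink runs with parallelism greater than 1: each line is prefixed with the 1-based index of the subtask that emitted it, for example `3> (hello,2)`. The plain-Java sketch below models only that formatting rule, and only for the case where no sink identifier is set. PrintSinkFormatModel and its methods are hypothetical names, not Flink classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java model of how print() formats each element (not Flink code). */
public class PrintSinkFormatModel {

    /**
     * Formats one element the way print() does when no sink identifier is set:
     * the element's toString() value, prefixed with "<subtask>> " only when parallelism > 1.
     */
    static String formatLine(Object element, int subtaskIndex, int parallelism) {
        String prefix = parallelism > 1 ? (subtaskIndex + 1) + "> " : "";
        return prefix + String.valueOf(element);
    }

    /** Formats all elements handled by one subtask, one line per element. */
    static List<String> formatAll(List<?> elements, int subtaskIndex, int parallelism) {
        List<String> lines = new ArrayList<>();
        for (Object e : elements) {
            lines.add(formatLine(e, subtaskIndex, parallelism));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Subtask 2 (0-based) of a sink with parallelism 4 uses the prefix "3> ".
        formatAll(List.of("(hello,1)", "(flink,1)"), 2, 4).forEach(System.out::println);
        // With parallelism 1 there is no prefix at all.
        System.out.println(formatLine("(hello,2)", 0, 1));
    }
}
```

writeAsText() writes the same toString() value per line but never adds the subtask prefix. Instead, it separates subtasks into different files.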


II. Fault-tolerance guarantees of sink components

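| Sink | Guarantee | Notes |
|---|---|---|
| HDFS | Exactly-once | |
| Elasticsearch | At-least-once | |
| Kafka producer | At-least-once / Exactly-once | Kafka 0.9 and 0.10 provide at-least-once; Kafka 0.11 provides exactly-once (transactional producer) |
| File | At-least-once | writeAsText() and the other write*() methods do not take part in checkpointing |
| Redis | At-least-once | |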
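The at-least-once guarantee for Redis is easier to live with when writes are idempotent. The Redis sink in section III uses HSET, which overwrites a field's value. As a result, records delivered a second time after a recovery rewrite the same fields instead of adding new entries. An append-only sink such as writeAsText() would instead end up with duplicate lines.

The sketch below models a Redis hash with nested Java maps. HsetReplayModel is a hypothetical stand-in, not Jedis or Redis. It assumes that replayed records arrive in their original order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of HSET semantics (not Jedis or Redis itself). */
public class HsetReplayModel {

    /** key -> (field -> value), like a set of Redis hashes. */
    private final Map<String, Map<String, String>> hashes = new HashMap<>();

    /** Models HSET key field value: inserts the field or overwrites its value. */
    void hset(String key, String field, String value) {
        hashes.computeIfAbsent(key, k -> new HashMap<>()).put(field, value);
    }

    /** Models HGETALL key: returns all fields of the hash (empty if the key is missing). */
    Map<String, String> hgetAll(String key) {
        return hashes.getOrDefault(key, Map.of());
    }

    /** Applies each {key, field, count} record the way MyRedisSinkFunction.invoke does. */
    void applyAll(List<String[]> records) {
        for (String[] r : records) {
            hset(r[0], r[1], r[2]);
        }
    }

    public static void main(String[] args) {
        // Running totals as emitted by keyBy(1).sum(2) for the input line "a,b,a".
        List<String[]> records = List.of(
                new String[] {"wordscount", "a", "1"},
                new String[] {"wordscount", "b", "1"},
                new String[] {"wordscount", "a", "2"});

        HsetReplayModel once = new HsetReplayModel();
        once.applyAll(records);

        // Simulate at-least-once delivery: all records, then the last two delivered again.
        HsetReplayModel replayed = new HsetReplayModel();
        replayed.applyAll(records);
        replayed.applyAll(records.subList(1, 3));

        System.out.println(once.hgetAll("wordscount"));
        System.out.println(replayed.hgetAll("wordscount"));
    }
}
```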


III. Example

So far every example used print() as the sink. This section demonstrates sinking to a text file and to a Redis database.
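The Redis sink in this example extends RichSinkFunction, whose hooks run at different times:

- open() runs once per parallel subtask, before any records arrive.
- invoke() runs once per record.
- close() runs when the subtask shuts down.

That is why MyRedisSinkFunction creates its Jedis connection in open() rather than in invoke(). The sketch below models this call order in plain Java. SinkLifecycle, runSubtask and RecordingSink are hypothetical names. Calling close() from a finally block is a modeling choice here, not a statement about Flink's internals.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the RichSinkFunction call order (plain Java, not Flink's classes). */
public class SinkLifecycleModel {

    /** Stand-in for the three lifecycle hooks a RichSinkFunction overrides. */
    interface SinkLifecycle<T> {
        void open();
        void invoke(T value);
        void close();
    }

    /** Drives one parallel subtask: open once, invoke per record, close once (here also on failure). */
    static <T> void runSubtask(SinkLifecycle<T> sink, List<T> records) {
        sink.open();
        try {
            for (T r : records) {
                sink.invoke(r);
            }
        } finally {
            sink.close();
        }
    }

    /** A sink that only records which hook was called, in order. */
    static class RecordingSink implements SinkLifecycle<String> {
        final List<String> calls = new ArrayList<>();

        public void open() { calls.add("open"); }                  // e.g. create the Jedis connection
        public void invoke(String v) { calls.add("invoke:" + v); } // e.g. HSET one record
        public void close() { calls.add("close"); }                // e.g. close the connection
    }

    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        runSubtask(sink, List.of("a", "b", "c"));
        System.out.println(sink.calls);
    }
}
```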

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class AddSinkReivew {
    public static void main(String[] args) throws Exception{
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<String> lines = env.socketTextStream("192.168.***.***", 8888);

        SingleOutputStreamOperator<Tuple2<String, Integer>> words = lines.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                String[] words = s.split(",");
                for (int i = 0; i < words.length; i++) {
                    collector.collect(Tuple2.of(words[i], 1));
                }
            }
        });

        SingleOutputStreamOperator<Tuple2<String, Integer>> summed = words.keyBy(0).sum(1);

        summed.print();

        summed.writeAsText("C:\\Users\\Dell\\Desktop\\flinkTest\\sinkout1.txt", FileSystem.WriteMode.OVERWRITE);

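        SingleOutputStreamOperator<Tuple3<String, String, Integer>> words2 = lines.flatMap(new FlatMapFunction<String, Tuple3<String, String, Integer>>() {
            @Override
            public void flatMap(String s, Collector<Tuple3<String, String, Integer>> collector) throws Exception {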
                String[] words = s.split(",");
                for (int i = 0; i < words.length; i++) {
                    collector.collect(Tuple3.of("wordscount",words[i], 1));
                }
            }
        });

        SingleOutputStreamOperator<Tuple3<String, String, Integer>> summed2 = words2.keyBy(1).sum(2);

        String configPath = "C:\\Users\\Dell\\Desktop\\flinkTest\\config.txt";
        ParameterTool parameters = ParameterTool.fromPropertiesFile(configPath);
        // Register the parameters globally so that sink functions can read them in open()
        env.getConfig().setGlobalJobParameters(parameters);

        summed2.addSink(new MyRedisSinkFunction());

        env.execute("AddSinkReivew");
    }
}

// MyRedisSinkFunction.java
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import redis.clients.jedis.Jedis;

public class MyRedisSinkFunction extends RichSinkFunction<Tuple3<String, String, Integer>> {
    private transient Jedis jedis;


    @Override
    public void open(Configuration config) {
        ParameterTool parameters = (ParameterTool)getRuntimeContext().getExecutionConfig().getGlobalJobParameters();
        String host = parameters.getRequired("redis.host");
        String password = parameters.get("redis.password", "");
        Integer port = parameters.getInt("redis.port", 6379);
        Integer timeout = parameters.getInt("redis.timeout", 5000);
        Integer db = parameters.getInt("redis.db", 0);
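        jedis = new Jedis(host, port, timeout);
        // Only authenticate when a password is configured: a Redis server
        // without a password rejects AUTH with an empty password
        if (!password.isEmpty()) {
            jedis.auth(password);
        }
        jedis.select(db);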
    }

    @Override
    public void invoke(Tuple3<String, String, Integer> value, Context context) throws Exception {
        if (!jedis.isConnected()) {
            jedis.connect();
        }
        // Write to a Redis hash: key = f0 ("wordscount"), field = f1 (the word), value = f2 (the count)
        jedis.hset(value.f0, value.f1, String.valueOf(value.f2));
    }

    @Override
    public void close() throws Exception {
        // jedis is null if open() failed before the connection was created
        if (jedis != null) {
            jedis.close();
        }
    }
}
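config.txt is loaded with ParameterTool.fromPropertiesFile, so it is a Java properties file with one key=value pair per line. open() requires redis.host and falls back to defaults for the other keys:

- redis.password: empty
- redis.port: 6379
- redis.timeout: 5000 ms
- redis.db: 0

The sketch below mirrors that resolution, with java.util.Properties standing in for ParameterTool (an assumption made for illustration). The config values are placeholders. It also includes a needsAuth check, because a Redis server with no password configured rejects AUTH, so AUTH should only be sent when redis.password is set. RedisConfigModel is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical plain-Java stand-in for how MyRedisSinkFunction.open() resolves its settings.
 * java.util.Properties replaces ParameterTool; the defaults mirror the ones used in open().
 */
public class RedisConfigModel {

    final String host;
    final String password;
    final int port;
    final int timeout;
    final int db;

    RedisConfigModel(Properties p) {
        // Like getRequired("redis.host"): fail fast if the key is missing.
        host = p.getProperty("redis.host");
        if (host == null) {
            throw new IllegalArgumentException("missing required key redis.host");
        }
        password = p.getProperty("redis.password", "");
        port = Integer.parseInt(p.getProperty("redis.port", "6379"));
        timeout = Integer.parseInt(p.getProperty("redis.timeout", "5000"));
        db = Integer.parseInt(p.getProperty("redis.db", "0"));
    }

    /** AUTH should only be sent when a password is configured. */
    boolean needsAuth() {
        return !password.isEmpty();
    }

    /** Parses properties-file text such as the contents of config.txt. */
    static RedisConfigModel parse(String propertiesText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new RedisConfigModel(p);
    }

    public static void main(String[] args) {
        // Hypothetical config.txt contents; only redis.host is mandatory.
        RedisConfigModel cfg = parse("redis.host=localhost\nredis.db=1\n");
        System.out.println(cfg.host + ":" + cfg.port + " db=" + cfg.db + " auth=" + cfg.needsAuth());
    }
}
```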

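The sinkout1.txt passed to writeAsText() is not a single file. Flink creates a directory with that name instead. The directory contains 4 files, one per parallel subtask of the sink, and each file holds the results written by its subtask. The job ran with parallelism 4; in a local environment, the default parallelism equals the number of CPU cores.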
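The directory layout follows from how Flink's file output names its files. The plain-Java sketch below models the default behavior as assumed here; it is not Flink code. With parallelism 1, the path itself is the output file. Otherwise, the path becomes a directory holding one file per subtask, named by the 1-based subtask index. WriteAsTextLayoutModel is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java model of where writeAsText() puts its output (not Flink code). */
public class WriteAsTextLayoutModel {

    /** Lists the output files for a given sink path and sink parallelism. */
    static List<String> outputFiles(String path, int parallelism) {
        List<String> files = new ArrayList<>();
        if (parallelism == 1) {
            // A single subtask writes directly to the given path.
            files.add(path);
            return files;
        }
        // Several subtasks: the path is a directory with files named 1..parallelism.
        for (int subtask = 0; subtask < parallelism; subtask++) {
            files.add(path + "/" + (subtask + 1));
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(outputFiles("sinkout1.txt", 4));
        System.out.println(outputFiles("sinkout1.txt", 1));
    }
}
```

To get a single sinkout1.txt file instead of a directory, set the sink's parallelism to 1 by calling setParallelism(1) on the DataStreamSink returned by writeAsText().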

[Figure 1: the sinkout1.txt output directory]

The contents of one of those files:

[Figure 2: contents of one output file]
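Each output file typically holds several lines per word, because keyBy(0).sum(1) is a rolling aggregation. It keeps one running total per word and emits the updated (word,total) pair for every incoming word, not just a final count. keyBy routes every occurrence of a word to the same subtask, and the sink runs with the same parallelism, so all updates for a given word should land in the same file.

The plain-Java model below reproduces the emitted sequence for a single subtask. RollingSumModel is a hypothetical name, and a HashMap stands in for Flink's keyed state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical plain-Java model of flatMap + keyBy(0).sum(1) on comma-separated lines:
 * one running total per word, and one emitted (word,total) record per incoming word.
 */
public class RollingSumModel {

    private final Map<String, Integer> totals = new HashMap<>();

    /** Processes one socket line and returns the records the sum operator would emit. */
    List<String> processLine(String line) {
        List<String> emitted = new ArrayList<>();
        for (String word : line.split(",")) {
            int total = totals.merge(word, 1, Integer::sum);
            emitted.add("(" + word + "," + total + ")"); // same shape as Tuple2.toString()
        }
        return emitted;
    }

    public static void main(String[] args) {
        RollingSumModel model = new RollingSumModel();
        System.out.println(model.processLine("hello,flink"));
        System.out.println(model.processLine("hello"));
    }
}
```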

The sink results also show up in Redis:

[Figure 3: the word counts in Redis]
