【06】Flink DataStream API (Part 4): Data Sinks and Custom Sinks

1. DataStream API (Part 4): Data Sinks

1.1. Introduction to Data Sinks

  • A sink is where a program's data is written out; a sink is attached to a program via StreamExecutionEnvironment.addSink(sinkFunction).
  • Flink ships with many ready-made sink implementations, and you can also write your own.

1.2. Data Sink Types

  1. writeAsText(): writes the elements line by line as strings; each string is obtained by calling the element's toString() method
  2. print() / printToErr(): prints each element's toString() value to standard output or to the standard error stream
  3. custom output via addSink (e.g. Kafka, Redis) — usage of the built-in sinks is sketched right after this list
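As a minimal, self-contained sketch of the built-in sinks (the host "localhost", port 9000, and output path "/tmp/out" are placeholder assumptions, not values from the original examples):

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BuiltInSinksDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // assumed source: a socket opened beforehand with `nc -l 9000`
        DataStreamSource<String> text = env.socketTextStream("localhost", 9000);

        text.print();                  // each element's toString() to stdout
        text.printToErr();             // each element's toString() to stderr
        text.writeAsText("/tmp/out");  // each element's toString(), one line per element

        env.execute("BuiltInSinksDemo");
    }
}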

1.3. Sink Fault-Tolerance Guarantees

Sink             Semantic guarantee              Notes
HDFS             exactly once
Elasticsearch    at least once
Kafka producer   at least once / exactly once    Kafka 0.9 and 0.10 provide at least once; Kafka 0.11 provides exactly once
File             at least once
Redis            at least once
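For example, with the Kafka 0.11 connector the guarantee is chosen when the producer is constructed. A hedged sketch, assuming the flink-connector-kafka-0.11 dependency is on the classpath, a broker at master:9092, a topic named demo-topic, and an existing DataStream<String> called text (all of these are illustrative assumptions):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "master:9092");   // assumed broker address

FlinkKafkaProducer011<String> kafkaSink = new FlinkKafkaProducer011<>(
        "demo-topic",                                             // assumed topic name
        new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
        props,
        FlinkKafkaProducer011.Semantic.EXACTLY_ONCE);   // the default is AT_LEAST_ONCE

text.addSink(kafkaSink);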

1.4. Built-in Flink Connectors

  • Apache Kafka (source/sink)
  • Apache Cassandra (sink)
  • Elasticsearch (sink)
  • Hadoop FileSystem (sink)
  • RabbitMQ (source/sink)
  • Apache ActiveMQ (source/sink)
  • Redis (sink)

2. Custom Sinks

2.1. Implementation Approaches

  • To implement a custom sink:
    (1) implement the SinkFunction interface, or
    (2) extend RichSinkFunction (see the sketch after this list)
  • For reference, see org.apache.flink.streaming.connectors.redis.RedisSink
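A minimal sketch of approach (2), with illustrative names only (MyPrintSink and the System.out write stand in for a real client): extending RichSinkFunction adds the open()/close() lifecycle hooks, which is where a connection to the external system would normally be created and released.

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class MyPrintSink extends RichSinkFunction<String> {

    @Override
    public void open(Configuration parameters) throws Exception {
        // acquire connections/resources here; runs once per parallel sink instance
    }

    @Override
    public void invoke(String value, Context context) throws Exception {
        // called once per record; a real sink would write to the external system here
        System.out.println("sink received: " + value);
    }

    @Override
    public void close() throws Exception {
        // release connections/resources here
    }
}

It is attached like any other sink: stream.addSink(new MyPrintSink()).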

2.2. Example: Redis Sink

2.2.1. Dependencies

Add the required dependency: search the Maven repository for flink-connector-redis and include it in the pom:

    <dependency>
      <groupId>org.apache.bahir</groupId>
      <artifactId>flink-connector-redis_${scala.version}</artifactId>
      <version>1.0</version>
    </dependency>
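Here ${scala.version} must resolve to the Scala binary version the rest of the Flink dependencies use; assuming a Scala 2.11 build, it would be declared in the pom's properties, e.g.:

    <properties>
      <scala.version>2.11</scala.version>
    </properties>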

2.2.2. Java Implementation

Complete program:

package com.Streaming.custormSink;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.redis.RedisSink;
import org.apache.flink.streaming.connectors.redis.common.config.FlinkJedisPoolConfig;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisCommand;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisCommandDescription;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisMapper;

/**
 * @Author: Henry
 * @Description: Receives socket data and saves it to Redis (in list format).
 *               Data saved to Redis usually uses one of two formats: list or hashmap.
 *
 *               lpush list_key value
 * @Date: Create in 2019/5/12 22:29
 **/
public class StreamingDemoToRedis {
    public static void main(String[] args)  throws Exception{
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<String> text = env.socketTextStream(
                "master", 9000, "\n");

        // input: "l_words word", where the l_ prefix of l_words marks it as a list key
        // assemble the data: convert each String into a Tuple2
        DataStream<Tuple2<String, String>> l_wordsData = text.map(
                new MapFunction<String, Tuple2<String, String>>() {

            @Override
            public Tuple2<String, String> map(String value) throws Exception {
                return new Tuple2<>("l_words", value);
            }
        });

        // create the Redis connection configuration
        FlinkJedisPoolConfig conf = new FlinkJedisPoolConfig.Builder()
                .setHost("master").setPort(6379).build();

        // create the RedisSink
        RedisSink<Tuple2<String, String>> redisSink = new RedisSink<>(
                conf, new MyRedisMapper());

        l_wordsData.addSink(redisSink);
        env.execute("StreamingDemoToRedis");

    }

    public static class MyRedisMapper implements RedisMapper<Tuple2<String, String>> {
        // extracts the Redis key to operate on from the incoming record
        @Override
        public String getKeyFromData(Tuple2<String, String> data) {
            return data.f0;
        }
        // extracts the Redis value to operate on from the incoming record
        @Override
        public String getValueFromData(Tuple2<String, String> data) {
            return data.f1;
        }
        @Override
        public RedisCommandDescription getCommandDescription() {
            return new RedisCommandDescription(RedisCommand.LPUSH);
        }
    }
}

2.2.3. Scala Implementation

Complete code:

package cn.Streaming.custormSink

import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.connectors.redis.RedisSink
import org.apache.flink.streaming.connectors.redis.common.config.FlinkJedisPoolConfig
import org.apache.flink.streaming.connectors.redis.common.mapper.{RedisCommand, RedisCommandDescription, RedisMapper}

/**
  * @Author: Henry
  * @Description:
  * @Date: Create in 2019/5/14 22:37
  **/
object StreamingDataToRedisScala {

  def main(args: Array[String]): Unit = {

    // socket port number
    val port = 9000

    // get the execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

    // connect to the socket to read the input data
    val text = env.socketTextStream("master",port,'\n')

    // Note: this implicit-conversion import is required, otherwise the map call below fails to compile
    import org.apache.flink.api.scala._

    val l_wordsData = text.map(line=>
      ("l_words_scala",line))

    val conf = new FlinkJedisPoolConfig.Builder()
      .setHost("master")
      .setPort(6379)
      .build()

    val redisSink = new RedisSink[Tuple2[String,String]](conf,new MyRedisMapper)

    l_wordsData.addSink(redisSink)

    // execute the job
    env.execute("Socket window count")


  }

  class MyRedisMapper extends RedisMapper[Tuple2[String,String]]{

    override def getKeyFromData(data: (String, String)) = {
      data._1
    }
    override def getValueFromData(data: (String, String)) = {
      data._2
    }
    override def getCommandDescription = {
      new RedisCommandDescription(RedisCommand.LPUSH)   // the Redis command to apply
    }
  }
}

2.2.4. Running the Example

First, start the Redis server in one terminal:

./src/redis-server

Then connect to the server from another terminal:

./src/redis-cli     

Open a socket terminal:

nc -l 9000

Click "Run" in IDEA to run the code.
Enter data through the nc terminal, then query the Redis database:
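As an illustration, assuming the words a and then b were typed into the nc terminal while the Java job was running: LPUSH prepends each new element, so the newest entry appears first when the list is read back from redis-cli (the Scala job writes under the key l_words_scala instead):

127.0.0.1:6379> LRANGE l_words 0 -1
1) "b"
2) "a"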
