Flink KafkaSink

Kafka Integration


<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_2.11</artifactId>
    <version>1.10.0</version>
</dependency>
  • Approach 1: KafkaSerializationSchema
import java.lang
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema
import org.apache.kafka.clients.producer.ProducerRecord

// Turns each (word, count) tuple into a ProducerRecord aimed at topic01.
class UserDefinedKafkaSerializationSchema extends KafkaSerializationSchema[(String, Int)] {
  override def serialize(element: (String, Int), timestamp: lang.Long): ProducerRecord[Array[Byte], Array[Byte]] = {
    new ProducerRecord("topic01", element._1.getBytes(), element._2.toString.getBytes())
  }
}
import java.util.Properties
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.Semantic
import org.apache.kafka.clients.producer.ProducerConfig

//1. Create the stream execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(4)

//2. Create the DataStream
val text = env.readTextFile("hdfs://CentOS:9000/demo/words")

val props = new Properties()
props.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "CentOS:9092")
props.setProperty(ProducerConfig.BATCH_SIZE_CONFIG, "100")
props.setProperty(ProducerConfig.LINGER_MS_CONFIG, "500")

//Semantic.EXACTLY_ONCE: transactional, exactly-once writes to Kafka (see the sketch after this example)
//Semantic.AT_LEAST_ONCE: relies on Kafka's retry mechanism
val kafkaSink = new FlinkKafkaProducer[(String, Int)]("defult_topic",
  new UserDefinedKafkaSerializationSchema, props, Semantic.AT_LEAST_ONCE)

//3. Apply the DataStream transformations
val counts = text.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0)
  .sum(1)

//4. Write the results to Kafka
counts.addSink(kafkaSink)

//5. Execute the streaming job
env.execute("Window Stream WordCount")
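If Semantic.EXACTLY_ONCE is chosen instead, FlinkKafkaProducer writes through Kafka transactions that are committed when a checkpoint completes, so checkpointing must be enabled and the producer's transaction timeout must not exceed the broker's transaction.max.timeout.ms. A minimal sketch; the 5-second checkpoint interval and the 15-minute timeout are illustrative values, not taken from the original:

import org.apache.kafka.clients.producer.ProducerConfig
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.Semantic

// Transactions are committed on checkpoint completion, so checkpointing is mandatory.
env.enableCheckpointing(5000) // illustrative 5s interval

// Flink's default transaction timeout (1 hour) is larger than the broker default
// transaction.max.timeout.ms (15 minutes), so lower it explicitly.
props.setProperty(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "900000")

val exactlyOnceSink = new FlinkKafkaProducer[(String, Int)]("defult_topic",
  new UserDefinedKafkaSerializationSchema, props, Semantic.EXACTLY_ONCE)
counts.addSink(exactlyOnceSink)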

The "defult_topic" passed to the constructor has no effect here: serialize already builds each ProducerRecord with an explicit target topic (topic01), so the constructor argument serves only as a fallback.
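Because the schema builds the ProducerRecord itself, it can just as well route each element to a different topic. A minimal sketch, with a made-up even/odd routing rule for illustration:

import java.lang
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema
import org.apache.kafka.clients.producer.ProducerRecord

class RoutingKafkaSerializationSchema extends KafkaSerializationSchema[(String, Int)] {
  override def serialize(element: (String, Int), timestamp: lang.Long): ProducerRecord[Array[Byte], Array[Byte]] = {
    // Hypothetical rule: even counts and odd counts land in separate topics.
    val topic = if (element._2 % 2 == 0) "even_counts" else "odd_counts"
    new ProducerRecord(topic, element._1.getBytes(), element._2.toString.getBytes())
  }
}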

  • Approach 2: KeyedSerializationSchema (legacy API)
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema

// Legacy (deprecated) schema: key, value, and target topic come from separate methods.
class UserDefinedKeyedSerializationSchema extends KeyedSerializationSchema[(String, Int)] {
  override def serializeKey(element: (String, Int)): Array[Byte] = {
    element._1.getBytes()
  }
  override def serializeValue(element: (String, Int)): Array[Byte] = {
    element._2.toString.getBytes()
  }
  // May override the default topic; returning null writes the record to the
  // default topic passed to the producer (see the routing sketch after this class).
  override def getTargetTopic(element: (String, Int)): String = {
    null
  }
}
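Under the legacy API, getTargetTopic is the routing hook: a non-null return value overrides the default topic per record. A minimal sketch that reuses the class above; the single-character rule and the short_words topic are made up for illustration:

// Hypothetical routing on top of the schema defined above.
class RoutingKeyedSerializationSchema extends UserDefinedKeyedSerializationSchema {
  override def getTargetTopic(element: (String, Int)): String = {
    // Single-character words go to a dedicated topic; everything else uses the default.
    if (element._1.length == 1) "short_words" else null
  }
}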
// (same imports as in Approach 1, plus UserDefinedKeyedSerializationSchema above)
//1. Create the stream execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(4)

//2. Create the DataStream
val text = env.readTextFile("hdfs://CentOS:9000/demo/words")
val props = new Properties()
props.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "CentOS:9092")
props.setProperty(ProducerConfig.BATCH_SIZE_CONFIG, "100")
props.setProperty(ProducerConfig.LINGER_MS_CONFIG, "500")

//Semantic.EXACTLY_ONCE: transactional, exactly-once writes to Kafka
//Semantic.AT_LEAST_ONCE: relies on Kafka's retry mechanism
val kafkaSink = new FlinkKafkaProducer[(String, Int)]("defult_topic",
  new UserDefinedKeyedSerializationSchema, props, Semantic.AT_LEAST_ONCE)

//3. Apply the DataStream transformations
val counts = text.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .keyBy(0)
  .sum(1)

//4. Write the results to Kafka; getTargetTopic returns null, so records go to defult_topic
counts.addSink(kafkaSink)

//5. Execute the streaming job
env.execute("Window Stream WordCount")
