Flink Getting-Started Example: WordCount


Real-time stream processing is an unavoidable topic in the big-data field, and Flink is a mainstay of stream processing with wide adoption. Let's walk through WordCount to get a basic feel for how to use Flink.

Workflow

Flink processes data in the following flow:

(Figure: Flink data-processing flow)
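Before introducing the Flink API, the flow above (source → transformations → sink) can be sketched with plain JDK collections; the logic mirrors the Flink job shown later, minus the streaming runtime. Class and method names here are illustrative, not part of any Flink API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountSketch {
    // Count words in a batch of lines, mirroring source -> flatMap -> keyBy -> sum
    static Map<String, Long> count(String[] lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {               // source: one record per line
            for (String word : line.split(" ")) { // flatMap: split each line into words
                counts.merge(word, 1L, Long::sum); // keyBy + sum: accumulate per word
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"hello flink", "hello world"};
        System.out.println(count(lines)); // {hello=2, flink=1, world=1}
    }
}
```

The real Flink job does the same thing, except each step becomes a distributed, streaming operator instead of an in-memory loop.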

WordCount

Note: create a plain Maven project, not a Spring Boot project. The project structure is as follows:

(Figure: project structure)

log4j.properties
log4j.rootLogger=error, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

pom.xml

properties

	<properties>
	  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	  <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
	  <flink.version>1.13.0</flink.version>
	  <java.version>1.8</java.version>
	  <scala.binary.version>2.12</scala.binary.version>
	  <slf4j.version>1.7.30</slf4j.version>
	</properties>

dependencies

	
    <!-- Flink -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>

    <!-- logging -->
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>${slf4j.version}</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>${slf4j.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-to-slf4j</artifactId>
      <version>2.14.0</version>
    </dependency>

Code

package org.example.flink.level1;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

import java.util.Arrays;

public class BoundedStreamWordCount {
    public static void main(String[] args) throws Exception {
        // 1. Create the streaming execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        
        // 2. Read the input file
        DataStreamSource<String> lineDSS = env.readTextFile("input/words.txt");
        
        // 3. Transform: split each line into words, then map to (word, 1) pairs
        SingleOutputStreamOperator<Tuple2<String, Long>> wordAndOne = lineDSS
                .flatMap((String line, Collector<String> words) -> {
                    Arrays.stream(line.split(" ")).forEach(words::collect);
                })
                .returns(Types.STRING)
                .map(word -> Tuple2.of(word, 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG));
                
        // 4. Group by word
        KeyedStream<Tuple2<String, Long>, String> wordAndOneKS = wordAndOne
                .keyBy(t -> t.f0);
                
        // 5. Sum the counts (field 1 of the tuple)
        SingleOutputStreamOperator<Tuple2<String, Long>> result = wordAndOneKS
                .sum(1);
                
        // 6. Print the results
        result.print();
        
        // 7. Execute the job
        env.execute();
    }
}


Result

(Figure: WordCount program output)
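Note that `sum(1)` on a keyed stream emits a running total: every incoming `(word, 1)` record produces an updated count for its key, so the printed output contains intermediate results, not only final totals. A minimal sketch of that rolling behavior in plain Java (illustrative only, no Flink runtime involved):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RollingSumSketch {
    // Emit one updated (word, count) pair per input word, like keyBy(...).sum(1)
    static List<String> rollingCounts(String[] words) {
        Map<String, Long> state = new HashMap<>(); // per-key state, as Flink keeps internally
        List<String> emitted = new ArrayList<>();
        for (String word : words) {
            long updated = state.merge(word, 1L, Long::sum);
            emitted.add("(" + word + "," + updated + ")"); // each record triggers an output
        }
        return emitted;
    }

    public static void main(String[] args) {
        // "hello" appears twice, so the stream shows (hello,1) first and (hello,2) later
        System.out.println(rollingCounts(new String[]{"hello", "flink", "hello"}));
    }
}
```

This is why a word that occurs n times in the input appears n times in the output, with counts 1 through n.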

Note: the input file goes under the project root directory.

(Figure: input/words.txt under the project root)
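For a quick test, the input file can be created from the project root like this (the file contents are just sample data; the path must match the `input/words.txt` the job reads):

```shell
mkdir -p input
cat > input/words.txt <<'EOF'
hello flink
hello world
EOF
```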
