Kafka+Flink Word Count Demo

The entire demo was done on Windows 10, with every component installed locally in standalone mode; the whole pipeline can be thought of as a Flink-flavored "hello world". The basic function: type a whitespace-separated string into a Kafka producer, and the job computes how often each word occurs.


Environment Setup


  • ZooKeeper installation:

This demo uses v3.4.13. Download the archive from the official site, extract it, and edit the zoo_sample.cfg file under conf (save it as zoo.cfg so zkServer.cmd picks it up); the key settings are shown below:

# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=E:\BigData\zookeeper-data
# the port at which the clients will connect
clientPort=2181

Point dataDir at a local directory and set the client port; the default 2181 is fine. Save the file, then open a cmd window in the bin directory and start ZooKeeper:

zkServer.cmd
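
As an optional sanity check, you can connect with the CLI that ships in the same bin directory from a second cmd window; a successful connection means ZooKeeper is listening:

zkCli.cmd -server 127.0.0.1:2181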

 

  • Kafka installation:

This demo uses kafka_2.11-2.1.1. Likewise, download the archive from the official site and extract it; the default configuration needs no changes. Open cmd windows in the bin\windows directory and run the following commands in order (the broker, producer, and consumer all keep running, so give each its own window):

kafka-server-start.bat ..\..\config\server.properties
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
kafka-console-producer.bat --broker-list localhost:9092 --topic test
kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic test --from-beginning

These respectively: start Kafka, create the topic test, publish messages to test, and subscribe to messages on test. At this point, type a message in the open producer window and it will show up in the consumer window.
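
To confirm the topic was actually created, listing all topics works with the same --zookeeper address used above:

kafka-topics.bat --list --zookeeper localhost:2181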

  • Flink installation:

This demo uses flink-1.7.2. Likewise, download the archive from the official site and extract it; the default configuration needs no changes. Open a cmd window in the bin directory and run the Windows startup script:

start-cluster.bat

Note: leave all of the cmd windows started above open for now.


Project Code


  • pom.xml


<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.fighting.sz</groupId>
    <artifactId>flink-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka_2.11</artifactId>
            <version>1.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>1.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.11</artifactId>
            <version>1.7.2</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>1.2.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>KafkaToFlink</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

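A note on the build section: the Kafka connector is not part of the Flink distribution's lib directory, so the shade plugin bundles all dependencies into one fat jar, and the ManifestResourceTransformer records KafkaToFlink as the main class in the jar manifest, which the Flink web UI reads to pre-fill the entry class on upload.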

  • Java code
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.util.Collector;

/**
 * Created by 79073 on 2019/7/15.
 */
public class KafkaToFlink {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(5000); // checkpoint every 5 s; the Kafka source commits offsets on checkpoints
        ParameterTool parameterTool = ParameterTool.fromArgs(args);
        DataStream<String> dataStream = env.addSource(new FlinkKafkaConsumer<>(parameterTool.getRequired("topic"), new SimpleStringSchema(), parameterTool.getProperties()));
        DataStream<WordWithCount> windowCounts = dataStream.rebalance().flatMap(new FlatMapFunction<String, WordWithCount>() {
            public void flatMap(String value, Collector<WordWithCount> out) {
                System.out.println("Received from Kafka: " + value);
                // emit (word, 1) for each whitespace-separated token
                for (String word : value.split("\\s+")) {
                    out.collect(new WordWithCount(word, 1L));
                }
            }
        }).keyBy("word") // key by the POJO field "word"
                .timeWindow(Time.seconds(2)) // 2-second tumbling processing-time window
                .reduce(new ReduceFunction<WordWithCount>() {
                    public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                        return new WordWithCount(a.word, a.count + b.count);
                    }
                });
        windowCounts.print().setParallelism(1);
        env.execute("KafkaToFlink");
    }

    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {
        }

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
}
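
One detail worth noting: keyBy("word") references the field by name, which relies on WordWithCount being a valid Flink POJO, i.e. a public class with a public no-argument constructor and public fields. Drop the empty constructor and the field-name keyBy would no longer work.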

Job Submission


Package the project into a jar with Maven and deploy it to Flink; a minimal build command is shown below.
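
Run from the project root; given the artifactId and version in the pom above, the shaded jar should land at target\flink-demo-1.0-SNAPSHOT.jar:

mvn clean package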

  • Submitting the job:

With Flink started, open the Flink web UI at localhost:8081, upload the jar, and configure the program arguments

[Figure 1: Flink web UI, uploading the jar and setting the program arguments]

In the program-arguments box (the red box in Figure 1), enter: --bootstrap.servers localhost:9092 --topic test --group.id test-consumer-group and click Submit. A successful submission looks like this:

[Figure 2: the submitted job running in the Flink web UI]

Now type a whitespace-separated string into the open kafka-producer window, and the Flink job's output shows the corresponding per-window counts
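
For example, sending the line below from the producer (typed within a single 2-second window) should, given the toString() format above, print something like the hypothetical output that follows; the counts appear on the TaskManager's stdout, i.e. a cmd window opened by start-cluster.bat or the job's stdout tab in the web UI:

hello flink hello kafka

Received from Kafka: hello flink hello kafka
hello : 2
flink : 1
kafka : 1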

[Figure 3: word counts printed by the Flink job]

done!!!
