Flink: First Look

Since the company has decided to adopt the Flink distributed computing framework as the preferred technology for upcoming products, I recently spent some time learning Flink. This article uses Kafka as the data source.

  1. Installing Flink: download the Flink distribution from the official website; the version I used is 'Apache Flink 1.8.1 for Scala 2.11'. After downloading, unpack it to /usr/local/flink-1.8.1.
  2. Starting Flink: 2.1 cd /usr/local/flink-1.8.1/conf; 2.2 edit flink-conf.yaml, find the line 'jobmanager.rpc.address: ' and change it to 'jobmanager.rpc.address: localhost'; 2.3 cd /usr/local/flink-1.8.1/bin and run start-cluster.sh. Flink is now running locally, and we can open the Flink UI in a browser at http://localhost:8081. (Screenshot: Flink web UI.)
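    The setup in steps 1 and 2 can be sketched as a shell session (paths follow the text above; the sed one-liner is just one way to make the config edit):

    ```shell
    # Work from the unpacked distribution (version as in the text; adjust to yours)
    cd /usr/local/flink-1.8.1

    # Point the JobManager at localhost in conf/flink-conf.yaml
    sed -i.bak 's/^jobmanager.rpc.address:.*/jobmanager.rpc.address: localhost/' conf/flink-conf.yaml

    # Start the local cluster, then verify the web UI at http://localhost:8081
    ./bin/start-cluster.sh
    ```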
  3. Using the Flink API: create a new Java Maven project and add the Flink and Kafka core dependencies to the pom file:
    <project xmlns="http://maven.apache.org/POM/4.0.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>

        <groupId>com.damon</groupId>
        <artifactId>flink</artifactId>
        <version>0.0.1-SNAPSHOT</version>
        <packaging>jar</packaging>

        <name>flink</name>
        <url>http://maven.apache.org</url>

        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
            <flink.version>1.4.1</flink.version>
            <deploy.dir>./target/flink/</deploy.dir>
        </properties>

        <dependencies>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-web</artifactId>
                <version>1.5.10.RELEASE</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-java</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-core</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-streaming-java_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-clients_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-connector-kafka-0.9_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-runtime_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>3.8.1</version>
                <scope>test</scope>
            </dependency>
            <dependency>
                <groupId>com.google.code.gson</groupId>
                <artifactId>gson</artifactId>
                <version>2.8.5</version>
            </dependency>
            <dependency>
                <groupId>log4j</groupId>
                <artifactId>log4j</artifactId>
                <version>1.2.17</version>
            </dependency>
            <dependency>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-api</artifactId>
                <version>1.7.26</version>
            </dependency>
            <dependency>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
                <version>1.7.25</version>
                <scope>compile</scope>
            </dependency>
        </dependencies>

        <build>
            <finalName>flinkpackage</finalName>
            <sourceDirectory>src/main/java</sourceDirectory>
            <resources>
                <resource>
                    <directory>src/main/resources</directory>
                    <targetPath>${project.build.directory}</targetPath>
                </resource>
            </resources>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <configuration>
                        <source>1.8</source>
                        <target>1.8</target>
                        <encoding>UTF-8</encoding>
                    </configuration>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-shade-plugin</artifactId>
                    <version>1.2.1</version>
                    <executions>
                        <execution>
                            <phase>package</phase>
                            <goals>
                                <goal>shade</goal>
                            </goals>
                            <configuration>
                                <transformers>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                        <mainClass>com.damon.flink.App</mainClass>
                                    </transformer>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                        <resource>reference.conf</resource>
                                    </transformer>
                                </transformers>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    </project>

    In the main class, add the code that configures the Flink data source and consumes the Kafka messages:

    package com.damon.flink;
    
    import com.damon.flink.model.Student;
    import com.damon.flink.sink.StudentSink;
    import com.google.gson.Gson;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    import java.util.Properties;
    
    public class App
    {
        private static Logger log = LoggerFactory.getLogger(App.class);
    
        private static Gson gson = new Gson();
    
        @SuppressWarnings({ "serial", "deprecation" })
        public static void main( String[] args ) throws Exception {
    
            String topic = "test.topic";
            final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(5000);
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
    
            // Kafka consumer configuration
            Properties properties = new Properties();
            properties.setProperty("bootstrap.servers", "localhost:9092");
            properties.setProperty("zookeeper.connect", "localhost:2181");
            properties.setProperty("group.id", "test-consumer-group");
            FlinkKafkaConsumer09<String> consumer09 =
                    new FlinkKafkaConsumer09<>(topic, new SimpleStringSchema(), properties);
    
            // Read raw JSON strings from Kafka
            DataStream<String> kafkaStream = env.addSource(consumer09);
    
            // Deserialize each message to a Student and partition the stream by gender
            DataStream<Student> studentStream =
                    kafkaStream.map(student -> gson.fromJson(student, Student.class)).keyBy("gender");
    
            studentStream.addSink(new StudentSink());
    
            env.execute("Flink Streaming Java API Skeleton");
            log.debug("Flink Started ...");
        }
    }
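    A note on the keyBy("gender") call above: it partitions the stream so that all records with the same gender value are routed to the same parallel subtask. Conceptually, outside Flink and for a finite batch, the partitioning behaves like a plain JDK groupingBy (class and sample data below are illustrative only):

    ```java
    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class KeyByAnalogy {
        // Minimal stand-in for the Student model, illustration only
        static class Student {
            final String name; final String gender; final int score;
            Student(String name, String gender, int score) {
                this.name = name; this.gender = gender; this.score = score;
            }
        }

        public static void main(String[] args) {
            List<Student> batch = Arrays.asList(
                    new Student("A", "male", 67),
                    new Student("B", "female", 88),
                    new Student("C", "male", 93));

            // keyBy("gender") routes records with equal keys to the same subtask;
            // over a finite batch that is the same grouping groupingBy produces
            Map<String, List<Student>> byGender = batch.stream()
                    .collect(Collectors.groupingBy(s -> s.gender));

            System.out.println(byGender.get("male").size()); // prints 2
        }
    }
    ```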

    The Student model class:

    package com.damon.flink.model;
    
    public class Student {
        private String id;
        private String name;
        private String gender;
        private int age;
        private int score;
    
        public String getId() {
            return id;
        }
    
        public void setId(String id) {
            this.id = id;
        }
    
        public String getName() {
            return name;
        }
    
        public void setName(String name) {
            this.name = name;
        }
    
        public String getGender() {
            return gender;
        }
    
        public void setGender(String gender) {
            this.gender = gender;
        }
    
        public int getAge() {
            return age;
        }
    
        public void setAge(int age) {
            this.age = age;
        }
    
        public int getScore() {
            return score;
        }
    
        public void setScore(int score) {
            this.score = score;
        }
    }

    The custom StudentSink that consumes the Kafka messages:

    package com.damon.flink.sink;
    
    import com.damon.flink.model.Student;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    public class StudentSink extends RichSinkFunction<Student> {
    
        private static Logger log = LoggerFactory.getLogger(StudentSink.class);
    
        @Override
        public void open(Configuration parameters) throws Exception {
            super.open(parameters);
        }
    
        @Override
        public void close() throws Exception
        {
            super.close();
        }
    
        @Override
        public void invoke(Student value, Context context) throws Exception {
            log.info("Student : "+value.getName()+", Score : "+value.getScore());
        }
    }

    Start the project, then publish messages to Kafka; the consumed messages appear in the console:

    16:32:24.386 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
    16:32:29.387 [flink-akka.actor.default-dispatcher-5] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
    16:32:34.388 [flink-akka.actor.default-dispatcher-7] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
    16:32:39.386 [flink-akka.actor.default-dispatcher-7] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
    16:32:43.967 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:43.043, Score : 67
    16:32:44.385 [flink-akka.actor.default-dispatcher-7] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
    16:32:44.587 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:44.044, Score : 88
    16:32:45.413 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:45.045, Score : 93
    16:32:45.925 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:45.045, Score : 62
    16:32:46.339 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:46.046, Score : 51
    16:32:47.059 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:46.046, Score : 61
    16:32:47.370 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:47.047, Score : 86
    16:32:47.986 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:47.047, Score : 58
    16:32:48.392 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:48.048, Score : 66
    16:32:48.806 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:48.048, Score : 58
    16:32:49.320 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:49.049, Score : 50
    16:32:49.388 [flink-akka.actor.default-dispatcher-7] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
    16:32:49.735 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:49.049, Score : 62
    16:32:50.150 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:50.050, Score : 50
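
    For reference, test messages like the ones above can be produced with Kafka's console producer. The payload shape is an assumption based on the Student model, and the producer script path assumes a local Kafka installation (adjust to yours):

    ```shell
    # Hypothetical Student payload; topic and broker address match the Java code above
    payload='{"id":"1","name":"DamonTest","gender":"male","age":20,"score":88}'
    echo "$payload" | /usr/local/kafka/bin/kafka-console-producer.sh \
        --broker-list localhost:9092 --topic test.topic
    ```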
  4. Submitting the project to Flink: package the demo project as a jar and upload it to the local Flink cluster. This can be done visually through the Flink UI shown earlier, or from the command line; here we use the flink script in the bin directory:
    192:bin damon$ ./flink run -c com.damon.flink.App /Users/damon/Project/flink/flink/target/flink-0.0.1-SNAPSHOT.jar

    Now open the Flink UI again; the job appears under 'Running Jobs'. (Screenshot: Running Jobs page.)

    If we keep sending messages to Kafka, the UI shows the byte and record counts of the processed messages, and the detailed processing log is visible under 'Task Managers'.

At this point our local Flink + Kafka demo project is complete. Next I plan to look into deploying Flink in YARN mode.

 

Reposted from: https://www.cnblogs.com/DamonCoding/p/11259972.html
