Since my company has decided to adopt the Flink distributed computing framework as the preferred technology for upcoming products, I recently spent some time learning Flink. This post walks through a local demo that uses Kafka as the data source.
- Installing Flink: download Flink from the official website; the version I used is 'Apache Flink 1.8.1 for Scala 2.11'. After downloading, unpack it to /usr/local/flink-1.8.1.
- Starting Flink: edit /usr/local/flink-1.8.1/conf/flink-conf.yaml, find the `jobmanager.rpc.address:` entry and set it to `jobmanager.rpc.address: localhost`; then run `start-cluster.sh` from /usr/local/flink-1.8.1/bin (the commands are collected below). Flink is now running locally, and the Flink UI can be opened in a browser at http://localhost:8081
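The same steps as shell commands:

```bash
cd /usr/local/flink-1.8.1/conf
# in flink-conf.yaml, set: jobmanager.rpc.address: localhost
cd /usr/local/flink-1.8.1/bin
./start-cluster.sh
```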
- Using the Flink API: create a new Java Maven project and pull in the Flink and Kafka core components in the pom file:
```xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.damon</groupId>
    <artifactId>flink</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>flink</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <flink.version>1.4.1</flink.version>
        <deploy.dir>./target/flink/</deploy.dir>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
            <version>1.5.10.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-core</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.9_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-runtime_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.google.code.gson</groupId>
            <artifactId>gson</artifactId>
            <version>2.8.5</version>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.26</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.25</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>

    <build>
        <finalName>flinkpackage</finalName>
        <sourceDirectory>src/main/java</sourceDirectory>
        <resources>
            <resource>
                <directory>src/main/resources</directory>
                <targetPath>${project.build.directory}</targetPath>
            </resource>
        </resources>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>1.2.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.damon.flink.App</mainClass>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
```
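Two details worth noting in this pom: the maven-shade-plugin packages the project as a self-contained (fat) jar with com.damon.flink.App as its Main-Class, which is the jar we submit to the cluster at the end of this post; and the AppendingTransformer merges the reference.conf files that Flink's Akka-based runtime depends on, without which the fat jar can fail to start.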
In the main class, add the code that sets up the Flink environment, configures the Kafka data source, and consumes the data:
```java
package com.damon.flink;

import com.damon.flink.model.Student;
import com.damon.flink.sink.StudentSink;
import com.google.gson.Gson;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Properties;

public class App {

    private static Logger log = LoggerFactory.getLogger(App.class);
    private static Gson gson = new Gson();

    @SuppressWarnings({ "serial", "deprecation" })
    public static void main(String[] args) throws Exception {
        String topic = "test.topic";

        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // checkpoint every 5 seconds
        env.enableCheckpointing(5000);
        // note: this demo never performs event-time operations, so no
        // timestamps/watermarks are assigned despite the EventTime setting
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "localhost:9092");
        properties.setProperty("zookeeper.connect", "localhost:2181");
        properties.setProperty("group.id", "test-consumer-group");

        // consume raw JSON strings from the Kafka topic
        FlinkKafkaConsumer09<String> consumer09 =
                new FlinkKafkaConsumer09<>(topic, new SimpleStringSchema(), properties);
        DataStream<String> kafkaStream = env.addSource(consumer09);

        // deserialize each message into a Student and key the stream by gender
        DataStream<Student> studentStream = kafkaStream
                .map(student -> gson.fromJson(student, Student.class))
                .keyBy("gender");
        studentStream.addSink(new StudentSink());

        env.execute("Flink Streaming Java API Skeleton");
        log.debug("Flink Started ...");
    }
}
```
The Student model class:
```java
package com.damon.flink.model;

public class Student {

    private String id;
    private String name;
    private String gender;
    private int age;
    private int score;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getGender() { return gender; }
    public void setGender(String gender) { this.gender = gender; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    public int getScore() { return score; }
    public void setScore(int score) { this.score = score; }
}
```
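Since the job deserializes each message with Gson, a message on test.topic is expected to be a Student serialized as JSON, for example (the id, gender, and age values here are made up for illustration; the name and score mirror one of the log entries shown below):

```json
{"id":"s-001","name":"DamonTest-2019-07-28 16:32:43.043","gender":"male","age":18,"score":67}
```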
The custom StudentSink that receives the consumed Kafka messages:
```java
package com.damon.flink.sink;

import com.damon.flink.model.Student;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class StudentSink extends RichSinkFunction<Student> {

    private static Logger log = LoggerFactory.getLogger(StudentSink.class);

    @Override
    public void open(Configuration parameters) throws Exception {
        // initialize connections or resources here if the sink writes to an external system
        super.open(parameters);
    }

    @Override
    public void close() throws Exception {
        super.close();
    }

    @Override
    public void invoke(Student value, Context context) throws Exception {
        // called once per record; this demo simply logs each student
        log.info("Student : " + value.getName() + ", Score : " + value.getScore());
    }
}
```
Start the project and then send messages through Kafka; the console shows the messages being consumed:
```
16:32:24.386 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
16:32:29.387 [flink-akka.actor.default-dispatcher-5] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
16:32:34.388 [flink-akka.actor.default-dispatcher-7] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
16:32:39.386 [flink-akka.actor.default-dispatcher-7] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
16:32:43.967 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:43.043, Score : 67
16:32:44.385 [flink-akka.actor.default-dispatcher-7] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
16:32:44.587 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:44.044, Score : 88
16:32:45.413 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:45.045, Score : 93
16:32:45.925 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:45.045, Score : 62
16:32:46.339 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:46.046, Score : 51
16:32:47.059 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:46.046, Score : 61
16:32:47.370 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:47.047, Score : 86
16:32:47.986 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:47.047, Score : 58
16:32:48.392 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:48.048, Score : 66
16:32:48.806 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:48.048, Score : 58
16:32:49.320 [Sink: Unnamed (2/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:49.049, Score : 50
16:32:49.388 [flink-akka.actor.default-dispatcher-7] DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Sending heartbeat to JobManager
16:32:49.735 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:49.049, Score : 62
16:32:50.150 [Sink: Unnamed (7/12)] INFO com.damon.flink.sink.StudentSink - Student : DamonTest-2019-07-28 16:32:50.050, Score : 50
```
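For reference, test messages like those in the log could be produced with a minimal Kafka producer along the following lines. This is only a sketch, not the producer actually used for this demo: the class name TestStudentProducer and the generated field values are made up, and it assumes the kafka-clients library is on the classpath (the Flink Kafka connector pulls it in transitively).

```java
package com.damon.flink;

import com.damon.flink.model.Student;
import com.google.gson.Gson;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;
import java.util.Random;

// Hypothetical test producer: sends one random Student as JSON per second to test.topic.
public class TestStudentProducer {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Gson gson = new Gson();
        Random random = new Random();
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                Student student = new Student();
                student.setId(String.valueOf(random.nextInt(1000)));
                student.setName("DamonTest");
                student.setGender(random.nextBoolean() ? "male" : "female");
                student.setAge(18 + random.nextInt(10));
                student.setScore(50 + random.nextInt(50));
                // serialize to JSON so the Flink job's Gson mapper can parse it back
                producer.send(new ProducerRecord<>("test.topic", gson.toJson(student)));
                Thread.sleep(1000);
            }
        }
    }
}
```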
- Submitting the job to Flink: package the demo project above into a jar and upload it to the local Flink instance. This can be done visually through the Flink UI we opened earlier, or from the command line; here we use the flink script in the bin directory:
```
192:bin damon$ ./flink run -c com.damon.flink.App /Users/damon/Project/flink/flink/target/flink-0.0.1-SNAPSHOT.jar
```
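Once submitted, `./flink list` shows the job among the running jobs, and `./flink cancel <jobID>` stops it when we are done.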
We keep sending messages to Kafka: the job overview in the UI shows the number of bytes and records received, and the 'Task Managers' page likewise shows the detailed logs of message processing.
With that, our local Flink + Kafka demo project is complete. Next up, I plan to look into deploying Flink in YARN mode.