Building a Flink program that reads from Kafka and writes to MySQL

  Flink has become quite popular recently, so it is worth understanding and using. It is arguably the most popular real-time compute engine at the moment, and it offers both stream processing and batch processing. It can handle bounded as well as unbounded data, i.e. data that is produced continuously and never ends. We won't go into the internals here; instead we will build a working Flink pipeline. The overall flow is source -> transform -> sink: read raw records from a source, transform them from their raw form into the format we need, and then sink the result, i.e. write it into a database or a file for storage and presentation.
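  To make that shape concrete before building the real project, here is a tiny, self-contained toy pipeline. It is not the Kafka/MySQL job itself, just an illustration of source -> transform -> sink using an in-memory source and a print sink (the class name PipelineShapeDemo is only for this sketch):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PipelineShapeDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("tom,25", "jerry,30")              // source: where the records come from
           .map(new MapFunction<String, String>() {         // transform: reshape each record
               @Override
               public String map(String line) {
                   return line.split(",")[0].toUpperCase();
               }
           })
           .print();                                        // sink: where the results go

        env.execute("source-transform-sink demo");
    }
}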

  Next we need to download Flink, Kafka, MySQL and ZooKeeper. I simply downloaded the tar/tgz packages and unpacked them.

  After downloading Flink, start it up. For example, I downloaded flink-1.9.1-bin-scala_2.11.tgz and unpacked it:

tar -zxvf flink-1.9.1-bin-scala_2.11.tgz
cd flink-1.9.1
./bin/start-cluster.sh

  Once it is up, open http://localhost:8081 and you should see the Flink web dashboard:

[Screenshot: Flink web dashboard at http://localhost:8081]

 

 

   Download ZooKeeper and unpack it, copy zoo_sample.cfg under zookeeper/conf to zoo.cfg, and start it with the commands below. ZooKeeper is used together with Kafka, because Kafka relies on it to track and discover all the brokers.

cp zoo_sample.cfg zoo.cfg
cd ../
./bin/zkServer.sh start

   Next, download and start Kafka, then create a topic named person with one partition and one replica:

./bin/kafka-server-start.sh config/server.properties
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic person

  For MySQL, install it yourself; once it is running, create a table in the database:

CREATE TABLE `Person` (
  `id` mediumint NOT NULL auto_increment,
  `name` varchar(255) NOT NULL,
  `age` int(11) DEFAULT NULL,
  `createDate` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci

  Next, create a Java project with Maven. Make sure Maven is installed first, so that the mvn command works. Running the archetype directly may hang because the catalog cannot be downloaded, so I first downloaded an archetype-catalog.xml file from http://repo.maven.apache.org/maven2/ and placed it in my local Maven repository; adjust the path below to your own machine:

  /Users/huangqingshi/.m2/repository/org/apache/maven/archetype/archetype-catalog/2.4

mvn archetype:generate \
    -DarchetypeGroupId=org.apache.flink \
    -DarchetypeArtifactId=flink-quickstart-java \
    -DarchetypeVersion=1.7.2 \
    -DgroupId=flink-project \
    -DartifactId=flink-project \
    -Dversion=0.1 \
    -Dpackage=myflink \
    -DinteractiveMode=false \
    -DarchetypeCatalog=local

  -DinteractiveMode=false turns off the interactive prompts, since the version, group id and package are already given above. -DarchetypeCatalog=local makes Maven read the locally downloaded catalog instead of fetching it remotely, which is extremely slow and can stop the project from being generated.

  I added a few dependencies to the generated project, such as the Kafka connector and the database libraries. The full pom.xml is:


<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>flink-project</groupId>
    <artifactId>flink-project</artifactId>
    <version>0.1</version>
    <packaging>jar</packaging>

    <name>Flink Quickstart Job</name>
    <url>http://www.myorganization.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <flink.version>1.7.2</flink.version>
        <java.version>1.8</java.version>
        <scala.binary.version>2.11</scala.binary.version>
        <maven.compiler.source>${java.version}</maven.compiler.source>
        <maven.compiler.target>${java.version}</maven.compiler.target>
    </properties>

    <repositories>
        <repository>
            <id>apache.snapshots</id>
            <name>Apache Development Snapshot Repository</name>
            <url>https://repository.apache.org/content/repositories/snapshots/</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>

    <dependencies>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.10_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.7</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17</version>
            <scope>runtime</scope>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.62</version>
        </dependency>

        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>28.1-jre</version>
        </dependency>

        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>3.1.0</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.16</version>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid</artifactId>
            <version>1.1.20</version>
        </dependency>

    </dependencies>

    <build>
        <plugins>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.0.0</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <artifactSet>
                                <excludes>
                                    <exclude>org.apache.flink:force-shading</exclude>
                                    <exclude>com.google.code.findbugs:jsr305</exclude>
                                    <exclude>org.slf4j:*</exclude>
                                    <exclude>log4j:*</exclude>
                                </excludes>
                            </artifactSet>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>myflink.StreamingJob</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>

        <pluginManagement>
            <plugins>

                <plugin>
                    <groupId>org.eclipse.m2e</groupId>
                    <artifactId>lifecycle-mapping</artifactId>
                    <version>1.0.0</version>
                    <configuration>
                        <lifecycleMappingMetadata>
                            <pluginExecutions>
                                <pluginExecution>
                                    <pluginExecutionFilter>
                                        <groupId>org.apache.maven.plugins</groupId>
                                        <artifactId>maven-shade-plugin</artifactId>
                                        <versionRange>[3.0.0,)</versionRange>
                                        <goals>
                                            <goal>shade</goal>
                                        </goals>
                                    </pluginExecutionFilter>
                                    <action>
                                        <ignore/>
                                    </action>
                                </pluginExecution>
                                <pluginExecution>
                                    <pluginExecutionFilter>
                                        <groupId>org.apache.maven.plugins</groupId>
                                        <artifactId>maven-compiler-plugin</artifactId>
                                        <versionRange>[3.1,)</versionRange>
                                        <goals>
                                            <goal>testCompile</goal>
                                            <goal>compile</goal>
                                        </goals>
                                    </pluginExecutionFilter>
                                    <action>
                                        <ignore/>
                                    </action>
                                </pluginExecution>
                            </pluginExecutions>
                        </lifecycleMappingMetadata>
                    </configuration>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>

    <profiles>
        <profile>
            <id>add-dependencies-for-IDEA</id>

            <activation>
                <property>
                    <name>idea.version</name>
                </property>
            </activation>

            <dependencies>
                <dependency>
                    <groupId>org.apache.flink</groupId>
                    <artifactId>flink-java</artifactId>
                    <version>${flink.version}</version>
                    <scope>compile</scope>
                </dependency>
                <dependency>
                    <groupId>org.apache.flink</groupId>
                    <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
                    <version>${flink.version}</version>
                    <scope>compile</scope>
                </dependency>
            </dependencies>
        </profile>
    </profiles>

</project>

  Next, create a POJO that will hold the data.

package myflink.pojo;

import java.util.Date;

/**
 * @author huangqingshi
 * @Date 2019-12-07
 */
public class Person {

    private String name;
    private int age;
    private Date createDate;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public Date getCreateDate() {
        return createDate;
    }

    public void setCreateDate(Date createDate) {
        this.createDate = createDate;
    }
}
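
  As a quick illustration of what actually flows through Kafka: fastjson turns this POJO into a JSON string (with the Date serialized as an epoch-millisecond timestamp by default) and parses it back the same way the Flink job will. A small standalone sketch (the class name PersonJsonDemo and the sample values are only for this illustration):

import com.alibaba.fastjson.JSON;
import myflink.pojo.Person;

import java.util.Date;

public class PersonJsonDemo {
    public static void main(String[] args) {
        Person p = new Person();
        p.setName("hqs42");
        p.setAge(42);
        p.setCreateDate(new Date());

        // Round-trip: this is what the producer writes and the consumer parses.
        String json = JSON.toJSONString(p);               // e.g. {"age":42,"createDate":1575700000000,"name":"hqs42"}
        Person back = JSON.parseObject(json, Person.class);
        System.out.println(json + " -> name=" + back.getName());
    }
}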

   Now create a writer that keeps producing data into Kafka.

package myflink.kafka;

import com.alibaba.fastjson.JSON;
import myflink.pojo.Person;
import org.apache.commons.lang3.RandomUtils;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Date;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

/**
 * @author huangqingshi
 * @Date 2019-12-07
 */
public class KafkaWriter {

    // List of local Kafka brokers
    public static final String BROKER_LIST = "localhost:9092";
    // Kafka topic to write to (the person topic created above)
    public static final String TOPIC_PERSON = "person";
    // Serializer for the record key (plain strings)
    public static final String KEY_SERIALIZER = "org.apache.kafka.common.serialization.StringSerializer";
    // Serializer for the record value
    public static final String VALUE_SERIALIZER = "org.apache.kafka.common.serialization.StringSerializer";

    public static void writeToKafka() throws Exception{
        Properties props = new Properties();
        props.put("bootstrap.servers", BROKER_LIST);
        props.put("key.serializer", KEY_SERIALIZER);
        props.put("value.serializer", VALUE_SERIALIZER);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // Build a Person object whose name is "hqs" plus a random number
        int randomInt = RandomUtils.nextInt(1, 100000);
        Person person = new Person();
        person.setName("hqs" + randomInt);
        person.setAge(randomInt);
        person.setCreateDate(new Date());
        // Serialize it to JSON
        String personJson = JSON.toJSONString(person);

        // Wrap it in a Kafka record
        ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC_PERSON, null,
                null, personJson);
        // Hand the record to the producer's send buffer
        producer.send(record);
        System.out.println("Sent to Kafka: " + personJson);
        // Flush so the record goes out immediately, then close the producer
        // (a new producer is created on every call, so it must be released here)
        producer.flush();
        producer.close();

    }

    public static void main(String[] args) {
        while(true) {
            try {
                //write one record every three seconds
                TimeUnit.SECONDS.sleep(3);
                writeToKafka();
            } catch (Exception e) {
                e.printStackTrace();
            }

        }
    }

}
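
  If you want confirmation that each record actually reached the broker (optional; the writer above just flushes), KafkaProducer.send also accepts a callback. A sketch of that variant of the send call:

// Optional variant of producer.send(record): log partition and offset once the broker acknowledges the record.
producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace();
    } else {
        System.out.println("Acked by Kafka: " + metadata.topic() + "-" + metadata.partition()
                + " @ offset " + metadata.offset());
    }
});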

  Create a utility class for obtaining database connections. It uses Druid: set the JDBC driver, URL, username and password, plus the pool sizes (initial size, maximum active connections and minimum idle connections). The pool is created once, and getConnection() then hands out connections from it.

 
   
package myflink.db;

import com.alibaba.druid.pool.DruidDataSource;

import java.sql.Connection;

/**
 * @author huangqingshi
 * @Date 2019-12-07
 */
public class DbUtils {

    private static final DruidDataSource dataSource = new DruidDataSource();

    static {
        dataSource.setDriverClassName("com.mysql.cj.jdbc.Driver");
        dataSource.setUrl("jdbc:mysql://localhost:3306/testdb");
        dataSource.setUsername("root");
        dataSource.setPassword("root");
        // Initial, maximum and minimum idle connections of the pool
        dataSource.setInitialSize(10);
        dataSource.setMaxActive(50);
        dataSource.setMinIdle(5);
    }

    public static Connection getConnection() throws Exception {
        // Hand out a connection from the shared pool
        return dataSource.getConnection();
    }

}
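
  To sanity-check the pool outside of Flink, you could grab a connection in a try-with-resources block (purely optional; DbUtilsCheck is just a throwaway name for this sketch). Note that depending on your MySQL server's time zone settings, Connector/J 8 may also require a serverTimezone parameter on the JDBC URL, e.g. jdbc:mysql://localhost:3306/testdb?serverTimezone=UTC.

import java.sql.Connection;

// Throwaway connectivity check, assuming the testdb database from above exists.
public class DbUtilsCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DbUtils.getConnection()) {
            System.out.println("Got a pooled connection, closed = " + conn.isClosed());
        }
    }
}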
 

  Next, create a MySqlSink that extends the RichSinkFunction class and overrides its open, invoke and close methods. open runs once before any data is written, invoke is then called for each element the sink receives, and close runs at the end. So the database connection and the prepared statement are set up in open, the actual batch insert happens in invoke, and once the job stops sending data the statement and connection are closed and released in close. See the code below.

package myflink.sink;

import myflink.db.DbUtils;
import myflink.pojo.Person;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.util.List;

/**
 * @author huangqingshi
 * @Date 2019-12-07
 */
public class MySqlSink extends RichSinkFunction<List<Person>> {

    private PreparedStatement ps;
    private Connection connection;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        // Get a database connection and prepare the insert statement
        connection = DbUtils.getConnection();
        String sql = "insert into Person(name, age, createDate) values (?, ?, ?)";
        ps = connection.prepareStatement(sql);
    }

    @Override
    public void close() throws Exception {
        super.close();
        // Close and release resources: the statement first, then the connection
        if(ps != null) {
            ps.close();
        }

        if(connection != null) {
            connection.close();
        }
    }

    @Override
    public void invoke(List<Person> persons, Context context) throws Exception {
        for(Person person : persons) {
            ps.setString(1, person.getName());
            ps.setInt(2, person.getAge());
            ps.setTimestamp(3, new Timestamp(person.getCreateDate().getTime()));
            ps.addBatch();
        }

        // Write the whole batch in one go
        int[] count = ps.executeBatch();
        System.out.println("Rows written to MySQL: " + count.length);

    }
}
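
  One optional tuning note, an assumption about your setup rather than something the code above requires: MySQL Connector/J only rewrites addBatch/executeBatch into a single multi-row INSERT when rewriteBatchedStatements is enabled on the JDBC URL; without it the batched statements are still sent one at a time. If you want that behaviour, the URL in DbUtils could be changed like this:

// Hypothetical tweak in DbUtils: let executeBatch() become one multi-row INSERT on the MySQL side.
dataSource.setUrl("jdbc:mysql://localhost:3306/testdb?rewriteBatchedStatements=true");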

  Finally, create the source that reads from Kafka and sinks to the database. Configure the properties needed to connect to Kafka, read the JSON strings from Kafka and transform each one into a Person object (this is the transform step mentioned at the beginning), collect all the records that arrive within a 5-second window, and sink each batch into MySQL.

package myflink;

import com.alibaba.fastjson.JSONObject;
import myflink.kafka.KafkaWriter;
import myflink.pojo.Person;
import myflink.sink.MySqlSink;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.shaded.guava18.com.google.common.collect.Lists;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.util.Collector;

import java.util.List;
import java.util.Properties;

/**
 * @author huangqingshi
 * @Date 2019-12-07
 */
public class DataSourceFromKafka {

    public static void main(String[] args) throws Exception{
        //build the stream execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        //kafka consumer configuration
        Properties prop = new Properties();
        prop.put("bootstrap.servers", KafkaWriter.BROKER_LIST);
        prop.put("zookeeper.connect", "localhost:2181");
        prop.put("group.id", KafkaWriter.TOPIC_PERSON);
        prop.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        prop.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        prop.put("auto.offset.reset", "latest");

        DataStreamSource<String> dataStreamSource = env.addSource(new FlinkKafkaConsumer010<>(
                KafkaWriter.TOPIC_PERSON,
                new SimpleStringSchema(),
                prop
                )).
                //parallelism 1 so console output stays in order; it does not affect the result
                setParallelism(1);

        //read from kafka and convert each JSON string into a Person object
        DataStream<Person> dataStream = dataStreamSource.map(value -> JSONObject.parseObject(value, Person.class));
        //collect everything received within each 5-second window
        dataStream.timeWindowAll(Time.seconds(5L)).
                apply(new AllWindowFunction<Person, List<Person>, TimeWindow>() {

                    @Override
                    public void apply(TimeWindow timeWindow, Iterable<Person> iterable, Collector<List<Person>> out) throws Exception {
                        List<Person> persons = Lists.newArrayList(iterable);

                        if(persons.size() > 0) {
                            System.out.println("Records received in this 5-second window: " + persons.size());
                            out.collect(persons);
                        }

                    }
                })
                //sink each window's batch into the database
                .addSink(new MySqlSink());
                //or print to the console instead:
                //.print();


        env.execute("kafka consumer job");
    }

}
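
  An optional improvement that is not part of the job above: if checkpointing is enabled, the Flink Kafka consumer commits its offsets as part of each checkpoint, so a restarted job resumes roughly where it left off rather than jumping to the latest offset. A minimal sketch, with an arbitrary 10-second interval, placed before env.addSource:

// Optional: checkpoint every 10 seconds so Kafka offsets are committed along with the checkpoints.
env.enableCheckpointing(10_000);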

  With everything in place, run KafkaWriter's main method to write data into the person topic in Kafka. The console output confirms each record that was sent.

   Then run DataSourceFromKafka's main method to read the data from Kafka and write it into the database; the console prints how many records were received in each 5-second window.

 

   Finally, query the database: the rows are there, so the whole pipeline works.

[Screenshot: rows inserted into the Person table in MySQL]

 

   That's it. If anything here is wrong, corrections are welcome.

  
