Flink Real-Time Data Warehouse, Part 1: Reading Kafka with Flink SQL in Practice, and the Problems Encountered So Far (to be updated continuously)

 

1. The first step is important: dependencies and environment. Since everything here runs locally, you need a working Hadoop setup; without one you will get an error message like the screenshot below. Download winutils.exe and point the HADOOP_HOME environment variable at it (a per-JVM alternative is sketched right after the screenshot).

[Screenshot 1: error reported when winutils.exe / HADOOP_HOME is missing]
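For reference, a minimal sketch (my addition, not part of the original setup): Hadoop's Shell utility consults the hadoop.home.dir system property before the HADOOP_HOME environment variable, so the same fix can be applied per JVM instead of system-wide. The path C:\hadoop is a placeholder and must contain bin\winutils.exe.

object HadoopHomeWorkaround {
  def main(args: Array[String]): Unit = {
    // Placeholder path; the directory must contain bin\winutils.exe
    System.setProperty("hadoop.home.dir", "C:\\hadoop")
    // ... then build the Flink environments as in the code in step 5
  }
}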

 

2. For convenience, here is the full pom.xml (note in particular the Hive and Hadoop dependencies):



<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>pijiuya</groupId>
    <artifactId>FlinkExample</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <flink.version>1.10.0</flink.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.bahir</groupId>
            <artifactId>flink-connector-redis_2.11</artifactId>
            <version>1.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>0.11.0.3</version>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.25</version>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.25</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table</artifactId>
            <version>${flink.version}</version>
            <type>pom</type>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java-bridge_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-scala-bridge_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-common</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-scala_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner-blink_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-jdbc_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-csv</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>statefun-sdk</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>statefun-flink-harness</artifactId>
            <version>2.0.0</version>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.60</version>
        </dependency>

        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>2.9.0</version>
        </dependency>
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>2.9.2</version>
        </dependency>

        <!-- Flink's Hive connector -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-hive_2.11</artifactId>
            <version>1.10.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.1.0</version>
        </dependency>

        <!-- Hadoop dependencies (CDH build, hence the cloudera repository below) -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.0-cdh5.16.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.6.0-cdh5.16.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.0-cdh5.16.1</version>
        </dependency>
    </dependencies>

    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.1.0</version>
                <configuration>
                    <createDependencyReducedPom>false</createDependencyReducedPom>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <!-- <mainClass>batch.WordCount_demo</mainClass> -->
                                    <mainClass>developing_scala.kafka2RedisDemo_test</mainClass>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                            </transformers>
                            <filters>
                                <filter>
                                    <artifact>*:*:*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>8</source>
                    <target>8</target>
                    <encoding>utf8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>


 

3. Hive versions differ from one environment to another, so consult the official documentation; the dependency matrix there is laid out clearly:

 

https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/#connecting-to-hive

[Screenshot 2: the Hive dependency matrix from the Flink 1.10 documentation]

 

4. Next, pull the hive-site.xml configuration file down and put it somewhere local. I am demonstrating on Windows 10 here, so the directory is arbitrary.

One thing to watch: on CDH or HDP clusters, the downloaded config file may have encrypted passwords or may be missing some properties. The official docs list the properties you must have (a quick sanity check follows the snippet below):

 

 https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_catalog.html


   
<property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
   <description>metadata is stored in a MySQL server</description>
</property>

<property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>com.mysql.jdbc.Driver</value>
   <description>MySQL JDBC driver class</description>
</property>

<property>
   <name>javax.jdo.option.ConnectionUserName</name>
   <value>...</value>
   <description>user name for connecting to mysql server</description>
</property>

<property>
   <name>javax.jdo.option.ConnectionPassword</name>
   <value>...</value>
   <description>password for connecting to mysql server</description>
</property>

<property>
   <name>hive.metastore.uris</name>
   <value>thrift://localhost:9083</value>
   <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>

<property>
   <name>hive.metastore.schema.verification</name>
   <value>true</value>
</property>
   

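Before wiring the catalog into a full Flink job, it is worth verifying that this config directory is actually picked up. A minimal sketch (my addition, reusing the same constructor arguments as the job in step 5): if hive-site.xml is readable and the metastore is reachable, listing databases succeeds.

import org.apache.flink.table.catalog.hive.HiveCatalog
import scala.collection.JavaConverters._

object CheckHiveCatalog {
  def main(args: Array[String]): Unit = {
    val catalog = new HiveCatalog("rtdw", "default", "G:\\Flink SQL开发文件", "1.1.0")
    catalog.open()
    // Should print the databases visible through the metastore, e.g. "default"
    catalog.listDatabases().asScala.foreach(println)
    catalog.close()
  }
}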
 

5. Then comes the code, which is straightforward:

 

package flink_sql

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.table.api.{EnvironmentSettings, Table}
import org.apache.flink.table.api.scala.StreamTableEnvironment
import org.apache.flink.table.catalog.hive.HiveCatalog


/**
  * todo Read data from Kafka and register the table in a Hive catalog
  */
object Sql_source_kafka {
  def main(args: Array[String]): Unit = {

    import org.apache.flink.api.scala._

    val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
    streamEnv.setParallelism(1)
    //    streamEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    //    streamEnv.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)
    val tableEnvSettings = EnvironmentSettings.newInstance()
      .useBlinkPlanner()
      .inStreamingMode()
      .build()

    val tableEnv = StreamTableEnvironment.create(streamEnv, tableEnvSettings)

    val catalog = new HiveCatalog(
      "rtdw", // catalog name
      "default", // default database
      "G:\\Flink SQL开发文件", // Hive config (hive-site.xml) directory
      "1.1.0" // Hive version
    )

    //todo register the catalog
    tableEnv.registerCatalog("rtdw", catalog)

    //todo make it the current catalog
    tableEnv.useCatalog("rtdw")

    //todo create a database
    //    val createDbSql1 = "CREATE DATABASE IF NOT EXISTS rtdw.default"
    //    val createDbSql1 = "USE DATABASE default"
    //    tableEnv.sqlUpdate(createDbSql1)

    //todo which catalogs are registered
    val catalogs: Array[String] = tableEnv.listCatalogs()
    println(catalogs.toList)

    //todo which tables exist
    val tables = tableEnv.listTables()
    println(tables.toList)


    //todo sample of the JSON messages in the Kafka topic
    val kafkaLogStr = "{\"eventType\": \"clickBuyNow\",\"userId\": \"97470180\",\"ts\": 1585136092541}"
    //    tableEnv.sqlUpdate("DROP TABLE rtdw.ods.streaming_user_active_log2")

    val createTableSql_new =
      """CREATE TABLE flink_test_03 (
        |  eventType STRING,
        |  userId STRING,
        |  ts STRING
        |)
        | WITH
        |(
        |  'connector.type' = 'kafka',
        |  'connector.version' = '0.11',
        |  'connector.topic' = 'flink_test_topic',
        |  'connector.startup-mode' = 'earliest-offset',
        |  'connector.properties.zookeeper.connect' = 'node1:2181,node2:2181,node3:2181',
        |  'connector.properties.bootstrap.servers' = 'node1:9092,node2:9092,node3:9092',
        |  'connector.properties.group.id' = 'flink_test_1',
        |  'format.type' = 'json',
        |  'format.derive-schema' = 'true',
        |  'update-mode' = 'append'
        |)""".stripMargin

    tableEnv.sqlUpdate(createTableSql_new)

    val querySql =
      """SELECT eventType,
        |userId,
        |ts
        |FROM flink_test_03
      """.stripMargin
    val result: Table = tableEnv.sqlQuery(querySql)
    println("Printing the query result:")


    val rsss: DataStream[(String, String, String)] = tableEnv.toAppendStream[(String, String, String)](result)
    rsss.print()


    streamEnv.execute()

  }

}
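With a message shaped like kafkaLogStr above sitting in flink_test_topic, the printed stream should look roughly like this (my expectation, not captured output):

(clickBuyNow,97470180,1585136092541)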

 

6. After the job runs, check in Hive whether the table metadata exists. It turns out the table itself exists, but its columns do not, and querying the MySQL metastore directly shows the same:

[Screenshots 3-5: the table shows up in Hive, but neither Hive nor the MySQL metastore has its column information]

Running DESCRIBE FORMATTED flink_test_03; does print the table's information, while the MySQL metastore holds no column details for it. [Screenshot 6: output of DESCRIBE FORMATTED flink_test_03]
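A plausible explanation (my reading of Flink 1.10's HiveCatalog, not from the original post): for generic, non-Hive tables such as this Kafka source, the schema is serialized into table properties rather than Hive columns, so COLUMNS_V2 stays empty while TABLE_PARAMS carries flink.* keys. A sketch for checking that directly against the MySQL metastore (table names follow the standard Hive metastore schema; the JDBC URL and credentials are placeholders, and the MySQL driver must be on the classpath):

import java.sql.DriverManager

object InspectMetastore {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:mysql://localhost/metastore", "hive", "hive") // placeholder credentials
    // List every table parameter HiveCatalog wrote for flink_test_03
    val rs = conn.createStatement().executeQuery(
      """SELECT p.PARAM_KEY, p.PARAM_VALUE
        |FROM TBLS t JOIN TABLE_PARAMS p ON t.TBL_ID = p.TBL_ID
        |WHERE t.TBL_NAME = 'flink_test_03'""".stripMargin)
    while (rs.next()) println(s"${rs.getString(1)} = ${rs.getString(2)}")
    conn.close()
  }
}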

 

 

7. If that is the case, for now we can only print the table's schema and analyze and store it by hand; how to manage this metadata properly is something to dig into later. A minimal sketch of printing the schema follows.
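This sketch (my addition) slots into main() from step 5, right after result is defined; getFieldNames and getFieldDataTypes are part of the 1.10 TableSchema API:

// Print the schema field by field so it can be recorded manually
val schema = result.getSchema
schema.getFieldNames.zip(schema.getFieldDataTypes).foreach {
  case (name, dataType) => println(s"$name: $dataType")
}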

 
