Doris Routine Load with Kafka 0.8.0: A Hands-On Guide



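To learn more about Spark internals and hands-on Spark applications, see my new book:
《图解Spark 大数据快速分析实战》 (Illustrated Spark: Hands-On Fast Big Data Analysis) by Wang Lei (王磊)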


1. Background

  1. Doris can ingest Kafka data through Routine Load.

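  2. As of 2022-07-18, the latest Kafka release is 3.2.0. However, a customer reported that their production Kafka cluster runs version 0.8.0 and wanted to test compatibility.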

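  3. The Doris Kafka client is built on librdkafka. librdkafka uses the broker.version.fallback and api.version.request parameters to stay compatible with older Kafka versions. For the full parameter reference, see https://docs.confluent.io/3.1.1/clients/librdkafka/CONFIGURATION_8md.html. The two parameters are described as follows: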

    api.version.request: accepts true or false; the default is false.

    Request broker’s supported API versions to adjust functionality to available protocol features. If set to false the fallback version broker.version.fallback will be used. NOTE: Depends on broker version >=0.10.0. If the request is not supported by (an older) broker the broker.version.fallback fallback is used.
    Type: boolean

    broker.version.fallback: the legacy broker version to stay compatible with; the default is 0.9.0.

    Older broker versions (<0.10.0) provides no way for a client to query for supported protocol features (ApiVersionRequest, see api.version.request) making it impossible for the client to know what features it may use. As a workaround a user may set this property to the expected broker version and the client will automatically adjust its feature set accordingly if the ApiVersionRequest fails (or is disabled). The fallback broker version will be used for api.version.fallback.ms. Valid values are: 0.9.0, 0.8.2, 0.8.1, 0.8.0.
    Type: string

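  4. By default, Doris supports Kafka 0.10.0.0 and later. You can use a version older than 0.10.0.0 (0.9.0.x or 0.8.x.y) in either of two ways:
    • Modify the BE configuration: set kafka_broker_version_fallback to the legacy version you need and set kafka_api_version_request to false.
    • Configure the job directly: when creating the Routine Load job, set property.broker.version.fallback to the legacy version and set property.api.version.request to false.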
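The rule in item 4 can be captured in a small helper. This is a hypothetical sketch, not Doris or librdkafka code. The names `compat_properties` and `VALID_FALLBACKS` are introduced here for illustration. The list of valid fallback values comes from the librdkafka documentation quoted in item 3.

```python
# Hypothetical helper (not part of Doris or librdkafka): picks the extra
# FROM KAFKA properties described in item 4 for a given broker version.

# Valid broker.version.fallback values listed in the librdkafka docs quoted above.
VALID_FALLBACKS = ("0.9.0", "0.8.2", "0.8.1", "0.8.0")


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "0.10.2.1" into (0, 10, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def compat_properties(broker_version: str) -> dict[str, str]:
    """Return the compatibility properties needed for this broker version."""
    # Brokers >= 0.10.0 answer ApiVersionRequest, so Doris's defaults already work.
    if parse_version(broker_version) >= (0, 10, 0):
        return {}
    # Older brokers: map e.g. "0.9.0.1" -> "0.9.0" and "0.8.2.2" -> "0.8.2".
    fallback = ".".join(broker_version.split(".")[:3])
    if fallback not in VALID_FALLBACKS:
        raise ValueError(f"librdkafka has no fallback for Kafka {broker_version}")
    return {
        "property.broker.version.fallback": fallback,
        "property.api.version.request": "false",
    }


print(compat_properties("0.8.0"))
```

To use the BE-wide alternative instead, apply the same values to kafka_broker_version_fallback and kafka_api_version_request in the BE configuration.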

2. Deploying and Configuring Kafka 0.8.0
  1. Download and extract the kafka_2.8.0-0.8.0.tar.gz archive.
    https://archive.apache.org/dist/kafka/0.8.0/kafka_2.8.0-0.8.0.tar.gz
    tar -xzvf kafka_2.8.0-0.8.0.tar.gz
    

    According to the official site, this version was released on December 3, 2013.

  2. Start the ZooKeeper instance bundled with Kafka
    nohup bin/zookeeper-server-start.sh config/zookeeper.properties &
    
  3. Start Kafka
    nohup bin/kafka-server-start.sh config/server.properties &
    
  4. Create a Kafka topic

    bin/kafka-create-topic.sh --zookeeper localhost:2181 --replica 1 --partition 1 --topic test
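Both servers start in the background with nohup, so the topic creation step can race against startup. The following minimal sketch waits until a port accepts TCP connections. `wait_for_port` is a hypothetical helper, not part of Kafka. The ports assume the shipped config files are unchanged: 2181 in config/zookeeper.properties and 9092 in config/server.properties.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing listens yet
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)


# Usage, assuming the default ports from config/zookeeper.properties (2181)
# and config/server.properties (9092):
#   wait_for_port("localhost", 2181) and wait_for_port("localhost", 9092)
```

A successful TCP connect only shows that the process is listening. It does not prove the broker has finished registering in ZooKeeper, so treat this as a coarse readiness check.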
    
3. Preparing the Doris Environment and Loading Data

  1. Create the kakfa_doris database and the clicklog table in Doris:
    # Log in with the MySQL client
    mysql -u root -h 127.0.0.1 -P 9030
    # Create the database
    create database kakfa_doris;
    # Switch to the database
    use kakfa_doris;
    # Create the clicklog table
    CREATE TABLE IF NOT EXISTS kakfa_doris.clicklog
    (
        `clickTime` DATETIME NOT NULL COMMENT "click time",
        `type` VARCHAR(10) NOT NULL COMMENT "click type",
        `id`  VARCHAR(100) COMMENT "unique id",
        `user` VARCHAR(100) COMMENT "user name",
        `city` VARCHAR(50) COMMENT "city"
    )
    DUPLICATE KEY(`clickTime`, `type`)
    DISTRIBUTED BY HASH(`type`) BUCKETS 1
    PROPERTIES (
    "replication_allocation" = "tag.location.default: 1"
    );
    
    
  2. Create the Routine Load job
    CREATE ROUTINE LOAD kakfa_doris.load_from_kafka_test ON clicklog
    COLUMNS(clickTime,id,type,user)
    PROPERTIES
    (
        "desired_concurrent_number"="3",
        "max_batch_interval" = "5",
        "max_batch_rows" = "300000",
        "max_batch_size" = "209715200",
        "strict_mode" = "false",
        "format" = "json"
    )
    FROM KAFKA
    (
        "kafka_broker_list" = "127.0.0.1:9092",
        "kafka_topic" = "test",
        "property.group.id" = "doris",
        "property.broker.version.fallback"="0.8.0",
        "property.api.version.request"="false"
     );
    
    

    The following two properties enable Kafka 0.8.0 support here:

    • "property.broker.version.fallback" = "0.8.0"
    • "property.api.version.request" = "false"
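The CustomProperties line in the SHOW ROUTINE LOAD output later in this article suggests that the `property.` prefix is stripped and the remaining key is passed to the Kafka client. The sketch below is a hypothetical illustration of that observable mapping, not Doris source code. `to_client_conf` is a name introduced here.

```python
# Hypothetical illustration (not Doris source code): keys prefixed with
# "property." are handed to the Kafka client with the prefix removed.

PREFIX = "property."


def to_client_conf(from_kafka_props: dict[str, str]) -> dict[str, str]:
    """Keep only "property.*" entries and strip the prefix from their keys."""
    return {
        key[len(PREFIX):]: value
        for key, value in from_kafka_props.items()
        if key.startswith(PREFIX)
    }


# FROM KAFKA properties of load_from_kafka_test
job_props = {
    "kafka_broker_list": "127.0.0.1:9092",
    "kafka_topic": "test",
    "property.group.id": "doris",
    "property.broker.version.fallback": "0.8.0",
    "property.api.version.request": "false",
}

print(to_client_conf(job_props))
# {'group.id': 'doris', 'broker.version.fallback': '0.8.0', 'api.version.request': 'false'}
```

The real CustomProperties output also contains kafka_default_offsets, which Doris adds on its own and which this sketch does not model.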
  3. Produce data to Kafka

    Run the following command to open the Kafka console producer:

    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

    Type the following record in the console and press Enter to send it to Kafka:

    {"clickTime":"2022-07-18","type":"click","id":"1","user":"alex","city":"xian"}
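Typing records by hand gets tedious. The console producer sends each stdin line as one message, so a small generator script can be piped into it. This is a sketch under stated assumptions: `make_click_event`, the file name gen_clicks.py and the sample users are hypothetical. The JSON keys match the clicklog columns.

```python
import json
from datetime import datetime


def make_click_event(event_id: int, user: str, city: str, when: datetime,
                     click_type: str = "click") -> str:
    """Serialize one click record as a single-line JSON string."""
    record = {
        "clickTime": when.strftime("%Y-%m-%d %H:%M:%S"),
        "type": click_type,
        "id": str(event_id),
        "user": user,
        "city": city,
    }
    # json.dumps emits no newlines, so each record stays one Kafka message
    return json.dumps(record, separators=(",", ":"))


if __name__ == "__main__":
    # Illustrative sample data; ids start at 2 because id 1 was sent by hand above
    samples = [("bob", "beijing"), ("carol", "shanghai")]
    for event_id, (user, city) in enumerate(samples, start=2):
        print(make_click_event(event_id, user, city, datetime.now()))
```

Example: `python3 gen_clicks.py | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test`. The city field will still be dropped unless the job's COLUMNS clause includes it.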
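  4. Query and verify the result table

    Run the following query to confirm that the record has been loaded into Doris. Two details stand out:
    • clickTime is stored as 2022-07-18 00:00:00 because the message carried only a date.
    • city is NULL because it is not listed in the job's COLUMNS(clickTime,id,type,user) clause.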

mysql> select * from clicklog;
+---------------------+-------+------+------+------+
| clickTime           | type  | id   | user | city |
+---------------------+-------+------+------+------+
| 2022-07-18 00:00:00 | click | 1    | alex | NULL |
+---------------------+-------+------+------+------+
1 row in set (0.02 sec)
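The next sketch is a rough model of how the JSON message became the row above. It assumes that, with format json and no jsonpaths, keys are matched by name to the columns in COLUMNS(...), which is consistent with the query result. `simulate_row` is a hypothetical illustration, not Doris's implementation.

```python
import json
from datetime import datetime

# Column list from COLUMNS(...) in load_from_kafka_test
LOAD_COLUMNS = ["clickTime", "id", "type", "user"]
# All columns of kakfa_doris.clicklog, in table order
TABLE_COLUMNS = ["clickTime", "type", "id", "user", "city"]


def simulate_row(message: str) -> dict:
    """Approximate the row built from one JSON message (illustration only)."""
    record = json.loads(message)
    # Columns outside LOAD_COLUMNS are never filled, so nullable ones end up NULL
    row = {column: None for column in TABLE_COLUMNS}
    for column in LOAD_COLUMNS:
        row[column] = record.get(column)
    # A date-only string in a DATETIME column is read as midnight
    row["clickTime"] = datetime.fromisoformat(row["clickTime"]).strftime("%Y-%m-%d %H:%M:%S")
    return row


message = '{"clickTime":"2022-07-18","type":"click","id":"1","user":"alex","city":"xian"}'
print(simulate_row(message))
# {'clickTime': '2022-07-18 00:00:00', 'type': 'click', 'id': '1', 'user': 'alex', 'city': None}
```

In this model, adding city to LOAD_COLUMNS fills it. The corresponding change to the job is to include city in its COLUMNS clause.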
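  5. Inspect the Routine Load job

    Run SHOW ALL ROUTINE LOAD FOR load_from_kafka_test \G to view the Routine Load job. The output confirms three things:
    • CustomProperties shows that broker.version.fallback=0.8.0 and api.version.request=false were applied.
    • Progress and Lag show that partition 0 has been consumed with no lag.
    • Statistic reports no error rows.
    The trailing ";" after "\G" in the command below is redundant. It is what produces the harmless "ERROR: No query specified" message at the end.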

mysql> SHOW ALL ROUTINE LOAD FOR load_from_kafka_test \G;
*************************** 1. row ***************************
                  Id: 179728
                Name: load_from_kafka_test
          CreateTime: 2022-07-18 07:12:02
           PauseTime: NULL
             EndTime: NULL
              DbName: default_cluster:kakfa_doris
           TableName: clicklog
               State: RUNNING
      DataSourceType: KAFKA
      CurrentTaskNum: 1
       JobProperties: {"timezone":"Europe/London","send_batch_parallelism":"1","load_to_single_tablet":"false","maxBatchSizeBytes":"209715200","exec_mem_limit":"2147483648","strict_mode":"false","jsonpaths":"","currentTaskConcurrentNum":"1","fuzzy_parse":"false","partitions":"*","columnToColumnExpr":"clickTime,id,type,user","maxBatchIntervalS":"5","whereExpr":"*","dataFormat":"json","precedingFilter":"*","mergeType":"APPEND","format":"json","json_root":"","deleteCondition":"*","desireTaskConcurrentNum":"3","maxErrorNum":"0","strip_outer_array":"false","execMemLimit":"2147483648","num_as_string":"false","maxBatchRows":"300000"}
DataSourceProperties: {"topic":"test","currentKafkaPartitions":"0","brokerList":"127.0.0.1:9092"}
    CustomProperties: {"group.id":"doris","kafka_default_offsets":"OFFSET_END","api.version.request":"false","broker.version.fallback":"0.8.0"}
           Statistic: {"receivedBytes":78,"runningTxns":[],"errorRows":0,"committedTaskNum":29,"loadedRows":1,"loadRowsRate":0,"abortedTaskNum":0,"errorRowsAfterResumed":0,"totalRows":1,"unselectedRows":0,"receivedBytesRate":0,"taskExecuteTimeMs":145264}
            Progress: {"0":"1"}
                 Lag: {"0":0}
ReasonOfStateChanged: 
        ErrorLogUrls: 
            OtherMsg: 
1 row in set (0.00 sec)

ERROR: 
No query specified
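A few of these fields can be checked automatically, for example from a monitoring script. This is a hypothetical sketch. `check_routine_load` is a name introduced here, the field names come from the output above, and fetching the values (e.g. through a MySQL client library) is left out.

```python
import json

# Hypothetical helper: flags common problems in a few SHOW ROUTINE LOAD fields.
# Field names are taken from the output above; the checks are assumptions.


def check_routine_load(state: str, statistic: str, lag: str) -> list[str]:
    """Return human-readable problems; an empty list means the job looks healthy."""
    problems = []
    if state != "RUNNING":
        problems.append(f"job state is {state}")
    stats = json.loads(statistic)
    if stats["errorRows"] > 0:
        problems.append(f"{stats['errorRows']} rows failed to load")
    # Lag maps Kafka partition -> number of messages not yet consumed
    for partition, behind in json.loads(lag).items():
        if behind > 0:
            problems.append(f"partition {partition} is {behind} messages behind")
    return problems


# Values copied from the output above
statistic = '{"receivedBytes":78,"runningTxns":[],"errorRows":0,"committedTaskNum":29,"loadedRows":1,"loadRowsRate":0,"abortedTaskNum":0,"errorRowsAfterResumed":0,"totalRows":1,"unselectedRows":0,"receivedBytesRate":0,"taskExecuteTimeMs":145264}'
print(check_routine_load("RUNNING", statistic, '{"0":0}'))  # []
```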

