Doris supports ingesting data from Kafka through Routine Load.
As of this writing (2022-07-18), Kafka has reached version 3.2.0, but a customer reported that their production Kafka cluster is still on 0.8.0 and asked us to test compatibility.
The Doris Kafka client is implemented on top of librdkafka, which supports legacy Kafka versions through the broker.version.fallback and api.version.request parameters. For the full parameter reference see: https://docs.confluent.io/3.1.1/clients/librdkafka/CONFIGURATION_8md.html. The two parameters are documented as follows:
api.version.request: boolean, valid values [true, false], default false.
Request broker's supported API versions to adjust functionality to available protocol features. If set to false the fallback version broker.version.fallback will be used. NOTE: Depends on broker version >= 0.10.0. If the request is not supported by (an older) broker the broker.version.fallback fallback is used.
Type: boolean
broker.version.fallback: the legacy broker version to fall back to, default 0.9.0.
Older broker versions (<0.10.0) provides no way for a client to query for supported protocol features (ApiVersionRequest, see api.version.request) making it impossible for the client to know what features it may use. As a workaround a user may set this property to the expected broker version and the client will automatically adjust its feature set accordingly if the ApiVersionRequest fails (or is disabled). The fallback broker version will be used for api.version.fallback.ms. Valid values are: 0.9.0, 0.8.2, 0.8.1, 0.8.0.
Type: string
By default, Doris supports Kafka 0.10.0.0 and above. To work with a Kafka version below 0.10.0.0 (0.9.0.x, 0.8.x.y), either modify the BE configuration, setting kafka_broker_version_fallback to the old version you need and kafka_api_version_request to false, or set property.broker.version.fallback to the old version and property.api.version.request to false directly when creating the routine load job.
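For the BE-level route, a minimal sketch of the two entries in be.conf (the exact file path depends on your deployment, and restarting the BE afterwards is assumed):
# be.conf -- cluster-wide fallback for every routine load job on this BE
# skip the ApiVersionRequest that pre-0.10.0 brokers cannot answer
kafka_api_version_request = false
# protocol feature set that librdkafka should assume for the old broker
kafka_broker_version_fallback = 0.8.0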
The official site shows that this version was released on December 3, 2013. Download the Kafka 0.8.0 release (Scala 2.8.0 build) from the Apache archive and unpack it:
wget https://archive.apache.org/dist/kafka/0.8.0/kafka_2.8.0-0.8.0.tar.gz
tar -xzvf kafka_2.8.0-0.8.0.tar.gz
Enter the unpacked directory and start ZooKeeper and the Kafka broker in the background:
nohup bin/zookeeper-server-start.sh config/zookeeper.properties &
nohup bin/kafka-server-start.sh config/server.properties &
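As a quick sanity check (assuming a JDK is on the PATH), jps should now list both processes:
jps
# the output should include Kafka and QuorumPeerMain (ZooKeeper's main class)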
Next, create the test topic. Note that Kafka 0.8.0 still uses the standalone kafka-create-topic.sh script rather than the kafka-topics.sh of later releases:
bin/kafka-create-topic.sh --zookeeper localhost:2181 --replica 1 --partition 1 --topic test
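Optionally, confirm the topic exists; kafka-list-topic.sh is the 0.8.0-era counterpart of the later kafka-topics.sh --list (check your tarball's bin directory if the script name differs):
bin/kafka-list-topic.sh --zookeeper localhost:2181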
# Log in to MySQL
mysql -u root -h 127.0.0.1 -P 9030
# Create the database
create database kakfa_doris;
# Switch to the database
use kakfa_doris;
# Create the clicklog table
CREATE TABLE IF NOT EXISTS kakfa_doris.clicklog
(
`clickTime` DATETIME NOT NULL COMMENT "click time",
`type` VARCHAR(10) NOT NULL COMMENT "click type",
`id` VARCHAR(100) COMMENT "unique id",
`user` VARCHAR(100) COMMENT "user name",
`city` VARCHAR(50) COMMENT "city"
)
DUPLICATE KEY(`clickTime`, `type`)
DISTRIBUTED BY HASH(`type`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
With the table in place, create the routine load job that subscribes to the test topic:
CREATE ROUTINE LOAD kakfa_doris.load_from_kafka_test ON clicklog
COLUMNS(clickTime,id,type,user)
PROPERTIES
(
"desired_concurrent_number"="3",
"max_batch_interval" = "5",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"format" = "json"
)
FROM KAFKA
(
"kafka_broker_list" = "127.0.0.1:9092",
"kafka_topic" = "test",
"property.group.id" = "doris",
"property.broker.version.fallback"="0.8.0",
"property.api.version.request"="false"
);
Here, support for Kafka 0.8.0 comes from the last two properties in the statement above: property.broker.version.fallback = 0.8.0 and property.api.version.request = false.
Run the following command to open the Kafka producer console:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Type the following record into the console and press Enter to send it to Kafka:
{"clickTime":"2022-07-18","type":"click","id":"1","user":"alex","city":"xian"}
Run the following query to confirm that the data has arrived in Doris:
mysql> select * from clicklog;
+---------------------+-------+------+------+------+
| clickTime | type | id | user | city |
+---------------------+-------+------+------+------+
| 2022-07-18 00:00:00 | click | 1 | alex | NULL |
+---------------------+-------+------+------+------+
1 row in set (0.02 sec)
The city column is NULL because it is not listed in the COLUMNS mapping of the routine load job; only clickTime, id, type, and user are extracted from the JSON record.
Run SHOW ALL ROUTINE LOAD FOR load_from_kafka_test\G to inspect the routine load job; the State field should show RUNNING, and Statistic tracks the loaded rows.
mysql> SHOW ALL ROUTINE LOAD FOR load_from_kafka_test\G
*************************** 1. row ***************************
Id: 179728
Name: load_from_kafka_test
CreateTime: 2022-07-18 07:12:02
PauseTime: NULL
EndTime: NULL
DbName: default_cluster:kakfa_doris
TableName: clicklog
State: RUNNING
DataSourceType: KAFKA
CurrentTaskNum: 1
JobProperties: {"timezone":"Europe/London","send_batch_parallelism":"1","load_to_single_tablet":"false","maxBatchSizeBytes":"209715200","exec_mem_limit":"2147483648","strict_mode":"false","jsonpaths":"","currentTaskConcurrentNum":"1","fuzzy_parse":"false","partitions":"*","columnToColumnExpr":"clickTime,id,type,user","maxBatchIntervalS":"5","whereExpr":"*","dataFormat":"json","precedingFilter":"*","mergeType":"APPEND","format":"json","json_root":"","deleteCondition":"*","desireTaskConcurrentNum":"3","maxErrorNum":"0","strip_outer_array":"false","execMemLimit":"2147483648","num_as_string":"false","maxBatchRows":"300000"}
DataSourceProperties: {"topic":"test","currentKafkaPartitions":"0","brokerList":"127.0.0.1:9092"}
CustomProperties: {"group.id":"doris","kafka_default_offsets":"OFFSET_END","api.version.request":"false","broker.version.fallback":"0.8.0"}
Statistic: {"receivedBytes":78,"runningTxns":[],"errorRows":0,"committedTaskNum":29,"loadedRows":1,"loadRowsRate":0,"abortedTaskNum":0,"errorRowsAfterResumed":0,"totalRows":1,"unselectedRows":0,"receivedBytesRate":0,"taskExecuteTimeMs":145264}
Progress: {"0":"1"}
Lag: {"0":0}
ReasonOfStateChanged:
ErrorLogUrls:
OtherMsg:
1 row in set (0.00 sec)
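From here the job keeps consuming continuously. If it ever needs maintenance, Doris provides the standard routine load control statements, sketched below:
-- Pause the job (it can be resumed later)
PAUSE ROUTINE LOAD FOR kakfa_doris.load_from_kafka_test;
-- Resume consumption from the recorded offsets
RESUME ROUTINE LOAD FOR kakfa_doris.load_from_kafka_test;
-- Permanently stop and remove the job
STOP ROUTINE LOAD FOR kakfa_doris.load_from_kafka_test;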