Demo: connecting Presto to Kafka

1. Install Kafka and Presto first and make sure both services are running.
2. Follow the demo in the official documentation:
https://prestodb.io/docs/0.266/connector/kafka-tutorial.html

3. Change to the Kafka installation directory and download the kafka-tpch loader:
curl -o kafka-tpch https://repo1.maven.org/maven2/de/softwareforge/kafka_tpch_0811/1.0/kafka_tpch_0811-1.0.sh

Make it executable:
chmod 755 kafka-tpch

Run it to load the data (check that the broker port matches your Kafka setup):
./kafka-tpch load --brokers localhost:9092 --prefix tpch. --tpch-type tiny

2022-02-11T17:18:12.480+0800     INFO   main    io.airlift.log.Logging  Logging to stderr
2022-02-11T17:18:12.508+0800     INFO   main    de.softwareforge.kafka.LoadCommand      Processing tables: [customer, orders, lineitem, part, partsupp, supplier, nation, region]
2022-02-11T17:18:13.004+0800     INFO   pool-1-thread-1 de.softwareforge.kafka.LoadCommand      Loading table 'customer' into topic 'tpch.customer'...
2022-02-11T17:18:13.005+0800     INFO   pool-1-thread-2 de.softwareforge.kafka.LoadCommand      Loading table 'orders' into topic 'tpch.orders'...
2022-02-11T17:18:13.005+0800     INFO   pool-1-thread-4 de.softwareforge.kafka.LoadCommand      Loading table 'part' into topic 'tpch.part'...
2022-02-11T17:18:13.005+0800     INFO   pool-1-thread-5 de.softwareforge.kafka.LoadCommand      Loading table 'partsupp' into topic 'tpch.partsupp'...
2022-02-11T17:18:13.006+0800     INFO   pool-1-thread-3 de.softwareforge.kafka.LoadCommand      Loading table 'lineitem' into topic 'tpch.lineitem'...
2022-02-11T17:18:13.007+0800     INFO   pool-1-thread-7 de.softwareforge.kafka.LoadCommand      Loading table 'nation' into topic 'tpch.nation'...
2022-02-11T17:18:13.007+0800     INFO   pool-1-thread-6 de.softwareforge.kafka.LoadCommand      Loading table 'supplier' into topic 'tpch.supplier'...
2022-02-11T17:18:13.012+0800     INFO   pool-1-thread-8 de.softwareforge.kafka.LoadCommand      Loading table 'region' into topic 'tpch.region'...
2022-02-11T17:18:17.432+0800    ERROR   pool-1-thread-8 kafka.producer.async.DefaultEventHandler        Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: tpch.region
2022-02-11T17:18:17.819+0800     INFO   pool-1-thread-8 de.softwareforge.kafka.LoadCommand      Generated 5 rows for table 'region'.
2022-02-11T17:18:17.963+0800    ERROR   pool-1-thread-3 kafka.producer.async.DefaultEventHandler        Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: tpch.lineitem
2022-02-11T17:18:18.576+0800    ERROR   pool-1-thread-6 kafka.producer.async.DefaultEventHandler        Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: tpch.supplier
2022-02-11T17:18:18.793+0800     INFO   pool-1-thread-6 de.softwareforge.kafka.LoadCommand      Generated 100 rows for table 'supplier'.
2022-02-11T17:18:19.050+0800    ERROR   pool-1-thread-4 kafka.producer.async.DefaultEventHandler        Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: tpch.part
2022-02-11T17:18:19.594+0800    ERROR   pool-1-thread-2 kafka.producer.async.DefaultEventHandler        Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: tpch.orders
2022-02-11T17:18:20.206+0800    ERROR   pool-1-thread-1 kafka.producer.async.DefaultEventHandler        Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: tpch.customer
2022-02-11T17:18:20.769+0800    ERROR   pool-1-thread-5 kafka.producer.async.DefaultEventHandler        Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: tpch.partsupp
2022-02-11T17:18:21.056+0800    ERROR   pool-1-thread-7 kafka.producer.async.DefaultEventHandler        Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: tpch.nation
2022-02-11T17:18:21.307+0800     INFO   pool-1-thread-7 de.softwareforge.kafka.LoadCommand      Generated 25 rows for table 'nation'.
2022-02-11T17:18:22.926+0800     INFO   pool-1-thread-4 de.softwareforge.kafka.LoadCommand      Generated 2000 rows for table 'part'.
2022-02-11T17:18:23.188+0800     INFO   pool-1-thread-1 de.softwareforge.kafka.LoadCommand      Generated 1500 rows for table 'customer'.
2022-02-11T17:18:25.367+0800     INFO   pool-1-thread-5 de.softwareforge.kafka.LoadCommand      Generated 8000 rows for table 'partsupp'.
2022-02-11T17:18:28.902+0800     INFO   pool-1-thread-2 de.softwareforge.kafka.LoadCommand      Generated 15000 rows for table 'orders'.
2022-02-11T17:18:30.358+0800     INFO   pool-1-thread-3 de.softwareforge.kafka.LoadCommand      Generated 60175 rows for table 'lineitem'.
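The ERROR lines above did not stop the load, at least for customer: every table still reports "Generated N rows", and customer's 1500 rows match the count(*) query further down. (That they appear because the topics do not exist yet on the first send is a plausible guess, not something this log confirms.) To check a longer run quickly, here is a small Python sketch; summarize_load_log is a hypothetical helper, not part of kafka-tpch. It collects the per-table row counts and the topics that logged a metadata error.

```python
import re

# Patterns for the two kinds of loader lines that matter here (text copied from the log above).
GENERATED = re.compile(r"Generated (\d+) rows for table '([^']+)'")
METADATA_ERROR = re.compile(r"Failed to fetch topic metadata for topic: (\S+)")


def summarize_load_log(lines):
    """Return ({table: rows generated}, [topics that logged a metadata error])."""
    rows = {}
    errors = []
    for line in lines:
        generated = GENERATED.search(line)
        if generated:
            rows[generated.group(2)] = int(generated.group(1))
            continue
        error = METADATA_ERROR.search(line)
        if error:
            errors.append(error.group(1))
    return rows, errors


# A few lines taken verbatim from the kafka-tpch output above.
SAMPLE_LOG = [
    "2022-02-11T17:18:17.432+0800    ERROR   pool-1-thread-8 kafka.producer.async.DefaultEventHandler        "
    "Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: tpch.region",
    "2022-02-11T17:18:17.819+0800     INFO   pool-1-thread-8 de.softwareforge.kafka.LoadCommand      "
    "Generated 5 rows for table 'region'.",
    "2022-02-11T17:18:23.188+0800     INFO   pool-1-thread-1 de.softwareforge.kafka.LoadCommand      "
    "Generated 1500 rows for table 'customer'.",
]

rows, errors = summarize_load_log(SAMPLE_LOG)
print(rows)    # {'region': 5, 'customer': 1500}
print(errors)  # ['tpch.region']
```

The loader logs to stderr ("Logging to stderr" in the first line), so to feed a real run into the function you would redirect with 2>&1 and pass the lines of that output.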

4. Configure etc/catalog/kafka.properties:

connector.name=kafka
kafka.nodes=localhost:9092
kafka.table-names=tpch.customer,tpch.orders,tpch.lineitem,tpch.part,tpch.partsupp,tpch.supplier,tpch.nation,tpch.region
kafka.hide-internal-columns=false
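Because the loader was run with --prefix tpch., every topic is named tpch.<table>. The connector reads each qualified name in kafka.table-names as <schema>.<table>, which is why the CLI below connects with --schema tpch. If you load other tables or use another prefix, this list has to change with it. The Python sketch below builds the same file from the prefix and table list so the two stay in sync; kafka_catalog_properties is a hypothetical helper, not a Presto tool.

```python
# Table order matches kafka.table-names in the file above.
TPCH_TABLES = ["customer", "orders", "lineitem", "part", "partsupp", "supplier", "nation", "region"]


def kafka_catalog_properties(nodes, prefix, tables, hide_internal=False):
    """Return the text of etc/catalog/kafka.properties for topics named prefix + table."""
    table_names = ",".join(prefix + table for table in tables)
    lines = [
        "connector.name=kafka",
        f"kafka.nodes={nodes}",
        f"kafka.table-names={table_names}",
        # Java-style lowercase boolean, as in the hand-written file.
        f"kafka.hide-internal-columns={str(hide_internal).lower()}",
    ]
    return "\n".join(lines) + "\n"


# Same broker and prefix as the kafka-tpch load command above.
print(kafka_catalog_properties("localhost:9092", "tpch.", TPCH_TABLES), end="")
```

With these arguments the output is identical to the four lines above.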

5. Restart Presto after changing the configuration.

6. Log in with the Presto CLI:

./presto --catalog kafka --schema tpch
presto:tpch> show tables;
  Table
----------
 customer
 lineitem
 nation
 orders
 part
 partsupp
 presto
 region
 supplier
(9 rows)

View the table schema:

presto:tpch> DESCRIBE customer;
      Column       |  Type   | Extra |                   Comment
-------------------+---------+-------+---------------------------------------------
 _partition_id     | bigint  |       | Partition Id
 _partition_offset | bigint  |       | Offset for the message within the partition
 _key              | varchar |       | Key text
 _key_corrupt      | boolean |       | Key data is corrupt
 _key_length       | bigint  |       | Total number of key bytes
 _message          | varchar |       | Message text
 _message_corrupt  | boolean |       | Message data is corrupt
 _message_length   | bigint  |       | Total number of message bytes
presto:tpch> SELECT count(*) FROM customer;
 _col0
-------
  1500

presto:tpch> SELECT _message FROM customer LIMIT 5;
                                                                                                                                                 _message
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 {"rowNumber":1,"customerKey":1,"name":"Customer#000000001","address":"IVhzIApeRb ot,c,E","nationKey":15,"phone":"25-989-741-2988","accountBalance":711.56,"marketSegment":"BUILDING","comment":"to the even, regular platelets. regular, ironic epitaphs nag e"}
 {"rowNumber":3,"customerKey":3,"name":"Customer#000000003","address":"MG9kdTD2WBHm","nationKey":1,"phone":"11-719-748-3364","accountBalance":7498.12,"marketSegment":"AUTOMOBILE","comment":" deposits eat slyly ironic, even instructions. express foxes detect slyly. blithel
 {"rowNumber":5,"customerKey":5,"name":"Customer#000000005","address":"KvpyuHCplrB84WgAiGV6sYpZq7Tj","nationKey":3,"phone":"13-750-942-6364","accountBalance":794.47,"marketSegment":"HOUSEHOLD","comment":"n accounts will have to unwind. foxes cajole accor"}
 {"rowNumber":7,"customerKey":7,"name":"Customer#000000007","address":"TcGe5gaZNgVePxU5kRrvXBfkasDTea","nationKey":18,"phone":"28-190-982-9759","accountBalance":9561.95,"marketSegment":"AUTOMOBILE","comment":"ainst the ironic, express theodolites. express, even pinto bean
 {"rowNumber":9,"customerKey":9,"name":"Customer#000000009","address":"xKiAFTjUsCuxfeleNqefumTrjS","nationKey":8,"phone":"18-338-906-3675","accountBalance":8324.07,"marketSegment":"FURNITURE","comment":"r theodolites according to the requests wake thinly excuses: pending
(5 rows)

presto:tpch> SELECT sum(cast(json_extract_scalar(_message, '$.accountBalance') AS double)) FROM customer LIMIT 10;
   _col0
------------
 6681865.59
(1 row)
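Without a table description, every field lives inside the _message text. json_extract_scalar pulls one field out as a varchar, so it has to be cast before it can be summed. To make that per-row logic concrete, here is a rough client-side equivalent in Python; sum_account_balance is a hypothetical name. The sample messages are the five rows shown above, trimmed to a few fields. They are not the whole topic, so the result is not the 6681865.59 total.

```python
import json

# Trimmed copies of the five customer messages shown above (only some fields kept).
SAMPLE_MESSAGES = [
    '{"rowNumber":1,"customerKey":1,"accountBalance":711.56}',
    '{"rowNumber":3,"customerKey":3,"accountBalance":7498.12}',
    '{"rowNumber":5,"customerKey":5,"accountBalance":794.47}',
    '{"rowNumber":7,"customerKey":7,"accountBalance":9561.95}',
    '{"rowNumber":9,"customerKey":9,"accountBalance":8324.07}',
]


def sum_account_balance(messages):
    """Rough equivalent of
    sum(cast(json_extract_scalar(_message, '$.accountBalance') AS double)).

    Messages without the field contribute nothing, the way sum() skips NULLs.
    """
    total = 0.0
    for raw in messages:
        value = json.loads(raw).get("accountBalance")
        if value is not None:
            total += float(value)
    return total


total = sum_account_balance(SAMPLE_MESSAGES)
print(round(total, 2))  # 26890.17
```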

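Configure the topic (table description) file. These files are read from etc/kafka by default (step 8 shows how to change the directory); restart Presto after configuring it. For customer, the file (for example etc/kafka/tpch.customer.json) looks like this: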

{
    "tableName": "customer",
    "schemaName": "tpch",
    "topicName": "tpch.customer",
    "key": {
        "dataFormat": "raw",
        "fields": [
            {
                "name": "kafka_key",
                "dataFormat": "LONG",
                "type": "BIGINT",
                "hidden": "false"
            }
        ]
    },
    "message": {
        "dataFormat": "json",
        "fields": [
            {
                "name": "row_number",
                "mapping": "rowNumber",
                "type": "BIGINT"
            },
            {
                "name": "customer_key",
                "mapping": "customerKey",
                "type": "BIGINT"
            },
            {
                "name": "name",
                "mapping": "name",
                "type": "VARCHAR"
            },
            {
                "name": "address",
                "mapping": "address",
                "type": "VARCHAR"
            },
            {
                "name": "nation_key",
                "mapping": "nationKey",
                "type": "BIGINT"
            },
            {
                "name": "phone",
                "mapping": "phone",
                "type": "VARCHAR"
            },
            {
                "name": "account_balance",
                "mapping": "accountBalance",
                "type": "DOUBLE"
            },
            {
                "name": "market_segment",
                "mapping": "marketSegment",
                "type": "VARCHAR"
            },
            {
                "name": "comment",
                "mapping": "comment",
                "type": "VARCHAR"
            }
        ]
    }
}
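Each entry in message.fields maps a JSON key (mapping) to a Presto column (name, type). In this file the column names are simply the camelCase keys converted to snake_case. For topics with many fields, a short Python sketch like the one below can generate the file instead of writing it by hand. The helper names are hypothetical, and the snake_case naming rule is this example's convention, not something the connector requires.

```python
import json
import re

# (JSON key in the message, Presto type) pairs, in the same order as the file above.
CUSTOMER_FIELDS = [
    ("rowNumber", "BIGINT"),
    ("customerKey", "BIGINT"),
    ("name", "VARCHAR"),
    ("address", "VARCHAR"),
    ("nationKey", "BIGINT"),
    ("phone", "VARCHAR"),
    ("accountBalance", "DOUBLE"),
    ("marketSegment", "VARCHAR"),
    ("comment", "VARCHAR"),
]


def snake_case(name):
    """Insert '_' before each inner capital and lowercase: accountBalance -> account_balance."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def table_description(schema, table, topic, fields):
    """Build a table description with a raw LONG key and a JSON message, like the file above."""
    return {
        "tableName": table,
        "schemaName": schema,
        "topicName": topic,
        "key": {
            "dataFormat": "raw",
            "fields": [
                {"name": "kafka_key", "dataFormat": "LONG", "type": "BIGINT", "hidden": "false"}
            ],
        },
        "message": {
            "dataFormat": "json",
            "fields": [
                {"name": snake_case(key), "mapping": key, "type": presto_type}
                for key, presto_type in fields
            ],
        },
    }


desc = table_description("tpch", "customer", "tpch.customer", CUSTOMER_FIELDS)
print(json.dumps(desc, indent=4))
```

Redirect the output into the description directory (for example as etc/kafka/tpch.customer.json) and restart Presto.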
7. After the restart, log in again and check:
presto:tpch> DESCRIBE customer;
      Column       |  Type   | Extra |                   Comment
-------------------+---------+-------+---------------------------------------------
 kafka_key         | bigint  |       |
 row_number        | bigint  |       |
 customer_key      | bigint  |       |
 name              | varchar |       |
 address           | varchar |       |
 nation_key        | bigint  |       |
 phone             | varchar |       |
 account_balance   | double  |       |
 market_segment    | varchar |       |
 comment           | varchar |       |
 _partition_id     | bigint  |       | Partition Id
 _partition_offset | bigint  |       | Offset for the message within the partition
 _key              | varchar |       | Key text
 _key_corrupt      | boolean |       | Key data is corrupt
 _key_length       | bigint  |       | Total number of key bytes
 _message          | varchar |       | Message text
 _message_corrupt  | boolean |       | Message data is corrupt
 _message_length   | bigint  |       | Total number of message bytes
(21 rows)


presto:tpch> SELECT * FROM customer LIMIT 5;
 kafka_key | row_number | customer_key |        name        |                address                | nation_key |      phone      | account_balance | market_segment |                                                      comment
-----------+------------+--------------+--------------------+---------------------------------------+------------+-----------------+-----------------+----------------+---------------------------------------------------------------------------------------------------------
         1 |          2 |            2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak        |         13 | 23-768-687-3665 |          121.65 | AUTOMOBILE     | l accounts. blithely ironic theodolites integrate boldly: caref
         3 |          4 |            4 | Customer#000000004 | XxVSJsLAGtn                           |          4 | 14-128-190-5944 |         2866.83 | MACHINERY      |  requests. final, regular ideas sleep final accou
         5 |          6 |            6 | Customer#000000006 | sKZz0CsnMD7mp4Xd0YrBvx,LREYKUWAh yVn  |         20 | 30-114-968-4951 |         7638.57 | AUTOMOBILE     | tions. even deposits boost according to the slyly bold packages. final accounts cajole requests. furious
         7 |          8 |            8 | Customer#000000008 | I0B10bB0AymmC, 0PrRYBCP1yGJ8xcBPmWhl5 |         17 | 27-147-574-9335 |         6819.74 | BUILDING       | among the slyly regular theodolites kindle blithely courts. carefully even theodolites haggle slyly alon
         9 |         10 |           10 | Customer#000000010 | 6LrEaV6KR6PLVcgl2ArL Q3rqzLzcT1 v2    |          5 | 15-741-346-9870 |         2753.54 | HOUSEHOLD      | es regular deposits haggle. fur
(5 rows)

8. To use a different directory for the Kafka table description files, set this parameter in etc/catalog/kafka.properties:
kafka.table-description-dir=/etc/kafka

kafka.table-description-dir
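References a folder within the Presto deployment that holds one or more JSON files (names must end in .json) containing table descriptions. This property is optional; the default is etc/kafka.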
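Only files ending in .json are picked up, so a description saved under another extension is silently ignored. The Python sketch below can help check a directory before restarting Presto: it lists which schema.table names the directory would define. load_table_descriptions is a hypothetical helper, not part of Presto. It only mimics the .json filtering described above and does not validate files the way the connector does.

```python
import json
import tempfile
from pathlib import Path


def load_table_descriptions(directory):
    """Map 'schemaName.tableName' -> topicName for every *.json file in directory.

    Files without the .json suffix are skipped. This sketch assumes each file
    sets tableName, schemaName and topicName explicitly.
    """
    tables = {}
    for path in sorted(Path(directory).glob("*.json")):
        desc = json.loads(path.read_text(encoding="utf-8"))
        tables[f"{desc['schemaName']}.{desc['tableName']}"] = desc["topicName"]
    return tables


# Demo on a throwaway directory: one valid description plus a backup file that is ignored.
with tempfile.TemporaryDirectory() as demo_dir:
    customer = {"tableName": "customer", "schemaName": "tpch", "topicName": "tpch.customer"}
    Path(demo_dir, "tpch.customer.json").write_text(json.dumps(customer), encoding="utf-8")
    Path(demo_dir, "tpch.customer.json.bak").write_text("{}", encoding="utf-8")
    found = load_table_descriptions(demo_dir)

print(found)  # {'tpch.customer': 'tpch.customer'}
```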
