This section covers how to build a data pipeline with Kafka Connect.
Environment:
Kafka 2.1.1 + a Kafka cluster
Connect can run as a cluster of workers; since the number of nodes here is limited, a single worker is used. Start it in distributed mode:
connect-distributed.sh config/connect-distributed.properties
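For reference, the worker settings that matter most for distributed mode can be inspected as below (the key names are standard Connect worker settings; the values in the comments are assumptions matching this cluster):
# Inspect the settings that define the Connect cluster
grep -E '(bootstrap.servers|group.id|plugin.path)' config/connect-distributed.properties
# bootstrap.servers=master:9092  <- the Kafka cluster the worker talks to
# group.id=connect-cluster       <- workers sharing a group.id form one Connect cluster
# plugin.path=...                <- where connector plugins are loaded from (used later)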
hadoop@master:~/kafka-2.1.1$ curl http://master:8083/connector-plugins
[{"class":"org.apache.kafka.connect.file.FileStreamSinkConnector","type":"sink","version":"2.1.1"},
{"class":"org.apache.kafka.connect.file.FileStreamSourceConnector","type":"source","version":"2.1.1"}]
Note: in the http://slave3:8083 URLs below, replace slave3 with the hostname of the machine where you started the Connect service (the author uses slave3 here and switches to master in the next subsection; just substitute your own host throughout).
echo '{"name":"load-kafka-config", \
"config":{"connector.class":"FileStreamSource", \
"file":"/home/hadoop/kafka-2.1.1/config/server.properties", "topic":"kafka-config-topic"}}' | \
curl -X POST -d @- http://slave3:8083/connectors --header "content-Type:application/json"
kafka-console-consumer.sh --bootstrap-server master:9092 \
--topic kafka-config-topic --from-beginning
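If nothing shows up, the connector's state can be queried through the standard Connect REST API:
# Connector and task should both report RUNNING
curl http://slave3:8083/connectors/load-kafka-config/status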
echo '{"name":"dump-kafka-config", \
"config":{"connector.class":"FileStreamSink", \
"file":"/home/hadoop/kafka-2.1.1/config/copy-server.properties", \
"topics":"kafka-config-topic"}}' | \
curl -X POST -d @- http://slave3:8083/connectors --header "content-Type:application/json"
Relative to the source connector: connector.class becomes the sink class (FileStreamSink), and topic becomes topics, because a sink can write multiple topics into a single file, whereas a source can only write into one topic.
Check the result (one way is sketched below):
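A minimal check, assuming the sink has had time to run and the default converters round-trip the lines unchanged:
# Both connectors should be listed
curl http://slave3:8083/connectors
# The copied file should match the original source file
diff /home/hadoop/kafka-2.1.1/config/server.properties \
     /home/hadoop/kafka-2.1.1/config/copy-server.properties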
Delete a connector:
curl -X DELETE http://slave3:8083/connectors/load-kafka-config
Install the Elasticsearch and JDBC connectors.
Download:
Link: https://pan.baidu.com/s/10FrQG1QxfD_Khevvm3M_aQ
Extraction code: vlfj
tar -zxvf elasticsearch-6.7.0.tar.gz -C ~/
mkdir kafka-2.1.1/plugins
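The unpacked connectors need to end up under the worker's plugin.path. A sketch, assuming the downloaded archives unpack into directories named kafka-connect-elasticsearch and kafka-connect-jdbc (adjust to the actual names):
# Drop each connector directory (containing its jars) into plugins/
cp -r ~/kafka-connect-elasticsearch ~/kafka-2.1.1/plugins/
cp -r ~/kafka-connect-jdbc ~/kafka-2.1.1/plugins/
# Then point the worker at it in config/connect-distributed.properties:
# plugin.path=/home/hadoop/kafka-2.1.1/plugins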
# Run directly in the console: put the MySQL JDBC driver on the classpath
export CLASSPATH=/home/hadoop/kafka-2.1.1/libs/mysql-connector-java-5.1.44-bin.jar
# Start Elasticsearch (extracted to the home directory above)
~/elasticsearch-6.7.0/bin/elasticsearch
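Once Elasticsearch is up, a quick check that it answers on its default port:
# Should return a JSON blob with the cluster name and version 6.7.0
curl http://localhost:9200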
connect-distributed.sh config/connect-distributed.properties
mysql> create database test;
Query OK, 1 row affected (0.00 sec)
mysql> use test
Database changed
mysql> create table login(username varchar(30), login_time datetime);
Query OK, 0 rows affected (0.11 sec)
mysql> insert into login values('jack', now());
Query OK, 1 row affected (0.11 sec)
mysql> insert into login values('tom', now());
Query OK, 1 row affected (0.05 sec)
mysql> insert into login values('rose', now());
Query OK, 1 row affected (0.16 sec)
mysql> commit;
Query OK, 0 rows affected (0.00 sec)
echo '{"name":"mysql-login-connector", \
"config":{"connector.class":"JdbcSourceConnector", \
"connection.url":"jdbc:mysql://127.0.0.1:3306/test?user=root&password=123456", \
"mode":"timestamp", "table.whitelist":"login", "validate.non.null":false, \
"timestamp.column.name":"login_time", "topic.prefix":"mysql."}}' \
| curl -X POST -d @- http://master:8083/connectors --header "content-Type:application/json"
Result:
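One way to confirm the connector was created and is running (standard Connect status endpoint):
curl http://master:8083/connectors/mysql-login-connector/status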
View the data in the mysql.login topic (topic.prefix + table name):
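Following the same consumer pattern as before:
kafka-console-consumer.sh --bootstrap-server master:9092 \
--topic mysql.login --from-beginning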
Meanwhile, if you insert a new row into MySQL, it is printed by the Kafka consumer in near real time (as long as the consuming process has not been closed).
echo '{"name":"elastic-login-connector", \
"config":{"connector.class":"ElasticsearchSinkConnector", \
"connection.url":"http://localhost:9200", \
"type.name":"mysql-data", "topics":"mysql.login", \
"key.ignore":true}}' | \
curl -X POST -d @- http://master:8083/connectors --header "content-Type:application/json"
Result:
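To verify that the documents landed in Elasticsearch, search the index the sink created (the index name equals the topic name, as explained below):
# Should return the jack/tom/rose rows as JSON documents
curl 'http://localhost:9200/mysql.login/_search?pretty'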
Explanation: connection.url is the address of the Elasticsearch server. By default, each Kafka topic maps to one Elasticsearch index whose name matches the topic. type.name=mysql-data defines the type under which the data is written. The MySQL table was created without a primary key, so key.ignore=true is set, which makes the connector use the topic name, partition id, and offset as the document key.
Cause:
When Connect was started, the Elasticsearch configuration had not been loaded. Shut down the running Connect worker, then:
Create the following file:
kafka-2.1.1/config/quickstart-elasticsearch.properties
Add the following content to quickstart-elasticsearch.properties:
##
# Copyright 2016 Confluent Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
topics=test-elasticsearch-sink
key.ignore=true
connection.url=http://localhost:9200
type.name=kafka-connect
Start Connect again:
connect-distributed.sh config/connect-distributed.properties config/quickstart-elasticsearch.properties
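To confirm the Elasticsearch connector is now visible to the worker, list the plugins again as at the start; ElasticsearchSinkConnector should now appear alongside the FileStream connectors:
curl http://master:8083/connector-plugins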
Besides the existing connectors, you can also build your own.
Other connectors: https://www.confluent.io/product/connectors