Setting Up Debezium for MySQL + Kafka + Confluent Schema Registry


  • 1. Kafka: kafka_2.11-2.0.0.tgz
  • 2. Confluent: confluent-oss-5.0.0-2.11.tar.gz
  • 3. Debezium: debezium-connector-mysql-0.8.1.Final-plugin.tar.gz
ali-18 *.*.*.18
ali-36 *.*.*.36
ali-37 *.*.*.37


Extract command: tar -xzf debezium-connector-mysql-0.8.1.Final-plugin.tar.gz

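       Note: the debezium-connector-mysql folder, which contains the MySQL connector's JAR files, must be placed directly under the plugin directory. Keeping each plugin's JARs in its own folder prevents JAR conflicts, because Kafka Connect loads the JARs of different plugins in isolation from one another. The plugin folder must be present under the Confluent plugin directory on all three Kafka Connect worker machines, because a connector submitted to a distributed worker cluster may be scheduled on any of the workers.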
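The layout requirement above can be checked on each worker before Kafka Connect is started. The following is only a minimal sketch and not part of the original setup: check_plugin_layout is a hypothetical helper, and the directory passed to it is assumed to be whatever plugin.path points to on your workers.

```shell
# Hypothetical helper: succeeds only if "<plugin dir>/debezium-connector-mysql"
# exists and has at least one .jar file directly inside it (not in a subfolder).
check_plugin_layout() {
    plugin_dir="$1/debezium-connector-mysql"
    if [ ! -d "$plugin_dir" ]; then
        echo "missing: $plugin_dir"
        return 1
    fi
    for jar in "$plugin_dir"/*.jar; do
        # If the glob matched nothing, $jar is the literal pattern and -f fails.
        if [ -f "$jar" ]; then
            echo "ok: $plugin_dir"
            return 0
        fi
    done
    echo "no jars directly under: $plugin_dir"
    return 1
}
```

Running check_plugin_layout with the plugin directory as its argument on ali-18, ali-36 and ali-37 should print an "ok" line on every machine.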

       Because I need Avro-formatted Kafka messages and a distributed Kafka Connect cluster, the configuration under Confluent's schema-registry directory has to be modified:

       The Kafka Connect worker configuration file, connect-avro-distributed.properties, is configured as follows:

# Sample configuration for a distributed Kafka Connect worker that uses Avro serialization and
# integrates with the Schema Registry. This sample configuration assumes a local installation of
# Confluent Platform with all services running on their default ports.

# Bootstrap Kafka servers. If multiple servers are specified, they should be comma-separated.

# The group ID is a unique identifier for the set of workers that form a single Kafka Connect
# cluster

# The converters specify the format of data in Kafka and how to translate it into Connect data.
# Every Connect user will need to configure these based on the format they want their data in
# when loaded from or stored into Kafka

# Internal Storage Topics.
# Kafka Connect distributed workers store the connector and task configurations, connector offsets,
# and connector statuses in three internal topics. These topics MUST be compacted.
# When the Kafka Connect distributed worker starts, it will check for these topics and attempt to create them
# as compacted topics if they don't yet exist, using the topic name, replication factor, and number of partitions
# as specified in these properties, and other topic-specific settings inherited from your brokers'
# auto-creation settings. If you need more control over these other topic-specific settings, you may want to
# manually create these topics before starting Kafka Connect distributed workers.
# The following properties set the names of these three internal topics for storing configs, offsets, and status.

# The following properties set the replication factor for the three internal topics, defaulting to 3 for each
# and therefore requiring a minimum of 3 brokers in the cluster. Since we want the examples to run with
# only a single broker, we set the replication factor here to just 1. That's okay for the examples, but
# ALWAYS use a replication factor of AT LEAST 3 for production environments to reduce the risk of
# losing connector offsets, configurations, and status.

# The config storage topic must have a single partition, and this cannot be changed via properties.
# Offsets for all connectors and tasks are written quite frequently and therefore the offset topic
# should be highly partitioned; by default it is created with 25 partitions, but adjust accordingly
# with the number of connector tasks deployed to a distributed worker cluster. Kafka Connect records
# the status less frequently, and so by default the topic is created with 5 partitions.

# The offsets, status, and configurations are written to the topics using converters specified through
# the following required properties. Most users will always want to use the JSON converter without schemas.
# Offset and config data is never visible outside of Connect in this format.

# Confluent Control Center Integration -- uncomment these lines to enable Kafka client interceptors
# that will report audit data that can be displayed and analyzed in Confluent Control Center
# producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
# consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor

# These are provided to inform the user about the presence of the REST host and port configs
# Hostname & Port for the REST API to listen on. If this is set, it will bind to the interface used to listen to requests.

# The Hostname & Port that will be given out to other workers to connect to i.e. URLs that are routable from other servers.

# Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
# (connectors, converters, transformations). The list should consist of top level directories that include
# any combination of:
# a) directories immediately containing jars with plugins and their dependencies
# b) uber-jars with plugins and their dependencies
# c) directories immediately containing the package directory structure of classes of plugins and their dependencies
# Examples:
# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
# Replace the relative path below with an absolute path if you are planning to start Kafka Connect from within a
# directory other than the home directory of Confluent Platform.

       In short, this file configures the Kafka cluster and the schema-registry URL, and opens REST port 18083, which is used to submit connector configurations to the Kafka Connect workers.
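To confirm that a worker file really carries these settings, the active (uncommented) lines can be pulled out with grep. This is a hedged sketch: show_worker_settings is a hypothetical helper, and the list of property names is my selection of standard Kafka Connect worker properties that relate to the settings mentioned above, not a list taken from the original file.

```shell
# Hypothetical helper: print the uncommented worker settings for the broker
# list, group id, converters, Schema Registry URLs, REST port and plugin path.
# $1 is the path to connect-avro-distributed.properties.
show_worker_settings() {
    grep -E '^(bootstrap\.servers|group\.id|(key|value)\.converter(\.schema\.registry\.url)?|rest\.port|plugin\.path)=' "$1"
}
```

Called with the path to connect-avro-distributed.properties on each worker, it should show the same broker list and Schema Registry URL everywhere, and a rest.port line with 18083.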

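Next, write the MySQL connector configuration. First create a directory to hold it (the commands later in this post use /opt/confluent-5.0.0/etc/kafka-connect-debezium). The connector configuration only needs to exist on one machine. Once it has been submitted to the Kafka Connect workers with curl it can be deleted, but keeping it is recommended so that it can be consulted or modified later.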

The Debezium MySQL connector configuration is as follows:

    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "",
    "database.port": "3308",
    "database.user": "**",
    "database.password": "**",
    "": "184000",
    "": "prod",
    "database.history.kafka.bootstrap.servers": "ali-18:9092,ali-36:9092,ali-37:9092",
    "database.history.kafka.topic": "" ,
    "include.schema.changes": "true" ,

cd /opt/confluent-5.0.0/ && ./bin/schema-registry-start -daemon ./etc/schema-registry/
3. Start the Kafka Connect worker (run this on all three machines)
cd /opt/confluent-5.0.0/ && ./bin/connect-distributed -daemon ./etc/schema-registry/

[root@ali-37 kafka-connect-debezium]# cd /opt/confluent-5.0.0/etc/kafka-connect-debezium
[root@ali-37 kafka-connect-debezium]# curl -X POST -H "Content-Type: application/json" --data @debezium_mysql_source_affair.json http://ali-36:18083/connectors
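After the POST, the Kafka Connect REST API can also be queried to see whether the connector was registered and is running. The sketch below assumes the same worker address and port as the curl command above. The function names are hypothetical, and the connector name is a placeholder for whatever the "name" field of the submitted JSON file contains.

```shell
# Same worker and REST port as in the submit command above.
CONNECT_URL="http://ali-36:18083"

# Build the status endpoint (GET /connectors/<name>/status) for a connector.
connector_status_url() {
    printf '%s/connectors/%s/status\n' "$CONNECT_URL" "$1"
}

# List the names of all connectors registered on the cluster.
list_connectors() {
    curl -s "$CONNECT_URL/connectors"
}

# Show the state of one connector and its tasks as JSON.
connector_status() {
    curl -s "$(connector_status_url "$1")"
}

# Example calls (my-mysql-connector is a placeholder name):
#   list_connectors
#   connector_status my-mysql-connector
```

Because the workers form one cluster, the same queries can be sent to any of the three machines.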

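If everything is working, a number of table-related topics should appear on the Kafka cluster. Topics are named serverName.databaseName.tableName, where serverName is the server name set in the Debezium MySQL connector configuration.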
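The naming rule can be used to pick the table topics out of a topic listing. This is a minimal sketch under stated assumptions: filter_table_topics is a hypothetical helper, it assumes the server, database and table names contain no dots, and the names in the example are placeholders rather than values from this setup.

```shell
# Hypothetical helper: read topic names on stdin and keep only those of the
# form <serverName>.<databaseName>.<tableName> for the given server name ($1).
filter_table_topics() {
    awk -F. -v server="$1" 'NF == 3 && $1 == server'
}

# Example (placeholder ZooKeeper address and server name):
#   cd /opt/confluent-5.0.0/ && ./bin/kafka-topics --zookeeper <zk-host>:2181 --list \
#       | filter_table_topics myserver
```

Internal Kafka Connect topics and any topic that does not have exactly three dot-separated parts are dropped by the filter.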
