Kafka learning -- Deploying Kafka Connect

1. Deploying Kafka Connect in standalone mode

  In standalone mode, all work is performed in a single process. This configuration is easier to set up and get started with, but it does not benefit from the clustering features of Kafka Connect.

The startup command is as follows:

> bin/connect-standalone.sh config/connect-standalone.properties connector1.properties [connector2.properties ...]

The first parameter is the configuration for the worker. It mainly contains settings such as the Kafka connection parameters, the serialization format, and how frequently to commit offsets.

The provided example worker configuration:

config/connect-standalone.properties

# These are defaults. This file just demonstrates how to override some settings.
bootstrap.servers=localhost:9092
 
# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000

# Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
# (connectors, converters, transformations). The list should consist of top level directories that include
# any combination of:
# a) directories immediately containing jars with plugins and their dependencies
# b) uber-jars with plugins and their dependencies
# c) directories immediately containing the package directory structure of classes of plugins and their dependencies
# Note: symlinks will be followed to discover dependencies or plugins.
# Examples:
# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
#plugin.path=

  • bootstrap.servers : list of Kafka brokers used to bootstrap the connection to the Kafka cluster
  • key.converter : converter class used to convert between the Kafka Connect format and the serialized form written to Kafka; this controls the format of the keys in messages written to or read from Kafka. Examples of common formats include JSON and Avro
  • value.converter : converter class used to convert between the Kafka Connect format and the serialized form written to Kafka; this controls the format of the values in messages written to or read from Kafka
  • offset.storage.file.filename : file in which to store the connector offsets

 The remaining parameters are connector configuration files; for example, a file sink connector:

name=local-file-sink

connector.class=FileStreamSink
tasks.max=1
file=test.sink.txt
topics=connect-test
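
As a hedged example, assuming the sink configuration above is saved as connect-file-sink.properties and paired with the file source example that ships with the Kafka distribution (config/connect-file-source.properties), a standalone worker could be started like this (the file paths are illustrative):

> bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties connect-file-sink.properties

The source connector then writes lines from its input file to the connect-test topic, and the sink connector above writes them back out to test.sink.txt.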

2. Deploying Kafka Connect in distributed mode

     Distributed mode handles automatic balancing of work, allows you to scale up (or down) dynamically, and offers fault tolerance both for the active tasks and for configuration and offset commit data. To deploy it, execute the following command:

> bin/connect-distributed.sh config/connect-distributed.properties

In distributed mode, Kafka Connect stores offsets, configs, and task statuses in Kafka topics. It is recommended to create the topics for offsets, configs, and statuses manually in order to achieve the desired number of partitions and replication factors. If the topics have not yet been created when Kafka Connect is started, they will be auto-created with the default number of partitions and replication factor, which may not be appropriate for Kafka Connect's usage.
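
A minimal sketch of creating these topics manually on a recent Kafka release, assuming a local single-broker cluster and the default topic names from the worker configuration below (the partition counts mirror the commented defaults in that file; replication factors should be raised for production):

> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic connect-offsets --partitions 25 --replication-factor 1 --config cleanup.policy=compact
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic connect-configs --partitions 1 --replication-factor 1 --config cleanup.policy=compact
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic connect-status --partitions 5 --replication-factor 1 --config cleanup.policy=compact

All three topics should use log compaction (cleanup.policy=compact), since Connect relies on them as key-based stores.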

connect-distributed.properties

 
# This file contains some of the configurations for the Kafka Connect distributed worker. This file is intended
# to be used with the examples, and some settings may differ from those used in a production system, especially
# the `bootstrap.servers` and those specifying replication factors.
 
# A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
bootstrap.servers=localhost:9092
 
# unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs
group.id=connect-cluster
 
# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true
value.converter.schemas.enable=true
 
# Topic to use for storing offsets. This topic should have many partitions and be replicated and compacted.
# Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1
#offset.storage.partitions=25
 
# Topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated,
# and compacted topic. Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
config.storage.topic=connect-configs
config.storage.replication.factor=1
 
# Topic to use for storing statuses. This topic can have multiple partitions and should be replicated and compacted.
# Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
status.storage.topic=connect-status
status.storage.replication.factor=1
#status.storage.partitions=5
 
# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000
 
# These are provided to inform the user about the presence of the REST host and port configs
# Hostname & Port for the REST API to listen on. If this is set, it will bind to the interface used to listen to requests.
#rest.host.name=
#rest.port=8083
 
# The Hostname & Port that will be given out to other workers to connect to i.e. URLs that are routable from other servers.
#rest.advertised.host.name=
#rest.advertised.port=
 
# Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
# (connectors, converters, transformations). The list should consist of top level directories that include
# any combination of:
# a) directories immediately containing jars with plugins and their dependencies
# b) uber-jars with plugins and their dependencies
# c) directories immediately containing the package directory structure of classes of plugins and their dependencies
# Examples:
# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
#plugin.path=
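
Once a distributed worker is up, it can be checked through its REST API (port 8083 by default, as noted in the configuration above). The endpoints below are part of the standard Connect REST API; the host and port are illustrative:

> curl http://localhost:8083/                    # worker version and Kafka cluster id
> curl http://localhost:8083/connector-plugins   # connector plugins installed on this worker
> curl http://localhost:8083/connectors          # names of the currently running connectors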

3. Configuring connectors

Connector configurations are simple key-value mappings. In standalone mode, these are defined in a properties file and passed to the Kafka Connect process on the command line. In distributed mode, they are included in the JSON payload of the request that creates (or modifies) the connector.

Most configurations are connector-dependent, so they cannot all be listed here. However, there are a few common options:

name: a unique name for the connector. Attempting to register again with the same name will fail.

connector.class: the Java class for the connector.

tasks.max: the maximum number of tasks that should be created for this connector. The connector may create fewer tasks if it cannot achieve this level of parallelism.

key.converter: (optional) overrides the default key converter set by the worker.

value.converter: (optional) overrides the default value converter set by the worker.

The connector.class config supports several formats: the full name or an alias of the class for this connector. If the connector is org.apache.kafka.connect.file.FileStreamSinkConnector, you can either specify this full name or use FileStreamSink or FileStreamSinkConnector to make the configuration a bit shorter.

Sink connectors also have a few additional options to control their input. Each sink connector must set one of the following:

topics: a comma-separated list of topics to use as input for this connector

topics.regex: a Java regular expression of topics to use as input for this connector

For any other options, you should consult the documentation for the connector.
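
As an illustration of the JSON form used in distributed mode, the file sink connector from section 1 could be created through the REST API like this (the endpoint and payload structure follow the standard Connect REST API; the host, port, and connector values are just the example settings from above):

> curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors \
    -d '{"name": "local-file-sink", "config": {"connector.class": "FileStreamSink", "tasks.max": "1", "file": "test.sink.txt", "topics": "connect-test"}}'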

4. Configuring transformations

Connectors can be configured with transformations to make lightweight, per-message modifications. They are convenient for data massaging and event routing.

A transformation chain can be specified in the connector configuration.

  • transforms :   List of aliases for the transformation, specifying the order in which the transformations will be applied.
  • transforms.$alias.type :  Fully qualified class name for the transformation.
  • transforms.$alias.$transformationSpecificConfig:  Configuration properties for the transformation
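
A minimal sketch of a transformation chain in a connector's properties file, using two of the built-in transformations listed below; the alias names (MakeMap, InsertSource) and the field values are purely illustrative:

transforms=MakeMap,InsertSource
transforms.MakeMap.type=org.apache.kafka.connect.transforms.HoistField$Value
transforms.MakeMap.field=line
transforms.InsertSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.InsertSource.static.field=data_source
transforms.InsertSource.static.value=test-file-source

With this chain, each record value is first wrapped in a Struct with a single field named line, and then a static data_source field is added to it.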

Kafka Connect includes several widely applicable data and routing transformations:

  • InsertField: add a field using either static data or record metadata
  • ReplaceField: filter or rename fields
  • MaskField: replace a field with the valid null value for its type (0, empty string, etc.)
  • ValueToKey: replace the record key with a new key formed from a subset of fields in the record value
  • HoistField: wrap the entire event as a single field inside a Struct or a Map
  • ExtractField: extract a specific field from a Struct or Map and include only this field in the result
  • SetSchemaMetadata: modify the schema name or version
  • TimestampRouter: modify the topic of a record based on the original topic and timestamp. Useful when using a sink that needs to write to different tables or indexes based on timestamps
  • RegexRouter: modify the topic of a record based on the original topic, a replacement string, and a regular expression

 
