2019 Week 13: Kafka Connect for HBase

Kafka Connect for HBase

  • Official site
  • Documentation
    • Features
    • Requirements
    • Assumptions
    • Properties
      • Example connector.properties file
    • Deployment
      • Download the hbase-sink
      • Add hbase-site.xml to the classpath
      • Create the HBase table
      • Start Confluent
      • Start the HBase sink
      • Test with the Avro console producer
      • Query HBase
  • Problems encountered

Official site

https://www.confluent.io/connector/kafka-connect-hbase-sink/

Documentation

Link: https://github.com/nishutayal/kafka-connect-hbase/blob/master/README.md

Features

  • Supports Avro and JSON data formats
  • Supports writing to one or more column families
  • The connector routes each field to a column family based on the configured mapping; if there is only one column family, all fields are written to it and no mapping definition is needed
  • Row key lookup

Requirements

  • Confluent 4.0
  • Kafka 1.0.0
  • HBase 1.4.0
  • JDK 1.8

Assumptions

  • The HBase table already exists
  • The column families have already been created (see the sketch below)
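For example, a table matching the sample configuration further down (table test with column families c and d; the names are illustrative, not prescribed by the connector) could be pre-created in the HBase shell:

hbase shell
create 'test', 'c', 'd'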

Properties

name                             data type   required   description
zookeeper.quorum                 string      yes        ZooKeeper quorum of the HBase cluster
event.parser.class               string      yes        event parser class: AvroEventParser or JsonEventParser
topics                           string      yes        list of Kafka topics
hbase.<table>.rowkey.columns     string      yes        rowkey columns (comma-separated)
hbase.<table>.family             string      yes        column families, one or more (comma-separated)
hbase.<table>.<family>.columns   string      no         column mapping per family; only needed with multiple column families (comma-separated)

Example connector.properties file

name=kafka-cdc-hbase
connector.class=io.svectors.hbase.sink.HBaseSinkConnector
tasks.max=1
topics=test
zookeeper.quorum=localhost:2181
event.parser.class=io.svectors.hbase.parser.AvroEventParser
hbase.test.rowkey.columns=id
hbase.test.rowkey.delimiter=|
hbase.test.family=c,d
hbase.test.c.columns=c1,c2
hbase.test.d.columns=d1,d2

Actual configuration

name=kafka-cdc-hbase
connector.class=io.svectors.hbase.sink.HBaseSinkConnector
tasks.max=1
topics=kafka_test
zookeeper.quorum=bigdata1:2181
event.parser.class=io.svectors.hbase.parser.AvroEventParser

# properties for hbase table 'kafka_test'
hbase.kafka_test.rowkey.columns=id
hbase.kafka_test.rowkey.delimiter=|
hbase.kafka_test.family=cf
# in case of more than one column family, define the column mapping
#hbase.kafka_test.family=c,d
#hbase.kafka_test.c.columns=c1,c2
#hbase.kafka_test.d.columns=d1,d2

Deployment

Download the hbase-sink

Upload the downloaded zip package nishutayal-kafka-connect-hbase-1.0.0.zip to the server.
Commands:

cd /home/kafka/confluent-5.1.2/share/java/
mkdir kafka-connect-hbase
unzip nishutayal-kafka-connect-hbase-1.0.0.zip
cp ./nishutayal-kafka-connect-hbase-1.0.0/lib/*  kafka-connect-hbase
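A quick check (assuming the paths above) confirms the connector jars landed in the plugin directory:

ls /home/kafka/confluent-5.1.2/share/java/kafka-connect-hbase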

Add hbase-site.xml to the classpath

jar -uvf <hbase-sink jar> hbase-site.xml

Actual command:

jar -uvf kafka-connect-hbase-1.0.0.jar hbase-site.xml

Note: the command failed under the OpenJDK jar tool. It can be run on a machine with an Oracle JDK and the jar copied back, or the hbase-site.xml can be added to the jar's root directory with an archive tool on Windows.
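A possible JDK-independent alternative, assuming the zip utility is installed (a jar is a zip archive, and plain zip adds the file at the archive root):

zip kafka-connect-hbase-1.0.0.jar hbase-site.xml
# verify the entry is now present in the jar
unzip -l kafka-connect-hbase-1.0.0.jar | grep hbase-site.xml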

Create the HBase table

hbase shell
create 'kafka_test','cf'
list
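The table definition can be confirmed in the same shell before starting the sink; the column family cf should be listed:

describe 'kafka_test'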

Start Confluent
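This step is not spelled out above; a minimal sketch, assuming the Confluent 5.1.2 layout used here and its bundled development CLI (which brings up ZooKeeper, Kafka, Schema Registry, and the other local services):

cd /home/kafka/confluent-5.1.2
bin/confluent start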

Start the HBase sink

export CLASSPATH=$CONFLUENT_HOME/share/java/kafka-connect-hbase/hbase-sink.jar

$CONFLUENT_HOME/bin/connect-standalone etc/schema-registry/connect-avro-standalone.properties etc/kafka-connect-hbase/hbase-sink.properties

Actual commands:

export CLASSPATH=$CLASSPATH:/home/kafka/confluent-5.1.2/share/java/kafka-connect-hbase

bin/connect-standalone etc/schema-registry/connect-avro-standalone.properties etc/kafka-connect-hbase/hbase-sink-connector.properties
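Once the standalone worker is up, the Kafka Connect REST API (port 8083 by default) can confirm that the connector loaded and its task is running:

curl http://localhost:8083/connectors
curl http://localhost:8083/connectors/kafka-cdc-hbase/status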

Test with the Avro console producer

bin/kafka-avro-console-producer --broker-list appserver5:9092 --topic kafka_test --property value.schema='{"type":"record","name":"record","fields":[{"name":"id","type":"int"}, {"name":"name", "type": "string"}]}'
# enter the following records at the prompt
{"id": 1, "name": "foo"}
{"id": 2, "name": "bar"}

Query HBase

hbase shell
list
scan 'kafka_test'
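Since hbase.kafka_test.rowkey.columns=id, an individual row should also be retrievable by its id value (assuming the row key is just the id, as with a single rowkey column):

get 'kafka_test', '1'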


Problems encountered

  1. The hbase-site.xml files on bigdata1 and bigdata3 are different, in particular their ZooKeeper settings. At first the hbase-site.xml from bigdata1 was added to the jar, and the connector could not reach HBase; querying ZooKeeper showed that the live ensemble did not match that file. The hbase-site.xml on bigdata3 did match the live ZooKeeper ensemble, and after swapping that file in, everything worked (a quick connectivity check is sketched after this list).
  2. Do not leave a stray hbase-site.xml on the classpath: if one is present with wrong settings, the connector likewise fails to fetch its information from ZooKeeper, as in the log below, which ends with a null ClusterId.
[2019-03-25 12:40:23,070] INFO WorkerSinkTask{id=kafka-cdc-hbase-0} Sink task finished initialization and start (org.apache.kafka.connect.runtime.WorkerSinkTask:303)
[2019-03-25 12:40:23,080] INFO Cluster ID: zP7zgprFQhm9lMJ8NDfwXg (org.apache.kafka.clients.Metadata:285)
[2019-03-25 12:40:23,081] INFO [Consumer clientId=consumer-1, groupId=connect-kafka-cdc-hbase] Discovered group coordinator appserver5:9092 (id: 2147483597 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:654)
[2019-03-25 12:40:23,083] INFO [Consumer clientId=consumer-1, groupId=connect-kafka-cdc-hbase] Revoking previously assigned partitions [] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:458)
[2019-03-25 12:40:23,084] INFO [Consumer clientId=consumer-1, groupId=connect-kafka-cdc-hbase] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:486)
[2019-03-25 12:40:23,095] INFO [Consumer clientId=consumer-1, groupId=connect-kafka-cdc-hbase] Successfully joined group with generation 1 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:450)
[2019-03-25 12:40:23,097] INFO [Consumer clientId=consumer-1, groupId=connect-kafka-cdc-hbase] Setting newly assigned partitions [kafka_test-0] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:289)
[2019-03-25 12:40:23,105] INFO [Consumer clientId=consumer-1, groupId=connect-kafka-cdc-hbase] Resetting offset for partition kafka_test-0 to offset 0. (org.apache.kafka.clients.consumer.internals.Fetcher:583)
[2019-03-25 12:40:23,329] WARN Unable to load native-hadoop library for your platform... using builtin-java classes where applicable (org.apache.hadoop.util.NativeCodeLoader:62)
[2019-03-25 12:40:23,524] WARN hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size (org.apache.hadoop.hbase.io.util.HeapMemorySizeUtil:76)
[2019-03-25 12:40:23,558] INFO Process identifier=hconnection-0x64df6e5a connecting to ZooKeeper ensemble=bigdata1:2181 (org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper:122)
[2019-03-25 12:40:23,596] WARN Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties (org.apache.hadoop.metrics2.impl.MetricsConfig:125)
[2019-03-25 12:40:23,611] INFO Scheduled snapshot period at 10 second(s). (org.apache.hadoop.metrics2.impl.MetricsSystemImpl:376)
[2019-03-25 12:40:23,612] INFO HBase metrics system started (org.apache.hadoop.metrics2.impl.MetricsSystemImpl:192)
[2019-03-25 12:40:23,624] INFO Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl (org.apache.hadoop.hbase.metrics.MetricRegistries:65)
[2019-03-25 12:40:23,667] INFO ClusterId read in ZooKeeper is null (org.apache.hadoop.hbase.client.ZooKeeperRegistry:107)
  3. The Kafka topic name and the HBase table name must be identical. The reason was unclear at the time; judging from the sample properties, the connector keys all per-table settings as hbase.<topic>.*, so each topic is written to the HBase table of the same name.
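A quick way to check which ZooKeeper ensemble is actually answering (assuming nc is available and the ZooKeeper four-letter-word commands are enabled, as they are by default in ZooKeeper 3.4):

echo ruok | nc bigdata1 2181   # a healthy server replies "imok"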

