Syncing PostgreSQL Data Changes to a Kafka Message Queue with Bottledwater

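When we need to capture data changes in a database, a message queue is the usual answer. Publishing the changes to a queue decouples upstream and downstream systems, and any application interested in the changes can read them from the queue.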

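Bottledwater-pg is a message producer for PostgreSQL. It writes PostgreSQL data to Confluent Kafka so that it is shared with subscribers in real time. It supports PostgreSQL 9.4 and later, takes a full snapshot, and then continuously parses incremental changes from the WAL and writes them to Kafka. Each database table maps to one topic. After the data is extracted from the WAL through logical decoding, it is encoded with Avro (or as JSON when --output-format=json is given) and written to Kafka.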

Bottledwater-pg can be used in three ways: with Docker, built from source, or on Ubuntu. This article describes deployment by building from source.

1. Environment

Source side

PostgreSQL: postgresql-10.5

Kafka: kafka_2.11-2.3.0

Bottledwater-pg depends on the following packages:

avro-c >= 1.8.0

jansson

libcurl

librdkafka >= 0.9.1

Bottledwater-pg can optionally use the following packages:

libsnappy

boost


The build also requires a fairly recent cmake. The cmake shipped with the operating system caused build errors, so this article uses:

cmake-3.8.0
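The minimum versions listed above can be checked mechanically before building. The following is a minimal Python sketch, assuming plain dotted numeric version strings. The helper names `parse_version` and `meets_minimum` are illustrative, and the "installed" versions are example inputs rather than detected values.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.8.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '3.8' and '3.8.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimum versions taken from the dependency list above.
MINIMUMS = {"avro-c": "1.8.0", "librdkafka": "0.9.1"}

# Example inputs only; replace with the versions you actually installed.
installed = {"avro-c": "1.8.1", "librdkafka": "0.9.0"}

for name, minimum in MINIMUMS.items():
    status = "ok" if meets_minimum(installed[name], minimum) else "too old"
    print(f"{name}: {installed[name]} (needs >= {minimum}) -> {status}")
```

Comparing integer tuples avoids the classic string-comparison trap where "0.10.0" sorts before "0.9.1".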

2. Pre-installation Preparation

2.1 PostgreSQL installation

Create the user and group

groupadd postgres

useradd postgres -g postgres

Install the build dependencies

yum install -y perl-ExtUtils-Embed readline-devel zlib-devel pam-devel libxml2-devel libxslt-devel openldap-devel python-devel gcc-c++ openssl-devel cmake gcc*

Set up the installation directory and its ownership

mkdir /opt/postgres

chown -R postgres:postgres /opt/postgres/

Configure environment variables

vi /etc/profile

# Append the following environment variables to the end of the file

export PATH=/opt/postgres/bin:$PATH

export PGHOME=/opt/postgres

export PGDATA=/opt/postgres/data/

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PGHOME/lib/

export PATH=$PGHOME/bin:$PATH:$HOME/bin

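2.1.1 Installation procedure

Install the database

cd <directory containing the downloaded archive>

# Extract the archive

tar -zxvf postgresql-10.5.tar.gz

cd postgresql-10.5

# Adjust the configure options to your needs

./configure --prefix=/opt/postgres/ --with-python --with-libxml --with-libxslt

make

make install

The build output is too long to reproduce here. If the last lines on screen include "PostgreSQL installation complete.", the installation succeeded. Errors are mostly caused by a bad source package or a missing dependency, and the error message is usually enough to resolve them.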

Initialize the database

su postgres

# Add initdb options as needed; run initdb --help to list them

/opt/postgres/bin/initdb -D $PGDATA -E UTF8

# The following messages indicate that initialization succeeded

*********************************************************************
creating directory /opt/postgres/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
Success. You can now start the database server using:
/opt/postgres/bin/pg_ctl -D /opt/postgres/data -l logfile start
*********************************************************************

# Start the database server. Choose the data directory and log file paths as needed.

/opt/postgres/bin/pg_ctl -D $PGDATA -l /opt/postgres/server.log start

The PostgreSQL server is now installed.

2.1.2 Using the database

Connect to the database

su postgres

# Open a psql session

[postgres@localhost postgres]$ psql

Type "help" for help.

postgres=#

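Change the database configuration to allow connections from other servers

# After installation, two configuration files must be edited before other servers can connect.

cd /opt/postgres/data

vi postgresql.conf

# Find the listen_addresses and port parameters and set them as follows (or to suit your needs)

listen_addresses = '*'

port = 5432

# Add an allow rule for your own network segment

vi pg_hba.conf

# IPv4 local connections:

host   all    all    192.168.0.0/16    md5

Once these changes are in place and the server has been restarted (a change to listen_addresses only takes effect after a restart), the database is ready for use.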
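To see which clients the pg_hba.conf rule above would match, the address range can be tested with Python's standard `ipaddress` module. This is only a sketch of the CIDR match itself: PostgreSQL also matches on connection type, database and user, which are not modeled here. The helper name and the client addresses are hypothetical.

```python
import ipaddress


def is_allowed(client_ip, cidr):
    """Return True when client_ip falls inside the given CIDR range."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(cidr)


RULE_CIDR = "192.168.0.0/16"  # address column of the pg_hba.conf line above

for ip in ("192.168.1.25", "10.0.0.5"):  # hypothetical client addresses
    verdict = "matches" if is_allowed(ip, RULE_CIDR) else "does not match"
    print(f"{ip} {verdict} {RULE_CIDR}")
```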

2.2 Build and install cmake

(run as root)

# cd /opt

# tar zxvf cmake-3.8.0.tar.gz

# cd cmake-3.8.0

# ./bootstrap

# make

# make install

# cmake -version

cmake version 3.8.0

CMake suite maintained and supported by Kitware (kitware.com/cmake).

2.3 Build and install jansson

(run as root)

# cd /opt

# tar jxvf jansson-2.9.tar.bz2

# cd jansson-2.9

# ./configure

# make

# make install

# ls /usr/local/lib/pkgconfig

jansson.pc

# export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH

2.4 Build and install avro

(run as root)

# cd /opt

# yum install -y xz-*

# yum install -y zlib-devel.x86_64

# tar zxvf avro-src-1.8.1.tar.gz

# cd avro-src-1.8.1/lang/c

# mkdir build

# cd build

# cmake .. -DCMAKE_INSTALL_PREFIX=/opt/avro -DCMAKE_BUILD_TYPE=Release -DTHREADSAFE=true

# make

# make test

# make install

Register the library directory (add the /opt/avro/lib line to the file, then run ldconfig)

# vi /etc/ld.so.conf

/opt/avro/lib

# ldconfig

Set temporary environment variables

# export LD_LIBRARY_PATH=/opt/avro/lib:$LD_LIBRARY_PATH

# export PKG_CONFIG_PATH=/opt/avro/lib/pkgconfig:/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH

2.5 Install libcurl

(run as root)

# yum install -y libcurl-devel.x86_64

2.6 Build and install librdkafka

(run as root)

# cd /opt

# unzip librdkafka-master.zip

# cd librdkafka-master

# ./configure

# make

# make install

# ls /usr/local/lib/pkgconfig

jansson.pc   rdkafka.pc    rdkafka++.pc

2.7 Register the shared library paths

(run as root)

# vi /etc/ld.so.conf.d/bottledwater.conf

/opt/avro/lib

/usr/local/lib

/opt/postgres/lib # library directory of the PostgreSQL installation

# ldconfig

2.8 Build and install bottledwater-pg

(run as root)

# chown -R postgres:postgres /opt/

(run as the postgres user)

Configure environment variables

$ vi ~/.bash_profile

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi

# User specific environment and startup programs

export PG_HOME=/opt/postgres # PostgreSQL installation prefix

export LD_LIBRARY_PATH=/opt/avro/lib:$LD_LIBRARY_PATH

export PKG_CONFIG_PATH=/opt/avro/lib/pkgconfig:/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH

PATH=$PG_HOME/bin:$PATH:$HOME/bin

export PATH

Prepare the source package

$ unzip bottledwater-pg-master.zip

$ cd bottledwater-pg-master

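Whether because of the operating system environment or the open-source project itself, two Makefiles in the source tree had to be modified before the build would succeed.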


     PG_CFLAGS = -I$(shell pg_config --includedir) -I$(shell pg_config --includedir-server) -I$(shell pg_config --pkgincludedir)

Modify kafka/Makefile

    PG_CFLAGS = -I$(shell pg_config --includedir) -I$(shell pg_config --includedir-server) -I$(shell pg_config --pkgincludedir)

    LDFLAGS=-L/usr/lib64 $(CURL_LDFLAGS) $(PG_LDFLAGS) $(KAFKA_LDFLAGS) $(AVRO_LDFLAGS) $(JSON_LDFLAGS)

Build and install bottledwater-pg

$ make

$ make install

After installation, the extension library and the extension control files are placed in the PostgreSQL extension directories automatically.

$ ls /opt/postgres/lib/bottledwater*

/opt/postgres/lib/bottledwater.so

$ ls /opt/postgres/share/extension/bottledwater*

/opt/postgres/share/extension/bottledwater--0.1.sql

/opt/postgres/share/extension/bottledwater.control

2.9 Kafka installation

Step 1: Download the code

Download kafka_2.11-2.3.0 and extract it.

tar -xzf kafka_2.11-2.3.0.tgz

cd kafka_2.11-2.3.0

Step 2: Start the services

Kafka requires ZooKeeper, so start ZooKeeper first. If you do not already have one, you can use the preconfigured ZooKeeper that is packaged with Kafka.

bin/zookeeper-server-start.sh config/zookeeper.properties

[2019-10-23 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...

Now start the Kafka server:

bin/kafka-server-start.sh config/server.properties &

[2019-10-23 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
...

2.10 Database configuration

The test environment already has a PostgreSQL database named msc.

The msc database contains one table:

CREATE TABLE COMPANY (

ID INT PRIMARY KEY NOT NULL,

NAME TEXT NOT NULL,

AGE INT NOT NULL,

ADDRESS CHAR (50),

SALARY REAL,

JOIN_DATE DATE

);

Edit the database configuration file

$ vi /opt/postgres/data/postgresql.conf

max_worker_processes = 8 # at least 8

wal_level = logical # must be logical

max_wal_senders = 8 # at least 8

wal_keep_segments = 256 # at least 256

max_replication_slots = 4 # at least 4
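The settings above can be verified before restarting the server. The sketch below parses postgresql.conf-style text and reports any value that falls short of the minimums this article recommends. It is a simplified parser for illustration only (it ignores include directives and treats every `#` as a comment start), and the function names are hypothetical.

```python
import re

# Minimums recommended in this article.
REQUIRED_MIN = {
    "max_worker_processes": 8,
    "max_wal_senders": 8,
    "wal_keep_segments": 256,
    "max_replication_slots": 4,
}


def parse_conf(text):
    """Parse 'name = value  # comment' lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = re.match(r"(\w+)\s*=?\s*(.+)", line)
        if match:
            settings[match.group(1)] = match.group(2).strip().strip("'")
    return settings


def check_conf(text):
    """Return a list of problems; an empty list means every check passed."""
    settings = parse_conf(text)
    problems = []
    if settings.get("wal_level") != "logical":
        problems.append("wal_level must be logical")
    for name, minimum in REQUIRED_MIN.items():
        value = settings.get(name)
        if value is None or int(value) < minimum:
            problems.append(f"{name} must be at least {minimum}")
    return problems


sample = """
max_worker_processes = 8
wal_level = logical
max_wal_senders = 8
wal_keep_segments = 64   # too low for this setup
max_replication_slots = 4
"""
print(check_conf(sample))
```

To check a real file, pass the contents of /opt/postgres/data/postgresql.conf to `check_conf` instead of the sample string.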

Edit the client authentication file (pg_hba.conf)

* These rules are probably broader than necessary and can be tightened.

$ vi /opt/postgres/data/pg_hba.conf

local replication all trust

host replication all 127.0.0.1/32 trust

host replication all 0.0.0.0/0 md5

host all postgres 10.19.100.23/32 trust

Restart the database and create the bottledwater extension in the database to be monitored

$ pg_ctl restart

$ psql -U postgres -d msc -c "create extension bottledwater;"

CREATE EXTENSION

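3. Testing the Sync

$ cd /opt/bottledwater-pg-master

Assuming everything is running on localhost on the default ports, bottledwater can be started as follows:

$ ./kafka/bottledwater --postgres=postgres://localhost

The first time it runs, it creates a replication slot named bottledwater, takes a consistent snapshot of the database, and sends it to Kafka. (The --slot command-line flag changes the name of the replication slot.) When the snapshot is complete, it switches to consuming the replication stream.

When you no longer want to run Bottled Water, you must drop its replication slot. Otherwise you will eventually run out of disk space, because an open replication slot prevents the WAL from being garbage-collected. To do this, open psql again and run:

select pg_drop_replication_slot('bottledwater');

For the second run, point bottledwater at the msc database and the Kafka broker, and send the changes to Kafka in JSON format:

$ ./kafka/bottledwater --postgres=postgres://[email protected]:5432/msc --broker=39.108.83.108:9092 -f json

[INFO] Writing messages to Kafka in JSON format
[INFO] Created replication slot "bottledwater", capturing consistent snapshot "00000718-1".
INFO: bottledwater_export: Table mas.mastest is keyed by index mastest_pkey
[INFO] Snapshot complete, streaming changes from 0/1749F58.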

Write data to the source database

INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY,JOIN_DATE)

VALUES(6,'Paul',32,'Cal6',20000.00,'2001-08-13');

DELETE FROM COMPANY WHERE ID = 1;

UPDATE COMPANY SET NAME = 'joker' WHERE ID = 6;

View the message events on the target side

List the topics

# bin/kafka-topics.sh --list --zookeeper localhost:2181

company

# ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic company --from-beginning --property print.key=true

{"id": {"int": 1}} {"id": {"int": 1}, "name": {"string": "Paul"}, "age": {"int": 32}, "address": {"string": "California "}, "salary": {"float": 20000.0}, "join_date": {"Date": {"year": 2001, "month": 7, "day": 13}}}

{"id": {"int": 2}} {"id": {"int": 2}, "name": {"string": "Paul"}, "age": {"int": 32}, "address": {"string": "California "}, "salary": {"float": 20000.0}, "join_date": {"Date": {"year": 2001, "month": 7, "day": 13}}}

{"id": {"int": 3}} {"id": {"int": 3}, "name": {"string": "Paul"}, "age": {"int": 32}, "address": {"string": "California "}, "salary": {"float": 20000.0}, "join_date": {"Date": {"year": 2001, "month": 7, "day": 13}}}

null

null

{"id": {"int": 4}} {"id": {"int": 4}, "name": {"string": "Paul"}, "age": {"int": 32}, "address": {"string": "California "}, "salary": {"float": 20000.0}, "join_date": {"Date": {"year": 2001, "month": 7, "day": 13}}}

{"id": {"int": 5}} {"id": {"int": 5}, "name": {"string": "Paul"}, "age": {"int": 32}, "address": {"string": "California5 "}, "salary": {"float": 20000.0}, "join_date": {"Date": {"year": 2001, "month": 8, "day": 13}}}

{"id": {"int": 6}} {"id": {"int": 6}, "name": {"string": "Paul"}, "age": {"int": 32}, "address": {"string": "Cal6 "}, "salary": {"float": 20000.0}, "join_date": {"Date": {"year": 2001, "month": 8, "day": 13}}}

{"id": {"int": 1}} null

{"id": {"int": 6}} {"id": {"int": 6}, "name": {"string": "zyc"}, "age": {"int": 32}, "address": {"string": "Cal6 "}, "salary": {"float": 20000.0}, "join_date": {"Date": {"year": 2001, "month": 8, "day": 13}}}

{"id": {"int": 6}} {"id": {"int": 6}, "name": {"string": "joker"}, "age": {"int": 32}, "address": {"string": "Cal6 "}, "salary": {"float": 20000.0}, "join_date": {"Date": {"year": 2001, "month": 8, "day": 13}}}
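Each line of the consumer output above is a key followed by a value, and every field is wrapped in Avro's JSON union encoding, such as {"int": 6}. The following Python sketch turns one such line into plain dictionaries. It assumes the key and value are separated by whitespace, as printed above, and that the message shape is the one shown in this output. `unwrap` and `parse_line` are illustrative names.

```python
import json

_decoder = json.JSONDecoder()


def unwrap(field):
    """Unwrap one level of Avro JSON union encoding: {"int": 1} -> 1."""
    if isinstance(field, dict) and len(field) == 1:
        return next(iter(field.values()))
    return field


def _flatten(record):
    """Unwrap every field of a record; None (a missing part) stays None."""
    if record is None:
        return None
    return {name: unwrap(value) for name, value in record.items()}


def parse_line(line):
    """Split a 'key value' console line into (key, value) dicts."""
    line = line.strip()
    key, end = _decoder.raw_decode(line)   # first JSON document is the key
    rest = line[end:].strip()              # anything after it is the value
    value = json.loads(rest) if rest else None
    return _flatten(key), _flatten(value)


line = ('{"id": {"int": 6}} {"id": {"int": 6}, "name": {"string": "joker"}, '
        '"age": {"int": 32}, "join_date": {"Date": {"year": 2001, "month": 8, "day": 13}}}')
key, value = parse_line(line)
print(key)
print(value["name"], value["join_date"])
print(parse_line('{"id": {"int": 1}} null'))
```

A value of null marks a delete for that key, which is how the DELETE of ID 1 appears in the output.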

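Notes:

1. In the configuration file of the target Kafka, the listener must not be left at its default of localhost. It has to be an address that the Bottledwater host can reach.

2. Every PostgreSQL table becomes a topic. Topics are created automatically with no manual intervention, and this also applies to tables created later.

3. Each message event consists of the primary key columns as the key and all columns after the change as the value. For a delete, the value is null.

4. In a primary/standby replication setup, bottledwater cannot be started against the standby, because the standby does not generate its own WAL and this PostgreSQL version does not support logical decoding on a standby.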
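Note 3 implies that a consumer can rebuild the current table contents by replaying the events in order: a non-null value inserts or overwrites the row for that key, and a null value removes it. Here is a minimal Python sketch of that replay. It uses shortened, illustrative rows (only id and name) rather than the full messages, and `apply_event` is a hypothetical helper name.

```python
def apply_event(table, key, value):
    """Apply one change event to an in-memory copy of the table.

    key   -- dict of primary-key columns
    value -- dict of all columns after the change, or None for a delete
    """
    pk = tuple(sorted(key.items()))  # hashable form of the primary key
    if value is None:
        table.pop(pk, None)          # delete: drop the row if present
    else:
        table[pk] = value            # insert or update: keep the latest image
    return table


# Starting state: one existing row with ID 1.
table = {(("id", 1),): {"id": 1, "name": "Paul"}}

# Events mirroring the INSERT, DELETE and UPDATE statements run earlier.
events = [
    ({"id": 6}, {"id": 6, "name": "Paul"}),   # INSERT ID 6
    ({"id": 1}, None),                        # DELETE ID 1
    ({"id": 6}, {"id": 6, "name": "joker"}),  # UPDATE ID 6
]

for key, value in events:
    apply_event(table, key, value)

print(table)
```

Because each event carries the full row image, replaying is idempotent per key: applying the same event twice leaves the table unchanged.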

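4. bottledwater Command-Line Options

This is a reference of the command-line options accepted by the bottledwater client, with links to the relevant documentation. If it disagrees with the output of bottledwater --help, then --help is correct (please submit a pull request to update this reference!).

-d, --postgres=postgres://user:pass@host:port/dbname (required): Connection string or URI of the PostgreSQL server.

-s, --slot=slotname (default: bottledwater): Name of the [replication slot](https://github.com/confluentinc/bottledwater-pg#configuration). The slot is created automatically on first use.

-b, --broker=host1[:port1],host2[:port2]... (default: localhost:9092): Comma-separated list of Kafka broker hosts/ports.

-r, --schema-registry=http://hostname:port (default: http://localhost:8081): URL of the service where Avro schemas are registered. (Used only for --output-format=avro. Omit when --output-format=json.)

-f, --output-format=[avro|json] (default: avro): How to encode the messages written to Kafka. See the discussion of [output formats](https://github.com/confluentinc/bottledwater-pg#output-formats).

-u, --allow-unkeyed: Allow export of tables that do not have a primary key. This is [disallowed by default](https://github.com/confluentinc/bottledwater-pg#consuming-data), because updates and deletes need a primary key to identify their row.

-p, --topic-prefix=prefix: String to prepend to all [topic names](https://github.com/confluentinc/bottledwater-pg#topic-names). For example, with --topic-prefix=postgres, updates from table "users" are written to topic "postgres.users".

-e, --on-error=[log|exit] (default: exit): What to do in case of a transient error, such as failure to publish to Kafka. See the discussion of [error handling](https://github.com/confluentinc/bottledwater-pg#error-handling).

-x, --skip-snapshot: Skip taking a [consistent snapshot](https://github.com/confluentinc/bottledwater-pg#configuration) of the existing database contents and just start streaming any new updates. (Ignored if the replication slot already exists.)

-C, --kafka-config property=value: Set a global configuration property for the Kafka producer (see the [librdkafka docs](https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md)).

-T, --topic-config property=value: Set a topic configuration property for the Kafka producer (see the [librdkafka docs](https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md)).

--config-help: Print the list of Kafka configuration properties.

-h, --help: Print this help text.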
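When bottledwater is launched from a script, the options above can be assembled programmatically. The sketch below builds an argument list using only flags from this reference and derives the topic name that --topic-prefix would produce. `build_command` and `topic_name` are hypothetical helpers, the executable path is the one used earlier in this article, and the connection URI in the example is a placeholder.

```python
def build_command(postgres_uri, brokers=("localhost:9092",), output_format="avro",
                  slot="bottledwater", topic_prefix=None, allow_unkeyed=False):
    """Build an argv list for bottledwater from the documented options."""
    if output_format not in ("avro", "json"):
        raise ValueError("output_format must be 'avro' or 'json'")
    argv = [
        "./kafka/bottledwater",               # path used earlier in this article
        f"--postgres={postgres_uri}",
        f"--broker={','.join(brokers)}",      # comma-separated broker list
        f"--output-format={output_format}",
        f"--slot={slot}",
    ]
    if topic_prefix:
        argv.append(f"--topic-prefix={topic_prefix}")
    if allow_unkeyed:
        argv.append("--allow-unkeyed")
    return argv


def topic_name(table, topic_prefix=None):
    """Topic for a table: 'prefix.table' with a prefix, else just 'table'."""
    return f"{topic_prefix}.{table}" if topic_prefix else table


# Placeholder URI for illustration; substitute your own connection details.
argv = build_command("postgres://localhost/msc", output_format="json",
                     topic_prefix="postgres")
print(" ".join(argv))
print(topic_name("users", "postgres"))
```

The resulting list can be handed to `subprocess.run(argv)`, which avoids shell quoting issues with the connection URI.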
