Testing real-time MySQL-to-Greenplum synchronization

I. Test Server Environment Preparation

1. Cluster server IPs and hostnames

192.168.10.225 artemis.hadoop.com
192.168.10.226 uranus.hadoop.com
192.168.10.227 ares.hadoop.com
192.168.10.228 bird.hadoop.com
192.168.10.229 lover.hadoop.com

  • Edit command

    vim /etc/hosts

2. ZooKeeper environment

  • Ensemble nodes

    server.1=uranus.hadoop.com:2888:3888
    server.2=ares.hadoop.com:2888:3888
    server.3=bird.hadoop.com:2888:3888

    These correspond to 226, 227, and 228 respectively.

  • Installation path

    /usr/local/zookeeper-3.4.6/bin/

  • Config file path

    /usr/local/zookeeper-3.4.6/bin/../conf/zoo.cfg

  • Installation source archive

    /app/zookeeper-3.4.5.tar.gz

  • Download URL: https://archive.apache.org/dist/zookeeper/

  • Environment configuration: vim /etc/profile

    #zookeeper
    ZOOKEEPER_HOME=/usr/local/zookeeper-3.4.6/
    export PATH=$ZOOKEEPER_HOME/bin:$PATH
    
  • Start, stop, and status commands

        zkServer.sh start
        ZooKeeper JMX enabled by default
        Using config: /ryyf/apache-zookeeper-3.5.9-bin/bin/../conf/zoo.cfg
        Starting zookeeper ... STARTED
    
         zkServer.sh stop
        ZooKeeper JMX enabled by default
        Using config: /ryyf/apache-zookeeper-3.5.9-bin/bin/../conf/zoo.cfg
        Stopping zookeeper ... STOPPED
    
        zkServer.sh status
        ZooKeeper JMX enabled by default
        Using config: /ryyf/apache-zookeeper-3.5.9-bin/bin/../conf/zoo.cfg
        Client port found: 2181. Client address: localhost. Client SSL: false.
        Mode: follower
    

3. Java environment

  • Multiple JDK versions are installed side by side, with JDK 11 kept as a non-default version

  • Java paths

    /app/jdk1.8.0_144/bin/java

    /app/jdk1.8.0_201/bin/java

    /app/jdk-11.0.12/bin/java

alternatives --install /usr/bin/java java /app/jdk-11.0.12/bin/java 300

alternatives --install /usr/bin/java java /app/jdk1.8.0_201/bin/java 1

alternatives --install /usr/bin/java java /app/jdk1.8.0_144/bin/java 100

alternatives --config java

There are 3 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
 + 1           /app/jdk1.8.0_201/bin/java
*  2           /app/jdk-11.0.12/bin/java
   3           /app/jdk1.8.0_144/bin/java
  • Java version

    java version "1.8.0_144"
    Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
    Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
    
  • Installation source archive

    /app/jdk-8u144-linux-x64.tar.gz

  • Profile location and changes

    vim /etc/profile
    JAVA_HOME=/app/jdk1.8.0_201
    JRE_HOME=$JAVA_HOME/jre
    PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:
    CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:
    export PATH JAVA_HOME JRE_HOME CLASSPATH
    
    

4. Kafka environment

  • Kafka version: kafka_2.12-2.1.0
  • Kafka archive location: 225 /app/kafka_2.12-2.1.0.tgz
  • Kafka cluster hosts: 225, 226, 228
  • Kafka config file: /usr/local/kafka/config/server.properties
  • Kafka download page: http://archive.apache.org/dist/kafka/
  • Download command: wget http://archive.apache.org/dist/kafka/2.1.0/kafka_2.12-2.1.0.tgz
  • Start command: /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties &
  • Stop command: /usr/local/kafka/bin/kafka-server-stop.sh
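
A quick sanity check after starting the three brokers is to confirm they have registered in ZooKeeper. This is a minimal sketch using a one-shot zkCli command from the ZooKeeper install above; the expected ids depend on the broker.id values configured on each node.

    # list the broker ids registered in ZooKeeper (one entry per live broker)
    /usr/local/zookeeper-3.4.6/bin/zkCli.sh -server uranus.hadoop.com:2181 ls /brokers/ids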

5. Maxwell environment

  • GitHub project: https://github.com/zendesk/maxwell

  • Website: http://maxwells-daemon.io/

  • Download URL: https://github.com/zendesk/maxwell/releases/download/v1.35.5/maxwell-1.35.5.tar.gz

  • curl -sLo - https://github.com/zendesk/maxwell/releases/download/v1.35.5/maxwell-1.35.5.tar.gz \
           | tar zxvf -
    cd maxwell-1.35.5
    
  • Install location: /app/maxwell-1.35.5 (requires JDK 11)

  • Documentation: http://maxwells-daemon.io/quickstart/#google-cloud-pubsub
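
Maxwell, like Canal, reads the MySQL binlog, so the source MySQL (192.168.10.216 in the tests below) must have row-based binary logging enabled and a user with replication privileges. A minimal sketch of the prerequisites; the dedicated maxwell user and its password are illustrative assumptions (the test below simply connects as root).

    # /etc/my.cnf on the source MySQL -- binlog must be enabled and in ROW format
    [mysqld]
    server_id=216
    log-bin=mysql-bin
    binlog_format=ROW

    -- optional dedicated user (run in MySQL); Maxwell keeps its positions in the maxwell schema
    CREATE USER 'maxwell'@'%' IDENTIFIED BY 'some_password';
    GRANT ALL ON maxwell.* TO 'maxwell'@'%';
    GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'maxwell'@'%';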

6. Canal

  • Project page: https://github.com/alibaba/canal

  • Download URLs:

    https://github.com/alibaba/canal/releases

    https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz

  • Online tutorial: https://blog.csdn.net/yehongzhi1994/article/details/107880162

  • Version: 1.1.5

  • Archive path: /app/canal.deployer-1.1.5.tar.gz

 mkdir /app/canal
 tar zxvf /app/canal.deployer-1.1.5.tar.gz -C /app/canal

II. Configuration Files and Detailed Notes

1. Multi-version JDK installation (JDK 11 coexistence)
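
The alternatives setup in section I.3 keeps JDK 8 as the system default. For the tools that need JDK 11 (Maxwell), a simple approach is to point JAVA_HOME and PATH at JDK 11 only in the shell that launches them, leaving the default untouched. A minimal sketch, assuming the launch scripts pick up java from JAVA_HOME or PATH:

    # system default stays on JDK 8
    java -version                      # expect 1.8.0_xxx

    # switch only the current shell to JDK 11 before starting Maxwell
    export JAVA_HOME=/app/jdk-11.0.12
    export PATH=$JAVA_HOME/bin:$PATH
    java -version                      # expect 11.0.12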

2. ZooKeeper
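
The zoo.cfg for the ensemble is not shown elsewhere in these notes; below is a minimal sketch consistent with the three nodes listed in section I.2 (the dataDir and timing values are assumptions):

    # /usr/local/zookeeper-3.4.6/conf/zoo.cfg (identical on all three nodes)
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/usr/local/zookeeper-3.4.6/data
    clientPort=2181
    server.1=uranus.hadoop.com:2888:3888
    server.2=ares.hadoop.com:2888:3888
    server.3=bird.hadoop.com:2888:3888

    # each node also needs a myid file matching its server.N entry, e.g. on uranus:
    echo 1 > /usr/local/zookeeper-3.4.6/data/myid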

3. Kafka: vim /usr/local/kafka/config/server.properties

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
# host.name=uranus.hadoop.com
############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
# listeners=PLAINTEXT://172.16.2.226:9092

# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
# advertised.listeners=PLAINTEXT://172.16.2.226:9092
advertised.host.name=uranus.hadoop.com
advertised.port=9092

zookeeper.connect=uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=40

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=24

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0

Key parameters (name = value — notes):

  • broker.id = 1 — must be a different value on each of the three nodes (0, 1, 2)
  • advertised.host.name = uranus.hadoop.com — hostname advertised to clients, configured in the hosts file; the other two brokers use their own hostnames (the source tutorial used kafka2.sd.cn and kafka3.sd.cn)
  • advertised.port = 9092 — default port, no change needed
  • log.dirs = /tmp/kafka-logs
  • num.partitions = 40 — partition count, adjust as needed
  • log.retention.hours = 24 — log retention time
  • zookeeper.connect = uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181 — ZooKeeper connection string, multiple addresses separated by commas
  • listeners — the main setting that defines the Kafka broker's listeners (e.g. PLAINTEXT://172.16.2.226:9092); left unset here
  • advertised.listeners — publishes the broker's listener info to ZooKeeper, falling back to listeners if not configured (e.g. PLAINTEXT://172.16.2.226:9092); left unset here
  • host.name — left unset here
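
Only a few lines differ between the three brokers. A sketch of the per-node values; uranus (broker.id=1) is taken from the file above, while the id assignment for artemis and bird is an assumption based on the 0/1/2 note in the list above:

    # artemis.hadoop.com (225)
    broker.id=0
    advertised.host.name=artemis.hadoop.com

    # uranus.hadoop.com (226) -- the file shown above
    broker.id=1
    advertised.host.name=uranus.hadoop.com

    # bird.hadoop.com (228)
    broker.id=2
    advertised.host.name=bird.hadoop.com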

4. Canal configuration process

  • Create the MySQL user for Canal

    CREATE USER 'canal'@'localhost' IDENTIFIED BY 'canal';
    GRANT ALL PRIVILEGES ON *.* TO 'canal'@'localhost' WITH GRANT OPTION;
    CREATE USER 'canal'@'%' IDENTIFIED BY 'canal';
    GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' WITH GRANT OPTION;
    flush privileges;
    
    
  • Edit the instance config (no changes are needed if connecting to the local machine and both the username and password are canal):

    vi /app/canal/conf/example/instance.properties
    #################################################
    ## mysql serverId , v1.0.26+ will autoGen
    canal.instance.mysql.slaveId=225
    
    # enable gtid use true/false
    canal.instance.gtidon=true
    
    # position info
    canal.instance.master.address=192.168.10.216:3306
    canal.instance.master.journal.name=
    canal.instance.master.position=
    canal.instance.master.timestamp=
    canal.instance.master.gtid=
    
    # rds oss binlog
    canal.instance.rds.accesskey=
    canal.instance.rds.secretkey=
    canal.instance.rds.instanceId=
    
    # table meta tsdb info
    canal.instance.tsdb.enable=true
    #canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
    #canal.instance.tsdb.dbUsername=canal
    #canal.instance.tsdb.dbPassword=canal
    
    #canal.instance.standby.address =
    #canal.instance.standby.journal.name =
    #canal.instance.standby.position =
    #canal.instance.standby.timestamp =
    #canal.instance.standby.gtid=
    
    # username/password
    canal.instance.dbUsername=canal
    canal.instance.dbPassword=canal
    canal.instance.connectionCharset = UTF-8
    # enable druid Decrypt database password
    canal.instance.enableDruid=false
    #canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==
    
    # table regex
    canal.instance.filter.regex=.*\\..*
    # table black regex
    canal.instance.filter.black.regex=mysql\\.slave_.*
    # table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
    #canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
    # table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
    #canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch
    
    # mq config
    canal.mq.topic=mysql_binlog
    # dynamic topic route by schema or table regex
    #canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
    canal.mq.partition=0
    # hash partition config
    #canal.mq.partitionsNum=3
    #canal.mq.partitionHash=test.table:id^name,.*\\..*
    #canal.mq.dynamicTopicPartitionNum=test.*:4,mycanal:6
    #################################################
    
  • Edit the main Canal config file

vi /app/canal/conf/canal.properties 
#################################################
#########    common argument        #############
#################################################
# tcp bind ip
canal.ip =
# register ip to zookeeper
canal.register.ip =
canal.port = 11111
canal.metrics.pull.port = 11112
# canal instance user/passwd
# canal.user = canal
# canal.passwd = E3619321C1A937C46A0D8BD1DAC39F93B27D4458

# canal admin config
#canal.admin.manager = 127.0.0.1:8089
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
# admin auto register
#canal.admin.register.auto = true
#canal.admin.register.cluster =
#canal.admin.register.name =

canal.zkServers =
# flush data to zk
canal.zookeeper.flush.period = 1000
canal.withoutNetty = false
# tcp, kafka, rocketMQ, rabbitMQ
canal.serverMode = kafka
# flush meta cursor/parse position to file
canal.file.data.dir = ${canal.conf.dir}
canal.file.flush.period = 1000
## memory store RingBuffer size, should be Math.pow(2,n)
canal.instance.memory.buffer.size = 16384
## memory store RingBuffer used memory unit size , default 1kb
canal.instance.memory.buffer.memunit = 1024
## meory store gets mode used MEMSIZE or ITEMSIZE
canal.instance.memory.batch.mode = MEMSIZE
canal.instance.memory.rawEntry = true

## detecing config
canal.instance.detecting.enable = false
#canal.instance.detecting.sql = insert into retl.xdual values(1,now()) on duplicate key update x=now()
canal.instance.detecting.sql = select 1
canal.instance.detecting.interval.time = 3
canal.instance.detecting.retry.threshold = 3
canal.instance.detecting.heartbeatHaEnable = false

# support maximum transaction size, more than the size of the transaction will be cut into multiple transactions delivery
canal.instance.transaction.size =  1024
# mysql fallback connected to new master should fallback times
canal.instance.fallbackIntervalInSeconds = 60

# network config
canal.instance.network.receiveBufferSize = 16384
canal.instance.network.sendBufferSize = 16384
canal.instance.network.soTimeout = 30

# binlog filter config
canal.instance.filter.druid.ddl = true
canal.instance.filter.query.dcl = false
canal.instance.filter.query.dml = false
canal.instance.filter.query.ddl = false
canal.instance.filter.table.error = false
canal.instance.filter.rows = false
canal.instance.filter.transaction.entry = false
canal.instance.filter.dml.insert = false
canal.instance.filter.dml.update = false
canal.instance.filter.dml.delete = false

# binlog format/image check
canal.instance.binlog.format = ROW,STATEMENT,MIXED
canal.instance.binlog.image = FULL,MINIMAL,NOBLOB

# binlog ddl isolation
canal.instance.get.ddl.isolation = false

# parallel parser config
canal.instance.parser.parallel = true
## concurrent thread number, default 60% available processors, suggest not to exceed Runtime.getRuntime().availableProcessors()
#canal.instance.parser.parallelThreadSize = 16
## disruptor ringbuffer size, must be power of 2
canal.instance.parser.parallelBufferSize = 256

# table meta tsdb info
canal.instance.tsdb.enable = true
canal.instance.tsdb.dir = ${canal.file.data.dir:../conf}/${canal.instance.destination:}
canal.instance.tsdb.url = jdbc:h2:${canal.instance.tsdb.dir}/h2;CACHE_SIZE=1000;MODE=MYSQL;
canal.instance.tsdb.dbUsername = canal
canal.instance.tsdb.dbPassword = canal
# dump snapshot interval, default 24 hour
canal.instance.tsdb.snapshot.interval = 24
# purge snapshot expire , default 360 hour(15 days)
canal.instance.tsdb.snapshot.expire = 360

#################################################
#########               destinations            #############
#################################################
canal.destinations = example
# conf root dir
canal.conf.dir = ../conf
# auto scan instance dir add/remove and start/stop instance
canal.auto.scan = true
canal.auto.scan.interval = 5
# set this value to 'true' means that when binlog pos not found, skip to latest.
# WARN: pls keep 'false' in production env, or if you know what you want.
canal.auto.reset.latest.pos.mode = false

canal.instance.tsdb.spring.xml = classpath:spring/tsdb/h2-tsdb.xml
#canal.instance.tsdb.spring.xml = classpath:spring/tsdb/mysql-tsdb.xml

canal.instance.global.mode = spring
canal.instance.global.lazy = false
canal.instance.global.manager.address = ${canal.admin.manager}
#canal.instance.global.spring.xml = classpath:spring/memory-instance.xml
canal.instance.global.spring.xml = classpath:spring/file-instance.xml
#canal.instance.global.spring.xml = classpath:spring/default-instance.xml

##################################################
#########             MQ Properties      #############
##################################################
# aliyun ak/sk , support rds/mq
canal.aliyun.accessKey =
canal.aliyun.secretKey =
canal.aliyun.uid=

canal.mq.flatMessage = true
canal.mq.canalBatchSize = 50
canal.mq.canalGetTimeout = 100
# Set this value to "cloud", if you want open message trace feature in aliyun.
canal.mq.accessChannel = local

canal.mq.database.hash = true
canal.mq.send.thread.size = 30
canal.mq.build.thread.size = 8

##################################################
#########                    Kafka                   #############
##################################################
kafka.bootstrap.servers = 192.168.10.225:9092,192.168.10.226:9092,192.168.10.228:9092
kafka.acks = all
kafka.compression.type = none
kafka.batch.size = 16384
kafka.linger.ms = 1
kafka.max.request.size = 1048576
kafka.buffer.memory = 33554432
kafka.max.in.flight.requests.per.connection = 1
kafka.retries = 0

kafka.kerberos.enable = false
kafka.kerberos.krb5.file = "../conf/kerberos/krb5.conf"
kafka.kerberos.jaas.file = "../conf/kerberos/jaas.conf"

##################################################
#########                   RocketMQ         #############
##################################################
rocketmq.producer.group = test
rocketmq.enable.message.trace = false
rocketmq.customized.trace.topic =
rocketmq.namespace =
rocketmq.namesrv.addr = 127.0.0.1:9876
rocketmq.retry.times.when.send.failed = 0
rocketmq.vip.channel.enabled = false
rocketmq.tag =

##################################################
#########                   RabbitMQ         #############
##################################################
rabbitmq.host =
rabbitmq.virtual.host =
rabbitmq.exchange =
rabbitmq.username =
rabbitmq.password =
rabbitmq.deliveryMode =

III. Testing Each Environment

1. ZooKeeper test
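
A minimal check of the ensemble: every node should report a role (one leader, two followers), and a znode created through one node should be readable through another. Sketch using one-shot zkCli commands (assumes passwordless ssh between the nodes):

    # one node should report Mode: leader, the other two Mode: follower
    for h in uranus ares bird; do ssh $h.hadoop.com /usr/local/zookeeper-3.4.6/bin/zkServer.sh status; done

    # write through one node, read it back through another, then clean up
    /usr/local/zookeeper-3.4.6/bin/zkCli.sh -server uranus.hadoop.com:2181 create /sync_test hello
    /usr/local/zookeeper-3.4.6/bin/zkCli.sh -server ares.hadoop.com:2181 get /sync_test
    /usr/local/zookeeper-3.4.6/bin/zkCli.sh -server uranus.hadoop.com:2181 delete /sync_test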

2. Kafka environment test

1) Create the topic: test

/usr/local/kafka/bin/kafka-topics.sh --create --zookeeper uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181 --replication-factor 1 --partitions 1 --topic test

2) List the created topics

/usr/local/kafka/bin/kafka-topics.sh --list --zookeeper uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181 


3) Simulate a producer sending messages (run the test from one of the nodes)

/usr/local/kafka/bin/kafka-console-producer.sh --broker-list uranus.hadoop.com:9092,artemis.hadoop.com:9092,bird.hadoop.com:9092 --topic test

4) Simulate a consumer receiving messages. Note that newer versions use a different consumer command: --bootstrap-server instead of --zookeeper uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181.

/usr/local/kafka/bin/kafka-console-consumer.sh  --bootstrap-server uranus.hadoop.com:9092,artemis.hadoop.com:9092,bird.hadoop.com:9092 --topic test --from-beginning
Delete a topic:
/usr/local/kafka/bin/kafka-topics.sh  --delete --zookeeper uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181   --topic mysql_binlog

     (1) Log in to the ZooKeeper client: /usr/local/zookeeper-3.4.6/bin/zkCli.sh

     (2) Locate the topic directory: ls /brokers/topics

     (3) Find the topic to delete and run: rmr /brokers/topics/<topic name>. The topic is then completely removed.
     
Also remove the topic's directories under the Kafka storage path (the log.dirs setting in server.properties, default "/tmp/kafka-logs").
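
Put together, the full cleanup for a topic looks roughly like this (the rm -rf step has to run on every broker; mysql_binlog is the example topic used above):

    # 1. mark the topic for deletion
    /usr/local/kafka/bin/kafka-topics.sh --delete --zookeeper uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181 --topic mysql_binlog

    # 2. remove the znode if the topic is still listed
    /usr/local/zookeeper-3.4.6/bin/zkCli.sh -server uranus.hadoop.com:2181 rmr /brokers/topics/mysql_binlog

    # 3. on every broker, remove the topic's partition directories under log.dirs
    rm -rf /tmp/kafka-logs/mysql_binlog-*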

3. Maxwell test (note: Maxwell requires JDK 11!)

1) Create the topic mysql_binlog
/usr/local/kafka/bin/kafka-topics.sh --create --zookeeper uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181 --replication-factor 3 --partitions 5 --topic mysql_binlog

2) Start Maxwell against the source MySQL in the background
/app/maxwell-1.35.5/bin/maxwell --user='root' --password='1234@Abcd' --port=3306 --host='192.168.10.216' --producer=kafka \
--kafka.bootstrap.servers=uranus.hadoop.com:9092,ares.hadoop.com:9092,bird.hadoop.com:9092 --kafka_topic=mysql_binlog &

/app/maxwell-1.35.5/bin/maxwell --config config.properties
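
The config.properties referenced above is not shown in these notes; a minimal sketch that mirrors the command-line flags of the previous command (the file contents are an assumption):

    # /app/maxwell-1.35.5/config.properties -- equivalent to the CLI flags above
    log_level=info
    host=192.168.10.216
    port=3306
    user=root
    password=1234@Abcd
    producer=kafka
    kafka.bootstrap.servers=uranus.hadoop.com:9092,ares.hadoop.com:9092,bird.hadoop.com:9092
    kafka_topic=mysql_binlog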

3) Monitor the data arriving in Kafka
/usr/local/kafka/bin/kafka-console-consumer.sh --bootstrap-server uranus.hadoop.com:9092,artemis.hadoop.com:9092,bird.hadoop.com:9092 --topic mysql_binlog --from-beginning
[2022-01-11 13:40:12,523] WARN [Consumer clientId=consumer-1, groupId=console-consumer-81729] Connection to node -2 (ares.hadoop.com/192.168.10.227:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
{"database":"test","table":"home","type":"insert","ts":1641870563,"xid":76127446,"commit":true,"data":{"id":31223,"tab":"123","values":"131231"}}
{"database":"test","table":"home","type":"insert","ts":1641871080,"xid":76127964,"commit":true,"data":{"id":666,"tab":"666","values":"66666"}}
{"database":"test","table":"home","type":"update","ts":1641871200,"xid":76128473,"commit":true,"data":{"id":66644,"tab":"666","values":"66666"},"old":{"id":666}}
{"database":"test","table":"home","type":"update","ts":1641871967,"xid":76129546,"commit":true,"data":{"id":4234234,"tab":"123","values":"123"},"old":{"id":88881}}
{"database":"test","table":"home","type":"update","ts":1641872049,"xid":76129898,"commit":true,"data":{"id":6666667,"tab":"5555","values":"555555"},"old":{"id":5555}}
{"database":"test","table":"home","type":"update","ts":1641872403,"xid":76130973,"commit":true,"data":{"id":243324,"tab":"123","values":"123"},"old":{"id":4234234}}


4. gpkafka configuration and test (Maxwell version)

DATABASE: test
USER: gpadmin
HOST: 192.168.10.227
PORT: 5432
VERSION: 2
KAFKA:
   INPUT:
      SOURCE:
        BROKERS: 192.168.10.225:9092,192.168.10.226:9092,192.168.10.228:9092
        TOPIC: mysql_binlog
      VALUE:
        COLUMNS:
          - NAME: c1
            TYPE: json
        FORMAT: json
      ERROR_LIMIT: 100
   OUTPUT:
      SCHEMA: test
      TABLE: home
      MAPPING:
        - NAME: id
          EXPRESSION: (c1->'data'->>'id')::decimal
        - NAME: tab
          EXPRESSION: (c1->'data'->>'tab')::text
        - NAME: values
          EXPRESSION: (c1->'data'->>'values')::text
   COMMIT:
      MINIMAL_INTERVAL: 2000
  • Note the JSON format!

JSON data format in Kafka:

Insert:

{"database":"test","table":"home","type":"insert","ts":1641885332,"xid":76159057,"commit":true,"data":{"id":5543223,"tab":"23","values":"2234"}}

Update:

{"database":"test","table":"home","type":"update","ts":1641871200,"xid":76128473,"commit":true,"data":{"id":66644,"tab":"666","values":"66666"},"old":{"id":666}}
  • Example: https://segmentfault.com/a/1190000022567264 (not successfully reproduced at this point)
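
The OUTPUT section above assumes a test.home table already exists in Greenplum. A sketch of the target DDL matching the MAPPING (column types are assumptions; "values" must be quoted because it is a reserved word):

    -- target table in Greenplum for the Maxwell-format job
    CREATE SCHEMA test;
    CREATE TABLE test.home (
        id       decimal,
        tab      text,
        "values" text
    ) DISTRIBUTED BY (id);

The job itself is started the same way as the Canal version further below, i.e. gpkafka load against the YAML file saved above.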

5. Canal test

  • Start Canal

    sh /app/canal/bin/startup.sh
    
    cd to /root/canal/bin for workaround relative path
    LOG CONFIGURATION : /root/canal/bin/../conf/logback.xml
    canal conf : /root/canal/bin/../conf/canal.properties
    CLASSPATH :/root/canal/bin/../conf:/root/canal/bin/../lib/zookeeper-3.4.5.jar:/root/canal/bin/../lib/zkclient-0.1.jar:/root/canal/bin/../lib/spring-2.5.6.jar:/root/canal/bin/../lib/slf4j-api-1.7.12.jar:/root/canal/bin/../lib/protobuf-java-2.6.1.jar:/root/canal/bin/../lib/oro-2.0.8.jar:/root/canal/bin/../lib/netty-3.2.5.Final.jar:/root/canal/bin/../lib/logback-core-1.1.3.jar:/root/canal/bin/../lib/logback-classic-1.1.3.jar:/root/canal/bin/../lib/log4j-1.2.14.jar:/root/canal/bin/../lib/jcl-over-slf4j-1.7.12.jar:/root/canal/bin/../lib/guava-18.0.jar:/root/canal/bin/../lib/fastjson-1.1.35.jar:/root/canal/bin/../lib/commons-logging-1.1.1.jar:/root/canal/bin/../lib/commons-lang-2.6.jar:/root/canal/bin/../lib/commons-io-2.4.jar:/root/canal/bin/../lib/commons-beanutils-1.8.2.jar:/root/canal/bin/../lib/canal.store-1.0.23.jar:/root/canal/bin/../lib/canal.sink-1.0.23.jar:/root/canal/bin/../lib/canal.server-1.0.23.jar:/root/canal/bin/../lib/canal.protocol-1.0.23.jar:/root/canal/bin/../lib/canal.parse.driver-1.0.23.jar:/root/canal/bin/../lib/canal.parse.dbsync-1.0.23.jar:/root/canal/bin/../lib/canal.parse-1.0.23.jar:/root/canal/bin/../lib/canal.meta-1.0.23.jar:/root/canal/bin/../lib/canal.instance.spring-1.0.23.jar:/root/canal/bin/../lib/canal.instance.manager-1.0.23.jar:/root/canal/bin/../lib/canal.instance.core-1.0.23.jar:/root/canal/bin/../lib/canal.filter-1.0.23.jar:/root/canal/bin/../lib/canal.deployer-1.0.23.jar:/root/canal/bin/../lib/canal.common-1.0.23.jar:/root/canal/bin/../lib/aviator-2.2.1.jar:.:/usr/java/jdk1.8.0_121/lib
    cd to /root/canal for continue
    
    
    
  • Stop Canal

sh /app/canal/bin/stop.sh
master1: stopping canal 16062 ... 
Oook! cost:1
  • Log file locations
cat /app/canal/logs/canal/canal.log
cat /app/canal/logs/example/example.log
  • JSON samples
{"data":[{"id":"13","tab":"123","values":"1231231"}],"database":"test","es":1642058435000,"id":10,"isDdl":false,"mysqlType":{"id":"int(14)","tab":"varchar(20)","values":"varchar(60)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":4,"tab":12,"values":12},"table":"home","ts":1642058440150,"type":"DELETE"}
{"data":[{"id":"32","tab":"12","values":"213"}],"database":"test","es":1642058451000,"id":11,"isDdl":false,"mysqlType":{"id":"int(14)","tab":"varchar(20)","values":"varchar(60)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":4,"tab":12,"values":12},"table":"home","ts":1642058455906,"type":"INSERT"}

6. gpkafka configuration and test (Canal version)

  • For now, only inserts are considered.

  • JSON format of a Canal insert event:

    {"data":[{"id":"444","tab":"444","values":"4444","rrr":null}],"database":"test","es":1642062004000,"id":5,"isDdl":false,"mysqlType":{"id":"int(14)","tab":"varchar(21)","values":"varchar(60)","rrr":"varchar(50)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":4,"tab":12,"values":12,"rrr":12},"table":"home","ts":1642062008970,"type":"INSERT"}
    
  • The format is not the flat JSON the mapping expects, so it needs extra handling (the data array is unwrapped in the MAPPING expressions below).

DATABASE: test
USER: gpadmin
HOST: 192.168.10.227
PORT: 5432
VERSION: 2
KAFKA:
   INPUT:
      SOURCE:
        BROKERS: 192.168.10.225:9092,192.168.10.226:9092,192.168.10.228:9092
        TOPIC: test_home
      VALUE:
        COLUMNS:
          - NAME: c1
            TYPE: json
        FORMAT: json
      ERROR_LIMIT: 100
   OUTPUT:
      SCHEMA: test
      TABLE: home
      MAPPING:
        - NAME: id
          EXPRESSION: ((c1#>>'{data,0}')::json->>'id')::decimal
        - NAME: tab
          EXPRESSION: ((c1#>>'{data,0}')::json->>'tab')::text
        - NAME: values
          EXPRESSION: ((c1#>>'{data,0}')::json->>'values')::text
   COMMIT:
      MINIMAL_INTERVAL: 2000
gpkafka load test_home.yaml

Success!

DATABASE: test
USER: gpadmin
HOST: 192.168.10.227
PORT: 5432
VERSION: 2
KAFKA:
   INPUT:
      SOURCE:
        BROKERS: 192.168.10.225:9092,192.168.10.226:9092,192.168.10.228:9092
        TOPIC: test_home
      VALUE:
        COLUMNS:
          - NAME: c1
            TYPE: json
        FORMAT: json
      ERROR_LIMIT: 100
   OUTPUT:
      SCHEMA: test
      MODE: MERGE
      MATCH_COLUMNS:
        - id
      UPDATE_COLUMNS:
        - tab
        - values
      ORDER_COLUMNS:
        - ts
        - xid
        - del_mark
        - ddl_type
      TABLE: home
      MAPPING:
        - NAME: id
          EXPRESSION: ((c1#>>'{data,0}')::json->>'id')::decimal
        - NAME: tab
          EXPRESSION: ((c1#>>'{data,0}')::json->>'tab')::text
        - NAME: values
          EXPRESSION: ((c1#>>'{data,0}')::json->>'values')::text
        - NAME: ts
          EXPRESSION: (c1->>'es')::decimal
        - NAME: xid
          EXPRESSION: (c1->>'id')::decimal
        - NAME: ddl_type
          EXPRESSION: (c1->>'type')::text
        - NAME: del_mark
          EXPRESSION: CASE WHEN ((c1->>'type')::text= 'DELETE') then true else false end
   COMMIT:
      MINIMAL_INTERVAL: 2000
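
MERGE mode requires the bookkeeping columns referenced in MATCH_COLUMNS, ORDER_COLUMNS, and the MAPPING to exist on the target table. A sketch of the extended DDL for the virtual-delete approach (column types are assumptions derived from the casts above):

    -- extended test.home for MERGE mode with a soft-delete flag
    CREATE TABLE test.home (
        id       decimal,
        tab      text,
        "values" text,
        ts       decimal,   -- canal "es" field: source event time (ms)
        xid      decimal,   -- canal "id" field: batch id, used for ordering
        ddl_type text,      -- INSERT / UPDATE / DELETE
        del_mark boolean    -- true when the source row was deleted (virtual delete)
    ) DISTRIBUTED BY (id);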

Following https://segmentfault.com/a/1190000022567264 this now works; the earlier failure was caused by writing the DELETE_CONDITION: (c1->>'type')::text = 'DELETE' condition incorrectly.

Fully achieved: deletes can be captured as virtual (soft) delete records.
