This article describes how to configure Oracle GoldenGate to replicate data to Kafka, covering installation and configuration of OGG on the source (Oracle) side, installation and configuration of OGG for Big Data on the replication side, and integration with a Kafka cluster.
Software preparation
1. Oracle 11.2.0.2
2. Oracle GoldenGate V12.2.0.1.1 for Oracle on Linux x86-64
Deployed on the Oracle server.
3. Oracle GoldenGate for Big Data V12.2.0.1.1 on Linux x86-64
Deployed on the replication server.
4. ZooKeeper 3.4.6
Deployed on the Kafka servers, used for failover.
5. Kafka 0.9.0.1 (the kafka_2.11-0.9.0.1 Scala 2.11 build)
The three Kafka nodes are:
192.168.2.105 kafka01
192.168.2.107 kafka02
192.168.2.108 kafka03
1. Create the kafka user on all three machines:
useradd kafka
2. Set up SSH mutual trust among the three machines
On kafka01, switch to the kafka user.
Run ssh-keygen -t rsa; after pressing Enter three times, a private key and a public key are generated in the .ssh directory.
Generate authorized_keys from the public key: cat id_rsa.pub > authorized_keys
Run ssh-keygen -t rsa on kafka02 and kafka03 as well, so that all three machines have a key pair.
Copy the authorized_keys generated on kafka01 over to kafka02:
scp authorized_keys kafka02:~/.ssh
On kafka02, append its own public key to authorized_keys:
cat id_rsa.pub >> authorized_keys
scp the authorized_keys file from kafka02 over to kafka03.
In kafka03's .ssh directory, likewise append the local public key:
cat id_rsa.pub >> authorized_keys
Copy the resulting file back to kafka02 and kafka01 so that authorized_keys is identical on all three machines.
At this point passwordless login is established among the three nodes; it can be verified as follows.
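A quick check, run from each node as the kafka user (hostname is just an arbitrary test command): every call should print the remote hostname without a password prompt.
for h in kafka01 kafka02 kafka03; do ssh $h hostname; done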
In the example below, ZooKeeper is installed across the same three nodes (kafka01, kafka02, kafka03).
Download ZooKeeper 3.4.6:
http://zookeeper.apache.org/releases.html
Perform the following on kafka01:
mkdir /home/kafka/zookeeper
Extract zookeeper-3.4.6.tar.gz into /home/kafka/zookeeper.
Configure zoo.cfg:
cd /home/kafka/zookeeper/zookeeper-3.4.6/conf    # zoo_sample.cfg lives in the conf directory of the extracted distribution
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
Set the following parameters:
dataDir=/home/kafka/zookeeper/data
server.1=kafka01:2888:3888
server.2=kafka02:2888:3888
server.3=kafka03:2888:3888
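For reference, a minimal complete zoo.cfg would look like the following; tickTime, initLimit, syncLimit, and clientPort are the zoo_sample.cfg defaults and are listed here as assumptions (clientPort=2181 matches the connection strings used later):
# zoo.cfg (minimal sketch)
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/home/kafka/zookeeper/data
server.1=kafka01:2888:3888
server.2=kafka02:2888:3888
server.3=kafka03:2888:3888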
Create the node identifier:
cd /home/kafka/zookeeper
mkdir data
cd data
vi myid
Enter "1" (matching the server.1 entry for kafka01).
Copy the installation directory to kafka02 and kafka03, and change the myid contents there to 2 and 3 respectively (see the commands after the scp steps below):
scp -r zookeeper-3.4.6 kafka02:~
scp -r zookeeper-3.4.6 kafka03:~
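A sketch of setting the node identifiers remotely, using the SSH trust set up earlier; the dataDir path follows the zoo.cfg above:
ssh kafka02 'mkdir -p /home/kafka/zookeeper/data && echo 2 > /home/kafka/zookeeper/data/myid'
ssh kafka03 'mkdir -p /home/kafka/zookeeper/data && echo 3 > /home/kafka/zookeeper/data/myid'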
Start ZooKeeper
Start zk on each of the three nodes: zkServer.sh start
Check the node status: zkServer.sh status
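With the SSH trust in place, both steps can also be driven from a single node; zkServer.sh is assumed to be on each node's PATH, as in the manual steps above:
for h in kafka01 kafka02 kafka03; do ssh $h 'zkServer.sh start'; done
for h in kafka01 kafka02 kafka03; do ssh $h 'zkServer.sh status'; done
# status should report one leader and two followers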
Perform the following on all three nodes:
cd /home/kafka/kafka/config
vi server.properties
Set broker.id=1 (1 on kafka01, 2 on kafka02, 3 on kafka03).
Set zookeeper.connect=kafka01:2181,kafka02:2181,kafka03:2181.
Note: the following must also be configured:
host.name = the Kafka node's IP address
advertised.host.name = the Kafka node's IP address
advertised.port = the Kafka service port (the default 9092 is used here)
Without these parameters, clients may be unable to send messages to the Kafka cluster. A minimal sketch of the resulting file follows.
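A sketch of the relevant server.properties entries on kafka01; broker.id and the IP address change per node as described above:
# server.properties on kafka01 (192.168.2.105)
broker.id=1
host.name=192.168.2.105
advertised.host.name=192.168.2.105
advertised.port=9092
zookeeper.connect=kafka01:2181,kafka02:2181,kafka03:2181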
Start Kafka in the background on each of the three cluster nodes by running:
cd /home/kafka/kafka/bin
./kafka-server-start.sh /home/kafka/kafka/config/server.properties &
Create a topic named oggtest with 5 partitions and a replication factor of 3:
./kafka-topics.sh --create --zookeeper kafka01:2181,kafka02:2181,kafka03:2181 --replication-factor 3 --partitions 5 --topic oggtest
Inspect the topic just created (this can be run from any node):
./kafka-topics.sh --describe --zookeeper kafka01:2181,kafka02:2181,kafka03:2181 --topic oggtest
Describe all topics:
./kafka-topics.sh --describe --zookeeper kafka01:2181,kafka02:2181,kafka03:2181
Kafka's bundled bin/kafka-console-producer.sh and bin/kafka-console-consumer.sh scripts can be used to verify publishing and consuming messages.
In one terminal, start a producer that writes messages into the oggtest topic created above:
./kafka-console-producer.sh --broker-list kafka01:9092,kafka02:9092,kafka03:9092 --topic oggtest
In another terminal, start a consumer subscribed to the same oggtest topic:
./kafka-console-consumer.sh --zookeeper kafka01:2181,kafka02:2181,kafka03:2181 --from-beginning --topic oggtest
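For a scripted smoke test, the console producer also reads from stdin, so a single piped command can publish a test message (the message text is arbitrary):
echo "hello oggtest" | ./kafka-console-producer.sh --broker-list kafka01:9092,kafka02:9092,kafka03:9092 --topic oggtest
The consumer started above should print the line within a few seconds.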
1. Check whether archive logging is enabled
SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /oracle/data/test/arch
Oldest online log sequence 49
Next log sequence to archive 51
Current log sequence 51
If archive logging is not enabled, enable it as follows; this requires restarting Oracle:
SQL> shutdown immediate;
SQL> startup mount;
SQL> alter database archivelog;
Database altered.
SQL> alter database open;
Database altered.
2. Check whether force logging and minimal supplemental logging are enabled
SQL> SELECT supplemental_log_data_min, force_logging FROM v$database;
SUPPLEME FORCE_LOGGING
-------- -------------
YES      YES
If supplemental logging is not enabled, enable it as follows; Oracle does not need to be shut down:
SQL> alter database add supplemental log data (primary key) columns;
Database altered.
SQL> alter database add supplemental log data (unique) columns;
Database altered.
Note: without the primary key and unique attributes, OGG will not ship the PK or unique index columns, and downstream applications would then have no key to apply update records against.
SQL> alter database force logging;
SQL> alter system switch logfile;
3. Create the OGG database user
SQL> create user ogg identified by oggaccount;
SQL> grant connect, resource to ogg;
SQL> grant select any dictionary to ogg;
SQL> grant select any table to ogg;
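The rest of this article replicates a table OGG.TEST with ID and NAME columns (see the definitions file and the insert test later). If it does not already exist, a minimal version can be created as follows; the exact column types and sizes are assumptions inferred from that definitions file:
SQL> create table ogg.test (id number primary key, name varchar2(20));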
Download Oracle GoldenGate V12.2.0.1.1 for Oracle on Linux x86-64:
http://www.oracle.com/technetwork/middleware/goldengate/downloads/index.html
Create the OGG home directory:
su - oracle    # all of the following operations are performed as the oracle user
mkdir -p /opt/ogg
Copy fbo_ggs_Linux_x64_shiphome.zip to /opt/ogg
cd /opt/ogg
unzip fbo_ggs_Linux_x64_shiphome.zip
Set the response file parameters:
vi /opt/ogg/fbo_ggs_Linux_x64_shiphome/Disk1/response/oggcore.rsp
INSTALL_OPTION=ORA11g
SOFTWARE_LOCATION=/opt/ogg/
START_MANAGER=false
MANAGER_PORT=9001
DATABASE_LOCATION=$ORACLE_HOME
Note: the Oracle version used here is 11.2.0.2.0, so INSTALL_OPTION is set to ORA11g; for 12c, set it to ORA12c.
Run the installer:
cd fbo_ggs_Linux_x64_shiphome/Disk1/
./runInstaller -silent -responseFile /opt/ogg/fbo_ggs_Linux_x64_shiphome/Disk1/response/oggcore.rsp
Set the environment variables:
vi ~/.bash_profile
export ORACLE_HOME=    # set to your Oracle home
export OGG_HOME=/opt/ogg
export PATH=$PATH:$ORACLE_HOME:$OGG_HOME
export LD_LIBRARY_PATH=$OGG_HOME:$LD_LIBRARY_PATH:$ORACLE_HOME/lib
Configure OGG
cd $OGG_HOME
./ggsci    # start the OGG command-line interface
Create the required subdirectories:
GGSCI> create subdirs
Encrypt the ogg account's login password
(A plain-text password can also be used for testing.)
GGSCI> encrypt password ogg BLOWFISH, ENCRYPTKEY DEFAULT
Using Blowfish encryption with DEFAULT key.
Encrypted password: AACAAAAAAAAAAADAHBLDCCIIOIRFNEPB
Algorithm used: BLOWFISH
Configure the OGG manager
GGSCI> edit params mgr
Enter the following to configure the manager to run the OGG service on port 9001 (matching MANAGER_PORT in the response file):
PORT 9001
dynamicportlist 9901-9920,9930
autostart er *
autorestart er *,retries 4,waitminutes 4
startupvalidationdelay 5
purgeoldextracts /opt/ogg/dirdat/tt, usecheckpoints, minkeephours 2
Start the OGG manager
GGSCI> start mgr
Register the database login:
GGSCI> dblogin userid ogg, password AACAAAAAAAAAAADAHBLDCCIIOIRFNEPB, BLOWFISH, ENCRYPTKEY DEFAULT
Create the Extract
GGSCI> edit params ext_test
Enter the following to define the ext_test extract:
extract ext_test
userid ogg@test, password AACAAAAAAAAAAADAHBLDCCIIOIRFNEPB, BLOWFISH, ENCRYPTKEY DEFAULT
rmthost 192.168.2.106, mgrport 9007
-- add exttrail /opt/ogg/dirdat/et, extract ext_test, megabytes 100
rmttrail /opt/ogg/dirdat/et
table ogg.*;    -- any schema works here, since the ogg account has select access to all tables
Add the extract to OGG:
GGSCI> add extract ext_test, INTEGRATED TRANLOG, BEGIN NOW
Register the remote trail file path and prefix for the extract; the prefix must be exactly two characters, here et:
GGSCI> add rmttrail /opt/ogg/dirdat/et, extract ext_test
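The registration can be confirmed from GGSCI before moving on; the extract will show as STOPPED until it is started in the verification section at the end:
GGSCI> info extract ext_test
GGSCI> info all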
Download Oracle GoldenGate for Big Data V12.2.0.1.1:
122011_ggs_Adapters_Linux_x64.zip
Create the OGG home directory on all replication servers, for example /opt/ogg.
Edit /etc/hosts and add the Kafka servers:
vi /etc/hosts
192.168.2.105 kafka01
192.168.2.107 kafka02
192.168.2.108 kafka03
Extract the zip file into the OGG home directory /opt/ogg.
Copy the configuration template files:
export OGG_HOME=/opt/ogg
cp $OGG_HOME/AdapterExamples/big-data/kafka/* $OGG_HOME/dirprm/
Start ggsci and create the subdirectories:
cd /opt/ogg/
./ggsci
GGSCI> create subdirs
Creating subdirectories under current directory /opt/ogg
Parameter files              /opt/ogg/dirprm: created
Report files                 /opt/ogg/dirrpt: created
Checkpoint files             /opt/ogg/dirchk: created
Process status files         /opt/ogg/dirpcs: created
SQL script files             /opt/ogg/dirsql: created
Database definitions files   /opt/ogg/dirdef: created
Extract data files           /opt/ogg/dirdat: created
Temporary files              /opt/ogg/dirtmp: created
Credential store files       /opt/ogg/dircrd: created
Masterkey wallet files       /opt/ogg/dirwlt: created
Dump files                   /opt/ogg/dirdmp: created
Configure the manager
GGSCI> edit params mgr
Enter the following to assign port 9007 to the mgr service:
port 9007
ACCESSRULE, PROG REPLICAT, IPADDR 192.168.2.*, ALLOW
dynamicportlist 9901-9920,9930
autostart er *
autorestart er *,retries 4,waitminutes 4
startupvalidationdelay 5
purgeoldextracts /opt/ogg/dirdat/*, usecheckpoints, minkeephours 2
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45
Exit:
GGSCI> exit
Configure kafka.props
cd $OGG_HOME
vi ./dirprm/kafka.props
Enter the following:
gg.handlerlist=kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties
gg.handler.kafkahandler.TopicName=oggtest
#gg.handler.kafkahandler.format=avro_op
gg.handler.kafkahandler.format=delimitedtext
gg.handler.kafkahandler.format.fieldDelimiter=|
gg.handler.kafkahandler.SchemaTopicName=myoggtest
gg.handler.kafkahandler.BlockingSend=false
gg.handler.kafkahandler.includeTokens=false
gg.handler.kafkahandler.mode=op
gg.handler.kafkahandler.maxGroupSize=100,1Mb
gg.handler.kafkahandler.minGroupSize=50,500Kb
goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE
gg.log=log4j
gg.log.level=INFO
gg.report.time=30sec
gg.classpath=/opt/ogg/dirprm/:/kafka/kfk/libs/*:/usr/avro/*:
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=/kafka/oggforbigdata/ggjava/ggjava.jar
Notes:
gg.handler.kafkahandler.TopicName must name the topic defined on the Kafka side.
With gg.handler.kafkahandler.format set to delimited text separated by "|" (the avro_op line is left commented out above), Kafka ultimately receives messages in the following form:
I|ORATEST.OGG_TEST|2016-10-17 12:17:24.996945|2016-10-17T20:41:32.169000|00000000000000001556|11|test11
gg.classpath must point at the corresponding library paths.
Configure custom_kafka_producer.properties
vi dirprm/custom_kafka_producer.properties
Enter the following; note that bootstrap.servers must point at the Kafka brokers and their port:
bootstrap.servers=kafka01:9092,kafka02:9092,kafka03:9092
acks=1
compression.type=gzip
reconnect.backoff.ms=1000
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
# 100KB per partition
batch.size=102400
linger.ms=10000
Define the replicat
cd $OGG_HOME
./ggsci
GGSCI> edit params rep_test
Enter the following to receive the test table's data stream from the source-side OGG:
REPLICAT rep_test
-- Command used to add the REPLICAT:
-- add replicat rep_test, exttrail /opt/ogg/dirdat/et
TARGETDB LIBFILE libggjava.so SET property=dirprm/kafka.props
--EXTFILE /kafka/testoggforbigdata/dirdat/et
--DDL INCLUDE ALL
SOURCEDEFS /opt/ogg/dirdef/test.p
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAP ogg.*, TARGET ogg.*;
Note: the source table structure must be defined for the replicat; one definitions file can hold the definitions of multiple tables.
Below is a sample definitions file:
Database type: ORACLE
Character set ID: UTF-8
National character set ID: UTF-16
Locale: neutral
Case sensitivity: 14 14 14 14 14 14 14 14 14 14 14 14 11 14 14 14
TimeZone: +08:00
*
Definition for table OGG.TEST
Record length: 142
Syskey: 0
Columns: 2
ID 64 50 0 0 0 1 0 50 50 50 0 0 0 0 1 0 1 22 -1 0 0 0
NAME 64 80 56 0 0 1 0 80 80 0 0 0 0 0 1 0 10 1 -1 20 0 0
End of definition
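Such a definitions file is normally generated on the source side with OGG's defgen utility and then copied to the replication server. A sketch, assuming a parameter file named dirprm/defgen.prm (the file name is arbitrary) and the ogg password from the user-creation step:
# on the source-side OGG home
cat > dirprm/defgen.prm <<'EOF'
defsfile /opt/ogg/dirdef/test.p
userid ogg@test, password oggaccount
table ogg.test;
EOF
./defgen paramfile dirprm/defgen.prm
scp /opt/ogg/dirdef/test.p 192.168.2.106:/opt/ogg/dirdef/    # 192.168.2.106 is the replication server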
Register the trail file:
GGSCI> add replicat rep_test, exttrail /opt/ogg/dirdat/et
Note: the trail file location and two-character prefix here must match the remote trail defined for the source-side extract.
1. Start a Kafka consumer console:
kafka-console-consumer.sh --zookeeper kafka01:2181 --topic oggtest --from-beginning
2. Start the OGG extract on the Oracle side:
cd $OGG_HOME
./ggsci
GGSCI> start mgr
GGSCI> start extract ext_test
Check the run status:
GGSCI> info all
3. Start the replicat. On the replication server:
cd $OGG_HOME
./ggsci
GGSCI> start mgr
GGSCI> start replicat rep_test
4. Insert a row into the test table on the source side:
sqlplus ogg/oggaccount@test
SQL> insert into ogg.test (id, name) values (1, 'test01');
SQL> commit;
5. Check that the Kafka consumer console receives the row.
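To see the other operation types in the delimited output (U for updates and D for deletes, alongside the I insert record shown earlier), a follow-up test can be run in the same session:
SQL> update ogg.test set name = 'test02' where id = 1;
SQL> commit;
SQL> delete from ogg.test where id = 1;
SQL> commit;
Each committed change should arrive on the consumer console as a new delimited record.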