Syncing Oracle 11g R2 Data into Kafka

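Software downloads

Latest version

http://www.oracle.com/technetwork/middleware/goldengate/downloads/index.html

Older versions

https://edelivery.oracle.com/osdc/faces/SoftwareDelivery

The OGG release has to match the Oracle database version it extracts from, so check the compatibility information and download the right one yourself.

I used version 12 throughout, because version 12 supports DDL replication better.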

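Source-side Oracle configuration

1 Create the installation directory

Choose any directory you like; there are no requirements.

mkdir -p /app/ogg/oracle
mkdir -p /app/ogg/oracle/trails
unzip V975837-01.zip # I renamed the downloaded file, so your file name will differ; this does not matter
ls

OGG 12 supports silent installation, so next we write a response file

cd fbo_ggs_Linux_x64_shiphome/Disk1/response/
vi oggcore.rsp

INSTALL_OPTION=ORA11g # this option differs between 11g and 12c
SOFTWARE_LOCATION=/app/ogg/oracle # installation directory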

su - oracle
./runInstaller -silent -nowait -responseFile /usr/MyWorkSpace/fbo_ggs_Linux_x64_shiphome/Disk1/response/oggcore.rsp

Note: I installed both ends on the same machine for testing, so I did not set any environment variables, to reduce conflicts.

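2 Configure the Oracle database

This assumes the Oracle database is already up and running; installing Oracle itself is not covered.

2.1 Enable archive log mode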

su - oracle
sqlplus / as sysdba
SQL> archive log list 
Database log mode	       Archive Mode
Automatic archival	       Enabled
Archive destination	       USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     447400
Next log sequence to archive   447404
Current log sequence	       447404

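-- If the output above shows No Archive Mode, run the following.
-- Shut down immediately, then start the instance and mount the database without opening it
shutdown immediate
startup mount
-- Switch the database to archive log mode, then open it
alter database archivelog;
alter database open;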

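2.2 Enable the required Oracle logging

OGG relies on supplemental logging (among other logs) for real-time replication, so the relevant logging must be enabled to make sure transaction contents can be captured. Check the current state with the following commands: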

sqlplus OGG/OGG as sysdba
select force_logging, supplemental_log_data_min from v$database;

-- If they are not enabled, turn them on with the following statements

alter database force logging;
alter database add supplemental log data;
alter database add supplemental log data (primary key) columns;
alter database add supplemental log data (unique) columns;
-- The primary key and unique statements above can be combined into one; this form also covers foreign keys:
alter database add supplemental log data (primary key,unique,foreign key) columns;

@marker_setup.sql;
@ddl_setup.sql;
@role_setup.sql;
GRANT GGS_GGSUSER_ROLE TO OGG;
@ddl_enable.sql;

# Performance tuning: switch to the rdbms/admin directory under the Oracle home
cd /data2/app/oracle/product/11.4.0/db_1/rdbms/admin
sqlplus / as sysdba
@dbmspool.sql
@ddl_pin OGG

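Note: Before running the scripts above, make sure no other DDL statements are executing, and run them from the OGG installation directory; otherwise they will hang.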

2.3 Create the replication user

You could replicate with an existing user; decide based on your situation. A dedicated user is simply easier to manage and has no effect on the application.

-- Create the tablespace
CREATE TABLESPACE OGG
  DATAFILE 
   '/data2/app/oracle/datafile/OGG' SIZE 512M AUTOEXTEND ON NEXT 512M MAXSIZE UNLIMITED
  SEGMENT SPACE MANAGEMENT AUTO;

-- Create the temporary tablespace
CREATE TEMPORARY TABLESPACE OGGTEMP
 TEMPFILE
'/data2/app/oracle/datafile/OGGTEMP' SIZE 512M AUTOEXTEND ON NEXT 512M MAXSIZE UNLIMITED
EXTENT MANAGEMENT LOCAL;

-- Create the user
create user OGG
  identified by OGG
  default tablespace OGG
  temporary tablespace OGGTEMP;
-- Narrower privileges would suffice; DBA is granted here for convenience
grant dba to ogg;
grant create table,create sequence to ogg;

2.4 Initialize OGG

su - oracle
/app/ogg/oracle/ggsci

GGSCI (dataalydb) 1> create subdirs

Creating subdirectories under current directory /app/ogg/oracle

Parameter files                /app/ogg/oracle/dirprm: already exists
Report files                   /app/ogg/oracle/dirrpt: created
Checkpoint files               /app/ogg/oracle/dirchk: created
Process status files           /app/ogg/oracle/dirpcs: created
SQL script files               /app/ogg/oracle/dirsql: created
Database definitions files     /app/ogg/oracle/dirdef: created
Extract data files             /app/ogg/oracle/dirdat: created
Temporary files                /app/ogg/oracle/dirtmp: created
Stdout files                   /app/ogg/oracle/dirout: created

2.5 Configure the OGG source-side processes

2.5.1 Configure the GLOBALS file
GGSCI (dataalydb) 3> dblogin userid OGG password OGG;
Successfully logged into database.
GGSCI (dataalydb) 4> edit param ./GLOBALS
oggschema ogg
2.5.2 Configure the Manager (mgr)
GGSCI (dataalydb) 5> edit param mgr
PORT 7809
PURGEOLDEXTRACTS /app/ogg/oracle/trails/w1*, USECHECKPOINTS, MINKEEPFILES 10
AUTORESTART ER *, RETRIES 3, WAITMINUTES 5
PURGEDDLHISTORY MINKEEPDAYS 3, MAXKEEPDAYS 5, FREQUENCYMINUTES 30
PURGEMARKERHISTORY MINKEEPDAYS 3, MAXKEEPDAYS 5, FREQUENCYMINUTES 30
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45

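Manager process parameters explained:

  • PORT: the port the Manager listens on; this setup uses 7809, which is also the default.
  • DYNAMICPORTLIST: a dynamic list of up to 256 usable ports. When the specified port is unavailable, the Manager picks a free port from this list; the Collector, Replicat, and GGSCI processes on the source and target sides also use these ports to communicate.
  • COMMENT: a comment line; -- can be used instead.
  • AUTOSTART: which processes to start automatically when the Manager starts.
  • AUTORESTART: automatic restart settings. The setting here retries every Extract and Replicat process (ER *) up to 3 times, waiting 5 minutes between attempts.
  • PURGEOLDEXTRACTS: periodic cleanup of trail files. The setting here deletes trail files once the checkpoints show every process has finished with them (USECHECKPOINTS), while always keeping at least 10 files (MINKEEPFILES 10).
  • LAGREPORTHOURS, LAGINFOMINUTES, LAGCRITICALMINUTES: the lag alerting thresholds. The settings here make the Manager check Extract lag every hour; lag above 30 minutes is written to the error log as an informational message, and lag above 45 minutes is written to the error log as a warning.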
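To make the lag thresholds concrete, here is a minimal Python sketch that reads the Manager parameters above as plain text and classifies a lag value the way the description reads. The helper names `parse_mgr_params` and `classify_lag` are my own, and the parsing is deliberately simplified; it illustrates the settings and is not how the Manager itself evaluates them.

```python
def parse_mgr_params(text):
    """Parse Manager parameter lines into {NAME: [comma-separated options]}."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (COMMENT ... or -- ...).
        if not line or line.startswith("--") or line.upper().startswith("COMMENT"):
            continue
        name, _, rest = line.partition(" ")
        params[name.upper()] = [t.strip() for t in rest.split(",") if t.strip()]
    return params


def classify_lag(params, lag_minutes):
    """Return 'ok', 'info' or 'warning' for a lag value, per the thresholds."""
    info = int(params["LAGINFOMINUTES"][0])
    critical = int(params["LAGCRITICALMINUTES"][0])
    if lag_minutes > critical:
        return "warning"
    if lag_minutes > info:
        return "info"
    return "ok"


MGR_PRM = """\
PORT 7809
PURGEOLDEXTRACTS /app/ogg/oracle/trails/w1*, USECHECKPOINTS, MINKEEPFILES 10
AUTORESTART ER *, RETRIES 3, WAITMINUTES 5
PURGEDDLHISTORY MINKEEPDAYS 3, MAXKEEPDAYS 5, FREQUENCYMINUTES 30
PURGEMARKERHISTORY MINKEEPDAYS 3, MAXKEEPDAYS 5, FREQUENCYMINUTES 30
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45
"""

params = parse_mgr_params(MGR_PRM)
print(params["AUTORESTART"])     # ['ER *', 'RETRIES 3', 'WAITMINUTES 5']
print(classify_lag(params, 40))  # info
```

With these values, a 40-minute lag lands between the two thresholds and is reported as informational only.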
2.5.3 Configure the Extract process
REGISTER EXTRACT EXTND DATABASE
add extract extnd,INTEGRATED TRANLOG,begin now
add exttrail ./dirdat/nd,extract extnd,megabytes 100
-- MEGABYTES: the maximum size of each trail file; here it is set to 100 MB.
edit params extnd
EXTRACT extnd
SETENV (NLS_LANG = "SIMPLIFIED CHINESE_CHINA.AL32UTF8")
SETENV (ORACLE_HOME = "/data2/app/oracle/product/11.4.0/db_1")
SETENV (ORACLE_SID = "dataaly")
USERID OGG, PASSWORD OGG
EXTTRAIL ./dirdat/nd
DISCARDFILE ./dirdat/extnd.dsc,APPEND,MEGABYTES 5
TRANLOGOPTIONS ALTARCHIVEDLOGFORMAT %t_%s_%r.dbf
DDL INCLUDE ALL
DDLOPTIONS ADDTRANDATA
GETUPDATEBEFORES
FETCHOPTIONS, USESNAPSHOT, NOUSELATESTVERSION, MISSINGROW REPORT
STATOPTIONS REPORTFETCH
WARNLONGTRANS 1H, CHECKINTERVAL 5M
TABLE DATAALY.*;

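To filter rows, a clause like the following can be appended to a TABLE statement before the closing semicolon:

, SQLPREDICATE "where ename='Gavin'"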

Notes:

1 Resolving OGG-01755:

2020-05-20 20:05:33  ERROR   OGG-01755  Cannot register or unregister EXTRACT EXTND because of the following SQL error: OCI Error 26,947. See Extract user privileges in the Oracle GoldenGate for Oracle Installation and Setup Guide.

2 Resolving OGG-02912 Oracle GoldenGate Capture for Oracle, extnd.prm: Patch 17030189 is required on your Oracle mining database for trail format RELEASE 12.2 or later.

cd /app/ogg/oracle/
sqlplus / as sysdba
@prvtlmpg.plb
# When prompted, enter the OGG user
OGG

3 The OCI Error 26,947 in note 1 is ORA-26947. On Oracle Database 11.2.0.4, using OGG requires setting the enable_goldengate_replication parameter to TRUE. The parameter was introduced in 11.2.0.4, a sign that Oracle Database and OGG are becoming ever more tightly integrated.

alter system set enable_goldengate_replication=true;
2.5.4 Add the data pump process
add extract dpend, exttrailsource ./dirdat/nd, begin now
add rmttrail ./dirdat/nd, extract dpend, megabytes 100

edit params dpend

EXTRACT dpend
SETENV (NLS_LANG = "SIMPLIFIED CHINESE_CHINA.AL32UTF8")
USERID OGG, PASSWORD OGG
DDL INCLUDE ALL
DDLOPTIONS ADDTRANDATA
RMTHOST 10.0.9.19, MGRPORT 9876
RMTTRAIL ./dirdat/nd
DISCARDFILE ./dirdat/dpend.dsc,APPEND,MEGABYTES 5
TABLE DATAALY.*;
2.5.5 Configure the definitions file
GGSCI (dataalydb) 7> edit params dataalydef
defsfile  ./dirdef/dataalydef
userid OGG,password OGG
TABLE DATAALY.*;

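Change to the OGG directory and run the following commands

cd /app/ogg/oracle/
./defgen paramfile dirprm/dataalydef.prm

The generated file is dirdef/dataalydef under the OGG installation directory; copy it to the dirdef directory on the OGG target side.

Note: With version 12 and later this step can be skipped.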

3 Target-side (Kafka) configuration

3.1 Create the directories

mkdir -p /app/ogg/kafka/
cp /app/OGG_BigData_Linux_x64_12.3.2.1.1.zip /app/ogg/kafka/
chown -R oracle:oinstall /app/ogg/oracle
unzip OGG_BigData_Linux_x64_12.3.2.1.1.zip

[oracle@dataalydb kafka]$ ls
OGGBD-12.3.2.1-README.txt                 OGG_BigData_Linux_x64_12.3.2.1.1.tar
OGG_BigData_12.3.2.1.1_Release_Notes.pdf  OGG_BigData_Linux_x64_12.3.2.1.zip

tar -xvf OGG_BigData_Linux_x64_12.3.2.1.1.tar

3.2 Initialize OGG

useradd -u 80 hadoop
chown -R hadoop:hadoop /app/ogg/kafka/
su - hadoop
vi .bash_profile

export JAVA_HOME=/usr/java/jdk1.8.0
export HADOOP_HOME=/app/hadoop-2.9.2
export SPARK_HOME=/app/spark
export ZOOKEEPER_HOME=/app/zookeeper/
export HIVE_HOME=/app/hive/
export SCALA_HOME=/app/scala/
export PATH=$HIVE_HOME/bin:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin:
export CLASSPATH=./:$HIVE_HOME/lib:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$SPARK_HOME/lib:$HADOOP_HOME/share/hadoop/common:$HADOOP_HOME/share/hadoop/yarn
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop/


/app/ogg/kafka/ggsci

GGSCI (dataalydb) 1> create subdirs

Creating subdirectories under current directory /app/ogg/oracle

Parameter files                /app/ogg/oracle/dirprm: already exists
Report files                   /app/ogg/oracle/dirrpt: created
Checkpoint files               /app/ogg/oracle/dirchk: created
Process status files           /app/ogg/oracle/dirpcs: created
SQL script files               /app/ogg/oracle/dirsql: created
Database definitions files     /app/ogg/oracle/dirdef: created
Extract data files             /app/ogg/oracle/dirdat: created
Temporary files                /app/ogg/oracle/dirtmp: created
Stdout files                   /app/ogg/oracle/dirout: created

Fixing a startup error

/app/ogg/kafka/ggsci: error while loading shared libraries: libjvm.so: cannot open shared object file: No such file or directory

Switch to the root account

find / -name libjvm.so

/usr/java/jdk1.8.0/jre/lib/amd64/server/libjvm.so
/data2/app/oracle/product/11.4.0/db_1/jdk/jre/lib/amd64/server/libjvm.so

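Any JDK installation should include libjvm.so; the dynamic loader simply does not find it, so its directory has to be added by hand. If there is no libjvm.so at all, install a JDK first.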

vim /etc/ld.so.conf
/usr/java/jdk1.8.0/jre/lib/amd64/server/
ldconfig

If you did not hit this error, skip the part above.
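As an alternative to scanning the whole filesystem with find, the lookup can be limited to the JDK directory. The following Python sketch is only an illustration: the function name `find_libjvm` is my own, and the fallback path is the JAVA_HOME used in this walkthrough, so adjust it for your machine.

```python
import os


def find_libjvm(java_home):
    """Return the sorted directories under java_home that contain libjvm.so."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(java_home):
        if "libjvm.so" in filenames:
            hits.append(dirpath)
    return sorted(hits)


if __name__ == "__main__":
    # Assumption: fall back to the JAVA_HOME used in this walkthrough.
    java_home = os.environ.get("JAVA_HOME", "/usr/java/jdk1.8.0")
    for directory in find_libjvm(java_home):
        # Each printed directory is a candidate line for /etc/ld.so.conf.
        print(directory)
```

Each directory it prints is a candidate for /etc/ld.so.conf, followed by ldconfig as shown above.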

3.3 Configure the Manager (mgr)

GGSCI (dataalydb) 5> edit param mgr
PORT 9876
PURGEOLDEXTRACTS ./trails/kfk*, USECHECKPOINTS, MINKEEPFILES 10
AUTORESTART ER *, RETRIES 3, WAITMINUTES 5
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45

3.4 Configure the Replicat process

add replicat rkafka,exttrail ./dirdat/nd,begin now
GGSCI (dataalydb) 5> edit params rkafka
REPLICAT rkafka
DDL INCLUDE ALL
TARGETDB LIBFILE libggjava.so SET property=dirprm/kafka.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAP DATAALY.*, TARGET DATAALY.*;

Create kafka.props and place it in the OGG dirprm directory

gg.handlerlist = kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties
#The following resolves the topic name using the short table name
#gg.handler.kafkahandler.topicMappingTemplate=ogg
gg.handler.kafkahandler.topicMappingTemplate=ogg
#The following selects the message key using the concatenated primary keys
#gg.handler.kafkahandler.keyMappingTemplate=${primaryKeys}
#gg.handler.kafkahandler.format=avro_op
gg.handler.kafkahandler.format=json
#gg.handler.kafkahandler.SchemaTopicName=mySchemaTopic
gg.handler.kafkahandler.SchemaTopicName=dataalyogg
gg.handler.kafkahandler.BlockingSend=false
gg.handler.kafkahandler.includeTokens=false
gg.handler.kafkahandler.mode=op
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE
gg.log=log4j
gg.log.level=INFO
gg.report.time=30sec
#Sample gg.classpath for Apache Kafka
gg.classpath=dirprm/:/app/kafka/libs/*
#Sample gg.classpath for HDP
#gg.classpath=/etc/kafka/conf:/usr/hdp/current/kafka-broker/libs/*
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar
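Typos in kafka.props tend to surface only when the Replicat starts, so a quick check beforehand can save a restart cycle. The sketch below is a simplified Python reader for properties files like the one above (no line continuations, `=` separators only). The `REQUIRED` tuple is my own assumption based on the keys this walkthrough sets, not an official list of mandatory handler properties.

```python
def load_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Assumption: the keys this walkthrough relies on, not an official list.
REQUIRED = (
    "gg.handlerlist",
    "gg.handler.kafkahandler.type",
    "gg.handler.kafkahandler.KafkaProducerConfigFile",
    "gg.handler.kafkahandler.topicMappingTemplate",
    "gg.handler.kafkahandler.format",
    "gg.classpath",
)


def missing_keys(props, required=REQUIRED):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not props.get(key)]


SAMPLE = """\
gg.handlerlist = kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties
#gg.handler.kafkahandler.format=avro_op
gg.handler.kafkahandler.topicMappingTemplate=ogg
gg.handler.kafkahandler.format=json
gg.classpath=dirprm/:/app/kafka/libs/*
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar
"""

props = load_properties(SAMPLE)
print(missing_keys(props))  # []
```

To check the real file, read dirprm/kafka.props into a string and pass it to `load_properties` in place of `SAMPLE`.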

Create custom_kafka_producer.properties and place it in the same directory as kafka.props

#bootstrap.servers=host:port
bootstrap.servers=127.0.0.1:9092
acks=1
#compression.type=gzip
reconnect.backoff.ms=1000

value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
# 100KB per partition
batch.size=102400
linger.ms=10000
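With format=json and mode=op, each change should reach the ogg topic as one JSON document whose value is a byte array. The Python sketch below shows how a consumer might interpret such a value. The field names (table, op_type, before, after) and the I/U/D/T operation codes reflect my understanding of the JSON formatter's default output and should be verified against real messages; the DATAALY.EMP record is made up for illustration, and `summarize` is a hypothetical helper.

```python
import json

# Assumption: default operation codes of the JSON formatter.
OP_NAMES = {"I": "insert", "U": "update", "D": "delete", "T": "truncate"}

# Made-up example value, shaped like an update on a hypothetical DATAALY.EMP table.
SAMPLE_VALUE = (
    b'{"table":"DATAALY.EMP","op_type":"U",'
    b'"op_ts":"2020-05-20 20:05:33.000000",'
    b'"current_ts":"2020-05-20T20:05:40.123000",'
    b'"pos":"00000000000000001234",'
    b'"before":{"EMPNO":7369,"ENAME":"Gavin"},'
    b'"after":{"EMPNO":7369,"ENAME":"Smith"}}'
)


def summarize(raw_value):
    """Return (table, operation name, sorted list of columns whose value changed)."""
    record = json.loads(raw_value.decode("utf-8"))
    before = record.get("before") or {}
    after = record.get("after") or {}
    changed = sorted(
        column
        for column in set(before) | set(after)
        if before.get(column) != after.get(column)
    )
    return record["table"], OP_NAMES.get(record["op_type"], "unknown"), changed


print(summarize(SAMPLE_VALUE))  # ('DATAALY.EMP', 'update', ['ENAME'])
```

The before image is available here because the Extract parameter file sets GETUPDATEBEFORES; without it, an update would carry only the after values and every column would be listed as changed.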
