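A Solution for Real-Time Synchronization of Oracle Data to a Big Data Platform
Abstract: This article covers streaming real-time incremental changes from a traditional enterprise Oracle database into the Kafka messaging system, where downstream consumers use them for real-time analytics, real-time ETL and similar scenarios. It introduces the Oracle GoldenGate component, which provides real-time data integration and continuous availability without affecting the source system's processing, with the aim of improving the availability, reliability and performance of the enterprise's critical systems while lowering IT cost.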
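I. Current State of the Enterprise
Traditional enterprises demand high stability and high reliability from their information systems, and when those systems were built most of them chose Oracle, the popular and stable database of the time. With the arrival of the DT (data technology) era, the finance and insurance industries depend more and more on data, and traditional Oracle alone can no longer meet insurers' growing demand for all-round data analysis. The IT architecture and operating model of such enterprises have been in place for more than ten years, so tearing everything down and replacing it with open-source databases would be extremely disruptive. The question is how to connect the data to a big data platform on the existing architecture without affecting the running systems, and then use big data technology to deliver more valuable data services.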
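II. Data Synchronization
There are many ways to synchronize data from a traditional relational database to a big data platform:
1. Full synchronization
2. Incremental synchronization by timestamp
3. Synchronization based on database archive logs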
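Tools for full synchronization and timestamp-based incremental synchronization, such as Sqoop, DataX and FlinkX, handle offline batch synchronization well, but they cannot meet real-time requirements. Synchronization based on database archive logs is real-time. For open-source MySQL, for example, Canal can collect the binlog in real time by using the master-slave replication mechanism. Oracle is commercial software whose log format is not public, and no open-source solution was available, so we chose Oracle's own data integration product, Oracle GoldenGate (OGG). OGG provides real-time data integration and continuous availability without affecting the source system's processing. OGG for Big Data supports common big data components such as Kafka, HDFS, HBase, Elasticsearch, Flume, JDBC and MongoDB. Link: https://docs.oracle.com/en/middleware/goldengate/big-data/index.html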
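To make the limits of approach 2 concrete, here is a minimal sketch of timestamp-based incremental synchronization. It uses Python's built-in sqlite3 in place of a real source and target, and the `policy` table, its columns and the integer watermark are all hypothetical. It is an illustration of the general technique, not how Sqoop or DataX are implemented.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical `policy` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE policy (id INTEGER PRIMARY KEY, payload TEXT, updated_at INTEGER)"
    )
    return conn


def incremental_sync(src, dst, watermark):
    """Copy rows changed after `watermark` from src to dst; return the new watermark."""
    rows = src.execute(
        "SELECT id, payload, updated_at FROM policy "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    for row_id, payload, updated_at in rows:
        # Upsert by primary key so that re-running a batch is harmless.
        dst.execute(
            "INSERT OR REPLACE INTO policy (id, payload, updated_at) VALUES (?, ?, ?)",
            (row_id, payload, updated_at),
        )
        watermark = max(watermark, updated_at)
    dst.commit()
    return watermark
```

Each run only sees rows whose timestamp moved past the watermark, so latency is bounded by the polling interval, and a row deleted on the source never shows up in the query at all. Log-based capture avoids both problems because every committed change, including deletes, is present in the redo stream.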
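III. Installation and Configuration
The following describes real-time incremental synchronization of Oracle data into the Kafka messaging system for downstream real-time processing. Source side: Oracle Database + Oracle GoldenGate for Oracle. Target side: Oracle GoldenGate for Big Data + Kafka.
(1) Environment Preparation
1. Java environment variables (JDK 1.8)
2. Network connectivity between the machines
3. Clock synchronization between the machines
4. Source Oracle database (version 11.2.0.4 or later)
(2) Environment Overview
1. Source
Machine IP: 192.168.72.3
Services: Oracle, OGG mgr (port 7809), ext_test, dpe_test
2. Target
Machine IP: 192.168.72.3
Services: OGG mgr (port 7909), rep_test, ZooKeeper, Kafka
3. Versions
Oracle version: 11g Release 2
Kafka version: 2.11-2.0.0 (the Scala 2.11 build of Kafka 2.0.0)
ZooKeeper version: 3.4.13
JVM version: 1.8 (OGG for Big Data requires 1.8 or later; otherwise it fails to start)
OGG source installer file name: 123010_fbo_ggs_Linux_x64_shiphome
OGG target installer file name: 123010_ggs_Adapters_Linux_x64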
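The network-connectivity prerequisite can be checked before any OGG process is configured. The sketch below only tests whether a TCP port accepts connections; `port_reachable` is a hypothetical helper, and the host and ports in the usage note are the ones listed in the environment overview.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For this environment one would call, for example, `port_reachable("192.168.72.3", 7909)` from the source machine once the target-side MGR is running. A False result points at a firewall or a stopped MGR rather than at the Extract or Pump configuration.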
(3) Source-Side Configuration
1. Unpack and install
1. Set the response-file parameters
[oracle@stream ~]$ cd /home/oracle/tmp/fbo_ggs_Linux_x64_shiphome/Disk1
[oracle@stream Disk1]$ vi ./response/oggcore.rsp
INSTALL_OPTION=ORA11g
SOFTWARE_LOCATION=/home/oracle/software/oracle/goldengate
START_MANAGER=false
MANAGER_PORT=7809
DATABASE_LOCATION=$ORACLE_HOME
UNIX_GROUP_NAME=oracle
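# INSTALL_OPTION: installation option; use ORA11g for Oracle 11g and ORA12c for Oracle 12c
# SOFTWARE_LOCATION: OGG installation directory
# START_MANAGER: whether to start the MGR process automatically (true or false)
# MANAGER_PORT: port MGR listens on; add it when START_MANAGER=true
# DATABASE_LOCATION: database location; set it to $ORACLE_HOME when START_MANAGER=true
# UNIX_GROUP_NAME: operating-system group
2. Run the installer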
[oracle@ogg Disk1]$ ./runInstaller -silent -responseFile /home/oracle/tmp/fbo_ggs_Linux_x64_shiphome/Disk1/response/oggcore.rsp
2. Environment variables
Installing GoldenGate requires the JAVA_HOME, ORACLE_SID, ORACLE_HOME, SOGG_HOME and LD_LIBRARY_PATH environment variables to be set.
#jdk
export JAVA_HOME=/home/oracle/software/jdk/jdk1.8.0_162
export PATH=$JAVA_HOME/bin:$PATH
#oracle
export ORACLE_HOME=/home/oracle/software/oracle/database/oracle11g/product/11.2.0/dbhome_1
export ORACLE_SID=orcl
export PATH=$PATH:$ORACLE_HOME/bin:$ORACLE_HOME/OPatch
#goldengate
export SOGG_HOME=/home/oracle/software/oracle/goldengate
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$ORACLE_HOME/lib32:$SOGG_HOME
export PATH=$PATH:$SOGG_HOME
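Before launching GGSCI it can help to confirm that these variables are really set in the current shell, since a missing LD_LIBRARY_PATH or ORACLE_HOME tends to surface later as a less obvious library-loading error. A minimal sketch; `missing_vars` is a hypothetical helper and the variable list simply mirrors the one above.

```python
import os

# Variables the source-side setup above expects; SOGG_HOME is this article's own name.
REQUIRED_SOURCE_VARS = ["JAVA_HOME", "ORACLE_HOME", "ORACLE_SID", "SOGG_HOME", "LD_LIBRARY_PATH"]


def missing_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(REQUIRED_SOURCE_VARS)
    print("missing:", ", ".join(missing) if missing else "none")
```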
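3. Privilege setup
1. Archive logging
The source database must run in archive log mode, be set to force logging, and have supplemental logging enabled.
① Archive logging
Check whether Oracle already has archive logging (automatic log archiving) enabled.
Method 1: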
[oracle@stream ~]$ sqlplus / as sysdba
SQL> archive log list
Database log mode No Archive Mode
Automatic archival Disabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 6
Current log sequence 8
SQL>
Method 2:
SQL> select name,log_mode from v$database;
If LOG_MODE shows NOARCHIVELOG, archive logging is not enabled.
If archive logging is not enabled, shut the database down first and then run the following commands:
SQL> shutdown immediate;
SQL> startup mount;
SQL> alter database archivelog;
Database altered.
SQL> alter database open;
② Force logging & minimal supplemental logging
Check whether force logging and minimal supplemental logging are enabled:
SQL> SELECT supplemental_log_data_min,force_logging FROM v$database;
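The query prints NO for a feature that is disabled and YES for one that is enabled. In this environment both were disabled, so run the commands below to enable them and then check the status again.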
SQL> alter database add supplemental log data;
SQL> alter database force logging;
//TODO: to be verified
SQL> alter database add supplemental log data (primary key) columns;
SQL> alter database add supplemental log data (unique) columns;
SQL> alter database force logging;
SQL> alter system switch logfile;
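Note: if the primary key and unique attributes are not specified, OGG will not ship the PK or unique-index column values, and downstream applications then have nothing to go on when they process UPDATE records.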
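The effect of missing key columns is easiest to see from the consumer's side. The sketch below applies change records to an in-memory replica keyed by primary key; `apply_change`, the record shape and the `POLICYNO` and `STATUS` columns are hypothetical and only illustrate the reasoning in the note above.

```python
def apply_change(replica, op, row, key_columns=("POLICYNO",)):
    """Apply one change to `replica`, a dict mapping key tuples to row dicts.

    `op` is "I", "U" or "D"; `row` holds the column values carried by the change record.
    """
    missing = [c for c in key_columns if c not in row]
    if missing:
        # Without the key there is no way to tell which replica row the change refers to.
        raise KeyError(f"change record lacks key column(s): {missing}")
    key = tuple(row[c] for c in key_columns)
    if op == "D":
        replica.pop(key, None)
    else:
        # Inserts create the row; updates merge the changed columns into it.
        replica.setdefault(key, {}).update(row)
    return replica
```

An UPDATE that carries only the changed column, for example `{"STATUS": "void"}`, is rejected here because it cannot be matched to a row. That is the situation the primary key and unique supplemental logging settings are meant to prevent.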
Check the force logging and minimal supplemental logging status again; output like the following means everything is OK:
SQL> SELECT supplemental_log_data_min,force_logging FROM v$database;
SUPPLEME FOR
-------- ---
IMPLICIT YES
③ The ENABLE_GOLDENGATE_REPLICATION parameter
alter system set ENABLE_GOLDENGATE_REPLICATION=true scope=both;
2. Create the user and grant privileges
create user goldengate identified by
grant connect to goldengate;
grant alter session to goldengate;
grant create session to goldengate;
grant connect to goldengate;
grant resource to goldengate;
grant select any dictionary to goldengate;
grant select any table to goldengate;
grant insert any table to goldengate;
grant update any table to goldengate;
grant delete any table to goldengate;
grant create any table to goldengate;
grant alter any table to goldengate;
grant select any transaction to goldengate;
grant create any index to goldengate;
grant alter any index to goldengate;
grant create any sequence to goldengate;
grant unlimited tablespace to goldengate;
grant drop any table to goldengate;
grant drop any sequence to goldengate;
grant flashback any table to goldengate;
3. Register Extract Process
-- This is the key step for enabling integrated capture mode
[oracle@stream goldengate]$ ./ggsci
GGSCI (stream) 1> dblogin userid goldengate password
GGSCI (stream as goldengate@orcl) 2> register extract ext_test database
2020-04-02 18:30:01 ERROR OGG-02062 User goldengate does not have the required privileges to use integrated capture.
--- A grant is required; note that the user name must be in upper case
SQL> exec DBMS_GOLDENGATE_AUTH.GRANT_ADMIN_PRIVILEGE (grantee=>'GOLDENGATE', privilege_type=>'capture',grant_select_privileges=>true, do_grants=>TRUE);
PL/SQL procedure successfully completed.
SQL>
GGSCI (stream as goldengate@orcl) 2> register extract ext_test database
2020-04-02 19:14:47 INFO OGG-02003 Extract EXT_TEST successfully registered with database at SCN 1067318.
GGSCI (stream as goldengate@orcl) 3>
//TODO: other errors
GGSCI (stream as goldengate@orcl) 3> register extract ext_test database
2020-04-02 19:14:50 WARNING OGG-02064 Oracle compatibility version 11.2.0.0.0 has limited datatype support for integrated capture. Version 11.2.0.3 required for full support.
ERROR: Cannot register or unregister EXTRACT EXT_TEST because of the following SQL error: OCI Error 6,550.
-- The error above is caused by insufficient privileges; grant them:
SQL>exec DBMS_GOLDENGATE_AUTH.GRANT_ADMIN_PRIVILEGE (grantee=>'GOLDENGATE');
ERROR: Cannot register or unregister EXTRACT EXT_TEST because of the following SQL error: OCI Error 1,950.
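-- The error above (OCI Error 1,950 corresponds to ORA-01950, no privileges on tablespace) is also a privilege problem; grant: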
SQL>GRANT UNLIMITED TABLESPACE TO GOLDENGATE;
GGSCI (stream as goldengate@orcl) 3> register extract ext_test database
2020-04-02 19:14:50 WARNING OGG-02064 Oracle compatibility version 11.2.0.0.0 has limited datatype support for integrated capture. Version 11.2.0.3 required for full support.
Extract EXT_TEST successfully registered with database at SCN 224553.
-- The registration has now succeeded
4. Enable supplemental logging for the table
GGSCI (stream) 1> dblogin userid goldengate password
GGSCI (stream) 2> add trandata STAT.PRPCMAIN
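4. MGR
The MGR (Manager) process starts the Oracle GoldenGate processes, starts dynamic processes, allocates ports to GoldenGate processes, manages trail files, and produces event, error and diagnostic reports. It must be started before anything else. After GoldenGate crashes or the machine is rebooted for any reason, MGR is not started by default.
1. Configuration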
GGSCI (stream) 1> edit params mgr
PORT 7809
DYNAMICPORTLIST 7810-7820
AUTOSTART EXTRACT *
PURGEOLDEXTRACTS ./dirdat/*,usecheckpoints, minkeepdays 10
Lagcriticalminutes 60
lagreportminutes 5
ACCESSRULE, PROG *, IPADDR 10.10.*.*, ALLOW
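Parameter | Description
PORT | Port number of the MGR process
DYNAMICPORTLIST | Ports that MGR dynamically assigns to other processes such as Extract and Replicat; either individual port numbers or a range
AUTOSTART | Starts the matching EXTRACT and REPLICAT processes when MGR starts
AUTORESTART | Automatically restarts OGG processes that have failed; RETRIES limits the number of attempts
PurgeMarkerHistory | Defines the cleanup policy for DDL replication data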
2. streams_pool_size
Configure the streams_pool_size parameter (run this on every node); the value can be tuned.
SQL> alter system set streams_pool_size=512M sid=’
5. EXTRACT
1. Create
GGSCI (stream) 1> add extract ext_test integrated tranlog, begin now
GGSCI (stream) 1> ADD EXTTRAIL ./dirdat/ex , EXTRACT ext_test, MEGABYTES 200
2. Configure
GGSCI (stream) 1> edit params ext_test
extract ext_test
setenv (NLS_LANG="AMERICAN_AMERICA.ZHS16GBK")
userid goldengate@
--TRANLOGOPTIONS DBLOGREADER
TRANLOGOPTIONS INTEGRATEDPARAMS (MAX_SGA_SIZE 100)
exttrail ./dirdat/ex
discardfile ./dirrpt/ext_test.dsc, append
GETUPDATES
GETDELETES
GETINSERTS
ddl include mapped objtype 'TABLE' include mapped objtype 'INDEX'
ddloptions addtrandata
ddloptions report
statoptions reportfetch
reportrollover at 08:30
TABLE STAT.PRPCMAIN;
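Parameter | Description
GETUPDATES or IGNOREUPDATES | Whether UPDATE operations are replicated; replicated by default
GETDELETES or IGNOREDELETES | Whether DELETE operations are replicated; replicated by default
GETINSERTS or IGNOREINSERTS | Whether INSERT operations are replicated; replicated by default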
6. PUMP
1. Create
GGSCI (stream) 1> ADD EXTRACT dpe_test, EXTTRAILSOURCE ./dirdat/ex
GGSCI (stream) 1> ADD RMTTRAIL ./dirdat/re, EXTRACT dpe_test, MEGABYTES 200
Here ./dirdat/ex is the trail written by the source-side ext_test, and ./dirdat/re is the trail delivered to rep_test on the remote side.
2. Configure
GGSCI (stream) 1> edit params dpe_test
extract dpe_test
-- mgrport must be the port of the target-side MGR (7909 in this setup)
rmthost 10.***, mgrport 7909
rmttrail ./dirdat/re
TABLE STAT.PRPCMAIN;
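Parameter | Description
RMTHOST | Address, port and related information of the target side
RMTTRAIL | Directory on the target side where the trail files are stored, plus the two-character file name
TABLE | Tables to synchronize; configured the same way as in the primary Extract process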
(4) Target-Side Configuration
1. Environment variables
Installing GoldenGate for Big Data requires the JAVA_HOME, TOGG_HOME and LD_LIBRARY_PATH environment variables to be set.
#jdk
export JAVA_HOME=/home/hadoop/software/jdk/jdk1.8.0_162
export PATH=$JAVA_HOME/bin:$PATH
#zookeeper
export ZOOKEEPER_HOME=/home/hadoop/software/zookeeper/zookeeper-3.4.13
export PATH=$PATH:$ZOOKEEPER_HOME/bin
#kafka
export KAFKA_HOME=/home/hadoop/software/kafka/kafka_2.11-2.0.0
export PATH=$PATH:$KAFKA_HOME/bin
#goldengate
export TOGG_HOME=/home/hadoop/software/goldengate/ogg_bigdata_12.3.0.1
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TOGG_HOME:$JAVA_HOME/jre/lib/amd64/libjsig.so:$JAVA_HOME/jre/lib/amd64/server/libjvm.so:$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64
export PATH=$PATH:$TOGG_HOME
2. MGR
1. Configuration
[hadoop@stream ogg_bigdata_12.3.0.1]$ ./ggsci
GGSCI (stream) 1> CREATE SUBDIRS
GGSCI (stream) 2> edit param mgr
PORT 7909
DYNAMICPORTLIST 7910-7920
-- the only ER group on the target side is rep_test, so the wildcard has to match it
AUTOSTART ER R*
AUTORESTART ER R*, RETRIES 4, WAITMINUTES 4
STARTUPVALIDATIONDELAY 5
3. REPLICAT
1. Create
GGSCI (stream) 3> add replicat rep_test, exttrail ./dirdat/re
2. Configure
GGSCI (stream) 4> edit param rep_test
REPLICAT rep_test
TARGETDB LIBFILE libggjava.so SET property=dirprm/kafka.props
REPORTCOUNT EVERY 1 MINUTES, RATE
MAP STAT.PRPCMAIN, TARGET STAT.PRPCMAIN;
3. KAFKA
[hadoop@stream ogg_bigdata_12.3.0.1]$ cd dirprm/
[hadoop@stream dirprm]$ vi kafka.props
[hadoop@stream dirprm]$ vi custom_kafka_producer.properties
For a detailed description of the parameters, see the official documentation:
https://docs.oracle.com/goldengate/bd1221/gg-bd/GADBD/GUID-2561CA12-9BAC-454B-A2E3-2D36C5C60EE5.htm#GADBD449
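Once rep_test is delivering records to Kafka, a downstream job has to decode them. The sketch below assumes the handler is configured to emit JSON and that each message carries `table`, `op_type`, `before` and `after` fields. That layout is an assumption about the formatter output, so check it against a real message from your topic. `decode_change` is a hypothetical helper, and the Kafka client that supplies the raw bytes is deliberately left out.

```python
import json


def decode_change(raw):
    """Decode one change record (bytes or str) into (table, op_type, row).

    Assumed layout: {"table": ..., "op_type": "I"|"U"|"D", "before": {...}, "after": {...}}
    """
    record = json.loads(raw)
    op_type = record["op_type"]
    # A delete only has a before image; inserts and updates are described by the after image.
    row = record.get("before") if op_type == "D" else record.get("after")
    return record["table"], op_type, row
```

A consumer loop would call `decode_change(message_value)` for every message and route the result by table name, for example sending STAT.PRPCMAIN changes to the real-time ETL job.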
(5) Common Commands
1. View or edit process parameters
view param ext_test
edit param ext_test
2. View process errors
view report ext_test
3. Make the process extract from the latest data
alter extract ext_test,begin now
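These commands are typed interactively, but GGSCI also reads commands from standard input, which makes it possible to script routine checks. A minimal sketch, assuming the `ggsci` binary sits directly under the OGG home; `ggsci_script` and `run_ggsci` are hypothetical helpers, not part of OGG.

```python
import subprocess


def ggsci_script(commands):
    """Join GGSCI commands into the text fed to ggsci on standard input."""
    return "\n".join(list(commands) + ["exit"]) + "\n"


def run_ggsci(ogg_home, commands, timeout=60):
    """Run `commands` through <ogg_home>/ggsci and return its standard output."""
    result = subprocess.run(
        [f"{ogg_home}/ggsci"],
        input=ggsci_script(commands),
        capture_output=True,
        text=True,
        cwd=ogg_home,  # relative paths such as ./dirprm resolve against the OGG home
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

On the source side one might call `run_ggsci("/home/oracle/software/oracle/goldengate", ["view param ext_test"])` and log or diff the returned text.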
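(6) Troubleshooting
To be upfront: this pitfall guide is the main point of the article.
1. Version mismatch between the target and the source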
ERROR OGG-01332 File ./dirdat/re000000000, with compatibility level 6, is not compatible with the current software version's compatibility level of 5.Modify the file writer's parameter file to generate the appropriate format using the FORMAT LEVEL 5 option
In the EXTRACT process, append FORMAT RELEASE 12.2 after the extract trail:
exttrail ./dirdat/ex,format RELEASE 12.2
In the PUMP process, append format level 5 after the remote trail:
rmttrail ./dirdat/re,format level 5
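Then delete the processes, delete the source-side extract files and the target-side replicat files under the dirdat directory, and add the processes again.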
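When this problem shows up, the two compatibility levels in the message tell you which side is newer. The sketch below pulls them out of report text with a regular expression written against the OGG-01332 wording quoted above; `find_trail_mismatch` is a hypothetical helper and other OGG releases may word the message differently.

```python
import re

# Matches the OGG-01332 wording quoted above; other releases may phrase it differently.
_OGG_01332 = re.compile(
    r"OGG-01332\s+File\s+(?P<file>\S+?),\s+with compatibility level (?P<file_level>\d+),"
    r".*?compatibility level of (?P<software_level>\d+)",
    re.DOTALL,
)


def find_trail_mismatch(report_text):
    """Return (trail_file, file_level, software_level), or None if OGG-01332 is absent."""
    match = _OGG_01332.search(report_text)
    if match is None:
        return None
    return (
        match.group("file"),
        int(match.group("file_level")),
        int(match.group("software_level")),
    )
```

If the file level is higher than the software level, as in the 6 versus 5 case here, the writer is newer than the reader and the trail format has to be lowered on the writing side.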