3. RAC to Single-Instance Replication Configuration
3.1 Environment Overview
Role | IP | OS | Oracle version
Source | 10.123.112.201/10.123.112.202 | Linux RHEL 5, 64-bit | 10.2.0.1
Target | 10.123.112.235 | Linux RHEL 5, 32-bit | 10.2.0.1
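3.2 Installing the OCFS2 Cluster File System on the Source
For high availability in a RAC environment, OGG should be installed on a cluster file system so that it is accessible from every node of the RAC. This test uses OCFS2.
Download the OCFS2 RPM packages that match the Linux kernel from http://oss.oracle.com
On Linux, run uname -r to check the kernel version, e.g.:
[oracle@node2 ocfs]$ uname -r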
2.6.18-92.el5
Install the OCFS2 RPM packages as the root user:
[root@node1 ocfs]# rpm -ivh ocfs2-tools-1.2.7-1.el5.x86_64.rpm \
ocfs2console-1.2.7-1.el5.x86_64.rpm \
ocfs2-2.6.18-92.el5-1.2.9-1.el5.x86_64.rpm
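The kernel-module package name embeds the kernel release reported by uname -r, which is why the last RPM above starts with ocfs2-2.6.18-92.el5. A minimal sketch of deriving that prefix follows; the function name ocfs2_module_prefix is illustrative and not part of any OCFS2 tool.

```shell
#!/bin/sh
# Hypothetical helper: print the package-name prefix of the OCFS2 kernel
# module that matches a kernel release (defaults to the running kernel).
ocfs2_module_prefix() {
    kernel="${1:-$(uname -r)}"
    printf 'ocfs2-%s\n' "$kernel"
}

# Example with the kernel release shown above.
ocfs2_module_prefix 2.6.18-92.el5
```

Called without an argument, the function uses the release of the running kernel, which is the value to look for on the download page.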
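Open the OCFS2 console:
[root@node1 ~]# ocfs2console
In the window that appears, choose [Cluster]-[Configure Nodes]. In the "Node Configuration" dialog, enter the node name, IP address, and port of each of the two private-interconnect nodes, then choose [Cluster]-[Propagate Cluster Configuration] and wait for the "Finished" message.
The console then shows the configured node information.
Run the following commands as root on every node in the cluster: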
export PATH=$PATH:/sbin:/usr/sbin
/etc/init.d/o2cb enable
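Create the OCFS2 file system. The -N option sets the maximum number of nodes that can use the file system concurrently:
# mkfs -t ocfs2 -N 2 /dev/sdh1
Mount the partition:
# mount /dev/sdh1 /ggate
Configure automatic loading at boot (on all nodes):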
export PATH=$PATH:/sbin:/usr/sbin
chkconfig --add o2cb
/etc/init.d/o2cb configure
Add the following lines to /etc/rc.local:
chown -R oracle:dba /ggate
chmod -R 775 /ggate
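Before ownership is changed at boot, it can help to confirm that /ggate really is an OCFS2 mount. The following is a minimal sketch, assuming the Linux /proc/mounts layout (device, mount point, file system type); the function name and its optional second argument are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given mount point is mounted as ocfs2.
# $1: mount point, $2: mounts table to read (assumed default: /proc/mounts).
is_ocfs2_mount() {
    awk -v mp="$1" '$2 == mp && $3 == "ocfs2" { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

if is_ocfs2_mount /ggate; then
    echo "/ggate is on OCFS2"
else
    echo "/ggate is not an OCFS2 mount"
fi
```

The second argument exists only so the check can be tried against a saved copy of the mounts table.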
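3.3 Installing GoldenGate on the Source
Extract the installation files in the GoldenGate installation directory (the OCFS2 directory /ggate):
unzip ogg112101_fbo_ggs_Linux_x64_ora10g_64bit.zip
tar -xvf fbo_ggs_Linux_x64_ora10g_64bit.tar
Set the environment variables
Add the following lines to the user's profile:
export GGATE_HOME=/ggate
export LD_LIBRARY_PATH=$GGATE_HOME:$ORACLE_HOME/lib
Note: reload the profile after editing it so that the settings take effect.
Install GoldenGate
Create the OGG working directories from the OGG console.
In the installation directory, run ./ggsci to enter the OGG console.
Run the create subdirs command to create the working directories. The output looks like this: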
GGSCI(node1) 1> create subdirs
Creating subdirectories under current directory /ggate
Parameter files /ggate/dirprm: already exists
Report files /ggate/dirrpt: created
Checkpoint files /ggate/dirchk: created
Process status files /ggate/dirpcs: created
SQL script files /ggate/dirsql: created
Database definitions files /ggate/dirdef: created
Extract data files /ggate/dirdat: created
Temporary files /ggate/dirtmp: created
Stdout files /ggate/dirout: created
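3.4 Installing GoldenGate on the Target
The installation procedure is the same as in section 3.3; only the installation location differs, so the steps are omitted here. Make sure the installation package matches the target platform.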
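3.5 Configuring the Source Database
Database mode configuration
The source database must run in archivelog mode (the statement below requires the database to be mounted but not open):
alter database archivelog;
Enable minimal supplemental logging:
alter database add supplemental log data;
To check whether minimal supplemental logging is enabled, run:
SELECT SUPPLEMENTAL_LOG_DATA_MIN FROM V$DATABASE;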
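SUPPLEMENTAL_LOG_DATA_MIN returns YES, IMPLICIT, or NO; the first two both mean that minimal supplemental logging is in effect. The following is a small sketch of interpreting the value in a script. The function name is illustrative, and obtaining the value (for example from sqlplus -s) is left out.

```shell
#!/bin/sh
# Hypothetical helper: interpret a SUPPLEMENTAL_LOG_DATA_MIN value.
# YES and IMPLICIT both mean minimal supplemental logging is in effect.
min_supplog_enabled() {
    case "$1" in
        YES|IMPLICIT) return 0 ;;
        *) return 1 ;;
    esac
}

for value in YES IMPLICIT NO; do
    if min_supplog_enabled "$value"; then
        echo "$value: enabled"
    else
        echo "$value: not enabled"
    fi
done
```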
Create the GoldenGate database user on the source database and grant it privileges (ogg is used here as an example; any other name works):
create user ogg identified by oracle default tablespace DATA_OL;
grant connect,resource,unlimited tablespace to ogg;
grant execute on utl_file to ogg;
grant select any dictionary,select any table to ogg;
grant alter any table to ogg;
grant flashback any table to ogg;
grant execute on DBMS_FLASHBACK to ogg;
Add table-level trandata
GGSCI(node1) 1> dblogin userid ogg,password oracle
Successfully logged into database.
GGSCI(node1) 2> add trandata SCOTT.DEPT
Logging of supplemental redo data enabled for table SCOTT.DEPT.
GGSCI(node1) 3> add trandata SCOTT.EMP
Logging of supplemental redo data enabled for table SCOTT.EMP.
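3.6 Configuring the Source Process Groups
Configure the manager process (mgr):
GGSCI(node1) 1> edit param mgr
(paste the following configuration)
PORT 7839
DYNAMICPORTLIST 7840-7939
--AUTOSTART ER *
AUTORESTART EXTRACT *, RETRIES 5, WAITMINUTES 3
PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 3
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45
The parameters have the same meaning as in the single-instance configuration; see section 3.5.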
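edit param mgr opens dirprm/mgr.prm under the GoldenGate home in a text editor, so the same file can also be written from a script. The following is a minimal sketch; the GGS_HOME variable and its scratch-directory default are assumptions made so that the example is safe to run anywhere.

```shell
#!/bin/sh
# Write the Manager parameter file without opening an editor.
# GGS_HOME is an assumption; it defaults to a scratch directory here.
GGS_HOME="${GGS_HOME:-$(mktemp -d)}"
mkdir -p "$GGS_HOME/dirprm"

# Quoted heredoc delimiter: the * and ./ are written literally.
cat > "$GGS_HOME/dirprm/mgr.prm" <<'EOF'
PORT 7839
DYNAMICPORTLIST 7840-7939
AUTORESTART EXTRACT *, RETRIES 5, WAITMINUTES 3
PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 3
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45
EOF

echo "wrote $GGS_HOME/dirprm/mgr.prm"
```

On a real installation GGS_HOME would be set to /ggate before running the script, and the manager would need a restart to pick up the change.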
Start the manager process:
GGSCI(node1) 2> start mgr
Manager started.
GGSCI(node1) 3> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
Configure the extract process:
GGSCI(node1) 6> add extract extnd,tranlog,begin now,threads 2
EXTRACT added.
GGSCI(node1) 7> add exttrail ./dirdat/nd,extract extnd,megabytes 100
EXTTRAIL added.
GGSCI(node1) 8> edit params extnd
(paste the following configuration)
EXTRACT extnd
SETENV(NLS_LANG = "AMERICAN_AMERICA.UTF8")
SETENV(ORACLE_HOME = "/u01/app/oracle/product/10.2.0/db_1")
USERID ogg@RAC, PASSWORD oracle
--GETTRUNCATES
REPORTCOUNT EVERY 1 MINUTES, RATE
DISCARDFILE ./dirrpt/extnd.dsc, APPEND, MEGABYTES 1024
--THREADOPTIONS MAXCOMMITPROPAGATIONDELAY 60000 IOLATENCY 60000
DBOPTIONS ALLOWUNUSEDCOLUMN
WARNLONGTRANS 2h, CHECKINTERVAL 3m
EXTTRAIL ./dirdat/nd
--TRANLOGOPTIONS EXCLUDEUSER USERNAME
FETCHOPTIONS NOUSESNAPSHOT
TRANLOGOPTIONS CONVERTUCS2CLOBS
TABLE scott.dept;
TABLE scott.emp;
Note: set threads to the number of RAC nodes. In a RAC environment, do not set ORACLE_SID; connect with USERID ogg@RAC instead, and make sure both nodes can connect to the database through it.
Add the data pump (transfer) process and configure its parameters:
GGSCI(node1) 2> add extract dpend,exttrailsource ./dirdat/nd
EXTRACT added.
GGSCI(node1) 3> add rmttrail /uo1/app/ogg/dirdat/nd, EXTRACT DPEND
RMTTRAIL added.
GGSCI(node1) 4> edit params dpend
(paste the following configuration)
EXTRACT dpend
SETENV(NLS_LANG = AMERICAN_AMERICA.UTF8)
USERID ogg@RAC, PASSWORD oracle
PASSTHRU
RMTHOST 10.123.112.235, MGRPORT 7839, COMPRESS
RMTTRAIL /uo1/app/ogg/dirdat/nd
TABLE scott.dept;
TABLE scott.emp;
3.7 Configuring the Target Database
Create the GoldenGate database user on the target database and grant it privileges:
create user ogg identified by oracle default tablespace USERS;
grant connect,resource,unlimited tablespace to ogg;
grant execute on utl_file to ogg;
grant select any dictionary,select any table to ogg;
grant alter any table to ogg;
grant flashback any table to ogg;
grant execute on DBMS_FLASHBACK to ogg;
grant insert any table to ogg;
grant delete any table to ogg;
grant update any table to ogg;
Add the checkpoint table
GGSCI(sun.linux) 2> edit params GLOBALS
Then enter the following in the parameter file:
GGSCHEMA ogg
CHECKPOINTTABLE ogg.checkpoint
GGSCI(sun.linux) 4> dblogin userid ogg,password oracle
Successfully logged into database.
GGSCI(sun.linux) 5> add checkpointtable ogg.checkpoint
Successfully created checkpoint table ogg.checkpoint.
3.8 Configuring the Target Process Groups
Configure the MGR parameters
GGSCI(sun.linux) 6> edit params mgr
(paste the following configuration)
PORT 7839
DYNAMICPORTLIST 7840-7939
--AUTOSTART ER *
AUTORESTART EXTRACT *, RETRIES 5, WAITMINUTES 3
PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 3
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45
Configure the replicat process
GGSCI(sun.linux) 8> add replicat repnd,exttrail /uo1/app/ogg/dirdat/nd,checkpointtable ogg.checkpoint
REPLICAT added.
GGSCI(sun.linux) 10> edit params repnd
(paste the following configuration)
REPLICAT repnd
SETENV(NLS_LANG = AMERICAN_AMERICA.UTF8)
USERID ogg, PASSWORD oracle
ASSUMETARGETDEFS
REPERROR DEFAULT, DISCARD
DISCARDFILE ./dirrpt/repnd.dsc, APPEND, MEGABYTES 50
MAP scott.*, TARGET pmsbi.*;
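3.9 Starting the Processes to Synchronize Data
Start the source process groups
Start the extract process and the data pump process:
start extnd
start dpend
After starting them, check the process status with info all. The status should be RUNNING, as shown below: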
GGSCI(node1) 19> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING DPEND 00:00:00 00:00:09
EXTRACT RUNNING EXTND 00:00:00 00:00:04
Start the target process
start repnd
The output looks like this:
GGSCI(sun.linux) 2> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT RUNNING REPND 00:00:00 00:00:03
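The info all output lends itself to a simple scripted health check. The following is a minimal sketch that reads such output on standard input and lists every process whose status is not RUNNING. The function name is illustrative, the column layout is assumed to match the output above, and the sample input (including the ABENDED line) is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: read `info all` output on stdin and print every
# MANAGER, EXTRACT or REPLICAT entry whose status is not RUNNING.
not_running() {
    awk '$1 ~ /^(MANAGER|EXTRACT|REPLICAT)$/ && $2 != "RUNNING" {
        if ($3 != "") print $1, $3, $2; else print $1, $2
    }'
}

# Sample input modelled on the output above, with one process abended.
not_running <<'EOF'
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT ABENDED DPEND 00:00:00 00:00:09
EXTRACT RUNNING EXTND 00:00:00 00:00:04
EOF
```

On a live system the input could come from piping the info all command into ggsci; empty output then means that every listed process is running.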
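This completes the installation and configuration of OGG replication from RAC to a single instance; data synchronization can now be tested.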
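4. HA Configuration for RAC to Single-Instance Replication
The RAC to single-instance configuration in section 3 only sets up data replication; it does not provide high availability. If the node running GoldenGate fails, replication stops. There are two ways to keep replication available: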
4.1 Handling Node Failure Manually
Because GoldenGate is installed in a shared directory, any node can access that directory and start the GoldenGate console. If one node fails and the GoldenGate processes stop, simply start the process groups manually on another node.
4.2 GoldenGate HA Configuration
GoldenGate can be managed as a CRS resource group and reached through a VIP. If a database node goes down, Oracle Clusterware automatically switches the resources to another available node.
Add an application VIP resource
Create a profile for the GoldenGate VIP resource:
[oracle@node1 ggate]$ cd $ORA_CRS_HOME/bin
[oracle@node1 bin]$ pwd
/u01/app/oracle/product/10.2.0/crs_1/bin
[oracle@node1 bin]$ crs_profile -create ggvip -t application \
-a /u01/app/oracle/product/10.2.0/crs_1 \
-o oi=eth0,ov=192.168.73.203,on=255.255.255.0
Here ggvip is the name of the application VIP being created.
Register the resource with CRS:
[oracle@node1 bin]$ crs_register ggvip
Give ownership of the VIP to root. Run as root:
[root@node1 bin]# ./crs_setperm ggvip -o root
Grant the oracle user permission to start the resource:
[root@node1 bin]# ./crs_setperm ggvip -u user:oracle:r-x
Start the resource as the oracle user:
[oracle@node1 bin]$ crs_start ggvip
Attempting to start `ggvip` on member `node1`
Start of `ggvip` on member `node1` succeeded.
Check the resource status:
[oracle@node1 bin]$ crs_stat ggvip -t
Name Type Target State Host
------------------------------------------------------------
ggvip application ONLINE ONLINE node1
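Create an action program
The action program is placed on the shared disk here. It must accept at least three arguments: start, stop, and check.
start and stop: return 0 on success, 1 on failure;
check: returns 0 if GoldenGate is running, 1 if it is not;
The sample program gg_action.scr is shown below: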
#!/bin/sh
#set the Oracle Goldengate installation directory
export GGS_HOME=/ggate
#set the oracle home to the database to ensure GoldenGate will get the
#right environment settings to be able to connect to the database
export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
#specify delay after start before checking for successful start
start_delay_secs=5
#Include the GoldenGate home in the library path to start GGSCI
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:${GGS_HOME}:${LD_LIBRARY_PATH}
#check_process validates that a manager process is running at the PID
#that GoldenGate specifies.
check_process() {
  if ( [ -f "${GGS_HOME}/dirpcs/MGR.pcm" ] )
  then
    pid=`cut -f8 "${GGS_HOME}/dirpcs/MGR.pcm"`
    if [ ${pid} = `ps -e |grep ${pid} |grep mgr|cut -d " " -f2` ]
    then
      #manager process is running on the PID exit success
      exit 0
    else
      if [ ${pid} = `ps -e |grep ${pid} |grep mgr|cut -d " " -f1` ]
      then
        #manager process is running on the PID exit success
        exit 0
      else
        #manager process is not running on the PID
        exit 1
      fi
    fi
  else
    #manager is not running because there is no PID file
    exit 1
  fi
}
#call_ggsci is a generic routine that executes a ggsci command
call_ggsci () {
  ggsci_command=$1
  ggsci_output=`${GGS_HOME}/ggsci << EOF
${ggsci_command}
exit
EOF`
}
case $1 in
'start')
  #start manager
  call_ggsci 'start manager'
  #there is a small delay between issuing the start manager command
  #and the process being spawned on the OS. wait before checking
  sleep ${start_delay_secs}
  #check whether manager is running and exit accordingly
  check_process
  ;;
'stop')
  #attempt a clean stop for all non-manager processes
  #call_ggsci 'stop er *'
  #ensure everything is stopped
  call_ggsci 'stop er *!'
  #call_ggsci 'kill er *'
  #stop manager without (y/n) confirmation
  call_ggsci 'stop manager!'
  #exit success
  exit 0
  ;;
'check')
  check_process
  ;;
'clean')
  #attempt a clean stop for all non-manager processes
  #call_ggsci 'stop er *'
  #ensure everything is stopped
  #call_ggsci 'stop er *!'
  #in case there are lingering processes
  call_ggsci 'kill er *'
  #stop manager without (y/n) confirmation
  call_ggsci 'stop manager!'
  #exit success
  exit 0
  ;;
'abort')
  #ensure everything is stopped
  call_ggsci 'stop er *!'
  #in case there are lingering processes
  call_ggsci 'kill er *'
  #stop manager without (y/n) confirmation
  call_ggsci 'stop manager!'
  #exit success
  exit 0
  ;;
esac
Add an application profile
[oracle@node1 ggate]$ cd $ORA_CRS_HOME/bin
[oracle@node1 bin]$ pwd
/u01/app/oracle/product/10.2.0/crs_1/bin
[oracle@node1 bin]$ crs_profile -create GG_app -t application \
-r ggvip -a /ggate/gg_action.scr -o ci=10
Where: -r ggvip means that ggvip must be running before GoldenGate starts,
-a /ggate/gg_action.scr gives the location of the action script, which must be available on every node,
-o ci=10 sets the check interval to 10 seconds.
Register the resource with CRS:
[oracle@node1 bin]$ crs_register GG_app
Set the owner of GG_app to oracle, the owner of the GoldenGate installation. Run as root:
[root@node1 bin]# ./crs_setperm GG_app -o oracle
Grant the oracle user permission to start the resource:
[root@node1 bin]# ./crs_setperm GG_app -u user:oracle:r-x
Start the resource as the oracle user:
[oracle@node1 bin]$ crs_start GG_app
Attempting to start `GG_app` on member `node1`
Start of `GG_app` on member `node1` succeeded.
Check the resource status:
[oracle@node1 bin]$ crs_stat GG_app -t
Name Type Target State Host
------------------------------------------------------------
GG_app application ONLINE ONLINE node1
Test node relocation
In a test environment, crs_relocate -f GG_app forces the resource to move to another node. The process looks like this:
[oracle@node1 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
GG_app application ONLINE ONLINE node1
ggvip application ONLINE ONLINE node1
ora....AC1.srv application ONLINE ONLINE node1
ora....AC2.srv application ONLINE ONLINE node2
ora.RAC.RAC.cs application ONLINE ONLINE node2
ora....C1.inst application ONLINE ONLINE node1
ora....C2.inst application ONLINE ONLINE node2
ora.RAC.db application ONLINE ONLINE node1
ora....E1.lsnr application ONLINE ONLINE node1
ora.node1.gsd application ONLINE ONLINE node1
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip application ONLINE ONLINE node1
ora....E2.lsnr application ONLINE ONLINE node2
ora.node2.gsd application ONLINE ONLINE node2
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip application ONLINE ONLINE node2
[oracle@node1 ~]$ crs_relocate -f GG_app
Attempting to stop `GG_app` on member `node1`
Stop of `GG_app` on member `node1` succeeded.
Attempting to stop `ggvip` on member `node1`
Stop of `ggvip` on member `node1` succeeded.
Attempting to start `ggvip` on member `node2`
Start of `ggvip` on member `node2` succeeded.
Attempting to start `GG_app` on member `node2`
Start of `GG_app` on member `node2` succeeded.
[oracle@node1 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
GG_app application ONLINE ONLINE node2
ggvip application ONLINE ONLINE node2
ora....AC1.srv application ONLINE ONLINE node1
ora....AC2.srv application ONLINE ONLINE node2
ora.RAC.RAC.cs application ONLINE ONLINE node2
ora....C1.inst application ONLINE ONLINE node1
ora....C2.inst application ONLINE ONLINE node2
ora.RAC.db application ONLINE ONLINE node1
ora....E1.lsnr application ONLINE ONLINE node1
ora.node1.gsd application ONLINE ONLINE node1
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip application ONLINE ONLINE node1
ora....E2.lsnr application ONLINE ONLINE node2
ora.node2.gsd application ONLINE ONLINE node2
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip application ONLINE ONLINE node2
GoldenGate has been relocated successfully and now runs on node 2.
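To confirm a relocation from a script rather than by eye, the host column of crs_stat -t can be extracted for a single resource. The following is a minimal sketch, assuming the five-column layout shown above; the function name is illustrative and the sample input is abridged from that output.

```shell
#!/bin/sh
# Hypothetical helper: read `crs_stat -t` output on stdin and print the
# host (fifth column) on which the named resource is running.
resource_host() {
    awk -v res="$1" '$1 == res { print $5 }'
}

resource_host GG_app <<'EOF'
Name Type Target State Host
------------------------------------------------------------
GG_app application ONLINE ONLINE node2
ggvip application ONLINE ONLINE node2
ora.node1.vip application ONLINE ONLINE node1
EOF
```

Comparing the value before and after crs_relocate shows whether the resource actually moved.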
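5. Common Errors and Solutions
5.1 OGG-00446
Starting the source extract process extnd produces the following error in ggserr.log:
2012-08-17 11:11:38 ERROR OGG-00446 Oracle GoldenGate Capture for Oracle, extnd.prm: Could not find archived log for sequence 45835 thread 1 under default destinations SQL , error retrieving redo file name for sequence 45835, archived = 1, use_alternate = 0Not able to establish initial position for begin time 2012-08-15 17:28:28.
Cause: older archived logs have been deleted or moved to backup, so the archived log file cannot be found.
Solution:
Restore the backed-up archived logs to the archive log directory; this resolves the error.
On a test database, the extract process can instead be told to start reading logs from a given point in time, skipping the deleted archived logs: alter extract extnd, begin 2012-08-16 16:38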
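To know which archived log to restore, the thread and sequence numbers can be pulled out of the OGG-00446 message. The following is a minimal sketch that reads ggserr.log lines on standard input; the function name is illustrative, and the sample line is shortened from the error above.

```shell
#!/bin/sh
# Hypothetical helper: read ggserr.log lines on stdin and print
# "thread sequence" for each OGG-00446 missing-archived-log error.
missing_archived_logs() {
    sed -n 's/.*OGG-00446.*Could not find archived log for sequence \([0-9][0-9]*\) thread \([0-9][0-9]*\).*/\2 \1/p'
}

missing_archived_logs <<'EOF'
2012-08-17 11:11:38 ERROR OGG-00446 Oracle GoldenGate Capture for Oracle, extnd.prm: Could not find archived log for sequence 45835 thread 1 under default destinations
EOF
```

Lines that do not contain an OGG-00446 message of this form are ignored, so the whole log file can be fed to the function.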
5.2 OGG-01223
Starting the source data pump process DPEND produces the following error in ggserr.log:
2012-08-17 11:43:50 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, dpend.prm: TCP/IP error 79 (Connection refused).
2012-08-17 11:45:01 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, dpend.prm: TCP/IP error 79 (Connection refused).
Cause: the MGR process on target host 110 was not running.
Solution:
Run start mgr on the target, then start the source data pump process DPEND again. The error disappears and the trail files are transferred.
A normal log looks like this:
2012-08-17 14:31:51 INFO OGG-00993 Oracle GoldenGate Capture for Oracle, dpend.prm: EXTRACT DPEND started.
2012-08-17 14:33:13 INFO OGG-01226 Oracle GoldenGate Capture for Oracle, dpend.prm: Socket buffer size set to 27985 (flush size 27985).
2012-08-17 14:33:26 INFO OGG-01052 Oracle GoldenGate Capture for Oracle, dpend.prm: No recovery is required for target file F:\ogg\dirdat\nd000000, at RBA 0 (file not opened).
2012-08-17 14:33:26 INFO OGG-01478 Oracle GoldenGate Capture for Oracle, dpend.prm: Output file F:\ogg\dirdat\nd is using format RELEASE 11.2.
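5.3 OGG-01224
Starting the source data pump process DPEND produces the following error in ggserr.log:
2012-08-22 05:33:10 ERROR OGG-01224 Oracle GoldenGate Capture for Oracle, dpend.prm: TCP/IP error 113 (No route to host).
2012-08-22 05:33:10 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, dpend.prm: PROCESS ABENDING.
Cause: the firewall on target host 235 was still enabled.
Solution:
Disable the firewall on the target machine (or allow the manager port 7839 and the dynamic ports 7840-7939 through it), then start the source data pump process DPEND again. The error disappears and the trail files are transferred.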
5.4 OGG-01031
Starting the source data pump process DPEND produces the following error in ggserr.log:
2012-08-28 15:09:39 ERROR OGG-01031 Oracle GoldenGate Capture for Oracle, dpend.prm: There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received is Unable to open file "/uo1/app/ogg/dirdat/nd000004" (error 2, No such file or directory)).
2012-08-28 15:09:41 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, dpend.prm: PROCESS ABENDING.
The target ggserr.log shows the following errors:
2012-08-28 15:06:30 WARNING OGG-01223 Oracle GoldenGate Collector for Oracle: Unable to lock file "/uo1/app/ogg/dirdat/nd000004" (error 11, Resource temporarily unavailable). Lock currently held by process id (PID) 13854.
2012-08-28 15:06:30 WARNING OGG-01223 Oracle GoldenGate Collector for Oracle: Unable to open file "/uo1/app/ogg/dirdat/nd000004" (error 2, No such file or directory).
Cause: the network may have failed at some point, so the source Data Pump lost its connection to the target while the server process that the target mgr had started for it kept running. When the data pump restarts, the target mgr tries to spawn another server process, and the two processes then compete for the same trail file.
Solution:
1. Stop all data pumps on the source, then run ps -ef | grep server (or grep for the OGG installation directory) on the target to see whether an OGG server process is still running. If so, kill it (first make sure that every source data pump is stopped and that the process being killed is a server process, not an extract, replicat, or mgr process), then restart the source data pump.
2. The trail file on the target may be damaged. Roll over to a new trail file:
SEND EXTRACT xxx ROLLOVER
or: alter extract xxx, etrollover
where xxx is the name of the data pump
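The first step above can be made less error-prone by listing only processes whose executable is named server, so that mgr, extract, and replicat are never selected. The following is a minimal sketch that reads ps -ef output on standard input. The function name is illustrative, the column layout is assumed to be the usual UID, PID, PPID, C, STIME, TTY, TIME, CMD, and the sample process lines are invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the PID of
# every process whose executable is named "server" (the OGG collector).
collector_pids() {
    awk '$8 ~ /(^|\/)server$/ { print $2 }'
}

# Invented sample: only the first process is a collector.
collector_pids <<'EOF'
UID PID PPID C STIME TTY TIME CMD
oracle 13854 13001 0 15:01 ? 00:00:00 ./server -p 7840
oracle 13001 1 0 14:50 ? 00:00:02 ./mgr PARAMFILE /uo1/app/ogg/dirprm/mgr.prm
oracle 14120 1 0 15:02 ? 00:00:00 /usr/bin/someserver --daemon
EOF
```

Any other program that happens to be called server would match as well, so the listed PIDs should still be checked against the OGG installation directory before anything is killed.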
5.5 OGG-01154
Error message: 2011-03-29 15:53:57 WARNING OGG-01154 Oracle GoldenGate Delivery for Oracle, repya.prm: SQL error 14402 mapping EPMA.D_METER to EPMA.D_METER OCI Error ORA-14402: updating partition key column would cause a partition change (status = 14402), SQL .
Cause: the source updated a partition key column, but row movement is not enabled on the target table, so the update fails.
Solution: SQLPLUS> alter table SCHEMA.TABLENAME enable row movement;