测试环境:
源端SUSE10 32bit oracle10.2.04 ogg 11.2.1.0.1
目标端SUSE11 64bit oracle11.2.02 ogg 11.2.1.0.1
测试环境ogg配置信息:
源端mgr、ext、pump
GGSCI (testa) 13> view params mgr
port 7809
GGSCI (testa) 14> view params ext1
extract ext1
setenv (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)
userid goldengate@stm01, password goldengate
DISCARDFILE ./dirrpt/ext1.dsc , APPEND,MEGABYTES 100
exttrail /ogg/stm01trail/lt
GETTRUNCATES;
obey ./dirsql/transdb_table.txt
GGSCI (testa) 15> view params pump1
extract pump1
setenv (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)
passthru
rmthost 192.168.211.12, mgrport 7809, compress
rmttrail /ogg/transdbtrail/rt
GETTRUNCATES;
table testogg.*;
GGSCI (testa) 16> exit
ogg@testa:~> more ./dirsql/transdb_table.txt
TABLE TESTOGG.TESTA;
TABLE TESTOGG.TESTB;
目标端mgr、replicat
GGSCI (transdb) 3> view params mgr
PORT 7809
DYNAMICPORTLIST 7840-7914
PURGEOLDEXTRACTS /ogg/transdbtrail/*,usecheckpoints, minkeepdays 3
PURGEOLDEXTRACTS /ogg/targetdbtrail/*,usecheckpoints, minkeepdays 3
LAGREPORTHOURS 1
LAGINFOMINUTES 30
GGSCI (transdb) 4> view params rep1
REPLICAT rep1
setenv (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)
USERID goldengate, PASSWORD goldengate
SQLEXEC "ALTER SESSION SET CONSTRAINTS=DEFERRED"
REPORT AT 00:01
REPORTCOUNT EVERY 720 MINUTES, RATE
REPERROR DEFAULT, ABEND
assumetargetdefs
DISCARDFILE ./dirrpt/rep1.dsc, APPEND, MEGABYTES 100
DISCARDROLLOVER AT 02:30
GETTRUNCATES
ALLOWNOOPUPDATES
MAP TESTOGG.*, TARGET TESTOGG.*;
源端和目标端建立的测试表语句:
create table testb (a int not null,b int not null,c int);
create unique index idx01 on testb (b);
create index idx on testb (a);
源端进行add trandata操作语句:
GGSCI (testa) 1> dblogin userid goldengate, password goldengate
Successfully logged into database.
GGSCI (testa) 2> add trandata testogg.testb
Logging of supplemental redo data enabled for table TESTOGG.TESTB.
查看源端数据库testb的补充日志字段
SQL> select * from dba_log_group_columns
2 where log_group_name in (select log_group_name from dba_log_groups where owner='TESTOGG' and table_name='TESTB') order by position;
OWNER LOG_GROUP_NAME TABLE_NAME COLUMN_NAME POSITION LOGGIN
------------------------------ ------------------------------ ------------------------------ ------------------------------ ---------- ------
TESTOGG GGS_11668 TESTB B 1 LOG
可以看到ogg命令行中add trandata操作在oracle数据库中补充日志选择的是仅有的那个非空唯一索引所在列
在源端插入测试数据
SQL> insert into testb values (1,1,1);
SQL> insert into testb values (1,2,3);
SQL> insert into testb values (2,3,3);
SQL> commit;
Commit complete.
在源端更新数据
SQL> update testb set c=2 where c=1;
1 row updated.
SQL> commit;
Commit complete.
注意此时同步过去的值应该是c(改变数据)和b(补充日志记录字段)
去目标端队列文件中查看此sql的队列信息
GGSCI (transdb) 11> info rep1
REPLICAT REP1 Last Started 2012-08-25 13:42 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:05 ago)
Log Read Checkpoint File /ogg/transdbtrail/rt000011
2012-08-25 13:43:48.450714 RBA 2999
查看logdump找到最新的一条记录
ogg@transdb:/ogg/transdbtrail> logdump
Oracle GoldenGate Log File Dump Utility for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.
Logdump 14 >open rt000011
Current LogTrail is /ogg/transdbtrail/rt000011
Logdump 15 >ghdr on
Logdump 16 >detail on
一直n直到没消息显示
Logdump 30 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 27 (x001b) IO Time : 2012/08/25 13:43:29.607.325
IOType : 5 (x05) OrigNode : 255 (xff)
TransInd : . (x00) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 32 AuditPos : 12638736
Continued : N (x00) RecCount : 1 (x01)
2012/08/25 13:43:29.607.325 Insert Len 27 RBA 2607
Name: TESTOGG.TESTB
After Image: Partition 4 G b
0000 0005 0000 0001 3100 0100 0500 0000 0131 0002 | ........1........1..
0005 0000 0001 31 | ......1
Column 0 (x0000), Len 5 (x0005)
Column 1 (x0001), Len 5 (x0005)
Column 2 (x0002), Len 5 (x0005)
Logdump 31 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 27 (x001b) IO Time : 2012/08/25 13:43:29.607.325
IOType : 5 (x05) OrigNode : 255 (xff)
TransInd : . (x02) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 32 AuditPos : 12641808
Continued : N (x00) RecCount : 1 (x01)
2012/08/25 13:43:29.607.325 Insert Len 27 RBA 2748
Name: TESTOGG.TESTB
After Image: Partition 4 G e
0000 0005 0000 0001 3100 0100 0500 0000 0132 0002 | ........1........2..
0005 0000 0001 33 | ......3
Column 0 (x0000), Len 5 (x0005)
Column 1 (x0001), Len 5 (x0005)
Column 2 (x0002), Len 5 (x0005)
Logdump 32 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 18 (x0012) IO Time : 2012/08/25 13:43:48.450.714
IOType : 15 (x0f) OrigNode : 255 (xff)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 32 AuditPos : 12646416
Continued : N (x00) RecCount : 1 (x01)
2012/08/25 13:43:48.450.714 FieldComp Len 18 RBA 2867
Name: TESTOGG.TESTB
After Image: Partition 4 G s
0001 0005 0000 0001 3100 0200 0500 0000 0132 | ........1........2
Column 1 (x0001), Len 5 (x0005)
Column 2 (x0002), Len 5 (x0005)
Logdump 33 >n
Logdump 34 >n
Logdump 35 >n
Logdump 36 >
可以发现刚同步过来的update的rba号为2867,ogg操作类型为FieldComp所取得字段序号是1和2(注意是从0开始算),对应testb表字段则为字段b和字段c
以上状态可以视为此次27索引变更前的状态
此时源端表和目标端表数据:
SQL> select * from testb;
A B C
---------- ---------- ----------
2 3 3
1 1 2
1 2 3
接下来进行索引修改操作,模拟6个步骤
第1步:停止ogg队列,删除普通索引
源端
GGSCI (testa) 8> stop *
Sending STOP request to EXTRACT EXT1 ...
Request processed.
Sending STOP request to EXTRACT PUMP1 ...
Request processed.
GGSCI (testa) 9> inf oall
ERROR: Invalid command.
GGSCI (testa) 10> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED EXT1 00:00:00 00:00:02
EXTRACT STOPPED PUMP1 00:00:00 00:00:02
目标
GGSCI (transdb) 8> stop *
EXTRACT EXT2 is already stopped.
EXTRACT PUMP2 is already stopped.
EXTRACT PUMP3 is already stopped.
Sending STOP request to REPLICAT REP1 ...
Request processed.
GGSCI (transdb) 9> inf oall
ERROR: Invalid command.
GGSCI (transdb) 10> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED EXT2 00:00:00 38:13:20
EXTRACT STOPPED PUMP2 00:00:00 38:13:20
EXTRACT STOPPED PUMP3 00:00:00 37:52:01
REPLICAT STOPPED REP1 00:00:00 00:00:03
在两边删除索引:
SQL> show user
USER is "TESTOGG"
SQL> drop index idx;
Index dropped.
第2步:添加唯一索引
发现报错
SQL> create unique index idx on testb (a);
create unique index idx on testb (a)
*
ERROR at line 1:
ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found
第3步:启动ogg队列,删除重复数据(注意此步操作,原B字段的trandata依旧有效),停止ogg队列
源端:
GGSCI (testa) 12> start *
Sending START request to MANAGER ...
EXTRACT EXT1 starting
Sending START request to MANAGER ...
EXTRACT PUMP1 starting
GGSCI (testa) 13> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXT1 01:11:26 00:00:00
EXTRACT RUNNING PUMP1 00:00:00 01:11:22
目标端:
GGSCI (transdb) 12> start rep1
Sending START request to MANAGER ...
info allREPLICAT REP1 starting
GGSCI (transdb) 15> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED EXT2 00:00:00 39:24:43
EXTRACT STOPPED PUMP2 00:00:00 39:24:43
EXTRACT STOPPED PUMP3 00:00:00 39:03:24
REPLICAT RUNNING REP1 00:00:00 01:11:26
源端数据库中进行数据删除
删除重复数据1,2,3行
SQL> delete from testb where a=1 and b=2;
1 row deleted.
SQL> commit;
Commit complete.
SQL>
SQL> select * from testb;
A B C
---------- ---------- ----------
2 3 3
1 1 2
查看目标端数据
SQL> show user
SQL> select * from testb;
A B C
---------- ---------- ----------
2 3 3
1 1 2
SQL>
查看目标端最新的队列文件中logdump日志:
Logdump 41 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x00)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 0 (x0000) IO Time : 2012/08/25 15:52:58.692.384
IOType : 151 (x97) OrigNode : 0 (x00)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 0 AuditPos : 0
Continued : N (x00) RecCount : 0 (x00)
2012/08/25 15:52:58.692.384 RestartOK Len 0 RBA 1437
Name:
After Image: Partition 0 G s
Logdump 42 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: B (x42)
RecLength : 9 (x0009) IO Time : 2012/08/25 15:56:04.480.252
IOType : 3 (x03) OrigNode : 255 (xff)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 32 AuditPos : 15728656
Continued : N (x00) RecCount : 1 (x01)
2012/08/25 15:56:04.480.252 Delete Len 9 RBA 1496
Name: TESTOGG.TESTB
Before Image: Partition 4 G s
0001 0005 0000 0001 32 | ........2
Column 1 (x0001), Len 5 (x0005)
Logdump 43 >n
Logdump 44 >n
Logdump 45 >n
Logdump 46 >n
Logdump 47 >n
Logdump 48 >n
Logdump 49 >
字段序号1表示字段b,即补充日志记录字段。ogg操作类型属于delete,故自动选用b字段作唯一行标识。这步验证,证明了截止到这里,源端trandata队列不因a字段上索引的删除而失效,反应到生产上是索引删除后进行重复数据的过程和结果都是正常并且未产生任何影响。
停止ogg队列进程
源端
GGSCI (testa) 19> stop *
Sending STOP request to EXTRACT EXT1 ...
Request processed.
Sending STOP request to EXTRACT PUMP1 ...
Request processed.
GGSCI (testa) 20> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED EXT1 00:00:00 00:00:01
EXTRACT STOPPED PUMP1 00:00:00 00:00:01
目标端:
GGSCI (transdb) 19> stop *
EXTRACT EXT2 is already stopped.
EXTRACT PUMP2 is already stopped.
EXTRACT PUMP3 is already stopped.
Sending STOP request to REPLICAT REP1 ...
Request processed.
GGSCI (transdb) 20> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED EXT2 00:00:00 39:44:34
EXTRACT STOPPED PUMP2 00:00:00 39:44:34
EXTRACT STOPPED PUMP3 00:00:00 39:23:14
REPLICAT STOPPED REP1 00:00:00 00:00:01
第4步:再次添加唯一索引
在源端和目标端中添加唯一索引
SQL> create unique index idx on testb (a);
Index created.
建立完毕后,检查数据库testb表的补充日志字段是否有变化
SQL> select * from dba_log_group_columns
2 where log_group_name in (select log_group_name from dba_log_groups where owner='TESTOGG' and table_name='TESTB') order by position;
OWNER LOG_GROUP_NAME TABLE_NAME COLUMN_NAME POSITION LOGGIN
------------------------------ ------------------------------ ------------------------------ ------------------------------ ---------- ------
TESTOGG GGS_11668 TESTB B 1 LOG
第5步:添加trandata报补充日志已存在信息
生产上发生故障的情况是在ogg命令行中,通过add trandata进行添加runsvr.espostrn的补充日志,但返回已存在信息。
这步测试,我们先不做add trandata步骤,直接启动ogg进程,观察队列情况
源端:
GGSCI (testa) 21> start *
Sending START request to MANAGER ...
EXTRACT EXT1 starting
Sending START request to MANAGER ...
EXTRACT PUMP1 starting
GGSCI (testa) 22> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED EXT1 00:00:00 00:06:12
EXTRACT STOPPED PUMP1 00:00:00 00:06:12
GGSCI (testa) 23> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXT1 00:06:19 00:00:00
EXTRACT RUNNING PUMP1 00:00:00 00:06:14
目标端:
GGSCI (transdb) 22> start rep1
Sending START request to MANAGER ...
info allREPLICAT REP1 starting
GGSCI (transdb) 23>
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED EXT2 00:00:00 39:50:39
EXTRACT STOPPED PUMP2 00:00:00 39:50:38
EXTRACT STOPPED PUMP3 00:00:00 39:29:19
REPLICAT RUNNING REP1 00:00:00 00:00:00
队列状态显示running正常,发起一条测试数据,由于目前trandata的附加日志是b字段,但表中非空唯一索引已经有2个,所以选用c字段做条件进行c字段值的更改。
源端:
SQL>
SQL> update testb set c=1 where c=2;
1 row updated.
SQL> commit;
Commit complete.
SQL> select * from testb;
A B C
---------- ---------- ----------
2 3 3
1 1 1
目标端:
SQL> select * from testb;
A B C
---------- ---------- ----------
2 3 3
1 1 2
查看目标端replicat进程
GGSCI (transdb) 27> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED EXT2 00:00:00 39:57:52
EXTRACT STOPPED PUMP2 00:00:00 39:57:52
EXTRACT STOPPED PUMP3 00:00:00 39:36:33
REPLICAT ABENDED REP1 00:00:03 00:02:20
已经停止,查看具体信息
2012-08-25 16:24:03 WARNING OGG-01004 Aborted grouped transaction on 'TESTOGG.TESTB', Database error 1403 (OCI Error ORA-01403: no data found, SQL <UPDATE "TESTOGG"."
TESTB" SET "B" = :a1,"C" = :a2 WHERE "A" = :b0>).
2012-08-25 16:24:03 WARNING OGG-01003 Repositioning to rba 1525 in seqno 13.
2012-08-25 16:24:03 WARNING OGG-01154 SQL error 1403 mapping TESTOGG.TESTB to TESTOGG.TESTB OCI Error ORA-01403: no data found, SQL <UPDATE "TESTOGG"."TESTB" SET "B"
= :a1,"C" = :a2 WHERE "A" = :b0>.
2012-08-25 16:24:03 WARNING OGG-01003 Repositioning to rba 1525 in seqno 13.
Source Context :
SourceModule : [er.errors]
SourceID : [/scratch/aime1/adestore/views/aime1_adc4150256/oggcore/OpenSys/src/app/er/errors.cpp]
SourceFunction : [take_rep_err_action]
SourceLine : [623]
ThreadBacktrace : [8] elements
: [/ogg/software/libgglog.so(CMessageContext::AddThreadContext()+0x1e) [0x7fa06dea606e]]
: [/ogg/software/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x2cc) [0x7fa06dea244c]]
: [/ogg/software/libgglog.so(_MSG_ERR_MAP_TO_TANDEM_FAILED(CSourceContext*, ggs::gglib::ggapp::CQualDBObjName<(DBObjType)1> const&, ggs::gglib
::ggapp::CQualDBObjName<(DBObjType)1> const&, CMessageFactory::MessageDisposition)+0x53) [0x7fa06de9af19]]
: [/ogg/software/replicat(take_rep_err_action(short, int, char const*, extr_ptr_def*, __std_rec_hdr*, char*, file_def*, bool)+0xdac) [0x51daa0
]]
: [/ogg/software/replicat(process_extract_loop()+0x2240) [0x536ab0]]
: [/ogg/software/replicat(main+0x732) [0x548752]]
: [/lib64/libc.so.6(__libc_start_main+0xe6) [0x7fa06849cbc6]]
: [/ogg/software/replicat(__gxx_personality_v0+0x322) [0x4be48a]]
2012-08-25 16:24:03 ERROR OGG-01296 Error mapping from TESTOGG.TESTB to TESTOGG.TESTB.
报出ogg-01296错误,注意warning中,ogg-01154错误中的语句where条件,不再是补充日志b字段的值,而是选择了刚添加上唯一索引的字段a,继续查看logdump和dsc文件日志
Logdump 36 >open rt000013
Current LogTrail is /ogg/transdbtrail/rt000013
Logdump 37 >ghdr on
Logdump 38 >detail on
Logdump 39 >pos 1525
Reading forward from RBA 1525
Logdump 40 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 26 (x001a) IO Time : 2012/08/25 16:23:59.822.700
IOType : 15 (x0f) OrigNode : 255 (xff)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 32 AuditPos : 16718864
Continued : N (x00) RecCount : 1 (x01)
2012/08/25 16:23:59.822.700 FieldComp Len 26 RBA 1525
Name: TESTOGG.TESTB
After Image: Partition 4 G s
0000 0004 ffff 0000 0001 0005 0000 0001 3100 0200 | ................1...
0500 0000 0131 | .....1
Column 0 (x0000), Len 4 (x0004)
Column 1 (x0001), Len 5 (x0005)
Column 2 (x0002), Len 5 (x0005)
Logdump 41 >n
Logdump 42 >n
Logdump 43 >n
Logdump 44 >
显示三个字段值均被抽取,注意FieldComp,这操作应该取出的是修改值和补充日志字段,但注意a字段也在其中,继续查看dsc文件内错误日志
Oracle GoldenGate Delivery for Oracle process started, group REP1 discard file opened: 2012-08-25 16:19:10
Current time: 2012-08-25 16:24:03
Discarded record from action ABEND on error 1403
OCI Error ORA-01403: no data found, SQL <UPDATE "TESTOGG"."TESTB" SET "B" = :a1,"C" = :a2 WHERE "A" = :b0>
Aborting transaction on /ogg/transdbtrail/rt beginning at seqno 13 rba 1525
error at seqno 13 rba 1525
Problem replicating TESTOGG.TESTB to TESTOGG.TESTB
Record not found
Mapping problem with compressed update record (target format)...
*
A =
B = 1
C = 1
*
Process Abending : 2012-08-25 16:24:03
根据dsc日志文件内容,可以很明显的看到B和C的值,其中C=1为变化后的值,B=1为补充日志字段唯一标示行的值。但此时出现了A= null的现象,显然这不应该出现在这里。这与生产报错日志情况一致,均是新添加的非空唯一索引所在的字段,并且无法从源端日志中抽取出值,因为源端补充日志字段是B而不是A
接着,我们修复队列,模拟生产的操作,将表删除空间,使环境回到步骤4结束时的状态(这步详细步骤略),之后开始模拟生产操作add trandata
第4部结束后,源端和目标端testb表数据一致,a字段和b字段均有非空约束且有唯一索引
SQL> select * from testb;
A B C
---------- ---------- ----------
2 3 3
1 1 2
在源端ogg控制台上进行add trandata操作
GGSCI (testa) 46> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED EXT1 00:00:00 00:00:09
EXTRACT STOPPED PUMP1 00:00:00 00:00:09
GGSCI (testa) 47> dblogin userid goldengate,password goldengate
Successfully logged into database.
GGSCI (testa) 48> add trandata testogg.testb
Logging of supplemental redo log data is already enabled for table TESTOGG.TESTB.
接着进行模拟生产第6步操作:启动ogg队列
源端
GGSCI (testa) 50> start *
Sending START request to MANAGER ...
EXTRACT EXT1 starting
Sending START request to MANAGER ...
EXTRACT PUMP1 starting
GGSCI (testa) 58> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXT1 00:00:00 00:00:04
EXTRACT RUNNING PUMP1 00:00:00 00:00:07
状态看似正常
目标端:
GGSCI (transdb) 15> start rep1
Sending START request to MANAGER ...
inREPLICAT REP1 starting
GGSCI (transdb) 19> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED EXT2 00:00:00 40:25:27
EXTRACT STOPPED PUMP2 00:00:00 40:25:27
EXTRACT STOPPED PUMP3 00:00:00 40:04:08
REPLICAT RUNNING REP1 00:00:00 00:00:06
进程状态显示也是正常
但此时,此条链路已经存在隐患
接着做update testb set c=1 where c=2;操作
源端:
SQL> update testb set c=1 where c=2;
1 row updated.
SQL> commit;
Commit complete.
SQL> select * from testb;
A B C
---------- ---------- ----------
2 3 3
1 1 1
目标端:
SQL> /
A B C
---------- ---------- ----------
2 3 3
1 1 2
值未变化,查看ogg队列
GGSCI (transdb) 20> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED EXT2 00:00:00 40:27:17
EXTRACT STOPPED PUMP2 00:00:00 40:27:17
EXTRACT STOPPED PUMP3 00:00:00 40:05:58
REPLICAT ABENDED REP1 00:00:04 00:00:33
目标端replicat进程再次ABENDED,继续查看错误信息
2012-08-25 16:55:16 WARNING OGG-01004 Aborted grouped transaction on 'TESTOGG.TESTB', Database error 1403 (OCI Error ORA-01403: no data found, SQL <UPDATE "TESTOGG"."
TESTB" SET "B" = :a1,"C" = :a2 WHERE "A" = :b0>).
2012-08-25 16:55:16 WARNING OGG-01003 Repositioning to rba 1612 in seqno 16.
2012-08-25 16:55:16 WARNING OGG-01154 SQL error 1403 mapping TESTOGG.TESTB to TESTOGG.TESTB OCI Error ORA-01403: no data found, SQL <UPDATE "TESTOGG"."TESTB" SET "B"
= :a1,"C" = :a2 WHERE "A" = :b0>.
2012-08-25 16:55:16 WARNING OGG-01003 Repositioning to rba 1612 in seqno 16.
Source Context :
SourceModule : [er.errors]
SourceID : [/scratch/aime1/adestore/views/aime1_adc4150256/oggcore/OpenSys/src/app/er/errors.cpp]
SourceFunction : [take_rep_err_action]
SourceLine : [623]
ThreadBacktrace : [8] elements
: [/ogg/software/libgglog.so(CMessageContext::AddThreadContext()+0x1e) [0x7f74e094e06e]]
: [/ogg/software/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x2cc) [0x7f74e094a44c]]
: [/ogg/software/libgglog.so(_MSG_ERR_MAP_TO_TANDEM_FAILED(CSourceContext*, ggs::gglib::ggapp::CQualDBObjName<(DBObjType)1> const&, ggs::gglib
::ggapp::CQualDBObjName<(DBObjType)1> const&, CMessageFactory::MessageDisposition)+0x53) [0x7f74e0942f19]]
: [/ogg/software/replicat(take_rep_err_action(short, int, char const*, extr_ptr_def*, __std_rec_hdr*, char*, file_def*, bool)+0xdac) [0x51daa0
]]
: [/ogg/software/replicat(process_extract_loop()+0x2240) [0x536ab0]]
: [/ogg/software/replicat(main+0x732) [0x548752]]
: [/lib64/libc.so.6(__libc_start_main+0xe6) [0x7f74daf44bc6]]
: [/ogg/software/replicat(__gxx_personality_v0+0x322) [0x4be48a]]
2012-08-25 16:55:16 ERROR OGG-01296 Error mapping from TESTOGG.TESTB to TESTOGG.TESTB.
与之前错误信息一致,查看logdump和dsc
ogg@transdb:/ogg/transdbtrail> logdump
Oracle GoldenGate Log File Dump Utility for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.
Logdump 44 >open rt000016
Current LogTrail is /ogg/transdbtrail/rt000016
Logdump 45 >ghdr on
Logdump 46 >detail on
Logdump 47 >pos 1612
Reading forward from RBA 1612
Logdump 48 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 26 (x001a) IO Time : 2012/08/25 16:55:12.068.279
IOType : 15 (x0f) OrigNode : 255 (xff)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 32 AuditPos : 17156624
Continued : N (x00) RecCount : 1 (x01)
2012/08/25 16:55:12.068.279 FieldComp Len 26 RBA 1612
Name: TESTOGG.TESTB
After Image: Partition 4 G s
0000 0004 ffff 0000 0001 0005 0000 0001 3100 0200 | ................1...
0500 0000 0131 | .....1
Column 0 (x0000), Len 4 (x0004)
Column 1 (x0001), Len 5 (x0005)
Column 2 (x0002), Len 5 (x0005)
Logdump 49 >n
Logdump 50 >n
Logdump 51 >n
Logdump 52 >
查看dsc错误日志
Oracle GoldenGate Delivery for Oracle process started, group REP1 discard file opened: 2012-08-25 16:52:50
Current time: 2012-08-25 16:55:16
Discarded record from action ABEND on error 1403
OCI Error ORA-01403: no data found, SQL <UPDATE "TESTOGG"."TESTB" SET "B" = :a1,"C" = :a2 WHERE "A" = :b0>
Aborting transaction on /ogg/transdbtrail/rt beginning at seqno 16 rba 1612
error at seqno 16 rba 1612
Problem replicating TESTOGG.TESTB to TESTOGG.TESTB
Record not found
Mapping problem with compressed update record (target format)...
*
A =
B = 1
C = 1
*
Process Abending : 2012-08-25 16:55:16
问题现象同之前一致。
这验证了两个问题:
1.引起故障的操作是添加了唯一索引并且未删除重新添加trandata导致。导致这个问题的原因是ogg捕获进程的选择唯一标识行的列(即所谓的添加trandata操作)与oracle的补充日志所在列不一致,导致ogg捕获进程选择列从oracle日志文件中取出的a字段值为null,因为oracle的补充日志根本不是在a字段上,而是b字段上。
2.该故障不是对已有trandata日志进行重复add trandata操作引起。
问题深究:
为什么ogg捕获进程的选择唯一标识行的列(即所谓的添加trandata操作)与oracle的补充日志所在列不一致?
这就是之前所说的:
当有两个非空唯一索引存在时,ogg命令行中进行add trandata操作会发生什么情况呢?根据oracle官方资料,oracle会根据两个非空唯一索引名进行判断选择其中一个作为补充日志记录依据。可自行测试验证。
由此,可以推断出,ogg命令在进行add trandata操作的时候,实际是根据ogg的选择习惯去在数据库中添加trandata,在启动捕获进程后,捕获进程根据ogg选择唯一标识行的列的习惯,去进行数据捕获,由此产生了ogg和oracle补充日志两边不一致的情况。
继续展开,说道ogg的选择列习惯,此次发生的问题是由于runsvr.espostrn表由原先一个非空唯一索引变为两个非空唯一索引,ogg依据索引名选择其中一个添加trandata,接下来做一个测试,验证是如何依据名称来判断:
使用刚才的testb表,在源端进行不同名称索引删除重建,并且每次重建后,删除trandata并重新添加的操作
情况一:索引名为IDX和IDX01的情况
GGSCI (testa) 62> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED EXT1 00:00:00 00:00:12
EXTRACT STOPPED PUMP1 00:00:00 00:00:11
GGSCI (testa) 63> dblogin userid goldengate,password goldengate
Successfully logged into database.
GGSCI (testa) 64>
GGSCI (testa) 64> delete trandata testogg.testb
Logging of supplemental redo log data disabled for table TESTOGG.TESTB.
查看数据库中索引和表信息
SQL> select a.index_name,b.column_name,a.table_name,a.uniqueness,a.status from dba_indexes a,dba_ind_columns b where a.owner='TESTOGG' and a.table_name='TESTB' and a.index_name=b.index_name;
INDEX_NAME COLUMN_NAME TABLE_NAME UNIQUENES STATUS
------------------------------ ------------------------------ ------------------------------ --------- --------
IDX01 B TESTB UNIQUE VALID
IDX A TESTB UNIQUE VALID
SQL> desc testb
Name Null? Type
----------------------------------------- -------- ----------------------------
A NOT NULL NUMBER(38)
B NOT NULL NUMBER(38)
C NUMBER(38)
在ogg命令行中进行add trandata操作
GGSCI (testa) 65> add trandata testogg.testb
Logging of supplemental redo data enabled for table TESTOGG.TESTB.
查看数据库中补充日志记录字段
SQL> select * from dba_log_group_columns where owner='TESTOGG' and table_name='TESTB';
OWNER LOG_GROUP_NAME TABLE_NAME COLUMN_NAME POSITION LOGGIN
------------------------------ ------------------------------ ------------------------------ ------------------------------ ---------- ------
TESTOGG GGS_11668 TESTB A 1 LOG
现象,ogg选择了IDX索引名称的列作为oracle补充日志列
情况二:索引名为IDX01和IDX02的情况
GGSCI (testa) 66> delete trandata testogg.testb
Logging of supplemental redo log data disabled for table TESTOGG.TESTB.
重建IDX,改名为IDX02
SQL> drop index idx;
Index dropped.
SQL> create unique index idx02 on testb (a);
Index created.
SQL>
SQL>
SQL> select a.index_name,b.column_name,a.table_name,a.uniqueness,a.status from dba_indexes a,dba_ind_columns b where a.owner='TESTOGG' and a.table_name='TESTB' and a.index_name=b.index_name;
INDEX_NAME COLUMN_NAME TABLE_NAME UNIQUENES STATUS
------------------------------ ------------------------------ ------------------------------ --------- --------
IDX01 B TESTB UNIQUE VALID
IDX02 A TESTB UNIQUE VALID
GGSCI (testa) 67> add trandata testogg.testb
Logging of supplemental redo data enabled for table TESTOGG.TESTB.
SQL> select * from dba_log_group_columns where owner='TESTOGG' and table_name='TESTB';
OWNER LOG_GROUP_NAME TABLE_NAME COLUMN_NAME POSITION LOGGIN
------------------------------ ------------------------------ ------------------------------ ------------------------------ ---------- ------
TESTOGG GGS_11668 TESTB B 1 LOG
SQL>
现象,ogg选择了IDX01索引名称的列作为oracle补充日志列
情况三:索引名为PDX和IDX的情况
GGSCI (testa) 68> delete trandata testogg.testb
Logging of supplemental redo log data disabled for table TESTOGG.TESTB.
重建索引
SQL> drop index idx01;
Index dropped.
SQL> drop index idx02;
Index dropped.
SQL> create unique index pdx on testb (b);
Index created.
SQL> create unique index idx on testb (a);
Index created.
SQL> select a.index_name,b.column_name,a.table_name,a.uniqueness,a.status from dba_indexes a,dba_ind_columns b where a.owner='TESTOGG' and a.table_name='TESTB' and a.index_name=b.index_name;
INDEX_NAME COLUMN_NAME TABLE_NAME UNIQUENES STATUS
------------------------------ ------------------------------ ------------------------------ --------- --------
PDX B TESTB UNIQUE VALID
IDX A TESTB UNIQUE VALID
添加trandata
GGSCI (testa) 69> add trandata testogg.testb
Logging of supplemental redo data enabled for table TESTOGG.TESTB.
SQL> select * from dba_log_group_columns where owner='TESTOGG' and table_name='TESTB';
OWNER LOG_GROUP_NAME TABLE_NAME COLUMN_NAME POSITION LOGGIN
------------------------------ ------------------------------ ------------------------------ ------------------------------ ---------- ------
TESTOGG GGS_11668 TESTB A 1 LOG
现象,ogg选择了IDX索引名称的列作为oracle补充日志列
情况四:索引名称与生产情况的一致,为runsvr.espostrn的两个非空唯一索引名臣PKTRNLOG和IDXTRNLOG0
GGSCI (testa) 70> delete trandata testogg.testb
Logging of supplemental redo log data disabled for table TESTOGG.TESTB.
SQL> drop index idx;
Index dropped.
SQL> drop index pdx;
Index dropped.
SQL> create unique index IDXTRNLOG0 on testb (b);
create unique index PKTRNLOG on testb (a);
Index created.
SQL>
Index created.
SQL> select a.index_name,b.column_name,a.table_name,a.uniqueness,a.status from dba_indexes a,dba_ind_columns b where a.owner='TESTOGG' and a.table_name='TESTB' and a.index_name=b.index_name;
INDEX_NAME COLUMN_NAME TABLE_NAME UNIQUENES STATUS
------------------------------ ------------------------------ ------------------------------ --------- --------
PKTRNLOG A TESTB UNIQUE VALID
IDXTRNLOG0 B TESTB UNIQUE VALID
GGSCI (testa) 71> add trandata testogg.testb
Logging of supplemental redo data enabled for table TESTOGG.TESTB.
SQL> select * from dba_log_group_columns where owner='TESTOGG' and table_name='TESTB';
OWNER LOG_GROUP_NAME TABLE_NAME COLUMN_NAME POSITION LOGGIN
------------------------------ ------------------------------ ------------------------------ ------------------------------ ---------- ------
TESTOGG GGS_11668 TESTB B 1 LOG
现象,ogg选择了IDXTRNLOG0索引名称的列作为oracle补充日志列。这个情况与目前27生产重新添加trandata后,情况一致
由此得出结论,当出现两个非空唯一索引时,ogg依据索引名称进行唯一选择,按字母由小到大、数字由小到大的情况,优先级逐步降低。