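Applies to: Extract (Oracle databases only)
Use the BR parameter to control the Bounded Recovery (BR) feature of Oracle GoldenGate. Bounded Recovery is supported only for Oracle databases.
Bounded Recovery is a component of the general Extract checkpointing facility. It guarantees an efficient recovery after Extract stops for any reason, planned or unplanned. This holds no matter how many open (uncommitted) transactions existed when Extract stopped, and no matter how long those transactions had been open. Bounded Recovery sets an upper bound on the time Extract needs to recover to the point where it stopped and resume normal processing.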
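When Extract encounters the start of a transaction in the redo log (in Oracle, usually the first executable SQL statement), it starts caching in memory all of the data it captures for that transaction. Extract must cache the transaction even if it contains no data at first, because later operations in that transaction might contain data that needs to be captured.
When Extract encounters the commit record of a transaction in the redo log, it writes the entire cached transaction to the trail and clears it from memory. When Extract encounters a rollback record, it discards the entire cached transaction. Until Extract processes a commit or rollback record, it treats the transaction as open (neither committed nor rolled back) and keeps collecting information for it.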
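The caching behaviour described above can be modelled in a few lines. The following is a minimal Python sketch, not GoldenGate code. The class, its method names and the list standing in for the trail file are hypothetical, and so is the choice to write nothing for a committed transaction that captured no data.

```python
from dataclasses import dataclass, field


@dataclass
class TransactionCache:
    """Toy model of how Extract buffers open transactions (hypothetical, not GoldenGate code)."""

    open_txns: dict = field(default_factory=dict)  # txn id -> captured records
    trail: list = field(default_factory=list)      # stands in for the trail file

    def begin(self, txn_id):
        # Cache the transaction even while it has no captured data:
        # later operations in it may still contain data to capture.
        self.open_txns.setdefault(txn_id, [])

    def capture(self, txn_id, record):
        self.open_txns.setdefault(txn_id, []).append(record)

    def commit(self, txn_id):
        # Write the whole transaction to the trail, then clear it from memory.
        records = self.open_txns.pop(txn_id, [])
        if records:  # assumption: nothing is written for a transaction with no captured data
            self.trail.append((txn_id, records))

    def rollback(self, txn_id):
        # Discard the whole cached transaction.
        self.open_txns.pop(txn_id, None)


tx_cache = TransactionCache()
tx_cache.begin("T1")
tx_cache.capture("T1", "INSERT")
tx_cache.begin("T2")
tx_cache.capture("T2", "UPDATE")
tx_cache.begin("T3")  # still open: would need recovery if Extract stopped now
tx_cache.commit("T1")
tx_cache.rollback("T2")
print(tx_cache.trail)            # [('T1', ['INSERT'])]
print(list(tx_cache.open_txns))  # ['T3']
```

Whatever is still in `open_txns` when the process stops is exactly the state that recovery has to rebuild.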
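If Extract stops before it encounters the commit or rollback record of a transaction, all of the information cached in memory must be recovered when Extract restarts. This applies to every transaction that was open at the time Extract stopped.
● If there were no open transactions when Extract stopped, recovery starts at the current Extract read checkpoint. This is normal recovery.
● If the open transactions in the redo log started very close to the time Extract stopped, Extract rereads the redo log starting from the start of the oldest open transaction. Extract then repeats work for transactions that were already written to the trail or discard file. Because this involves relatively little data, the cost is acceptable. This is also considered normal recovery.
● If there are one or more open transactions that Extract considers long-running, Extract recovers by using Bounded Recovery.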
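A transaction is considered a long-running open transaction if it has been open longer than the Bounded Recovery interval specified with the BRINTERVAL option of the BR parameter. For example, if the Bounded Recovery interval is 4 hours, any transaction that has been open for more than 4 hours is a long-running open transaction.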
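The three recovery cases and the long-running test can be summarised as a small decision function. This is a simplified Python sketch under stated assumptions. The function names and return labels are hypothetical, "longer than" is read as strictly greater, and the sketch ignores whether a BR checkpoint has actually been taken yet.

```python
from datetime import datetime, timedelta


def is_long_running(start, now, br_interval):
    """True once a transaction has been open longer than the BR interval."""
    return now - start > br_interval


def recovery_kind(open_txn_starts, stop_time, br_interval):
    """Classify recovery per the three cases above (simplified, hypothetical labels).

    open_txn_starts: start times of the transactions open when Extract stopped.
    """
    if not open_txn_starts:
        return "normal: from current read checkpoint"
    if any(is_long_running(s, stop_time, br_interval) for s in open_txn_starts):
        return "bounded recovery"
    return "normal: from start of oldest open transaction"


stop = datetime(2014, 5, 28, 14, 30)
four_hours = timedelta(hours=4)
print(recovery_kind([], stop, four_hours))  # normal: from current read checkpoint
print(recovery_kind([stop - timedelta(minutes=5)], stop, four_hours))  # normal: from start of oldest open transaction
print(recovery_kind([stop - timedelta(hours=5)], stop, four_hours))  # bounded recovery
```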
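At every Bounded Recovery interval, Extract takes a Bounded Recovery checkpoint. This checkpoint writes the current state and data of Extract to disk, including the state and data of any long-running transactions. Suppose Extract stops after a Bounded Recovery checkpoint. It then recovers from a position within the previous Bounded Recovery interval or from the last Bounded Recovery checkpoint. It does not start from the position in the redo or archived logs where the oldest long-running open transaction began.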
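The maximum Bounded Recovery time (the maximum time Extract needs to recover to the point where it stopped) is never more than twice the current Bounded Recovery checkpoint interval. The actual recovery time depends on the following factors:
● The time from the last valid Bounded Recovery interval to the moment Extract stopped.
● The utilization of Extract during that period.
● The share of recovery work spent on transactions that were previously written to disk. Bounded Recovery processes (discards) these transactions much faster than Extract would if it first had to write them to disk. Most of the reprocessing during recovery is of this kind.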
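When Extract recovers, it restores the state and data saved at the last Bounded Recovery checkpoint, including any long-running transactions.
For example, suppose a transaction has been open for 24 hours and the Bounded Recovery interval is 4 hours. The maximum recovery time for Extract is then no more than 8 hours (2 × 4), and it may be less. The actual time depends on when Extract stopped relative to the last valid Bounded Recovery checkpoint, and on Extract's activity during that period.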
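The "at most twice the interval" bound follows from two facts. First, a restart never goes back further than the oldest non-persisted object of the last valid BR checkpoint. Second, that object started at most one interval before the checkpoint, because anything older would have been persisted. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, and it bounds the log span to reprocess rather than modelling Extract's real processing speed.

```python
from datetime import timedelta


def worst_case_reread(br_interval):
    """Upper bound on the log span Extract reprocesses after a failure.

    Worst case: Extract fails just before BR checkpoint n+1 is written, so it
    restarts from BR checkpoint n. The oldest non-persisted object recorded at
    checkpoint n started at most one interval before checkpoint n (older
    objects are persisted), and the failure is at most one interval after it.
    """
    return 2 * br_interval


# A 24-hour transaction does not matter: it was persisted, not reread.
print(worst_case_reread(timedelta(hours=4)))  # 8:00:00
```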
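Storing long-running transactions on disk and recovering them from there is rarely needed. When it is needed, it significantly improves the performance of Extract recovery. A long-running transaction that is open when Extract stops usually started very far back in the redo log. It is likely to span a large number of old log files (online and archived), and some of these may already have been moved to other storage by backups or deleted. Recovering by rereading the logs from the start of such a transaction would take a great deal of time. Long-running transactions are rare in a database, so most of that work would consist of recapturing other transactions that were already written to the trail or discard file. Bounded Recovery avoids this redundant work by restoring the long-running transactions persisted on disk (similar to a restore operation in an Oracle database).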
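The following diagram shows a timeline along which a series of transactions start. It illustrates how long-running transactions are persisted (written) to disk at fixed intervals and then recovered after a failure. It also helps explain the terms used in this example:
● Persisted object: any object in the cache that was persisted during a Bounded Recovery checkpoint. Usually this is the state or data of a transaction, but the cache also holds some Extract-specific objects. Together these are called persisted objects.
● Oldest non-persisted object: the oldest open object in the cache within the most recent BR interval before the current Bounded Recovery checkpoint. Usually this is the oldest open transaction in that interval. When Bounded Recovery restarts, runtime processing resumes from the start of the oldest non-persisted object. For an ordinary transaction, that position is where the transaction starts in the redo log.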
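In this example, the Bounded Recovery interval is 4 hours. An open transaction that started more than one Bounded Recovery interval before the current Bounded Recovery checkpoint is persisted at that checkpoint.
At BR checkpoint n:
● There are 5 open transactions: T(27), T(45), T(801), T(950) and T(1024). All other transactions have been committed or rolled back. These transactions keep running along the timeline from their start points.
● The transactions that have been open longer than one Bounded Recovery interval are T(27) and T(45). They are persisted (written) to disk at BR checkpoint n.
● The oldest non-persisted object is T(801). It does not qualify for persistence to disk because it has not yet been open longer than one Bounded Recovery interval. As the oldest non-persisted object, its start position in the log is stored in the checkpoint file of BR checkpoint n. If Extract stops unexpectedly after BR checkpoint n, it recovers to that log position before it resumes reading and parsing the logs. If there is no non-persisted object in the Bounded Recovery interval before BR checkpoint n, Extract resumes reading the logs from the log position of the current Bounded Recovery checkpoint.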
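The persistence rule at a BR checkpoint can be reproduced for the checkpoint-n scenario. The following Python sketch assumes hypothetical start times and log positions, since the example gives none. It also treats the restart position as a plain integer, which is a simplification of a real redo position (thread, sequence, RBA).

```python
from datetime import datetime, timedelta

BR_INTERVAL = timedelta(hours=4)  # BRINTERVAL used in the example above


def plan_br_checkpoint(checkpoint_time, checkpoint_log_pos, open_txns):
    """Decide what a BR checkpoint persists and where recovery would restart.

    open_txns maps a transaction id to (start_time, start_log_pos).
    - Transactions open longer than one BR interval are persisted.
    - The oldest remaining open transaction is the oldest non-persisted
      object; its start position is where log reading restarts.
    - With no non-persisted object, reading restarts at the checkpoint's
      own log position.
    """
    persisted = sorted(txn for txn, (start, _) in open_txns.items()
                       if checkpoint_time - start > BR_INTERVAL)
    remaining = [(start, pos, txn) for txn, (start, pos) in open_txns.items()
                 if txn not in persisted]
    if not remaining:
        return persisted, None, checkpoint_log_pos
    _, restart_pos, oldest = min(remaining)
    return persisted, oldest, restart_pos


# Hypothetical start times and log positions at BR checkpoint n.
cp_n = datetime(2014, 5, 28, 12, 0)
open_at_n = {
    "T(27)": (cp_n - timedelta(hours=9), 1_000),
    "T(45)": (cp_n - timedelta(hours=6), 2_000),
    "T(801)": (cp_n - timedelta(hours=3), 3_000),
    "T(950)": (cp_n - timedelta(hours=2), 4_000),
    "T(1024)": (cp_n - timedelta(minutes=30), 5_000),
}
print(plan_br_checkpoint(cp_n, 6_000, open_at_n))
# (['T(27)', 'T(45)'], 'T(801)', 3000)
```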
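At BR checkpoint n+1:
● T(45) became dirty (was updated) during the previous Bounded Recovery interval, so it is written to a new persisted object file. The old persisted object file is deleted after BR checkpoint n+1 completes.
● Extract may fail while writing BR checkpoint n+1, or at any point in the Bounded Recovery interval between BR checkpoint n and BR checkpoint n+1. In that case it recovers from the last valid BR checkpoint, n. The restart position of BR checkpoint n is the start of the oldest non-persisted transaction, T(801). In the worst case, therefore, Extract needs no more than two Bounded Recovery intervals to recover to the point where it stopped. In this example, that is no more than 8 hours.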
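The ordering at BR checkpoint n+1 matters: a superseded file is deleted only after the new checkpoint is complete, so a failure part-way through still leaves checkpoint n usable. The following Python sketch illustrates that ordering. The on-disk layout (one JSON file per transaction named `<txn>.<checkpoint>.json`) is purely hypothetical and is not GoldenGate's real BR file format.

```python
import json
import os
import tempfile


def write_br_checkpoint(br_dir, n, persisted, dirty):
    """Persist dirty transactions for checkpoint n (hypothetical file layout).

    Each dirty transaction gets a new file <txn>.<n>.json. Its older files are
    only collected here and are deleted after every new file for checkpoint n
    has been written.
    """
    stale = []
    for txn in dirty:
        new_name = f"{txn}.{n}.json"
        with open(os.path.join(br_dir, new_name), "w") as f:
            json.dump(persisted[txn], f)
        stale += [os.path.join(br_dir, name) for name in os.listdir(br_dir)
                  if name.startswith(txn + ".") and name != new_name]
    # Checkpoint n is complete: only now remove the superseded files.
    for path in stale:
        os.remove(path)


with tempfile.TemporaryDirectory() as br_dir:
    state = {"T27": ["op1"], "T45": ["op1"]}
    write_br_checkpoint(br_dir, 1, state, dirty=["T27", "T45"])  # BR checkpoint n
    state["T45"].append("op2")                                   # T45 becomes dirty
    write_br_checkpoint(br_dir, 2, state, dirty=["T45"])         # BR checkpoint n+1
    files = sorted(os.listdir(br_dir))
print(files)  # ['T27.1.json', 'T45.2.json']
```

T27 was not dirtied, so its file from checkpoint n is kept as is.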
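At BR checkpoint n+3000:
● The system has been running for a long time. T(27) and T(45) are the only persisted transactions left. T(801) and T(950) were committed and written to the trail at some point before BR checkpoint n+2999. The only non-persisted open transactions now are T(208412) and T(208863).
● BR checkpoint n+3000 has been written.
● A power failure occurs in the BR interval after BR checkpoint n+3000.
● The new Extract process recovers to BR checkpoint n+3000. T(27) and T(45) are restored from the persisted object files that hold their last persisted state. Log reading resumes from the start of T(208412).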
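The restart at BR checkpoint n+3000 combines two steps. Extract reloads the persisted transactions and then picks the log position to resume from. A minimal Python sketch follows. The record type, its fields and the integer log positions are hypothetical, and `number=3000` merely stands for "n+3000".

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class BRCheckpointRecord:
    """Hypothetical summary of what a completed BR checkpoint records."""
    number: int
    log_pos: int                                   # log position at the checkpoint itself
    persisted: dict = field(default_factory=dict)  # txn id -> persisted records
    oldest_non_persisted: Optional[Tuple[str, int]] = None  # (txn id, start log pos)


def recover(last_valid):
    """Restore persisted transactions into the cache and choose where to resume."""
    cache = {txn: list(records) for txn, records in last_valid.persisted.items()}
    if last_valid.oldest_non_persisted is not None:
        resume_at = last_valid.oldest_non_persisted[1]
    else:
        resume_at = last_valid.log_pos  # no non-persisted object: resume at the checkpoint
    return cache, resume_at


# Scenario at BR checkpoint n+3000, with made-up log positions.
cp = BRCheckpointRecord(
    number=3000,
    log_pos=990_000,
    persisted={"T(27)": ["op"], "T(45)": ["op"]},
    oldest_non_persisted=("T(208412)", 950_000),
)
cache, resume_at = recover(cp)
print(sorted(cache), resume_at)  # ['T(27)', 'T(45)'] 950000
```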
Default: BR BRINTERVAL 4, BRDIR BR
Syntax:
BR
[, BRDIR <directory>]
[, BRINTERVAL <number>{M | H}]
[, BRKEEPSTALEFILES]
[, BROFF]
[, BROFFONFAILURE]
[, BRRESET]
Argument | Description
BRDIR | Specifies the relative or full path name of the parent directory that will contain the BR directory. The BR directory contains the Bounded Recovery checkpoint files, and the name of this directory cannot be changed. The default parent directory for the BR directory is a directory named BR in the root directory that contains the Oracle GoldenGate installation files. Each Extract group within a given Oracle GoldenGate installation has its own subdirectory under the directory specified with BRDIR, and each of those subdirectories is named for the associated Extract group.
BRINTERVAL | Specifies the time between Bounded Recovery checkpoints, known as the Bounded Recovery interval. This interval is an integral multiple of the standard Extract checkpoint interval, as controlled by the CHECKPOINTSECS parameter. However, it need not be set exactly: Bounded Recovery adjusts any legal BRINTERVAL value internally as it requires. Specify the interval as a number followed by M for minutes or H for hours. The default interval is 4 hours.
BRKEEPSTALEFILES | Causes old Bounded Recovery checkpoint files to be retained. By default, only current checkpoint files are retained. Extract cannot recover from old Bounded Recovery checkpoint files. Retain old files only at the request of an Oracle support analyst.
BROFF | Turns off Bounded Recovery for the run and for recovery. Consult Oracle Support before using this option. In most circumstances, when there is a problem with Bounded Recovery, it turns itself off.
BROFFONFAILURE | Disables Bounded Recovery after an error. By default, if Extract encounters an error during Bounded Recovery processing, it reverts to normal recovery, but then enables Bounded Recovery again after recovery completes. BROFFONFAILURE turns Bounded Recovery off for the runtime processing.
BRRESET | Forces Extract to use normal recovery for the current run, and then turns Bounded Recovery back on after the recovery is complete. Its purpose is for the rare cases when Bounded Recovery does not revert to normal recovery after it encounters an error. Bounded Recovery will be enabled during runtime. Consult Oracle before using this option. This option is tied to running Extract from the command line: extract paramfile <parameter file> reportfile <report file>.
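The BRINTERVAL value format in the table (a number followed by M or H) can be checked before it goes into a parameter file. The following Python sketch is a hypothetical helper, not part of GoldenGate. It requires an explicit M or H unit and tolerates whitespace before the unit. It does not enforce the minimum or maximum, and it does not reproduce the internal adjustment to a multiple of CHECKPOINTSECS.

```python
import re
from datetime import timedelta

_BRINTERVAL = re.compile(r"^\s*(\d+)\s*([MH])\s*$")


def parse_brinterval(value):
    """Parse a BRINTERVAL value such as '4H' or '30M' into a timedelta."""
    m = _BRINTERVAL.match(value)
    if m is None:
        raise ValueError(f"unrecognized BRINTERVAL value: {value!r}")
    number, unit = int(m.group(1)), m.group(2)
    return timedelta(minutes=number) if unit == "M" else timedelta(hours=number)


print(parse_brinterval("4H"))   # 4:00:00
print(parse_brinterval("30M"))  # 0:30:00
```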
Example: BR BRDIR /user/checkpt/br specifies that the Bounded Recovery checkpoint files will be created in the /user/checkpt/br directory.
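A script that generates Extract parameter files could render the BR line from the syntax above. The following Python sketch uses a hypothetical helper. Option order and comma placement follow the default and example lines shown here, and the only standalone flags it accepts are the four listed in the syntax.

```python
def br_parameter(brdir=None, brinterval=None, flags=()):
    """Render a BR parameter line (hypothetical helper following the syntax above)."""
    options = []
    if brdir is not None:
        options.append(f"BRDIR {brdir}")
    if brinterval is not None:
        options.append(f"BRINTERVAL {brinterval}")
    allowed = {"BRKEEPSTALEFILES", "BROFF", "BROFFONFAILURE", "BRRESET"}
    for flag in flags:
        if flag not in allowed:
            raise ValueError(f"unknown BR option: {flag}")
        options.append(flag)
    return "BR " + ", ".join(options) if options else "BR"


print(br_parameter(brdir="/user/checkpt/br"))  # BR BRDIR /user/checkpt/br
```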
Oracle GoldenGate - Version: 11.1.1.1.0 and later [Release: 11.1.1.1 and later]
Information in this document applies to any platform.
How can the Extract BR (Bounded Recovery) checkpoint information in Oracle GoldenGate (OGG) be seen?
The BR checkpoint information is shown in the SHOWCH output starting with OGG v11.1.1.1.
The following example shows the BR checkpoint information for an Extract, along with the standard recovery checkpoint details.
GGSCI (cdb2) 3> info ext1, showch
EXTRACT EXT1 Last Started 2014-05-28 09:26 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:10 ago)
Log Read Checkpoint Oracle Redo Logs
2014-05-28 14:30:07 Thread 1, Seqno 6115, RBA 58318336
Log Read Checkpoint Oracle Redo Logs
2014-05-28 14:30:05 Thread 2, Seqno 6260, RBA 2146304
Current Checkpoint Detail:
Read Checkpoint #1
Oracle RAC Redo Log
Startup Checkpoint (starting position in the data source):
Thread #: 1
Sequence #: 6108
RBA: 88603664
Timestamp: 2014-05-28 09:26:38.000000
SCN: Not available
Redo File: /dev/raw/rredo2_3
Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 1
Sequence #: 6115
RBA: 58316816
Timestamp: 2014-05-28 14:30:07.000000
SCN: 0.377889779 (377889779)
Redo File: /dev/raw/rredo1_1
Current Checkpoint (position of last record read in the data source):
Thread #: 1
Sequence #: 6115
RBA: 58318336
Timestamp: 2014-05-28 14:30:07.000000
SCN: 0.377889779 (377889779)
Redo File: /dev/raw/rredo1_1
BR Previous Recovery Checkpoint:
Thread #: 1
Sequence #: 0
RBA: 0
Timestamp: 2014-05-28 09:26:41.285711
SCN: Not available
Redo File:
BR Begin Recovery Checkpoint:
Thread #: 1
Sequence #: 6108
RBA: 89926672
Timestamp: 2014-05-28 09:26:58.000000
SCN: 0.377259452 (377259452)
Redo File:
BR End Recovery Checkpoint:
Thread #: 1
Sequence #: 6113
RBA: 93254656
Timestamp: 2014-05-28 13:26:40.000000
SCN: 0.377726964 (377726964)
Redo File:
Read Checkpoint #2
Oracle RAC Redo Log
Startup Checkpoint (starting position in the data source):
Thread #: 2
Sequence #: 6258
RBA: 162832
Timestamp: 2014-05-28 09:26:38.000000
SCN: Not available
Redo File: /dev/raw/rredo2_3
Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 2
Sequence #: 6260
RBA: 2145808
Timestamp: 2014-05-28 14:30:05.000000
SCN: 0.377889243 (377889243)
Redo File: /dev/raw/rredo2_2
Current Checkpoint (position of last record read in the data source):
Thread #: 2
Sequence #: 6260
RBA: 2146304
Timestamp: 2014-05-28 14:30:05.000000
SCN: 0.377889243 (377889243)
Redo File: /dev/raw/rredo2_2
BR Previous Recovery Checkpoint:
Thread #: 2
Sequence #: 0
RBA: 0
Timestamp: 2014-05-28 09:26:41.285711
SCN: Not available
Redo File:
BR Begin Recovery Checkpoint:
Thread #: 2
Sequence #: 6260
RBA: 1013248
Timestamp: 2014-05-28 13:26:36.000000
SCN: 0.377726947 (377726947)
Redo File:
BR End Recovery Checkpoint:
Thread #: 2
Sequence #: 6260
RBA: 1013248
Timestamp: 2014-05-28 13:26:36.000000
SCN: 0.377726947 (377726947)
Redo File:
Write Checkpoint #1
GGS Log Trail
Current Checkpoint (current write position):
Sequence #: 13
RBA: 1014
Timestamp: 2014-05-28 14:30:09.276093
Extract Trail: /orabak/ggstrail/exttrail/gh/e1
Header:
Version = 2
Record Source = A
Type = 6
# Input Checkpoints = 2
# Output Checkpoints = 1
File Information:
Block Size = 2048
Max Blocks = 100
Record Length = 4096
Current Offset = 0
Configuration:
Data Source = 3
Transaction Integrity = 1
Task Type = 0
Status:
Start Time = 2014-05-28 09:26:42
Last Update Time = 2014-05-28 14:30:09
Stop Status = A
Last Result = 400
GGSCI (cdb2) 4>
In releases earlier than 11.1.1.1.0, the BR checkpoint information is not shown, as in the following output:
GGSCI (db2) 2> info ext2,showch
EXTRACT EXT2 Last Started 2014-03-23 01:36 Status RUNNING
Checkpoint Lag 00:00:02 (updated 00:00:07 ago)
Log Read Checkpoint Oracle Redo Logs
2014-05-28 14:35:37 Thread 1, Seqno 140773, RBA 25207544
Log Read Checkpoint Oracle Redo Logs
2014-05-28 14:35:37 Thread 2, Seqno 134457, RBA 89554604
Current Checkpoint Detail:
Read Checkpoint #1
Oracle RAC Redo Log
Startup Checkpoint (starting position in the data source):
Thread #: 1
Sequence #: 130336
RBA: 161296
Timestamp: 2014-03-23 01:35:57.000000
SCN: Not available
Redo File: /arch44/2_122788_751481221.dbf
Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 1
Sequence #: 140773
RBA: 25201168
Timestamp: 2014-05-28 14:35:37.000000
SCN: 9.2695260201 (41349965865)
Redo File: Not Available
Current Checkpoint (position of last record read in the data source):
Thread #: 1
Sequence #: 140773
RBA: 25207544
Timestamp: 2014-05-28 14:35:37.000000
SCN: 9.2695260219 (41349965883)
Redo File: Not Available
Read Checkpoint #2
Oracle RAC Redo Log
Startup Checkpoint (starting position in the data source):
Thread #: 2
Sequence #: 122940
RBA: 13349392
Timestamp: 2014-03-23 01:35:57.000000
SCN: Not available
Redo File: /arch44/2_122788_751481221.dbf
Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 2
Sequence #: 134457
RBA: 89551888
Timestamp: 2014-05-28 14:35:37.000000
SCN: 9.2695260287 (41349965951)
Redo File: /dev/rrac_redo2_3_1g
Current Checkpoint (position of last record read in the data source):
Thread #: 2
Sequence #: 134457
RBA: 89554604
Timestamp: 2014-05-28 14:35:37.000000
SCN: 9.2695260290 (41349965954)
Redo File: Not Available
Write Checkpoint #1
GGS Log Trail
Current Checkpoint (current write position):
Sequence #: 10343
RBA: 140976867
Timestamp: 2014-05-28 14:35:39.050084
Extract Trail: /ggstrail/exttrail/xib/e1
Header:
Version = 2
Record Source = A
Type = 6
# Input Checkpoints = 2
# Output Checkpoints = 1
File Information:
Block Size = 2048
Max Blocks = 100
Record Length = 4096
Current Offset = 0
Configuration:
Data Source = 3
Transaction Integrity = 1
Task Type = 0
Status:
Start Time = 2014-03-23 01:36:09
Last Update Time = 2014-05-28 14:35:39
Stop Status = A
Last Result = 0
GGSCI (db2) 3>
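BR checkpoint sections appear in SHOWCH output only from 11.1.1.1 onward. A monitoring script may therefore want to extract them, or detect that they are missing. The following Python sketch parses the text layout seen in the two outputs above. The function name and result shape are hypothetical. The header pattern also accepts "RecoveryCheckpoint" with no space, and any line that is not a BR field is taken to end the current BR section.

```python
import re

_BR_HEADER = re.compile(r"^\s*BR (Previous|Begin|End) Recovery ?Checkpoint:\s*$")
_THREAD = re.compile(r"^\s*Thread #:\s*(\d+)")
_FIELD = re.compile(r"^\s*(Sequence #|RBA|Timestamp):\s*(.*?)\s*$")


def br_checkpoints(showch_text):
    """Collect BR recovery checkpoints from INFO EXTRACT ..., SHOWCH output.

    Returns dicts like {"kind": "Begin", "thread": 1, "Sequence #": "6108",
    "RBA": "89926672", "Timestamp": "..."}. An empty list means the output
    has no BR section, as with releases earlier than 11.1.1.1.
    """
    result, current = [], None
    for line in showch_text.splitlines():
        header = _BR_HEADER.match(line)
        if header:
            current = {"kind": header.group(1)}
            result.append(current)
            continue
        if current is None:
            continue  # not inside a BR section
        thread = _THREAD.match(line)
        if thread:
            current["thread"] = int(thread.group(1))
            continue
        fld = _FIELD.match(line)
        if fld:
            current[fld.group(1)] = fld.group(2)
        elif not line.strip().startswith(("SCN", "Redo File")):
            current = None  # any other line ends the BR section
    return result
```

Feeding it the first output above should give six entries: Previous, Begin and End for each of the two threads. Feeding it the second output should give an empty list.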
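Oracle GoldenGate 11.x introduced Bounded Recovery (BR). BR allows Extract to write long-running transactions (transactions open longer than the BRINTERVAL value) to a local BR directory. When Extract restarts, it reads the BR files first, instead of reading the archived logs starting from the recovery checkpoint. This improves performance and reduces the dependence on old archived logs.
However, when Bounded Recovery is used to recover an Extract that abended in a RAC environment, there is a small chance that Extract hangs or that certain transactions are lost. These bugs occur only in RAC environments, or on a single instance configured with multiple redo threads.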
When a transaction is committed, it is flushed to the trail file. But suppose BR writing starts after the transaction commits, and Extract then abends abnormally. Extract may not get a chance to flush the committed transaction to the trail. When Extract restarts, it reads from BR and leaves that transaction in memory as a persisted committed transaction, which is never written to the trail. As a result, this committed transaction may be lost.
The problem does not occur when Extract is stopped normally.
With BR set up, new objects (tables, sequences, DDL and so on) may be included in the Extract. The restarted Extract then picks up more data, which causes BR's producer queue limit (a fixed number) to be reached. Because Extract is still in BR recovery, the consumer thread is stopped and does not process data from the producer queues. This causes a deadlock, and Extract appears to hang.
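The hang follows a classic bounded-queue deadlock pattern. The following Python toy model, built on `queue.Queue` and `threading`, is an illustration only and not GoldenGate's implementation. The function name and queue limit are hypothetical, and a `put()` timeout stands in for "blocked forever" so the demo terminates.

```python
import queue
import threading


def replay_during_recovery(records, queue_limit, consumer_running):
    """Toy model of the BR hang described above (not GoldenGate code).

    The producer pushes recovered records into a fixed-size queue. While
    recovery is still in progress the consumer does not drain the queue, so
    once the queue is full the producer can never make progress.
    """
    q = queue.Queue(maxsize=queue_limit)

    def consume():
        while q.get() is not None:  # None is the stop signal
            pass

    if consumer_running:
        threading.Thread(target=consume, daemon=True).start()
    try:
        for record in records:
            q.put(record, timeout=0.5)  # timeout instead of blocking forever
    except queue.Full:
        return "hung"
    if consumer_running:
        q.put(None)
    return "ok"


print(replay_during_recovery(range(3), 5, consumer_running=False))   # ok
print(replay_during_recovery(range(10), 5, consumer_running=False))  # hung
print(replay_during_recovery(range(10), 5, consumer_running=True))   # ok
```

The first two calls show why adding objects makes a difference: the same stalled consumer is harmless until the extra data pushes the producer past the fixed limit.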
1. Transaction loss caused by BUG 12532428: this bug is fixed in 11.1.1.1, and the fix will be backported to 11.1.1.0.
2. Extract hang caused by BUG 10408077: this bug is fixed in 11.1.1.1 and 11.1.1.0.30. It can also be worked around as follows:
A workaround for earlier 11.1.1.0 versions is to start Extract with BRRESET when a new object is added to the Extract. All archived logs since the recovery checkpoint must be available.
ggsci> start extract, BRRESET
Sharon
2014.05.28
--------------------------------------------------------------------
Please credit the source when reposting!
http://blog.csdn.net/sharqueen_wu/article/details/27349865