Let's start with how the official documentation defines these parameters.
FAST_START_MTTR_TARGET
Property | Description
Parameter type | Integer
Default value | 0
Modifiable | ALTER SYSTEM
Range of values | 0 to 3600 seconds
Basic | No
Real Application Clusters | Multiple instances can have different values, and you can change the values at runtime.
FAST_START_MTTR_TARGET enables you to specify the number of seconds the database takes to perform crash recovery of a single instance. When specified, FAST_START_MTTR_TARGET is overridden by LOG_CHECKPOINT_INTERVAL.
LOG_CHECKPOINT_INTERVAL
Property | Description
Parameter type | Integer
Default value | 0
Modifiable | ALTER SYSTEM
Range of values | 0 to 2^31 - 1
Basic | No
Real Application Clusters | Multiple instances can have different values.
LOG_CHECKPOINT_INTERVAL specifies the frequency of checkpoints in terms of the number of redo log file blocks that can exist between an incremental checkpoint and the last block written to the redo log. This number refers to physical operating system blocks, not database blocks.
Regardless of this value, a checkpoint always occurs when switching from one online redo log file to another. Therefore, if the value exceeds the actual redo log file size, checkpoints occur only when switching logs. Checkpoint frequency is one of the factors that influence the time required for the database to recover from an unexpected failure.
Notes:
· Specifying a value of 0 (zero) for LOG_CHECKPOINT_INTERVAL has the same effect as setting the parameter to infinity and causes the parameter to be ignored. Only nonzero values of this parameter are considered meaningful.
· Recovery I/O can also be limited by setting the LOG_CHECKPOINT_TIMEOUT parameter or by the size specified for the smallest redo log. For information on which mechanism is controlling checkpointing behavior, query the V$INSTANCE_RECOVERY view.
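To judge whether a given LOG_CHECKPOINT_INTERVAL exceeds the size of the redo log files, it helps to express both in the same unit. The query below is a minimal sketch that assumes a redo (OS) block size of 512 bytes; that figure is common but platform dependent, so treat it as an assumption to verify on your system.

```sql
-- Size of each online redo log group, expressed in OS blocks.
-- Assumption: the redo block size is 512 bytes; adjust for your platform.
SELECT group#,
       bytes,
       bytes / 512 AS approx_redo_blocks
FROM   v$log
ORDER  BY group#;
```

If LOG_CHECKPOINT_INTERVAL is larger than approx_redo_blocks, then according to the description above checkpoints occur only at log switches.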
LOG_CHECKPOINT_TIMEOUT
Property | Description
Parameter type | Integer
Default value | 1800
Modifiable | ALTER SYSTEM
Range of values | 0 to 2^31 - 1
Basic | No
Real Application Clusters | Multiple instances can have different values.
LOG_CHECKPOINT_TIMEOUT specifies (in seconds) the amount of time that has passed since the incremental checkpoint at the position where the last write to the redo log (sometimes called the tail of the log) occurred. This parameter also signifies that no buffer will remain dirty (in the cache) for more than integer seconds.
Specifying a value of 0 for the timeout disables time-based checkpoints. Hence, setting the value to 0 is not recommended unless FAST_START_MTTR_TARGET is set.
Notes:
· A checkpoint scheduled to occur because of this parameter is delayed until the completion of the previous checkpoint if the previous checkpoint has not yet completed.
· Recovery I/O can also be limited by setting the LOG_CHECKPOINT_INTERVAL parameter or by the size specified for the smallest redo log. For information on which mechanism is controlling checkpointing behavior, query the V$INSTANCE_RECOVERY view.
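The notes above point to V$INSTANCE_RECOVERY for finding out which mechanism is currently controlling checkpointing. A sketch of such a query follows; the column names are as I recall them from the 10g Reference, so confirm them with DESC v$instance_recovery before relying on it.

```sql
-- Each *_REDO_BLKS column is the limit, in redo blocks, imposed by one mechanism.
-- TARGET_REDO_BLKS is the effective target; the column it matches points to
-- the mechanism that is currently driving checkpoints.
SELECT actual_redo_blks,
       target_redo_blks,
       log_file_size_redo_blks,
       log_chkpt_timeout_redo_blks,
       log_chkpt_interval_redo_blks,
       fast_start_io_target_redo_blks
FROM   v$instance_recovery;
```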
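There is actually one more parameter, FAST_START_IO_TARGET, but I can no longer find it in my 10g Reference, so it is probably no longer supported. In a forum post I once saw someone explain the differences among FAST_START_IO_TARGET, LOG_CHECKPOINT_INTERVAL, and LOG_CHECKPOINT_TIMEOUT. I thought the explanation was good, so I quote it here.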
#########################################################################
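First, a few concepts:
1. A data block consists of several OS blocks, that is, a one-to-many relationship.
2. Data files are made up of data blocks.
Redo blocks in the redo log files are OS blocks.
3. A data block records the complete information.
A redo block records only the minimal information.
Before 8i, only two parameters influenced recovery: log_checkpoint_interval and log_checkpoint_time. Take log_checkpoint_interval=10000 as an example. The meaning is simple: a checkpoint is triggered after every 10000 redo blocks, so recovery never has to go through more than those 10000 redo blocks.
The drawback is that a redo block (an OS block) is not the same thing as a data block. If the changes to the data are small, say changing 1 to 2, then 10000 redo blocks can describe changes to far more than 10000 data blocks. If the changes are large, say changing 123456789 to 987654321, then 10000 redo blocks may describe changes to far fewer than 10000 data blocks.
So although recovery reads 10000 redo blocks, the number of data blocks they cover may be very small or very large, which makes the recovery time hard to predict.
To make up for this, 8i introduced the fast_start_io_target parameter. Oracle automatically works out how many data blocks the redo blocks cover. For example, take fast_start_io_target=10000 and a first checkpoint at t1. After t1, 5000 redo blocks have been written, but they cover only 7000 data blocks, so redo keeps accumulating. When the redo reaches 8000 blocks, Oracle finds that these 8000 redo blocks cover 10000 data blocks, and this triggers t2 (the second checkpoint). For any crash between t1 and t2, recovery is guaranteed to finish within the time needed to read and write those 10000 data blocks (because the I/O cost of one data block can be estimated).
That is why setting fast_start_io_target is more precise than setting log_checkpoint_interval and log_checkpoint_time. If all three are set, a checkpoint is triggered as soon as any one of them reaches its threshold.
For example: fast_start_io_target=10000, log_checkpoint_interval=10000, log_checkpoint_time=1800.
If at 10000 redo blocks only 8000 data blocks are covered, log_checkpoint_interval triggers the checkpoint.
If at 8000 redo blocks 10000 data blocks are already covered, fast_start_io_target triggers the checkpoint.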
#########################################################################
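The log_checkpoint_time he mentions is presumably log_checkpoint_timeout, since I cannot find log_checkpoint_time in the documentation either.
Nowadays the general recommendation is to use FAST_START_MTTR_TARGET, which directly bounds the time Oracle spends on recovery to a specified number of seconds.
However, when I checked this parameter in my own database, a new question came up: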
SQL> show parameter mttr
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_mttr_target               integer     0
SQL>
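Why is my fast_start_mttr_target 0? If it is 0, what mechanism controls how long recovery takes at database startup? Surely recovery cannot be allowed to run indefinitely? Searching online, I found an article by Eygle that introduces a new concept: self-tuning checkpoints.
Here are some excerpts from that article: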
#########################################################################
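Starting with Oracle 10gR2, the database supports self-tuning checkpoints.
With self-tuning checkpoints, Oracle can take advantage of periods of low I/O load to write dirty buffers out of memory, which improves database efficiency.
As a result, even if the DBA has set unreasonable checkpoint-related parameters, Oracle can still keep crash recovery time within a reasonable range through automatic tuning.
Automatic checkpoint tuning takes effect when FAST_START_MTTR_TARGET is not set.
In general, if instance or node recovery time must be strictly controlled, set FAST_START_MTTR_TARGET to the desired time. If recovery time does not need to be strictly controlled, leave FAST_START_MTTR_TARGET unset to enable Oracle 10g's automatic checkpoint tuning.
After the FAST_START_MTTR_TARGET setting is removed: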
SQL> show parameter fast_start_mttr NAME TYPE VALUE |
When the database starts, the following message appears in the alert file:
Wed Jan 11 16:28:12 2006 |
Checking the v$instance_recovery view shows what changed in Oracle 10g:
SQL> select RECOVERY_ESTIMATED_IOS REIOS,TARGET_MTTR TMTTR, REIOS TMTTR EMTTR WMTTR WOSET CKPTBW WAUTO WFTCKPT |
In this view, the WRITES_AUTOTUNE column is the number of writes performed because of self-tuning checkpoints,
while CKPT_BLOCK_WRITES is the number of blocks written by checkpoints.
#########################################################################
So that explains it: although my FAST_START_MTTR_TARGET is not set and shows 0, Oracle still uses automatic checkpoint tuning to keep startup recovery from taking too long.
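To see this on a running instance, one can confirm that no explicit target is set and then watch the recovery estimate and the self-tuning write counter. This is only a sketch: ESTIMATED_MTTR is a column I am adding from memory of the Reference, while the other columns are the ones named in the excerpt above.

```sql
-- Confirm that no explicit MTTR target is in effect (SQL*Plus command).
SHOW PARAMETER fast_start_mttr_target

-- Watch the estimated recovery time and the writes caused by self-tuning checkpoints.
SELECT recovery_estimated_ios,
       target_mttr,
       estimated_mttr,
       ckpt_block_writes,
       writes_autotune
FROM   v$instance_recovery;
```

If WRITES_AUTOTUNE keeps growing between runs while FAST_START_MTTR_TARGET stays at 0, that is consistent with automatic checkpoint tuning doing the work.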
The documentation describes two more related parameters. Since they are there, I will mention them briefly.
LOG_CHECKPOINTS_TO_ALERT
Property | Description
Parameter type | Boolean
Default value | false
Modifiable | ALTER SYSTEM
Range of values | true | false
Basic | No
LOG_CHECKPOINTS_TO_ALERT lets you log your checkpoints to the alert file. Doing so is useful for determining whether checkpoints are occurring at the desired frequency.
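Since the parameter is modifiable with ALTER SYSTEM, it can be switched on without a restart. A minimal sketch:

```sql
-- Record checkpoints in the alert file from now on.
ALTER SYSTEM SET log_checkpoints_to_alert = TRUE;

-- Confirm the current value (SQL*Plus command).
SHOW PARAMETER log_checkpoints_to_alert
```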
FAST_START_PARALLEL_ROLLBACK
Property | Description
Parameter type | String
Syntax | FAST_START_PARALLEL_ROLLBACK = { HI | LO | FALSE }
Default value | LOW
Modifiable | ALTER SYSTEM
Basic | No
FAST_START_PARALLEL_ROLLBACK determines the maximum number of processes that can exist for performing parallel rollback. This parameter is useful on systems in which some or all of the transactions are long running.
Values:
I will skip the first parameter, LOG_CHECKPOINTS_TO_ALERT, and look at the second one, FAST_START_PARALLEL_ROLLBACK.
To discuss this parameter, we have to talk about how instance recovery works.
Instance and crash recovery occur in two steps: cache recovery followed by transaction recovery.
Let's see what cache recovery does:
The database can be opened as soon as cache recovery completes, so improving the performance of cache recovery is important for increasing availability.
The duration of cache recovery processing is determined by two factors: the number of data blocks that have changes at SCNs higher than the SCN of the checkpoint, and the number of log blocks that need to be read to find those changes.
Frequent checkpointing writes dirty buffers to the datafiles more often than otherwise, and so reduces cache recovery time in the event of an instance failure. If checkpointing is frequent, then applying the redo records in the redo log between the current checkpoint position and the end of the log involves processing relatively few data blocks. This means that the cache recovery phase of recovery is fairly short.
However, in a high-update system, frequent checkpointing can reduce runtime performance, because checkpointing causes DBWn processes to perform writes.
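As this shows, more frequent checkpoints are not always better, and of course neither are less frequent ones. The key is to strike a reasonable balance for the particular system.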
Cache Recovery (Rolling Forward)
During the cache recovery step, Oracle applies all committed and uncommitted changes in the redo log files to the affected data blocks. The work required for cache recovery processing is proportional to the rate of change to the database (update transactions each second) and the time between checkpoints.
Transaction Recovery (Rolling Back)
To make the database consistent, the changes that were not committed at the time of the crash must be undone (in other words, rolled back). During the transaction recovery step, Oracle applies the rollback segments to undo the uncommitted changes.
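During recovery, Oracle first reads the redo log and, starting from the last completed checkpoint, applies all redo records. This process is called rolling forward, and it is the cache recovery phase. Once rolling forward completes, the database can be opened for access.
Instance recovery then enters its second phase, transaction recovery, in which Oracle rolls back uncommitted transactions. Oracle uses two features to make this phase more efficient: fast-start on-demand rollback and fast-start parallel rollback (both are part of Fast-Start Fault Recovery and are available only in Enterprise Edition, from Oracle 8i onward).
With fast-start on-demand rollback, Oracle automatically allows new transactions to begin as soon as the database opens, which usually takes only the short cache recovery time. If a user tries to access a row locked by a transaction that was terminated abnormally, Oracle rolls back only the changes needed to satisfy that request. In other words, it rolls back on demand, so new transactions do not have to wait for the entire, possibly lengthy, rollback to finish.
Fast-start parallel rollback mainly helps with long-running uncommitted transactions, especially parallel INSERT, UPDATE, and DELETE operations. Here the background process SMON acts as a coordinator and uses multiple server processes to roll back a set of transactions in parallel. SMON automatically decides when to start parallel rollback and spreads the work across the processes.
A special form of fast-start parallel rollback is intra-transaction recovery, in which a single large transaction is split up and assigned to several server processes to be rolled back in parallel. This is where the FAST_START_PARALLEL_ROLLBACK parameter described earlier comes in: it controls the parallel rollback.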
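As a rough illustration of how the parameter might be raised and the rollback then observed, consider the sketch below. The view and column names (V$FAST_START_TRANSACTIONS, V$FAST_START_SERVERS) are as I recall them from the Reference and should be checked with DESC before use. HIGH is, as far as I know, the long form of the HI value shown in the syntax row.

```sql
-- Allow more parallel rollback processes for transaction recovery.
-- Assumption: HIGH is accepted as the long form of HI from the syntax above.
ALTER SYSTEM SET fast_start_parallel_rollback = HIGH;

-- Progress of the transactions currently being recovered.
SELECT usn, state, undoblocksdone, undoblockstotal
FROM   v$fast_start_transactions;

-- What each recovery server process is doing.
SELECT state, undoblocksdone, pid
FROM   v$fast_start_servers;
```

Comparing undoblocksdone with undoblockstotal gives a rough idea of how far the rollback of a large transaction has progressed.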
The term Fast-Start Fault Recovery also came up above. Here is a short explanation quoted from an article by someone online.
######################################################
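Starting with Oracle 8i, Oracle introduced the Fast-Start Fault Recovery option in Enterprise Edition.
The option includes three main enhancements:
1. Fast-Start Checkpointing
2. Fast-Start On-Demand Rollback
3. Fast-Start Parallel Rollback
All three are intended to speed up recovery after a failure and improve system availability.
The option can be found in the v$option view: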
SQL> select * from v$version where rownum <2;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
SQL> select * from v$option
2 where Parameter='Fast-Start Fault Recovery';
PARAMETER                                          VALUE
-------------------------------------------------- ---------------
Fast-Start Fault Recovery                          TRUE
######################################################
Things to note when using FAST_START_MTTR_TARGET:
You must disable or remove the FAST_START_IO_TARGET, LOG_CHECKPOINT_INTERVAL, and LOG_CHECKPOINT_TIMEOUT initialization parameters when using FAST_START_MTTR_TARGET. Setting these parameters interferes with the mechanisms used to manage cache recovery time to meet FAST_START_MTTR_TARGET.
Practical Values for FAST_START_MTTR_TARGET
The maximum value for FAST_START_MTTR_TARGET is 3600 seconds (one hour). If you set the value to more than 3600, then Oracle rounds it to 3600.
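Putting the two points above together, a switch to FAST_START_MTTR_TARGET might look like the following sketch. The value 300 is purely illustrative, and FAST_START_IO_TARGET is left out on the assumption that it may not exist in your release.

```sql
-- Disable the older checkpoint parameters so they do not interfere.
-- A value of 0 disables each of them, per the parameter descriptions.
ALTER SYSTEM SET log_checkpoint_interval = 0;
ALTER SYSTEM SET log_checkpoint_timeout  = 0;

-- Ask for crash recovery to complete in roughly 300 seconds.
-- Assumption: 300 is an example value; the allowed range is 0 to 3600.
ALTER SYSTEM SET fast_start_mttr_target = 300;
```

Afterwards, the TARGET_MTTR and ESTIMATED_MTTR columns of V$INSTANCE_RECOVERY (names as I recall them) can be compared to see whether the target is realistic for the workload.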