Understanding the Basics of Oracle Backup and Recovery by Following the SCN

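The SCN (System Change Number) is the clock hanging on Oracle's wall. Get up in the morning and that is the "wake-up SCN"; eat breakfast and that is the "breakfast SCN"; leave for work and that is the "leaving SCN". Every activity maps to an SCN. We can read the system SCN through one of Oracle's built-in packages (note: this is only the system SCN; Oracle also has the commit SCN, checkpoint SCN, select SCN, and so on).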

SQL> select dbms_flashback.get_system_change_number "system's scn" from dual;

system's scn
------------
      555956
Oracle keeps just one SCN internally; all the others are derived from it. We can also look at the smallest SCN in the database.

SQL> select creation_change# "lowest scn" from v$datafile where file#=1;

lowest scn
----------
         9
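Whatever we do to Oracle, good or bad, Oracle records it against an SCN in its memory (the redo log) and never forgets. Because the SCN only increases, finding the right SCN takes us back to that moment and to what we did to Oracle then. That is why the SCN matters.
What we do to Oracle is recorded in the current log group, which we can see in v$log.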

SQL> select group#,sequence#,status from v$log;

    GROUP#  SEQUENCE# STATUS
---------- ---------- ----------------
         1          5 CURRENT
         2          3 INACTIVE
         3          4 INACTIVE
Next, let's do something to Oracle. We create a table t with two columns; the scn column roughly equals the SCN at which the transaction started.

SQL> create table t(id int,scn number) tablespace users;

Table created.

SQL> insert into t values(1,dbms_flashback.get_system_change_number);

1 row created.

SQL> commit;

Commit complete.

SQL> select * from t;

        ID        SCN
---------- ----------
         1     585887
Let's set that aside for a moment and look at first_change# in v$log.

SQL> alter session set nls_date_format='yyyy/mm/dd hh24:mi:ss';

Session altered.
SQL> select group#,status,first_change#,first_time from v$log;

    GROUP# STATUS           FIRST_CHANGE# FIRST_TIME
---------- ---------------- ------------- -------------------
         1 CURRENT                 583374 2012/07/17 19:59:23
         2 INACTIVE                560959 2012/07/17 17:13:32
         3 INACTIVE                560981 2012/07/17 17:14:33
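Here first_change# and first_time describe the same moment: they are two representations of the same SCN, one as a number and one as a timestamp. first_change# is the system SCN taken when the group became the current log group, and it serves as the group's lowest (starting) SCN. Everything we do from then on gets an SCN no lower than first_change#.
Back to our work: we switch log files so that the current log group gets archived.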

SQL> alter system switch logfile;

System altered.

Look at first_change# in v$log again:

SQL> select group#,status,first_change#,first_time from v$log;

    GROUP# STATUS           FIRST_CHANGE# FIRST_TIME
---------- ---------------- ------------- -------------------
         1 ACTIVE                  583374 2012/07/17 19:59:23
         2 CURRENT                 586090 2012/07/18 09:35:40
         3 INACTIVE                560981 2012/07/17 17:14:33
The current log group is now group 2, and its first_change# has changed accordingly.
Let's carry on with our unfinished work.

SQL> insert into t values(2,dbms_flashback.get_system_change_number);

1 row created.

SQL> commit;

Commit complete.

SQL> select * from t;

        ID        SCN
---------- ----------
         1     585887
         2     586129

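Here 586129 is greater than 586090, the first_change# of the current log group 2. This confirms that first_change# is the lowest SCN in the current log group: anything we do afterwards produces a higher SCN.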
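The comparison above can be written down as a tiny check. This is only an illustrative sketch in Python, not anything Oracle runs: the rows are copied by hand from the v$log output after the first switch, and the function names are made up for this example.

```python
# Rows of v$log after the first log switch: (group#, status, first_change#).
V_LOG = [
    (1, "ACTIVE", 583374),
    (2, "CURRENT", 586090),
    (3, "INACTIVE", 560981),
]


def current_first_change(rows):
    """Return first_change# of the single CURRENT log group."""
    current = [first for _group, status, first in rows if status == "CURRENT"]
    if len(current) != 1:
        raise ValueError("expected exactly one CURRENT group")
    return current[0]


def belongs_to_current_group(scn, rows):
    """True if a change with this SCN is not older than the current group."""
    return scn >= current_first_change(rows)


# Row 2 of table t was inserted after the switch, row 1 before it.
print(belongs_to_current_group(586129, V_LOG))
print(belongs_to_current_group(585887, V_LOG))
```

Row 1 (SCN 585887) fails the check because it was written while group 1 was still current, which is exactly what the first_change# boundary expresses.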
We switch logs again so that log group 2 is archived.

SQL> alter system switch logfile;

System altered.

SQL> select group#,status,first_change#,first_time from v$log;

    GROUP# STATUS           FIRST_CHANGE# FIRST_TIME
---------- ---------------- ------------- -------------------
         1 ACTIVE                  583374 2012/07/17 19:59:23
         2 ACTIVE                  586090 2012/07/18 09:35:40
         3 CURRENT                 586181 2012/07/18 09:39:21

Log group 3 is now the current group, and its first_change# has changed as well.
Let's continue. To generate more archived logs, we keep inserting, committing, and switching.

SQL> insert into t values(3,dbms_flashback.get_system_change_number);

1 row created.

SQL> commit;

Commit complete.

SQL> alter system switch logfile;

System altered.

SQL> insert into t values (4,dbms_flashback.get_system_change_number);

1 row created.

SQL> commit;

Commit complete.

SQL> alter system switch logfile;

System altered.

SQL> insert into t values (5,dbms_flashback.get_system_change_number);

1 row created.

SQL> commit;     

Commit complete.

SQL> alter system switch logfile;

System altered.

SQL> insert into t values(6,dbms_flashback.get_system_change_number);

1 row created.

SQL> commit;

Commit complete.

SQL> alter system switch logfile;

System altered.

SQL> insert into t values (7,dbms_flashback.get_system_change_number);

1 row created.

SQL> commit;

Commit complete.

SQL> alter system switch logfile;

System altered.

SQL> insert into t values (8,dbms_flashback.get_system_change_number);

1 row created.

SQL> commit;

Commit complete.

SQL> alter system switch logfile;

System altered.


SQL> select * from t;

        ID        SCN
---------- ----------
         1     585887
         2     586129
         3     586643
         4     586666
         5     586692
         6     586722
         7     586751
         8     586805

Let's check again which log group is current.

SQL> select group#,status,first_change#,first_time from v$log;

    GROUP# STATUS           FIRST_CHANGE# FIRST_TIME
---------- ---------------- ------------- -------------------
         1 ACTIVE                  586734 2012/07/18 09:45:12
         2 ACTIVE                  586762 2012/07/18 09:46:15
         3 CURRENT                 586816 2012/07/18 09:47:15

The current log group is 3. Now we insert again.

SQL> insert into t values(9,dbms_flashback.get_system_change_number);

1 row created.

SQL> commit;

Commit complete.

This time we do not switch log groups. Then we insert once more.

SQL> insert into t values(10,dbms_flashback.get_system_change_number);

1 row created.

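Note that this time we have neither committed nor switched logs, so the redo for rows 9 and 10 is in log group 3.
Now we run an experiment to illustrate the basic principles of backup and recovery.
Experiment: complete recovery of lost datafiles after a clean shutdown.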

[oracle@localhost ~]$ sqlplus /nolog

SQL*Plus: Release 10.2.0.1.0 - Production on Tue Jul 17 20:48:19 2012

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

SQL> conn / as sysdba
Connected.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

[oracle@localhost ORCL]$ cd datafile/
[oracle@localhost datafile]$ ls
o1_mf_example_8050jhm7_.dbf  o1_mf_temp_8050j34j_.tmp
o1_mf_sysaux_8050fk3w_.dbf   o1_mf_undotbs1_8050fkc6_.dbf
o1_mf_system_8050fk2z_.dbf   o1_mf_users_8050fkdh_.dbf
[oracle@localhost datafile]$ rm o1_mf_system_8050fk2z_.dbf
[oracle@localhost datafile]$ rm o1_mf_sysaux_8050fk3w_.dbf
[oracle@localhost datafile]$ rm o1_mf_users_8050fkdh_.dbf 
[oracle@localhost datafile]$ rm o1_mf_undotbs1_8050fkc6_.dbf

What error do we get if we try to start the database now?

[oracle@localhost ~]$ sqlplus /nolog

SQL*Plus: Release 10.2.0.1.0 - Production on Tue Jul 17 20:57:00 2012

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup
ORACLE instance started.

Total System Global Area  419430400 bytes
Fixed Size                  1219760 bytes
Variable Size             142607184 bytes
Database Buffers          272629760 bytes
Redo Buffers                2973696 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1:
'/u01/app/oracle/oradata/ORCL/datafile/o1_mf_system_8050fk2z_.dbf'

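Because the control file is still there, we reach the mount state, an intermediate state in which a lot can be done. Oracle reports that it cannot identify or lock file 1. Let's take the files one at a time and first copy file 1 back from the cold backup.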
[oracle@localhost datafile]$ cp o1_mf_system_8050fk2z_.dbf /u01/app/oracle/oradata/ORCL/datafile
Then we try to open the database and see what error comes up.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1:
'/u01/app/oracle/oradata/ORCL/datafile/o1_mf_system_8050fk2z_.dbf'

The error is different now: file 1 needs media recovery. What does Oracle base this on? To find out, we need two views.

SQL> select file#,checkpoint_change# from v$datafile;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1             587004
         2             587004
         3             587004
         4             587004
         5             587004

SQL> select file#,checkpoint_change# from v$datafile_header;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1             583375
         2                  0
         3                  0
         4                  0
         5             587004

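These two views draw on completely different sources: v$datafile reads from the control file, while v$datafile_header reads from the header of each datafile. We have just copied file 1 back, so Oracle can read the SCN in its header; files 2, 3, and 4 were deleted, so nothing can be read from them. But file 1's SCN differs between the two places. Remember that Oracle compares horizontally, never vertically: it compares file 1's header with file 1's entry in the control file, and does not compare file 1 with file 3. A necessary condition for opening the database is that the SCN in the control file and the SCN in each datafile header agree. The changes with SCNs from 583375 up to the control file's 587004 are held in the redo logs, most of them already archived, and each SCN corresponds to the operations made at that point.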
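The "horizontal" comparison can be sketched as follows. This is a simplified model in Python under stated assumptions: the two dictionaries are copied from the v$datafile and v$datafile_header output above, the status labels are invented for this example, and Oracle's real check looks at more header fields than the checkpoint SCN alone.

```python
# checkpoint_change# per file#, as reported by the two views above.
CONTROL_FILE = {1: 587004, 2: 587004, 3: 587004, 4: 587004, 5: 587004}  # v$datafile
FILE_HEADERS = {1: 583375, 2: 0, 3: 0, 4: 0, 5: 587004}  # v$datafile_header


def classify(control, headers):
    """Compare each file only with its own control file entry."""
    status = {}
    for file_no in sorted(control):
        expected = control[file_no]
        actual = headers.get(file_no, 0)
        if actual == 0:
            status[file_no] = "missing"  # header could not be read
        elif actual < expected:
            status[file_no] = "needs recovery"  # header is behind the control file
        elif actual == expected:
            status[file_no] = "consistent"
        else:
            status[file_no] = "control file is older"
    return status


for file_no, state in classify(CONTROL_FILE, FILE_HEADERS).items():
    print(file_no, state)
```

File 1 is the only one classified as needing recovery, which matches the ORA-01113 error; files 2, 3, and 4 still have to be copied back before they can be compared at all.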

SQL> select sequence#,first_change#,next_change# from v$archived_log;

 SEQUENCE# FIRST_CHANGE# NEXT_CHANGE#
---------- ------------- ------------
         5        544404       558719
         6        558719       559931
         7        559931       560709
         8        560709       560959
         9        560959       560981
        10        560981       583374

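What is next_change#? It is the system SCN taken when a log group switches from current to non-current, and it serves as the upper bound of that log's SCNs. first_change# marks the start of its time as current; next_change# marks the end.
We know that every change with an SCN lower than 583375 has already been written to the datafile. Now we need to find which first_change#/next_change# pair 583375 falls between, which gives us the starting point of the roll forward (in the broad sense).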

SQL> select sequence#,first_change#,next_change# from v$archived_log
  2  where 583375>=first_change# and
  3  583375<=next_change#;

 SEQUENCE# FIRST_CHANGE# NEXT_CHANGE#
---------- ------------- ------------
        11        583374       586090

So 583375 falls between the first_change# and next_change# of archived log 11, and recovery starts from archived log 11. How many archived logs do we need in total?
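The lookup can be modelled in a few lines of Python. This is a sketch, not Oracle's implementation: the tuples are copied from the v$archived_log listing in this post, and the function name is hypothetical. One assumption differs from the query above: because the next_change# of one log equals the first_change# of the next, the sketch treats each log as the half-open range [first_change#, next_change#), so a boundary SCN such as 586090 matches only one log, whereas the query with `<=` on both sides would return two rows.

```python
# (sequence#, first_change#, next_change#) from v$archived_log.
ARCHIVED_LOGS = [
    (11, 583374, 586090),
    (12, 586090, 586181),
    (13, 586181, 586656),
    (14, 586656, 586676),
    (15, 586676, 586704),
    (16, 586704, 586734),
    (17, 586734, 586762),
    (18, 586762, 586816),
]


def log_containing(scn, logs):
    """Return the sequence# whose [first_change#, next_change#) holds scn."""
    for sequence, first_change, next_change in logs:
        if first_change <= scn < next_change:
            return sequence
    return None  # scn is outside every archived log listed


print(log_containing(583375, ARCHIVED_LOGS))  # the header SCN of file 1
print(log_containing(586090, ARCHIVED_LOGS))  # a boundary SCN
```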

SQL> select sequence#,first_change#,next_change# from v$archived_log
  2  where sequence#>=11;

 SEQUENCE# FIRST_CHANGE# NEXT_CHANGE#
---------- ------------- ------------
        11        583374       586090
        12        586090       586181
        13        586181       586656
        14        586656       586676
        15        586676       586704
        16        586704       586734
        17        586734       586762
        18        586762       586816

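From the above, to get all the data back we need the archived logs up to 18. Is there anything special about these first_change# and next_change# values?
The next_change# of archived log 11 is the first_change# of archived log 12, and so on. Logically, all these archived logs form one continuous log. That is why the archived logs must be contiguous! If archived log 13 is damaged, we can only recover up to the next_change# of log 12, and any later archived logs are useless.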
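The contiguity rule can be made concrete with a small Python sketch. It is only a model built on the numbers in this post: the log list is copied from v$archived_log, while the function name and the `available` set (the sequences that are still readable) are assumptions introduced for illustration.

```python
# (sequence#, first_change#, next_change#) from v$archived_log.
ARCHIVED_LOGS = [
    (11, 583374, 586090),
    (12, 586090, 586181),
    (13, 586181, 586656),
    (14, 586656, 586676),
    (15, 586676, 586704),
    (16, 586704, 586734),
    (17, 586734, 586762),
    (18, 586762, 586816),
]


def recoverable_until(start_scn, logs, available):
    """Walk the chain from start_scn and stop at the first gap."""
    reached = start_scn
    for sequence, first_change, next_change in logs:
        if next_change <= reached:
            continue  # this log ends before the point we have reached
        if first_change > reached or sequence not in available:
            break  # chain is broken, later logs cannot be applied
        reached = next_change
    return reached


everything = {seq for seq, _first, _next in ARCHIVED_LOGS}
print(recoverable_until(583375, ARCHIVED_LOGS, everything))
print(recoverable_until(583375, ARCHIVED_LOGS, everything - {13}))
```

With every log present the walk reaches 586816; with sequence 13 removed it stops at 586181, the next_change# of log 12, even though logs 14 to 18 are intact.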
Now we start the recovery.

SQL> recover datafile 1;
ORA-00279: change 583375 generated at 07/17/2012 19:59:23 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_11_%u_.ar
c
ORA-00280: change 583375 for thread 1 is in sequence #11


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

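Oracle tells us that change 583375 is needed for thread 1, and suggests that archived log 11 is in the flash recovery area. The prompt offers four choices. Pressing Enter accepts the suggestion, and Oracle looks for the log in the flash recovery area itself. Entering a filename is for logs that are not in the default location: give Oracle the absolute path and name of the archived log. AUTO is for when there are many archived logs and supplying them one by one is tedious. CANCEL stops the recovery, for example halfway through or when there are no more archived logs. Here we press Enter to accept Oracle's suggestion.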

ORA-00279: change 586090 generated at 07/18/2012 09:35:40 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_12_%u_.ar
c
ORA-00280: change 586090 for thread 1 is in sequence #12
ORA-00278: log file
'/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_11_80d4q
dmh_.arc' no longer needed for this recovery


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

Oracle now tells us that log 12 is needed for thread 1. Let's put that aside for a moment and read the SCN from the datafile headers.

SQL> select file#,checkpoint_change# from v$datafile_header;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1             586090
         2                  0
         3                  0
         4                  0
         5             587004

See that? The SCN of file 1 has become 586090, which is the first_change# of archived log 12. No wonder Oracle says log 12 is needed. Next we type auto.

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto
ORA-00279: change 586181 generated at 07/18/2012 09:39:21 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_13_%u_.ar
c
ORA-00280: change 586181 for thread 1 is in sequence #13
ORA-00278: log file
'/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_12_80d4y
9fp_.arc' no longer needed for this recovery


ORA-00279: change 586656 generated at 07/18/2012 09:42:06 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_14_%u_.ar
c
ORA-00280: change 586656 for thread 1 is in sequence #14
ORA-00278: log file
'/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_13_80d53
g42_.arc' no longer needed for this recovery


ORA-00279: change 586676 generated at 07/18/2012 09:42:59 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_15_%u_.ar
c
ORA-00280: change 586676 for thread 1 is in sequence #15
ORA-00278: log file
'/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_14_80d55
41h_.arc' no longer needed for this recovery


ORA-00279: change 586704 generated at 07/18/2012 09:44:03 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_16_%u_.ar
c
ORA-00280: change 586704 for thread 1 is in sequence #16
ORA-00278: log file
'/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_15_80d57
315_.arc' no longer needed for this recovery


Log applied.
Media recovery complete.

See that? Oracle only asked for archived logs up to 16 and never prompted for 17 and 18. Why? Let's read file 1's SCN from the datafile header again.

SQL> select file#,checkpoint_change# from v$datafile_header;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1             587003
         2                  0
         3                  0
         4                  0
         5             587004

587003 is greater than 586816, the next_change# of archived log 18. Let's see which log group is current.

SQL> select group#,sequence#,status,first_change# from v$log;

    GROUP#  SEQUENCE# STATUS           FIRST_CHANGE#
---------- ---------- ---------------- -------------
         1         17 INACTIVE                586734
         3         19 CURRENT                 586816
         2         18 INACTIVE                586762

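Log sequence 19, the current online log, has a first_change# of 586816, while the SCN in the datafile header is 587003. When we run recover datafile 1, Oracle performs a complete recovery, whose start and end points are already fixed: the start comes from the datafile header and the end from the control file. Archived logs 17 and 18 were copied from online redo log groups 1 and 2, which still hold those sequences, and Oracle looks in the online redo logs first. In other words, during complete recovery Oracle finds the online redo log files by itself; during incomplete recovery we can type in the absolute path and name of an online redo log file at the prompt. The current log group is 3 with a first_change# of 586816, and 587003 is greater than that, so Oracle has applied the current log file as well.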
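The behaviour seen in the transcript (prompts for sequences 11 to 16, silence for 17 to 19) can be summarised in a short Python sketch. It only models what this post observed, under the assumption stated above that a sequence still held by an online log group is read from there without a prompt; the dictionary is copied from the v$log output, and the function name is hypothetical.

```python
# sequence# -> group# for the online redo logs, from v$log.
ONLINE_LOGS = {17: 1, 18: 2, 19: 3}


def recovery_plan(first_sequence, last_sequence, online):
    """List where each needed log sequence would be read from."""
    plan = []
    for sequence in range(first_sequence, last_sequence + 1):
        if sequence in online:
            plan.append((sequence, "online group %d" % online[sequence]))
        else:
            plan.append((sequence, "archived log, prompted"))
    return plan


# File 1 starts in sequence 11; the control file SCN lies in sequence 19.
for sequence, source in recovery_plan(11, 19, ONLINE_LOGS):
    print(sequence, source)
```

The six sequences marked as prompted are exactly the ones for which SQL*Plus printed ORA-00279 and ORA-00289 above.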
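We continue the recovery. This time we copy back all the remaining datafiles. Then all the files roll forward together and stop only when they are in step, which leaves Oracle in a consistent state.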

SQL> recover database;
ORA-00279: change 583375 generated at 07/17/2012 19:59:23 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_11_%u_.ar
c
ORA-00280: change 583375 for thread 1 is in sequence #11


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto
ORA-00279: change 586090 generated at 07/18/2012 09:35:40 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_12_%u_.ar
c
ORA-00280: change 586090 for thread 1 is in sequence #12
ORA-00278: log file
'/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_11_80d4q
dmh_.arc' no longer needed for this recovery


ORA-00279: change 586181 generated at 07/18/2012 09:39:21 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_13_%u_.ar
c
ORA-00280: change 586181 for thread 1 is in sequence #13
ORA-00278: log file
'/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_12_80d4y
9fp_.arc' no longer needed for this recovery


ORA-00279: change 586656 generated at 07/18/2012 09:42:06 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_14_%u_.ar
c
ORA-00280: change 586656 for thread 1 is in sequence #14
ORA-00278: log file
'/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_13_80d53
g42_.arc' no longer needed for this recovery


ORA-00279: change 586676 generated at 07/18/2012 09:42:59 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_15_%u_.ar
c
ORA-00280: change 586676 for thread 1 is in sequence #15
ORA-00278: log file
'/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_14_80d55
41h_.arc' no longer needed for this recovery


ORA-00279: change 586704 generated at 07/18/2012 09:44:03 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_16_%u_.ar
c
ORA-00280: change 586704 for thread 1 is in sequence #16
ORA-00278: log file
'/u01/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_18/o1_mf_1_15_80d57
315_.arc' no longer needed for this recovery


Log applied.
Media recovery complete.

Now we can open the database.

SQL> alter database open;

Database altered.


If the SCN of a transaction falls within an archived log, that archived log is used during recovery.
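To see this rule on the data from this post, the Python sketch below maps each committed row of table t to the archived log whose SCN range contains it. The numbers are copied from the earlier query results; the helper name is made up, and the scn column is only the SCN read at insert time, so treat the mapping as an approximation of where each transaction's redo lives.

```python
from bisect import bisect_right

# first_change# of archived logs 11..18, and the next_change# of log 18.
SEQUENCES = [11, 12, 13, 14, 15, 16, 17, 18]
FIRST_CHANGES = [583374, 586090, 586181, 586656, 586676, 586704, 586734, 586762]
LAST_NEXT_CHANGE = 586816

# id -> scn for the eight rows shown by "select * from t".
ROWS = {1: 585887, 2: 586129, 3: 586643, 4: 586666,
        5: 586692, 6: 586722, 7: 586751, 8: 586805}


def sequence_for(scn):
    """Binary search for the log whose range starts at or before scn."""
    if scn < FIRST_CHANGES[0] or scn >= LAST_NEXT_CHANGE:
        return None  # not covered by the archived logs listed
    return SEQUENCES[bisect_right(FIRST_CHANGES, scn) - 1]


for row_id, scn in ROWS.items():
    print(row_id, scn, sequence_for(scn))
```

Each of the eight rows lands in its own log, sequences 11 through 18, because a log switch followed every commit.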

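To sum up, in this post I used the SCN and a complete recovery of lost datafiles after a clean shutdown to help myself and everyone else understand how Oracle backup and recovery works. If anything falls short, I hope readers passing by will criticize freely. Oracle backup and recovery is an art. Let's grow together. Go for it.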
