Oracle Undo tablespace恢复

系统环境:

  操作系统:RedHat EL55

  Oracle:  Oracle 11gR2


  Oracle 9i后,采用了undo tablespace管理undo数据,实现undo的自动管理,本案例演示了undo表空间被破坏后如何恢复;如果有备份,通过备份恢复非常容易,但在没有备份的情况下,就需要采用非常规手段来恢复了,呵呵。

1、案例应用环境

undo表空间undo segments:

14:34:44 SYS@ prod>select segment_name,segment_type,tablespace_name from dba_segments
14:35:03   2  where tablespace_name='UNDOTBS1';
SEGMENT_NAME         SEGMENT_TYPE       TABLESPACE_NAME
-------------------- ------------------ ------------------------------
_SYSSMU10_3550978943$ TYPE2 UNDO        UNDOTBS1
_SYSSMU9_1424341975$ TYPE2 UNDO         UNDOTBS1
_SYSSMU8_2012382730$ TYPE2 UNDO         UNDOTBS1
_SYSSMU7_3286610060$ TYPE2 UNDO         UNDOTBS1
_SYSSMU6_2443381498$ TYPE2 UNDO         UNDOTBS1
_SYSSMU5_1527469038$ TYPE2 UNDO         UNDOTBS1
_SYSSMU4_1152005954$ TYPE2 UNDO         UNDOTBS1
_SYSSMU3_2097677531$ TYPE2 UNDO         UNDOTBS1
_SYSSMU2_2232571081$ TYPE2 UNDO         UNDOTBS1
_SYSSMU1_3780397527$ TYPE2 UNDO         UNDOTBS1
10 rows selected.
Elapsed: 00:00:00.19

模拟应用环境:

14:43:16 SYS@ prod>select count(*) from scott.emp1;
  COUNT(*)
----------
         1
Elapsed: 00:00:00.06
14:43:54 SYS@ prod>insert into scott.emp1 select * from scott.emp;
14 rows created.
Elapsed: 00:00:00.08
14:44:04 SYS@ prod>commit;
Commit complete.
Elapsed: 00:00:00.03
14:44:06 SYS@ prod>select count(*) from scott.emp1;                    
  COUNT(*)
----------
        15
Elapsed: 00:00:00.01
14:44:20 SYS@ prod>insert into scott.emp1 select * from scott.emp where rownum <4;
3 rows created.
Elapsed: 00:00:00.03
14:44:40 SYS@ prod>select count(*) from scott.emp1;   
  COUNT(*)
----------
        18
Elapsed: 00:00:00.01

关库前,事务未提交!


开启新的会话,模拟断电,将数据库非正常关闭:


[oracle@rh6 dbs]$ sqlplus '/as sysdba'

SQL*Plus: Release 11.2.0.1.0 Production on Thu Jun 26 14:46:11 2014

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

14:46:12 SYS@ prod>shutdown abort;

ORACLE instance shut down.


Undo tablesapce datafile被破坏:

[oracle@rh6 dbs]$ rm /u01/app/oracle/oradata/prod/undotbs01.dbf 


2、Open database时数据库报错

[oracle@rh6 prod]$ sqlplus '/as sysdba'
SQL*Plus: Release 11.2.0.1.0 Production on Thu Jun 26 14:47:26 2014
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
Connected to an idle instance.
14:47:26 SYS@ prod>startup
ORACLE instance started.
Total System Global Area  835104768 bytes
Fixed Size                  2217952 bytes
Variable Size             775948320 bytes
Database Buffers           54525952 bytes
Redo Buffers                2412544 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/u01/app/oracle/oradata/prod/undotbs01.dbf'

告警日志:

ALTER DATABASE OPEN
Errors in file /u01/app/oracle/diag/rdbms/prod/prod/trace/prod_dbw0_7481.trc:
ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/u01/app/oracle/oradata/prod/undotbs01.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/prod/prod/trace/prod_ora_7503.trc:
ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/u01/app/oracle/oradata/prod/undotbs01.dbf'
ORA-1157 signalled during: ALTER DATABASE OPEN...

14:47:37 SYS@ prod>show parameter undo
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     900
undo_tablespace                      string      UNDOTBS1
14:48:18 SYS@ prod>

3、在没有备份的情况下进行修复:


创建pfile:

14:48:18 SYS@ prod>create pfile from spfile;

File created.

Elapsed: 00:00:00.09

查看undo tablespace使用的undo segments:

[oracle@rh6 dbs]$ cd /u01/app/oracle/oradata/prod/

[oracle@rh6 prod]$ strings system01.dbf | grep _SYSSMU | cut -d $ -f 1 | sort -u

              and substr(drs.segment_name,1,7) != '_SYSSMU' 

D'              and substr(drs.segment_name,1,7) != ''_SYSSMU'' ' );

_SYSSMU10_3550978943

_SYSSMU10_3904554333

_SYSSMU11_286947212

_SYSSMU12_3068564564

_SYSSMU13_2761193625

_SYSSMU1_3780397527

_SYSSMU14_2421411996

_SYSSMU15_1683924174

_SYSSMU16_2313212396

_SYSSMU17_2041439332

_SYSSMU1_783380902

_SYSSMU18_2800789714

_SYSSMU19_53723967

_SYSSMU20_3850939844

_SYSSMU2_2232571081

_SYSSMU2_3138176977

_SYSSMU3_1645411166

_SYSSMU3_2097677531

_SYSSMU4_1152005954

_SYSSMU4_870421980

_SYSSMU5_1527469038

_SYSSMU5_2525172762

_SYSSMU6_2443381498

_SYSSMU6_3753507049

_SYSSMU7_1260614213

_SYSSMU7_3286610060

_SYSSMU8_2012382730

_SYSSMU8_2806087761

_SYSSMU9_1424341975

_SYSSMU9_973944058

注意:通过system01.dbf查出了,正在使用的undo segment,以上按使用时间做了排序,注意只选择那些排在最前面的(相同回滚段);默认每个undo tablespace 会应用10个undo segments。

编辑pfile,添加Oracle隐含参数,跳过undo segments的检测:

[oracle@rh6 dbs]$ cat initprod.ora

*.undo_tablespace='SYSTEM'

*.undo_management='MANUAL'

*._corrupted_rollback_segments=(_SYSSMU10_3550978943$,_SYSSMU9_1424341975$,_SYSSMU8_2012382730$,_SYSSMU7_3286610060$,_SYSSMU6_2443381498$,_SYSSMU5_1527469038$,_SYSSMU4_1152005954$,_SYSSMU3_2097677531$,_SYSSMU2_2232571081$,_SYSSMU1_3780397527$)

*._allow_resetlogs_corruption=true


通过pfile启动instance:


14:56:33 SYS@ prod>startup force pfile=$ORACLE_HOME/dbs/initprod.ora

ORACLE instance started.

Total System Global Area  835104768 bytes

Fixed Size                  2217952 bytes

Variable Size             775948320 bytes

Database Buffers           54525952 bytes

Redo Buffers                2412544 bytes

Database mounted.

ORA-01157: cannot identify/lock data file 3 - see DBWR trace file

ORA-01110: data file 3: '/u01/app/oracle/oradata/prod/undotbs01.dbf'


14:56:52 SYS@ prod>alter database datafile 3 offline drop;

Database altered.


14:57:11 SYS@ prod>alter database open;

Database altered.

Elapsed: 00:00:05.25

14:57:23 SYS@ prod>


告警日志:

alter database open
Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
 read 157 KB redo, 43 data blocks need recovery
Started redo application at
 Thread 1: logseq 3, block 451
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: /dsk1/oradata/prod/redo03a.log
  Mem# 1: /dsk2/oradata/prod/redo03b.log
Completed redo application of 0.04MB
Completed crash recovery at
 Thread 1: logseq 3, block 766, scn 1878512
 43 data blocks read, 43 data blocks written, 157 redo k-bytes read
Thu Jun 26 14:57:19 2014
LGWR: STARTING ARCH PROCESSES
Thu Jun 26 14:57:19 2014
ARC0 started with pid=20, OS id=7638 
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Thread 1 advanced to log sequence 4 (thread open)
Thu Jun 26 14:57:20 2014
ARC1 started with pid=21, OS id=7641 
Thread 1 opened at log sequence 4
  Current log# 1 seq# 4 mem# 0: /dsk1/oradata/prod/redo01a.log
  Current log# 1 seq# 4 mem# 1: /dsk2/oradata/prod/redo01b.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Jun 26 14:57:20 2014
SMON: enabling cache recovery
Thu Jun 26 14:57:21 2014
ARC2 started with pid=22, OS id=7643 
ARC1: Archival started
ARC2: Archival started
Thu Jun 26 14:57:21 2014
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
ARC3 started with pid=23, OS id=7645 
Archived Log entry 31 added for thread 1 sequence 3 ID 0xfaf1e60 dest 2:
Thu Jun 26 14:57:21 2014
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Starting background process QMNC
Thu Jun 26 14:57:23 2014
QMNC started with pid=24, OS id=7647 
Completed: alter database open
Thu Jun 26 14:57:26 2014
Starting background process CJQ0
Thu Jun 26 14:57:26 2014
CJQ0 started with pid=27, OS id=7661

4、创建新的undo tablespace

14:59:27 SYS@ prod>show parameter undo
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      MANUAL
undo_retention                       integer     900
undo_tablespace                      string      SYSTEM
14:59:34 SYS@ prod>create undo tablespace undotbs2
14:59:53   2  datafile '/u01/app/oracle/oradata/prod/undotbs2.dbf' size 100m
15:00:09   3  autoextend on;
15:01:28 SYS@ prod>create spfile from pfile;
File created.
Elapsed: 00:00:00.12
15:02:23 SYS@ prod>shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
15:02:39 SYS@ prod>startup
ORACLE instance started.
Total System Global Area  835104768 bytes
Fixed Size                  2217952 bytes
Variable Size             775948320 bytes
Database Buffers           54525952 bytes
Redo Buffers                2412544 bytes
Database mounted.
Database opened.
15:03:36 SYS@ prod>alter system set undo_management=auto scope=spfile;
System altered.
Elapsed: 00:00:00.09
15:03:49 SYS@ prod>alter system set undo_tablespace=undotbs2 scope=spfile;
System altered.
Elapsed: 00:00:00.04
15:04:09 SYS@ prod>startup force;
ORACLE instance started.
Total System Global Area  835104768 bytes
Fixed Size                  2217952 bytes
Variable Size             775948320 bytes
Database Buffers           54525952 bytes
Redo Buffers                2412544 bytes
Database mounted.
Database opened.
15:04:28 SYS@ prod>show parameter undo
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     900
undo_tablespace                      string      UNDOTBS2
15:04:36 SYS@ prod>select usn,name from v$rollname;
       USN NAME
---------- ------------------------------
         0 SYSTEM
        21 _SYSSMU21_2312338076$
        22 _SYSSMU22_3375463809$
        23 _SYSSMU23_4084707454$
        24 _SYSSMU24_386518199$
        25 _SYSSMU25_2810228709$
        26 _SYSSMU26_2968904537$
        27 _SYSSMU27_3269963619$
        28 _SYSSMU28_707429450$
        29 _SYSSMU29_2754652023$
        30 _SYSSMU30_1737877121$
11 rows selected.
Elapsed: 00:00:00.05


15:04:44 SYS@ prod>create pfile from spfile;
File created.

将隐含参数从pfile删除:

[oracle@rh6 dbs]$ cat initprod.ora 
prod.__db_cache_size=16777216
prod.__oracle_base='/u01/app/oracle'#ORACLE_BASE set from environment
#*._allow_resetlogs_corruption=true
#*._corrupted_rollback_segments='_SYSSMU10_3550978943$','_SYSSMU9_1424341975$','_SYSSMU8_2012382730$','_SYSSMU7_3286610060$','_SYSSMU6_2443381498$','_SYSSMU5_1527469038$','_SYSSMU4_1152005954$','_SYSSMU3_2097677531$','_SYSSMU2_2232571081$','_SYSSMU1_3780397527$'
*.audit_file_dest='/u01/app/oracle/admin/prod/adump'
*.audit_trail='db'
*.compatible='11.2.0.0.0'
*.control_files='/u01/app/oracle/oradata/prod/control01.ctl','/dsk1/oradata/prod/control02.ctl'#Restore Controlfile
*.db_16k_cache_size=25165824
*.db_block_size=8192
*.db_cache_size=16777216
*.db_domain=''
*.db_keep_cache_size=0
*.db_name='prod'
*.db_recycle_cache_size=12582912
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=prodXDB)'
*.log_archive_dest_1='location=/dsk4/arch_prod'
*.log_archive_dest_2='location=/dsk4/arch1'
*.log_archive_dest_state_1='DEFER'
*.log_archive_format='arch_%t_%s_%r.log'
*.memory_target=0
*.O7_DICTIONARY_ACCESSIBILITY=TRUE
*.open_cursors=300
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_max_size=838860800
*.shared_pool_reserved_size=12582912
*.shared_pool_size=200886080
*.undo_management='AUTO'
*.undo_tablespace='UNDOTBS2'

重新创建spfile:

15:04:44 SYS@ prod>create spfile from pfile;
File created.

@至此,undo tablespace恢复完毕!