谈谈Oracle数据库启动时禁用smon事务恢复

数据库在异常宕机之后,数据库再次启动时,smon会进行crash recover之后再进行tx recover,这在alert日志可以清晰的看到smon的操作过程。
但事情发展往往不如人意,试想一下当回滚段出现异常时,数据库势必不能启动,这时候我们往往需要借助一些异常手段,将数据库open。
Oracle的一些内部事件允许设置debug_mode,以便允许在AUM模式下,手工进行回滚段的处理:
alter session set "_smu_debug_mode"=4;
alter rollback segment "_SYSSMU7$" ONLINE;
或者使用隐含参数offline_rollback_segments进行指定回滚段处理:
_offline_rollback_segments
drop rollback segment ‘xxx’
如果回滚段正常,某些恢复可能需要很长的时间,在此期间的一些异常可能导致SMON Crash,进而数据库Instance崩溃。这时候我们可以引进一些事件进行诊断。
如:
禁用smon恢复
event = '10513 trace name context forever,level 2'
跟踪smon进程
event = '10500 trace name context forever, level 1'

测试案例。
不设置event 系统为初始状态
SQL> show parameter event

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
event
并产生大批量脏数据
SQL> insert into testsmon1 select * from testsmon1;

81920 rows created
数据库重启,注意这里使用的是force选项
SQL> startup force
ORACLE instance started.

Total System Global Area  838860800 bytes
Fixed Size                  1262164 bytes
Variable Size             494931372 bytes
Database Buffers          327155712 bytes
Redo Buffers               15511552 bytes
Database mounted.
Database opened.
我们在alert日志里可以看到

ALTER DATABASE OPEN
Fri Dec 17 16:30:49 2010
Beginning crash recovery of 1 threads
parallel recovery started with 3 processes
Fri Dec 17 16:30:49 2010
Started redo scan
Fri Dec 17 16:30:49 2010
Completed redo scan
28142 redo blocks read, 1885 data blocks need recovery
Fri Dec 17 16:30:49 2010
Started redo application at
Thread 1: logseq 66211, block 3
Fri Dec 17 16:30:49 2010
Recovery of Online Redo Log: Thread 1 Group 5 Seq 66211 Reading mem 0
  Mem# 0: /oradata/mcstar/mcstar/redo05.log
Fri Dec 17 16:30:49 2010
Completed redo application
Fri Dec 17 16:30:49 2010
db_recovery_file_dest_size of 10240 MB is 79.77% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Fri Dec 17 16:30:50 2010
Completed crash recovery at
Thread 1: logseq 66211, block 28145, scn 10995135133737
1885 data blocks read, 1885 data blocks written, 28142 redo blocks read
Fri Dec 17 16:30:50 2010
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=29, OS id=16094
Fri Dec 17 16:30:50 2010
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=30, OS id=16111
Fri Dec 17 16:30:50 2010
Thread 1 advanced to log sequence 66212
Thread 1 opened at log sequence 66212
  Current log# 4 seq# 66212 mem# 0: /oradata/mcstar/mcstar/redo04.log
Successful open of redo thread 1
Fri Dec 17 16:30:50 2010
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Dec 17 16:30:50 2010
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Fri Dec 17 16:30:50 2010
SMON: enabling cache recovery
Fri Dec 17 16:30:50 2010
ARC0: Becoming the heartbeat ARCH
Fri Dec 17 16:30:50 2010
Successfully onlined Undo Tablespace 12.
Fri Dec 17 16:30:50 2010
SMON: enabling tx recovery

[ora10g@mcprod bdump]$ more mcstar_smon_15889.trc
/ora10g/app/admin/mcstar/bdump/mcstar_smon_15889.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /ora10g/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      mcprod
Release:        2.6.18-53.el5PAE
Version:        #1 SMP Mon Nov 12 02:55:09 EST 2007
Machine:        i686
Instance name: mcstar
Redo thread mounted by this instance: 1
Oracle process number: 8
Unix process pid: 15889, image: oracle@mcprod (SMON)

*** SERVICE NAME:() 2010-12-17 16:30:51.329
*** SESSION ID:(549.1) 2010-12-17 16:30:51.329
Dead transaction 0x000a.02d.0006936b recovered by 16 server(s)
*** 2010-12-17 16:30:51.332
SMON: Parallel transaction recovery tried

现在我们创建一个测试表格
SQL> select OBJECT_ID from dba_objects where owner='ZHOU' and OBJECT_NAME='T';

OBJECT_ID
----------
   2275996

SQL> declare
  2  begin
  3  for i in 1..100000 loop
  4  insert into t values(i,i+1);
  5  end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.


SQL> select count(*) from v$bh where objd=2275996;

  COUNT(*)
----------
       434
      
现在将其重启

SQL> startup force
ORACLE instance started.

Total System Global Area  838860800 bytes
Fixed Size                  1262164 bytes
Variable Size             503319980 bytes
Database Buffers          318767104 bytes
Redo Buffers               15511552 bytes
Database mounted.
Database opened.

后台我们看到了smon在进行并行恢复,对象2275996的block已经在内存中,并以current状态存在

SQL> select status,count(*) from v$bh  where objd=2275996 group by status;

STATUS    COUNT(*)
------- ----------
xcur           212

我们将其从内存中刷出
SQL> alter system flush buffer_cache;

System altered.
可以看到都为内存free状态
SQL> select status,count(*) from v$bh  where objd=2275996 group by status;

STATUS    COUNT(*)
------- ----------
free           392

现在我们查询表格T,数据会从哪里获取呢?从业务表空间还是回滚表空间呢?

SQL> alter session set db_file_multiblock_read_count=1;

Session altered.

SQL> alter session set events '10200 trace name context forever, level 1';

Session altered.

SQL> select count(*) from t;

  COUNT(*)
----------
         0
SQL> select status,count(*) from v$bh  where objd=2275996 group by status;

STATUS    COUNT(*)
------- ----------
xcur           425
free           275

观察后台trc进程可以看到,数据库都从业务表空间获取,可见smon进程在数据库启动时已经完成tx恢复。

现在测试的重点来了,引进一个事件
SQL> alter system set event='10513 trace name context forever,level 2' scope=spfile;

System altered.

继续在业务表空间创建大量脏数据

SQL> declare
  2  begin
  3  for i in 1..100000 loop
  4  insert into t values(i,i+1);
  5  end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select status,count(*) from v$bh  where objd=2275996 group by status;

STATUS    COUNT(*)
------- ----------
xcur           368

将系统重启,可以看到event已经生效
SQL> startup force
ORACLE instance started.

Total System Global Area  838860800 bytes
Fixed Size                  1262164 bytes
Variable Size             528485804 bytes
Database Buffers          293601280 bytes
Redo Buffers               15511552 bytes
Database mounted.
Database opened.
SQL> show parameter event

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
event                                string      10513 trace name context forev
                                                 er,level 2

进一步观察alert日志,依然出现SMON: enabling tx recovery字段。
Fri Dec 17 17:02:23 2010
ALTER DATABASE OPEN
Fri Dec 17 17:02:23 2010
Beginning crash recovery of 1 threads
parallel recovery started with 3 processes
Fri Dec 17 17:02:23 2010
Started redo scan
Fri Dec 17 17:02:23 2010
Completed redo scan
91999 redo blocks read, 2752 data blocks need recovery
Fri Dec 17 17:02:23 2010
Started redo application at
Thread 1: logseq 66217, block 3
Fri Dec 17 17:02:23 2010
Recovery of Online Redo Log: Thread 1 Group 5 Seq 66217 Reading mem 0
  Mem# 0: /oradata/mcstar/mcstar/redo05.log
Fri Dec 17 17:02:24 2010
Completed redo application
Fri Dec 17 17:02:24 2010
Completed crash recovery at
Thread 1: logseq 66217, block 92002, scn 10995135225524
2752 data blocks read, 2752 data blocks written, 91999 redo blocks read
Fri Dec 17 17:02:24 2010
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=29, OS id=467
Fri Dec 17 17:02:24 2010
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=30, OS id=469
Fri Dec 17 17:02:24 2010
Thread 1 advanced to log sequence 66218
Thread 1 opened at log sequence 66218
  Current log# 4 seq# 66218 mem# 0: /oradata/mcstar/mcstar/redo04.log
Successful open of redo thread 1
Fri Dec 17 17:02:24 2010
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Dec 17 17:02:24 2010
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Fri Dec 17 17:02:24 2010
ARC1: Becoming the heartbeat ARCH
Fri Dec 17 17:02:24 2010
SMON: enabling cache recovery
Fri Dec 17 17:02:24 2010
Successfully onlined Undo Tablespace 12.
Fri Dec 17 17:02:24 2010
SMON: enabling tx recovery
Fri Dec 17 17:02:24 2010
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Fri Dec 17 17:02:24 2010
Streams APPLY A001 started with pid=31, OS id=471
Fri Dec 17 17:02:24 2010
Streams APPLY A002 started with pid=32, OS id=473
Fri Dec 17 17:02:24 2010
Starting background process QMNC
QMNC started with pid=33, OS id=475
Fri Dec 17 17:02:25 2010
Completed: ALTER DATABASE OPEN                                              

但是没有相应trace文件产生,如果要产生trace文件,我们可以加另外一个event 10500,重复上述过程。
SQL> alter system set event='10513 trace name context forever,level 2','10500 trace name context forever, level 1' scope=spfile;

System altered.
数据库重启之后
SQL> startup force
ORACLE instance started.

Total System Global Area  838860800 bytes
Fixed Size                  1262164 bytes
Variable Size             536874412 bytes
Database Buffers          285212672 bytes
Redo Buffers               15511552 bytes
Database mounted.
Database opened.
可以看到跟踪文件smon的状态
[ora10g@mcprod bdump]$ more mcstar_smon_11666.trc
/ora10g/app/admin/mcstar/bdump/mcstar_smon_11666.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /ora10g/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      mcprod
Release:        2.6.18-53.el5PAE
Version:        #1 SMP Mon Nov 12 02:55:09 EST 2007
Machine:        i686
Instance name: mcstar
Redo thread mounted by this instance: 1
Oracle process number: 8
Unix process pid: 11666, image: oracle@mcprod (SMON)

*** SERVICE NAME:() 2010-12-17 17:08:12.897
*** SESSION ID:(549.1) 2010-12-17 17:08:12.897
*** 2010-12-17 17:08:12.897
SMON: Event 10513 is level 2, trans recovery disabled.
kglScanDependencyHandles4Unpin():
  cumscan=1 cumupin=2 time=64424509440 upinned=0
*** 2010-12-17 17:08:12.976
SMON: Event 10513 is level 2, trans recovery disabled.
*** 2010-12-17 17:08:13.245
SMON: Event 10513 is level 2, trans recovery disabled.
*** 2010-12-17 17:08:13.299
SMON: Event 10513 is level 2, trans recovery disabled.
*** 2010-12-17 17:08:13.469
SMON: Event 10513 is level 2, trans recovery disabled.
*** 2010-12-17 17:08:13.477
SMON: Event 10513 is level 2, trans recovery disabled.
*** 2010-12-17 17:08:13.593
SMON: Event 10513 is level 2, trans recovery disabled.

但是alert日志里面显示,这跟trace矛盾。
Fri Dec 17 17:08:12 2010
SMON: enabling tx recovery

但是在数据库打开之后,我们看到内存中并没有对象2275996
SQL> select status,count(*) from v$bh  where objd=2275996 group by status;

no rows selected
观察业务数据从哪里获取呢?
SQL>  alter session set db_file_multiblock_read_count=1;

Session altered.

SQL> alter session set events '10200 trace name context forever, level 1';

Session altered.

SQL> select count(*) from t;

  COUNT(*)
----------
         0

可以看到有cr状态,推断出有部分数据从undo表空间中获取
SQL> select status,count(*) from v$bh  where objd=2275996 group by status;

STATUS    COUNT(*)
------- ----------
xcur           501
cr             408
        
查看跟踪文件,可以看到和cr一样
[ora10g@mcprod udump]$ cat mcstar_ora_26883.trc|grep "uba: 0x008"|wc -l
408

最后将event恢复初始
SQL> alter system set event='' scope=spfile;

System altered.
再次重启系统

可以看到smon将0x000a.02d.00069371和0x0001.021.0006881d作为死事务处理
[ora10g@mcprod bdump]$ more mcstar_smon_14715.trc
/ora10g/app/admin/mcstar/bdump/mcstar_smon_14715.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /ora10g/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      mcprod
Release:        2.6.18-53.el5PAE
Version:        #1 SMP Mon Nov 12 02:55:09 EST 2007
Machine:        i686
Instance name: mcstar
Redo thread mounted by this instance: 1
Oracle process number: 8
Unix process pid: 14715, image: oracle@mcprod (SMON)

*** SERVICE NAME:(SYS$BACKGROUND) 2010-12-17 17:32:48.011
*** SESSION ID:(549.1) 2010-12-17 17:32:48.011
Dead transaction 0x000a.02d.00069371 recovered by 15 server(s)
Dead transaction 0x0001.021.0006881d recovered by 15 server(s)
*** 2010-12-17 17:32:48.048
SMON: Parallel transaction recovery tried

回过头去看之前的跟踪日志,完全印证之前的事务。
[ora10g@mcprod udump]$ cat mcstar_ora_26883.trc|grep 00069371|wc -l
204
You have new mail in /var/spool/mail/ora10g
[ora10g@mcprod udump]$ cat mcstar_ora_26883.trc|grep 0006881d|wc -l
204
[ora10g@mcprod udump]$ cat mcstar_ora_26883.trc|grep "uba: 0x008"|wc -l
408

当然在设了event 10513之后,我们可以从x$ktuxe获取死事务的相关详细,详见
我的另外一篇文章http://itspace.iteye.com/blog/644509

你可能感兴趣的:(thread,oracle,sql,SQL Server,OS)