Oracle enq: CF - contention导致数据挂起

接到客户电话,数据库失去响应,所有业务挂起
数据库版本为

[quote]SQL> select * from v$version where rownum=1;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi[/quote]
主机版本为
[quote]$ uname -a
HP-UX rp7410 B.11.23 U 9000/800 1943000290 unlimited-user license[/quote]
查看数据库等待信息,可以看到enq: CF - contention导致数据挂起
[quote]SQL> select sid,event from v$session_wait where event not like '%rdbms%' and event not like '%SQL%';
SID EVENT
---------- ----------------------------------------------------------------
164 control file sequential read
200 Streams AQ: qmn slave idle wait
205 Streams AQ: qmn coordinator idle wait
206 Streams AQ: waiting for time management or cleanup tasks
211 enq: CF - contention
212 enq: CF - contention
216 enq: CF - contention
219 smon timer
221 enq: CF - contention
225 pmon timer[/quote]

于是马上查看阻塞会话信息
[quote]SQL> select SID,SERIAL# from v$session where sid in (select sid from v$lock where block=1);

SID SERIAL#
---------- ----------
164 7097
220 1[/quote]
首先在数据库层面将会话杀死
[quote]SQL> alter system kill session '220,1';
alter system kill session '220,1'
*
ERROR at line 1:
ORA-00029: session is not a user session[/quote]
提示会话220不是用户会话,那它是什么呢?
[quote]SQL> select SID,AUDSID,USERNAME,TYPE,PROGRAM
2 FROM V$SESSION WHERE SID='220';

SID AUDSID USERNAME TYPE
---------- ---------- ------------------------------ ----------
PROGRAM
------------------------------------------------
220 0 BACKGROUND
oracle@rp4440 (CKPT)[/quote]
可以看到是Oracle后台CKPT进程。
[quote]SQL> alter system kill session '164,7097';
alter system kill session '164,7097'
*
ERROR at line 1:
ORA-00031: session marked for kill[/quote]

进一步查看操作系统进程
[quote]SQL> select SPID from v$process where addr=(select PADDR from v$session where sid=164);

SPID
------------
24660

SQL> !ps -ef|grep 24660
oracle 24660 1 0 12:49:16 ? 0:00 [color=red]oraclexxxi (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))[/color]
oracle 24706 24701 0 12:56:33 pts/ta 0:00 /usr/bin/sh -c ps -ef|grep 24660
oracle 24708 24706 0 12:56:33 pts/ta 0:00 grep 24660[/quote]

红色标注进程为引起数据库挂住的元凶,只要将其杀死,数据库恢复正常
[quote]$ kill -9 24660[/quote]

事后检查alert日志发现客户创建standby controlfile至nfs目录,导致数据库挂起。进一步检查nfs参数
[quote]rw,bg,hard,nointr,rsize=32768,wsize=32768,vers=3,timeo=600,actimeo=0 0 2[/quote]
发现并无异样,通过metalink解释,此为Oracle 10.2.0.1写nfs文件系统bug,Oracle号称通过添加nfs参数可以解决,但事实证明,至少在HP-UX 11.21上又是一句空话。
我们来看看control file sequential read的官方解释
[quote]Reading from the control file. This happens in many cases. For example, while:

[color=red]*Making a backup of the control files[/color]
*Sharing information (between instances) from the control file
*Reading other blocks from the control files
*Reading the header block
[/quote]

你可能感兴趣的:(ORACLE管理,Oracle,SQL,HP)