一次数据库hang,大量enqueue等待事件的问题

现象:数据库hang,大量enqueue等待事件,大约20分钟后数据库数据库自己恢复
平台 oracle 9.2.0.7 + hpux 11.31


收到告警:数据库断连和非空闲等待事件2948个,经查询bomc监控日志,2948个等待事件具体如下,因此原因基本可以确定为大量enqueue(锁等待)等待导致数据库异常,从p1=1397817350可以推断出lock mode为SQ,及sequence相关。Enqueue的进一步原因和相关应用sql还需要进一步确认。

SID SERIAL# OSUSER USERNAME SVRPROC PROCESS EVENT P1 P2 P3
---------- ---------- ------------------------------ ------------------------------ ------------ ------------ ---------------------------------------------------------------- ---------- ---------- ----------
25 12810 7832 enqueue 1397817350 149 0
63 40789 8166 enqueue 1397817350 149 0
59 32332 20857 enqueue 1397817350 149 0
43 60179 18928 enqueue 1397817350 149 0
399 29721 19165 enqueue
.。。。。。

经查询,enqueue类型为SQ
SELECT chr(to_char(bitand(1397817350,-16777216))/16777215)||
2 chr(to_char(bitand(1397817350, 16711680))/65535) "Lock",
3 to_char( bitand(1397817350, 65535) ) "Mode" from dual;

Lock Mode
-- -
SQ 6

SQL> select OWNER,OBJECT_NAME,OBJECT_TYPE from dba_objects where object_id=149;

OWNER OBJECT_NAME OBJECT_TYPE
------------------------------ -------------------- ------------------
SYS AUDSES$ SEQUENCE

通过收集故障时段的statspack报告,得出执行速度比较慢的语句,特别是含有squence的部分主要如下。但这些语句理论上该seq为循环模式,不应该导致enqueue等待。很可能由于连接数超过最大数,导致部分业务不能正常连接。p2=149看,应该是AUDSES$ 导致enqueue,而AUDSES$正是登录使用的sequeue,对应v$session中的AUDSID列

SQL> SELECT text FROM ALL_SOURCE where TYPE='FUNCTION' AND NAME='DSM_GET_MG_ID';

TEXT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
function DSM_GET_MG_ID return varchar2 is
V_Result varchar2(32);
V_SEQ Varchar2(20);
begin
select dsm_seq_mgid.nextval into v_seq from dual;

V_Result :=to_char(sysdate,'YYYYMMDDHH24miSS')||v_seq;
return(V_Result);
end DSM_GET_MG_ID;


10 rows selected.


SQL> select * from dba_sequences where SEQUENCE_NAME = 'DSM_SEQ_MSGID';

SEQUENCE_OWNER SEQUENCE_NAME MIN_VALUE MAX_VALUE INCREMENT_BY C O CACHE_SIZE LAST_NUMBER
------------------------------ ------------------------------ ---------- ---------- ------------ - - ---------- -----------
CBOSS DSM_SEQ_MGID 100000 999999 1 Y Y 10 489050


继续分析,感觉sequeue不是最终的原因,查看系统日志,看出了别的问题
经过继续分析,昨日问题现在看根本原因应该是日志文件不能正常切换导致数据库hang,进而导致各种等待事件。已1_59448.dbf文件为例,17:01就开始归档,17:39:27才最终完成,正常情况下应该1分钟内完成,由于未知原因导致花费40分钟才完成一个日志文件的归档动作。我看了下相关文件系统,未出现归档表空间满的情况,故障时间点利用率只有15%。



Mon Mar 26 17:01:41 2012
ARC1: Evaluating archive log 3 thread 1 sequence 59448
ARC1: Unable to archive log 3 thread 1 sequence 59448
Log actively being archived by another process
Mon Mar 26 17:01:46 2012
Beginning log switch checkpoint up to RBA [0xe83a.2.10], SCN: 0x0b09.08dc9044
Thread 1 advanced to log sequence 59450
Current log# 2 seq# 59450 mem# 0: /dev/vgcboss02/rredo02
Mon Mar 26 17:01:46 2012
ARC1: Evaluating archive log 3 thread 1 sequence 59448
ARC1: Unable to archive log 3 thread 1 sequence 59448
Log actively being archived by another process
ARC1: Evaluating archive log 1 thread 1 sequence 59449
ARC1: Beginning to archive log 1 thread 1 sequence 59449
Creating archive destination LOG_ARCHIVE_DEST_1: '/cboss/arch/1_59449.dbf'
Mon Mar 26 17:04:49 2012
Completed checkpoint up to RBA [0xe83a.2.10], SCN: 0x0b09.08dc9044
Mon Mar 26 17:14:22 2012
ORACLE Instance cbossdb - Can not allocate log, archival required
Mon Mar 26 17:14:22 2012
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 59451
All online logs needed archiving
Current log# 2 seq# 59450 mem# 0: /dev/vgcboss02/rredo02
Mon Mar 26 17:39:25 2012
ARC1: Completed archiving log 1 thread 1 sequence 59449
Mon Mar 26 17:39:25 2012
Beginning log switch checkpoint up to RBA [0xe83b.2.10], SCN: 0x0b09.09609366


NAME COMPLETION_TIME S
-------------------------------------------------------------------------------- -------------------- -
/cboss/arch/1_59436.dbf 2012/03/26 14:22:34 A
/cboss/arch/1_59437.dbf 2012/03/26 14:30:44 A
/cboss/arch/1_59438.dbf 2012/03/26 14:45:22 A
/cboss/arch/1_59439.dbf 2012/03/26 14:55:34 A
/cboss/arch/1_59440.dbf 2012/03/26 15:05:23 A
/cboss/arch/1_59441.dbf 2012/03/26 15:16:23 A
/cboss/arch/1_59442.dbf 2012/03/26 15:26:59 A
/cboss/arch/1_59443.dbf 2012/03/26 15:42:02 A
/cboss/arch/1_59444.dbf 2012/03/26 15:57:22 A
/cboss/arch/1_59445.dbf 2012/03/26 16:12:52 A
/cboss/arch/1_59446.dbf 2012/03/26 16:27:01 A

NAME COMPLETION_TIME S
-------------------------------------------------------------------------------- -------------------- -
/cboss/arch/1_59447.dbf 2012/03/26 16:40:53 A
/cboss/arch/1_59449.dbf 2012/03/26 17:39:25 A
/cboss/arch/1_59448.dbf 2012/03/26 17:39:27 A
/cboss/arch/1_59450.dbf 2012/03/26 17:40:32 A
/cboss/arch/1_59451.dbf 2012/03/26 17:51:06 A
/cboss/arch/1_59452.dbf 2012/03/26 18:06:51 A
/cboss/arch/1_59453.dbf 2012/03/26 18:33:36 A
/cboss/arch/1_59454.dbf 2012/03/26 19:12:46 A
/cboss/arch/1_59455.dbf 2012/03/26 19:54:10 A
/cboss/arch/1_59456.dbf 2012/03/26 20:33:57 A

总结分析:最终归档慢的原因没有找到,但原因却不是当初表现的seq问题,所以分析问题还需要全面进行系统查看。主机、数据库的日志是必须要分析的

你可能感兴趣的:(Queue)