1.现象:
客户10.2.0.4 RAC环境,出现大量的library cache lock和cursor: pin S wait on X等待,经分析是由于统计信息收集僵死导致的。
数据库在8点到9点期间,数据库两个节点都存在明显的cursor: pin S wait on X和library cache lock的等待:
TOP 5 EVENT:
Event |
Waits |
Time(s) |
Avg Wait(ms) |
% Total Call Time |
Wait Class |
cursor: pin S wait on X |
1,573,056 |
30,651 |
19 |
146.2 |
Concurrency |
library cache lock |
31,757 |
7,009 |
221 |
33.4 |
Concurrency |
CPU time |
6,416 |
30.6 |
|||
DFS lock handle |
12,381 |
2,979 |
241 |
14.2 |
Other |
latch: library cache |
1,646 |
1,974 |
1,199 |
9.4 |
Concurrency |
Event |
Waits |
Time(s) |
Avg Wait(ms) |
% Total Call Time |
Wait Class |
cursor: pin S wait on X |
17,586,451 |
342,437 |
19 |
812.1 |
Concurrency |
library cache lock |
63,657 |
30,153 |
474 |
71.5 |
Concurrency |
CPU time |
3,820 |
9.1 |
|||
db file sequential read |
241,761 |
1,094 |
5 |
2.6 |
User I/O |
inactive session |
两个节点的等待现象基本一致,而节点1上还存在明显的DFS lock handle等待事件。
通过分析ASH信息,发现library cache lock和cursor: pin S wait on X等待基本上都是6点之后才开始出现:
2.分析原因
SELECT trunc(sample_time, 'hh24') TIME, COUNT(*) FROM WRH$_ACTIVE_SESSION_HISTORY ash, wrh$_event_name en WHERE ash.event_id = en.event_id AND sample_time>= to_timestamp('20130703', 'yyyymmdd') AND event_name IN ('cursor: pin S wait on X', 'library cache lock') GROUP BY trunc(sample_time, 'hh24') ORDER BY 1;
而观察出现cursor: pin S wait on X和library cache lock等待会话的BLOCKING SESSION,发现大部分处于等待的会话被会话37锁定:
SELECT to_char(sample_time, 'hh24:mi:ss.ff') TIME, session_id sid, decode(en.event_name, 'cursor: pin S wait on X', to_number(substr(ltrim(to_char(p2, '0xxxxxxxxxxxxxxx')), 1, 8), 'xxxxxxxx'), blocking_session) b_sid, en.event_name FROM WRH$_ACTIVE_SESSION_HISTORY ash, wrh$_event_name en WHERE ash.event_id = en.event_id(+) AND instance_number = 1 AND sample_time >= to_timestamp('201407140800', 'yyyymmddhh24mi') AND sample_time <= to_timestamp('201407140801', 'yyyymmddhh24mi') ORDER BY 1;
TIME SID B_SID EVENT_NAME ------------------ ---------- ---------- ------------------------------ 08:00:09.569 457 4294967291 latch: library cache 08:00:09.569 52 981 cursor: pin S wait ON X 08:00:09.569 168 73 library cache LOCK 08:00:09.569 1763 4294967291 SQL*Net message FROM client 08:00:09.569 123 4294967295 kksfbc child completion 08:00:09.569 476 4294967291 kst: async disk IO 08:00:09.569 500 4294967292 db file sequential READ 08:00:09.569 327 4294967291 SQL*Net message TO client 08:00:09.569 233 4294967291 db file scattered READ 08:00:09.569 73 73 DFS LOCK handle 08:00:09.569 1889 4294967291 gc CURRENT block 2-way 08:00:09.569 1969 981 cursor: pin S wait ON X 08:00:09.569 857 981 cursor: pin S wait ON X 08:00:09.569 812 168 cursor: pin S wait ON X 08:00:09.569 981 73 library cache LOCK 08:00:19.669 52 981 cursor: pin S wait ON X 08:00:19.669 812 168 cursor: pin S wait ON X 08:00:19.669 857 981 cursor: pin S wait ON X 08:00:19.669 1969 981 cursor: pin S wait ON X 08:00:19.669 233 4294967291 db file scattered READ 08:00:19.669 500 4294967292 db file sequential READ 08:00:19.669 1763 4294967291 db file sequential READ 08:00:19.669 476 4294967291 kst: async disk IO 08:00:19.669 457 4294967291 kksfbc child completion 08:00:19.669 73 73 DFS LOCK handle 08:00:19.669 654 4294967291 SQL*Net message TO client 08:00:19.669 168 73 library cache LOCK 08:00:19.669 981 73 library cache LOCK 08:00:29.769 1969 981 cursor: pin S wait ON X 08:00:29.769 52 981 cursor: pin S wait ON X 08:00:29.769 457 4294967291 kksfbc child completion 08:00:29.769 981 73 library cache LOCK 08:00:29.769 168 73 library cache LOCK 08:00:29.769 500 4294967292 db file sequential READ 08:00:29.769 73 73 DFS LOCK handle 08:00:29.769 654 4294967291 SQL*Net message TO client 08:00:29.769 327 4294967291 SQL*Net message TO client 08:00:29.769 1763 4294967291 db file sequential READ 08:00:29.769 857 981 cursor: pin S wait ON X 08:00:29.769 123 4294967295 latch: library cache 08:00:29.769 196 4294967295 latch: library cache 08:00:29.769 233 4294967295 latch: library cache 08:00:29.769 812 168 cursor: pin S wait ON X 08:00:29.769 476 4294967291 kst: async disk IO 08:00:39.879 476 4294967291 kst: async disk IO 08:00:39.879 658 4294967291 db file sequential READ 08:00:39.879 500 4294967292 db file sequential READ 08:00:39.879 1969 981 cursor: pin S wait ON X 08:00:39.879 857 981 cursor: pin S wait ON X 08:00:39.879 812 168 cursor: pin S wait ON X 08:00:39.879 52 981 cursor: pin S wait ON X 08:00:39.879 981 73 library cache LOCK 08:00:39.879 168 73 library cache LOCK 08:00:39.879 1849 4294967291 SQL*Net message TO client 08:00:39.879 327 4294967291 SQL*Net message TO client 08:00:39.879 235 4294967291 rdbms ipc reply 08:00:39.879 19 4294967291 rdbms ipc reply 08:00:39.879 73 73 DFS LOCK handle 08:00:39.879 123 4294967294 enq: WF - contention 08:00:49.987 476 4294967291 kst: async disk IO 08:00:49.987 500 4294967292 db file sequential READ 08:00:49.987 382 4294967291 gc cr block busy 08:00:49.987 1969 981 cursor: pin S wait ON X 08:00:49.987 857 981 cursor: pin S wait ON X 08:00:49.987 812 168 cursor: pin S wait ON X 08:00:49.987 235 4294967295 kksfbc child completion 08:00:49.987 658 4294967295 latch: library cache 08:00:49.987 19 4294967291 latch: library cache 08:00:49.987 981 73 library cache LOCK 08:00:49.987 168 73 library cache LOCK 08:00:49.987 1889 4294967291 SQL*Net message FROM client 08:00:49.987 73 73 DFS LOCK handle 08:00:49.987 52 981 cursor: pin S wait ON X
可以判断,导致数据库实例HANG死的原因是因为SID为73的进程:
SELECT to_char(sample_time, 'hh24:mi:ss'), program, action, event_name FROM wrh$_active_session_history ash, wrh$_event_name en WHERE session_id = 73 AND ash.event_id = en.event_id(+) AND sample_time >= to_date('201307140550', 'yyyymmddhh24mi') AND sample_time <= to_date('201307140601', 'yyyymmddhh24mi') ORDER BY 1;
TO_CHAR( PROGRAM ACTION EVENT_NAME -------- ---------------------------------------- ------------------------------ --------------------------- 05:50:07 oracle@orcldb5 (J002) GATHER_STATS_JOB db file sequential READ 05:50:17 oracle@orcldb5 (J002) GATHER_STATS_JOB db file sequential READ 05:50:27 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:50:37 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:50:47 oracle@orcldb5 (J002) GATHER_STATS_JOB gc cr block 2-way 05:50:57 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:51:08 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:51:18 oracle@orcldb5 (J002) GATHER_STATS_JOB db file scattered READ 05:51:28 oracle@orcldb5 (J002) GATHER_STATS_JOB gc cr multi block request 05:51:38 oracle@orcldb5 (J002) GATHER_STATS_JOB direct path WRITE temp 05:51:48 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:51:58 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:52:08 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:52:18 oracle@orcldb5 (J002) GATHER_STATS_JOB db file scattered READ 05:52:28 oracle@orcldb5 (J002) GATHER_STATS_JOB db file sequential READ 05:52:38 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:52:49 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:52:59 oracle@orcldb5 (J002) GATHER_STATS_JOB db file scattered READ 05:53:09 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:53:19 oracle@orcldb5 (J002) GATHER_STATS_JOB db file scattered READ 05:53:29 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:53:39 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:53:49 oracle@orcldb5 (J002) GATHER_STATS_JOB gc CURRENT GRANT 2-way 05:53:59 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:54:09 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:54:20 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:54:30 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:54:40 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:54:50 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:55:00 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:55:10 oracle@orcldb5 (J002) GATHER_STATS_JOB db file sequential READ 05:55:20 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:55:30 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:55:40 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:55:51 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:56:01 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:56:11 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:56:21 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:56:31 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:56:41 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:56:51 oracle@orcldb5 (J002) GATHER_STATS_JOB gc CURRENT GRANT 2-way 05:57:01 oracle@orcldb5 (J002) GATHER_STATS_JOB library cache pin 05:57:11 oracle@orcldb5 (J002) GATHER_STATS_JOB DFS LOCK handle 05:57:21 oracle@orcldb5 (J002) GATHER_STATS_JOB DFS LOCK handle 05:57:32 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:57:42 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:57:52 oracle@orcldb5 (J002) GATHER_STATS_JOB DFS LOCK handle 05:58:02 oracle@orcldb5 (J002) GATHER_STATS_JOB db file sequential READ 05:58:12 oracle@orcldb5 (J002) GATHER_STATS_JOB db file sequential READ 05:58:22 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:58:32 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:58:42 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:58:52 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:59:03 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:59:13 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:59:23 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:59:33 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 05:59:43 oracle@orcldb5 (J002) GATHER_STATS_JOB db file sequential READ 05:59:53 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 06:00:03 oracle@orcldb5 (J002) GATHER_STATS_JOB DFS LOCK handle 06:00:13 oracle@orcldb5 (J002) GATHER_STATS_JOB latch: library cache 06:00:23 oracle@orcldb5 (J002) GATHER_STATS_JOB DFS LOCK handle 06:00:34 oracle@orcldb5 (J002) GATHER_STATS_JOB DFS LOCK handle 06:00:44 oracle@orcldb5 (J002) GATHER_STATS_JOB DFS LOCK handle 06:00:54 oracle@orcldb5 (J002) GATHER_STATS_JOB
根据MOS文档,Bug 6011045 – DBMS_STATS causes deadlock between ‘cursor: pin S wait on X’ and ‘library cache lock’ [ID 6011045.8],这个问题是RAC环境中Oracle收集统计信息的BUG,在自动收集数据字典信息时,可能会出现进程HANG死,并导致大量的library cache lock和cursor: pin S wait on X等待,此外还可能出现DFS lock handle以及row cache lock的等待。
这个BUG确认影响版本包括10.2.0.4和10.2.0.5,这个BUG Oracle在10.2.0.5.5 PSU中进行了修复。