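The database could not process any inserts, updates, or deletes, and even a query against the v$transaction view never returned; it looked like a hang. I started by querying the v$session_wait view, with the following result: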
SQL> select sid,event,p1,p2,p3,wait_time,seconds_in_wait,state from v$session_wait where wait_class <> 'Idle';
SID EVENT P1 P2 P3 WAIT_TIME SECONDS_IN_WAIT STATE
---------- ----------------------------------------- ---------- ---------- ---------- ---------- --------------- -------------------
125 library cache lock 2130013560 2158412560 301 0 7850 WAITING
126 library cache lock 2130014088 2158399144 301 0 7850 WAITING
127 library cache lock 2130014088 2158495528 301 0 9231 WAITING
128 library cache lock 2130013560 2158380576 301 0 9231 WAITING
129 library cache lock 2130013560 2158307736 301 0 10611 WAITING
130 library cache lock 2130014088 2158498840 301 0 10611 WAITING
131 buffer busy waits 1 11170 1 0 11839 WAITING
132 library cache lock 2130013560 2158391432 301 0 11992 WAITING
133 log file switch (checkpoint incomplete) 0 0 0 0 12616 WAITING
136 log file switch (checkpoint incomplete) 0 0 0 0 12947 WAITING
138 enq: TX - row lock contention 1415053318 589854 665 0 13321 WAITING
139 buffer busy waits 2 9 17 0 12616 WAITING
141 enq: WF - contention 1464205318 0 0 0 1650 WAITING
144 enq: CI - contention 1128857606 1 5 0 15355 WAITING
150 log file switch (checkpoint incomplete) 0 0 0 0 12891 WAITING
159 switch logfile command 0 0 0 0 17051 WAITING
161 log file switch (checkpoint incomplete) 0 0 0 0 12715 WAITING
164 rdbms ipc reply 7 21457644 0 0 0 WAITING
18 rows selected
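For the three enq: waits above, P1 packs the two-character lock type into its two high-order bytes and the requested lock mode into the low-order bytes, while P2 and P3 carry the two resource identifiers. The following is a minimal sketch that decodes them; the function name `decode_enqueue` and the output layout are illustrative choices of mine, not an Oracle API.

```python
def decode_enqueue(p1: int, p2: int, p3: int) -> dict:
    """Split an enqueue wait's P1 into lock type and mode, and format the
    resource name as TYPE-ID1-ID2 with 8-digit hex identifiers."""
    lock_type = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF
    resource = f"{lock_type}-{p2:08X}-{p3:08X}"
    return {"type": lock_type, "mode": mode, "resource": resource}


# (sid, P1, P2, P3) copied from the v$session_wait output above
waits = [
    (138, 1415053318, 589854, 665),
    (141, 1464205318, 0, 0),
    (144, 1128857606, 1, 5),
]

decoded = {sid: decode_enqueue(p1, p2, p3) for sid, p1, p2, p3 in waits}
for sid, info in decoded.items():
    print(sid, info["type"], info["mode"], info["resource"])
```

All three sessions are requesting mode 6 (exclusive). For sid 138 the TX identifiers also tell which transaction slot is involved: P2 = 0x0009001E, that is undo segment 9, slot 0x1E.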
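The output above shows four sessions (sids 133, 136, 150 and 161) waiting on log file switch (checkpoint incomplete), which means a checkpoint has not completed, so I queried v$log to check the state of the online redo logs: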
SQL> select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
1 1 113 52428800 1 NO CURRENT 8590086940 21-MAR-15 <<<< CURRENT
2 1 111 52428800 1 YES ACTIVE 8590086619 21-MAR-15 <<<< ACTIVE
3 1 112 52428800 1 YES ACTIVE 8590086938 21-MAR-15 <<<< ACTIVE
SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled <<<<<< the database is in archive log mode
Archive destination /home/oracle/arch
Oldest online log sequence 111
Next log sequence to archive 113
Current log sequence 113
SQL> !
[oracle@ora10g bdump]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 95G 14G 77G 15% / <<<<<<<<<<<< plenty of free disk space, so the hang is not caused by archiving failing for lack of space
/dev/sda1 99M 12M 82M 13% /boot
tmpfs 1006M 0 1006M 0% /dev/shm
[oracle@ora10g bdump]$ ls -lrt /home/oracle/arch/
total 107176
-rw-r----- 1 oracle oinstall 952832 Mar 15 12:02 1_103_847657195.dbf
-rw-r----- 1 oracle oinstall 29585920 Mar 17 21:35 1_104_847657195.dbf
-rw-r----- 1 oracle oinstall 14306816 Mar 21 12:02 1_105_847657195.dbf
-rw-r----- 1 oracle oinstall 22298112 Mar 21 12:19 1_106_847657195.dbf
-rw-r----- 1 oracle oinstall 42112000 Mar 21 17:14 1_107_847657195.dbf
-rw-r----- 1 oracle oinstall 159232 Mar 21 17:20 1_108_847657195.dbf
-rw-r----- 1 oracle oinstall 1536 Mar 21 17:21 1_109_847657195.dbf
-rw-r----- 1 oracle oinstall 15360 Mar 21 17:24 1_110_847657195.dbf
-rw-r----- 1 oracle oinstall 148480 Mar 21 17:30 1_111_847657195.dbf
-rw-r----- 1 oracle oinstall 1024 Mar 21 17:30 1_112_847657195.dbf <<<<<<<< no archived log has been generated since 17:30
[oracle@ora10g bdump]$ date
Sat Mar 21 22:24:09 CST 2015
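After a log switch, an online log group stays ACTIVE until the checkpoint covering it has completed. Groups 2 and 3 here are already archived (ARC = YES) but still ACTIVE, and an ACTIVE group cannot be reused. Once no online log group is available for reuse, the database hangs.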
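The reuse rule can be sketched in a few lines. This is a simplified model under my own assumptions, not Oracle's actual logic: a group counts as reusable only when its STATUS is INACTIVE or UNUSED and, because the database runs in archive log mode, its ARCHIVED flag is YES. The `LogGroup` type and `reusable_groups` helper are hypothetical names; the first data set is copied from the v$log output above, the second is an invented "what if the checkpoint had completed" case.

```python
from typing import List, NamedTuple


class LogGroup(NamedTuple):
    group: int
    sequence: int
    archived: str  # v$log.ARCHIVED: "YES" or "NO"
    status: str    # v$log.STATUS: CURRENT, ACTIVE, INACTIVE, UNUSED, ...


def reusable_groups(groups: List[LogGroup]) -> List[int]:
    """Return the group numbers a log switch could move into (simplified)."""
    return [
        g.group
        for g in groups
        if g.status in ("INACTIVE", "UNUSED") and g.archived == "YES"
    ]


# Rows as reported by v$log during the hang
during_hang = [
    LogGroup(1, 113, "NO", "CURRENT"),
    LogGroup(2, 111, "YES", "ACTIVE"),
    LogGroup(3, 112, "YES", "ACTIVE"),
]

# Hypothetical state once the pending checkpoint has completed
after_checkpoint = [
    LogGroup(1, 113, "NO", "CURRENT"),
    LogGroup(2, 111, "YES", "INACTIVE"),
    LogGroup(3, 112, "YES", "INACTIVE"),
]

print(reusable_groups(during_hang))
print(reusable_groups(after_checkpoint))
```

With the rows from the hang, the list is empty: archiving has done its part, so the only thing keeping groups 2 and 3 from being reused is the unfinished checkpoint.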
I ran HANGANALYZE:
==============
HANG ANALYSIS:
==============
Open chains found:
Chain 1 : :
<0/133/2/0x83a69028/4790/log file switch (checkpoint inco>
-- <0/129/1/0x83a6afc8/4877/library cache lock>
-- <0/125/1/0x83a6cf68/4993/library cache lock>
-- <0/128/3/0x83a6b7b0/4931/library cache lock>
-- <0/132/3/0x83a69810/4815/library cache lock> <<<<<<<<<<<< sids 132, 128, 125 and 129 are blocked by sid 133, which is itself waiting for the checkpoint to complete
Chain 2 : :
<0/164/1/0x83a5f208/3048/rdbms ipc reply>
-- <0/144/47/0x83a660b8/4402/enq: CI - contention>
-- <0/138/137/0x83a68058/4757/enq: TX - row lock contention> <<<<<< sids 138 and 144 are blocked by sid 164: sid 144 is waiting on enq: CI - contention behind sid 164
Other chains found:
Chain 3 : :
<0/123/6/0x83a6d750/5378/No Wait>
Chain 4 : :
<0/126/2/0x83a6c780/4991/library cache lock>
-- <0/127/1/0x83a6bf98/4933/library cache lock>
-- <0/130/1/0x83a6a7e0/4875/library cache lock>
Chain 5 : :
<0/131/1/0x83a69ff8/4821/buffer busy waits>
-- <0/126/2/0x83a6c780/4991/library cache lock>
-- <0/127/1/0x83a6bf98/4933/library cache lock>
-- <0/130/1/0x83a6a7e0/4875/library cache lock>
Chain 6 : :
<0/136/7/0x83a68840/4770/log file switch (checkpoint inco>
Chain 7 : :
<0/139/10/0x83a67870/4609/buffer busy waits>
Chain 8 : :
<0/141/6/0x83a668a0/4748/enq: WF - contention>
Chain 9 : :
<0/147/3/0x83a658d0/3079/Streams AQ: qmn slave idle wait>
Chain 10 : :
<0/148/4/0x83a650e8/3077/Streams AQ: waiting for time man>
Chain 11 : :
<0/150/166/0x83a64900/4752/log file switch (checkpoint inco>
Chain 12 : :
<0/152/1/0x83a64118/3068/Streams AQ: qmn coordinator idle>
Chain 13 : :
<0/159/3/0x83a62960/3062/switch logfile command>
Chain 14 : :
<0/161/1/0x83a609c0/3054/log file switch (checkpoint inco>
Extra information that will be dumped at higher levels:
[level 4] : 2 node dumps -- [REMOTE_WT] [LEAF] [LEAF_NW]
[level 5] : 8 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]
[level 6] : 12 node dumps -- [NLEAF]
[level 10] : 12 node dumps -- [IGN]
State of nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor):
[122]/0/123/6/0x83b52750/5378/SINGLE_NODE_NW/1/2//none
[124]/0/125/1/0x83b55220/4993/NLEAF/3/8/[128][132]/127
[125]/0/126/2/0x83b56788/4991/NLEAF/9/12/[130]/126
[126]/0/127/1/0x83b57cf0/4933/NLEAF/13/14/[125][130]/129
[127]/0/128/3/0x83b59258/4931/NLEAF/15/16/[128][124][132]/131
[128]/0/129/1/0x83b5a7c0/4877/NLEAF/4/7/[132]/124
[129]/0/130/1/0x83b5bd28/4875/NLEAF/17/18/[125][126][130]/none
[130]/0/131/1/0x83b5d290/4821/NLEAF/10/11/[132]/125
[131]/0/132/3/0x83b5e7f8/4815/NLEAF/19/20/[128][124][127][132]/none
[132]/0/133/2/0x83b5fd60/4790/LEAF/5/6//128 <<<<<<< sid 133 blocks node 128, which is the session with sid 129
[135]/0/136/7/0x83b63d98/4770/SINGLE_NODE/21/22//none
[137]/0/138/137/0x83b66868/4757/NLEAF/23/28/[143]/none
[138]/0/139/10/0x83b67dd0/4609/NLEAF/29/30/[132]/none
[139]/0/140/8/0x83b69338/4605/IGN/31/32//none
[140]/0/141/6/0x83b6a8a0/4748/NLEAF/33/34/[143]/none
[143]/0/144/47/0x83b6e8d8/4402/NLEAF/24/27/[163]/137
[146]/0/147/3/0x83b72910/3079/SINGLE_NODE/35/36//none
[147]/0/148/4/0x83b73e78/3077/SINGLE_NODE/37/38//none
[149]/0/150/166/0x83b76948/4752/SINGLE_NODE/39/40//none
[151]/0/152/1/0x83b79418/3068/SINGLE_NODE/41/42//none
[154]/0/155/1/0x83b7d450/3066/IGN/43/44//none
[155]/0/156/1/0x83b7e9b8/3064/IGN/45/46//none
[158]/0/159/3/0x83b829f0/3062/SINGLE_NODE/47/48//none
[159]/0/160/1/0x83b83f58/3056/IGN/49/50//none
[160]/0/161/1/0x83b854c0/3054/SINGLE_NODE/51/52//none
[161]/0/162/1/0x83b86a28/3052/IGN/53/54//none
[162]/0/163/1/0x83b87f90/3050/IGN/55/56//none
[163]/0/164/1/0x83b894f8/3048/LEAF/25/26//143 <<<<<<< sid 164 blocks node 143, which is the session with sid 144
[164]/0/165/1/0x83b8aa60/3046/IGN/57/58//none
[165]/0/166/1/0x83b8bfc8/3044/IGN/59/60//none
[166]/0/167/1/0x83b8d530/3042/IGN/61/62//none
[167]/0/168/1/0x83b8ea98/3040/IGN/63/64//none
[168]/0/169/1/0x83b90000/3038/IGN/65/66//none
[169]/0/170/1/0x83b91568/3036/IGN/67/68//none
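Reading the node list by hand is error-prone because the adjacency list and the predecessor field hold node numbers, not sids (in this dump each node number is sid minus 1). Below is a minimal sketch that parses a few of the node lines above and follows each adjacency list down to the sessions at the end of the chain. It assumes, as the chains above suggest, that a node's adjlist names the nodes it is waiting on; `parse_node` and `final_blockers` are names I made up for illustration.

```python
import re

# A subset of the "State of nodes" lines from the HANGANALYZE output above
NODE_LINES = """
[124]/0/125/1/0x83b55220/4993/NLEAF/3/8/[128][132]/127
[127]/0/128/3/0x83b59258/4931/NLEAF/15/16/[128][124][132]/131
[128]/0/129/1/0x83b5a7c0/4877/NLEAF/4/7/[132]/124
[132]/0/133/2/0x83b5fd60/4790/LEAF/5/6//128
[137]/0/138/137/0x83b66868/4757/NLEAF/23/28/[143]/none
[140]/0/141/6/0x83b6a8a0/4748/NLEAF/33/34/[143]/none
[143]/0/144/47/0x83b6e8d8/4402/NLEAF/24/27/[163]/137
[163]/0/164/1/0x83b894f8/3048/LEAF/25/26//143
"""


def parse_node(line: str) -> dict:
    """Parse [nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor."""
    f = line.strip().split("/")
    return {
        "node": int(f[0].strip("[]")),
        "sid": int(f[2]),
        "ospid": int(f[5]),
        "state": f[6],
        "adj": [int(n) for n in re.findall(r"\[(\d+)\]", f[9])],
    }


def final_blockers(nodes: dict, start: int) -> set:
    """Follow adjacency lists from `start` and return the sids of the nodes
    that wait on nobody else (a node with an empty adjlist returns itself)."""
    seen, stack, leaves = set(), [start], set()
    while stack:
        n = stack.pop()
        if n in seen or n not in nodes:
            continue
        seen.add(n)
        if not nodes[n]["adj"]:
            leaves.add(nodes[n]["sid"])
        stack.extend(nodes[n]["adj"])
    return leaves


nodes = {}
for line in NODE_LINES.splitlines():
    if line.strip():
        node = parse_node(line)
        nodes[node["node"]] = node

for start in (124, 127, 137, 140):
    print(nodes[start]["sid"], "->", sorted(final_blockers(nodes, start)))
```

On this subset, the library cache lock waiters end at sid 133 (ospid 4790) and the enqueue waiters end at sid 164 (ospid 3048), which matches Chain 1 and Chain 2.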
I also took a SYSTEMSTATE dump.
After processing the SYSTEMSTATE dump with ass:
System State 1
~~~~~~~~~~~~~~~~
1:
2: waiting for 'pmon timer' wait
3: waiting for 'rdbms ipc message' wait
4: waiting for 'rdbms ipc message' wait
5: waiting for 'rdbms ipc message' wait
6: waiting for 'rdbms ipc message' wait
7: waiting for 'rdbms ipc message' wait
8: waiting for 'rdbms ipc reply' wait
9: waiting for 'rdbms ipc message' wait
10: waiting for 'rdbms ipc message' wait
11: waiting for 'log file switch (checkpoint incomplete)' wait
12: waiting for 'rdbms ipc message' wait
13:
14:
15: waiting for 'switch logfile command' wait
16: waiting for 'rdbms ipc message' wait
17: waiting for 'rdbms ipc message' wait
18: waiting for 'Streams AQ: qmn coordinator idle wait' wait
19: waiting for 'log file switch (checkpoint incomplete)' wait Cmd: Insert
20: for 'Streams AQ: waiting for time management or cleanup tasks' wait
21: waiting for 'Streams AQ: qmn slave idle wait' wait
22: waiting for 'enq: CI - contention' [Enqueue CI-00000001-00000005] wait
23: waiting for 'enq: WF - contention' [Enqueue WF-00000000-00000000] wait Cmd: PL/SQL Execute
24: waiting for 'SQL*Net message from client' wait
25: waiting for 'buffer busy waits' (2,9,11) wait Cmd: Select
26: waiting for 'enq: TX - row lock contention'[Enqueue TX-0009001E-00000299] wait
27: waiting for 'log file switch (checkpoint incomplete)' wait Cmd: Insert
28: waiting for 'log file switch (checkpoint incomplete)' wait
29: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait
30: waiting for 'buffer busy waits' (1,2ba2,1) wait
31: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait
32: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait
33: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait
34: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait
35: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait
36: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait
37: last wait for 'ksdxexeotherwait'
Blockers
~~~~~~~~
Above is a list of all the processes. If they are waiting for a resource
then it will be given in square brackets. Below is a summary of the
waited upon resources, together with the holder of that resource.
Notes:
~~~~~
o A process id of '???' implies that the holder was not found in the
  systemstate.
Resource                       Holder State
Enqueue CI-00000001-00000005   8: waiting for 'rdbms ipc reply'
Enqueue WF-00000000-00000000   22: 22: is waiting for 8:
Enqueue TX-0009001E-00000299   22: 22: is waiting for 8:
LOCK: handle=7ef56d78          28: waiting for 'log file switch (checkpoint incomplete)'
LOCK: handle=7ef56f88          30: waiting for 'buffer busy waits' (1,2ba2,1)
Object Names
~~~~~~~~~~~~
Enqueue CI-00000001-00000005
Enqueue WF-00000000-00000000
Enqueue TX-0009001E-00000299
LOCK: handle=7ef56d78 TABL:EXFSYS.RLM$SCHDNEGACTION
LOCK: handle=7ef56f88 TABL:EXFSYS.RLM$EVTCLEANUP
System State 2
~~~~~~~~~~~~~~~~
1:
2: waiting for 'pmon timer' wait
3: waiting for 'rdbms ipc message' wait
4: waiting for 'rdbms ipc message' wait
5: waiting for 'rdbms ipc message' wait
6: waiting for 'rdbms ipc message' wait
7: waiting for 'rdbms ipc message' wait
8: waiting for 'rdbms ipc reply' wait
9: waiting for 'rdbms ipc message' wait
10: waiting for 'rdbms ipc message' wait
11: waiting for 'log file switch (checkpoint incomplete)' wait
12: waiting for 'rdbms ipc message' wait
13:
14:
15: waiting for 'switch logfile command' wait
16: waiting for 'rdbms ipc message' wait
17: waiting for 'rdbms ipc message' wait
18: waiting for 'Streams AQ: qmn coordinator idle wait' wait
19: waiting for 'log file switch (checkpoint incomplete)' wait Cmd: Insert
20: for 'Streams AQ: waiting for time management or cleanup tasks' wait
21: waiting for 'Streams AQ: qmn slave idle wait' wait
22: waiting for 'enq: CI - contention' [Enqueue CI-00000001-00000005] wait
23: waiting for 'enq: WF - contention' [Enqueue WF-00000000-00000000] wait Cmd: PL/SQL Execute
24: waiting for 'SQL*Net message from client' wait
25: waiting for 'buffer busy waits' (2,9,11) wait Cmd: Select
26: waiting for 'enq: TX - row lock contention'[Enqueue TX-0009001E-00000299] wait
27: waiting for 'log file switch (checkpoint incomplete)' wait Cmd: Insert
28: waiting for 'log file switch (checkpoint incomplete)' wait
29: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait
30: waiting for 'buffer busy waits' (1,2ba2,1) wait
31: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait
32: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait
33: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait
34: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait
35: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait
36: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait
37: last wait for 'ksdxexeotherwait'
Blockers
~~~~~~~~
Above is a list of all the processes. If they are waiting for a resource
then it will be given in square brackets. Below is a summary of the
waited upon resources, together with the holder of that resource.
Notes:
~~~~~
o A process id of '???' implies that the holder was not found in the
  systemstate.
Resource                       Holder State
Enqueue CI-00000001-00000005   8: waiting for 'rdbms ipc reply'
Enqueue WF-00000000-00000000   22: 22: is waiting for 8:
Enqueue TX-0009001E-00000299   22: 22: is waiting for 8:
LOCK: handle=7ef56d78          28: waiting for 'log file switch (checkpoint incomplete)'
LOCK: handle=7ef56f88          30: waiting for 'buffer busy waits' (1,2ba2,1)
Object Names
~~~~~~~~~~~~
Enqueue CI-00000001-00000005
Enqueue WF-00000000-00000000
Enqueue TX-0009001E-00000299
LOCK: handle=7ef56d78 TABL:EXFSYS.RLM$SCHDNEGACTION
LOCK: handle=7ef56f88 TABL:EXFSYS.RLM$EVTCLEANUP
List of Processes That May Be Stuck
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8: waiting for 'rdbms ipc reply' wait
11: waiting for 'log file switch (checkpoint incomplete)' wait
15: waiting for 'switch logfile command' wait
18: waiting for 'Streams AQ: qmn coordinator idle wait' wait
19: waiting for 'log file switch (checkpoint incomplete)' wait
20: for 'Streams AQ: waiting for time management or cleanup tasks' wait
21: waiting for 'Streams AQ: qmn slave idle wait' wait
22: waiting for 'enq: CI - contention' wait
23: waiting for 'enq: WF - contention' wait
24: waiting for 'SQL*Net message from client' wait
25: waiting for 'buffer busy waits' (2,9,11) wait
26: waiting for 'enq: TX - row lock contention' wait
27: waiting for 'log file switch (checkpoint incomplete)' wait
28: waiting for 'log file switch (checkpoint incomplete)' wait
29: waiting for 'library cache lock' wait
30: waiting for 'buffer busy waits' (1,2ba2,1) wait
31: waiting for 'library cache lock' wait
32: waiting for 'library cache lock' wait
33: waiting for 'library cache lock' wait
34: waiting for 'library cache lock' wait
35: waiting for 'library cache lock' wait
36: waiting for 'library cache lock' wait
SO: 0x83b5fd60, type: 4, owner: 0x83a69028, flag: INIT/-/-/0x00
(session) sid: 133 trans: 0x822d5790, creator: 0x83a69028, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-001C-00000002, short-term DID: 0000-0000-00000000
txn branch: (nil)
oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
service name: SYS$USERS
O/S info: user: oracle, term: UNKNOWN,
ospid: 4790, machine: ora10g
program: oracle@ora10g (J001)
waiting for 'log file switch (checkpoint incomplete)' blocking sess=0x(nil) seq=1 wait_time=0 seconds since wait started=12130
=0, =0, =0
Dumping Session Wait History
for 'log file switch (checkpoint incomplete)' count=1 wait_time=1107898
=0, =0, =0
for 'log file switch (checkpoint incomplete)' count=1 wait_time=336
...
PROCESS 8:
----------------------------------------
SO: 0x83a5f208, type: 2, owner: (nil), flag: INIT/-/-/0x00
(process) Oracle pid=8, calls cur/top: 0x83ba9a28/0x83ba9a28, flag: (16) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 33
last post received-location: ksrpublish
last process to post me: 83a68840 1 0
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: 83a5ea20 1 6
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 0x83aa67e8
O/S info: user: oracle, term: UNKNOWN, ospid: 3048
OSD pid info: Unix process pid: 3048, image: oracle@ora10g (SMON)
Short stack dump:
...
PROCESS 22:
----------------------------------------
SO: 0x83a660b8, type: 2, owner: (nil), flag: INIT/-/-/0x00
(process) Oracle pid=22, calls cur/top: 0x83bafe00/0x83bafb40, flag: (2) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: 83a5e238 1 6
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 0x83aa67e8
O/S info: user: oracle, term: UNKNOWN, ospid: 4402
OSD pid info: Unix process pid: 4402, image: oracle@ora10g (m000)
Short stack dump:
At this point I turned my attention to the SMON process (Oracle pid 8, ospid 3048, sid 164), starting with its trace file:
/u01/app/oracle/admin/ora10g/bdump/ora10g_smon_3048.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0
System name: Linux
Node name: ora10g
Release: 2.6.18-194.el5
Version: #1 SMP Tue Mar 16 21:52:39 EDT 2010
Machine: x86_64
Instance name: ora10g
Redo thread mounted by this instance: 1
Oracle process number: 8
Unix process pid: 3048, image: oracle@ora10g (SMON)
*** 2015-03-21 17:34:17.611
*** SERVICE NAME:(SYS$BACKGROUND) 2015-03-21 17:34:17.611
*** SESSION ID:(164.1) 2015-03-21 17:34:17.611
Waited for detached process: CKPT for 300 seconds:
*** 2015-03-21 17:34:17.611
Dumping diagnostic information for CKPT:
OS pid = 3046
loadavg : 0.00 0.00 0.00
memory info: free memory = 0.00M
swap info: free = 0.00M alloc = 0.00M total = 0.00M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S oracle 3046 1 0 75 0 - 184266 - 17:24 ? 00:00:00 ora_ckpt_ora10g
[Thread debugging using libthread_db enabled]
0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#0 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#1 0x0000000003be9b09 in sskgpwwait ()
#2 0x0000000003bccdf0 in skgpwwait ()
#3 0x0000000000855f4a in ksdxsus ()
#4 0x0000000000857118 in ksdxffrz ()
#5 0x0000000000853003 in ksdxcb ()
#6 0x0000000001ebd0cf in sspuser ()
#7 <signal handler called>
#8 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#9 0x0000000003be9b09 in sskgpwwait ()
#10 0x0000000003bccdf0 in skgpwwait ()
#11 0x0000000000798319 in kslwaitns_timed ()
#12 0x00000000008c3b1d in kskthbwt ()
#13 0x0000000000797e54 in kslwait ()
#14 0x00000000029f3f3b in ksarcv ()
#15 0x000000000082e8bf in ksbabs ()
#16 0x0000000000835822 in ksbrdp ()
#17 0x0000000002f4d840 in opirip ()
#18 0x000000000132b016 in opidrv ()
#19 0x0000000001eb3146 in sou2o ()
#20 0x0000000000723245 in opimai_real ()
#21 0x00000000007230fc in main ()
A debugging session is active.
Inferior 1 [process 3046] will be detached.
Quit anyway? (y or n) [answered Y; input not from terminal]
*** 2015-03-21 17:34:19.125
*** 2015-03-21 17:34:29.129
Waited for detached process: CKPT for 310 seconds:
*** 2015-03-21 17:34:29.129
Dumping diagnostic information for CKPT:
OS pid = 3046
loadavg : 0.07 0.02 0.00
memory info: free memory = 0.00M
swap info: free = 0.00M alloc = 0.00M total = 0.00M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S oracle 3046 1 0 75 0 - 184266 - 17:24 ? 00:00:00 ora_ckpt_ora10g
[Thread debugging using libthread_db enabled]
0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#0 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#1 0x0000000003be9b09 in sskgpwwait ()
#2 0x0000000003bccdf0 in skgpwwait ()
#3 0x0000000000855f4a in ksdxsus ()
#4 0x0000000000857118 in ksdxffrz ()
#5 0x0000000000853003 in ksdxcb ()
#6 0x0000000001ebd0cf in sspuser ()
#7 <signal handler called>
#8 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#9 0x0000000003be9b09 in sskgpwwait ()
#10 0x0000000003bccdf0 in skgpwwait ()
#11 0x0000000000798319 in kslwaitns_timed ()
#12 0x00000000008c3b1d in kskthbwt ()
#13 0x0000000000797e54 in kslwait ()
#14 0x00000000029f3f3b in ksarcv ()
#15 0x000000000082e8bf in ksbabs ()
#16 0x0000000000835822 in ksbrdp ()
#17 0x0000000002f4d840 in opirip ()
#18 0x000000000132b016 in opidrv ()
#19 0x0000000001eb3146 in sou2o ()
#20 0x0000000000723245 in opimai_real ()
#21 0x00000000007230fc in main ()
A debugging session is active.
Inferior 1 [process 3046] will be detached.
Quit anyway? (y or n) [answered Y; input not from terminal]
*** 2015-03-21 17:34:29.895
*** 2015-03-21 17:34:39.899
Waited for detached process: CKPT for 320 seconds:
*** 2015-03-21 17:34:39.899
Dumping diagnostic information for CKPT:
OS pid = 3046
loadavg : 0.06 0.01 0.00
memory info: free memory = 0.00M
swap info: free = 0.00M alloc = 0.00M total = 0.00M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S oracle 3046 1 0 75 0 - 184266 - 17:24 ? 00:00:00 ora_ckpt_ora10g
[Thread debugging using libthread_db enabled]
0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#0 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#1 0x0000000003be9b09 in sskgpwwait ()
#2 0x0000000003bccdf0 in skgpwwait ()
#3 0x0000000000855f4a in ksdxsus ()
#4 0x0000000000857118 in ksdxffrz ()
#5 0x0000000000853003 in ksdxcb ()
#6 0x0000000001ebd0cf in sspuser ()
#7 <signal handler called>
#8 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#9 0x0000000003be9b09 in sskgpwwait ()
#10 0x0000000003bccdf0 in skgpwwait ()
#11 0x0000000000798319 in kslwaitns_timed ()
#12 0x00000000008c3b1d in kskthbwt ()
#13 0x0000000000797e54 in kslwait ()
#14 0x00000000029f3f3b in ksarcv ()
#15 0x000000000082e8bf in ksbabs ()
#16 0x0000000000835822 in ksbrdp ()
#17 0x0000000002f4d840 in opirip ()
#18 0x000000000132b016 in opidrv ()
#19 0x0000000001eb3146 in sou2o ()
#20 0x0000000000723245 in opimai_real ()
#21 0x00000000007230fc in main ()
A debugging session is active.
Inferior 1 [process 3046] will be detached.
Quit anyway? (y or n) [answered Y; input not from terminal]
*** 2015-03-21 17:34:40.598
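The trace file shows that SMON is waiting on communication with the CKPT process. Could CKPT itself be in trouble? I turned my attention to CKPT next and checked its trace file: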
*** 2015-03-21 17:26:15.972
*** SERVICE NAME:(SYS$BACKGROUND) 2015-03-21 17:26:15.972
*** SESSION ID:(165.1) 2015-03-21 17:26:15.972
Received ORADEBUG command 'suspend' from process Unix process pid: 3062, image: <<<<<<< this is where the problem lies!
Received ORADEBUG command 'tracefile_name' from process Unix process pid: 3062, image:
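As a quick sanity check, the timestamps in the two trace files can be lined up. The sketch below assumes that a line such as "Waited for detached process: CKPT for 300 seconds" means SMON had already been waiting that long at the timestamp printed just above it; that reading is my interpretation, and the variable names are illustrative.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# From the CKPT trace: when the ORADEBUG 'suspend' command was received
ckpt_suspended = datetime.strptime("2015-03-21 17:26:15.972", FMT)

# From the SMON trace: (timestamp, seconds SMON reports having waited for CKPT)
smon_reports = [
    ("2015-03-21 17:34:17.611", 300),
    ("2015-03-21 17:34:29.129", 310),
    ("2015-03-21 17:34:39.899", 320),
]

# Work backwards from each report to the moment SMON started waiting
wait_starts = [
    datetime.strptime(ts, FMT) - timedelta(seconds=waited)
    for ts, waited in smon_reports
]

earliest = min(wait_starts)
print("CKPT suspended at      :", ckpt_suspended)
print("SMON wait began around :", earliest)
print("gap                    :", earliest - ckpt_suspended)
```

Under that assumption SMON started waiting for CKPT at about 17:29:17, roughly three minutes after CKPT was suspended at 17:26:15, so the suspend comes first in the timeline.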
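Now the sequence of events is clear:
1. Because the CKPT process was suspended, checkpoints could not complete, and the SMON process held the resource Enqueue CI-00000001-00000005 for a long time, which eventually hung the database;
2. With Enqueue CI-00000001-00000005 held by SMON, a series of wait events appeared: library cache lock, enq: CI - contention, and log file switch (checkpoint incomplete).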
Resume the CKPT process:
SQL> oradebug setospid 3046
Oracle pid: 7, Unix process pid: 3046, image: oracle@ora10g (CKPT)
SQL> oradebug resume;
Statement processed.
SQL> select sid,event,p1,p2,p3,wait_time,seconds_in_wait,state from v$session_wait where wait_class <> 'Idle';
SID EVENT P1 P2 P3 WAIT_TIME SECONDS_IN_WAIT STATE
---------- ---------------------------------------------------------------- ---------- ---------- ---------- ---------- --------------- -------------------
The query now returns no rows for non-idle waits; at this point the database was back to normal.
---------------------------------------------------
My experience is still limited, so corrections are welcome.
Please credit the source when reposting.