环境:
rhel6.5 oracle 11.2.0.3.8 rac 两个节点
客户说db无法使用,需要dba介入处理。
dba登陆任一个节点的os,执行了如下的命令:
[oracle@hosta ~]$ sqlplus '/ as sysdba' SQL*Plus: Release 11.2.0.3.0 Production on Fri Nov 13 19:19:30 2015 Copyright (c) 1982, 2011, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options SQL> oradebug setorapname reco Oracle pid: 22, Unix process pid: 13200, image: oracle@hosta (RECO) SQL> oradebug unlimit Statement processed. SQL> oradebug -g all hanganalyze 3 Hang Analysis in /u02/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_13164.trc SQL> oradebug -g all hanganalyze 3 Hang Analysis in /u02/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_13164.trc SQL> oradebug -g all dump systemstate 266 Statement processed. SQL> oradebug -g all dump systemstate 266 Statement processed. SQL> exit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options [oracle@hosta ~]$ date Fri Nov 13 19:22:22 CST 2015 [oracle@hosta ~]$ --->注意:如上命令的来源: How to Collect Diagnostics for Database Hanging Issues (文档 ID 452358.1)
/u02/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_13164.trc的内容截取如下:
*** 2015-11-13 19:20:11.392 DIAG sends out oradebug command to nodes [0,1] *** 2015-11-13 19:20:11.454 **************************************************************** PORADEBUG BEGIN ORIGINATING INST:1 SERIAL:1 PID:24794 PORADATA COMMAND:hanganalyze 3 session (kjzddbx) switches to a new action PORADATA TRACEFILE /u02/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_13164.trc PORADEBUG END ORIGINATING INST:1 SERIAL:1 PID:24794 **************************************************************** *** 2015-11-13 19:20:12.060 =============================================================================== HANG ANALYSIS: instances (db_name.oracle_sid): orcl.orcl1, orcl.orcl2 oradebug_node_dump_level: 3 analysis initiated by oradebug os thread scheduling delay history: (sampling every 1.000000 secs) 0.000000 secs at [ 19:20:11 ] NOTE: scheduling delay has not been sampled for 0.494255 secs 0.000000 secs from [ 19:20:07 - 19:20:12 ], 5 sec avg 0.000000 secs from [ 19:19:12 - 19:20:12 ], 1 min avg 0.000124 secs from [ 19:15:12 - 19:20:12 ], 5 min avg =============================================================================== Chains most likely to have caused the hang: [a] Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention' Chain 1 Signature Hash: 0x38c48850 [b] Chain 2 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention' Chain 2 Signature Hash: 0x38c48850 [c] Chain 3 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention' Chain 3 Signature Hash: 0x38c48850 =============================================================================== Non-intersecting chains: ------------------------------------------------------------------------------- Chain 1: ------------------------------------------------------------------------------- Oracle session identified by: { instance: 1 (orcl.orcl1) os id: 24694 process id: 44, oracle@hosta session id: 10 session serial #: 193 } is waiting for 'enq: TX - row lock contention' with wait info: { p1: 'name|mode'=0x54580006 p2: 'usn<<16 | slot'=0x8000e p3: 'sequence'=0x452 time in wait: 6 min 10 sec timeout after: never wait id: 37 blocking: 0 sessions current sql: update scott.emp set ename=ename|| '1+1' where empno='7369' short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1876<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+160<-ksliwat()+1865<-kslwaitctx()+163<-kjusuc()+3619<-ksipgetctxi()+1759<-ksqcmi()+24096<-ksqgtlctx()+3488<-ksqgelctx()+569<-ktuGetTxForXid()+131<-ktcwit1()+313<-kdddgb()+8525<-kdusru()+461<-updrowFastPath()+1209<-qerupFetch()+2554<-updaul()+1309<-updThreePhaseExe()+324<-updexe()+443<-opiexe()+10085<-kpoal8()+2224<-opiodr()+916<-ttcpip()+2242<-opitsk()+1673<-opiino()+966<-opiodr()+916<-opidrv()+570<-sou2o()+103 wait history: * time between current wait and wait #1: 0.000239 sec 1. event: 'gc current block 2-way' time waited: 0.001155 sec wait id: 36 p1: ''=0x4 p2: ''=0x97 p3: ''=0x2000001 * time between wait #1 and #2: 0.000273 sec 2. event: 'Disk file operations I/O' time waited: 0.000010 sec wait id: 35 p1: 'FileOperation'=0x2 p2: 'fileno'=0x4 p3: 'filetype'=0x2 * time between wait #2 and #3: 0.002898 sec 3. event: 'SQL*Net message from client' time waited: 2.574032 sec wait id: 34 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 } and is blocked by => Oracle session identified by: { instance: 1 (orcl.orcl1) os id: 18342 process id: 49, oracle@hosta session id: 137 session serial #: 2227 } which is waiting for 'SQL*Net message from client' with wait info: { p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 time in wait: 7 min 44 sec timeout after: never wait id: 877 blocking: 3 sessions current sql: <none> short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1876<-sspuser()+112<-__sighandler()<-read()+14<-nttfprd()+343<-nsbasic_brc()+376<-nsbrecv()+69<-nioqrc()+485<-opikndf2()+978<-opitsk()+847<-opiino()+966<-opiodr()+916<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<-__libc_start_main()+253<-_start()+36 wait history: * time between current wait and wait #1: 0.000301 sec 1. event: 'SQL*Net message to client' time waited: 0.000016 sec wait id: 876 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 * time between wait #1 and #2: 0.001011 sec 2. event: 'SQL*Net message from client' time waited: 269 min 8 sec wait id: 875 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 * time between wait #2 and #3: 0.000097 sec 3. event: 'SQL*Net message to client' time waited: 0.000006 sec wait id: 874 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 } Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention' Chain 1 Signature Hash: 0x38c48850 ------------------------------------------------------------------------------- =============================================================================== Intersecting chains: ------------------------------------------------------------------------------- Chain 2: ------------------------------------------------------------------------------- Oracle session identified by: { instance: 2 (orcl.orcl2) os id: 24814 process id: 44, oracle@hostb session id: 10 session serial #: 3041 } is waiting for 'enq: TX - row lock contention' with wait info: { p1: 'name|mode'=0x54580006 p2: 'usn<<16 | slot'=0x8000e p3: 'sequence'=0x452 time in wait: 343 min 6 sec timeout after: never wait id: 133 blocking: 0 sessions current sql: update scott.emp set ename=ename|| '2' where empno='7369' short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1876<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+160<-ksliwat()+1865<-kslwaitctx()+163<-kjusuc()+3619<-ksipgetctxi()+1759<-ksqcmi()+24096<-ksqgtlctx()+3488<-ksqgelctx()+569<-ktuGetTxForXid()+131<-ktcwit1()+313<-kdddgb()+8525<-kdusru()+461<-updrowFastPath()+1209<-qerupFetch()+2554<-updaul()+1309<-updThreePhaseExe()+324<-updexe()+443<-opiexe()+10085<-kpoal8()+2224<-opiodr()+916<-ttcpip()+2242<-opitsk()+1673<-opiino()+966<-opiodr()+916<-opidrv()+570<-sou2o()+103 wait history: * time between current wait and wait #1: 0.000407 sec 1. event: 'gc cr block 2-way' time waited: 0.002959 sec wait id: 132 p1: ''=0x3 p2: ''=0xf0 p3: ''=0x1f * time between wait #1 and #2: 0.000215 sec 2. event: 'Disk file operations I/O' time waited: 0.000011 sec wait id: 131 p1: 'FileOperation'=0x2 p2: 'fileno'=0x3 p3: 'filetype'=0x2 * time between wait #2 and #3: 0.000245 sec 3. event: 'gc current block 2-way' time waited: 0.001150 sec wait id: 130 p1: ''=0x4 p2: ''=0x97 p3: ''=0x2000001 } and is blocked by 'instance: 1, os id: 18342, session id: 137', which is a member of 'Chain 1'. Chain 2 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention' Chain 2 Signature Hash: 0x38c48850 ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- Chain 3: ------------------------------------------------------------------------------- Oracle session identified by: { instance: 2 (orcl.orcl2) os id: 1112 process id: 50, oracle@hostb session id: 255 session serial #: 3581 } is waiting for 'enq: TX - row lock contention' with wait info: { p1: 'name|mode'=0x54580006 p2: 'usn<<16 | slot'=0x8000e p3: 'sequence'=0x452 time in wait: 7 min 12 sec timeout after: never wait id: 23 blocking: 0 sessions current sql: update scott.emp set ename=ename|| '2+1' where empno='7369' short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1876<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+160<-ksliwat()+1865<-kslwaitctx()+163<-kjusuc()+3619<-ksipgetctxi()+1759<-ksqcmi()+24096<-ksqgtlctx()+3488<-ksqgelctx()+569<-ktuGetTxForXid()+131<-ktcwit1()+313<-kdddgb()+8525<-kdusru()+461<-updrowFastPath()+1209<-qerupFetch()+2554<-updaul()+1309<-updThreePhaseExe()+324<-updexe()+443<-opiexe()+10085<-kpoal8()+2224<-opiodr()+916<-ttcpip()+2242<-opitsk()+1673<-opiino()+966<-opiodr()+916<-opidrv()+570<-sou2o()+103 wait history: * time between current wait and wait #1: 0.005374 sec 1. event: 'SQL*Net message from client' time waited: 42.635240 sec wait id: 22 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 * time between wait #1 and #2: 0.000005 sec 2. event: 'SQL*Net message to client' time waited: 0.000005 sec wait id: 21 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 * time between wait #2 and #3: 0.000340 sec 3. event: 'SQL*Net message from client' time waited: 0.000568 sec wait id: 20 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 } and is blocked by 'instance: 1, os id: 18342, session id: 137', which is a member of 'Chain 1'. Chain 3 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention' Chain 3 Signature Hash: 0x38c48850 ------------------------------------------------------------------------------- =============================================================================== Extra information that will be dumped at higher levels: [level 4] : 1 node dumps -- [LEAF] [LEAF_NW] [level 5] : 3 node dumps -- [NO_WAIT] [INVOL_WT] [SINGLE_NODE] [NLEAF] [SINGLE_NODE_NW] State of ALL nodes-->可以看到,在执行“oradebug -g all hanganalyze 3”命令的节点的diag文件中,才有这个“State of ALL nodes”字样。 ([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]): [9]/1/10/193/0x95231978/24694/NLEAF/[136]---->nodenum=9的 session(即:1号实例上sid为10的session)的blocker(阻塞者)为nodenum=136的 session [136]/1/137/2227/0x95376400/18342/LEAF/------>nodenum=136的 session是blocker(阻塞者) [481]/2/10/3041/0x95231978/24814/NLEAF/[136]->nodenum=481的 session(即:2号实例上sid为10的session)的blocker(阻塞者)为nodenum=136的 session [726]/2/255/3581/0x94d0a760/1112/NLEAF/[136]->nodenum=726的 session(即:2号实例上sid为255的session)的blocker(阻塞者)为nodenum=136的 session *** 2015-11-13 19:20:12.061 =============================================================================== END OF HANG ANALYSIS =============================================================================== *** 2015-11-13 19:20:12.064 =============================================================================== HANG ANALYSIS DUMPS: oradebug_node_dump_level: 3 =============================================================================== State of LOCAL nodes ([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]): [9]/1/10/193/0x95231978/24694/NLEAF/[136] [136]/1/137/2227/0x95376400/18342/LEAF/ No processes qualify for dumping. =============================================================================== HANG ANALYSIS DUMPS: END =============================================================================== session (kjzddbx) switches to a new action PORADATA SERIAL:1 RINST:2 RDIAG:31036 PORADATA SERIAL:1 RINST:2 REMOTE EXEC OSPID:31036 PORADATA SERIAL:1 RINST:2 TRACEFILE:/u02/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_diag_31036.trc PORADATA SERIAL:1 RINST:2 DONE
/u02/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_13164.trc的内容摘录完毕。
/u02/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_diag_31036.trc的内容摘录如下:
*** 2015-11-13 19:20:11.423 **************************************************************** PORADEBUG BEGIN ORIGINATING INST:1 SERIAL:1 PID:24794 PORADATA COMMAND:hanganalyze 3 session (kjzddbx) switches to a new action PORADATA TRACEFILE /u02/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_diag_31036.trc PORADEBUG END ORIGINATING INST:1 SERIAL:1 PID:24794 **************************************************************** *** 2015-11-13 19:20:12.064 =============================================================================== HANG ANALYSIS DUMPS: oradebug_node_dump_level: 3 =============================================================================== State of LOCAL nodes ([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]): [481]/2/10/3041/0x95231978/24814/NLEAF/[136]---->可以看到,sid为10的session的blocker(阻塞者)是 nodenum=136,而这个136不在State of LOCAL nodes中,这说明这个nodenum=136在对方实例上。 [726]/2/255/3581/0x94d0a760/1112/NLEAF/[136]---->可以看到,sid为255的session的blocker(阻塞者)是 nodenum=136,而这个136不在State of LOCAL nodes中,这说明这个nodenum=136在对方实力上。 ->上面的2是指2号实例,该编号从1开始。 No processes qualify for dumping. =============================================================================== HANG ANALYSIS DUMPS: END =============================================================================== session (kjzddbx) switches to a new action *** 2015-11-13 19:20:12.284 Oradebug command console output: <none>
/u02/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_diag_31036.trc的内容摘录完毕。