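The customer called to report that the database had hung again, so we deployed OSWatcher on the host. OSWatcher really is a good tool, and I strongly recommend it.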
Operating system platform:
Quote:
$ uname -a
HP-UX hpuxa B.11.23 U ia64 4101409079 unlimited-user license
Database version:
Quote:
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0 Production
TNS for HPUX: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production
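We took a systemstate dump and formatted the trace file with ass.awk. Nothing looked wrong inside the database: no blocking processes were found.
Quote: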
[root@mcprod udump]# awk -f ass109.awk radinfo.trc
Starting Systemstate 1
...................................................................
Ass.Awk Version 1.0.9 - Processing radinfo.trc
System State 1
~~~~~~~~~~~~~~~~
1:
2: waiting for 'pmon timer' wait
3: waiting for 'rdbms ipc message' wait
4: waiting for 'rdbms ipc message' wait
5: last wait for 'rdbms ipc message'
6: waiting for 'rdbms ipc message' wait
7: waiting for 'rdbms ipc message' wait
8: last wait for 'rdbms ipc message'
9: waiting for 'log file parallel write' wait
10: waiting for 'rdbms ipc message' wait
11: waiting for 'smon timer' wait
12: waiting for 'rdbms ipc message' wait
13: waiting for 'rdbms ipc message' wait
14: waiting for 'rdbms ipc message' wait
15: waiting for 'rdbms ipc message' wait
16: for 'Streams AQ: waiting for time management or cleanup tasks' wait
17: waiting for 'Streams AQ: qmn coordinator idle wait' wait
18: waiting for 'db file scattered read' (7,d0279,10) wait
19: waiting for 'Streams AQ: qmn slave idle wait' wait
20: waiting for 'db file sequential read' (3,7e33,1) wait
21: last wait for 'ksdxexeotherwait'
22: waiting for 'SQL*Net message from client' wait
23: waiting for 'SQL*Net message from client' wait
24: waiting for 'db file sequential read' (9,1,1) wait
25: waiting for 'SQL*Net message from client' wait
29: waiting for 'SQL*Net message from client' wait
31: waiting for 'SQL*Net message from client' wait
32: waiting for 'SQL*Net message from client' wait
33: waiting for 'SQL*Net message from client' wait
35: waiting for 'SQL*Net message from client' wait
36: last wait for 'SQL*Net message to client'
Cmd: Select
37: waiting for 'SQL*Net message from client' wait
38: last wait for 'SQL*Net more data from client'
Cmd: Update
40: waiting for 'SQL*Net message from client' wait
42: waiting for 'SQL*Net message from client' wait
43: waiting for 'SQL*Net message from client' wait
44: waiting for 'SQL*Net message from client' wait
45: waiting for 'SQL*Net message from client' wait
46: waiting for 'SQL*Net message from client' wait
47: waiting for 'SQL*Net message from client' wait
48: last wait for 'SQL*Net message to client'
Cmd: Select
49: waiting for 'SQL*Net message from client' wait
50: waiting for 'SQL*Net message from client' wait
51: waiting for 'SQL*Net message to client' wait
Cmd: Select
52: waiting for 'SQL*Net message from client' wait
53: waiting for 'SQL*Net message from client' wait
54: waiting for 'SQL*Net message from client' wait
55: waiting for 'SQL*Net message from client' wait
56: waiting for 'SQL*Net message from client' wait
57: waiting for 'SQL*Net message from client' wait
60: waiting for 'SQL*Net message from client' wait
64: waiting for 'SQL*Net message from client' wait
65: waiting for 'SQL*Net message from client' wait
66: waiting for 'SQL*Net message from client' wait
73: waiting for 'SQL*Net message from client' wait
74: waiting for 'SQL*Net message from client' wait
78: waiting for 'SQL*Net message from client' wait
83: waiting for 'SQL*Net message from client' wait
100:waiting for 'SQL*Net message from client' wait
109:waiting for 'SQL*Net message from client' wait
112:waiting for 'SQL*Net message from client' wait
113:waiting for 'SQL*Net message from client' wait
116:waiting for 'SQL*Net message from client' wait
117:waiting for 'SQL*Net message from client' wait
119:waiting for 'SQL*Net message from client' wait
147:waiting for 'SQL*Net message from client' wait
148:waiting for 'SQL*Net message from client' wait
NO BLOCKING PROCESSES FOUND
61789 Lines Processed.
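To see at a glance which waits dominate a long ass.awk report, the per-process lines can be tallied. The sketch below assumes the output format shown above (`<pid>: waiting for '<event>' ...` or `<pid>: last wait for '<event>'`); `summarize_waits` is a hypothetical helper name, not part of ass.awk. "last wait" lines are counted separately because those processes were not waiting when the dump was taken.

```shell
# Hypothetical helper: tally ass.awk per-process lines by wait state and event.
# Input format assumed (as in the report above):
#   "24: waiting for 'db file sequential read' (9,1,1) wait"
#   "5: last wait for 'rdbms ipc message'"
summarize_waits() {
  # Split on single quotes so that $2 is the event name.
  awk -F"'" '
    /^[0-9]+:/ && NF >= 3 {
      # "last wait" means the process was NOT waiting when the dump was taken
      state = ($1 ~ /last wait/) ? "last wait" : "waiting"
      count[state " | " $2]++
    }
    END { for (k in count) printf "%5d  %s\n", count[k], k }
  ' "$1" | sort -rn
}
```

For example, after saving the report with `awk -f ass109.awk radinfo.trc > radinfo.ass`, run `summarize_waits radinfo.ass` to list each event with its process count, most frequent first.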
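After a series of checks, we confirmed that OS resources were normal while the database was hung: in the OSWatcher vmstat samples the CPU is 90% to 99% idle and there is no paging (pi and po are 0).
Quote: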
zzz ***Mon Mar 15 17:01:38 EAT 2010
procs memory page faults cpu
r b w avm free re at pi po fr de sr in sy cs us sy id
1 0 0 416503 35084 150 8 0 0 0 0 1 1807 9128 365 3 1 96
1 0 0 416503 33641 674 37 0 0 0 0 0 1440 18941 347 2 1 97
1 0 0 416503 35668 540 29 0 0 0 0 0 1410 16083 322 1 0 99
zzz ***Mon Mar 15 17:02:08 EAT 2010
procs memory page faults cpu
r b w avm free re at pi po fr de sr in sy cs us sy id
3 1 0 425983 27232 150 8 0 0 0 0 1 1807 9128 365 3 1 96
3 1 0 425983 26154 693 29 0 0 0 0 0 1715 21596 462 6 4 90
3 1 0 425983 27765 727 23 0 0 0 0 0 1725 22723 471 3 1 96
zzz ***Mon Mar 15 17:02:38 EAT 2010
procs memory page faults cpu
r b w avm free re at pi po fr de sr in sy cs us sy id
1 0 0 397676 28238 150 8 0 0 0 0 1 1807 9128 365 3 1 96
1 0 0 397676 28772 317 38 0 0 0 0 0 1223 9243 218 5 2 93
1 0 0 397676 30018 340 30 0 0 0 0 0 1225 9961 211 4 2 94
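Reading a full day of OSWatcher vmstat archives by eye is tedious. A minimal sketch, assuming the 18-column HP-UX vmstat layout shown above and OSWatcher's `zzz ***<timestamp>` separator lines, prints only samples with blocked processes, page-outs or low idle CPU. `scan_vmstat` and its default 20% idle threshold are hypothetical choices. The first data row of every snapshot is skipped because vmstat reports averages since boot there, which is why that row is identical (`... 1807 9128 365 3 1 96`) in all three snapshots above.

```shell
# Hypothetical helper: print only "interesting" samples from an OSWatcher vmstat archive.
# Assumes the HP-UX column order shown above:
#   r b w avm free re at pi po fr de sr in sy cs us sy id   (18 fields)
# $1 = archive file, $2 = idle-CPU threshold in percent (default 20, an arbitrary choice)
scan_vmstat() {
  awk -v min_idle="${2:-20}" '
    BEGIN { limit = min_idle + 0 }
    # OSWatcher separator line: remember the timestamp, reset the per-snapshot counter
    /^zzz/ { ts = $0; sub(/^zzz \*+/, "", ts); n = 0; next }
    # Numeric data rows only (skips the "procs ..." and "r b w ..." header lines)
    NF == 18 && $1 ~ /^[0-9]+$/ {
      n++
      if (n == 1) next    # first row of each vmstat run = averages since boot
      # $2 = b (blocked), $9 = po (page-outs), $18 = id (idle CPU %)
      if ($18 < limit || $2 > 0 || $9 > 0)
        printf "%s  r=%s b=%s po=%s id=%s\n", ts, $1, $2, $9, $18
    }
  ' "$1"
}
```

Run against the three snapshots above, only the 17:02:08 samples would be reported, because they show one blocked process (b=1).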
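In the oswps (ps) output collected by OSWatcher, we found that the listener had unexpectedly spawned a child process: PID 7054, started at 17:05:03, whose parent is PID 2286, the main listener process running since Feb 23.
Quote: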
1401 S oracle 7054 2286 0 154 20 e00000012bfc34c0 588 e00000012a764d00 17:05:03 ? 0:00 /oracle/product/10.2.0/db_1/bin/tnslsnr LISTENER -inherit
1401 R oracle 2286 1 0 152 20 e00000013f3984c0 588 - Feb 23 ? 28:08 /oracle/product/10.2.0/db_1/bin/tnslsnr LISTENER -inherit
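The oswps check above can be automated. This sketch reads `ps -elf`-style lines (as in the oswps output above, where PID is field 4 and PPID is field 5) and reports every tnslsnr process whose parent is also a tnslsnr. `find_listener_children` is a hypothetical name, and the field positions are an assumption to adjust for other ps formats (for `ps -ef`, PID and PPID are fields 2 and 3).

```shell
# Hypothetical helper: report tnslsnr processes whose parent is also tnslsnr.
# Reads ps -elf style lines on stdin; assumes PID is field 4 and PPID field 5,
# as in the oswps output above (adjust to $2/$3 for ps -ef).
find_listener_children() {
  awk '
    # Record PID -> PPID for every listener line; skip this awk process itself
    # when reading live ps output (its command line also contains "tnslsnr").
    /tnslsnr/ && !/awk/ { ppid[$4] = $5 }
    END {
      for (pid in ppid)
        if (ppid[pid] in ppid)    # the parent is itself a listener process
          printf "tnslsnr PID %s was forked by tnslsnr PID %s\n", pid, ppid[pid]
    }
  '
}
```

Use it as `ps -elf | find_listener_children` on the live host, or feed it an oswps archive file. A listener may briefly show a forked child while it spawns a server process, so a child that persists across several snapshots is more telling than one seen only once.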
Investigation showed that this is caused by an Oracle bug.
There are two workarounds:
1. Add SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF to listener.ora:
Quote:
$ more listener.ora
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SID_NAME = RADINFO)
(ORACLE_HOME = /oracle/product/10.2.0/db_1)
(GLOBAL_DBNAME = RADINFO)
)
)
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = hpuxa)(PORT = 1521))
)
)
SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF
2. Alternatively, remove the ons.config file from $ORACLE_HOME/opmn/conf. In the listing below it is gone and only backup copies remain:
Quote:
$ pwd
/oracle/product/10.2.0/db_1/opmn/conf
$ ls -rtl
total 6
-rw-r--r-- 1 oracle dba 71 Feb 21 2006 ons.config.tmp
-rw------- 1 oracle dba 44 Feb 28 2007 ons.config.backup.10203
-rw------- 1 oracle dba 66 Mar 22 15:11 ons.config.bak
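Both workarounds are small file edits, so they can be scripted with a backup taken first. The functions below are a hypothetical sketch, not an Oracle-supplied tool: `add_node_down_off` appends `SUBSCRIBE_FOR_NODE_DOWN_EVENT_<listener_name>=OFF` only if it is not already present, and `disable_ons_config` renames ons.config instead of deleting it so that it can be restored later.

```shell
# Workaround 1 (hypothetical helper): append the parameter if it is absent.
# $1 = path to listener.ora, $2 = listener name (default LISTENER)
add_node_down_off() {
  ora=$1
  param="SUBSCRIBE_FOR_NODE_DOWN_EVENT_${2:-LISTENER}"
  if grep -q "^[[:space:]]*${param}[[:space:]]*=" "$ora"; then
    echo "${param} is already set in ${ora}"
  else
    cp -p "$ora" "${ora}.bak.$(date +%Y%m%d%H%M%S)" || return 1   # back up first
    printf '%s=OFF\n' "$param" >> "$ora"
    echo "added ${param}=OFF to ${ora}"
  fi
}

# Workaround 2 (hypothetical helper): move ons.config aside instead of deleting it.
# $1 = ORACLE_HOME (defaults to $ORACLE_HOME)
disable_ons_config() {
  conf_dir="${1:-$ORACLE_HOME}/opmn/conf"
  if [ ! -f "$conf_dir/ons.config" ]; then
    echo "no ons.config in ${conf_dir}; nothing to do"
    return 0
  fi
  bak="$conf_dir/ons.config.bak"
  if [ -e "$bak" ]; then
    bak="${bak}.$(date +%Y%m%d%H%M%S)"   # never overwrite an older backup
  fi
  mv "$conf_dir/ons.config" "$bak" && echo "moved ons.config to ${bak}"
}
```

For example, `add_node_down_off $ORACLE_HOME/network/admin/listener.ora LISTENER`, assuming listener.ora is in its default location. Either way, restart the listener afterwards (`lsnrctl stop LISTENER`, then `lsnrctl start LISTENER`) so that it picks up the change.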
For details see MetaLink Note 340091.1; the main points are excerpted below.
Solution
Quote:
Bug 4518443 is fixed in 10.2.0.3
- OR -
Apply Patch 4518443 for the problem (if a patch is available)
- OR -
As a workaround, the following parameter can be added to listener.ora
SUBSCRIBE_FOR_NODE_DOWN_EVENT_<listener_name>=OFF
Where <listener_name> should be replaced with the actual listener name configured in the LISTENER.ORA file.
For example, if the listener name is LISTENER (default), the parameter would be:
SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF
This will prevent the listener from registering against ONS (Oracle Notification Services), which is the area affected by bug:4518443. For more information on ONS, please refer to eg. the Oracle10g Release 2 documentation ("Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide").
Please note, adding SUBSCRIBE_FOR_NODE_DOWN_EVENT_<listener_name> to listener.ora file on RAC, will mean that FAN (fast application notification) will not be possible. See Note 220970.1 RAC: Frequently Asked Questions for further information on FAN
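Since the note says the bug is fixed in 10.2.0.3, a quick release check tells whether the workaround is worth considering; the system here runs 10.2.0.1.0. This sketch compares dotted versions field by field. `version_lt` and `needs_bug_4518443_workaround` are hypothetical helpers; treating every release below 10.2.0.3 as affected is an assumption, and the check cannot see whether the one-off Patch 4518443 has already been applied.

```shell
# Hypothetical helper: succeed (exit status 0) if dotted version $1 < $2.
# Missing trailing fields count as 0, so 10.2.0.3 equals 10.2.0.3.0.
version_lt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi < yi) exit 0    # $1 is older
      if (xi > yi) exit 1    # $1 is newer
    }
    exit 1                   # equal is not "lower"
  }'
}

# Assumption: every release below 10.2.0.3 is treated as affected;
# a one-off Patch 4518443 on top of an older release is not detected.
needs_bug_4518443_workaround() {
  version_lt "$1" 10.2.0.3
}
```

Usage: `needs_bug_4518443_workaround 10.2.0.1.0 && echo "below 10.2.0.3: consider the workaround"`.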