Host Resources Exhausted: Oracle Bug 4612267

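On two consecutive days, two of our company's database servers (RHEL 5.1 + Oracle 10.2.0.1, x86) ran out of system resources. The hosts became unresponsive and the databases could not be used. After some digging, it turned out to be a known bug.
Symptoms: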
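When the host's uptime reaches 198 or 248 days, CPU usage suddenly jumps to 100%. Operating system commands still run, but Oracle commands such as lsnrctl, sqlplus and dbca hang and never complete.
Another account says that any uptime that is a multiple of 24.8 days can trigger the bug. In that state the value returned by times() goes wrong, the process falls into an endless loop, and the CPU is exhausted.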
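The root cause lies in an OS-level bug.
You can also confirm it with strace: attaching to a hung sqlplus process (strace -f -o aaa.txt -p <sqlplus PID>) shows it calling times(NULL) over and over.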

First, I checked the system resources:
top - 09:42:25 up 50 days,  1:09,  1 user,  load average: 160.96, 139.47, 106.07
Tasks: 334 total, 175 running, 159 sleeping,   0 stopped,   0 zombie
Cpu(s): 45.0%us, 55.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4014072k total,  3987848k used,    26224k free,    87124k buffers
Swap:  8193140k total,  1774948k used,  6418192k free,  3051132k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5145 oracle    25   0 2140m  75m  72m R    9  1.9   0:11.27 oracle
 5931 oracle    25   0 2139m  34m  31m R    9  0.9   0:12.20 oracle
10529 oracle    25   0 2140m  23m  21m R    9  0.6  34:10.17 oracle
10863 oracle    25   0 2139m  17m  16m R    9  0.4  36:11.37 oracle
11077 oracle    25   0 92324 6844 5872 R    9  0.2  35:18.66 oracle
11416 oracle    25   0 92324 6840 5868 R    9  0.2  32:35.48 oracle
17976 oracle    25   0 92328 6848 5872 R    9  0.2   0:40.30 oracle
18245 oracle    25   0 92324 6848 5872 R    9  0.2   0:08.50 oracle
 3784 oracle    25   0 2139m  12m  11m R    8  0.3  36:02.14 oracle
 3800 oracle    25   0 2140m 175m 173m R    8  4.5  36:15.87 oracle
 3803 oracle    25   0 2139m  17m  17m R    8  0.4  34:57.08 oracle
 3805 oracle    25   0 2140m  55m  55m R    8  1.4  37:03.20 oracle
 3809 oracle    25   0 2139m  23m  22m R    8  0.6  36:49.64 oracle
 3857 oracle    25   0 2140m 159m 158m R    8  4.1  36:37.39 oracle
 3883 oracle    25   0 2139m  11m  11m R    8  0.3  36:36.03 oracle
 5514 oracle    25   0 2139m  17m  16m R    8  0.5   0:11.94 oracle
 5516 oracle    25   0 2139m  26m  24m R    8  0.7   0:10.48 oracle
 5820 oracle    25   0 2140m  34m  32m R    8  0.9   0:12.19 oracle
 5929 oracle    25   0 2140m  40m  37m R    8  1.0   0:12.32 oracle
 5973 oracle    25   0 2139m  28m  26m R    8  0.7   0:12.22 oracle
 5975 oracle    25   0 2139m  33m  31m R    8  0.9   0:10.38 oracle
10136 oracle    25   0 2139m  16m  15m R    8  0.4  34:40.02 oracle
10138 oracle    25   0 2140m  21m  18m R    8  0.5  34:14.65 oracle
10279 oracle    25   0 2139m  16m  15m R    8  0.4  35:22.77 oracle
10558 oracle    25   0 2139m  16m  14m R    8  0.4  35:05.94 oracle
10870 oracle    25   0 2139m  17m  16m R    8  0.4  36:24.09 oracle
10880 oracle    25   0 2140m  21m  19m R    8  0.6  36:25.60 oracle
11422 oracle    25   0 92328 6844 5868 R    8  0.2  32:38.69 oracle
17763 oracle    25   0 92328 6844 5868 R    8  0.2   1:28.36 oracle
17765 oracle    25   0 92328 6848 5872 R    8  0.2   1:25.37 oracle
17907 oracle    25   0 92324 6848 5872 R    8  0.2   0:51.05 oracle
A large number of oracle processes are consuming the host's resources. vmstat and free show the same picture:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
258  1 1774944  22984  73908 2994968    3    3    29    27    2    2  1  0 97  1  0
258  1 1774944  22984  73908 2995000    0    0     0     0  412  306 43 57  0  0  0
258  1 1774944  22984  73908 2995000    0    0     0     0  425  302 47 54  0  0  0
259  1 1774944  22116  73912 2995000    0    0     4     0  410  321 46 54  0  0  0
259  1 1774944  22116  73912 2995000    0    0     0    88  449  321 50 50  0  0  0
259  1 1774944  22116  73928 2994984    0    0     0   116  423  331 48 52  0  0  0
259  1 1774944  22116  73928 2994984    0    0     0     0  426  313 48 52  0  0  0
259  1 1774944  22116  73928 2995000    0    0     0     0  438  300 41 59  0  0  0
259  1 1774944  21620  73928 2995000    0    0     0     0  425  313 46 54  0  0  0
259  1 1774944  21620  73928 2995000    0    0     0     0  410  298 41 59  0  0  0

The host had also been up for 50 days, roughly twice 24.8 days, which again fits the timing described above:
 10:03:43 up 50 days,  1:30,  2 users,  load average: 275.42, 255.46, 202.70
 
The options are to reboot the host (which resets the uptime), apply Oracle patch 4612267, or upgrade to 10.2.0.4 or later.
Description of the bug:
#-------------------------------------------------------------------------
# Interim Patch for Base Bugs: 4612267
#-------------------------------------------------------------------------
#
# DATE: Wed Oct 5 10:17:13 2005
# -------------------------------
# Platform Patch for : Linux x86
# Product Version # : 10.2.0.1
# Product Patched    : ORACORE
#
# Bugs Fixed by this patch:
# -------------------------
# 4612267:OCI CLIENT IS IN AN INFINITE LOOP WHEN MACHINE UPTIME HITS 248 DAYS
#-------------------------------------------------------------------------
Once the cause is known, the rest is easy: schedule a maintenance window and upgrade to 10.2.0.5.
-The End-
