2008/03/14
skate
Versions involved:
os: CentOS (Linux movo2u 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:30:39 EST 2005 i686 i686 i386 GNU/Linux)
db: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
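Most ORA-00600 errors whose first argument is in the 4000 range are related to rollback (undo) segments.
This morning I found that the application could not log in. The error returned to the front-end application was: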
OERR: ORA-12519 TNS:no appropriate service handler found
I had run into this error a few days earlier and suspected that there were too many connections, so I checked:
select count(*) from v$process;
select * from v$process;
select name,value from v$parameter where name = 'processes';
select * from v$license;
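The two numbers that matter from the queries above are the current row count of v$process and the value of the processes parameter. As a small sketch (the function name and the sample figures are illustrative, not taken from this system), the headroom can be expressed as a percentage:

```shell
# Report how close the instance is to its PROCESSES limit.
# $1 = result of: select count(*) from v$process;
# $2 = value of the 'processes' parameter from v$parameter
# Both are passed in by hand here; the function itself does not query Oracle.
process_usage() {
    current=$1
    limit=$2
    # Integer arithmetic, so the percentage is rounded down.
    pct=$(( current * 100 / limit ))
    echo "${current}/${limit} processes in use (${pct}%)"
}
```

For example, `process_usage 148 150` would show an instance that is almost out of process slots, which is the situation I suspected behind ORA-12519.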
I generated kill statements for the user's sessions with the following query:
select 'alter system kill session '''||s.SID||','||s.SERIAL#||''';'
from v$session s
where s.USERNAME='USERNAME';
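Killing the user's sessions this way made the front-end error go away. But because of a problem in the application, killing sessions is only a temporary fix, and before long the session count climbs back up.
Sometimes I could not even log in through SQL Developer, and when I logged in to the server via ssh and connected with sqlplus, it still reported that the connection limit had been exceeded. So I talked to the developers and we restarted the application service.
I then logged in to the database and, while checking the logs, found: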
alert.log:
Fri Mar 14 11:47:15 2008
Errors in file /home/oracle/admin/omovo/udump/omovo_ora_31788.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Fri Mar 14 11:47:23 2008
Errors in file /home/oracle/admin/omovo/udump/omovo_ora_31720.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Fri Mar 14 11:47:34 2008
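When the alert log is long, entries like the ones above can be condensed into one line each. This is only a sketch: the function name is made up, and it assumes the layout shown in this excerpt (a timestamp line, then "Errors in file ...:", then the ORA-00600 line).

```shell
# Summarize ORA-00600 entries in an alert log as:
#   <timestamp> | <trace file> | <first argument>
# Assumes the three-line layout seen in the excerpt above.
summarize_ora600() {
    awk '
        # Timestamp lines look like "Fri Mar 14 11:47:15 2008".
        /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) [A-Z][a-z][a-z] +[0-9]+ [0-9:]+ [0-9]+$/ {
            ts = $0; trc = "-"; next
        }
        # "Errors in file <path>:" names the trace file; drop the trailing colon.
        /^Errors in file / { trc = $4; sub(/:$/, "", trc); next }
        /^ORA-00600:/ {
            # The first bracketed argument identifies the internal error.
            i = index($0, "["); j = index($0, "]")
            print ts " | " trc " | " substr($0, i + 1, j - i - 1)
        }
    ' "$1"
}
```

Called as `summarize_ora600 /path/to/alert.log` (the path is a placeholder), it prints one line per ORA-00600 occurrence, which makes it easy to see which trace files to open first.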
/home/oracle/admin/omovo/udump/omovo_ora_31788.trc:
----------------------------------------------
*** 2008-03-14 11:32:45.614
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Current SQL statement for this session:
update b_weblog set click_count=:1, updateTime=:2 where id=:3
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+27 call ksedst1() 0 ? 1 ?
ksedmp()+557 call ksedst() 0 ? 0 ? 0 ? 0 ? 0 ? 0 ?
ksfdmp()+19 call ksedmp() 3 ? BFFF07A4 ? ACC92A0 ?
C806D00 ? 3 ? C7C88D8 ?
kgeriv()+188 call 00000000 C806D00 ? 3 ?
kgesiv()+118 call kgeriv() C806D00 ? B72B11BC ? 1001 ?
0 ? BFFF0810 ?
ksesic0()+44 call kgesiv() C806D00 ? B72B11BC ? 1001 ?
0 ? BFFF0810 ? 1001 ? 0 ?
BFFF0810 ?
ktugti()+2510 call ksesic0() 1001 ? 0 ? 0 ? 0 ? 0 ? 0 ?
ktbgfi()+6705 call ktugti() 21BD602C ? BFFF0A90 ? 0 ?
BFFF0AAC ? 0 ? 0 ? 0 ? 0 ?
kdisdelete()+10297 call ktbgfi() BFFF1388 ? BFFF1440 ? 103 ?
0 ? BFFF143C ? 0 ? 45 ?
__PGOSF323_kdisnew_ call kdisdelete() B72B23E0 ? BFFF1FA0 ?
bseg_srch_cbk()+295 181898D ? 0 ?
ktspfpblk()+400 call 00000000 BFFF1FA0 ? BFFF19A8 ?
ktspfsrch()+469 call ktspfpblk() BFFF1678 ? 6 ? 1813E09 ?
34C2 ? 34C2 ? 30008 ?
ktspscan_bmb()+274 call ktspfsrch() BFFF1678 ? 1813E09 ? 0 ?
1813E09 ?
ktspgsp_cbk1()+451 call ktspscan_bmb() BFFF1678 ? B72B23E8 ? 1 ? 2 ?
1813B7A ? 0 ?
ktspgsp_cbk()+70 call ktspgsp_cbk1() B72B23E8 ? 0 ? 0 ? 2 ?
BFFF1FA0 ? 0 ?
kdisnew()+184 call ktspgsp_cbk() B72B23E8 ? 0 ? 0 ? 2 ?
BFFF1FA0 ? 0 ?
kdisnewle()+81 call kdisnew() B72B23D8 ? 181898D ?
BFFF1FA0 ? 2 ? 0 ? 0 ?
kdisle()+3884 call kdisnewle() B72B23D8 ? 181898D ?
BFFF1FA0 ? BFFF2084 ?
B72A8B90 ? C8 ?
kdiins0()+28307 call kdisle() 77F8C74C ? BFFF226C ?
BFFF2CE4 ? 1 ? 2 ? BFFFA190 ?
kdiins()+95 call kdiins0() 77F8C74C ? 0 ? 0 ? BFFFA190 ?
0 ? 0 ?
kauupd()+4591 call kdiins() 77F8C74C ? 0 ? 0 ? BFFFA190 ?
2 ? FF ? 0 ? 0 ?
updrow()+1328 call kauupd() B72C866C ? 77F8C784 ?
B72DC9E0 ? 0 ? 77F8D700 ? 5 ?
333E ? 77F8CB10 ? 18 ?
BFFFB0A4 ? BFFFB05C ?
77F8D958 ?
__PGOSF357_qerupRow call updrow() 77F8F360 ? 7FFF ? 0 ? 48 ?
Procedure()+62 77F8D220 ? BFFFB388 ?
qerupFetch()+510 call 00000000 77F8CEB0 ? 7FFF ?
updaul()+900 call 00000000 77F8CEB0 ? 0 ? 77F8CD44 ?
7FFF ?
updThreePhaseExe()+ call updaul() 77F8F360 ? BFFFB830 ? 0 ?
274 C80E614 ? BFFFB898 ?
AF7187F ?
updexe()+283 call updThreePhaseExe() 77F8F360 ? 0 ? B72DC9E0 ?
BFFFB8FC ? 77F8F360 ? 1 ?
BFFFB8FC ? 0 ?
opiexe()+4025 call updexe() 77F8F360 ? BFFFC16C ?
kpoal8()+2089 call opiexe() 49 ? 3 ? BFFFC4CC ?
opiodr()+985 call 00000000 5E ? 17 ? BFFFEDD0 ?
ttcpip()+1093 call 00000000 5E ? 17 ? BFFFEDD0 ? 0 ?
opitsk()+1031 call ttcpip() C80E500 ? 5E ? BFFFEDD0 ? 0 ?
BFFFEAAC ? BFFFEEE0 ?
opiino()+821 call opitsk() 0 ? 0 ?
opiodr()+985 call 00000000 3C ? 4 ? BFFFF9A0 ?
opidrv()+466 call opiodr() 3C ? 4 ? BFFFF9A0 ? 0 ?
sou2o()+91 call opidrv() 3C ? 4 ? BFFFF9A0 ?
opimai_real()+117 call sou2o() BFFFF984 ? 3C ? 4 ?
BFFFF9A0 ?
main()+111 call opimai_real() 2 ? BFFFF9D0 ?
__libc_start_main() call 00000000 2 ? BFFFFA94 ? BFFFFAA0 ?
+227 699B46 ? 7CBFF4 ? 0 ?
--------------------- Binary Stack Dump ---------------------
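The most useful line in a trace like this one is the statement that was running when the error was raised. A minimal sketch for pulling it out of a trace file follows; the function name is illustrative, and it assumes the statement sits on the single line after the marker, as it does here.

```shell
# Print the SQL recorded after each
# "Current SQL statement for this session:" marker in a trace file.
# Assumes a one-line statement; a multi-line statement would be cut
# off after its first line.
current_sql() {
    awk '
        grab { print; grab = 0; next }
        /^Current SQL statement for this session:/ { grab = 1 }
    ' "$1"
}
```

Running it over every trace in the udump directory, for example `for f in /home/oracle/admin/omovo/udump/*.trc; do current_sql "$f"; done | sort | uniq -c`, shows whether the errors all come from the same statement.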
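From the logs, the problem was in the update against the b_weblog table, so I exported and re-imported that table. That solved the problem:
the log no longer reports ORA-600 errors and the application module is accessible again.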
[oracle@skate-test ~]$ export NLS_LANG=AMERICAN_AMERICA.ZHS32GB18030
expdp skate/skate dumpfile=mo_vo_b_weblog.dmp tables=b_weblog
impdp skate/skate dumpfile=mo_vo_b_weblog.dmp tables=b_weblog remap_schema=mo_vo:mo_vo table_exists_action=replace
or
exp skate/skate tables=(b_weblog) file=/home/oracle/online_script/dmp/mo_vo_b_weblog_exp.dmp grants=y
imp skate/skate fromuser=mo_vo touser=mo_vo file=/home/oracle/online_script/dmp/mo_vo_b_weblog_exp.dmp ignore=y
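After this I checked the alert log again and it looked normal: the only entries were log switches.
Another approach is to drop the corrupted rollback segment. My database once had a corrupted rollback segment and I used the method below, but this time I did not find any faulty rollback segment.
Reference: http://space.8see.com/467/viewspace-4194
List the rollback segments: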
select SEGMENT_NAME,STATUS from dba_rollback_segs;
Summary:
Problem: Series of Ora-00600 [4097] and Ora-00600 [25012] errors
related to undotablespace "UNDOTBS" :tsn# 33 and undo segment#10.
Ora-00600[4097] started on Fri Oct 27 01:33:04 2006
Ora-00600[25012] started on Fri Oct 27 01:33:34 2006
Ora-00600[4097]
~~~~~~~~~~~~~~~ .
The error Ora-00600[4097] started occurring on segment number 10
in tsn# 33-undotbs.
xid dump in the trace files shows higher wrap# for the slot than what is present in undo segment header for usn# 10.
From dp06_ora_7324.trc,
----------------------- .
xid: 0x000a.032.00028790
.
undo segment header 10 shows wrap# 0x282ef for slot 0x32. could not find reference to the xid "0x000a.032.00028790" in any of the data blocks dumped in trace file.
Call stack
===========
ksedmp kgeriv kgesiv ksesic0 ktugti ktsftcmove
ktsf_gsp kdtgsp kdtgrs kdtInsRow insrow insdrv
insexe opiexe opiall0 opikpr opiodr rpidrus
skgmstack rpidru rpiswu2 kprball kqlsrcchg kqlchg
kglfls ktcrcm ktdcmt k2lcom k2send xctctl
xctcom opiexe opiosq0 kpooprx kpoal8 opiodr
ttcpip opitsk opiino opiodr opidrv sou2o main start
dbv on all undo datafiles were clean.
They tried switching to a new undo tablespace UNDOTBS2. But still the errors continued.
After dropping the undo segment 10 using _smu_debug_mode, there are no Ora-00600[4097] errors seen.
However, once the load increased in the database, "_SYSSMU10$" was automatically re-created and they then saw many Ora-00600[25012] errors.
After dropping undo segment 10, they are still using UNDOTBS as active undo tablespace.
Ora-00600[25012]
~~~~~~~~~~~~~~~ .
The index block had an open itl which was referencing undo segment 10 and the uba was referring to datafile 583 which is not part of UNDOTBS.
Call stack (from dp06_ora_29049.trc)
===========
ksedmp kgeriv kgesiv ksesic2 kftr2ah kftr2bz
kturbk ktrgcm ktrgtc kdiixs1 qerixFetch qertbFetchByRowID
qersoFetch opifch2 opiall0 kpoal8 opiodr ttcpip
opitsk opiino opiodr opidrv sou2o main sta
From the trace files, the symptom is that index blocks have open ITLs referencing undo segment number 10, with the dba pointing to file 583, which is not part of UNDOTBS.
Hence the CR generation fails with Ora-00600 [25012].
This error occurs from different jobs and on different objects.
Dropping and recreating indexes solves the problem.
dbv does not report any problem on the affected index datafiles.
Appendix: how to drop rollback segment _SYSSMU10$
1. Take a full database backup before doing anything.
2. Edit the pfile:
*.undo_management='MANUAL';
_offline_rollback_segments=(_SYSSMU1$,_SYSSMU2$,_SYSSMU3$,_SYSSMU4$,_SYSSMU5$,_SYSSMU6$,_SYSSMU7$,_SYSSMU8$,_SYSSMU9$,_SYSSMU10$)
3. Start the database with the pfile, then run the following statement:
DROP ROLLBACK SEGMENT "_SYSSMU10$";
4. Remove the hidden parameter _offline_rollback_segments from the pfile, then restart the database.