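The problem affected a single user: logging in as this user took a very long time, while logins as any other user were unaffected.
The alert log kept reporting timeout errors: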
Fatal NI connect error 12170.
  VERSION INFORMATION:
    TNS for Solaris: Version 11.2.0.3.0 - Production
    Oracle Bequeath NT Protocol Adapter for Solaris: Version 11.2.0.3.0 - Production
    TCP/IP NT Protocol Adapter for Solaris: Version 11.2.0.3.0 - Production
  Time: 15-MAR-2013 12:54:09
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535
    TNS-12535: TNS:operation timed out
    ns secondary err code: 12606
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.38.47.80)(PORT=60998))
WARNING: inbound connection timed out (ORA-3136)
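I first checked DBA_PROFILES to see whether any password-related or login-related profile limits were in effect. They were all already set to UNLIMITED, so the problem should be unrelated to the profile.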
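The profile check can be sketched as the query below. It is a minimal example rather than the exact statement run during the investigation, and it assumes the session has access to the DBA views.

```sql
-- List the password/login limits (FAILED_LOGIN_ATTEMPTS, PASSWORD_LOCK_TIME, ...)
-- of every profile. RESOURCE_TYPE = 'PASSWORD' filters out the kernel
-- resource limits such as CPU_PER_SESSION.
SELECT profile, resource_name, limit
FROM   dba_profiles
WHERE  resource_type = 'PASSWORD'
ORDER  BY profile, resource_name;
```

If every row shows UNLIMITED (or DEFAULT pointing to an UNLIMITED value in the DEFAULT profile), no profile limit is restricting the login.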
I also examined the affected user and found nothing unusual.
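A quick way to inspect the account is a query such as the one below. The user name FLSDATA is taken from the sqlplus transcript later in this post and is assumed here to be the affected account; substitute the real name as needed.

```sql
-- Show the lock/expiry state and the assigned profile of the affected account.
-- 'FLSDATA' is an assumption based on the login transcript further down.
SELECT username, account_status, lock_date, expiry_date, profile
FROM   dba_users
WHERE  username = 'FLSDATA';
```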
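Logging in as this user from sqlplus took nearly five minutes before it finally succeeded. While it was waiting, the session showed the library cache lock wait event.
I then queried the database to take a closer look: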
SQL> SELECT sid, username, event, p1text, p1, p2text, p2, p3text, p3, seconds_in_wait
     FROM gv$session
     WHERE event = 'library cache lock';

        SID USERNAME                       EVENT              SECONDS_IN_WAIT
----------- ------------------------------ ------------------ ---------------
         19                                library cache lock              21
       2017                                library cache lock              20
       2062                                library cache lock               0
       2147                                library cache lock              10
       2210                                library cache lock
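A variation of the query above could also pull the blocker columns of gv$session. This is only a sketch: for sessions that have not finished logging in, BLOCKING_SESSION may well be NULL, so an empty result proves nothing by itself.

```sql
-- Same filter as above, plus the columns that identify a blocker when
-- Oracle is able to determine one. Longest waiters are listed first.
SELECT inst_id, sid, serial#, username, event,
       blocking_instance, blocking_session, seconds_in_wait
FROM   gv$session
WHERE  event = 'library cache lock'
ORDER  BY seconds_in_wait DESC;
```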
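The username is NULL for all of these sessions. If a session has not logged in yet, there is no way to check what it has executed, and the Oracle documentation does not describe what a session does, or which waits it goes through, before login completes.
A NULL username means the login has not succeeded. Why would sessions that have not logged in be waiting on library cache lock?
That brought the timeouts in the alert log back to mind: what was connecting over and over, and from where? Could the two be related?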
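A search online turned up a new feature in 11g: delayed password verification. Starting with 11g, Oracle delays authentication after a failed password attempt, which makes brute-force password cracking by a program much harder. In the test below, the gap between consecutive failed attempts grows from about 1 second to about 5 seconds.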
$ sqlplus /nolog
SQL*Plus: Release 11.2.0.1.0 Production on
Copyright (c) 1982, 2009, Oracle. All rights reserved.
SQL> set time on
08:28:11 SQL> conn test/a
ERROR:
ORA-01017: invalid username/password; logon denied
08:28:12 SQL> conn test/a
ERROR:
ORA-01017: invalid username/password; logon denied
08:28:13 SQL> conn test/a
ERROR:
ORA-01017: invalid username/password; logon denied
08:28:14 SQL> conn test/a
ERROR:
ORA-01017: invalid username/password; logon denied
08:28:16 SQL> conn test/a
ERROR:
ORA-01017: invalid username/password; logon denied
08:28:18 SQL> conn test/a
ERROR:
ORA-01017: invalid username/password; logon denied
08:28:22 SQL> conn test/a
ERROR:
ORA-01017: invalid username/password; logon denied
08:28:27 SQL> conn test/a
ERROR:
ORA-01017: invalid username/password; logon denied
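A workaround often cited for this behaviour is to switch the failed-logon delay off with event 28401. It was not part of this investigation, so treat the statement below as an assumption to confirm with Oracle Support before using it. It also removes the brute-force protection that the delay provides.

```sql
-- Assumption: event 28401 disables the failed-logon delay on 11g.
-- SCOPE = SPFILE means it only takes effect after an instance restart.
-- Note: this overwrites any events already stored in the EVENT parameter,
-- so existing entries would have to be included in the same string.
ALTER SYSTEM SET EVENT = '28401 trace name context forever, level 1' SCOPE = SPFILE;
```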
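Solution:
To verify that delayed password verification really was the cause, I set FAILED_LOGIN_ATTEMPTS to 3 and, a little later, tried to log in as the affected user. If the account turned out to be locked, something had been failing to log in as that user repeatedly, which would confirm the guess.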
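The change can be sketched as follows. The post does not say which profile was altered, so the DEFAULT profile and the user name FLSDATA (from the transcript below) are assumptions; the first query shows which profile actually applies.

```sql
-- Find the profile assigned to the affected user (name assumed).
SELECT profile FROM dba_users WHERE username = 'FLSDATA';

-- Lock an account after 3 consecutive failed logins.
-- Replace DEFAULT with the profile returned by the query above.
ALTER PROFILE default LIMIT FAILED_LOGIN_ATTEMPTS 3;
```

Because the limit applies to every user on that profile, it is worth reviewing whether other accounts could be locked by the same change.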
g4as8031:11g:d0fls > sqlplus flsdata/pa888888
SQL*Plus: Release 11.2.0.3.0 Production on Thu Mar 14 20:37:28 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
ERROR:
ORA-28000: the account is locked
Enter user-name: ^C
g4as8031:11g:d0fls >
g4as8031:11g:d0fls > sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Thu Mar 14 20:37:44 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> alter user flsdata account unlock;
User altered.
SQL> quit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
g4as8031:11g:d0fls > sqlplus flsdata/pa888888
SQL*Plus: Release 11.2.0.3.0 Production on Thu Mar 14 20:38:09 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> quit
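Sure enough, the account had been locked, and after unlocking it the login was fast, just as guessed. Some application had been connecting to the database continuously, presumably with a wrong password. To stop it from connecting again, I changed the listener port so that the application could no longer reach the database.
With that, the problem was resolved.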
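To identify which client keeps presenting a wrong password, the audit trail can help. The query below is a sketch that assumes session auditing is enabled (AUDIT_TRAIL set to DB and logons audited); if it is not, the view will simply return no rows.

```sql
-- Failed logins from the last day. RETURNCODE 1017 corresponds to
-- ORA-01017 (invalid username/password); USERHOST and OS_USERNAME
-- point at the machine and OS account the attempts come from.
SELECT os_username, userhost, terminal, username, timestamp, returncode
FROM   dba_audit_session
WHERE  returncode = 1017
AND    timestamp > SYSDATE - 1
ORDER  BY timestamp DESC;
```

Fixing the stored password in the offending application is less disruptive than moving the listener to another port, since other clients keep working.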
In parallel with this analysis, SAM filed an SR for the issue on METALINK. The Oracle consultant proposed the following solution:
There seems to be an undetected deadlock between processes 1252 and 1632 (waiting for row cache and waiting for library cache pin), which are blocking each other. The row cache waited for was dc_histogram_defs.
9i is now obsolete, but there was a bug, BUG:7253029 - DATABASE HANG "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK" DC_HISTOGRAM_DEFS, for the same issue seen in 9.2.0.8.
Solution
- Kill the blocker
OR
- Upgrade to 10.2.0.4 or later
- Try the workarounds mentioned in Bug 7253029:
1) Increase shared_pool_size
2) set "_row_cache_cursors" = 20 in init.ora and restart
3) set timed_statistics = FALSE
3. Final solution:
Summing up the solutions above, we will take the following actions at some point next week:
1) Increase shared_pool_size
2) set "_row_cache_cursors" = 20 in init.ora and restart
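If the instance runs from an spfile instead of a plain init.ora, the two changes could be made as sketched below. The shared pool size is a hypothetical placeholder, since no target value is given above, and hidden parameters such as "_row_cache_cursors" should only be changed on the advice of Oracle Support.

```sql
-- Hypothetical size: choose a value appropriate for the system.
ALTER SYSTEM SET shared_pool_size = 2G SCOPE = SPFILE;

-- Hidden parameter: the name must be enclosed in double quotes.
ALTER SYSTEM SET "_row_cache_cursors" = 20 SCOPE = SPFILE;

-- Both settings take effect at the next instance restart.
```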