ORA-01555 and cursor: pin S wait on X on an 11.2.0.4 ADG standby: problem analysis

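On an 11.2.0.4 ADG standby database (the application is Oracle EBS), users reported that querying a view failed immediately with ORA-01555, before the statement had even started executing. Logging in to the DG database and checking the alert log showed a large number of statements that had run for only a few seconds and then failed with ORA-01555. A hanganalyze dump showed that one application session was blocking other sessions, which were waiting on cursor: pin S wait on X and library cache lock. After that session was killed, queries returned to normal. The SQL that session was running was a plain query; why it blocked the other sessions was not determined.

According to MOS, the problem may also be caused by an ADG bug, such as Bug 10018789 - Spin in kgllock / DB hang with high library cache lock waits on ADG (Doc ID 10018789.8). However, that bug is listed as fixed in 11.2.0.3, and this system runs 11.2.0.4. MOS also has some documents whose titles indicate that problems remain after the fix for Bug 10018789. This could not be verified further here, so only the immediate problem was handled and the system will be kept under observation.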

Details of the troubleshooting process:

1. Database status and logs

STATUS       INSTANCE_NAME    START_TIME
------------ ---------------- --------------------
OPEN         PROD             2018/06/28 09:37:45
OPEN_MODE            NAME      DB_UNIQUE_NA DATABASE_ROLE          DBID
-------------------- --------- ------------ ---------------- ----------
LANG                       CURRT_TIME
-------------------------- --------------------
READ ONLY WITH APPLY PROD      PRODSTD1     PHYSICAL STANDBY  342495180
AMERICAN_AMERICA.AL32UTF8  2018/07/02 14:55:53

$ tail -n 100 alert_PROD.log 
ORA-01555 caused by SQL statement below (SQL ID: 58161b809rkhq, Query Duration=3 sec, SCN: 0x000d.d2928afb):
select count(*) from AALINES_ALL where LAST_UPDATE_DATE > to_date('2018-07
…………
Mon Jul 02 14:42:15 2018
ORA-01555 caused by SQL statement below (SQL ID: 6a4dr5y4kbsmj, Query Duration=7 sec, SCN: 0x000d.d2928afb):
select count(*), from AAALL where DATE > to_date('2018-07-02 10:01
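
Both ORA-01555 entries above report the same SCN, 0x000d.d2928afb, even though the statements were issued at different times. The alert log prints this value in hexadecimal as wrap.base, where the wrap is the high-order part and the base is the low-order 32 bits. The following is a minimal Python sketch (the function name `scn_to_decimal` is hypothetical, not an Oracle utility) that converts such a value to a decimal SCN so it can be compared with SCN values queried from the database:

```python
def scn_to_decimal(scn_text: str) -> int:
    """Convert an SCN printed as hex 'wrap.base' (e.g. '0x000d.d2928afb') to an integer."""
    text = scn_text.strip().lower()
    if text.startswith("0x"):
        text = text[2:]
    wrap_hex, base_hex = text.split(".")
    # The base is the low-order 32 bits; the wrap sits above it.
    return (int(wrap_hex, 16) << 32) | int(base_hex, 16)


if __name__ == "__main__":
    # SCN string copied from the alert log excerpt above.
    print(scn_to_decimal("0x000d.d2928afb"))
```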

2. Using hanganalyze to find the blocker

SQL> oradebug setmypid
Statement processed.
SQL> oradebug unlimit
Statement processed.
SQL> oradebug hanganalyze 3
Hang Analysis in /u02/prod/db^/trace/PROD_ora_11469346.trc

The hanganalyze trace file shows the following:
Chains most likely to have caused the hang:
 [a] Chain 1 Signature: <='cursor: pin S wait on X'
     Chain 1 Signature Hash: 0x3a7b30c
 [b] Chain 2 Signature: <='library cache lock'
     Chain 2 Signature Hash: 0x24734cf
 [c] Chain 3 Signature: <='cursor: pin S wait on X'
     Chain 3 Signature Hash: 0x3a7b30c
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (prodstd1.prod)
                   os id: 13959918
              process id: 60,
              session id: 5
        session serial #: 91
    }
    is waiting for 'cursor: pin S wait on X' with wait info:
    {
                      p1: 'idn'=0xb9295c99
                      p2: 'value'=0x8da00000000
                      p3: 'where'=0x500000000
            time in wait: 6 min 13 sec
      heur. time in wait: 17 min 9 sec
           timeout after: never
                 wait id: 7
                blocking: 0 sessions  
             current sql: select count(*) from …………ITEMTBL where DATE > to_date('2018-01-1
             short stack: ksedsts…………<-sou2o()+13
            wait history:
              * time between current wait and wait #1: 0.000012 sec
              1.       event: 'cursor: pin S wait on X'
                 time waited: 10 min 56 sec
                     wait id: 6               p1: 'idn'=0xb9295c99
                                              p2: 'value'=0x8da00000000
                                              p3: 'where'=0x500000000
           …………
    }
    and is blocked by  <<<<<<-------------
 => Oracle session identified by:
    {
                instance: 1 (prodstd1.prod)
                   os id: 2949500 <<<<<<<----------------  
              process id: 46, 
              session id: 2266   <<<<<<<<<------------------
        session serial #: 3615
    }
    which is not in a wait:
    {
               last wait: 305 min 58 sec ago
                blocking: 5 sessions
             current sql: select count(*) from TBL where DATE > 
             short stack: ksedsts()**-opiosq0()+2064<-kpoop
            wait history:
              1.       event: 'SQL*Net message from client'
                 time waited: 0.001825 sec
                     wait id: 5               p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.000004 sec
              2.       event: 'SQL*Net message to client'
                 time waited: 0.000000 sec
                     wait id: 4               p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
              * time between wait #2 and #3: 0.000051 sec
              3.       event: 'SQL*Net message from client'
                 time waited: 0.000979 sec
                     wait id: 3               p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
    }
 
Chain 1 Signature: <='cursor: pin S wait on X'
Chain 1 Signature Hash: 0x3a7b30c
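
The wait parameters in Chain 1 already point at the blocker. For cursor: pin S wait on X, Oracle documents P2 as carrying the session ID of the mutex holder in its high-order bytes (the top 4 bytes on 64-bit platforms, the top 2 bytes on 32-bit platforms). Assuming this instance is 64-bit, which the width of the session addresses in the trace suggests, the value 0x8da00000000 decodes to SID 2266, the same session that hanganalyze reports as the blocker. A minimal Python sketch of the decoding (the function name `mutex_holder_sid` is hypothetical):

```python
def mutex_holder_sid(p2_value: int, platform_bits: int = 64) -> int:
    """Return the SID stored in the high-order half of P2 for 'cursor: pin S wait on X'.

    Assumption: the holder SID occupies the top half of the value
    (top 4 bytes on 64-bit platforms, top 2 bytes on 32-bit platforms).
    """
    if platform_bits not in (32, 64):
        raise ValueError("platform_bits must be 32 or 64")
    return p2_value >> (platform_bits // 2)


if __name__ == "__main__":
    # p2 'value' copied from the Chain 1 wait info above.
    print(mutex_holder_sid(0x8DA00000000))
```

A result of 0 would mean that no session holds the mutex in exclusive mode at that moment.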

Analysis of the list at the end of the trace:
State of LOCAL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[1]/1/2/3/700010751bf98a8/13369432/SINGLE_NODE/
[4]/1/5/91/70001074dbafa18/13959918/NLEAF/[2265] -- NLEAF means the session is waiting; the adjlist column then shows the [nodenum] of the blocking session
[379]/1/380/1/700010771c8c9c0/10747992/SINGLE_NODE/
[756]/1/757/1/700010759e02818/2753006/SINGLE_NODE/
[1134]/1/1135/1/700010755e6e138/9437520/SINGLE_NODE/
[1509]/1/1510/3/700010759fcfc38/12321708/SINGLE_NODE/
[1511]/1/1512/1/700010771f3ef80/8651010/SINGLE_NODE/
[1888]/1/1889/1/70001075a0b4dd8/12584132/SINGLE_NODE/
[2263]/1/2264/1/70001074e117678/12714402/SINGLE_NODE/
[2265]/1/2266/3615/70001075215e428/2949500/LEAF_NW/  --- LEAF_NW marks the source of the blocking, SID 2266
[2641]/1/2642/1/70001075a2821f8/16253402/SINGLE_NODE/
[2643]/1/2644/3177/7000107721f1540/8323676/SINGLE_NODE_NW/
[3017]/1/3018/1/7000107722d97c0/19595340/SINGLE_NODE/
[3394]/1/3395/1/70001075a44f618/11207384/SINGLE_NODE/
[3771]/1/3772/1/7000107524f8c68/15139302/SINGLE_NODE/
[4148]/1/4149/1/70001074e597058/11010422/SINGLE_NODE/
[4524]/1/4525/1/7000107526c6088/16974384/NLEAF/[2265] -------
[4525]/1/4526/1/700010756688358/16646406/SINGLE_NODE/
[4902]/1/4903/1/7000107727591a0/16318956/SINGLE_NODE/
[5279]/1/5280/1/70001075a8ceff8/12387402/SINGLE_NODE/
[5281]/1/5282/547/70001077283e340/3015824/NLEAF/[2265] ---------
[5656]/1/5657/1/700010752978648/15401432/SINGLE_NODE/
[6034]/1/6035/1/700010772a0b760/15270516/SINGLE_NODE/
[6411]/1/6412/1/70001075ab815b8/19071030/SINGLE_NODE/
[6413]/1/6414/411/700010772af0900/11600240/NLEAF/[2265]  ---------
[6787]/1/6788/9/700010772bd8b80/19398714/SINGLE_NODE/
[6789]/1/6790/933/700010756beced8/18219140/NLEAF/[2265] ----------
[7164]/1/7165/1/70001075ad4e9d8/14614642/SINGLE_NODE/
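
On a busy instance this node list can be long, so it can help to pull out the blocker mechanically. The following is a minimal Python sketch, assuming the line layout shown above ([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]); the function name `find_blockers` is hypothetical. It maps each blocking session (SID, serial#) to the SIDs of the sessions whose adjlist points at it:

```python
import re
from collections import defaultdict

# One node line: [nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]
NODE_RE = re.compile(
    r"^\[(?P<nodenum>\d+)\]/(?P<cnode>\d+)/(?P<sid>\d+)/(?P<serial>\d+)/"
    r"(?P<session>[0-9a-fA-F]+)/(?P<ospid>\d+)/(?P<state>[A-Z_]+)/"
    r"(?P<adj>(?:\[\d+\])*)"
)


def find_blockers(trace_lines):
    """Map (sid, serial#) of each blocking session to the list of waiting SIDs."""
    nodes = {}                   # nodenum -> (sid, serial#)
    waiters = defaultdict(list)  # blocker nodenum -> SIDs waiting on it
    for line in trace_lines:
        match = NODE_RE.match(line.strip())
        if match is None:
            continue  # header lines or anything that is not a node entry
        nodes[int(match.group("nodenum"))] = (
            int(match.group("sid")),
            int(match.group("serial")),
        )
        for blocker_node in re.findall(r"\d+", match.group("adj")):
            waiters[int(blocker_node)].append(int(match.group("sid")))
    return {nodes[n]: sids for n, sids in waiters.items() if n in nodes}


if __name__ == "__main__":
    sample = [
        "[4]/1/5/91/70001074dbafa18/13959918/NLEAF/[2265]",
        "[2263]/1/2264/1/70001074e117678/12714402/SINGLE_NODE/",
        "[2265]/1/2266/3615/70001075215e428/2949500/LEAF_NW/",
        "[4524]/1/4525/1/7000107526c6088/16974384/NLEAF/[2265]",
    ]
    for (sid, serial), waiting in find_blockers(sample).items():
        print(f"blocker sid={sid} serial#={serial} blocks SIDs {waiting}")
```

Run against the full list above, the sketch should report session 2266 with serial# 3615 as the only blocker, with five waiting sessions, which agrees with the "blocking: 5 sessions" line in Chain 1. The SID and serial# pair is also what ALTER SYSTEM KILL SESSION expects.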
