转自:http://blog.itpub.net/195110/viewspace-688790/
下午接到维护发来的邮件,大体内存如下:
3月7号晚10点左右,资源管理系统数据库 gxdb1 发生大量锁等待,而且出现多个阻塞锁。活动session数达到350左右, 等待事件为 cursor : pin s wait on X 。经杀掉相关session后系统恢复正常。
希望找出问题发生的原因。
下面记录查找问题的过程:
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Productio
NLSRTL Version 10.2.0.4.0 - Production
SQL>
由于hist_active_sess_history表的数据比较大,把'2011030717' ~ '2011030722' 这个时间段里的session数据提取出来
SQL> create table temp_hist_active_sess_history nologging as
2 select * from dba_hist_active_sess_history
3 where to_char(sample_time, 'yyyymmddhh24') between '2011030717' and '2011030722';
alter table temp_hist_active_sess_history add record_time varchar2(50);
update temp_hist_active_sess_history set record_time = to_char(sample_time,'yyyy-mm-dd hh24:mi:ss');
commit;
...省略建立index的过程
SQL>col p2 format 9999999999999999
SQL> select distinct session_id,session_serial#,p1,p2
2 from temp_hist_active_sess_history
3 where event ='cursor: pin S wait on X'
4 and rownum<=50;
SESSION_ID SESSION_SERIAL# P1 P2
---------- --------------- ---------- -----------------
2861 52370 668057193 19387482374144
4173 62920 668057193 19155554140160
517 28649 3522657913 21333102559232
533 16512 3522657913 21333102559232
606 39602 3522657913 21333102559232
626 14637 3522657913 21333102559232
855 53933 3522657913 21333102559232
1062 46600 3522657913 21333102559232
1249 63282 3522657913 21333102559232
528 35867 3522657913 21333102559232
691 41082 3522657913 21333102559232
SESSION_ID SESSION_SERIAL# P1 P2
---------- --------------- ---------- -----------------
976 56788 3522657913 21333102559232
1196 41756 3522657913 21333102559232
1514 45826 3522657913 21333102559232
1525 16156 3522657913 21333102559232
1633 9783 3522657913 21333102559232
1677 52748 3522657913 21333102559232
933 13220 3522657913 21333102559232
942 17109 3522657913 21333102559232
979 33487 3522657913 21333102559232
1341 14797 3522657913 21333102559232
1398 15817 3522657913 21333102559232
SESSION_ID SESSION_SERIAL# P1 P2
---------- --------------- ---------- -----------------
1441 10065 3522657913 21333102559232
1566 7846 3522657913 21333102559232
1667 62940 3522657913 21333102559232
3794 41447 668057193 19155554140160
549 36165 3522657913 21333102559232
631 9934 3522657913 21333102559232
812 42615 3522657913 21333102559232
934 18908 3522657913 21333102559232
1365 5812 3522657913 21333102559232
1491 41510 3522657913 21333102559232
1539 9132 3522657913 21333102559232
SESSION_ID SESSION_SERIAL# P1 P2
---------- --------------- ---------- -----------------
851 35281 3522657913 21333102559232
1597 48481 3522657913 21333102559232
2824 37947 668057193 19155554140160
2961 17626 668057193 19155554140160
660 30581 3522657913 21333102559232
472 6095 3522657913 21333102559232
881 54009 3522657913 21333102559232
889 14408 3522657913 21333102559232
1098 36953 3522657913 21333102559232
1675 4908 3522657913 21333102559232
1720 46180 3522657913 21333102559232
SESSION_ID SESSION_SERIAL# P1 P2
---------- --------------- ---------- -----------------
640 61791 3522657913 21333102559232
817 13269 3522657913 21333102559232
951 20788 3522657913 21333102559232
970 57853 3522657913 21333102559232
1423 22083 3522657913 21333102559232
1709 10839 3522657913 21333102559232
50 rows selected.
我们来分析p2=21333102559232,前两个字节数据为 4967
SQL> select session_id,
2 session_serial#,
3 event,
4 blocking_session,
5 blocking_session_serial#
6 from temp_hist_active_sess_history
7 where session_id = 4967
8 order by record_time;
SESSION_ID SESSION_SERIAL# EVENT BLOCKING_SESSION BLOCKING_SESSION_SERIAL#
---------- --------------- ---------------------------------------------------------------- ---------------- ------------------------
4967 12169 library cache pin 2704 36420
4967 12169 library cache pin 2704 36420
4967 12169 library cache pin 2704 36420
4967 12169 library cache pin 2704 36420
4967 12169 library cache pin 2704 36420
4967 12169 library cache pin 2704 36420
4967 12169 library cache pin 2704 36420
4967 12169 library cache pin 2704 36420
4967 12169 library cache pin 2704 36420
4967 12169 library cache pin 2704 36420
4967 12169 library cache pin 2704 36420
.....
SQL> select session_id,
2 session_serial#,
3 event,
4 blocking_session,
5 blocking_session_serial#
6 from temp_hist_active_sess_history
7 where session_id = 2704
8 and session_serial# = 36420;
SESSION_ID SESSION_SERIAL# EVENT BLOCKING_SESSION BLOCKING_SESSION_SERIAL#
---------- --------------- ---------------------------------------------------------------- ---------------- ------------------------
2704 36420 enq: TX - row lock contention 3537 32344
2704 36420 enq: TX - row lock contention 3537 32344
2704 36420 enq: TX - row lock contention 3537 32344
2704 36420 enq: TX - row lock contention 3537 32344
2704 36420 enq: TX - row lock contention 3537 32344
2704 36420 enq: TX - row lock contention 3537 32344
2704 36420 enq: TX - row lock contention 3537 32344
2704 36420 enq: TX - row lock contention 3537 32344
2704 36420 enq: TX - row lock contention 3537 32344
2704 36420 enq: TX - row lock contention 3537 32344
2704 36420 enq: TX - row lock contention 3537 32344
这样循环找下去,得到如下
SQL> select session_id,
2 session_serial#,
3 event,
4 blocking_session,
5 blocking_session_serial#,
6 sql_id,
7 program,
8 record_time
9 from temp_hist_active_sess_history
10 where session_id = 3738
11 and session_serial# = 45559
12 order by record_time;
.....
SESSION_ID SESSION_SERIAL# EVENT BLOCKING_SESSION BLOCKING_SESSION_SERIAL# SQL_ID PROGRAM RECORD_TIME
---------- --------------- ------------------------------- ---------------- ------------------------ ------------- ---------------- -------------
3738 45559 db file sequential read 8qg7sa3nramwx plsqldev.exe 2011-03-07 17:54:00
3738 45559 8qg7sa3nramwx plsqldev.exe 2011-03-07 17:54:10
3738 45559 8qg7sa3nramwx plsqldev.exe 2011-03-07 17:54:20
3738 45559 8qg7sa3nramwx plsqldev.exe 2011-03-07 17:54:30
3738 45559 8qg7sa3nramwx plsqldev.exe 2011-03-07 17:54:40
3738 45559 8qg7sa3nramwx plsqldev.exe 2011-03-07 17:54:50
3738 45559 8qg7sa3nramwx plsqldev.exe 2011-03-07 17:55:00
3738 45559 8qg7sa3nramwx plsqldev.exe 2011-03-07 17:55:10
3738 45559 8qg7sa3nramwx plsqldev.exe 2011-03-07 17:55:20
3738 45559 8qg7sa3nramwx plsqldev.exe 2011-03-07 17:55:30
3738 45559 8qg7sa3nramwx plsqldev.exe 2011-03-07 17:55:40
....
SESSION_ID SESSION_SERIAL# EVENT BLOCKING_SESSION BLOCKING_SESSION_SERIAL# SQL_ID PROGRAM RECORD_TIME
---------- --------------- ------------------------------- ---------------- ------------------------ ------------- ---------------- ------------
3738 45559 SQL*Net break/reset to client 8qg7sa3nramwx plsqldev.exe 2011-03-07 18:30:57
3738 45559 SQL*Net break/reset to client 8qg7sa3nramwx plsqldev.exe 2011-03-07 18:31:07
3738 45559 SQL*Net break/reset to client 8qg7sa3nramwx plsqldev.exe 2011-03-07 18:31:17
3738 45559 SQL*Net break/reset to client 8qg7sa3nramwx plsqldev.exe 2011-03-07 18:31:27
3738 45559 SQL*Net break/reset to client 8qg7sa3nramwx plsqldev.exe 2011-03-07 18:31:37
3738 45559 SQL*Net break/reset to client 8qg7sa3nramwx plsqldev.exe 2011-03-07 18:31:48
现在我们来看看‘8qg7sa3nramwx ’在做什么
SQL> select to_char(sql_text) from dba_hist_sqltext where sql_id ='8qg7sa3nramwx'
2 ;
TO_CHAR(SQL_TEXT)
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
update rmgz.opt_ocf set ponmark=2 where ponmark is null and ocftype in (8,9)
现在很明显了:有维护人员在批量更新数据。
opt_ocf是一个比较大的表,里面ocftype in (8,9)的重复记录有很多,锁了很多记录,造成大量的Lock。
对于负载很大的应用系统,批量更新数据时一定要非常小心。
必须把大批量更新数据放在晚上闲时做或者把大事务划分成小事务来分批提交.
网上有许多关于cursor: pin S wait on X事件问题的描述,有的说是BUG,有的说停用ASMM就好了,有的说修改了某某参数就OK了。其实只要我们注意日常在生产库中的操作和维护,排除掉人为的因素,这样的问题是很少会出现的。