Here is the issue as reported by a developer:
A query in the application returns roughly 200,000+ objects. It runs fine in DEV and SIT, but in pre-production it never completes.
At first I suspected that dbms_stats had never been run on this database, but rerunning dbms_stats changed nothing.
And while the query was running, load on both the app server and the db server was low... strange.
I captured an AWR report for that window.
The load profile confirms the system really was not busy:
Load Profile
~~~~~~~~~~~~                     Per Second    Per Transaction
                                -----------    ---------------
              Redo size:           4,535.44           2,351.49
          Logical reads:             125.43              65.03
          Block changes:              16.73               8.67
         Physical reads:              24.76              12.84
        Physical writes:               8.57               4.44
             User calls:              28.02              14.53
                 Parses:               7.09               3.68
            Hard parses:               0.01               0.01
                  Sorts:               1.03               0.53
                 Logons:               0.05               0.02
The top wait event, however, stood out: direct path write temp.
Top 5 Timed Events                                       Avg %Total
~~~~~~~~~~~~~~~~~~                                      wait   Call
Event                          Waits     Time (s)       (ms)   Time Wait Class
------------------------------ --------- ----------- ------ ------ ----------
direct path path write temp is shown first below:
direct path write temp             4,346          59     14   62.0 User I/O
CPU time                                          25          26.5
db file parallel write               187          15     82   16.0 System I/O
log file sync                      1,887          15      8   16.0 Commit
log file parallel write            2,006          15      8   15.8 System I/O
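The Top 5 list above comes from the AWR snapshot interval. A similar (but cumulative, since-startup) picture can be pulled live from v$system_event; a minimal sketch, not from the original post:

```sql
-- Sketch: top non-idle wait events accumulated since instance startup.
-- time_waited is in centiseconds, hence the /100 to get seconds.
SELECT *
  FROM (SELECT event,
               total_waits,
               ROUND(time_waited / 100) AS time_s
          FROM v$system_event
         WHERE wait_class <> 'Idle'
         ORDER BY time_waited DESC)
 WHERE ROWNUM <= 5;
```

Unlike AWR, these counters are not interval deltas, so a recently started instance gives the clearest view.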
The official documentation for direct path write and direct path write temp says:
When a process is writing buffers directly from PGA (as opposed to the DBWR writing them from the buffer cache), the process waits on this event for the write call to complete. Operations that could perform direct path writes include when a sort goes to disk, during parallel DML operations, direct-path INSERTs, parallel create table as select, and some LOB operations.
Like direct path reads, the number of waits is not the same as number of write calls issued if the I/O subsystem supports asynchronous writes. The session waits if it has processed all buffers in the PGA and is unable to continue work until an I/O request completes.
Causes
This happens in the following situations:
1) Sorts are too large to fit in memory and are written to disk
2) Parallel DML are issued to create/populate objects
3) Direct path loads
Cause 1 does not apply: the report shows In-memory Sort %: 100.00, so no sort spilled to disk.
Cause 2 does not apply: no parallelism was enabled, and the statement is issued by the application, not a parallel DML job.
Cause 3 does not apply: the operation is a plain query, not a direct-path load.
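None of the documented causes matched here. As a side note, where Active Session History is available (it requires the Diagnostics Pack license), the waits can be tied straight to SQL statements without scanning the report by hand; a sketch, not from the original post:

```sql
-- Sketch: which SQL statements were sampled waiting on
-- direct path write temp in the in-memory ASH buffer.
SELECT sql_id, COUNT(*) AS samples
  FROM v$active_session_history
 WHERE event = 'direct path write temp'
 GROUP BY sql_id
 ORDER BY COUNT(*) DESC;
```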
At this point I was stuck, so I kept reading the report:
  Elapsed      CPU                  Elap per  % Total
  Time (s)   Time (s)  Executions   Exec (s)  DB Time    SQL Id
---------- ---------- ------------ ---------- ------- -------------
        68          6            1       67.6    70.7 f2zymmdhuzvxj
Module: JDBC Thin Client
SELECT 'wt.part.WTPart',A0.blob$entrySetadHocAcl,A0.checkoutInfoIsNull,A0.classn
amekeyA2checkoutInfo,A0.idA3A2checkoutInfo,A0.statecheckoutInfo,A0.classnamekeyc
ontainerReferen,A0.idA3containerReference,A0.contractNumber,A0.classnamekeydomai
nRef,A0.idA3domainRef,A0.entrySetadHocAcl,A0.eventSet,A0.classnamekeyA2foldering
This statement looked suspicious. I checked with the developer: it is indeed the query the application issues.
While I was at it, I pulled its execution plan:
SQL> set linesize 200
SQL> set pagesize 9999
SQL> SELECT * FROM TABLE(dbms_xplan.display_awr('f2zymmdhuzvxj'));
... (SQL text omitted)
Plan hash value: 1054932678
--------------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |              |       |       |       | 27177 (100)|          |
|   1 |  HASH JOIN          |              |  574K |  329M |  115M | 27177   (1)| 00:06:21 |
|   2 |   TABLE ACCESS FULL | WTPARTMASTER |  686K |  107M |       |  2935   (1)| 00:00:42 |
|   3 |   TABLE ACCESS FULL | WTPART       |  574K |  239M |       |  8804   (1)| 00:02:04 |
--------------------------------------------------------------------------------------------
Now the cause of direct path write temp is clear:
the result set is large and the plan uses a hash join. A hash join builds its hash table in the PGA; here the work area did not fit (note the 115M TempSpc estimate in the plan), so Oracle spilled it to the temporary tablespace. Those direct writes from the PGA to temp space, bypassing the buffer cache, are exactly what the direct path write temp event measures.
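Whether the hash join actually spilled, and how badly, can be confirmed from v$sql_workarea while the cursor is still in the shared pool; a sketch using the sql_id found above:

```sql
-- Sketch: work-area statistics for the suspect cursor.
-- Non-zero onepass/multipass executions mean the hash table
-- did not fit in the PGA work area and spilled to temp.
SELECT operation_type,
       last_execution,
       optimal_executions,
       onepass_executions,
       multipasses_executions,
       ROUND(last_tempseg_size / 1024 / 1024) AS last_temp_mb
  FROM v$sql_workarea
 WHERE sql_id = 'f2zymmdhuzvxj';
```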
Root cause:
this database had been installed by the developers themselves, and the PGA and SGA targets were both set far too small. After raising them, the problem disappeared.
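The fix itself is one ALTER SYSTEM per parameter; a sketch — the 1G/2G values below are illustrative only, not the values actually used, and should be sized from v$pgastat and the memory advisors:

```sql
-- Check the current (undersized) settings first.
SHOW PARAMETER pga_aggregate_target
SHOW PARAMETER sga_target

-- Illustrative targets only; requires an spfile for SCOPE=BOTH.
ALTER SYSTEM SET pga_aggregate_target = 1G SCOPE = BOTH;
ALTER SYSTEM SET sga_target = 2G SCOPE = BOTH;
```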
from:http://hi.baidu.com/wa0362/blog/item/5cc1a49bc7d5ccb0c8eaf48e.html