FAQ: How to Use AWR reports to Diagnose Database Performance Issues [ID 1359094.1]
Modified 11-MAY-2012    Type HOWTO    Status PUBLISHED
In this Document
  Goal
  Fix
  Interpretation
  Top 5 Timed Events
  SQL Statistics
  Analysis:
  Other SQL Statistic Sections
  Waits for 'Cursor: mutex/pin'
  Load Profile
  Instance Efficiency
  Latch Activity
  Notable timed and wait events:
  CPU time events
  Analysis:
  Other Potential CPU related Issues:
  Check to see if other waits follow the high CPU timed event.
  High External CPU usage
  Troubleshooting CPU usage
  'Log file sync' waits
  Buffer busy waits
  Troubleshooting Other Issues
  Use of ADDM Reports alongside AWR
  Other AWR reference Articles
  Statspack
  References
Oracle Server - Enterprise Edition - Version 10.2.0.1 to 11.2.0.3 [Release 10.2 to 11.2]
Information in this document applies to any platform.
Goal
This article aims to provide guidance on how to interpret AWR information specifically for Database Performance issues.
Fix
AWR reports are an extremely useful diagnostic tool for determining the potential cause of database-wide performance issues.
Typically when a performance issue is detected you would collect an AWR report covering the period of the poor performance. It is best to use a reporting period no longer than 1 hour as otherwise specifics can be lost.
It is also prudent to gather AWR reports during times when performance is acceptable to provide baselines for comparison when there is a problem. Ensure that the baseline snapshot duration is the same as the problem duration to facilitate a like-for-like comparison.
For information regarding collecting AWR reports refer to:
Document 1363422.1 Automatic Workload Repository (AWR) Reports - Start Point
NOTE: It is often prudent to use a matched ADDM report initially to give a pointer to the main issues. Reading the corresponding ADDM report as a first step to tuning can save a lot of time because it immediately points at the main issue, as compared to trying to understand what an AWR report is presenting.
See: Use of ADDM Reports alongside AWR
Interpretation
Since we are looking at a performance issue, our primary concern is what the database is waiting for.
When processes wait, they are being prevented from doing an activity because of some other factor. Reducing the largest waits provides the greatest benefit, so they are a good place to focus.
The Top Wait information provides such information and allows us to focus on the main problem areas without wasting time investigating areas that are not causing significant delay.
Top 5 Timed Events
As mentioned, the Top Waits section is the single most important section in the whole report because it quantifies and allows comparison of the primary diagnostic: what each session is waiting for. An example output is provided below:
Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
db file scattered read           10,152,564      81,327      8   29.6 User I/O
db file sequential read          10,327,231      75,878      7   27.6 User I/O
CPU time                                         56,207          20.5
read by other session             4,397,330      33,455      8   12.2 User I/O
PX Deq Credit: send blkd             31,398      26,576    846    9.7 Other
          -------------------------------------------------------------
The Top 5 Waits section reports on a number of useful topics related to Events. It records the number of waits encountered in the period and the total time spent waiting, together with the average time waited for each event. The section is ordered by the percentage of the total call time that each Event is responsible for.
Dependent on what is seen in this section, other report sections may need to be referenced in order to quantify or check the findings. For example, the wait count for a particular event needs to be assessed based upon the duration of the reporting period and also the number of users on the database at the time; 10 Million waits in 10 minutes is far more significant than 10 Million in 10 hours, or if shared among 10 users as opposed to 10,000.
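As a rough cross-check of the average wait column, the average can be recomputed from the Waits and Time (s) columns. The following is a minimal Python sketch using the figures from the example above; the function name avg_wait_ms is illustrative and is not part of any Oracle tool.

```python
def avg_wait_ms(waits: int, time_s: float) -> float:
    """Average wait in milliseconds = total time waited / number of waits."""
    if waits <= 0:
        raise ValueError("waits must be positive")
    return time_s * 1000.0 / waits


# (event, waits, time waited in seconds) from the example Top 5 section.
# 'CPU time' is left out because it has no wait count.
top_events = [
    ("db file scattered read", 10_152_564, 81_327),
    ("db file sequential read", 10_327_231, 75_878),
    ("read by other session", 4_397_330, 33_455),
    ("PX Deq Credit: send blkd", 31_398, 26_576),
]

for event, waits, time_s in top_events:
    print(f"{event:<26} {avg_wait_ms(waits, time_s):8.1f} ms")
```

Rounded to whole milliseconds, the results should agree with the Avg wait (ms) column of the report.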
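The point about duration and user count can be made concrete by normalising the wait count. The sketch below is illustrative only: the helper names are hypothetical, and it assumes waits are spread evenly over time and users, which is rarely exactly true.

```python
def waits_per_second(waits: int, duration_s: float) -> float:
    """Wait count normalised by the length of the reporting period."""
    return waits / duration_s


def waits_per_user_per_second(waits: int, duration_s: float, users: int) -> float:
    """Wait count normalised by period length and number of users."""
    return waits / duration_s / users


TEN_MILLION = 10_000_000
TEN_MINUTES = 10 * 60
TEN_HOURS = 10 * 60 * 60

print(f"10 minutes: {waits_per_second(TEN_MILLION, TEN_MINUTES):,.0f} waits/s")
print(f"10 hours:   {waits_per_second(TEN_MILLION, TEN_HOURS):,.0f} waits/s")

# Same 10-minute period, shared among 10 users versus 10,000 users.
for users in (10, 10_000):
    rate = waits_per_user_per_second(TEN_MILLION, TEN_MINUTES, users)
    print(f"{users:>6} users: {rate:,.2f} waits/s per user")
```

The same wait count is 60 times more intense over 10 minutes than over 10 hours, which is why the report duration has to be checked before judging a count.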
In this example report, almost 60% of the time is spent waiting for I/O related reads.
Event 'db file scattered read' is typically used when fetching blocks for a full table scan or index fast full scan and performs multiblock I/O.
Event 'db file sequential read' is a single block read and is typically engaged for any activity where multiblock I/O is unavailable (for example index reads).
Another 20% of the time is spent waiting for or using CPU time. High CPU usage is often a symptom of poorly tuned SQL (or at least SQL which has the potential to take less resource), of which excessive I/O can also be a symptom. More on CPU usage follows later.
Based on this we would investigate whether these waits indicate a problem or not. If so, resolve the problem; if not, move on to the next wait to determine if that is a potential cause.
There are 2 main reasons why I/O related waits are going to be top of the waits:
The database is doing lots of reads
The individual reads are slow
The Top 5 events show us information that helps us here:
Is the database doing lots of reads?:
The section shows > 10 Million reads for each of these events in the period.
Whether this is a lot depends on whether the report duration is 1 hour or 1 minute.
Check the report duration to assess this.
If the reads do seem excessive, then why would the database do a lot of reads?
The database only reads data because the execution of SQL statements has instructed it to do so. To investigate further refer to the SQL Statistics Section.
Are the individual reads slow?
The section shows waits of <=8 ms for the 2 I/O related events.
Whether this is fast or slow is dependent on the hardware underlying the I/O subsystem, but typically anything under 20 ms is acceptable.
If the I/O was slow then you can get further information from the 'Tablespace IO Stats' section:
Tablespace IO Stats                  DB/Inst: VMWREP/VMWREP  Snaps: 1-15
-> ordered by IOs (Reads + Writes) desc

Tablespace
------------------------------
                    Av     Av      Av                    Av     Buffer Av Buf
         Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s      Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
TS_TX_DATA
    14,246,367     283    7.6     4.6  145,263,880    2,883  3,844,161    8.3
USER
       204,834       4   10.7     1.0   17,849,021      354     15,249    9.8
UNDOTS1
        19,725       0    3.0     1.0   10,064,086      200      1,964    4.9
AE_TS
     4,287,567      85    5.4     6.7          932        0    465,793    3.7
TEMP
     2,022,883      40    0.0     5.8      878,049       17          0    0.0
UNDOTS3
     1,310,493      26    4.6     1.0      941,675       19         43    0.0
TS_TX_IDX
     1,884,478      37    7.3     1.0       23,695        0     73,703    8.3
SYSAUX
       346,094       7    5.6     3.9      112,744        2          0    0.0
SYSTEM
       101,771       2    7.9     3.5       25,098        0        653    2.7
Specifically, look for the timing under Rd(ms). If it is higher than 20 milliseconds per read and reads are high, then you may want to start investigating a potential I/O bottleneck from the OS.
NOTE: You should ignore relatively idle tablespaces/files as you can get high values due to disk spinup etc. which are not relevant. If you have an issue with 10 million reads being slow it is unlikely that a tablespace/file with 10 reads has caused the problem!
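The two rules above (flag read times above 20 ms, but only for tablespaces that are actually busy) can be sketched as a simple filter. This is an illustration rather than an Oracle-supplied check: the 10,000-read cut-off for "relatively idle" is an assumption, and the last two rows are hypothetical values added to show both filters at work.

```python
from typing import NamedTuple


class TablespaceIO(NamedTuple):
    name: str
    reads: int
    avg_read_ms: float  # the Av Rd(ms) column


def slow_busy_tablespaces(rows, max_read_ms=20.0, min_reads=10_000):
    """Return names of tablespaces that are both busy and slow to read.

    max_read_ms follows the 20 ms guideline in the text; min_reads is an
    assumed cut-off used only to skip relatively idle tablespaces.
    """
    return [
        row.name
        for row in rows
        if row.reads >= min_reads and row.avg_read_ms > max_read_ms
    ]


# Reads and Av Rd(ms) taken from the example Tablespace IO Stats section.
example = [
    TablespaceIO("TS_TX_DATA", 14_246_367, 7.6),
    TablespaceIO("USER", 204_834, 10.7),
    TablespaceIO("AE_TS", 4_287_567, 5.4),
    TablespaceIO("TS_TX_IDX", 1_884_478, 7.3),
]

# Hypothetical rows (not from the report) to show the two filters at work.
hypothetical = [
    TablespaceIO("IDLE_TS", 10, 150.0),        # slow but idle: ignored
    TablespaceIO("BUSY_TS", 5_000_000, 35.0),  # busy and slow: flagged
]

print(slow_busy_tablespaces(example))
print(slow_busy_tablespaces(example + hypothetical))
```

None of the tablespaces in the example report exceed the 20 ms guideline, which is consistent with the I/O subsystem itself not being the bottleneck here.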
For further investigation, the following note may be helpful:
Note:223117.1 Troubleshooting I/O-related waits
Although high waits for 'db file scattered read' and 'db file sequential read' can be I/O related, it is actually more common to find that these waits are relatively 'normal' based on the SQL that the database is being asked to run. In fact, on a well tuned database, you would want these events to be top of the waits, since that would mean that no 'problem' events were there instead!
The trick is being able to assess whether the high waits indicate that some SQL statements are not using optimal paths (as mentioned earlier) or not. If there are high waits for 'db file scattered read', then SQL statements may not be using optimal access paths and so tend to do Full Table Scans as opposed to using indexes (or there may be missing or suboptimal indexes). Furthermore, high waits for 'db file sequential read' may indicate that SQL statements are using unselective indexes, and therefore reading more index blocks than necessary, or are using the wrong indexes. So these waits may point to poor execution plans for SQL statements.
In either case the next step would be to check the top resource consuming SQL statements from the AWR report to determine whether these look excessive or whether improvements can be made.
To do this look at the SQL Statistics section.
As mentioned, 20% of the time is spent waiting for or using CPU time. This should also be looked at when looking at the SQL Statistics.
Remember that the next step to take following the Top 5 Waits is dependent upon the findings within that section. In the example above, 3 of the waits point towards potentially sub-optimal SQL, so that should be the section investigated next.
Equally, if you do not see any latch waits, then latches are not causing a significant problem on your instance and so you do not need to investigate latch waits further.
Generally, if the database is slow, and the Top 5 timed events include "CPU", "db file sequential read" and "db file scattered read" in any order, then it is usually worth jumping to the Top SQL (by logical and physical reads) section of an AWR report and calling the SQL Tuning Advisor on them (or tuning them manually) just to make sure that they are running efficiently.
SQL Statistics
AWR reports show a number of different SQL statistics.
The different SQL statistic subsections should be examined based upon the Top Wait events seen in the Top 5 section.
In our example we saw top waits as 'db file scattered read', 'db file sequential read' and CPU. For these we are most interested in SQL ordered by CPU Time, Gets and Reads. These sections actually duplicate some information, adding other specifics as appropriate to the topic.
Often looking at 'SQL ordered by Gets' is a convenient starting point, as statements with high buffer gets are usually good candidates for tuning:
SQL ordered by Gets
-> Resources reported for PL/SQL code includes the resources used by all SQL
   statements called by the code.
-> Total Buffer Gets:   4,745,943,815
-> Captured SQL account for  122.2% of Total

                                Gets              CPU     Elapsed
   Buffer Gets   Executions   per Exec   %Total Time (s)  Time (s)    SQL Id
-------------- ------------ ------------ ------ -------- --------- -------------
 1,228,753,877          168  7,314,011.2   25.9  8022.46   8404.73 5t1y1nvmwp2
SELECT ADDRESSID",CURRENT$."ADDRESSTYPEID",CURRENT$URRENT$."ADDRESS3",
CURRENT$."CITY",CURRENT$."ZIP",CURRENT$."STATE",CURRENT$."PHONECOUNTRYCODE",
CURRENT$."PHONENUMBER",CURRENT$."PHONEEXTENSION",CURRENT$."FAXCOU

 1,039,875,759   62,959,363         16.5   21.9  5320.27   5618.96 grr4mg7ms81
Module: DBMS_SCHEDULER
INSERT INTO "ADDRESS_RDONLY" ("ADDRESSID","ADDRESSTYPEID","CUSTOMERID","
ADDRESS1","ADDRESS2","ADDRESS3","CITY","ZIP","STATE","PHONECOUNTRYCODE","PHONENU

   854,035,223          168  5,083,543.0   18.0  5713.50   7458.95 4at7cbx8hnz
SELECT "CUSTOMERID",CURRENT$."ISACTIVE",CURRENT$."FIRSTNAME",CURRENT$."LASTNAME",CU
RRENT$."ORGANIZATION",CURRENT$."DATEREGISTERED",CURRENT$."CUSTOMERSTATUSID",CURR
ENT$."LASTMODIFIEDDATE",CURRENT$."SOURCE",CURRENT$."EMPLOYEEDEPT",CURRENT$.
Tuning can be performed either manually or by calling the SQL Tuning Advisor on the statements:
Document 271196.1 Automatic SQL Tuning - SQL Profiles.
Document 262687.1 How to use the Sql Tuning Advisor.
Document 276103.1 PERFORMANCE TUNING USING ADVISORS AND MANAGEABILITY FEATURES: AWR, ASH, and ADDM and Sql Tuning Advisor.
Note: Use of the SQL Tuning Advisor requires the Oracle Tuning Pack License:
http://docs.oracle.com/cd/E11882_01/license.112/e10594/options.htm#DBLIC170
Analysis:
-> Total Buffer Gets: 4,745,943,815
On the assumption that this is an hour long report, this is a significant number of gets and as such this confirms that it is worth investigating the top SQL statements to make sure they are taking optimal paths.
Individual Buffer Gets
The buffer gets for the individual statements shown are very high with the lowest being 850 Million. These 3 statements actually point towards 2 different reasons for the large number of buffers:
Excessive Buffer Gets/Execution
SQL_IDs '5t1y1nvmwp2' and '4at7cbx8hnz' are only executed 168 times, but each execution reads over 5 Million buffers. These SQL statements are prime candidates for tuning since the number of buffers read in each execution is so high.
Excessive Executions
On the other hand SQL_ID 'grr4mg7ms81' only reads 16 buffers for each execution. Tuning the individual statement may not be able to reduce that significantly. However, the issue with this statement is caused by the number of times it is executed: about 63 Million (62,959,363).
Changing the way in which the statement is called is likely to have the largest impact here. It is likely that the statement is called in a loop, once per record; if it could instead be called so as to process multiple records at once, there is potential for significant economies of scale.
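The two patterns above (many buffers per execution versus many executions) can be told apart mechanically from the Buffer Gets and Executions columns. The following Python sketch uses the three statements from the example; the two limits of one million are illustrative assumptions, not Oracle thresholds, and classify_sql is a hypothetical helper name.

```python
def classify_sql(buffer_gets, executions,
                 gets_per_exec_limit=1_000_000,
                 executions_limit=1_000_000):
    """Label a statement by the likely reason for its high buffer gets.

    Both limits are illustrative assumptions, not Oracle thresholds.
    """
    reasons = []
    if executions > 0 and buffer_gets / executions > gets_per_exec_limit:
        reasons.append("excessive buffer gets per execution")
    if executions > executions_limit:
        reasons.append("excessive executions")
    return reasons


# (SQL Id as shown in the report, buffer gets, executions)
top_sql = [
    ("5t1y1nvmwp2", 1_228_753_877, 168),
    ("grr4mg7ms81", 1_039_875_759, 62_959_363),
    ("4at7cbx8hnz", 854_035_223, 168),
]

for sql_id, gets, execs in top_sql:
    print(sql_id, f"{gets / execs:,.1f} gets/exec", classify_sql(gets, execs))
```

A statement in the first group is a candidate for plan tuning; one in the second group is more likely to benefit from a change in how the application calls it.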
Remember that these numbers may be 'normal' for this environment (since some are very busy). By comparing this report against a baseline, you can see whether these SQL statements also read this much data when the database performs well. If they do then they are not the cause of the issue and can be ignored (although there may be benefit generally in improving them).
Other SQL Statistic Sections
As mentioned previously there are a number of different report sections that help for specific causes. If you do not have the particular cause then there is likely to be little benefit in looking at these. The following section outlines some potential causes and uses:
Waits for 'Cursor: mutex/pin'
If there are mutex waits such as 'Cursor: pin S wait on X' or 'Cursor: mutex X' etc., then these are indicative of parsing issues. On this basis look for statements with high parse counts or high version counts under 'SQL ordered by Parse Calls' and 'SQL ordered by Version Count', as these are most likely to be the causes of problems. The following notes can assist further:
Document 1356828.1 FAQ: 'cursor: mutex ..' / 'cursor: pin ..' / 'library cache: mutex ..' Type Wait Events
Note:1349387.1 Troubleshooting 'cursor: pin S wait on X' waits.
Load Profile
Dependent on the waits, the load profile section either provides useful general background information or specific details related to potential issues.
Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:          4,585,414.80          3,165,883.14
              Logical reads:             94,185.63             65,028.07
              Block changes:             40,028.57             27,636.71
             Physical reads:              2,206.12              1,523.16
            Physical writes:              3,939.97              2,720.25
                 User calls:                 50.08                 34.58
                     Parses:                 26.96                 18.61
                Hard parses:                  1.49                  1.03
                      Sorts:                 18.36                 12.68
                     Logons:                  0.13                  0.09
                   Executes:              4,925.89              3,400.96
               Transactions:                  1.45

  % Blocks changed per Read:   42.50    Recursive Call %:    99.19
 Rollback per transaction %:   59.69       Rows per Sort:  1922.64
In the example, the waits section shows potential for issues with the execution of SQL, so the load profile can be checked for details in this area, although it is not the primary source of such information.
If you were looking at the AWR report for general tuning you might pick up that the load section shows relatively high redo activity with high physical writes. There are more writes than reads on this load, with 42% block changes.
Furthermore, there is far less hard parsing than soft parsing.
If there was a mutex wait as top wait, such as 'library cache: mutex X', then statistics such as the overall parse rate would be more relevant.
Again, comparing to a baseline will provide the best information, for example, checking to see if the load has changed by comparing redo size, user calls, and parsing.
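A baseline comparison of this kind amounts to computing the relative change of each load statistic. The sketch below uses per-second figures from the example Load Profile for the problem period; the baseline figures are hypothetical values invented purely for illustration, since no baseline report is shown in this article.

```python
def percent_change(baseline, problem):
    """Relative change from the baseline value, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (problem - baseline) / baseline * 100.0


# Per-second figures from the example Load Profile (problem period).
problem = {"Redo size": 4_585_414.80, "User calls": 50.08, "Parses": 26.96}

# Hypothetical baseline figures, invented purely for illustration.
baseline = {"Redo size": 2_300_000.00, "User calls": 48.00, "Parses": 25.00}

for stat, value in problem.items():
    print(f"{stat:<12} {percent_change(baseline[stat], value):+7.1f}%")
```

With these made-up baseline values, redo generation would have roughly doubled while user calls and parsing stayed almost flat, which would point at a change in the work being done rather than in the number of requests.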
Instance Efficiency
Again, instance efficiency stats are more useful for general tuning as opposed to addressing specific issues (unless waits point at these).
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.91       Redo NoWait %:  100.00
            Buffer  Hit   %:   98.14    In-memory Sort %:   99.98
            Library Hit   %:   99.91        Soft Parse %:   94.48
         Execute to Parse %:   99.45         Latch Hit %:   99.97
Parse CPU to Parse Elapsd %:   71.23     % Non-Parse CPU:   99.00
The most important statistic presented here from the point of view of our example is the '% Non-Parse CPU', because this indicates that almost all the CPU time that we see in the Top Waits section is attributable to execution and not parse, which means that tuning SQL may help to improve this.
If we were tuning, then the 94.48% soft parse rate would show a small proportion of hard parsing, which is desirable. The high execute to parse % indicates good usage of cursors. Generally, we want the statistics here close to 100%, but remember that a few percent may not be relevant dependent on the application. For example, in a data warehouse environment, hard parsing may be higher due to usage of materialized views and/or histograms. So again, comparing to a baseline report from when performance was good is important.
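Two of these ratios can be approximated from the Load Profile rates, which helps in understanding what they measure. The sketch below uses the commonly documented definitions (soft parse % as the share of parses that are not hard parses, execute to parse % as the share of executions that did not need a parse call); treat these formulas as an assumption rather than the exact AWR implementation, which works from raw counts and may differ in the last decimal place.

```python
def soft_parse_pct(parses: float, hard_parses: float) -> float:
    """Share of parse calls that did not require a hard parse."""
    return 100.0 * (1.0 - hard_parses / parses)


def execute_to_parse_pct(executes: float, parses: float) -> float:
    """Share of executions that did not need a parse call."""
    return 100.0 * (1.0 - parses / executes)


# Per-second rates from the example Load Profile.
parses = 26.96
hard_parses = 1.49
executes = 4_925.89

print(f"Soft Parse %:       {soft_parse_pct(parses, hard_parses):.2f}")
print(f"Execute to Parse %: {execute_to_parse_pct(executes, parses):.2f}")
```

Both results land within rounding distance of the values in the Instance Efficiency section (94.48 and 99.45).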
Latch Activity
In the example we are not seeing significant waits for latches, so this section could be ignored.
However, if latch waits were significant, then we would be looking for high latch sleeps under Latch Sleep Breakdown for latch free waits:
Latch Sleep Breakdown
* ordered by misses desc

Latch Name
----------------------------------------
  Get Requests      Misses      Sleeps  Spin Gets   Sleep1   Sleep2   Sleep3
-------------- ----------- ----------- ---------- -------- -------- --------
cache buffers chains
 2,881,936,948   3,070,271      41,336  3,031,456        0        0        0
row cache objects
   941,375,571   1,215,395         852  1,214,606        0        0        0
object queue header operation
   763,607,977     949,376      30,484    919,782        0        0        0
cache buffers lru chain
   376,874,990     705,162       3,192    702,090        0        0        0
Here the top latch is cache buffers chains. Cache Buffers Chains latches protect the buffers in the buffer cache that hold data that we have retrieved from disk. This is a perfectly normal latch to see when data is being read. When this becomes stressed, the sleeps figure tends to rise as sessions start to wait to get the buffers they require. Contention can be caused by poorly tuned SQL reading the same buffers.
In our example, although the gets are high at almost 2.9 billion get requests, the sleeps figure of 41,336 is low. The average number of sleeps per miss (Avg Slps/Miss) is low. The reason for this is that the server is able to deal with this volume of data and so there is no significant contention on Cache Buffers Chains latches at this point.
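The sleeps-per-miss figure referred to above is simply Sleeps divided by Misses, and the miss rate is Misses divided by Get Requests. A minimal Python sketch using the four latches from the example; the helper names are illustrative.

```python
def sleeps_per_miss(sleeps: int, misses: int) -> float:
    """Average number of sleeps for each latch miss."""
    return sleeps / misses if misses else 0.0


def miss_pct(misses: int, get_requests: int) -> float:
    """Percentage of latch get requests that missed."""
    return 100.0 * misses / get_requests if get_requests else 0.0


# (latch name, get requests, misses, sleeps) from Latch Sleep Breakdown.
latches = [
    ("cache buffers chains", 2_881_936_948, 3_070_271, 41_336),
    ("row cache objects", 941_375_571, 1_215_395, 852),
    ("object queue header operation", 763_607_977, 949_376, 30_484),
    ("cache buffers lru chain", 376_874_990, 705_162, 3_192),
]

for name, gets, misses, sleeps in latches:
    print(f"{name:<30} miss% {miss_pct(misses, gets):6.3f}  "
          f"sleeps/miss {sleeps_per_miss(sleeps, misses):.4f}")
```

For cache buffers chains only about 0.1% of gets miss and roughly 1 miss in 74 goes on to sleep, which supports the conclusion that there is no significant contention.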
For other latch free waits, review the following note to identify what type of latches to investigate:
Note:413942.1 How to Identify Which Latch is Associated with a "latch free" wait
Notable timed and wait events:
CPU time events
Just because CPU appears as the top timed event in AWR does not necessarily indicate a problem. However, if performance is slow with high CPU usage, then start investigating it. First, check to see if a SQL statement is taking most of the CPU under SQL ordered by CPU Time in AWR:
SQL ordered by CPU Time
-> Resources reported for PL/SQL code includes the resources used by all SQL
   statements called by the code.
-> % Total is the CPU Time divided into the Total CPU Time times 100
-> Total CPU Time (s):          56,207
-> Captured SQL account for  114.6% of Total

    CPU      Elapsed                  CPU per           % Total
  Time (s)   Time (s)  Executions     Exec (s) % Total  DB Time    SQL Id
---------- ---------- ------------ ----------- ------- ------- -------------
    20,349     24,884          168      121.12    36.2     9.1 7bbhgqykv75px9
Module: DBMS_SCHEDULER
DECLARE job BINARY_INTEGER := :job; next_date TIMESTAMP WITH TIME ZONE := :myda
te; broken BOOLEAN := FALSE; job_name VARCHAR2(30) := :job_name; job_subname
VARCHAR2(30) := :job_subname; job_owner VARCHAR2(30) := :job_owner; job_start
TIMESTAMP WITH TIME ZONE := :job_start; job_scheduled_start TIMESTAMP WITH TIME
Analysis:
-> Total CPU Time (s): 56,207
This represents about 15.6 hours of CPU time in total (56,207 seconds). Whether this is significant depends on the report duration.
The top CPU-using SQL uses 20,349 seconds (around 5.7 hours).
This represents 9.1% of total DB time.
Executions is 168. Since this execution count is the same as for 2 of the 3 SQL statements identified earlier, these may be related, and this task may well be the scheduling job that runs those statements.
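Converting the CPU figures from seconds into hours makes them easier to weigh against the report duration. The short Python sketch below does the conversion and recomputes the top statement's share of total CPU time and its CPU per execution from the values in the SQL ordered by CPU Time section; the helper name is illustrative.

```python
def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600.0


total_cpu_s = 56_207   # Total CPU Time (s)
top_sql_cpu_s = 20_349 # CPU Time (s) of the top statement
executions = 168

print(f"Total CPU time:   {seconds_to_hours(total_cpu_s):.1f} hours")
print(f"Top SQL CPU time: {seconds_to_hours(top_sql_cpu_s):.1f} hours")
print(f"Share of CPU:     {100.0 * top_sql_cpu_s / total_cpu_s:.1f}%")
print(f"CPU per exec:     {top_sql_cpu_s / executions:.2f} s")
```

The share and the per-execution figure should agree with the % Total and CPU per Exec (s) columns of the report.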
Other Potential CPU related Issues:
Check to see if other waits follow the high CPU timed event.
For example, cursor: pin S waits may cause high CPU with the following known issue:
Note:6904068.8 Bug 6904068 - High CPU usage when there are "cursor: pin S" waits
High External CPU usage
If a process outside of the database is taking high CPU then this could be preventing database processes from getting the CPU they require and affecting the database performance. In this case, run OSWatcher or other OS diagnostic tools to find which process is taking high CPU.
Note:433472.1 OS Watcher For Windows (OSWFW) User Guide
Troubleshooting CPU usage
The following note outlines how to further diagnose high CPU usage:
Note:164768.1 Troubleshooting: High CPU Utilization
'Log file sync' waits
When a user session commits or rolls back, the log writer flushes the redo from the log buffer to the redo logs. AWR reports are very useful for determining whether this is a problem and whether the cause of the problem is I/O or some other area. The following articles deal specifically with this symptom:
Document 1376916.1 Troubleshooting: "Log File Sync" Waits
Note:34592.1 WAITEVENT: "log file sync"
Buffer busy waits
This is the event waited on when a session is trying to get a buffer from the buffer cache but the buffer is busy, either being read by another session or held by another session in an incompatible mode. In order to find which block is busy and why, use the following notes:
Document 155971.1 Resolving Intense and "Random" Buffer Busy Wait Performance Problems
Note:34405.1 WAITEVENT: "buffer busy waits"
Troubleshooting Other Issues
For guidance troubleshooting other performance issues see:
Document 1377446.1 Troubleshooting Performance Issues
Use of ADDM Reports alongside AWR
ADDM reports can be reviewed along with AWR to assist in diagnosis since they provide specific recommendations which can help point at potential problems. The following is a sample ADDM report taken from:
Note:250655.1 How to use the Automatic Database Diagnostic Monitor:
Example Output:
DETAILED ADDM REPORT FOR TASK 'SCOTT_ADDM' WITH ID 5
----------------------------------------------------
Analysis Period: 17-NOV-2003 from 09:50:21 to 10:35:47
Database ID/Instance: 494687018/1
Snapshot Range: from 1 to 3
Database Time: 4215 seconds
Average Database Load: 1.5 active sessions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FINDING 1: 65% impact (2734 seconds)
------------------------------------
PL/SQL execution consumed significant database time.
   RECOMMENDATION 1: SQL Tuning, 65% benefit (2734 seconds)
      ACTION: Tune the PL/SQL block with SQL_ID fjxa1vp3yhtmr. Refer to
         the "Tuning PL/SQL Applications" chapter of Oracle's "PL/SQL
         User's Guide and Reference"
         RELEVANT OBJECT: SQL statement with SQL_ID fjxa1vp3yhtmr
         BEGIN EMD_NOTIFICATION.QUEUE_READY(:1, :2, :3); END;
FINDING 2: 35% impact (1456 seconds)
------------------------------------
SQL statements consuming significant database time were found.
   RECOMMENDATION 1: SQL Tuning, 35% benefit (1456 seconds)
      ACTION: Run SQL Tuning Advisor on the SQL statement with SQL_ID
         gt9ahqgd5fmm2.
         RELEVANT OBJECT: SQL statement with SQL_ID gt9ahqgd5fmm2 and
         PLAN_HASH 547793521
         UPDATE bigemp SET empno = ROWNUM
FINDING 3: 20% impact (836 seconds)
-----------------------------------
The throughput of the I/O subsystem was significantly lower than expected.
   RECOMMENDATION 1: Host Configuration, 20% benefit (836 seconds)
      ACTION: Consider increasing the throughput of the I/O subsystem.
         Oracle's recommended solution is to stripe all data file using
         the SAME methodology. You might also need to increase the
         number of disks for better performance.
   RECOMMENDATION 2: Host Configuration, 14% benefit (584 seconds)
      ACTION: The performance of file
         D:\ORACLE\ORADATA\V1010\UNDOTBS01.DBF was significantly worse
         than other files. If striping all files using the SAME
         methodology is not possible, consider striping this file over
         multiple disks.
         RELEVANT OBJECT: database file
         "D:\ORACLE\ORADATA\V1010\UNDOTBS01.DBF"
   SYMPTOMS THAT LED TO THE FINDING:
      Wait class "User I/O" was consuming significant database time.
      (34% impact [1450 seconds])
FINDING 4: 11% impact (447 seconds)
-----------------------------------
Undo I/O was a significant portion (33%) of the total database I/O.
   NO RECOMMENDATIONS AVAILABLE
   SYMPTOMS THAT LED TO THE FINDING:
      The throughput of the I/O subsystem was significantly lower than
      expected. (20% impact [836 seconds])
         Wait class "User I/O" was consuming significant database time.
         (34% impact [1450 seconds])
FINDING 5: 9.9% impact (416 seconds)
------------------------------------
Buffer cache writes due to small log files were consuming significant
database time.
   RECOMMENDATION 1: DB Configuration, 9.9% benefit (416 seconds)
      ACTION: Increase the size of the log files to 796 M to hold at
         least 20 minutes of redo information.
The ADDM report gives possible recommendations in a more readable format than AWR. However, ADDM should be interpreted along with AWR statistics for accurate diagnostics.
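When reading the sample ADDM report it helps to see where the impact percentages come from. In the sample they are consistent with each finding's seconds expressed as a share of Database Time (4215 seconds); the Python sketch below checks this, treating that relationship as an observation from the sample rather than a documented formula.

```python
def impact_pct(finding_seconds: float, db_time_seconds: float) -> float:
    """Finding time as a percentage of total database time."""
    return 100.0 * finding_seconds / db_time_seconds


DB_TIME_S = 4215  # "Database Time" from the sample report header

# Finding number -> seconds attributed to the finding in the sample report.
findings = {1: 2734, 2: 1456, 3: 836, 4: 447, 5: 416}

for number, seconds in findings.items():
    print(f"FINDING {number}: {impact_pct(seconds, DB_TIME_S):.1f}% impact "
          f"({seconds} seconds)")
```

Note that the impacts do not add up to 100%: findings can overlap, since the same database time may be counted under more than one finding (for example a SQL statement that is also waiting on I/O).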
Other AWR reference Articles
The following documents can assist when reading other sections of AWR reports and for other purposes:
Document 786554.1 How to Read PGA Memory Advisory Section in AWR and Statspack Reports
Document 754639.1 How to Read Buffer Cache Advisory Section in AWR and Statspack Reports
Document 1301503.1 Troubleshooting: AWR Snapshot Collection issues
Document 1363422.1 Automatic Workload Repository (AWR) Reports - Start Point
Statspack
AWR reports supersede legacy reports such as statspack and bstat/estat. For reference, the following is a link to an article outlining how to read statspack reports:
http://www.oracle.com/technetwork/database/focus-areas/performance/statspack-opm4-134117.pdf
Additional information can be found in the following articles:
Document 94224.1 FAQ - Statspack Complete Reference
Document 394937.1 Statistics Package (STATSPACK) Guide
Document 149113.1 Installing and Configuring StatsPack Package
Document 149121.1 Gathering a StatsPack snapshot
Document 228913.1 Systemwide Tuning using STATSPACK Reports