Troubleshooting: Waits for Mutex Type Events (Doc ID 1377998.1)

In this Document

  Purpose
  Troubleshooting Steps
  Mutex Event Waits
  How to Identify Mutex Event Waits.
  Diagnosing Potential Causes using AWR Report
  Load Profile
  Increased Parse Counts
  Mutex Sleeps
  Database Appears 'Hung'
  Documents with Suggestions on How to Diagnose Specific Waits
  Potential Solutions
  Use Recommended Patch Levels
  Reduce Parsing
  High Version Counts
  CURSOR_SHARING=SIMILAR
  OS Resources
  Size the Shared Pool Correctly
  Known Issues
  11.2.0.X
  11.1.0.7.X
  Further Details on Specific Mutex Wait Events
  Troubleshooting Other Issues
  References

APPLIES TO:

Oracle Database - Enterprise Edition - Version 10.1.0.2 and later
Information in this document applies to any platform.
***Checked for relevance on 01-Jan-2014*** 

PURPOSE

The purpose of the note is to help customers troubleshoot mutex waits.

TROUBLESHOOTING STEPS

 

Mutex Event Waits

"Mutex waits" is a collective term for waits on resources associated with the management of cursor objects in the shared pool during parsing. Mutexes were introduced in 10g as a faster, more lightweight alternative to latches, and some waits for these resources occur in normal operation. However, when these waits become excessive, contention can occur and cause problems.

Full troubleshooting and diagnostics for every type of mutex-related issue are beyond the scope of this article, but the basic principles and problem identification are covered here.

First, confirm that mutex waits are actually occurring.

How to Identify Mutex Event Waits.

Mutex waits are characterised by sessions waiting for one or more of the following events:

  • cursor: mutex X
  • cursor: mutex S
  • cursor: pin S
  • cursor: pin X
  • cursor: pin S wait on X
  • library cache: mutex X
  • library cache: mutex S

Cursor mutexes are used to protect the parent cursor and are also used in cursor statistic operations.
Cursor pins are used to pin a cursor in preparation for a related operation on the cursor.
Library cache mutexes protect library cache operations in a similar way to earlier versions, except that they are now implemented using mutexes. In all these cases, waits for these resources occur when two (or more) sessions are working with the same cursors simultaneously. When one session takes and holds a resource required by another, the second session waits on one of these events.

Mutex contention is typically characterised by a perception of slow performance at the session or even the database level. Since mutexes consume almost nothing but CPU, CPU usage can rise when contention occurs and will quickly start to impact users. In normal operation the CPU used per mutex operation and the time taken are extremely small, but when contention occurs and the number of mutex operations against the same objects runs into the millions, these small numbers add up. Additionally, as the CPU becomes busier, the mutex operations themselves can start to take longer (because of the time spent waiting on the CPU run queue), further adding to the problem.
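
As a rough first check, the instance-wide totals for the events listed above can be read from V$SYSTEM_EVENT. The query below is a minimal sketch and not part of the original note. It assumes SELECT access to the V$ views, and the figures are cumulative since instance startup, so they only show whether these events have been significant at all.

```sql
-- Cumulative waits for the mutex-related events since instance startup.
select event,
       total_waits,
       round(time_waited_micro / 1e6, 1) as seconds_waited,
       round(time_waited_micro / nullif(total_waits, 0) / 1000, 2) as avg_wait_ms
from   v$system_event
where  event like 'cursor: mutex%'
or     event like 'cursor: pin%'
or     event like 'library cache: mutex%'
order  by time_waited_micro desc;
```

For a specific problem window, the AWR comparison described in the next section is the better tool, because it reports the waits for that period only.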

Diagnosing Potential Causes using AWR Report

The best starting point for identifying mutex waits is a general database report such as an Automatic Workload Repository (AWR) report.

When looking for mutex contention it is best to collect AWR reports for 2 separate periods:

  • When the problem is actually occurring
  • A separate baseline period when the problem is not occurring but the load is similar

Having both a problem report and a baseline makes comparison much easier.

Remember that AWR is a general report and may not show directly which session is holding the mutex and why. It will, however, reveal the overall picture of database statistics such as the top waits, SQL statements run, parses, version counts and parameter settings, all of which are useful indicators of mutex issues.


For information on how to collect AWR reports refer to:

Document 1363422.1 Automatic Workload Repository (AWR) Reports - Start Point


For mutex contention, it is preferable to look at snapshots with a maximum duration of an hour. Durations as short as 5-10 minutes can be used as long as the durations are the same for the baseline and problem periods.
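
To choose matching problem and baseline windows, the available snapshots can be listed from DBA_HIST_SNAPSHOT. This is a minimal sketch, not taken from the original note. The two-day look-back is an arbitrary assumption, and querying the DBA_HIST views assumes the appropriate Diagnostics Pack licence.

```sql
-- Recent AWR snapshots, newest first, to pick begin/end snap IDs and the DBID.
select snap_id, dbid, instance_number,
       begin_interval_time, end_interval_time
from   dba_hist_snapshot
where  begin_interval_time > systimestamp - interval '2' day
order  by snap_id desc;
```

The SNAP_ID and DBID values returned here are the ones needed by the snapshot-range queries used when drilling into a problem period.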

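If mutex contention is occurring then mutex waits will usually surface in the top timed events. Compare the problem period report with the baseline report (both 1 hour duration):

Problem Period AWR Report: [screenshot of the top timed events not reproduced]

Baseline AWR Report: [screenshot of the top timed events not reproduced]
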

In the problem report, the top wait is for the cursor operation 'library cache: mutex X', which means that sessions are waiting to get a library cache mutex in eXclusive mode for one or more cursors. From the figures, this accounts for more than 56% of database time. The average wait time of 294 ms (milliseconds) is extremely high, as is the number of waits at more than 1.3 million in an hour.

In comparison, the baseline shows no high waits for mutex events in the top 5 at all; the events seen are the more normal I/O waits.

Now that we have identified a problem, we want to dig deeper and determine the area the problem is in so that we can ultimately get to a root cause and a solution.

- If you have AWR data covering the period of high mutex waits, start by running:

-- Top 10 (P1, SQL_ID) pairs for 'library cache: mutex X' waits in the AWR window.
-- &begin_snap, &end_snap and &dbid are SQL*Plus substitution variables.
select * from (
  select p1, sql_id,
         count(*) samples,
         ratio_to_report(count(*)) over () * 100 pct
  from   dba_hist_active_sess_history
  where  event = 'library cache: mutex X'
  and    snap_id between &begin_snap and &end_snap
  and    dbid = &dbid
  group  by p1, sql_id
  order  by count(*) desc)
where rownum <= 10;

 

This will give you the top 10 P1/SQL_ID combinations for the waits.

The SQL_ID is the SQL statement the session was running.

The P1 is the hash value of the object the mutex protects.

For the topmost P1 run:

 

-- Run as SYS (x$ tables); &p1 is the topmost P1 value from the previous query.
select KGLNAOBJ, KGLNAOWN, KGLHDNSP, KGLOBTYP
from x$kglob where KGLNAHSH = &p1;

This will tell you the object the mutex protects. If the same SQL_ID shows up with different P1 values in the top 10, then the problem is likely to be related to that SQL statement. If the SQL_ID and P1 combination is unique, a hot object is the more likely cause.

If there is a hot object, review the following bug:

Note:9239863.8 Excessive "library cache:mutex X" contention on hot objects

If there is no hot object but general mutex waits are high, start diagnosing the load profile.

Load Profile

The load profile on the server and the location of that load can help to narrow down the problem. For mutex contention issues we are primarily interested in the parse information.

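Problem Period AWR Report (1 hour duration): [screenshot of the Load Profile section not reproduced]

Baseline AWR Report (1 hour duration): [screenshot of the Load Profile section not reproduced]
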

Generally, the load is higher in the "Problem Period" report. The parse statistics are also higher in the 'bad' report: hard parses run at 45 per second compared with 23 per second in the baseline. This indicates a higher rate of parsing in the problem period, which may be causing contention. The next step is to find the SQL that is being parsed the most, as this is likely to be the cause of the problem.

Note: The SQL with the highest parse volume is the most likely cause of problems, but this is not necessarily the case; often an increase in parsing relative to a "good" baseline is a better indicator.
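
Outside of AWR, the same parse counters can be read directly from V$SYSSTAT. The query below is a minimal sketch and not part of the original note. The values are cumulative since instance startup, so a per-second rate has to be derived by sampling twice and dividing the difference by the elapsed seconds.

```sql
-- Cumulative parse and execute counters; sample twice and subtract to get a rate.
select name, value
from   v$sysstat
where  name in ('parse count (total)', 'parse count (hard)', 'execute count')
order  by name;
```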

Increased Parse Counts

Under SQL ordered by Parse Calls, we are looking for the total parse calls and then the parse calls for particular statements:

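Problem Period AWR Report (1 hour duration): [screenshot of the SQL ordered by Parse Calls section not reproduced]

Baseline AWR Report (1 hour duration): [screenshot of the SQL ordered by Parse Calls section not reproduced]
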

Overall, the parse count has increased from 1.8M to 3.1M. Focusing on specific statements, SQL_IDs '68subccxd9b03' and '12235mxs4h54u' have doubled their number of parses. '3j91frnd21kks' has come in from 'nowhere' and must also have at least doubled its parses, since the lowest parse count shown in the baseline is 15,000 and this statement shows 42,000.

These SQL statements are good candidates for investigation:

  • Why has the parse count increased?
    • Has new code or code changes been introduced?
    • Is a new application being used?
    • Have more users been brought online?
    • Has the activity profile changed? Are more activities being run concurrently than previously?

Answering these kinds of questions will often reveal potential causes.

See the "Over Parsing" section in:

Document:33089.1 TROUBLESHOOTING: Possible Causes of Poor SQL Performance
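
When no AWR report is to hand, a similar view of the most frequently parsed statements still in the shared pool can be taken from V$SQLAREA. This is a minimal sketch and not part of the original note. The counts are cumulative since each cursor was loaded, so they are not directly comparable with the per-period figures in an AWR report.

```sql
-- Top 10 statements in the shared pool by parse calls.
select * from (
  select sql_id, parse_calls, executions, version_count,
         substr(sql_text, 1, 60) as sql_text
  from   v$sqlarea
  order  by parse_calls desc)
where rownum <= 10;
```

A statement whose PARSE_CALLS is close to its EXECUTIONS is being parsed on almost every execution and is a natural candidate for the questions above.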

Mutex Sleeps

When a mutex is requested, this is called a get request. 
If a session tries to get a mutex but another session is already holding it, then the get request cannot be granted and the session requesting the mutex will 'spin' for the mutex a number of times in the hope that it will be quickly freed. The session spins in a tight loop, testing each time to see if the mutex has been freed.

If the mutex has not been freed by the end of this spinning, the session waits. 
When this happens the sleeps column for the particular code location where the session is waiting is incremented in the v$mutex_sleep* views. 

This 'Sleeps' count for a particular location is very useful for identification of the area in which mutex contention is occurring.

In later versions this information is externalised in the 'Mutex Sleep Summary' section of the AWR report:

Mutex Type       Location         Sleeps         Time (ms)
---------------- ---------------- -------------- ------------
Library Cache    kglpin1 4            20,053,325      201,203
Library Cache    kglget1 1                38,809      110,015
Library Cache    kglpndl1 95              25,147       55,946
Library Cache    kglpin1 4                24,887       52,524


What we are interested in here is the location and primarily the Time spent in each. The number of sleeps is also important but if it takes no time then it is unlikely to be affecting performance.

This information can be used to search for other similar issues that have also resulted in contention in this particular area and from these determine solutions that have previously been used to address these.

As an example:
In this case the top location for sleeps is the Library Cache location 'kglpin1 4'.
In terms of time, it accounts for almost twice as much as the next location and is also responsible for about 20M more sleeps. It is therefore a good candidate for a search for known issues. If you search on 'kglpin1 4', one of the documents you will find is:

Document:7307972.8 Bug 7307972 - Excessive waits on 'library cache: mutex x'


which may be directly applicable, or may give pointers as to potential solutions.

Note: if nothing specific is found from searches on 'kglpin1 4' it is worth searching for the other locations (e.g. 'kglget1 1') - although this may be a new issue, related information from these searches may be helpful.

 

Note: Although this information is included in AWR reports, you can select it directly from the view V$MUTEX_SLEEP_HISTORY using:

 

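-- Sleeps per library cache object and code location (x$kglob requires a SYS connection).
select to_char(sysdate, 'HH:MI:SS') time, kglnahsh hash, sum(sleeps) sleeps,
       location, mutex_type, substr(kglnaobj, 1, 40) object
from   x$kglob, v$mutex_sleep_history
where  kglnahsh = mutex_identifier
group  by kglnaobj, kglnahsh, location, mutex_type
-- ascending, so the objects with the most sleeps are listed last
order  by sleeps
/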


Interpretation is as with the AWR example above.
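
For a per-location summary closer to the layout of the AWR 'Mutex Sleep Summary', V$MUTEX_SLEEP can be queried without access to the x$ tables. This is a minimal sketch and not part of the original note. The values are cumulative since instance startup, and WAIT_TIME is assumed here to be reported in microseconds, which should be checked against the reference documentation for your version.

```sql
-- Cumulative sleeps and wait time per mutex type and code location.
select mutex_type, location, sleeps, wait_time
from   v$mutex_sleep
order  by wait_time desc;
```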

Database Appears 'Hung'

Sometimes contention for mutexes becomes so intense that the database appears to hang. In these cases, it is useful to determine which session or sessions are blocking others and to investigate what the blocking sessions are doing.

Run the following select (which outputs the session ID and the text of the SQL being executed) at short intervals to pick up common blockers and investigate their activities. If the same SQL is seen repeatedly, it can be investigated in the same way as the high-parsing SQL described previously.

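-- Sessions currently waiting on a mutex event, with the SQL text they are running.
select s.sid, t.sql_text
from   v$session s, v$sql t
where  (s.event like '%mutex%' or s.event like 'cursor: pin%')
and    t.sql_id = s.sql_id
-- match the child cursor as well, so each session is listed once
and    t.child_number = s.sql_child_number;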
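
To go from waiters to the session holding the mutex, a sketch along the following lines may help for 'cursor: pin S wait on X'. It is not part of the original note and rests on two assumptions that should be verified for your version and platform: that BLOCKING_SESSION is populated for this event, and that on 64-bit platforms the top 4 bytes of P2 hold the SID of the holder, as described in the reference note for this wait event.

```sql
-- Waiters on 'cursor: pin S wait on X' with two candidate ways to find the holder.
select sid, event, sql_id,
       blocking_session,                          -- may be null on some versions
       trunc(p2 / 4294967296) as holder_sid_from_p2  -- assumes a 64-bit platform
from   v$session
where  event = 'cursor: pin S wait on X';
```

If both columns agree across several samples, that session is the one to investigate first.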

 

Documents with Suggestions on How to Diagnose Specific Waits

Document:1356828.1 FAQ: 'cursor: mutex ..' / 'cursor: pin ..' / 'library cache: mutex ..' Type Wait Events

Document:1349387.1 Troubleshooting 'cursor: pin S wait on X' waits 
Document:1357946.1 Troubleshooting 'library cache: mutex X' waits.

 

Potential Solutions

  1. Use Recommended Patch Levels

    Make sure you are running on the latest Patch Set Update.
    Issues with mutex contention are likely to be priority fixes and so are strong candidates for inclusion in patch sets. Staying on the latest patch set is a good way of avoiding known issues.

    Document 756671.1 Oracle Recommended Patches -- Oracle Database
  2. Reduce Parsing

    From the AWR report, look for SQL statements with high parse counts. Check whether the amount of parsing these statements cause can be reduced, or whether they can be run less often. Mutex waits can be caused by high volumes of both hard and soft parsing (in a similar way to how this caused library cache latch waits in earlier versions).
  3. High Version Counts

    When a statement is parsed, the library cache is checked to see if there is already a version of that statement cached. If there is more than one version of the statement then these are all checked for a match. This takes time and CPU cycles. If a statement has a high number of versions, then when that statement is parsed, the mutex protecting the checking operation is held for longer. This means that the more versions there are, the more chance of contention occurring. For details on how to identify and reduce version counts, see: 

    Document:296377.1 Troubleshooting: High Version Count Issues
    Document:438755.1 High SQL Version Counts - Script to determine reason(s)

    In 11.2.0.2.2 and higher, high version counts can be limited by setting a threshold after which the parent cursor is made obsolete. For further information on this and how to enforce this behavior see:

    Document:10187168.8 Bug 10187168 Enhancement to obsolete parent cursors if VERSION_COUNT exceeds a threshold

    Note: This does not solve the issues causing high version counts, but it can limit the version counts and thus minimize waits on mutexes while the underlying issue is being investigated.

    CURSOR_SHARING=SIMILAR

    A specific cause of high version counts is setting CURSOR_SHARING to SIMILAR.
    The following notes explain why setting CURSOR_SHARING to SIMILAR is not recommended and is deprecated in 11g.

    Document:261020.1 High Version Count with CURSOR_SHARING = SIMILAR or FORCE
    Document:1169017.1 ANNOUNCEMENT: Deprecating the cursor_sharing = 'SIMILAR' setting
  4. OS Resources

    Mutex contention can be a symptom of OS resource exhaustion rather than its cause. Make sure that other processes on the system are not consuming large amounts of OS resources, as this can make operations waiting for mutexes wait longer, causing more contention. If there is CPU starvation (100% CPU consumption along with mutex waits), resolve the CPU issue first. You can use OSWatcher to help identify OS issues. See:

    Document:301137.1 OS Watcher User Guide

    You are looking for high CPU usage and high memory consumption. If either is found, investigate what the process is doing and why it is using such large amounts of resource.

    If resource manager is turned on, try turning it off (and vice versa) to see if it alleviates the mutex wait. 

    You should also ensure that your OS is patched to the recommended version.
  5. Size the Shared Pool Correctly

    If the shared pool is incorrectly sized, this can result in cursors being unnecessarily aged out and re-parsed. If large numbers of cursors are being re-parsed this can cause mutex contention.

    Document:62143.1 Understanding and Tuning the Shared Pool
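
Two of the checks above, high version counts and cursors being aged out and reloaded, can be sketched with the queries below. They are not part of the original note. The threshold of 20 versions is an arbitrary illustrative value, and both views report figures accumulated since instance startup or since the cursor was loaded.

```sql
-- Statements with many child cursors (20 is an illustrative threshold only).
select sql_id, version_count, loaded_versions, parse_calls, executions
from   v$sqlarea
where  version_count > 20
order  by version_count desc;

-- Library cache reloads and invalidations by namespace; steadily growing
-- reloads can indicate cursors being aged out and re-parsed.
select namespace, gets, pins, reloads, invalidations
from   v$librarycache
order  by reloads desc;
```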

Known Issues

  • 11.2.0.X

    On Oracle 11g Version 11.2, consider applying PSU 11.2.0.2.3.
    On Oracle 11g Version 11.2.0.2, seriously consider applying PSU 11.2.0.2.2 plus the fix for 12431716. Many mutex fixes are already included in these patches:

    Document:1291879.1 Oracle Database Patch Set Update 11.2.0.2.2 Known Issues
  • 11.1.0.7.X

    On Oracle 11g Version 11.1.0.7.0, consider applying the latest Patch Set Update. See:

    Document:12419384.8 11.1.0.7.8 Patch Set Update (PSU), Patch:12419384

Further Details on Specific Mutex Wait Events

 

Document:1310764.1 WAITEVENT: "cursor: pin S" Reference Note
Document:1298015.1 WAITEVENT: "cursor: pin S wait on X" Reference Note
Document:727400.1 WAITEVENT: "library cache: mutex X"

Troubleshooting Other Issues

For guidance troubleshooting other performance issues see:


Document:1377446.1 Troubleshooting Performance Issues


REFERENCES

NOTE:1349387.1 - Troubleshooting 'cursor: pin S wait on X' waits.
NOTE:1356828.1 - FAQ: 'cursor: mutex ..' / 'cursor: pin ..' / 'library cache: mutex ..' Type Wait Events
NOTE:1357946.1 - Troubleshooting 'library cache: mutex X' waits.
NOTE:1363422.1 - Automatic Workload Repository (AWR) Reports - Start Point
NOTE:1377446.1 - Troubleshooting Performance Issues
NOTE:296377.1 - Troubleshooting: High Version Count Issues
NOTE:301137.1 - OSWatcher Black Box (Includes: [Video])
NOTE:33089.1 - TROUBLESHOOTING: Possible Causes of Poor SQL Performance
NOTE:62143.1 - Troubleshooting: Tuning the Shared Pool and Tuning Library Cache Latch Contention
NOTE:727400.1 - WAITEVENT: "library cache: mutex X"
NOTE:7307972.8 - Bug 7307972 - Excessive waits on 'library cache: mutex x'
NOTE:756671.1 - Oracle Recommended Patches -- Oracle Database
BUG:7307972 - HIGH VALUE FOR "LIBRARY CACHE: MUTEX X" IN AWR AS TOP WAIT EVENT
NOTE:10187168.8 - Bug 10187168 - Enhancement to obsolete parent cursors if VERSION_COUNT exceeds a threshold
NOTE:1169017.1 - ANNOUNCEMENT: Deprecating the cursor_sharing = 'SIMILAR' setting
NOTE:12419384.8 - Bug 12419384 - 11.1.0.7.8 (Jul 2011) Database Patch Set Update (PSU)
NOTE:1291879.1 - Oracle Database Patch Set Update 11.2.0.2.2 Known Issues
NOTE:1298015.1 - WAITEVENT: "cursor: pin S wait on X" Reference Note
NOTE:1310764.1 - WAITEVENT: "cursor: pin S" Reference Note
 
 
