In this Document
Purpose
Troubleshooting Steps
What is a Row Cache Enqueue Lock?
What is the meaning of the warning WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK?
Potential reasons for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!"
SGA Shrink/Resize Operations
Issues by Row Cache Enqueue Type
DC_TABLESPACES
DC_SEQUENCES
DC_USERS
DC_OBJECT_IDS
DC_SEGMENTS
DC_ROLLBACK_SEGMENTS
DC_TABLE_SCNS
DC_AWR_CONTROL
What information can be gathered to help identify the cause?
Systemstate dump
AWR, ADDM and ASH Reports
How to interpret the information obtained
Systemstate dump
Example 1:
Example 2:
AWR Report
Possible Issue in Pre-10g versions
Troubleshooting Other Issues
References |
The purpose of this note is to help troubleshoot the causes of the "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" warning.
The Row Cache, or Data Dictionary Cache, is a memory area in the shared pool that holds data dictionary information. The row cache holds data as rows instead of buffers. A row cache enqueue lock is a lock on the data dictionary rows. The enqueue is taken on a specific data dictionary object; this is called the enqueue type and can be found in the v$rowcache view.
For a list of row cache types per version see:
This wait event is used when we are trying to acquire a lock on a rowcache entry.
When Row cache contention occurs, if the enqueue cannot be obtained within a certain predetermined time period, a trace file will be generated in the user_dump_dest or background_dump_dest depending on whether a user or background process created the trace file. The alert log is usually updated accordingly with the warning and the location of the trace file.
The database detects that a key resource is being held for too long and flags this to the administrator so that the situation can be resolved. This may also be accompanied by a database hang or slowdown.
The alert.log and the generated trace file will typically contain the message "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!".
If the rowcache entry lock cannot be granted immediately, we enter a loop in which we free the row cache objects latch, wait (using the above event), re-acquire the latch and then try again to acquire the rowcache lock. In single-instance mode, this is repeated up to 1000 times before the process reports the "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" message. Under RAC it repeats until we fail to acquire the instance lock or we are interrupted.
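The retry loop described above can be modelled with a short Python sketch. This is an illustration only, not Oracle's internal code: the function name `acquire_row_cache_lock`, the `try_lock` and `wait` callables and the return shape are all hypothetical. Only the limit of 1000 attempts comes from the text, and only the single-instance case is modelled.

```python
from typing import Callable, Tuple

# Single-instance retry limit quoted in the note; everything else here is
# a hypothetical model, not Oracle's implementation.
MAX_ATTEMPTS = 1000


def acquire_row_cache_lock(
    try_lock: Callable[[], bool],
    wait: Callable[[], None] = lambda: None,
    max_attempts: int = MAX_ATTEMPTS,
) -> Tuple[bool, int]:
    """Model of the retry loop. Returns (granted, number_of_waits)."""
    if try_lock():
        # Granted immediately: the loop is never entered.
        return True, 0
    for attempt in range(1, max_attempts + 1):
        # wait() stands in for: free the latch, wait on the event,
        # then re-acquire the latch.
        wait()
        if try_lock():
            return True, attempt
    # Limit reached: this is the point at which the
    # "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" message is reported.
    return False, max_attempts
```

Calling `acquire_row_cache_lock(lambda: False)` returns `(False, 1000)`, which corresponds to the case where the holder never releases the enqueue and the warning is raised.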
The systemstate dump that is generated can provide useful information for diagnosing the cause of the contention.
When the SGA is dynamically resized, various latches need to be held to prevent changes from being made while the operation completes. If the resize takes a while or is happening frequently, you can see "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" occurring. The key identifiers for this are high waits for 'SGA: allocation forcing component growth' (or similar) at the top of the waits in AWR, and the blocking session for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" itself waiting for 'SGA: allocation forcing component growth' (or similar). There are a couple of fixes available, see:
For each enqueue type, only a limited number of operations require that enqueue. The enqueue type may therefore give an indication as to the type of operation that is causing the issue. Some common reasons are outlined below:
DC_TABLESPACES
The most likely cause is the allocation of new extents. If extent sizes are set low, the application may constantly be requesting new extents and causing contention. Do you have objects with small extent sizes that are growing rapidly? (You may be able to spot these by looking for objects with large numbers of extents.) Check the trace for insert/update activity, and check the number of extents of the objects being inserted into.
DC_SEQUENCES
Check that sequences are cached appropriately for the application's requirements.
DC_USERS
A deadlock and the resulting "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" can occur if a session issues a GRANT to a user while that user is in the process of logging on to the database.
DC_SEGMENTS
This is likely to be due to segment allocation. Identify what the session holding the enqueue is doing and use errorstacks to diagnose.
DC_ROLLBACK_SEGMENTS
This is due to rollback segment allocation. Just as for dc_segments, identify what is holding the enqueue and also generate errorstacks. Remember that on a multi-node system (RAC) the holder may be on another node, so systemstates from each node will be required.
DC_AWR_CONTROL
This enqueue is related to control of the Automatic Workload Repository (AWR). Any operation that manipulates the repository may hold it, so look for processes that are blocking such operations.
RAC Specific Bugs
Oracle automatically generates a systemstate dump when the issue occurs, and the dump is listed in the alert.log.
Run AWR and ASH reports for the time when the problem is reported, as well as a report for the period leading up to the problem, as these can sometimes help build a picture of when the problem actually started. The AWR, ADDM and ASH reports can complement each other in giving a complete picture.
Depending on the interval used for generating AWR snapshots, get a report for the smallest time frame available. The default snapshot interval is hourly.
As systemstate is complicated to analyze, it is recommended to create a Service Request and upload the alert.log, systemstate dump and AWR reports preceding and during the problem to support.
Often the wait for a row cache enqueue is the culmination of a chain of events, and the lock being held is a symptom of another issue in which the process holding the requested row cache enqueue is itself being blocked by other processes. That is, it is often a symptom, not the cause.
Systemstate dumps can help to find which row cache is being requested and may help find the blocking process.
Example 1:
The header of the trace shows:
So, in the example above, process 77 is waiting for a row cache enqueue that it needs in shared mode (request: S).
A systemstate contains process state information for every process in the database, so we look for this process in the systemstate:
Above we see that PROCESS 77 is requesting row cache dc_users in shared mode.
If process 77 is waiting that implies it is being blocked by another process so we now need to search the systemstate to determine what is holding the resource and blocking the process.
It is best to search for the object reference which in this case is object=0x1dc9a5d30.
If this is done, we find that process 218 is requesting this object in eXclusive mode:
A request in exclusive mode causes any requests in shared mode to wait behind it until the exclusive request has been granted and then released, so process 218 blocks process 77. Note that this is a request for an exclusive lock, not an exclusive hold, so something must in turn be blocking process 218. Looking at the other processes, we see that process 164 holds the object in mode=S.
So, process 164 is holding the row cache enqueue in shared mode (mode=S) and is thus preventing process 218 from obtaining the row cache enqueue in eXclusive mode. Furthermore, we see that process 164 is on CPU (the systemstate shows last wait for 'SQL*Net message from client' rather than waiting for 'SQL*Net message from client'). To diagnose further, Support would need to check the call stacks included in the dump to determine why this session was on CPU and holding this enqueue for so long (based on 'seconds since wait started=2539').
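The manual search for the object reference can be partly automated. The Python sketch below groups the processes that mention a given row cache object address into holders (`mode=`) and requesters (`request=` or `request:`). It rests on assumptions about the dump layout: that each process section starts with a line beginning `PROCESS <n>`, and that the object address and the mode or request appear on the same line. The function name `find_lock_users` and the `SAMPLE` text are hypothetical; the sample is a made-up excerpt that reuses the process numbers and object address quoted in Example 1 and is not real trace output.

```python
import re
from typing import Dict, List, Tuple

# Assumed layout: "PROCESS <n>" opens each process section.
PROCESS_RE = re.compile(r"^\s*PROCESS\s+(\d+)")
# Assumed layout: object address and mode/request share one line.
ENQUEUE_RE = re.compile(
    r"object=(?:0x)?([0-9a-fA-F]+).*?\b(request|mode)\s*[=:]\s*([A-Za-z]+)"
)


def _normalise(addr: str) -> str:
    """Lower-case the address and drop an optional 0x prefix."""
    addr = addr.lower()
    return addr[2:] if addr.startswith("0x") else addr


def find_lock_users(dump_text: str, object_addr: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return holders and requesters of object_addr as (process, mode) pairs."""
    wanted = _normalise(object_addr)
    result: Dict[str, List[Tuple[int, str]]] = {"holders": [], "requesters": []}
    current = None  # process number of the section being read
    for line in dump_text.splitlines():
        header = PROCESS_RE.match(line)
        if header:
            current = int(header.group(1))
            continue
        hit = ENQUEUE_RE.search(line)
        if hit and current is not None and _normalise(hit.group(1)) == wanted:
            key = "holders" if hit.group(2) == "mode" else "requesters"
            result[key].append((current, hit.group(3)))
    return result


# Hypothetical excerpt for illustration only (not real trace output).
SAMPLE = """\
PROCESS 77:
  row cache enqueue: count=1 session=0x1 object=0x1dc9a5d30, request=S
PROCESS 164:
  row cache enqueue: count=1 session=0x2 object=0x1dc9a5d30, mode=S
PROCESS 218:
  row cache enqueue: count=1 session=0x3 object=0x1dc9a5d30, request=X
"""
```

With this sample, `find_lock_users(SAMPLE, "0x1dc9a5d30")` lists process 164 as the holder in mode S and processes 77 and 218 as requesters, which matches the chain worked out by hand in this example. A real dump should still be read in full, since the holder's wait and call stack are what explain why the enqueue is held.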
Example 2:
In this example, process 18 (MMON) was waiting for row cache type dc_awr_control in shared mode.
The row cache lock for this object (object=39a79f090) is being held by process 269 in exclusive mode (mode=X). That process is waiting for 'SGA: allocation forcing component growth'.
Thus the root cause in this case is the resizing of the SGA, and the wait on the row cache is a secondary result of this.
We can use the AWR report for the period to correlate this information:
Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
SGA: allocation forcing compon   42,067,317      38,469      1    7.6 Other
CPU time                                          2,796           0.6
db file sequential read             132,906         929      7    0.2 User I/O
latch free                        4,282,858         704      0    0.1 Other
log file switch (checkpoint in          904         560    620    0.1 Configurat
-------------------------------------------------------------
We can see clearly in the Top 5 Timed Events that there is significant waiting for this event across the system and 'SGA: allocation forcing component growth' is a major issue at this time.
The root cause of the "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" message is the resizing activity. The top 5 events do not even show waits for the 'row cache' symptom.
For frequent resizing, there are a couple of potential fixes available, see:
Prior to 10g there was a limitation in detecting deadlocks at the row cache level. Possible workarounds to minimize the possibility of a deadlock occurring are:
For guidance on troubleshooting other performance issues see:
References
Bug 12772404 - Significant "row cache objects" latch contention when using VPD (Doc ID 12772404.8)