Oracle db file parallel write 和 log file parallel write 等待事件 说明

一. db file parallel write 等待事件

引自如下blog:

http://oradbpedia.com/wiki/Wait_Events_-_db_file_parallel_write

 

db file parallel write

       The db file parallel write wait event belongs to the Oracle Database Writer (DBWR) process since it is the only process that writes blocks from the SGA to datafiles. When it is time to write, the DBWR process compiles a set of dirty blocks, hands the batch over to the OS, and waits on the db file parallel write event for the I/O to complete.

 

       Although user sessions never experience the db file parallel write wait event, this doesn't mean that they are never impacted by it. If the write complete waits or free buffer waits event shows up in user sessions, then they may be suffering from the impact of the db file parallel write event.

 

       If a user session needs to modify a block that happens to be in the DBWR write batch, it has to wait on the write complete waits event until that batch of blocks is completely written to disk. If the batch size is large, or the I/O subsystem is slow, the DBWR process will incur additional time waiting for the I/O to complete, and so will the user session that needs the block that is being written.

 

       If user sessions are experiencing the free buffer waits wait event, and the number of waits increases steadily, this means there is a shortage of free blocks in the SGA. This can happen if the Buffer Cache is too small, or DBWR can't keep up with the rate of blocks being dirtied. One of the reasons why the DBWR process can't keep up with dirty blocks is that it spends too much time on the db file parallel write event.

 

Parameters:

       P1=The number of files Oracle is writing to.

       P2=The number of blocks to be written.

       P3=Total number of I/O request same as P2 because multi-block I/O is not used.

 

       Since P1 and P2 report the number of files and blocks instead of the absolute file and block number, a DBA cannot tell which objects are being written. However, the user session that waits on the write complete waits or free buffer waits event does indicate the absolute file and block number in its P1 and P2 values.

 

Common Causes and Actions

       The db file parallel write latency is normally a symptom of a slow I/O subsystem or poor I/O configurations. This includes poor layout of database files, bad mount point to I/O controller ratio, wrong stripe size and/or RAID level, and not enough disks (i.e. there are a few high capacity disks versus many lower capacity disks). The DBA needs to look at the average I/O time. A well-laid database and I/O subsystem should provide an average I/O wait that does not exceed 2 centiseconds.

       If this is an issue, the DBA should review the I/O configuration by mapping out the I/O routes from mount points to controllers, and controllers to physical disk groups, and ensure proper placement of database files. The command for this function is platform specific and unfortunately, often requires administrator privilege. For storage systems configured with the Veritas volume manager on Sun platform, the DBA may be able to use the vxprint –ht command. The DBA should also watch for usage (i.e. I/O throughput and bottlenecks) from the OS level using sar –d, iostat –dxn, or an equivalent tool. If some disks are hit hard for an extended period of time (i.e. almost 100% busy), and the average queue length is greater than 3, then the DBA needs to rearrange some of the database files.

 

       Beyond ensuring the I/O subsystem is properly configured and database files are well placed, the DBA should make non-blocking I/O (DISK_ASYNC_IO = TRUE) available to the DBWR process if the platform supports asynchronous I/O.

 

       A larger write batch increases the DBWR I/O wait time, especially in an environment where datafiles are poorly placed. A sure sign that the write batch is too big is when user sessions start to wait on the write complete waits event. Prior to Oracle 8i, the _DB_BLOCK_WRITE_BATCH parameter determined the DBWR write batch size and the value can be seen in X$KVII. It is listed as DB writer IO clump. In 8i and higher, this parameter was replaced by the _DB_WRITER_CHUNK_WRITESand is listed as DBWR write chunk. A new parameter _DB_WRITER_MAX_WRITES was introduced to limit the number of outstanding DBWR I/Os. The DBA should ensure the batch size is not so large that it causes write complete waits and longer db file parallel write, and also not so small that it causes long dirty queue and free buffer waits. Also, bear in mind the improvements that Oracle made since 8i should put the write batch issue to rest, and DBAs shouldn't have to mess with it. The write complete waits event is prevalent in versions prior to 8i.

 

Prior to Oracle 8i

SQL>select * from x$kvii where kviitag = 'kcbswc';

 

Oracle 8i and higher

SQL>select * from x$kvii where kviitag in ('kcbswc','kcbscw');

      

       When the DB_BLOCK_MAX_DIRTY_TARGET parameter is set too low, it can also cause excessive waits on the db file parallel write and write complete waits events. This parameter is used to influence the amount of time it takes to perform instance recovery. When the number of dirty buffers exceeds the parameter's value, DBWR will write the dirty buffers to disk. This is known as incremental checkpoint. A smaller value provides shorter instance recovery time but it may cause the DBWR process to become hyperactive, especially in an active database where a large number of buffers are being modified. This in turn may cause excessive write complete waits and a longer db file parallel write time. This parameter is hidden in 9i and DBAs should not have to be concerned with it.

 

Diagnosis

       For system-level diagnosis, query the V$SYSTEM_EVENT view to determine if the AVERAGE_WAIT is an issue.

       SQL>select * from v$system_event where event = 'db file parallel write';

 

While at the V$SYSTEM_EVENT, look also for the companion events.

       SQL>select * from v$system_event

       SQL>where event in ('write complete waits', 'free buffer waits');

 

       This event occurs in the DBWR. It indicates that the DBWR is performing a parallel write to files and blocks. When the last I/O has gone to disk, the wait ends.Wait Time:

 

Wait until all of the I/Os are completed.

Parameter

Description

requests

This indicates the total number of I/O requests, which will be the same as blocks.

interrupt

timeout

This indicates the timeout value in centiseconds to wait for the IO completion.

 

 

二.  log file parallel write 等待事件

引自如下blog:

http://oracle-dox.net/McGraw.Hill-Oracle.Wait.Interf/8174final/LiB0036.html

 

log file parallel write

       The log file parallel write wait event has three parameters: files, blocks, and requests. In Oracle Database 10g, this wait event falls under the System I/O wait class. Keep the following key thoughts in mind when dealing with the log file parallel write wait event.

       (1)The log file parallel write event belongs only to the LGWR process.

       (2)A slow LGWR can impact foreground processes commit time.

       (3)Significant log file parallel write wait time is most likely an I/O issue.

 

Common Causes, Diagnosis, and Actions

       As the db file parallel write wait event belongs only to the DBWR process, the log file parallel write wait event belongs only to the LGWR process. When it is time to write, the LGWR process writes the redo buffer to the online redo logs by issuing a series of system write calls to the operating system. The LGWR process waits for the writes to complete on the log file parallel write event. The LGWR process looks for redo blocks to write once every three seconds, at commit, at rollback, when the _LOG_IO_SIZE threshold is met, when there is 1MB worth of redo entries in the log buffer, and when posted by the DBWR process.

 

       Although user sessions never experience the log file parallel write wait event, they can be impacted by a slow LGWR process. A slow LGWR process can magnify the log file sync waits, which user sessions wait on during commits or rollbacks. User sessions will not get the commit or rollback complete acknowledgement until the LGWR has completed the writes. Chapter 7 has more details on the log file sync wait event.

 

       The key database statistics to look at are the systemwide TIME_WAITED and AVERAGE_WAIT of the log file parallel write and log file sync wait events because they are interrelated:

 

SQL>select event, time_waited, average_wait from   v$system_event where  event in ('log file parallel write','log file sync');

EVENT             TIME_WAITED AVERAGE_WAIT
------------------------- ----------- ------------
log file parallel write   11315158   .508570816
log file sync          7518513   .497255756

 

       If the log file parallel write average wait time is greater than 10ms (or 1cs), this normally indicates slow I/O throughput. The cure is the same as for the db file parallel write waits. Enable asynchronous writes if your redo logs are on raw devices and the operating system supports asynchronous I/O. For redo logs on file systems, use synchronous direct writes.

       Unfortunately, you cannot spawn more than one LGWR process. In this case, it is critical that nothing else is sharing the mount point of the redo log files. Make sure the controller that services the mount point is not overloaded. Moving the redo logs to higher speed disks will also help.

       We strongly suggest that you avoid putting redo logs on RAID5 disks, but we also understand that many times you don’t have a choice or a say in it. You can vent your frustration at www.baarf.com.

 

       Besides improving the I/O throughput, you can also work on lowering the amount of redo entries. This will provide some relief, but not the cure. Whenever possible, use the NOLOGGING option. Indexes should be created or rebuilt with the NOLOGGING option. CTAS operations should also use this option.

 

 

Note:

       The NOLOGGING option doesn’t apply to normal DML operations such as inserts, updates, and deletes. Objects created with the NOLOGGING option are unrecoverable unless a backup is taken prior to the corruption. If you have to take an additional backup, then the I/Os that you save by not logging will be spent on backup.        Database in FORCE LOGGING mode will log all changes (except for changes in temporary tablespaces), regardless of the tablespace and object settings.

 

       A lower commit frequency at the expense of higher rollback segment usage can also provide some relief.

       A high commit frequency causes the LGWR process to be overactive and when coupled with slow I/O throughput will only magnify the log file parallel write waits.

       The application may be processing a large set of data in a loop and committing each change, which causes the log buffer to be flushed too frequently. In this case, modify the application to commit at a lower frequency. There could also be many short sessions that log in to the database, perform a quick DML operation, and log out.

       In this case, the application design may need to be reviewed. You can find out who is committing frequently with the following query:

 

       SQL>select sid, value from   v$sesstat where  statistic# = (select statistic#                      from   v$statname    where  name = 'user commits') order by value desc;

-- Another evidence of excessive commits is high redo wastage.

SQL>select b.name, a.value, round(sysdate - c.startup_time) days_old from   v$sysstat a, v$statname b, v$instance c where  a.statistic# = b.statistic# and   b.name  in ('redo wastage','redo size');

NAME         VALUE        DAYS_OLD
--------------- --------------- ---------------
redo size        249289419360       5
redo wastage     2332945528         5

 

       Check the job scheduler to see if hot backups run during peak hours. They can create large amounts of redo entries, which in turn increases the log file parallel write waits. Hot backups should run during off-peak hours, and tablespaces should be taken out of hot backup mode as soon as possible.

 

       Lastly, be careful not to jam the LGWR with too many redo entries at one time. This can happen with large log buffer because the one-third threshold is also larger and holds more redo entries. When the one-third threshold is met, the LGWR process performs a background write if it is not already active. And the amount of redo entries may be too much for the LGWR to handle at one time, causing extended log file parallel write waits. So the idea is to stream the LGWR writes. This can be done by lowering the one-third threshold, which is controlled by the initialization parameter _LOG_IO_SIZE.

       By default the _LOG_IO_SIZE is one-third of the LOG_BUFFER or 1MB, whichever is less, expressed in log blocks. Query the X$KCCLE.LEBSZ for the log block size. Typically, it is 512 bytes.

       For example, if the LOG_BUFFER is 2,097,152 bytes (2MB), and the log block size is 512 bytes, then the default value for _LOG_IO_SIZE is 1,365 used log blocks. At this size, the LGWR process becomes lazy and normally writes only on transaction terminations (sync writes) or when it wakes up from its three-second timeouts. You should set the _LOG_IO_SIZE at the equivalent of 64K. That way, you can still have a larger log buffer to accommodate the spikes for buffer space after checkpoints, but the writes will start when there is 64K worth of redo entries in the buffer, assuming there is no user commit or rollback, and the LGWR sleep hasn’t timed out during that time.

 

Notes:

       This method is not without overhead. The LGWR write operation requires the redo copy and redo writing latches. So a more active LGWR process will increase the load on these latches. Do not reduce the _LOG_IO_SIZE if these latches currently have high SLEEPS. However, if the condition allows you to change the _LOG_IO_SIZE, you must monitor its impact over time by querying the V$LATCH view. Make sure you obtain a baseline before implementing the change.

 

       You can use the following query to find the average number of redo log blocks per write and the average LGWR I/O size in bytes:

 

SQL>select round((a.value / b.value) + 0.5,0) as avg_redo_blks_per_write,       round((a.value / b.value) + 0.5,0) * c.lebsz as avg_io_size from   v$sysstat a, v$sysstat b, x$kccle c where  c.lenum = 1 and    a.name  = 'redo blocks written' and    b.name  = 'redo writes';

AVG_REDO_BLKS_PER_WRITE AVG_IO_SIZE
----------------------- -----------
                      8        8192

 

 

 

 

-------------------------------------------------------------------------------------------------------

你可能感兴趣的:(Oracle db file parallel write 和 log file parallel write 等待事件 说明)