Understanding 9i Real Application Clusters Cache Fusion (Doc ID 139436.1)

PURPOSE
-------

The purpose of this document is to explain the benefits and functionality of 
cache fusion in an Oracle Real Application Clusters Environment. 

 
SCOPE & APPLICATION
-------------------

This document is intended for Oracle Real Application Clusters database 
administrators that would like to understand how cache fusion works and how it 
can increase performance on their clustered database.


Understanding 9i Real Application Clusters Cache Fusion
-------------------------------------------------------
 
The concept of cache fusion was introduced in Oracle 8i OPS.  This functionality
was added to reduce the need to 'ping' blocks via disk in order to build read
consistent views of blocks held on a remote instance.  It greatly reduced the cost
of selecting data whose lock element was in use by another instance.  Instead of
forcing the owning instance to write its changes to disk (forcing an I/O) so that
the block can be read on the remote instance, cache fusion creates a copy of the
buffer and ships it across the interconnect to the instance that is selecting the
data.  This addressed read/write contention, but when an instance had to change a
block that was held dirty on a remote instance, it still had to go through the
same ping mechanism and have the block written to disk.
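
In an OPS environment the amount of pinging can be measured through the V$PING
view (created by catclust.sql).  A minimal query sketch, assuming the FORCED_READS
and FORCED_WRITES columns of the 8i/9i version of that view:

    REM Objects suffering the most forced disk writes due to pinging
    SELECT name, file#, block#, forced_reads, forced_writes
      FROM v$ping
     WHERE forced_writes > 0
     ORDER BY forced_writes DESC;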

9i Real Application Clusters has improved the cache fusion design by also
addressing write/write contention, thus increasing speedup and scaleup.  With RAC
cache fusion, Oracle has taken further steps to reduce I/O contention by virtually
eliminating the need to 'ping' blocks to disk for locks that are in use but are
requested for write access on a remote instance.  Instead, a RAC instance can
ship copies of dirty buffers to remote instances for write access.  This
functionality is only available when using dba (1:1) releasable locking, i.e.
setting the lock count to 0 for a particular datafile in gc_files_to_locks, or
not setting gc_files_to_locks at all.  Hashed locks still use the ping mechanism,
and fixed locks have been eliminated in 9i RAC.
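
For illustration, a sketch of the relevant init.ora settings (the file numbers
are hypothetical).  Leaving gc_files_to_locks unset, or mapping a file to 0 locks,
yields the 1:1 releasable locking that cache fusion requires, while a non-zero
lock count assigns hashed locks that still use the ping mechanism:

    # Releasable 1:1 dba locking on files 1-4 (cache fusion eligible).
    # Omitting the parameter entirely has the same effect for all files.
    GC_FILES_TO_LOCKS = "1-4=0"

    # By contrast, "5=100" would place 100 hashed locks on file 5, which
    # still ping:  GC_FILES_TO_LOCKS = "1-4=0:5=100"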

As previously mentioned, with RAC cache fusion an instance can ship copies of dirty 
buffers to remote instances.  RAC cache fusion introduces the concept of past images, 
which are older copies of a buffer that have not yet been written to disk.  To keep 
track of these past images Oracle uses global and local lock roles and BWRs (block 
written redo records).  To clarify the concept of global and local lock roles, 
consider that all lock types from Oracle 8i are considered local.  So...in Oracle 8i 
there are 3 different lock modes:

	N - Null
	S - Shared
	X - Exclusive

When referring to lock modes in 9i RAC, there are 3 characters to distinguish locks. 
The first letter represents the lock mode, the second character represents the lock
role, and the third character (a number) shows us if there are any past images for
this lock in the local instance.  So, our lock types are now:

	NL0 - Null Local 0 - Essentially the same as N in 8i (no past images)
	SL0 - Shared Local 0 - Essentially the same as S in 8i (no past images)
	XL0 - Exclusive Local 0 - Essentially the same as X in 8i (no past images)
	SG0 - Shared Global 0 - Global S lock, instance owns current block image
	XG0 - Exclusive Global 0 - Global X lock, instance owns current block image
	NG1 - Null Global 1 - Global N lock, instance owns past image
	SG1 - Shared Global 1 - Global S lock, instance owns past image
	XG1 - Exclusive Global 1 - Global X lock, instance owns past image
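
The lock role and past image portions of these names correspond to buffer states
that can be seen in the buffer cache.  A sketch of how one might watch a single
block, assuming the 9i GV$BH.STATUS values ('xcur', 'scur', 'cr', and 'pi' for a
past image); the file and block numbers are hypothetical:

    SELECT inst_id, file#, block#, status, dirty
      FROM gv$bh
     WHERE file# = 7
       AND block# = 532
     ORDER BY inst_id;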
	
When a lock is acquired for the first time it is acquired with a local role.  If the
lock is requested while another instance already holds dirty buffers for it, the
lock takes on a global role, and any remote instance holding those dirty buffers
keeps what is called a 'past image' of the buffer.  For recovery purposes, instances 
that hold past images keep them in their buffer cache until notified by the master 
instance of the lock to release them.  When the buffers are discarded, the instance 
keeping the past image writes a BWR, or 'block written redo', to its redo stream, 
indicating that the block has already been written to disk and is not needed for 
recovery by this instance.
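
BWRs can be seen by dumping a redo log file to a trace file.  A sketch, assuming
the standard redo dump syntax and a hypothetical log file name; in 9i dumps the
BWR appears as a 'block written record' change vector (layer 23, opcode 1):

    ALTER SYSTEM DUMP LOGFILE '/u01/oradata/rac/redo01.log';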

Let's assume there is a 3 node RAC cluster with lock element 123 covering a
block in the EMP table:

User C issues a select against table EMP lock element 123 on Instance 3 which opens 
an SL0 lock on Instance 3:

    ----------------      -----------------     -----------------  
    |  Instance 1   |     | Instance 2    |    |  Instance 3    |
    |               |     |               |    |                |
    |   Lock Held   |     |   Lock Held:  |    |   Lock Held:   | 
    | on LENUM 123: |     | on LENUM 123: |    | on LENUM 123:  | 
    |               |     |               |    |      SL0       |
    |               |     |               |    |                |
    ----------------      -----------------     -----------------  
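
For concreteness, the statement behind this step might simply be a query from
user C's session on instance 3.  The columns and predicate are illustrative only;
what matters is that the rows read are covered by lock element 123:

    SELECT ename, sal
      FROM emp
     WHERE deptno = 10;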

Acquiring shared locks does not affect the lock role.  So, if user B issues the same 
select against table EMP lock element 123 on instance 2, the lock mode is the same 
(because it is an S lock) and the lock role is the same because no buffers are 
dirtied:

    ----------------      -----------------     -----------------  
    |  Instance 1   |     |  Instance 2   |    |   Instance 3   |
    |               |     |               |    |                |
    |   Lock Held   |     |   Lock Held   |    |   Lock Held    | 
    | on LENUM 123: |     | on LENUM 123: |    | on LENUM 123:  | 
    |               |     |      SL0      |    |      SL0       |
    |               |     |               |    |                |
    ----------------      -----------------     -----------------  

Acquiring the first exclusive lock also does not affect the lock role if there are
no dirty buffers for the lock element.  So, if user B decides to update rows on 
table EMP lock element 123 on instance 2, user B acquires XL0 and the SL0 locks 
are removed per standard 8i OPS locking behavior:

    ----------------      -----------------     -----------------  
    |  Instance 1   |     |  Instance 2   |    |   Instance 3   |
    |               |     |               |    |                |
    |   Lock Held   |     |   Lock Held   |    |   Lock Held    | 
    | on LENUM 123: |     | on LENUM 123: |    | on LENUM 123:  | 
    |               |     |      XL0      |    |                |
    |               |     |               |    |                |
    ----------------      -----------------     -----------------  

Acquiring an exclusive lock on a lock element that has dirty buffers in a remote 
instance causes cache fusion phase 2 to come into play.  So, if user A decides to
update rows on table EMP lock element 123 on instance 1 while instance 2's copy of
the block is dirty in its buffer cache, instance 2 will ship a copy of the block to
instance 1 and will now hold a null global lock (past image)*.  At this point
instance 1 owns an exclusive lock with a global role while instance 2 retains a 
past image:

* Note that when an instance owns a past image of a block it is not permitted to 
write the block to disk or discard the image until notified by the master node.  

    ----------------      -----------------     -----------------  
    |  Instance 1   |     |  Instance 2   |    |   Instance 3   |
    |               |     |               |    |                |
    |   Lock Held   |     |   Lock Held   |    |   Lock Held    | 
    | on LENUM 123: |     | on LENUM 123: |    | on LENUM 123:  | 
    |      XG0      |     |      NG1      |    |                |
    |               |     |               |    |                |
    ----------------      -----------------     -----------------  
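
The conversions driving this step (instance 1 acquiring X, instance 2 downgrading
X to N) are the kind of activity tallied in the lock conversion statistics.  A
sketch, assuming the V$LOCK_ACTIVITY view carried over from OPS and its GV$
counterpart:

    SELECT inst_id, from_val, to_val, action_val, counter
      FROM gv$lock_activity
     WHERE counter > 0
     ORDER BY inst_id, counter DESC;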

Now let's assume that user C on instance 3 wants to select from table EMP lock 
element 123.  After user C issues the select, instance 1's lock is downgraded to S
mode and its buffer becomes the most recent past image (SG1).  Instance 3 receives
the current copy of the buffer (SG0), while instance 2 still holds an older past
image of the buffer:

    ----------------      -----------------     -----------------  
    |  Instance 1   |     |  Instance 2   |    |   Instance 3   |
    |               |     |               |    |                |
    |   Lock Held   |     |   Lock Held   |    |   Lock Held    | 
    | on LENUM 123: |     | on LENUM 123: |    | on LENUM 123:  | 
    |      SG1      |     |      NG1      |    |      SG0       |
    |               |     |               |    |                |
    ----------------      -----------------     -----------------  

Now let's assume that user B on instance 2 wants to select from table EMP lock 
element 123.  Instance 2 will now request a consistent read copy of the buffer 
from another instance.  The instance that ships the buffer to instance 2 is 
chosen in the following order of preference:

	1. The master instance for the lock.
	2. The S holder with the most recent past image.
	3. A shared local (SL) holder.
	4. The most recently granted S holder.
 
Let's assume that instance 1 is the master of the lock (and holds the last past 
image).  So...instance 1 will ship a copy of the block to instance 2's buffer cache, 
and instance 2 will now hold SG1 (shared access to the current copy while still 
retaining its past image) while the others stay the same:

    ----------------      -----------------     -----------------  
    |  Instance 1   |     |  Instance 2   |    |   Instance 3   |
    |               |     |               |    |                |
    |   Lock Held   |     |   Lock Held   |    |   Lock Held    | 
    | on LENUM 123: |     | on LENUM 123: |    | on LENUM 123:  | 
    |      SG1      |     |      SG1      |    |      SG0       |
    |               |     |               |    |                |
    ----------------      -----------------     -----------------  

Now let's assume that user C on instance 3 wants to update table EMP lock 
element 123.  User C now requires an exclusive lock, and instances 1 and 2 have 
to downgrade.  The end result is instance 3 holding XG0 and instances 1 and 2 
each holding NG1:

    ----------------      -----------------     -----------------  
    |  Instance 1   |     |  Instance 2   |    |   Instance 3   |
    |               |     |               |    |                |
    |   Lock Held   |     |   Lock Held   |    |   Lock Held    | 
    | on LENUM 123: |     | on LENUM 123: |    | on LENUM 123:  | 
    |      NG1      |     |      NG1      |    |      XG0       |
    |               |     |               |    |                |
    ----------------      -----------------     -----------------  

Now let's assume that instance 3 checkpoints and writes all of its dirty buffers
to disk.  At this time instance 3 will notify the master node that it is writing.
In turn the master will notify instances 1 and 2 that they can discard their past 
images, and instance 3 will hold XL0 (the role returns to local because no past 
images remain).  Also note that instances 1 and 2 will write BWRs (block written 
redos) to their redo streams indicating that the block has already been written 
to disk and is not needed for recovery by those instances.

    ----------------      -----------------     -----------------  
    |  Instance 1   |     |  Instance 2   |    |   Instance 3   |
    |               |     |               |    |                |
    |   Lock Held   |     |   Lock Held   |    |   Lock Held    | 
    | on LENUM 123: |     | on LENUM 123: |    | on LENUM 123:  | 
    |               |     |               |    |      XL0       |
    |               |     |               |    |                |
    ----------------      -----------------     -----------------  
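
In a test system this last transition can be provoked by forcing a checkpoint on
instance 3 and then re-checking the buffer states; the file and block numbers
below are the same hypothetical ones used earlier:

    -- On instance 3: force this instance's dirty buffers to disk
    ALTER SYSTEM CHECKPOINT LOCAL;

    -- Afterwards no 'pi' buffers should remain for the block on any instance
    SELECT inst_id, status
      FROM gv$bh
     WHERE file# = 7
       AND block# = 532;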

RELATED DOCUMENTS
-----------------

Note 144152.1 - Understanding 9i Real Application Clusters Cache Fusion Recovery
Note 139435.1 - Fast Reconfiguration in 9i Real Application Clusters
