Data Guard 9i Configuring Transparent Application Failover in a Data Guard Environment [ID 205637.1]

Modified 19-OCT-2010 Type BULLETIN Status PUBLISHED

PURPOSE
-------
When considering 9i Data Guard and the possible failure scenarios, we see that
the proper configuration for redirecting new and existing connections from the
failed instance to the new primary is crucial. The discussion below covers two
possible failover configurations: connect time failover and application
failover.
SCOPE & APPLICATION
-------------------
This document is intended to aid in the configuration of Connect Time Failover
and Transparent Application Failover in a Data Guard environment.
Connect Time Failover
----------------------
Connect time failover reroutes incoming connections to the instance that has
just become primary. This type of failover should work when the old primary
node is down, the old primary network is down, the old primary listener is
down, or the old primary instance is now the standby.
When the old primary network is down, failover is handled by the basic layer of
Oracle Net: the connection attempt times out at the TCP level and the client
moves on to the next host in the address list. The TCP timeout settings
therefore determine how quickly failover occurs. However, the basic
configuration of connect time failover is not sufficient for all of the
remaining failure scenarios.
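The address-list behavior described above can be illustrated with a small Python sketch. It is not Oracle Net code; it only mimics the idea of trying each host in order and moving on when the TCP connection attempt times out or is refused. The function name `connect_first_available`, the `timeout_s` value, and the injectable `connector` argument are assumptions made for this illustration.

```python
import socket


def connect_first_available(addresses, timeout_s=3.0,
                            connector=socket.create_connection):
    """Try each (host, port) pair in order and return the first that answers.

    Hypothetical helper: mimics walking an address list, it is not how
    Oracle Net is implemented internally.
    """
    errors = []
    for address in addresses:
        try:
            # A dead node or network surfaces here as a timeout or refusal,
            # both of which are subclasses of OSError.
            return address, connector(address, timeout=timeout_s)
        except OSError as exc:
            errors.append((address, exc))
    raise ConnectionError(f"no address reachable: {errors}")
```

A shorter `timeout_s` moves the client to the second host sooner, at the cost of giving up earlier on a slow but healthy first host.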
Consider the following service name:
DGD =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = hasunclu1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = hasunclu2)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = DGD)
    )
  )
Using the above alias, failover (graceful or forced) works correctly if the old
primary instance or listener is down. However, it does not work correctly for
the switchover scenario. After a switchover, the old primary is the standby and
the old standby is the primary. When a connection is attempted to the old
primary node, now running as a mounted standby, we receive the following error:
ORA-01033: ORACLE initialization or shutdown in progress
This is expected behavior. Connect time failover is not programmed to fail over
on this error.
We can solve this by setting the following parameters in the init.ora files:
Primary init.ora: instance_name=DGD_P
Standby init.ora: instance_name=DGD_S
After a switchover, as the standby and primary databases are brought up, PMON
registers both the service_names and the instance_name with the listener. We
must also change the TNS service name to look for these values:
DGD =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = hasunclu1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = hasunclu2)(PORT = 1521))
    )
    (CONNECT_DATA =
      (INSTANCE_NAME = DGD_P)
      (SERVICE_NAME = DGD)
    )
  )
Now when we connect to the old primary/new standby, we get the following error:
'ORA-12521 TNS:Listener could not resolve INSTANCE_NAME given in connect descriptor'
At this point the connection fails over to the second host in the address list.
This final connection attempt succeeds as the proper instance_name (DGD_P) is
present.
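To make the difference between the two aliases concrete, here is a toy Python model of the behavior described in this note. It is only a sketch under stated assumptions: the dictionaries, the `listener_response` and `connect` functions, and the rule "only ORA-12521 moves on to the next address" are hypothetical simplifications of what the note reports, not Oracle Net's actual logic.

```python
ORA_01033 = "ORA-01033"  # ORACLE initialization or shutdown in progress
ORA_12521 = "ORA-12521"  # listener could not resolve INSTANCE_NAME


def listener_response(instance, requested_instance=None):
    """Model one listener: return None on success, else an ORA error code."""
    if requested_instance is not None and requested_instance != instance["name"]:
        return ORA_12521
    if not instance["open"]:
        return ORA_01033
    return None


def connect(address_list, listeners, requested_instance=None):
    """Walk the address list; return (host, error), error is None on success."""
    for host in address_list:
        error = listener_response(listeners[host], requested_instance)
        if error is None:
            return host, None
        if error != ORA_12521:
            # Per the note, ORA-01033 does not trigger connect time failover.
            return host, error
        # ORA-12521: fall through and try the next address in the list.
    return None, ORA_12521


# State after a switchover: the old primary host now runs the mounted standby.
after_switchover = {
    "hasunclu1": {"name": "DGD_S", "open": False},
    "hasunclu2": {"name": "DGD_P", "open": True},
}
hosts = ["hasunclu1", "hasunclu2"]

print(connect(hosts, after_switchover))           # ('hasunclu1', 'ORA-01033')
print(connect(hosts, after_switchover, "DGD_P"))  # ('hasunclu2', None)
```

Without an instance name the client stops at the mounted standby; with `DGD_P` in the request, the first listener rejects the connection and the second host is tried.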
Note that the DBA must either maintain two init.ora files to keep the separate
instance_name values, or change the parameter with the ALTER SYSTEM command
once the instance has opened.
Application Failover
--------------------
For application failover, all existing connections to the current primary
must fail over to the new primary. One of the biggest obstacles to overcome is
the lag before the standby database becomes the primary database. Client
connections should keep retrying the failover until the standby has been opened
as the new production database. This can be configured with an alias similar to
the following:
DGD_TAF =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = off)
      (FAILOVER = on)
      (ADDRESS = (PROTOCOL = TCP)(HOST = hasunclu1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = hasunclu2)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = DGD)
      (INSTANCE_NAME = DGD_P)
      (FAILOVER_MODE =
        (TYPE = session)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )
With this alias, TAF tries to fail over to the second node in the address_list.
If it cannot connect, it waits five seconds and retries, up to a total of 180
times. That is roughly 15 minutes of retrying (180 x 5 seconds), which should
give the DBA enough time to perform a switchover or activate the standby as the
new production database. This timing can be adjusted to suit your environment
and should be tested accordingly.
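When tuning RETRIES and DELAY, it can help to work out the retry window before testing. The Python sketch below is an approximation only: `retry_window_seconds` and `connect_with_retries` are hypothetical helpers that model "attempt, wait, attempt again", and they ignore the time each connection attempt itself takes, so the real window will be somewhat longer.

```python
import time


def retry_window_seconds(retries, delay_s):
    """Approximate time spent waiting between attempts, in seconds."""
    return retries * delay_s


def connect_with_retries(try_connect, retries=180, delay_s=5, sleep=time.sleep):
    """Call try_connect() up to `retries` times, pausing between failures.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. A simplified model, not the TAF implementation.
    """
    for attempt in range(1, retries + 1):
        if try_connect():
            return attempt
        if attempt < retries:
            sleep(delay_s)  # wait before the next attempt
    return None


print(retry_window_seconds(180, 5) / 60)  # 15.0 (minutes)
```

If a switchover in your environment typically takes longer than the computed window, raise RETRIES or DELAY accordingly and confirm the result with a test.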