一. TAF 分类
之前写了篇TAF的说明,参考:
Oracle Transparent Application Failover(TAF) 说明
http://blog.csdn.net/xujinyang/article/details/6831349
网友 lwd118 对DG 测试这块提了一些建议。看的也不是很清楚,所以去metalink上搜了一下。
Data Guard 9i Configuring Transparent Application Failover in a Data Guard Environment [ID 205637.1]
http://blog.csdn.net/xujinyang/article/details/6925073
在这篇文章里,把TAF分2种: connect time failover and application failover.
(1)Connect Time Failover
Connect time failover will reroute incoming connections to the instance that has just become primary. This type of failover should work in cases where the old primary node is down, old primary network is down, old primary listener is down, or old primary instance is now the standby.
When the old primary network is down, failover functionality is built into the basic layer of Oracle Net. We simply tcp timeout and fail to the next host in the list. Changing the tcp timeout parameters will determine the speed at which failover occurs. However, the basic configuration of connect time failover is not sufficient for the remaining failure scenarios.
Consider the following service name:
DGD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = hasunclu1)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = hasunclu2)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = DGD)
)
)
Using the above alias, the failover (graceful or forced will work correctly if the old primary instance/listener is down. However it does not work correctly for the switchover scenario. With switchover, the old primary is now the standby and the old standby is now the primary. When we issue the connection to the old primary node -- now running as a mounted standby -- we receive the following error:
ORA-01033: ORACLE initialization or shutdown in progress
-- 这种配置不适用于switchover,因为switchover之后,主库变成了mounted standby。 但是我们的连接还是会连接主库,这样进行连接时就会报如上错误。
This is expected behavior. Connect time failover is not programmed to failover on this error.
这种Failover下switchover的切换问题的解决方法:
We can solve this by setting following parameters in the init.ora files:
Primary init.ora: instance_name=DGD_P
Standby init.ora: instance_name=DGD_S
After a switchover, as the standby and primary databases are brought up, PMON will register the service_names AND the instance_name. We also must change the TNS service name to look for these values:
DGD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = hasunclu1)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = hasunclu2)(PORT = 1521))
)
(CONNECT_DATA =
(SID=DGD_P)
(SERVICE_NAME = DGD)
)
)
Now when we connect to the old primary/new standby, we get the following error:
'ORA-12521 TNS:Listener could not resolve INSTANCE_NAME given in connect descriptor'
At this point the connection failsover to the second host in the address list. This final connection attempt succeeds as the proper instance_name (DGD_P) is present.
Note that the DBA must maintain two init.ora's to maintain the seperate instance_name values or alter parameter with the alter system command once the instance has opened.
这种解决方法就是通过指定不同的Instance_name。 主备库的实例名不一样, 当主备库发生切换之后,在客户端的连接里也修改相应的参数。 所以还是需要人工的参与。 所以switchover 的TAF 意义不并不很大。
还有,在这块,metalink上的是instance_name=sid. 其实这块应该是SID=instance_name。 在tnsnames.ora 文件中,只有SID 和service_name 这2个参数。 我用的是10g。 没有9i的环境进行测试。
个人的理解,既然是需要人工参与,直接修改service_name这个参数就可以了。 关键的是要理解这种思想,就是主备库的实例名不一样。 做DG 切换之后,通过修改监听的参数,来完成连接。 至于service_name还是SID,都可以。 只要保证修改之后能正常连接就可以了。
(2)Application Failover:
For application failover, all existing connections from the current primary must failover to the new primary. One of the biggest obstacles to overcome is the lag time from when the standby databse becomes the primary database.
-- 这种切换将存在的所有连接都转移到新的主库。 但是最大的问题是备库切换到主库的时间上。 解决方法就是通过设置参数多连几次。
Client connections should continue to retry the failover until the standby has been opened as the new production. This can be configured by having an alias similar to the following:
DGD_TAF=
(DESCRIPTION=
(address_list=
(load_balance=off)
(failover=on)
(ADDRESS=(PROTOCOL=TCP)(Host=hasunclu1)(Port=1521))
(ADDRESS=(PROTOCOL=TCP)(Host=hasunclu2)(Port=1521))
)
(CONNECT_DATA=
(SERVICE_NAME=DGD)
(FAILOVER_MODE=
(TYPE=session)
(METHOD=BASIC)
(RETRIES=180)
(DELAY =5)))
)
With this alias, TAF will try to failover to the second node in the address_list. If it cannot connect, it will wait five seconds and retry again. It will retry a total of 180 times. This delay will provide the DBA with enough time to perform a switchover or activate the standby as the new production. This timing can be adjusted to suit your environment and should be tested accordingly.
网友lwd118 还提到了LOAD_BALANCE 这个参数,在下面的文章中提到了这个参数:
Failover Connections for Data Guard Error with ORA-1033 [ID 461874.1]
http://blog.csdn.net/xujinyang/article/details/6925102
Error ORA-1033 is expected for connections when they attempt to connect to standby instance, because it is mounted and not open.
When LOAD_BALANCE is set, connections can load balance between the addresses in the net service name. When there are two description sections in a net service name, Oracle Net will load balance between them.
在上面配置Failover参数是是把LOAD_BALANCE这个参数off了。 这个参数默认是ON的。 所以必须显示的off。
设置OFF的作用,就是避免自动的负载均衡。 因为在监听的配置中有2个地址,原主库是mount standby的,原备库是open的。 这样,如果连主库,是没有问题的,如果连备库就会报错:
ORA-01033: ORACLE initialization or shutdown in progress.
我在TAF文章里的测试的过程很简单,就是把备库直接shutdown了。我的配置是参考Application Failover,不过没有禁用Load_balance, 也没有进行open的测试。所以网友lwd118提了一些建议。
三. 测试Application Failover 下的Switchover
从上面的分析来看,Application Failover更适用一点,下面就这种情况下做一个Switchover的测试。
3.1 先看一下Data Gard的情况:
主库:
SQL> select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
124
SQL> select open_mode from v$database;
OPEN_MODE
----------
READ WRITE
备库:
SQL> select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
124
SQL> select open_mode from v$database;
OPEN_MODE
----------
MOUNTED
3.2 客户端监听配置:
DG_SF =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.3.3)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.3.2)(PORT = 1521))
(LOAD_BALANCE = OFF)
(FAILOVER=on)
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = orcl)
(FAILOVER_MODE =
(TYPE = SELECT)
(METHOD = BASIC)
(RETRIES=180)
(DELAY =5)
)
)
)
3.3 在客户端连接DG
C:/Users/Administrator.DavidDai>sqlplus /nolog
SQL*Plus: Release 11.2.0.1.0 Production on 星期五 12月 17 14:35:55 2010
Copyright (c) 1982, 2010, Oracle. All rights reserved.
SQL> conn sys/oracle@dg_sf as sysdba;
已连接。
SQL> select db_unique_name,open_mode from v$database;
DB_UNIQUE_NAME OPEN_MODE
------------------------------ ----------
orcl_st READ WRITE
3.4 Switchover 切换
主库:
SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;
SWITCHOVER_STATUS
--------------------
SESSIONS ACTIVE
SQL> Alter database commit to switchover to physical standby with session shutdown;
Database altered.
SQL> shutdown immediate
ORA-01507: database not mounted
ORACLE instance shut down.
SQL>
备库:
SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;
SWITCHOVER_STATUS
--------------------
TO PRIMARY
SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
Database altered.
SQL> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL>
启动主备库,注意顺序:先备后主。
3.5 在之前的session里继续查询
SQL> select db_unique_name,open_mode from v$database;
DB_UNIQUE_NAME OPEN_MODE
------------------------------ ----------
orcl_st MOUNTED
3.6 新开窗口连接
C:/Users/Administrator.DavidDai>sqlplus /nolog
SQL*Plus: Release 11.2.0.1.0 Production on 星期日 12月 19 19:05:21 2010
Copyright (c) 1982, 2010, Oracle. All rights reserved.
SQL> conn sys/oracle@dg_sf as sysdba;
已连接。
SQL> select db_unique_name,open_mode from v$database;
DB_UNIQUE_NAME OPEN_MODE
------------------------------ ----------
orcl_st MOUNTED
把原主库的监听停掉:
[oracle@dg2 bdump]$ lsnrctl stop
LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 18-DEC-2010 03:21:58
Copyright (c) 1991, 2005, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=dg2)(PORT=1521)))
The command completed successfully
注意:这里停数据库是不行的,必须要停监听。
在查:
C:/Users/Administrator.DavidDai>sqlplus /nolog
SQL*Plus: Release 11.2.0.1.0 Production on 星期日 12月 19 19:02:57 2010
Copyright (c) 1982, 2010, Oracle. All rights reserved.
SQL> conn sys/oracle@dg_sf as sysdba;
已连接。
SQL> select db_unique_name,open_mode from v$database;
DB_UNIQUE_NAME OPEN_MODE
------------------------------ ----------
orcl_pd READ WRITE
SQL>
把主库的监听停掉之后,客户端就切换到备库去了。 这里就像lwd118 网友说的那样, 对于switchover,有问题。 这个问题的解决方法在 Connect Time Failover中已经做了说明,就是在主备库里分别指定instance_name。 然后在客户端的监听里也修改相关的参数。
但是,如果主备库的instance_name 不同的话,每次切换的话,都需要修改客户端的配置。也不方便。 由此可见,TAF 对Failover的意义更大。
------------------------------------------------------------------------------