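HA is one of RAC's main selling points, and Oracle often uses it in its marketing. Strictly speaking, however, RAC is not a complete HA solution. It is a shared-disk architecture in which every instance opens the same single database, so it protects against instance (node) failures but not against the loss of the shared storage. In other words, it only provides instance-level HA. The technical foundation of this instance-level HA is failover: the failure of any one node in the cluster should not affect users, because users who were connected to the failed node are moved automatically to a healthy node, and the switch is transparent to them.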
RAC failover can be divided into the following three types:
1) Client-Side Connect Time Failover
2) TAF (Transparent Application Failover)
3) Server-Side TAF
The sections below look at how these three types of failover differ.
I. Client-Side Connect Time Failover
1. Definition
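Client-Side Connect Time Failover works as follows: if the client's tnsnames.ora lists several addresses, a connection request first tries the first address; if that fails, it tries the second, and so on until a connection succeeds or every address has been tried. (If LOAD_BALANCE is also on, as in the test below, the addresses are tried in random order rather than in list order.)
This kind of failover only takes effect at the moment the connection is being established. Once the connection has succeeded, a node failure does not cause the connection to fail over to another available node: from the client's point of view the session is simply broken, and the application must reconnect.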
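To make the connect-time behaviour concrete, here is a small Python model of how a client walks its address list. It is a simplified sketch, not Oracle Net's implementation: `tcp_probe` only opens a plain TCP socket (no Oracle Net handshake), and `connect_time_failover` is a hypothetical helper. What it illustrates is that failover happens only inside the connect call; once it returns, nothing watches the connection any more.

```python
import random
import socket
from typing import Callable, List, Optional, Sequence, Tuple

Address = Tuple[str, int]  # (HOST, PORT) as in an ADDRESS entry


def tcp_probe(addr: Address, timeout: float = 3.0) -> socket.socket:
    """Open a plain TCP connection (no Oracle Net handshake); raises OSError on failure."""
    return socket.create_connection(addr, timeout=timeout)


def connect_time_failover(
    addresses: Sequence[Address],
    try_connect: Callable[[Address], object] = tcp_probe,
    failover: bool = True,
    load_balance: bool = False,
    rng: Optional[random.Random] = None,
) -> Tuple[Address, object]:
    """Simplified model of client-side connect-time failover.

    - load_balance=True: shuffle the address order first (like LOAD_BALANCE=yes).
    - failover=True: on error, move on to the next address (like FAILOVER=on).
    - failover=False: only one address is attempted.
    Returns (address, connection) for the first address that succeeds.
    """
    order: List[Address] = list(addresses)
    if load_balance:
        (rng or random.Random()).shuffle(order)
    if not failover:
        order = order[:1]
    errors = []
    for addr in order:
        try:
            return addr, try_connect(addr)
        except OSError as exc:  # DNS failure, connection refused, timeout...
            errors.append((addr, exc))
    raise ConnectionError(f"all addresses failed: {errors}")


if __name__ == "__main__":
    def fake_connect(addr: Address) -> str:
        # Stand-in for a real connection: the host named "wrong" never answers.
        if addr[0] == "wrong":
            raise OSError("host not found")
        return f"session via {addr[0]}"

    print(connect_time_failover([("wrong", 1521), ("drcdd0rb", 1521)], fake_connect))
```

Injecting `try_connect` keeps the model testable without a database; swapping in `tcp_probe` checks real listener reachability. Because the loop stops at the first success, a node failure after that point is invisible to this function, which is exactly the limitation of connect-time failover.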
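2. Configuration
Add FAILOVER=ON to the connect descriptor in the client's tnsnames.ora. FAILOVER already defaults to ON (for DESCRIPTION, DESCRIPTION_LIST and ADDRESS_LIST), so connect-time failover is enabled by default whenever several addresses are listed.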
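All the tnsnames.ora entries in this article are nested `(KEY = value)` lists, and a single missing or extra parenthesis is an easy mistake when editing them by hand. The following minimal Python parser is a sketch for a simplified grammar (no comments, no quoted values), not an Oracle tool; `parse_entry`, `find_all` and `addresses` are hypothetical helpers. It turns one entry into a tree, rejects unbalanced parentheses, and lists the addresses a client would try.

```python
from typing import List, Tuple, Union

# A node is (KEY, value); value is a bare word or a list of child nodes.
Node = Tuple[str, Union[str, List["Node"]]]
DELIMS = {"(", ")", "="}


def tokenize(text: str) -> List[str]:
    """Split into '(', ')', '=' and the bare words between them."""
    tokens: List[str] = []
    buf: List[str] = []
    for ch in text:
        if ch in DELIMS:
            word = "".join(buf).strip()
            if word:
                tokens.append(word)
            buf = []
            tokens.append(ch)
        else:
            buf.append(ch)
    word = "".join(buf).strip()
    if word:
        tokens.append(word)
    return tokens


def _expect(tokens: List[str], pos: int, tok: str) -> int:
    found = tokens[pos] if pos < len(tokens) else "end of input"
    if found != tok:
        raise ValueError(f"expected {tok!r} at token {pos}, found {found!r}")
    return pos + 1


def _parse_pair(tokens: List[str], pos: int) -> Tuple[Node, int]:
    if pos >= len(tokens) or tokens[pos] in DELIMS:
        raise ValueError(f"expected a keyword at token {pos}")
    key = tokens[pos].upper()                  # keywords are case-insensitive
    pos = _expect(tokens, pos + 1, "=")
    value, pos = _parse_value(tokens, pos)
    return (key, value), pos


def _parse_value(tokens: List[str], pos: int) -> Tuple[Union[str, List[Node]], int]:
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[pos]
    if tok in (")", "="):
        raise ValueError(f"unexpected {tok!r} at token {pos}")
    if tok != "(":
        return tok, pos + 1                    # bare word such as 'on' or '1521'
    children: List[Node] = []
    while pos < len(tokens) and tokens[pos] == "(":
        child, pos = _parse_pair(tokens, pos + 1)
        pos = _expect(tokens, pos, ")")        # each '(' needs its matching ')'
        children.append(child)
    return children, pos


def parse_entry(text: str) -> Node:
    """Parse one 'NAME = (DESCRIPTION = ...)' entry."""
    tokens = tokenize(text)
    node, pos = _parse_pair(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r} at token {pos} (unbalanced parentheses?)")
    return node


def find_all(node: Node, key: str) -> List[Node]:
    """Depth-first list of every node whose keyword is `key`."""
    name, value = node
    found = [node] if name == key.upper() else []
    if isinstance(value, list):
        for child in value:
            found.extend(find_all(child, key))
    return found


def addresses(entry: Node) -> List[Tuple[str, int]]:
    """(HOST, PORT) of every ADDRESS, in file order."""
    result = []
    for _, fields in find_all(entry, "ADDRESS"):
        kv = dict(fields)
        result.append((kv["HOST"], int(kv["PORT"])))
    return result
```

Feeding the first test entry of this article to `parse_entry` and then `addresses` yields the two (HOST, PORT) pairs in file order; an entry with one closing parenthesis too many or too few raises `ValueError` instead of being silently accepted.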
3. Test
Test environment: a two-node RAC (instances O01RCD0A and O01RCD0B)
1) Configure the client tnsnames.ora
Edit the client tnsnames.ora as follows:
O01RCD0 =
  (DESCRIPTION =
    (failover = on)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = wrong)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = drcdd0rb)(PORT = 1521))
      (LOAD_BALANCE = yes)
    )
    (CONNECT_DATA =
      (SERVICE_NAME = O01RCD0.world)
    )
  )
The first address is deliberately wrong (HOST = wrong), so only drcdd0rb can accept the connection.
2) Check that the database can be reached
The connection succeeds, and the session is on instance A:
A105024@O01RCD0>select instance_name from v$instance;
INSTANCE_NAME
----------------
O01RCD0A
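3) Kill the server process on the database server to simulate an instance failure
Find the OS process ID (SPID) of the session's server process: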
A105024@O01RCD0>select pid,spid from v$process where addr in (select paddr from v$session where username ='A105024');
PID SPID
---------- ------------
38 14408
Kill that process at the OS level:
kill -9 14408
After a moment, run the query again; the connection has been lost:
A105024@O01RCD0>select instance_name from v$instance;
select instance_name from v$instance
*
ERROR at line 1:
ORA-03114: not connected to ORACLE
II. TAF (Transparent Application Failover)
1. Definition
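As the analysis above shows, Client-Side Connect Time Failover is of limited value: once the connection has been established, it can no longer help. Oracle therefore introduced a further mechanism, TAF. With TAF, if the instance fails after the connection has been established, the connection is automatically migrated to another healthy instance. For the application this migration is transparent and automatic and needs no manual intervention, but the transparency is not absolute: any uncommitted transaction is rolled back.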
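The difference between a plain reconnect and TAF with TYPE=SELECT can be shown with a toy model: when the instance serving a query dies, the session moves to a surviving instance, uncommitted work is discarded, and the query is re-executed with the rows the client has already received skipped. Everything here (the `Instance` and `TafSession` classes, the `fail_after` knob) is a hypothetical illustration, not how the Oracle client actually implements TAF.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


class InstanceDown(Exception):
    """Raised when the instance serving the session disappears."""


@dataclass
class Instance:
    name: str
    rows: List[str]          # result set of the single query in this toy model
    alive: bool = True
    fail_after: int = -1     # hypothetical knob: die after returning this many rows (-1 = never)

    def execute(self) -> Iterator[str]:
        for i, row in enumerate(self.rows):
            if not self.alive or i == self.fail_after:
                self.alive = False
                raise InstanceDown(self.name)
            yield row


@dataclass
class TafSession:
    instances: List[Instance]
    failover_type: str = "SELECT"              # "SELECT" or "SESSION", as in FAILOVER_MODE TYPE
    current: int = 0
    pending_dml: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def dml(self, stmt: str) -> None:
        self.pending_dml.append(stmt)          # uncommitted work

    def _failover(self) -> None:
        self.pending_dml.clear()               # uncommitted work is rolled back
        for i, inst in enumerate(self.instances):
            if inst.alive and i != self.current:
                self.current = i
                self.log.append(f"failed over to {inst.name}")
                return
        raise RuntimeError("no surviving instance")

    def query(self) -> List[str]:
        fetched: List[str] = []
        while True:
            try:
                for i, row in enumerate(self.instances[self.current].execute()):
                    if i < len(fetched):
                        continue                # SELECT failover: skip rows already returned
                    fetched.append(row)
                return fetched
            except InstanceDown:
                self._failover()
                if self.failover_type != "SELECT":
                    # SESSION failover: reconnected, but the query must be reissued
                    raise RuntimeError("fetch interrupted by failover; re-execute the query")


if __name__ == "__main__":
    rows = [f"row{i}" for i in range(5)]
    s = TafSession([Instance("O01RCD0A", rows, fail_after=2), Instance("O01RCD0B", rows)])
    s.dml("update t set c = 1")
    print(s.query(), s.log, s.pending_dml)
```

With `failover_type="SELECT"` the caller receives all five rows even though instance A dies after two; with `"SESSION"` the session is still moved to B, but the interrupted query raises and has to be run again. In both cases the pending DML is gone, which is the "not absolutely transparent" part.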
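2. Configuration
Add a FAILOVER_MODE parameter (inside CONNECT_DATA) to the client's tnsnames.ora entry. It has the following four sub-parameters:
1) METHOD: BASIC or PRECONNECT. BASIC connects to a backup instance only when failover happens; PRECONNECT opens the backup connection in advance and requires a BACKUP entry.
2) TYPE: SESSION or SELECT (or NONE to disable TAF). SESSION only re-establishes the session; SELECT also lets queries that were open at failover time continue fetching.
3) DELAY: the number of seconds to wait between reconnection attempts.
4) RETRIES: the number of reconnection attempts before giving up.
For the full semantics of these parameters, see the Oracle Net Services Reference.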
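RETRIES and DELAY together bound how long a client keeps trying to reconnect during failover. The sketch below models them as one initial attempt plus up to RETRIES retries, sleeping DELAY seconds before each retry; that exact counting is an assumption of the model, and `reconnect_with_retries` is a hypothetical helper rather than an Oracle API. With the values used in the test below (retries = 10, delay = 5), the model waits at most 50 seconds before giving up.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def reconnect_with_retries(
    connect: Callable[[], T],
    retries: int,
    delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Model of FAILOVER_MODE RETRIES/DELAY: one attempt, then up to `retries`
    more, sleeping `delay` seconds before each retry (counting is an assumption)."""
    attempts = 0
    while True:
        try:
            return connect()
        except OSError as exc:
            if attempts >= retries:
                raise ConnectionError(f"gave up after {retries} retries") from exc
            attempts += 1
            sleep(delay)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_connect() -> str:
        # Hypothetical backup instance that only answers on the third attempt.
        state["calls"] += 1
        if state["calls"] < 3:
            raise OSError("listener not reachable yet")
        return "connected to O01RCD0B"

    print(reconnect_with_retries(flaky_connect, retries=10, delay=0.1))
```

Passing `sleep` in as a parameter keeps the model fast to test. With failover_retries=>180 and failover_delay=>5, as used for the server-side service later in this article, the same model's worst case is 180 × 5 = 900 seconds.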
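3. Test
Test environment: a two-node RAC (instances O01RCD0A and O01RCD0B)
1) Configure the client tnsnames.ora
Edit the client tnsnames.ora as follows. Besides FAILOVER_MODE, the entry sets ENABLE = BROKEN, which turns on TCP keepalive so that a dead connection is detected sooner: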
O01RCD0 =
  (DESCRIPTION =
    (failover = on)
    (enable = broken)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = drcdd0ra)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = drcdd0rb)(PORT = 1521))
      (LOAD_BALANCE = yes)
    )
    (CONNECT_DATA =
      (SERVICE_NAME = O01RCD0.world)
      (failover_mode =
        (type = select)
        (method = basic)
        (retries = 10)
        (delay = 5)
      )
    )
  )
2) Connect to the database and check which instance the session is on:
A105024@O01RCD0>select INSTANCE_NAME,status from v$instance;
INSTANCE_NAME STATUS
---------------- ------------
O01RCD0A OPEN
The user is connected to instance A. Check the session's TAF settings:
A105024@O01RCD0>select username,failover_type,failover_method from v$session where username='A105024';
USERNAME FAILOVER_TYPE FAILOVER_M
------------------------------ ------------- ----------
A105024 SELECT BASIC
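3) Kill the corresponding server process on the database server to simulate an instance failure
Find the OS process ID of the user's server process: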
A105024@O01RCD0>select PID,SPID from v$process where ADDR in (select PADDR from v$session where username='A105024');
PID SPID
---------- ------------
49 14953
Kill that process at the OS level:
kill -9 14953
4) Wait a few seconds and run the statement again; the session has automatically failed over to node B:
A105024@O01RCD0>select INSTANCE_NAME,status from v$instance;
INSTANCE_NAME STATUS
---------------- ------------
O01RCD0B OPEN
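III. Server-Side TAF
1. Definition
The TAF described above is client-side: it lives in each client's tnsnames.ora file, which becomes hard to maintain and error-prone when there are many clients. Server-side TAF, described here, is built on services: all TAF settings are stored in the data dictionary as attributes of the service, so clients no longer need any TAF configuration.
2. Service
1) Create a new service, with O01RCD0A as the preferred instance (-r), O01RCD0B as the available instance (-a) and the TAF policy set to BASIC (-P):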
srvctl add service -d O01RCD0 -s taf_test -r O01RCD0A -a O01RCD0B -P basic
2) Check that the service was created:
srvctl config service -d O01RCD0 -s taf_test -a
3) Start the service:
srvctl start service -d O01RCD0 -s taf_test
4) Set the service's TAF attributes in the database:
begin
dbms_service.modify_service(
service_name=>'taf_test',
failover_method=>dbms_service.failover_method_basic,
failover_type=>dbms_service.failover_type_select,
failover_retries=>180,
failover_delay=>5
);
end;
/
PL/SQL procedure successfully completed.
5) Confirm that the new service is recorded in the data dictionary:
A105024@O01RCD0>select NAME,FAILOVER_METHOD,FAILOVER_TYPE,FAILOVER_DELAY from dba_services where NAME='taf_test';
NAME FAILOVER_METHOD FAILOVER_TYPE FAILOVER_DELAY
-------------------- --------------- --------------- --------------
taf_test BASIC SELECT 5
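The three srvctl commands used above differ only in their arguments, so they can be generated from one description of the service. This Python sketch builds the argument lists (`-r` preferred instances, `-a` available instances, `-P` TAF policy, as used above) and only executes them when `dry_run=False`. The helper names are hypothetical, and the sketch assumes the 10g/11g single-letter srvctl syntax shown in this article; newer releases use long option names instead.

```python
import shlex
import subprocess
from typing import List, Sequence

# TAF policies accepted by `srvctl add service -P` (10g/11g syntax).
TAF_POLICIES = {"NONE", "BASIC", "PRECONNECT"}


def add_service_cmd(db: str, service: str, preferred: Sequence[str],
                    available: Sequence[str] = (), taf_policy: str = "basic") -> List[str]:
    if taf_policy.upper() not in TAF_POLICIES:
        raise ValueError(f"unsupported TAF policy: {taf_policy}")
    cmd = ["srvctl", "add", "service", "-d", db, "-s", service,
           "-r", ",".join(preferred)]          # preferred instances
    if available:
        cmd += ["-a", ",".join(available)]      # instances used if a preferred one fails
    return cmd + ["-P", taf_policy]


def config_service_cmd(db: str, service: str) -> List[str]:
    return ["srvctl", "config", "service", "-d", db, "-s", service, "-a"]


def start_service_cmd(db: str, service: str) -> List[str]:
    return ["srvctl", "start", "service", "-d", db, "-s", service]


def run(cmd: List[str], dry_run: bool = True) -> None:
    print(shlex.join(cmd))                     # show the exact command line
    if not dry_run:
        subprocess.run(cmd, check=True)        # raises CalledProcessError on failure


if __name__ == "__main__":
    for cmd in (add_service_cmd("O01RCD0", "taf_test", ["O01RCD0A"], ["O01RCD0B"]),
                config_service_cmd("O01RCD0", "taf_test"),
                start_service_cmd("O01RCD0", "taf_test")):
        run(cmd)  # dry run: only prints the commands
```

Keeping the default as a dry run means the script can be reviewed before it touches the cluster; validating the TAF policy locally catches a typo such as `-P select` before srvctl ever sees it.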
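3. Listener configuration
A new listener is created for this service. (In a standard RAC setup the service is also registered dynamically with the existing listeners by PMON, so a dedicated listener is usually optional; note too that SID_NAME in a static SID_DESC normally holds an instance SID rather than a service name.)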
TNS_TAF_TEST =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = drcdd0ra)(PORT = 1521))
    )
  )
SID_LIST_TNS_TAF_TEST =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = taf_test)
      (ORACLE_HOME = /usr/local/oracle/10.2.0-64)
    )
  )
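4. Client configuration
The client configuration is simple: the entry just has to connect through the service. No FAILOVER_MODE is needed, because the failover settings now come from the service.
Edit tnsnames.ora as follows: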
O01RCD0 =
  (DESCRIPTION =
    (failover = on)
    (enable = broken)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = drcdd0ra)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = drcdd0rb)(PORT = 1521))
      (LOAD_BALANCE = yes)
    )
    (CONNECT_DATA =
      (SERVICE_NAME = taf_test)
    )
  )
5. Test
Test environment: a two-node RAC (instances O01RCD0A and O01RCD0B)
1) Connect to the database and check which instance the session is on:
A105024@O01RCD0>select INSTANCE_NAME,status from v$instance;
INSTANCE_NAME STATUS
---------------- ------------
O01RCD0A OPEN
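2) Kill the corresponding server process on the database server to simulate an instance failure
Find the OS process ID of the user's server process: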
A105024@O01RCD0>select PID,SPID from v$process where ADDR in (select PADDR from v$session where username='A105024');
PID SPID
---------- ------------
43 28563
Kill that process at the OS level:
kill -9 28563
3) Wait a few seconds and run the statement again; the session has automatically failed over to node B:
A105024@O01RCD0>select INSTANCE_NAME,status from v$instance;
INSTANCE_NAME STATUS
---------------- ------------
O01RCD0B OPEN