Oracle RAC FailOver配置
Oracle RAC主要为数据库的应用提供了HA(High Available)的环境,HA体现在负载均衡(loadbalance)和容错(failover)两个方面。
Oracle RAC 的Failover 可以分为2种:
1. TAF(Client_side)
2. Service-Side TAF
一. TAF(Transparent Application Failover)
参考:
10g & 11g Configuration of TAF(Transparent Application Failover) and Load Balancing [ID 453293.1]
所谓TAF,就是连接建立以后,应用系统运行过程中,如果某个实例发生故障,连接到这个实例上的用户会被自动迁移到其他的健康实例上。对于应用程序而言,这个迁移过程是透明的,不需要用户的介入,当然,这种透明要是有引导的,因为用户的未提交事务会回滚。 相对与Client-Side Connect Time Failover的用户程序中断,抛出连接错误,用户必须重启应用程序,TAF 这种方式在提高HA上有了很大的进步。
TAF 的配置也很简单,只需要在客户端的tnsnames.ora中添加FAILOVER_MODE配置项。这个条目有4个子项目需要定义。
1. METHOD: 用户定义何时创建到其实例的连接,有BASIC 和 PRECONNECT 两种可选值。
BASIC: 是指在感知到节点故障时才创建到其他实例的连接。(时间换资源)
PRECONNECT: 是在最初建立连接时就同时建立到所有实例的连接,当发生故障时,立刻就可以切换到其他链路上。(资源换时间)
两种方法比较: BASIC方式在Failover时会有时间延迟,PRECONNECT方式虽然没有时间延迟,但是建立多个冗余连接会消耗更多资源,两者就是是用时间换资源和用资源换时间的区别。
2. TYPE: 用于定义发生故障时对完成的SQL 语句如何处理,其中有2种类型:session 和select.
这2种方式对于未提交的事务都会自动回滚,区别在于对select 语句的处理,对于select,用户正在执行的select语句会被转移到新的实例上,在新的节点上继续返回后续结果集,而已经返回的记录集则抛弃。
假设用户正在节点1上执行查询,整个结果集共有1000条记录,现在已从节点1上返回100条记录,这时节点1宕机,用户连接被转移到节点2上,如果是session模式,则需要重新执行查询语句;如果是select方式,会从节点2上继续返回剩下的900条记录,而已经从节点1返回的100条记录不会重复返回给用户,对于用户而言,感受不到这种切换。
显然为了实现select 方式,Oracle 必须为每个session保存更多的内容,包括游标,用户上下文等,需要更多的资源也是用资源换时间的方案。
3. DELAY 和 RETRIES: 这2个参数分别代表重试间隔时间和重试次数
案例1:
tnsnames.ora: (METHOD = basic)
PROD = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.8.13)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.8.14)(PORT = 1521)) (LOAD_BALANCE = yes) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = prod) (FAILOVER_MODE = (TYPE = session) (METHOD = basic) (RETRIES = 180) (DELAY = 5) ) ) )
客户端连接测试:
将Instance 1 shutdown
[oracle@node1 ~]$ srvctl stop instance -d prod -i prod1
验证客户端failover:
执行查询出现错误,再次执行查询成功
现在session切换到instance 2.
案例2:
tnsnames.ora: (METHOD = preconnect)
PROD_RAC1 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.8.13)(PORT = 1521)) (LOAD_BALANCE = yes) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = prod) (FAILOVER_MODE = (TYPE = select) (METHOD = preconnect) (BACKUP = PROD_RAC2) ) ) ) PROD_RAC2 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.8.14)(PORT = 1521)) (LOAD_BALANCE = yes) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = prod) (FAILOVER_MODE = (TYPE = select) (METHOD = preconnect) (BACKUP = PROD_RAC1) ) ) )
客户端连接测试:
将Instance 1shutdown
[oracle@node1 ~]$ srvctl stop instance -d prod -i prod1
验证客户端failover:
在客户端切换到新的节点时,实现了无缝切换!
二、Service-Side TAF
Service-Side TAF的配置参考:
How To Configure Server Side Transparent Application Failover [ID 460982.1]
Service-SideTAF 可以看作是TAF的一种变种,首先Service-SideTAF也是TAF,所有TAF的特点它都有,其次这种TAF是在服务器上配置的,而不像TAF是在客户端配置的。
Client-Side TAF 是在客户端修改tnsnames.ora 文件来配置的,如果有很多客户端使用这个数据库,那么每次微笑调整都需要把所有的计算机更改一遍,既低效又容易出错。而Service-Side TAF 通过结合Service,在数据库里保存FAIL_MODE的配置,把所有的TAF配置保存在数据字典中,从而省去了客户端的配置工作,现在客户端的TNS文件就不需要任何TAF的配置选项了。
从配置参数而言,Service-Side TAF和TAF 相比多了一个Instance Role(实例角色)的概念。 所谓的实例角色,就是当有多个Instance 参与一个Service时,可以配置优先使用哪一个Instance为用户提供服务。用户共有两种可选角色。
PREFERRED:首选实例,会优先选择拥有这个角色的实例提供服务。
AVAILABLE: 后备实例,用户连接会优先连接PREFFERRED的Instance,当PREFERRED的Instance不可用时,才会被转到AVAILBALE的Instance上。
要使用Server-Side TAF必须配置Service。 Service 可以在创建数据库时创建,也可以在创建数据库之后修改,既可以使用dbca 配置向导,也可以用命令行的 方式配置。
用srvctl 命令配置Service:
[oracle@node1 ~]$ srvctl add service -h Adds a service configuration to the Oracle Clusterware. Usage: srvctl add service -d <db_unique_name> -s <service_name> {-r "<preferred_list>" [-a "<available_list>"] [-P {BASIC | NONE | PRECONNECT}] | -g <server_pool> [-c {UNIFORM | SINGLETON}] } [-k <net_num>] [-l [PRIMARY][,PHYSICAL_STANDBY][,LOGICAL_STANDBY][,SNAPSHOT_STANDBY]] [-y {AUTOMATIC | MANUAL}] [-q {TRUE|FALSE}] [-x {TRUE|FALSE}] [-j {SHORT|LONG}] [-B {NONE|SERVICE_TIME|THROUGHPUT}] [-e {NONE|SESSION|SELECT}] [-m {NONE|BASIC}] [-z <failover_retries>] [-w <failover_delay>] -d <db_unique_name> Unique name for the database -s <service> Service name -r "<preferred_list>" Comma separated list of preferred instances -a "<available_list>" Comma separated list of available instances -g <pool_name> Server pool name -c {UNIFORM | SINGLETON} Service runs on every active server in the server pool hosting this service (UNIFORM) or just one server (SINGLETON) -k <net_num> VIP network number (default number is 1) -P {NONE | BASIC | PRECONNECT} TAF policy specification -l <role> Role of the service (primary, physical_standby, logical_standby, snapshot_standby) -y <policy> Management policy for the service (AUTOMATIC or MANUAL) -e <Failover type> Failover type (NONE, SESSION, or SELECT) -m <Failover method> Failover method (NONE or BASIC) -w <integer> Failover delay -z <integer> Failover retries -j <clb_goal> Connection Load Balancing Goal (SHORT or LONG). Default is LONG. -B <Runtime Load Balancing Goal> Runtime Load Balancing Goal (SERVICE_TIME, THROUGHPUT, or NONE) -x <Distributed Transaction Processing> Distributed Transaction Processing (TRUE or FALSE) -q <AQ HA notifications> AQ HA notifications (TRUE or FALSE) Usage: srvctl add service -d <db_unique_name> -s <service_name> -u {-r "<new_pref_inst>" | -a "<new_avail_inst>"} -d <db_unique_name> Unique name for the database -s <service> Service name -u Add a new instance to service configuration -r <new_pref_inst> Name of new preferred instance -a <new_avail_inst> Name of new available instance -h Print usage [oracle@node1 ~]$
案例3:
[oracle@node1 ~]$ srvctl add service -d prod -s TAF -r "prod1,prod2" -P BASIC
[oracle@node1 ~]$ srvctl start service -d prod -s TAF
[oracle@node1 ~]$ srvctl config service -d prod -a Service name: TAF Service is enabled Server pool: prod_TAF Cardinality: 2 Disconnect: false Service role: PRIMARY Management policy: AUTOMATIC DTP transaction: false AQ HA notifications: false Failover type: NONE Failover method: NONE TAF failover retries: 0 TAF failover delay: 0 Connection Load Balancing Goal: LONG Runtime Load Balancing Goal: NONE TAF policy specification: BASIC Preferred instances: prod1,prod2 Available instances:
修改service需要使用dbms_service:
16:39:31 SYS@ prod1>begin 16:39:34 2 dbms_service.modify_service( 16:39:57 3 service_name=>'TAF', 16:40:07 4 failover_method=>dbms_service.failover_method_basic, 16:40:45 5 failover_type=>dbms_service.failover_type_select, 16:41:07 6 failover_retries=>180, 16:41:26 7 failover_delay=>5); 16:41:40 8 end; 16:41:45 9 / PL/SQL procedure successfully completed. Elapsed: 00:00:00.06 16:41:46 SYS@ prod1> 16:43:20 SYS@ prod1>COL NAME FOR A10 16:43:28 SYS@ prod1>COL FAILOVER_METHOD FOR A20 16:43:47 SYS@ prod1>col FAILOVER_TYPE for a20 16:43:54 SYS@ prod1>r 1 select name,failover_method,failover_type,goal,clb_goal from dba_services 2* where name='TAF' NAME FAILOVER_METHOD FAILOVER_TYPE GOAL CLB_G ---------- -------------------- -------------------- ------------ ----- TAF BASIC SELECT NONE LONG Elapsed: 00:00:00.00 16:36:08 SYS@ prod1>show parameter name NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ db_file_name_convert string db_name string prod db_unique_name string prod global_names boolean FALSE instance_name string prod1 lock_name_space string log_file_name_convert string service_names string TAF
客户端验证:
tnsnames.ora:
TAF = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.8.13)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.8.14)(PORT = 1521)) ) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = taf) ) )
17:24:02 SYS@ prod2>select inst_id,sid,username from gv$session where username='SCOTT';
INST_ID SID USERNAME
---------- ---------- ------------------------------
1 33 SCOTT
将Instance 1 shutdown
[oracle@node1 ~]$ srvctl stop instance -d prod -i prod1
验证客户端failover:
17:25:01 SYS@ prod2>select inst_id,sid,username from gv$session where username='SCOTT';
INST_ID SID USERNAME
---------- ---------- ------------------------------
2 59 SCOTT
2 63 SCOTT
scott用户在instance 1被shutdown后切换到了instance 2,没有出现错误!
@至此,关于Oracle RAC FailOver的介绍就到此,欢迎批评指正!