利用Oracle CRS搭建应用的高可用集群

前言:CRS的简介和由来

从Oracle 10gR1 RAC 开始,Oracle推出了自身的集群软件,这个软件的名称叫做Oracle Cluster Ready Service(Oracle集群就绪服务),简称CRS。从Oracle 10gR2开始,包括最新的11g,Oracle将其更名为Clusterware(集群件),但通常意义上我们认为CRS = Clusterware = Oracle Cluster Ready Service = Oracle Cluster Software.

CRS一般用来搭建Oracle的并行数据库,即RAC,但除了与RAC的接口之外,CRS还提供了一组高可用性的应用程序接口(API),用来搭建一般应用程序的高可用集群,即一般我们常说的双机热备,比如使用CRS实现MySQL的双机热备。
这种主备模式的双机热备还可以包括许多第三方的应用程序,比如虚拟IP、磁盘组、文件系统、MySQL数据库、Apache,或者单节点的Oracle实例,或者单节点的ASM,等等,都可以作为资源注册到CRS中去,由CRS来启动,关闭,监测应用程序的状态,还可以设置应用程序相互的依赖关系,保证多组资源正确的启动顺序。

本文就以保护单节点oracle实例为例,演示如何使用CRS来实现上述功能。使用的主要的软件有:Solaris 10u4, Oracle CRS 10.2.0.2 , Oracle RDMBS 10.2.0.3, VxVM 5.0 ,磁盘阵列型号是AMS1000。

系统拓扑图大致如下:
利用Oracle CRS搭建应用的高可用集群

主要操作步骤如下:

一、准备工作:软件安装,数据库创建

1.安装Solaris 10, Veritas Volume Manager ,安装过程略
2.创建用户组dba/oinstall,oracle用户,并修改相应profile和/etc/hosts文件
修改rhosts文件,配置oracle用户的对等连接;
连接心跳网线;
如果生产环境中推荐心跳网络使用千兆,推荐每台机器有两块网卡分别连接两个网络交换机;
在操作系统中启动心跳网卡;
准备共享磁盘,这里使用的是SAN环境下的共享盘ams_wms0_0098,大小为20G;

3.下面使用静默方式来安装Oracle集群软件、数据库软件
这种安装创建方式的优点是创建速度比较快,并且不需要运行图形界面,适合远程安装和建库;
缺点是不直观,需要手工编写响应文件;
这一步CRS的安装只需要在一个节点上做:
oracle@rac01$. ./clusterware/runInstaller -silent -responsefile /tmp/shahand/crs.rsp
crs.rsp 文件内容参考“五-7”部分

CRS的runInstaller运行完毕以后,要手工在两个节点上运行root.sh,
并要手工运行$CRS_HOME/cfgtoollogs/configToolAllCommands

检查CRS安装配置正确:

 
 
<!-- Code highlighting produced by Actipro CodeHighlighter (freeware) http://www.CodeHighlighter.com/ --> root @rac01 #crs_stat - t Name Type Target State Host -- ---------------------------------------------------------- ora....c01.gsd application ONLINE ONLINE rac01 ora....c01.ons application ONLINE ONLINE rac01 ora....c01.vip application ONLINE ONLINE rac01 ora....c02.gsd application ONLINE ONLINE rac02 ora....c02.ons application ONLINE ONLINE rac02 ora....c02.vip application ONLINE ONLINE rac02

检查设置了正确的心跳网络:
root@rac01 # $CRS_HOME/bin/oifcfg getif
e1000g0 10.198.88.0 global public
e1000g1 192.168.2.0 global cluster_interconnect

使用静默方式安装Oracle 数据库软件,这一步两个节点都要做:
./runInstaller -silent -responsefile /tmp/shahand/db.rsp
db.rsp 文件内容参考“五-8”部分

4.创建oracle数据库文件所需要的盘组、逻辑卷、文件系统、挂载文件系统并设置权限;
只需要在一个节点上做;

 
 
<!-- Code highlighting produced by Actipro CodeHighlighter (freeware) http://www.CodeHighlighter.com/ --> root @rac01 # vxdisksetup - i ams_wms0_0098 root @rac01 # vxdg init oradata12 ams_wms0_0098 root @rac01 # vxassist - g oradata12 make oradata 18G root @rac01 # vxedit - g oradata12 set user = oracle group = dba mode = 644 oradata root @rac01 # timex mkfs - F vxfs / dev / vx / rdsk / oradata12 / oradata version 7 layout 37748736 sectors, 18874368 blocks of size 1024 , log size 65536 blocks largefiles supported real 10.30 user 0.07 sys 0.04 root @rac01 # mount - F vxfs - o largefiles / dev / vx / dsk / oradata12 / oradata / oradata root @rac01 # chown oracle:dba / oradata root @rac01 # df - h / oradata / Filesystem size used avail capacity Mounted on / dev / vx / dsk / oradata12 / oradata 18G 70M 17G 1 % / oradata

5.静默方式创建oracle数据库,只需要一个节点上做:

 
 
<!-- Code highlighting produced by Actipro CodeHighlighter (freeware) http://www.CodeHighlighter.com/ --> oracle @rac01 $ dbca - silent - createDatabase - sid orcl - sysPassword sys - systemPassword sys / - datafileDestination / oradata - gdbName orcl - templateName General_Purpose.dbc Copying database files 1 % complete 3 % complete 11 % complete 18 % complete 26 % complete 37 % complete Creating and starting Oracle instance 40 % complete 45 % complete 50 % complete 55 % complete 56 % complete 60 % complete 62 % complete Completing Database Creation 66 % complete 70 % complete 73 % complete 85 % complete 96 % complete 100 % complete

手工检查数据库orcl的状态,可以登陆数据库select status from v$instance查看。

二、Oracle 集群软件资源的手工注册

1. 注销crs本身自带的ons、gsd、vip资源

root@rac01 # crs_stop -all
Attempting to stop `ora.rac01.gsd` on member `rac01`
Attempting to stop `ora.rac01.ons` on member `rac01`
Attempting to stop `ora.rac02.gsd` on member `rac02`
Attempting to stop `ora.rac02.ons` on member `rac02`
Stop of `ora.rac02.gsd` on member `rac02` succeeded.
Stop of `ora.rac02.ons` on member `rac02` succeeded.
Stop of `ora.rac01.gsd` on member `rac01` succeeded.
Stop of `ora.rac01.ons` on member `rac01` succeeded.
Attempting to stop `ora.rac01.vip` on member `rac01`
Attempting to stop `ora.rac02.vip` on member `rac02`
Stop of `ora.rac02.vip` on member `rac02` succeeded.
Stop of `ora.rac01.vip` on member `rac01` succeeded.

root@rac01 # crs_unregister ora.rac01.gsd
root@rac01 # crs_unregister ora.rac01.ons
root@rac01 # crs_unregister ora.rac01.vip
root@rac01 # crs_unregister ora.rac02.vip
root@rac01 # crs_unregister ora.rac02.ons
root@rac01 # crs_unregister ora.rac02.gsd
root@rac01 # crs_stat -t
CRS-0202: No resources are registered.

2.创建虚拟IP资源:

root@rac01 # crs_profile -create havip -t application -a /oracle/crs/bin/usrvip /
-o oi=e1000g0,ov=10.198.94.139,on=255.255.248.0
root@rac01 # crs_register havip
root@rac01 # crs_setperm havip -o root
root@rac01 # crs_setperm havip -u user:oracle:r-x
root@rac01 # crs_stat -t -v
Name Type R/RA F/FT Target State Host
----------------------------------------------------------------------
ha_vip application 0/1 0/0 OFFLINE OFFLINE
root@rac01 # crs_start havip
root@rac01 # crs_stat -t -v
Name Type R/RA F/FT Target State Host
----------------------------------------------------------------------
havip application 0/1 0/0 ONLINE ONLINE rac01


3.准备控制其他资源启动、关闭、检查的脚本文件dg.sh/fs.sh/db.sh/lsnr.sh
这四个脚本文件内容参考“五-3/4/5/6”部分

对crs_profile命令中的选项和参数做简单说明:
(1) 选项-r定义了该资源所依赖的资源,在下面的例子中,资源oradata_mount启动时依赖于
disk_group先 启动,需要停止disk_group的时候必须先停止资源oradata_mount,
资源orcl_db的启动则同时依赖于oradata_mount/disk_group/havip/listener;
(2) 参数-o 包括:ci的意思是crs对资源状态的监测间隔(check interval),单位为秒;
ra : crs重启资源的尝试次数,RESTART_ATTEMPTS,次数到达以后将重新分配;
fi : 资源状态出现错误以后,crs的尝试间隔,FAILURE_INTERVAL,单位是秒;
ft : 资源状态出现错误以后,crs的尝试次数,FAILURE_THRESHOLD;
这些参数可以使用默认值,分别是60秒/1/0秒/0。
(3) 参数-a 是指ACTION_SCRIPT,参数值为资源启动、关闭、监测的脚本,脚本固定的三个参数为
start/stop/check;

管理数据库监听的部分:

修改$ORACLE_HOME/network/admin/listener.ora文件,
将其中(HOST = rac01 )部分修改成(HOST = 10.198.94.139 ) (虚拟IP地址)

crs_profile -create listener -t application -a /oracle/crs/crs/public/lsnr.sh -r havip -o /
ci=180,ra=6,ft=2,fi=12
crs_register listener
crs_setperm listener -o root
crs_setperm listener -u user:oracle:r-x
crs_start listener

管理磁盘组和逻辑卷的部分:

crs_profile -create disk_group -t application -a /oracle/crs/crs/public/dg.sh -r havip -o /
ci=180,ra=6,ft=2,fi=12
crs_register disk_group
crs_setperm disk_group -o root
crs_setperm disk_group -u user:oracle:r-x

注:本身磁盘组的启动并不依赖于虚拟IP的启动,这里之所以设置两者的依赖关系,
是为了防止虚拟IP在一个节点启动,而磁盘组在另外一个节点启动,造成资源不一致的情况出现。

管理文件系统的部分:

crs_profile -create oradata_mount -t application -a /oracle/crs/crs/public/fs.sh -r disk_group -o /
ci=180,ra=6,ft=2,fi=12
crs_register oradata_mount
crs_setperm oradata_mount -o root
crs_setperm oradata_mount -u user:oracle:r-x

管理数据库实例的部分:

crs_profile -create orcl_db -t application -a /oracle/crs/crs/public/db.sh -r /
"oradata_mount listener" -o ci=180,ra=6,ft=2,fi=12
crs_register orcl_db
crs_setperm orcl_db -o root
crs_setperm orcl_db -u user:oracle:r-x
crs_start orcl_db

4.确保脚本具有执行属性,并把public 和profile的内容拷到第二个节点上。
# chmod +x /oracle/crs/crs/public/*
# rcp -r -p /oracle/crs/crs/public/* rac02:/oracle/crs/crs/public/

5.启动所有的资源
下面可以看到,在crs启动和关闭资源的过程中,其顺序是按照前面定义的资源依赖关系进行的:

root@rac01 # crs_stop -all
Attempting to stop `orcl_db` on member `rac01`
Stop of `orcl_db` on member `rac01` succeeded.
Attempting to stop `oradata_mount` on member `rac01`
Stop of `oradata_mount` on member `rac01` succeeded.
Attempting to stop `disk_group` on member `rac01`
Stop of `disk_group` on member `rac01` succeeded.
Attempting to stop `listener` on member `rac01`
Stop of `listener` on member `rac01` succeeded.
Attempting to stop `havip` on member `rac01`
Stop of `havip` on member `rac01` succeeded.

root@rac01 # crs_start -all
Attempting to start `havip` on member `rac01`
Start of `havip` on member `rac01` succeeded.
Attempting to start `listener` on member `rac01`
Start of `listener` on member `rac01` succeeded.
Attempting to start `disk_group` on member `rac01`
Start of `disk_group` on member `rac01` succeeded.
Attempting to start `oradata_mount` on member `rac01`
Start of `oradata_mount` on member `rac01` succeeded.
Attempting to start `orcl_db` on member `rac01`
Start of `orcl_db` on member `rac01` succeeded.

检查资源状态是否正常:

oracle@rac01 $ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
disk_group application ONLINE ONLINE rac01
havip application ONLINE ONLINE rac01
listener application ONLINE ONLINE rac01
oradata_mount application ONLINE ONLINE rac01
orcl_db application ONLINE ONLINE rac01

三、Oracle 集群软件的资源的管理

1.如果需要修改资源的属性,使用crs_profile -update 选项;具体例子可以参见五-1的错误二;
2.如果资源的状态为UNKNOWN,要对该资源进行关闭,使用crs_stop的命令的时候需要加入-f参数;
3.使用crs_profile -print <resource_name> 来查看资源的属性情况,包括依赖关系等等,
同样也可以使用crs_stat -p <resource_name> 来实现;
4.关于CRS的日志:主要在$CRS_HOME/log/node_name目录下,但需要提醒的是,系统日志中也会有
比较重要的日志信息,比如Solaris下的/var/adm/messages、linux一般在/var/log/messages ,
HPUX则是/var/adm/syslog/syslog.log文件;
5.启动、关闭、和查看crs资源的命令分别为crs_start 和crs_stop 和crs_stat,
每个命令都可以使用-H参数得到相应语法;
也可以使用stvctl start nodeapps -n rac1命令;

四、对集群软件进行测试
1.手工切换节点:
在任意节点上依次使用下面的命令,以oracle或者root执行均可,只要配置正确的$PATH环境变量
crs_stop -all;
crs_start havip -c rac02;
crs_start listener -c rac02;
crs_start disk_group -c rac02;
crs_start oradata_mount -c rac02;
crs_start orcl_db -c rac02;
然后,登陆到rac02(现在为主节点)使用df -h /oradata 检查共享盘是否挂载,
使用ps -ef|grep ora_检查到oracle启动,检查后台报警日志中没有错误信息,

2.自动切换:

手工模拟主节点的故障情况:使用reboot命令
root@rac01 # reboot

Jan 8 14:53:57 rac01 reboot: [ID 662345 auth.crit] rebooted by root

从日志中看到备用节点rac02上的crs感应到了主节点的失败,并接管相关服务:

2008-01-08 14:30:33.929: [ CRSMAIN][1] Starting Threads
2008-01-08 14:30:33.929: [ CRSMAIN][1] CRS Daemon Started.
2008-01-08 14:52:18.777: [ CRSEVT][71] Processing member leave for rac01, incarnation: 2
2008-01-08 14:52:18.878: [ CRSEVT][71] Do failover for: rac01
2008-01-08 14:52:42.180: [ CRSRES][73] startRunnable: setting CLI values
2008-01-08 14:52:42.193: [ CRSRES][73] Attempting to start `disk_group` on member `rac02`
2008-01-08 14:52:45.722: [ CRSRES][73] Start of `disk_group` on member `rac02` succeeded.
2008-01-08 14:52:45.731: [ CRSRES][73] startRunnable: setting CLI values
2008-01-08 14:52:45.732: [ CRSRES][73] Attempting to start `oradata_mount` on member `rac02`
2008-01-08 14:52:45.986: [ CRSRES][73] Start of `oradata_mount` on member `rac02` succeeded.
2008-01-08 14:52:46.013: [ CRSRES][73] startRunnable: setting CLI values
2008-01-08 14:52:46.015: [ CRSRES][73] Attempting to start `orcl_db` on member `rac02`
2008-01-08 14:53:31.486: [ CRSRES][73] Start of `orcl_db` on member `rac02` succeeded.
2008-01-08 14:53:31.487: [ CRSEVT][71] Post recovery done evmd event for: rac01
2008-01-08 14:53:31.603: [ CRSEVT][75] Processing RecoveryDone

然后再登陆rac02,查看文件系统是否挂载,确认数据状态正常。

from:【IT168】

http://tech.it168.com/d/2008-01-31/200801311012828.shtml

你可能感兴趣的:(oracle)