Building a High-Availability Application Cluster with Oracle CRS (Part 2)

II. Registering Oracle Clusterware Resources Manually

1. Unregister the ons, gsd, and vip resources that ship with CRS

root@rac01 # crs_stop -all
Attempting to stop `ora.rac01.gsd` on member `rac01`
Attempting to stop `ora.rac01.ons` on member `rac01`
Attempting to stop `ora.rac02.gsd` on member `rac02`
Attempting to stop `ora.rac02.ons` on member `rac02`
Stop of `ora.rac02.gsd` on member `rac02` succeeded.
Stop of `ora.rac02.ons` on member `rac02` succeeded.
Stop of `ora.rac01.gsd` on member `rac01` succeeded.
Stop of `ora.rac01.ons` on member `rac01` succeeded.
Attempting to stop `ora.rac01.vip` on member `rac01`
Attempting to stop `ora.rac02.vip` on member `rac02`
Stop of `ora.rac02.vip` on member `rac02` succeeded.
Stop of `ora.rac01.vip` on member `rac01` succeeded.

root@rac01 # crs_unregister ora.rac01.gsd
root@rac01 # crs_unregister ora.rac01.ons
root@rac01 # crs_unregister ora.rac01.vip
root@rac01 # crs_unregister ora.rac02.vip
root@rac01 # crs_unregister ora.rac02.ons
root@rac01 # crs_unregister ora.rac02.gsd
root@rac01 # crs_stat -t
CRS-0202: No resources are registered.

2. Create the virtual IP resource:

root@rac01 # crs_profile -create havip -t application -a /oracle/crs/bin/usrvip \
-o oi=e1000g0,ov=10.198.94.139,on=255.255.248.0
root@rac01 # crs_register havip
root@rac01 # crs_setperm havip -o root
root@rac01 # crs_setperm havip -u user:oracle:r-x
root@rac01 # crs_stat -t -v
Name Type R/RA F/FT Target State Host
----------------------------------------------------------------------
havip application 0/1 0/0 OFFLINE OFFLINE
root@rac01 # crs_start havip
root@rac01 # crs_stat -t -v
Name Type R/RA F/FT Target State Host
----------------------------------------------------------------------
havip application 0/1 0/0 ONLINE ONLINE rac01


3. Prepare the scripts that start, stop, and check the other resources: dg.sh, fs.sh, db.sh, and lsnr.sh
For the contents of these four scripts, see Section V-3/4/5/6.

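A brief explanation of the options and parameters of the crs_profile command:
(1) The -r option lists the resources that this resource requires. In the examples below,
oradata_mount needs disk_group to be started first, and oradata_mount must be stopped
before disk_group can be stopped. orcl_db depends, directly or through its required
resources, on oradata_mount, disk_group, havip, and listener.
(2) The -o option takes the following parameters:
ci : CHECK_INTERVAL, the interval at which CRS checks the state of the resource, in seconds;
ra : RESTART_ATTEMPTS, the number of times CRS tries to restart the resource on the same node;
once this count is reached, the resource is relocated to another node;
fi : FAILURE_INTERVAL, the time window, in seconds, over which CRS counts failures of the resource;
ft : FAILURE_THRESHOLD, the number of failures within that window after which CRS
stops monitoring the resource.
These parameters may be left at their defaults, which are 60 seconds, 1, 0 seconds, and 0 respectively.
(3) The -a option sets ACTION_SCRIPT, the script that starts, stops, and checks the resource.
The script is always called with one of three fixed arguments: start, stop, or check.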
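
The start/stop/check contract described in (3) can be sketched as follows. This is only a minimal skeleton under stated assumptions, not one of the real scripts from Section V: the "resource" is just a marker file, and the names STATE_FILE and action are hypothetical. What it does show is the general shape: one argument selects the action, and an exit status of 0 reports success (or, for check, that the resource is online).

```shell
#!/bin/sh
# Hypothetical skeleton of a CRS action script. The "resource" is only a
# marker file (an assumption for illustration); a real script would start,
# stop, and probe an actual service at these three points.
STATE_FILE="${STATE_FILE:-/tmp/demo_resource.state}"

action() {
    case "$1" in
    start) touch "$STATE_FILE" ;;     # bring the resource up
    stop)  rm -f "$STATE_FILE" ;;     # bring the resource down
    check) [ -f "$STATE_FILE" ] ;;    # 0 = online, non-zero = offline
    *)     echo "usage: {start|stop|check}" >&2; return 1 ;;
    esac
}

# Run only when an argument is given, which is how CRS calls the script.
[ $# -eq 0 ] || action "$1"
```

Saved under a hypothetical name such as demo.sh, it can be tried by hand with "sh demo.sh start; sh demo.sh check; echo $?" before being handed to crs_profile with -a.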

Managing the database listener:

Edit the file $ORACLE_HOME/network/admin/listener.ora and
change (HOST = rac01 ) to (HOST = 10.198.94.139 ), the virtual IP address.

crs_profile -create listener -t application -a /oracle/crs/crs/public/lsnr.sh -r havip -o \
ci=180,ra=6,ft=2,fi=12
crs_register listener
crs_setperm listener -o root
crs_setperm listener -u user:oracle:r-x
crs_start listener

Managing the disk group and logical volumes:

crs_profile -create disk_group -t application -a /oracle/crs/crs/public/dg.sh -r havip -o \
ci=180,ra=6,ft=2,fi=12
crs_register disk_group
crs_setperm disk_group -o root
crs_setperm disk_group -u user:oracle:r-x

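Note: starting the disk group does not in itself depend on the virtual IP. The dependency
is set here to prevent the virtual IP from starting on one node while the disk group
starts on the other, which would leave the resources in an inconsistent state.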

Managing the file system:

crs_profile -create oradata_mount -t application -a /oracle/crs/crs/public/fs.sh -r disk_group -o \
ci=180,ra=6,ft=2,fi=12
crs_register oradata_mount
crs_setperm oradata_mount -o root
crs_setperm oradata_mount -u user:oracle:r-x

Managing the database instance:

crs_profile -create orcl_db -t application -a /oracle/crs/crs/public/db.sh -r \
"oradata_mount listener" -o ci=180,ra=6,ft=2,fi=12
crs_register orcl_db
crs_setperm orcl_db -o root
crs_setperm orcl_db -u user:oracle:r-x
crs_start orcl_db
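
The four registrations above repeat the same sequence of commands, so they could be wrapped in a small helper. The sketch below is only an illustration: the function name register_resource and the RUN switch are hypothetical, and it uses nothing beyond the crs_profile, crs_register, and crs_setperm invocations already shown. By default it only prints the commands; it does not start the resources.

```shell
#!/bin/sh
# Sketch: wrap the repeated registration steps in one function.
# RUN is a hypothetical switch: with the default RUN=echo the commands are
# only printed; run the script with RUN= (empty) to execute them for real.
RUN="${RUN-echo}"
SCRIPT_DIR=/oracle/crs/crs/public
OPTS="ci=180,ra=6,ft=2,fi=12"

# register_resource NAME ACTION_SCRIPT "REQUIRED RESOURCES"
register_resource() {
    $RUN crs_profile -create "$1" -t application -a "$SCRIPT_DIR/$2" -r "$3" -o "$OPTS" &&
    $RUN crs_register "$1" &&
    $RUN crs_setperm "$1" -o root &&
    $RUN crs_setperm "$1" -u user:oracle:r-x
}

register_resource listener      lsnr.sh havip
register_resource disk_group    dg.sh   havip
register_resource oradata_mount fs.sh   disk_group
register_resource orcl_db       db.sh   "oradata_mount listener"
```

Printing first makes it easy to compare the generated commands with the ones typed by hand before running anything as root.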

4. Make sure the scripts are executable, and copy the contents of the public and profile directories to the second node.
# chmod +x /oracle/crs/crs/public/*
# rcp -r -p /oracle/crs/crs/public/* rac02:/oracle/crs/crs/public/

5. Start all the resources
As the output below shows, CRS stops and starts the resources in the order given by the dependencies defined earlier:

root@rac01 # crs_stop -all
Attempting to stop `orcl_db` on member `rac01`
Stop of `orcl_db` on member `rac01` succeeded.
Attempting to stop `oradata_mount` on member `rac01`
Stop of `oradata_mount` on member `rac01` succeeded.
Attempting to stop `disk_group` on member `rac01`
Stop of `disk_group` on member `rac01` succeeded.
Attempting to stop `listener` on member `rac01`
Stop of `listener` on member `rac01` succeeded.
Attempting to stop `havip` on member `rac01`
Stop of `havip` on member `rac01` succeeded.

root@rac01 # crs_start -all
Attempting to start `havip` on member `rac01`
Start of `havip` on member `rac01` succeeded.
Attempting to start `listener` on member `rac01`
Start of `listener` on member `rac01` succeeded.
Attempting to start `disk_group` on member `rac01`
Start of `disk_group` on member `rac01` succeeded.
Attempting to start `oradata_mount` on member `rac01`
Start of `oradata_mount` on member `rac01` succeeded.
Attempting to start `orcl_db` on member `rac01`
Start of `orcl_db` on member `rac01` succeeded.

Check that the resource states are normal:

oracle@rac01 $ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
disk_group application ONLINE ONLINE rac01
havip application ONLINE ONLINE rac01
listener application ONLINE ONLINE rac01
oradata_mount application ONLINE ONLINE rac01
orcl_db application ONLINE ONLINE rac01
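
This check can also be scripted. The sketch below is an assumption-laden example rather than an Oracle tool: the function name not_online is hypothetical, and it relies on the whitespace-separated column layout shown above (two header lines, State in the fourth column). It reads the report on standard input, so it can be tried without a cluster.

```shell
#!/bin/sh
# Sketch: print the names of resources whose State column in "crs_stat -t"
# output is not ONLINE. Reads the report on stdin.
not_online() {
    # Lines 1-2 are the header and the dashed rule; column 4 is State.
    awk 'NR > 2 && NF >= 4 && $4 != "ONLINE" { print $1 }'
}

# Sample input in the layout shown above; orcl_db is marked OFFLINE here
# purely for illustration.
not_online <<'EOF'
Name Type Target State Host
------------------------------------------------------------
disk_group application ONLINE ONLINE rac01
orcl_db application ONLINE OFFLINE
EOF
```

On a live node the same function would be fed with "crs_stat -t | not_online"; empty output means every listed resource is ONLINE.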

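III. Managing Oracle Clusterware Resources

1. To change the attributes of a resource, use crs_profile -update; see Error 2 in Section V-1 for a concrete example.
2. If a resource is in the UNKNOWN state, add the -f option to the crs_stop command to stop it.
3. Use crs_profile -print <resource_name> to view the attributes of a resource, including its dependencies;
crs_stat -p <resource_name> shows the same information.
4. CRS logs are mainly under $CRS_HOME/log/node_name. Note that the system log also contains
important messages: /var/adm/messages on Solaris, usually /var/log/messages on Linux,
and /var/adm/syslog/syslog.log on HP-UX.
5. The commands to start, stop, and view CRS resources are crs_start, crs_stop, and crs_stat;
each of them prints its syntax when given the -H option.
The srvctl start nodeapps -n rac01 command can also be used.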

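IV. Testing the Cluster
1. Manual switchover:
Run the following commands in order on either node, as oracle or as root, provided that $PATH is set correctly:
crs_stop -all;
crs_start havip -c rac02;
crs_start listener -c rac02;
crs_start disk_group -c rac02;
crs_start oradata_mount -c rac02;
crs_start orcl_db -c rac02;
Then log in to rac02 (now the primary node), use df -h /oradata to check that the shared disk is mounted,
use ps -ef|grep ora_ to confirm that Oracle has started, and check that the alert log contains no errors.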
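
The manual switchover can be written as a loop over the resources in dependency order. This is a sketch only: the function name switch_to and the RUN switch are hypothetical, and the commands are the same crs_stop and crs_start calls listed above. With the default setting it prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the manual switchover as a loop.
# RUN is a hypothetical switch: RUN=echo (the default) only prints the
# commands; run the script with RUN= (empty) to execute them.
RUN="${RUN-echo}"

# switch_to NODE: stop everything, then start each resource on NODE,
# walking the dependency chain from the virtual IP up to the database.
switch_to() {
    target=$1
    $RUN crs_stop -all || return 1
    for res in havip listener disk_group oradata_mount orcl_db; do
        $RUN crs_start "$res" -c "$target" || return 1
    done
}

switch_to rac02
```

Stopping at the first failed command keeps a half-finished switchover easy to inspect with crs_stat -t.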

2. Automatic failover:

Simulate a failure of the primary node by hand, using the reboot command:
root@rac01 # reboot

Jan 8 14:53:57 rac01 reboot: [ID 662345 auth.crit] rebooted by root

The log shows that CRS on the standby node rac02 detected the failure of the primary node and took over the services:

2008-01-08 14:30:33.929: [ CRSMAIN][1] Starting Threads
2008-01-08 14:30:33.929: [ CRSMAIN][1] CRS Daemon Started.
2008-01-08 14:52:18.777: [ CRSEVT][71] Processing member leave for rac01, incarnation: 2
2008-01-08 14:52:18.878: [ CRSEVT][71] Do failover for: rac01
2008-01-08 14:52:42.180: [ CRSRES][73] startRunnable: setting CLI values
2008-01-08 14:52:42.193: [ CRSRES][73] Attempting to start `disk_group` on member `rac02`
2008-01-08 14:52:45.722: [ CRSRES][73] Start of `disk_group` on member `rac02` succeeded.
2008-01-08 14:52:45.731: [ CRSRES][73] startRunnable: setting CLI values
2008-01-08 14:52:45.732: [ CRSRES][73] Attempting to start `oradata_mount` on member `rac02`
2008-01-08 14:52:45.986: [ CRSRES][73] Start of `oradata_mount` on member `rac02` succeeded.
2008-01-08 14:52:46.013: [ CRSRES][73] startRunnable: setting CLI values
2008-01-08 14:52:46.015: [ CRSRES][73] Attempting to start `orcl_db` on member `rac02`
2008-01-08 14:53:31.486: [ CRSRES][73] Start of `orcl_db` on member `rac02` succeeded.
2008-01-08 14:53:31.487: [ CRSEVT][71] Post recovery done evmd event for: rac01
2008-01-08 14:53:31.603: [ CRSEVT][75] Processing RecoveryDone

Then log in to rac02 again, check that the file system is mounted, and confirm that the data is in a normal state.
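
To put a number on the failover, the time between the "Do failover" line and the "Processing RecoveryDone" line of the log can be computed. The sketch below is a rough helper, not part of CRS: the function name failover_seconds is hypothetical, it assumes the timestamp format shown in the log above, and it assumes both events fall on the same day.

```shell
#!/bin/sh
# Sketch: seconds elapsed between the "Do failover" and "RecoveryDone"
# lines of a crsd log read from stdin (same-day timestamps assumed).
failover_seconds() {
    awk '
        # Convert "HH:MM:SS.mmm:" (field 2 of a log line) to seconds of day.
        function secs(t,   p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }
        /Do failover for/         { t0 = secs($2) }
        /Processing RecoveryDone/ { printf "%.3f\n", secs($2) - t0 }
    '
}

# The two relevant lines from the log excerpt above.
failover_seconds <<'EOF'
2008-01-08 14:52:18.878: [ CRSEVT][71] Do failover for: rac01
2008-01-08 14:53:31.603: [ CRSEVT][75] Processing RecoveryDone
EOF
```

For the excerpt above this gives about 72.7 seconds, most of it spent starting orcl_db.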
