Corosync configuration:
Preparation: prepare two machines, node1.a.org and node2.a.org, with the IP addresses 192.168.0.134 and 192.168.0.3 respectively. The clustered service will be Apache's httpd.
I. Edit /etc/hosts on both nodes and add the following entries:
192.168.0.134 node1.a.org node1
192.168.0.3 node2.a.org node2
1. On node1 and node2, set the hostname with the hostname command, or edit /etc/sysconfig/network directly to make the change permanent.
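For example, on node1 (same idea on node2 with its own name):
- # hostname node1.a.org
- # vi /etc/sysconfig/network    (set HOSTNAME=node1.a.org)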
2. Set up key-based SSH communication between the two nodes:
- node1:
- # ssh-keygen -t rsa
- # ssh-copy-id -i /root/.ssh/id_rsa.pub node2
- node2:
- # ssh-keygen -t rsa
- # ssh-copy-id -i /root/.ssh/id_rsa.pub node1
Install Apache on node1 and node2. To make failover visible later, create an index.html containing 'node1.a.org' on node1 and one containing 'node2.a.org' on node2, and make sure the service starts. Install via yum:
- # yum install httpd -y
- # service httpd stop
- # chkconfig httpd off
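The test pages described above can be created like this (assuming the default DocumentRoot /var/www/html):
- node1: # echo "node1.a.org" > /var/www/html/index.html
- node2: # echo "node2.a.org" > /var/www/html/index.html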
II. Install the packages:
libibverbs, librdmacm, lm_sensors, libtool-ltdl, openhpi-libs, openhpi, perl-TimeDate
1. Place these RPMs, together with the corosync and pacemaker packages, in /root/cluster and install them:
- # cd /root/cluster
- # yum -y localinstall *.rpm --nogpgcheck
2. Edit the corosync configuration file:
- # cd /etc/corosync
- # cp corosync.conf.example corosync.conf
- Add the following to the file:
- service {
- ver: 0
- name: pacemaker
- }
- aisexec {
- user: root
- group: root
- }
- Change the bindnetaddr directive to: bindnetaddr: 192.168.0.0
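For reference, the resulting totem stanza then looks roughly like this (mcastaddr and mcastport are the example file's defaults; secauth is turned on because an authkey is generated below):
- totem {
-   version: 2
-   secauth: on
-   threads: 0
-   interface {
-     ringnumber: 0
-     bindnetaddr: 192.168.0.0
-     mcastaddr: 226.94.1.1
-     mcastport: 5405
-   }
- }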
3. Generate the authentication key used for inter-node communication, and copy it to node2:
- #corosync-keygen
- # scp -p authkey node2:/etc/corosync/
- #mkdir /var/log/cluster
Note: the steps above were performed on node1; repeat them on node2. Corosync can then be started on both nodes from node1, as shown below.
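From node1 (the SSH trust set up earlier makes the remote start possible):
- # /etc/init.d/corosync start
- # ssh node2 '/etc/init.d/corosync start'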
Verify that corosync started correctly:
Check whether the Corosync Cluster Engine started normally:
- # grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/messages
- Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
- Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
- Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
- Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
- Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Note: the first start above exited with status 8 (a failed start); the second attempt came up cleanly. Check whether the initial membership notifications went out correctly:
- # grep TOTEM /var/log/messages
- Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transport (UDP/IP).
- Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
- Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] The network interface [192.168.0.5] is now up.
- Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors occurred during startup:
- # grep ERROR: /var/log/messages | grep -v unpack_resources
Check whether pacemaker started correctly:
- # grep pcmk_startup /var/log/messages
- Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: CRM: Initialized
- Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] Logging: Initialized pcmk_startup
- Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
- Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Service: 9
- Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Local hostname: node1.a.org
Configure the cluster service:
Create an IP address resource for the web cluster:
- # crm configure primitive WebIP ocf:heartbeat:IPaddr params ip=192.168.0.99
Set the cluster to ignore the quorum check (a two-node cluster cannot maintain quorum when one node fails):
- # crm configure property no-quorum-policy=ignore
Set a default resource stickiness, and disable STONITH (no fencing devices are used in this test setup):
- # crm configure rsc_defaults resource-stickiness=100
- # crm configure property stonith-enabled=false
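The constraints below assume a WebSite resource managing httpd already exists; a minimal sketch of creating it (using the LSB init script, a common choice for this kind of setup):
- # crm configure primitive WebSite lsb:httpd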
WebIP and WebSite could end up running on different nodes; resolve this with a colocation constraint:
- # crm configure colocation website-with-ip INFINITY: WebSite WebIP
Ensure WebIP is started before WebSite on a node:
- # crm configure order httpd-after-ip mandatory: WebIP WebSite
Set a location constraint so the resources prefer node1:
- # crm configure location prefer-node1 WebSite rule 200: node1
Make sure the corosync service is running on node1 and node2, then browse to 192.168.0.99 to confirm the page is served. Stop the service on either node and browse again to verify failover.
At this point the OpenAIS/corosync configuration is complete.
DRBD configuration
Before configuring, add a disk to both node1 and node2 and create a partition on each:
#fdisk /dev/sdb
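A typical interactive session creates one primary partition spanning the disk (keystrokes shown are illustrative):
- # fdisk /dev/sdb    (n, p, 1, accept the defaults, w)
- # partprobe /dev/sdb
- # fdisk -l /dev/sdb    (verify that /dev/sdb1 now exists)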
Install the packages:
DRBD consists of two parts: a kernel module and userspace management tools. The DRBD kernel module was merged into the mainline Linux kernel in 2.6.33, so if your kernel is at least that version you only need to install the management tools; otherwise you must install both the kernel module package and the management tools, and their version numbers must match. Download the packages and install them; do the same on node1 and node2:
- # yum -y --nogpgcheck localinstall drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm
Configure DRBD:
The main file is /etc/drbd.conf; copy the shipped sample into place:
- # cp /usr/share/doc/drbd83-8.3.8/drbd.conf /etc
- Then edit /etc/drbd.d/global_common.conf:
- global {
- usage-count no;
- # minor-count dialog-refresh disable-ip-verification
- }
- common {
- protocol C;
- handlers {
- pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
- pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
- local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
- # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
- # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
- # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
- # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
- # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
- }
- startup {
- wfc-timeout 120;
- degr-wfc-timeout 120;
- }
- disk {
- on-io-error detach;
- fencing resource-only;
- }
- net {
- cram-hmac-alg "sha1";
- shared-secret "mydrbdlab";
- }
- syncer {
- rate 100M;
- }
- }
Define a resource in /etc/drbd.d/web.res with the following content:
- resource web {
- on node1.a.org {
- device /dev/drbd0;
- disk /dev/sdb1;
- address 192.168.0.134:7789;
- meta-disk internal;
- }
- on node2.a.org {
- device /dev/drbd0;
- disk /dev/sdb1;
- address 192.168.0.3:7789;
- meta-disk internal;
- }
- }
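The DRBD configuration must be identical on both nodes, so copy it over to node2 (paths as above):
- # scp /etc/drbd.conf node2:/etc/
- # scp -r /etc/drbd.d node2:/etc/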
Initialize the resource (on both node1 and node2) and start the service:
# drbdadm create-md web
Start the service on node1 and node2: # /etc/init.d/drbd start
Set the primary node (run on the node that should become Primary):
- # drbdsetup /dev/drbd0 primary -o
- or
- # drbdadm -- --overwrite-data-of-peer primary web
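The initial full sync can then be watched with either of:
- # drbd-overview
- # watch -n 1 'cat /proc/drbd'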
Create the filesystem; this, and mounting it, must be done on the Primary node:
- # mke2fs -j -L DRBD /dev/drbd0
- # mkdir /web
- # mount /dev/drbd0 /web
Verify DRBD:
On the primary node (node1), copy some files into /web, then unmount it and demote the node to Secondary:
- #umount /web
- #drbdadm secondary web
- On node2, promote it to primary and mount the device; the files copied on node1 should be present:
- # drbdadm primary web
- # mount /dev/drbd0 /web
An introduction to common corosync and DRBD commands:
- Common corosync/pacemaker commands:
- corosync-keygen — generate the authentication key
- crm status — show the cluster status
- crm_verify -L — check the cluster configuration for problems
- In ra mode, classes lists the resource agent classes
- crm_attribute — modify a node attribute or a cluster-wide property
- crm_node — node-related operations
- crm_node -q — show the node's quorum votes
cibadmin — cluster configuration (CIB) modification tool
Common options: -Q dump the CIB, -E erase the CIB contents, -R replace the CIB, -D delete an object, -d delete all matching objects
Example: cibadmin -Q > /tmp/qq.xml, edit qq.xml, then load it back with cibadmin -R -x /tmp/qq.xml
To delete a resource: run edit at the crm(live)configure# prompt to edit the CIB directly, use delete in that same mode, or use cibadmin
crm_shadow — work on a shadow (sandbox) copy of the CIB
crm(live)ra# list ocf heartbeat — list the OCF resource agents provided by heartbeat
Resource constraints:
- Location: which node a resource prefers to run on
- help location — show help
- Example: location Web_on_node1 Web 500: node1.a.org
- Order: the order in which resources are started and stopped
- help order — show help
- Example: order WebServer_after_WebIP mandatory: WebIP WebServer:start
- Colocation: whether resources must run together on the same node
- help colocation — show help
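- Example (illustrative names, by analogy with the constraints above): colocation WebServer_with_WebIP inf: WebServer WebIP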
Common DRBD commands:
- # drbd-overview — show the resources and their roles (Primary/Secondary)
- # cat /proc/drbd — show DRBD status
The crm interactive shell:
Type crm at the shell prompt to enter interactive mode:
- [root@node1 ~]# crm
- crm(live)# help    (show help)
- This is the CRM command line interface program.
- Available commands:
- cib manage shadow CIBs
- resource resources management
- node nodes management
- options user preferences
- configure CRM cluster configuration
- ra resource agents information center
- status show cluster status
- quit,bye,exit exit the program
- help show help
- end,cd,up go back one level
- crm(live)#
Type configure to enter configuration mode:
- crm(live)configure#
- crm(live)configure# cd    (cd switches back up a level)
- crm(live)# status    (show cluster status)
- ============
- Last updated: Wed Sep 14 22:09:13 2011
- Stack: openais
- Current DC: node1.a.org - partition WITHOUT quorum
- Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
- 2 Nodes configured, 2 expected votes
- 2 Resources configured.
- ============
- Online: [ node1.a.org ]
- OFFLINE: [ node2.a.org ]
- Master/Slave Set: MS_Webdrbd
- Slaves: [ node1.a.org ]
- Stopped: [ webdrbd:1 ]
In ra mode, classes shows the resource agent classes:
- crm(live)ra# classes
- heartbeat
- lsb
- ocf / heartbeat linbit pacemaker
- stonith
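To inspect an individual agent's parameters, for example the IPaddr agent used earlier:
- crm(live)ra# info ocf:heartbeat:IPaddr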
After making changes in configure mode, you must run commit for them to be saved and take effect, as in the short session below.
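A minimal end-of-session sequence (verify is optional but catches mistakes before they are committed):
- crm(live)configure# verify
- crm(live)configure# commit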
- crm node standby — run against a node to put it into standby, simulating a failure
- crm node online — bring the node back online
DRBD + Pacemaker configuration:
With DRBD configured as above, now configure Pacemaker:
- [root@node1 ~]# crm configure show
- node node1.a.org
- node node2.a.org
- property $id="cib-bootstrap-options" \
- dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
- cluster-infrastructure="openais" \
- expected-quorum-votes="2" \
- no-quorum-policy="ignore" \    (make sure this entry is present)
- stonith-enabled="false"    (make sure this entry is present)
- [root@node1 ~]#
- [root@node1 ~]# /etc/init.d/drbd stop    (stop drbd on both node1 and node2)
- Stopping all DRBD resources: .
- [root@node1 ~]# chkconfig drbd off
Configure the DRBD resource in Pacemaker:
- ]# crm
- crm(live)# configure
- crm(live)configure# primitive webdrbd ocf:heartbeat:drbd params drbd_resource=web op monitor role=Master interval=50s timeout=30s op monitor role=Slave interval=60s timeout=30s
- WARNING: webdrbd: default timeout 20s for start is smaller than the advised 240
- WARNING: webdrbd: default timeout 20s for stop is smaller than the advised 100
- crm(live)configure# master MS_Webdrbd webdrbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
- crm(live)configure# show webdrbd
- primitive webdrbd ocf:heartbeat:drbd \
- params drbd_resource="web" \
- op monitor interval="50s" role="Master" timeout="30s" \
- op monitor interval="60s" role="Slave" timeout="30s"
- crm(live)configure# show MS_Webdrbd
- ms MS_Webdrbd webdrbd \
- meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
- crm(live)configure#
On node2, check whether that host has become the Primary node:
- # drbdadm role web
- Primary/Secondary
Create a cluster service that automatically mounts the web resource's filesystem on the Primary node:
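The text breaks off here; a sketch of what this step typically looks like with the ocf:heartbeat:Filesystem agent (resource and constraint names are assumptions, following the pattern used above):
- # crm configure primitive WebFS ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/web" fstype="ext3"
- # crm configure colocation WebFS_on_Master inf: WebFS MS_Webdrbd:Master
- # crm configure order WebFS_after_drbd inf: MS_Webdrbd:promote WebFS:start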