High-availability cluster based on corosync + pacemaker


        Hostname           IP addresses
Host 1  node1.wang.com     NIC 0 (eth0): 192.168.1.10/24
                           NIC 1 (eth1): 172.16.1.1/24
Host 2  node2.wang.com     NIC 0 (eth0): 192.168.1.11/24
                           NIC 1 (eth1): 172.16.1.2/24
I. Environment setup

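1. Configure the IP addresses with setup
   [root@localhost ~]# setup                      #menu-driven configuration tool
   When finished, restart the network service:
   [root@localhost ~]# service network restart    #restart networking
   [root@localhost ~]# ifconfig                   #check that the settings are correct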

2. Change the hostname on host 1
   [root@localhost ~]# sed -i 's@^\(HOSTNAME=\).*@\1node1.wang.com@g' /etc/sysconfig/network     #persist the name (read at boot)
   [root@localhost ~]# cat !$                     #check the result
   cat /etc/sysconfig/network
   NETWORKING=yes
   NETWORKING_IPV6=no
   HOSTNAME=node1.wang.com

   [root@localhost ~]# hostname node1.wang.com    #set the name of the running system
   [root@localhost ~]# uname -n                   #show the new hostname
   node1.wang.com
 
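Change the hostname on host 2
    [root@localhost ~]# sed -i 's@^\(HOSTNAME=\).*@\1node2.wang.com@g' /etc/sysconfig/network
    [root@localhost ~]# cat !$
     cat /etc/sysconfig/network
     NETWORKING=yes
     NETWORKING_IPV6=no
     HOSTNAME=node2.wang.com
    [root@localhost ~]# hostname node2.wang.com
    [root@localhost ~]# uname -n
    node2.wang.com

Note: after changing the hostname, log out and log back in so the shell prompt picks up the new name. Pacemaker identifies each node by the name returned by uname -n, so these names must match the entries in /etc/hosts.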
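The sed and hostname steps above can be wrapped in a small idempotent helper, which is handy when more nodes are built the same way. This is a minimal sketch, assuming a RHEL 5-style /etc/sysconfig/network file; set_hostname_in_file is a hypothetical name, not a system tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set HOSTNAME= in a RHEL 5-style network file,
# replacing an existing line or appending one if none exists.
set -euo pipefail

set_hostname_in_file() {
    local file=$1 name=$2
    if grep -q '^HOSTNAME=' "$file"; then
        # Same substitution as the sed command above
        sed -i "s@^\(HOSTNAME=\).*@\1${name}@" "$file"
    else
        echo "HOSTNAME=${name}" >> "$file"
    fi
}
```

Run it as root with `set_hostname_in_file /etc/sysconfig/network node1.wang.com`, then `hostname node1.wang.com` for the running system.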

3. On both host 1 and host 2, add the following two lines to /etc/hosts
   [root@node1 ~]# vim /etc/hosts
   #add these two lines
   192.168.1.11 node2.wang.com     node2
   192.168.1.10 node1.wang.com     node1
   [root@node2 ~]# cat /etc/hosts         #check the result
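Because the same two lines must end up on both nodes, an "append only if missing" helper avoids duplicate entries when the step is repeated. A sketch; add_host_entry is a hypothetical helper that treats any existing line with the same IP as already configured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP FQDN SHORT" to a hosts file unless a
# line for that IP already exists, so re-running it adds no duplicates.
set -euo pipefail

add_host_entry() {
    local file=$1 ip=$2 fqdn=$3 short=$4
    # awk exits 0 if some line's first field equals the IP, 1 otherwise
    if awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$file"; then
        return 0
    fi
    printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$file"
}
```

On each node: `add_host_entry /etc/hosts 192.168.1.10 node1.wang.com node1`, and likewise for node2.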


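4. Run the following on both host 1 and host 2
   so that node1 and node2 can ssh to each other without a password
   (shown here on node1; on node2, copy the key to root@node1 instead)

   #generate a key pair
   [root@node1 ~]# ssh-keygen -t rsa
   [root@node1 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2

   #test the connection
   [root@node1 ~]# ssh node2
   #return to the original node
   [root@node2 ~]# exit

   [root@node1 ~]# hwclock -s #set the system time from the hardware clock
   [root@node2 ~]# hwclock -s
   #note: hwclock -s only copies each machine's own hardware clock; it does not make the two nodes agree with each other, so if their clocks differ, synchronize them (for example with NTP)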
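Later steps (scp, ssh node2 '...') assume key-based login works from each node to its peer, so it is worth checking it non-interactively. A sketch; check_ssh and the DRY_RUN switch are hypothetical names introduced here, not part of OpenSSH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check key-based SSH to a peer without prompting.
# With DRY_RUN=1 it only prints the ssh command it would run.
set -euo pipefail

check_ssh() {
    local host=$1
    # BatchMode=yes makes ssh fail instead of asking for a password
    local cmd=(ssh -o BatchMode=yes -o ConnectTimeout=5 "root@${host}" true)
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        echo "${cmd[*]}"
        return 0
    fi
    if "${cmd[@]}"; then
        echo "${host}: key-based login ok"
    else
        echo "${host}: key-based login FAILED" >&2
        return 1
    fi
}
```

On node1 run `check_ssh node2`; on node2 run `check_ssh node1`.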


II. Preparing the installation

1. Obtain the following RPM packages
cluster-glue-1.0.6-1.6.el5.i386.rpm       
heartbeat-libs-3.0.3-2.3.el5.i386.rpm  
pacemaker-libs-1.0.11-1.2.el5.i386.rpm
cluster-glue-libs-1.0.6-1.6.el5.i386.rpm  
libesmtp-1.0.4-5.el5.i386.rpm          
perl-TimeDate-1.16-5.el5.noarch.rpm
corosync-1.2.7-1.1.el5.i386.rpm           
openais-1.1.3-1.6.el5.i386.rpm         
resource-agents-1.0.4-1.1.el5.i386.rpm
corosynclib-1.2.7-1.1.el5.i386.rpm        
openaislib-1.1.3-1.6.el5.i386.rpm
heartbeat-3.0.3-2.3.el5.i386.rpm          
pacemaker-1.0.11-1.2.el5.i386.rpm

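2. Install

On node 1:
[root@node1 ~]# yum --nogpgcheck -y localinstall *.rpm          #install the local RPMs; yum pulls in any dependencies
[root@node1 ~]# scp *.rpm node2:/root                             #copy the packages to node2
[root@node1 ~]# ssh node2 'yum --nogpgcheck -y localinstall *.rpm'   #install them on node2 (*.rpm is expanded on node2, in /root)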
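Before running yum localinstall, a quick check that all thirteen packages listed above are present avoids a half-finished install. A sketch; REQUIRED_RPMS simply copies the list above, and missing_rpms is a hypothetical helper that prints each missing file and returns non-zero if any are absent.

```shell
#!/usr/bin/env bash
# Hypothetical pre-install check: report which required RPMs are missing.
set -euo pipefail

# Copied from the package list above
REQUIRED_RPMS=(
    cluster-glue-1.0.6-1.6.el5.i386.rpm
    cluster-glue-libs-1.0.6-1.6.el5.i386.rpm
    corosync-1.2.7-1.1.el5.i386.rpm
    corosynclib-1.2.7-1.1.el5.i386.rpm
    heartbeat-3.0.3-2.3.el5.i386.rpm
    heartbeat-libs-3.0.3-2.3.el5.i386.rpm
    libesmtp-1.0.4-5.el5.i386.rpm
    openais-1.1.3-1.6.el5.i386.rpm
    openaislib-1.1.3-1.6.el5.i386.rpm
    pacemaker-1.0.11-1.2.el5.i386.rpm
    pacemaker-libs-1.0.11-1.2.el5.i386.rpm
    perl-TimeDate-1.16-5.el5.noarch.rpm
    resource-agents-1.0.4-1.1.el5.i386.rpm
)

missing_rpms() {
    local dir=$1 f rc=0
    for f in "${REQUIRED_RPMS[@]}"; do
        if [[ ! -f "$dir/$f" ]]; then
            echo "$f"       # print each missing package
            rc=1
        fi
    done
    return "$rc"
}
```

For example: `missing_rpms /root || echo "download the packages listed above first"`.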


3. Configuration

All corosync configuration files are in the /etc/corosync directory

[root@node1 ~]# cd /etc/corosync/

#list the directory
[root@node1 corosync]# ls
amf.conf.example  corosync.conf.example  service.d  uidgid.d

#copy the sample configuration and name the copy corosync.conf
[root@node1 corosync]# cp corosync.conf.example corosync.conf

#check the result
[root@node1 corosync]# ls
amf.conf.example  corosync.conf  corosync.conf.example  service.d  uidgid.d

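[root@node1 corosync]# vim corosync.conf

#modify the totem section as follows
totem {
        version: 2
        #sign and encrypt cluster messages with /etc/corosync/authkey
        secauth: on
        threads: 0
        interface {
                ringnumber: 0
                #network address (not host address) of the heartbeat network on eth1
                bindnetaddr: 172.16.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

#add the following two sections
service {
   #ver 0: corosync starts the pacemaker daemons itself
   ver:0
   name:pacemaker
}
aisexec {
        user:root
        group:root
}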
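bindnetaddr is the network address of the heartbeat interface: 172.16.1.1/24 and 172.16.1.2/24 both give 172.16.1.0, which is why the same corosync.conf can be copied unchanged to node2. A small bash sketch of that calculation; bindnetaddr here is only a hypothetical helper named after the setting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: network address for an IPv4 address and prefix
# length, e.g. "172.16.1.1 24" -> 172.16.1.0 (the value bindnetaddr needs).
set -euo pipefail

bindnetaddr() {
    local ip=$1 prefix=$2 a b c d
    IFS=. read -r a b c d <<< "$ip"
    # Pack the four octets into one 32-bit integer
    local addr=$(( (a << 24) | (b << 16) | (c << 8) | d ))
    # Prefix length -> netmask, e.g. 24 -> 0xFFFFFF00
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    local net=$(( addr & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}
```

`bindnetaddr 172.16.1.1 24` and `bindnetaddr 172.16.1.2 24` both print 172.16.1.0.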

#create the log directory (corosync writes /var/log/cluster/corosync.log)
[root@node1 corosync]# mkdir /var/log/cluster
[root@node1 corosync]# ls /var/log/


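#generate the authentication key (authkey); corosync-keygen reads /dev/random, so on an idle machine it may wait until enough entropy has been collected
[root@node1 corosync]# corosync-keygen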

[root@node1 corosync]# ll
total 28
-rw-r--r-- 1 root root 5384 Jul 28  2010 amf.conf.example
-r-------- 1 root root  128 Dec 18 22:30 authkey
-rw-r--r-- 1 root root  624 Dec 18 22:21 corosync.conf
-rw-r--r-- 1 root root  436 Jul 28  2010 corosync.conf.example
drwxr-xr-x 2 root root 4096 Jul 28  2010 service.d
drwxr-xr-x 2 root root 4096 Jul 28  2010 uidgid.d

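#copy the configuration file and the key from node1 to node2 (-p keeps the authkey's 400 permissions)
[root@node1 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/

#create the log directory on node2 as well
[root@node1 corosync]# ssh node2 'mkdir /var/log/cluster'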
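Since secauth is on, both nodes need an identical authkey that only root can read. A sketch of a pre-start check; check_corosync_files is a hypothetical helper, and it assumes GNU stat (as on RHEL).

```shell
#!/usr/bin/env bash
# Hypothetical pre-start check: authkey must be mode 400 and
# corosync.conf must have secauth enabled.
set -euo pipefail

check_corosync_files() {
    local dir=$1 perms
    perms=$(stat -c %a "$dir/authkey") || return 1
    if [[ $perms != 400 ]]; then
        echo "authkey has mode $perms, expected 400" >&2
        return 1
    fi
    grep -Eq '^[[:space:]]*secauth:[[:space:]]*on' "$dir/corosync.conf" || {
        echo "secauth is not on" >&2
        return 1
    }
    echo ok
}
```

Run `check_corosync_files /etc/corosync` on each node, and compare the output of `md5sum /etc/corosync/authkey /etc/corosync/corosync.conf` on node1 and node2 to confirm the files match.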

III. Starting the services

1. Start the service on node 1 first
[root@node1 corosync]# service corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]

2. Check the log
[root@node1 corosync]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Dec 18 22:38:16 corosync [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Dec 18 22:38:16 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

#check that membership (heartbeat) messages are exchanged normally
[root@node1 corosync]# grep TOTEM /var/log/cluster/corosync.log
Dec 18 22:38:16 corosync [TOTEM ] Initializing transport (UDP/IP).
Dec 18 22:38:16 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Dec 18 22:38:17 corosync [TOTEM ] The network interface [172.16.1.1] is now up.
Dec 18 22:38:21 corosync [TOTEM ] Process pause detected for 1026 ms, flushing membership messages.
Dec 18 22:38:21 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 18 22:38:21 corosync [TOTEM ] A processor failed, forming new configuration.
Dec 18 22:38:21 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

#look for serious errors (the errors below can be ignored: they appear only because no STONITH resources are defined yet, and they stop once stonith-enabled=false is set in step 6)
[root@node1 corosync]# grep ERROR: !$
grep ERROR: /var/log/cluster/corosync.log
Dec 18 22:39:26 node1.wang.com pengine: [4644]: ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
Dec 18 22:39:26 node1.wang.com pengine: [4644]: ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
Dec 18 22:39:26 node1.wang.com pengine: [4644]: ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Dec 18 22:39:26 node1.wang.com pengine: [4644]: ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
Dec 18 22:39:26 node1.wang.com pengine: [4644]: ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
Dec 18 22:39:26 node1.wang.com pengine: [4644]: ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity

#view the pacemaker startup messages
[root@node1 corosync]# grep pcmk_startup !$
grep pcmk_startup /var/log/cluster/corosync.log
Dec 18 22:38:19 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
Dec 18 22:38:19 corosync [pcmk  ] Logging: Initialized pcmk_startup
Dec 18 22:38:19 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 4294967295
Dec 18 22:38:19 corosync [pcmk  ] info: pcmk_startup: Service: 9
Dec 18 22:38:20 corosync [pcmk  ] info: pcmk_startup: Local hostname: node1.wang.com
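When reviewing the log repeatedly, it helps to filter out the STONITH messages that are expected at this stage so that any other ERROR lines stand out. A sketch; scan_corosync_log is a hypothetical helper, and note that it hides every ERROR line mentioning STONITH, not only the ones shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print ERROR lines from the corosync log, skipping
# the STONITH messages that are expected while no fencing is configured.
set -euo pipefail

scan_corosync_log() {
    local log=${1:-/var/log/cluster/corosync.log}
    # "|| true": finding no unexpected errors is not a failure
    grep 'ERROR:' "$log" | grep -v 'STONITH' || true
}
```

Run `scan_corosync_log` on each node; empty output means only the expected STONITH errors were logged.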

3. Start the corosync service on node 2
[root@node1 corosync]# ssh node2 '/etc/init.d/corosync start'
Starting Corosync Cluster Engine (corosync): [  OK  ]

On node 2, check the logs in the same way as on node 1

4. View the cluster status

[root@node1 corosync]# crm_mon
#the output looks like this
============
Last updated: Sun Dec 18 22:59:41 2011
Stack: openais
Current DC: node1.wang.com - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ node1.wang.com node2.wang.com ]

#press Ctrl+C to leave this screen
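crm_mon refreshes continuously; its -1 option prints the status once and exits, which makes the output easy to use in scripts. A sketch that extracts the online node list from that output; online_nodes is a hypothetical helper, and it assumes the single-line `Online: [ ... ]` format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read crm_mon output on stdin and print the
# online nodes, one per line.
set -euo pipefail

online_nodes() {
    sed -n 's/^Online: \[ \(.*\) ][[:space:]]*$/\1/p' | tr ' ' '\n' | sed '/^$/d'
}
```

`crm_mon -1 | online_nodes` should list node1.wang.com and node2.wang.com once both nodes have joined.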


5. Install the httpd service

[root@node1 corosync]# yum install -y httpd
[root@node1 corosync]# echo "web111" > /var/www/html/index.html
[root@node1 corosync]# service httpd restart
#test the page in a browser (each node serves different content, so you can tell which node answered)
#then stop httpd and keep it from starting at boot: from now on the cluster, not init, starts and stops it
[root@node1 corosync]# service httpd stop
Stopping httpd:                                            [  OK  ]
[root@node1 corosync]# chkconfig httpd off


[root@node2 ~]# yum install -y httpd
[root@node2 ~]# echo "web222" > /var/www/html/index.html
[root@node2 ~]# service httpd restart
#test the page in a browser
#then stop httpd and keep it from starting at boot
[root@node2 ~]# service httpd stop
Stopping httpd:                                            [  OK  ]
[root@node2 ~]# chkconfig httpd off
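A resource managed by the cluster must not also be started by init, so it is worth confirming on both nodes that httpd is off in every runlevel. A sketch that checks one line of `chkconfig --list` output; service_disabled_everywhere is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one line of "chkconfig --list <service>" on
# stdin and succeed only if no runlevel is marked "on".
set -euo pipefail

service_disabled_everywhere() {
    local line
    read -r line
    [[ -n $line && $line != *:on* ]]
}
```

For example: `for n in node1 node2; do ssh $n 'chkconfig --list httpd' | service_disabled_everywhere && echo "$n: ok"; done`.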


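6. Configure resources with the crm tool

[root@node1 corosync]# crm

(1) Disable STONITH

#enter configuration mode
crm(live)# configure

#disable STONITH; acceptable for this test setup, but as the log warns, clusters with shared data need STONITH to ensure data integrity
crm(live)configure# property stonith-enabled=false

#commit makes the changes made so far take effect
crm(live)configure# commit

#check that the change took effect
crm(live)configure# show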
node node1.wang.com
node node2.wang.com
property $id="cib-bootstrap-options" \
dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"

#verify no longer reports errors now
crm(live)configure# verify
#leave the crm tool
crm(live)configure# exit

(2) Add resources

List the supported resource agent classes
[root@node1 corosync]# crm
crm(live)# ra
crm(live)ra# help
crm(live)ra# classes

#the resource agent classes
heartbeat
lsb
ocf / heartbeat pacemaker
stonith

#use list to see the agents in a class
crm(live)ra# list lsb

#use meta (or info) to see the parameters of a particular agent, given as class:provider:agent
crm(live)ra# meta ocf:heartbeat:IPaddr

#go back up one level
crm(live)ra# cd
#enter configuration mode
crm(live)# configure

#configure the VIP: resource name WEBIP, address 192.168.1.98
crm(live)configure# primitive WEBIP ocf:heartbeat:IPaddr params ip=192.168.1.98
#check the configuration
crm(live)configure# show
node node1.wang.com
node node2.wang.com
primitive WEBIP ocf:heartbeat:IPaddr \
params ip="192.168.1.98"
property $id="cib-bootstrap-options" \
dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"

#commit the configuration
crm(live)configure# commit

#check the status from the top-level crm prompt
crm(live)# status
============
Last updated: Mon Dec 19 00:20:18 2011
Stack: openais
Current DC: node1.wang.com - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ node1.wang.com node2.wang.com ]

 WEBIP (ocf::heartbeat:IPaddr): Started node1.wang.com


#on node 1, check that the VIP is on this node

[root@node1 ~]# ifconfig

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:AA:31:10
          inet addr:192.168.1.98  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:67 Base address:0x2000
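The ifconfig check above can be scripted: the helper below reports whether a given address appears in ifconfig or `ip -4 addr show` output. A sketch; has_ip is a hypothetical helper, and it only inspects `inet` lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given IPv4 address appears on an
# "inet" line of ifconfig or "ip -4 addr show" output read from stdin.
set -euo pipefail

has_ip() {
    local vip=$1
    awk -v ip="$vip" '
        $1 == "inet" {
            addr = $2
            sub(/^addr:/, "", addr)   # ifconfig style: inet addr:1.2.3.4
            sub(/\/.*/, "", addr)     # iproute2 style: inet 1.2.3.4/24
            if (addr == ip) found = 1
        }
        END { exit !found }'
}
```

For example: `ifconfig | has_ip 192.168.1.98 && echo "VIP is on $(uname -n)"`.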


#add the httpd resource (the lsb class uses the /etc/init.d/httpd script)
crm(live)# configure 
crm(live)configure# primitive WEBSERVER lsb:httpd
crm(live)configure# commit
#view the resources
crm(live)configure# show
node node1.wang.com
node node2.wang.com
primitive WEBIP ocf:heartbeat:IPaddr \
params ip="192.168.1.98"
primitive WEBSERVER lsb:httpd
property $id="cib-bootstrap-options" \
dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
#view the resource status
crm(live)# status 
============
Last updated: Mon Dec 19 00:28:08 2011
Stack: openais
Current DC: node1.wang.com - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ node1.wang.com node2.wang.com ]

 WEBIP (ocf::heartbeat:IPaddr): Started node1.wang.com
 WEBSERVER (lsb:httpd): Started node2.wang.com

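By default Pacemaker spreads resources across the nodes, so the status shows the two resources running on different servers. We need to tie them together; here we put them in a group, which keeps its members on the same node and starts them in the listed order (WEBIP, then WEBSERVER).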

crm(live)# configure 
#add the group WEBSERVICE
crm(live)configure# group WEBSERVICE WEBIP WEBSERVER
crm(live)configure# commit
#view the resources
crm(live)configure# show
node node1.wang.com
node node2.wang.com
primitive WEBIP ocf:heartbeat:IPaddr \
params ip="192.168.1.98"
primitive WEBSERVER lsb:httpd
group WEBSERVICE WEBIP WEBSERVER
property $id="cib-bootstrap-options" \
dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
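The interactive crm steps so far can also be written to a command file and replayed, for example on a rebuilt cluster. A sketch: write_crm_script is a hypothetical helper, and feeding the file to crm on standard input (`crm < /root/webservice.crm`) assumes your crm shell version accepts batch input that way, so try it on a test cluster first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the crm commands used above into a file.
set -euo pipefail

write_crm_script() {
    local vip=$1 out=$2
    cat > "$out" <<EOF
configure
property stonith-enabled=false
primitive WEBIP ocf:heartbeat:IPaddr params ip=${vip}
primitive WEBSERVER lsb:httpd
group WEBSERVICE WEBIP WEBSERVER
verify
commit
EOF
}
```

For example: `write_crm_script 192.168.1.98 /root/webservice.crm`.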
#check whether the resources now run on the same node
crm(live)# status 
============
Last updated: Mon Dec 19 00:35:31 2011
Stack: openais
Current DC: node1.wang.com - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ node1.wang.com node2.wang.com ]

 Resource Group: WEBSERVICE
     WEBIP (ocf::heartbeat:IPaddr): Started node1.wang.com
     WEBSERVER (lsb:httpd): Started node1.wang.com
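To check colocation from a script rather than by eye, the "Started" lines of `crm_mon -1` output can be reduced to resource/node pairs. A sketch; resource_nodes and all_on_one_node are hypothetical helpers that assume the `NAME (class:agent): Started NODE` line format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for crm_mon output read from stdin.
set -euo pipefail

# Print "RESOURCE NODE" for every started resource
resource_nodes() {
    awk '$3 == "Started" { print $1, $4 }'
}

# Succeed if all started resources are on exactly one node
all_on_one_node() {
    local count
    count=$(resource_nodes | awk '{ print $2 }' | sort -u | wc -l)
    [ "$count" -eq 1 ]
}
```

For example: `crm_mon -1 | all_on_one_node && echo "WEBSERVICE is colocated"`.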

Test with a browser

Resource failover

[root@node2 ~]# ssh node1 'service corosync stop'
Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Waiting for corosync services to unload:....[  OK  ]
[root@node2 ~]# crm status
============
Last updated: Mon Dec 19 00:44:04 2011
Stack: openais
Current DC: node2.wang.com - partition WITHOUT quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ node2.wang.com ]
OFFLINE: [ node1.wang.com ]

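#"WITHOUT quorum" means this partition holds fewer votes than quorum requires (1 of 2 expected). With the default no-quorum-policy (stop), Pacemaker stops all resources in a partition without quorum, which is why no resources are listed above
#ignore the loss of quorum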
[root@node2 ~]# crm
crm(live)# configure 
INFO: building help index
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# commit 
crm(live)configure# show
node node1.wang.com
node node2.wang.com
primitive WEBIP ocf:heartbeat:IPaddr \
params ip="192.168.1.98"
primitive WEBSERVER lsb:httpd
group WEBSERVICE WEBIP WEBSERVER
property $id="cib-bootstrap-options" \
dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
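The quorum arithmetic behind "partition WITHOUT quorum": a partition has quorum only when it holds more than half of the expected votes. With 2 expected votes a lone node has 1 vote, and 1 is not more than half of 2, so a two-node cluster loses quorum whenever one node leaves. A sketch of that rule; has_quorum is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: quorum means strictly more than half of the
# expected votes, i.e. votes * 2 > expected.
set -euo pipefail

has_quorum() {
    local votes=$1 expected=$2
    (( votes * 2 > expected ))
}
```

This is why no-quorum-policy=ignore is commonly set on two-node clusters; the trade-off is that, with STONITH also disabled, nothing prevents both nodes from running the resources if they lose contact with each other.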

crm(live)# status 
============
Last updated: Mon Dec 19 00:46:25 2011
Stack: openais
Current DC: node2.wang.com - partition WITHOUT quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ node2.wang.com ]
OFFLINE: [ node1.wang.com ]

 Resource Group: WEBSERVICE
     WEBIP (ocf::heartbeat:IPaddr): Started node2.wang.com
     WEBSERVER (lsb:httpd): Started node2.wang.com

Test with a browser


#start corosync on node 1 again and see whether the resources move back
[root@node2 ~]# ssh node1 'service corosync start'
Starting Corosync Cluster Engine (corosync): [  OK  ]
[root@node2 ~]# crm status
============
Last updated: Mon Dec 19 00:49:19 2011
Stack: openais
Current DC: node2.wang.com - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ node1.wang.com node2.wang.com ]

 Resource Group: WEBSERVICE
     WEBIP (ocf::heartbeat:IPaddr): Started node1.wang.com
     WEBSERVER (lsb:httpd): Started node1.wang.com

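#the resources moved back to node1: with the default resource-stickiness of 0, Pacemaker is free to relocate resources whenever the cluster membership changes. To keep them from moving back and forth, define a resource stickiness value (or a location constraint)

#define the default resource stickiness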
crm(live)# configure 
crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# commit
crm(live)configure# show
node node1.wang.com
node node2.wang.com
primitive WEBIP ocf:heartbeat:IPaddr \
params ip="192.168.1.98"
primitive WEBSERVER lsb:httpd
group WEBSERVICE WEBIP WEBSERVER
property $id="cib-bootstrap-options" \
dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"

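Then stop corosync on node 1 and check the resource status; start corosync on node 1 again and check the status once more. With the stickiness set, the resources should now stay on node 2 after node 1 rejoins.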
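The failover test above can be scripted so it runs the same way every time. A sketch to run on node2; NODE, WAIT, run, failover_test and DRY_RUN are hypothetical names introduced here, and the 30-second wait is an arbitrary assumption, not a corosync setting.

```shell
#!/usr/bin/env bash
# Hypothetical failover test, run from node2. DRY_RUN=1 prints the
# commands instead of executing them.
set -euo pipefail

NODE=${NODE:-node1}
WAIT=${WAIT:-30}

run() {
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        echo "+ $*"
    else
        "$@"
    fi
}

failover_test() {
    run ssh "$NODE" 'service corosync stop'
    run sleep "$WAIT"
    run crm_mon -1          # the group should now be on the surviving node
    run ssh "$NODE" 'service corosync start'
    run sleep "$WAIT"
    run crm_mon -1          # with stickiness, the group should stay put
}
```

Call `failover_test` on node2, or `DRY_RUN=1 failover_test` to preview the steps.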


At this point, the basic high-availability cluster built with corosync and pacemaker is complete.


This article is from the “成功每一天” blog; please keep this attribution: http://1567045.blog.51cto.com/1557045/777966
