Project topology:

[Figure 1: project topology diagram]

Corosync configuration in detail:

1. Configure the IP addresses with the setup tool

[Figures 2 and 3: screenshots of the network configuration in setup]

2. Make sure the node names resolve to each other; each node's hostname must match what uname -n reports

[root@www1 ~]# uname -rn
www1.gjp.com 2.6.18-164.el5

Configuration on www1.gjp.com:

[root@gjp99 ~]# cat /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=yes
HOSTNAME=www1.gjp.com
[root@gjp99 ~]# hostname www1.gjp.com
[root@gjp99 ~]# hostname
www1.gjp.com

Log out and log back in for the prompt to pick up the new hostname.

3. Make sure the system clocks agree

[root@www1 ~]# hwclock -s
[root@www1 ~]# clock
Tue 23 Oct 2012 05:20:36 PM CST  -0.017990 seconds

4. Edit /etc/hosts (used here in place of DNS)

[root@www1 ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1   localhost.localdomain  localhost
::1        localhost6.localdomain6 localhost6
192.168.2.1     www1.gjp.com    www1
192.168.2.2     www2.gjp.com    www2

[root@www1 ~]# ping www2.gjp.com
PING www2.gjp.com (192.168.2.2) 56(84) bytes of data.
64 bytes from www2.gjp.com (192.168.2.2): icmp_seq=1 ttl=64 time=3.45 ms
64 bytes from www2.gjp.com (192.168.2.2): icmp_seq=2 ttl=64 time=0.658 ms

The names now resolve to each other.
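The ping test above can also be scripted. The sketch below is a hypothetical helper (the name hosts_has_entry is not part of the original setup) that checks whether a hosts-format file maps a given IP to a given name, so both nodes can be checked the same way.

```shell
#!/bin/bash
# hosts_has_entry FILE IP NAME
# Succeeds if FILE (in /etc/hosts format) has a non-comment line that
# starts with IP and lists NAME as the canonical name or as an alias.
hosts_has_entry() {
    awk -v ip="$2" -v name="$3" '
        { sub(/#.*/, "") }              # strip comments
        $1 == ip {
            for (i = 2; i <= NF; i++)
                if ($i == name) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}
```

Example use on either node: hosts_has_entry /etc/hosts 192.168.2.2 www2.gjp.com && echo "www2 is listed"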

5. Mount the installation disc and install the packages corosync requires

[root@www1 ~]# mkdir /mnt/cdrom
[root@www1 ~]# mount /dev/cdrom /mnt/cdrom
mount: block device /dev/cdrom is write-protected, mounting read-only

[root@www2 ~]# scp *.rpm www1:/root

The rpm packages uploaded to www2 are copied to root's home directory on www1 (above) and then installed there:
[root@www1 ~]# yum localinstall -y *.rpm --nogpgcheck

6. Edit the corosync configuration file

[root@www1 ~]# cd /etc/corosync/
[root@www1 corosync]# ll
total 20
-rw-r--r-- 1 root root 5384 Jul 28  2010 amf.conf.example
-rw-r--r-- 1 root root  436 Jul 28  2010 corosync.conf.example
drwxr-xr-x 2 root root 4096 Jul 28  2010 service.d
drwxr-xr-x 2 root root 4096 Jul 28  2010 uidgid.d

[root@www1 corosync]# cp corosync.conf.example corosync.conf
[root@www1 corosync]# vim corosync.conf

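# Stay backward compatible with openais 0.80.z ("whitetank"); some newer
# features may be unavailable in this mode.
compatibility: whitetank

# totem: the protocol settings used when the nodes exchange heartbeats.
totem {
        # Configuration format version.
        version: 2
        # Whether to enable authentication and encryption of cluster traffic.
        secauth: off
        # Threads used to encrypt and send messages; 0 means no threaded sending.
        threads: 0
        interface {
                ringnumber: 0
                # Network to bind to for cluster traffic. A host address is
                # accepted, but the network address 192.168.2.0 is used here.
                bindnetaddr: 192.168.2.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

logging {
        fileline: off
        # Whether to log to standard error.
        to_stderr: no
        # Log to a file.
        to_logfile: yes
        # Log to syslog. Logging to both targets costs performance;
        # consider turning one of them off.
        to_syslog: yes
        # The directory must be created by hand.
        logfile: /var/log/cluster/corosync.log
        # Can be switched on when troubleshooting.
        debug: off
        # Whether to record a timestamp on each log entry.
        timestamp: on

        # The following belongs to openais and does not need to be enabled.
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}

# Additions: everything above only covers the messaging layer. Because
# pacemaker is used, add:
service {
        ver: 0
        name: pacemaker
}

# openais itself is not used, but some of its sub-options are needed:
aisexec {
        user: root
        group: root
}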
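The bindnetaddr value is the network address of the interface that carries cluster traffic. As a rough sketch, it can be derived from a node's IP and netmask; the helper name network_address is hypothetical, and the 255.255.255.0 netmask used in the example is an assumption, since the netmask is never shown in this walkthrough.

```shell
#!/bin/bash
# network_address IP NETMASK: print the network address obtained by
# ANDing each octet of IP with the matching octet of NETMASK.
network_address() {
    local IFS=.
    local -a ip=($1) mask=($2)      # split the dotted quads into octets
    echo "$(( ip[0] & mask[0] )).$(( ip[1] & mask[1] )).$(( ip[2] & mask[2] )).$(( ip[3] & mask[3] ))"
}
```

For www1, network_address 192.168.2.1 255.255.255.0 prints 192.168.2.0, which is the value used for bindnetaddr in the configuration file.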

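7. To keep other hosts from joining the cluster, the nodes authenticate with a shared key; generate an authkey (the key is only enforced when secauth is set to on)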

[root@www1 corosync]# corosync-keygen

[root@www1 corosync]# ll
total 28
-rw-r--r-- 1 root root 5384 Jul 28  2010 amf.conf.example
-r-------- 1 root root  128 Oct 24 13:59 authkey
-rw-r--r-- 1 root root  538 Oct 24 13:56 corosync.conf
-rw-r--r-- 1 root root  436 Jul 28  2010 corosync.conf.example
drwxr-xr-x 2 root root 4096 Jul 28  2010 service.d
drwxr-xr-x 2 root root 4096 Jul 28  2010 uidgid.d

[root@www1 corosync]# scp -p authkey corosync.conf www2:/etc/corosync/

8. The log directory must be created beforehand

[root@www1 ~]# mkdir /var/log/cluster

[root@www1 corosync]# ssh www2 'mkdir  /var/log/cluster'

9. Start the corosync service

[root@www1 corosync]# service corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]
[root@www1 corosync]# ssh www2 'service corosync start'
root@www2's password:
Starting Corosync Cluster Engine (corosync):
[  OK  ]

10. Check that corosync is running correctly

Verify that the corosync engine started properly

[root@www1 corosync]# grep -i  -e "corosync cluster engine" -e "configuration file" /var/log/messages
Oct 24 11:09:04 www1 smartd[3260]: Opened configuration file /etc/smartd.conf
Oct 24 11:09:04 www1 smartd[3260]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Oct 24 17:08:33 www1 corosync[26362]:   [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Oct 24 17:08:33 www1 corosync[26362]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

Check that the initial membership notification was sent

[root@www1 corosync]# grep -i totem /var/log/messages
Oct 24 17:08:33 www1 corosync[26362]:   [TOTEM ] Initializing transport (UDP/IP).
Oct 24 17:08:33 www1 corosync[26362]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 24 17:08:33 www1 corosync[26362]:   [TOTEM ] The network interface is down.
Oct 24 17:08:34 www1 corosync[26362]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.

[root@www2 ~]# grep -i totem /var/log/messages
Oct 24 17:09:07 www2 corosync[28610]:   [TOTEM ] Initializing transport (UDP/IP).
Oct 24 17:09:07 www2 corosync[28610]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 24 17:09:07 www2 corosync[28610]:   [TOTEM ] The network interface is down.
Oct 24 17:09:08 www2 corosync[28610]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.

Check whether any errors occurred during startup

[root@www1 corosync]# grep -i error:  /var/log/messages  |grep -v unpack_resources

[root@www2 ~]# grep -i error:  /var/log/messages  |grep -v unpack_resources

No output means there are no errors.
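The manual greps can be bundled into one check. This is only a sketch: the function name corosync_log_check is made up for illustration, and it is slightly stricter than the grep used earlier, because it looks for corosync's own "Successfully read main configuration file" line rather than any "configuration file" line.

```shell
#!/bin/bash
# corosync_log_check LOGFILE
# Mirrors the manual greps: the engine must have started, the main
# configuration file must have been read, and no "error:" lines may be
# present other than the unpack_resources ones (the STONITH complaints).
corosync_log_check() {
    local log=$1
    grep -qi "corosync cluster engine" "$log" ||
        { echo "engine start not logged"; return 1; }
    grep -qi "successfully read main configuration file" "$log" ||
        { echo "configuration file not read"; return 1; }
    if grep -i "error:" "$log" | grep -qv unpack_resources; then
        echo "errors found"
        return 1
    fi
    echo "ok"
}
```

Run it on each node as corosync_log_check /var/log/messages.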

Check whether pacemaker has started

[root@www1 corosync]# grep -i pcmk_startup /var/log/messages
Oct 24 17:08:34 www1 corosync[26362]:   [pcmk  ] info: pcmk_startup: CRM: Initialized
Oct 24 17:08:34 www1 corosync[26362]:   [pcmk  ] Logging: Initialized pcmk_startup
Oct 24 17:08:34 www1 corosync[26362]:   [pcmk  ] info: pcmk_startup: Maximum core file size is: 4294967295
Oct 24 17:08:34 www1 corosync[26362]:   [pcmk  ] info: pcmk_startup: Service: 9
Oct 24 17:08:34 www1 corosync[26362]:   [pcmk  ] info: pcmk_startup: Local hostname: www1.gjp.com

[root@www2 ~]# grep -i pcmk_startup /var/log/messages

From the current cluster node, start corosync on the other node

[root@www1 ~]# /etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]
[root@www1 ~]# ssh www2  '/etc/init.d/corosync start'
root@www2's password:
Starting Corosync Cluster Engine (corosync): [  OK  ]

[root@www2 corosync]# crm status
============
Last updated: Wed Oct 24 20:11:19 2012
Stack: openais
Current DC: www1.gjp.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ www1.gjp.com www2.gjp.com ]

Note: the clocks on the cluster nodes should be kept in sync.
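For scripted checks, the list of online nodes can be pulled out of the crm status output. A minimal sketch, assuming the "Online: [ ... ]" line format shown above; the function name online_nodes is hypothetical.

```shell
#!/bin/bash
# online_nodes: read `crm status` output on stdin and print the node
# names listed on the "Online: [ ... ]" line, one per line.
online_nodes() {
    # Fields: $1="Online:", $2="[", node names, last field="]"
    awk '/^Online: \[/ { for (i = 3; i < NF; i++) print $i }'
}
```

Typical use: crm status | online_nodes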

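Providing highly available services
With corosync, services can be defined through two interfaces:

1. A graphical interface (hb_gui)
2. crm (an interactive shell provided by pacemaker)

[Figure 4: screenshot of a command used to view the CIB contents]

To check the CIB for syntax errors: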

[root@www1 corosync]# crm_verify  -L
crm_verify[4329]: 2012/10/25_14:59:35 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[4329]: 2012/10/25_14:59:35 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[4329]: 2012/10/25_14:59:35 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
  -V may provide more details

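The errors are about STONITH: in a high-availability environment, no resources are allowed to start until STONITH is configured.
STONITH can be disabled: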

[root@www1 corosync]# crm
crm(live)# configure
crm(live)configure#  property stonith-enabled=false
crm(live)configure# commit
crm(live)configure# show
node www1.gjp.com
node www2.gjp.com
property $id="cib-bootstrap-options" \
    dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false"

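Check again:

[root@www1 corosync]# crm_verify  -L

    No errors are reported any more.
    The system also provides a dedicated stonith command:

stonith   -L   lists the available STONITH device types
crm can be used in interactive mode;
running help inside it lists the available commands.
The configuration is stored in the CIB, in XML format.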

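11. Configuring resources

There are four types of cluster resources:
primitive   a basic local resource (runs on only one node at a time)
group     collects several resources into one group for easier management
clone    a resource that must be active on several nodes at once (e.g. ocfs2, stonith; no primary/secondary distinction)
master    a resource with primary and secondary roles, e.g. drbd

Resources used in this setup:

IP address, HTTP service, shared storage
They are configured through resource agents
of the ocf and lsb classes.
The list command shows the available agents.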

[root@www1 corosync]# crm
crm(live)# help

This is the CRM command line interface program.

Available commands:

    cib              manage shadow CIBs
    resource         resources management
    configure        CRM cluster configuration
    node             nodes management
    options          user preferences
    ra               resource agents information center
    status           show cluster status
    quit,bye,exit    exit the program
    help             show help
    end,cd,up        go back one level

crm(live)# ra
[Figure 5: screenshot of the ra level output]

(lsb agents are the scripts under /etc/init.d)

crm(live)ra# list ocf heartbeat

Use info or meta to show the details of a resource agent

  meta ocf:heartbeat:IPaddr  (the components are separated by colons)

crm(live)ra# meta ocf:heartbeat:IPaddr 

[Figure 6: screenshot of the meta output for ocf:heartbeat:IPaddr]

Resources are defined at the configure level.

1. Start with the resource name

[Figure 7: screenshot of the webip primitive definition]

crm(live)configure# commit
crm(live)configure# end
crm(live)# status
============
Last updated: Thu Oct 25 15:18:54 2012
Stack: openais
Current DC: www1.gjp.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ www1.gjp.com www2.gjp.com ]

webip    (ocf::heartbeat:IPaddr):    Started www1.gjp.com

The resource has been started on node 1 (www1.gjp.com).

[root@www1 corosync]# ifconfig |less

[Figure 8: screenshot of the ifconfig output]

[root@www1 corosync]# mount /dev/cdrom /mnt/cdrom
mount: block device /dev/cdrom is write-protected, mounting read-only
[root@www1 corosync]# yum install httpd -y

[root@www1 corosync]# service httpd status
httpd is stopped
[root@www1 corosync]# chkconfig --list |grep httpd
httpd              0:off    1:off    2:off    3:off    4:off    5:off    6:off
[root@www1 corosync]# crm
crm(live)# ra
crm(live)ra# classes
heartbeat
lsb
ocf / heartbeat pacemaker
stonith

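Define the web service resource
  httpd must be installed on both nodes
  Once it is installed, the httpd lsb script shows up in the agent list

[root@www1 corosync]# crm ra list lsb

or, interactively:

[root@www1 corosync]# crm
crm(live)# ra
crm(live)ra# list lsb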

[Figure 9: screenshot of the list lsb output]

crm(live)ra# end
crm(live)# configure
crm(live)configure# primitive webserver lsb:httpd

The httpd resource is now defined:
crm(live)configure# show
node www1.gjp.com
node www2.gjp.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.2.66"
primitive webserver lsb:httpd
property $id="cib-bootstrap-options" \
    dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
crm(live)configure# commit
crm(live)configure# end
crm(live)# status
============
Last updated: Thu Oct 25 16:06:46 2012
Stack: openais
Current DC: www1.gjp.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ www1.gjp.com www2.gjp.com ]

webip    (ocf::heartbeat:IPaddr):    Started www1.gjp.com
webserver    (lsb:httpd):    Started www1.gjp.com

Failed actions:
    webserver_monitor_0 (node=www2.gjp.com, call=3, rc=5, status=complete): not installed

If httpd were already installed on www2.gjp.com, the IP address could end up on www1 while the service runs on www2.

[root@www1 ~]# service httpd status
httpd (pid  4897) is running...

[root@www1 ~]# echo "www1.gjp.com">/var/www/html/index.html
[root@www1 ~]# crm
crm(live)# configure
crm(live)configure# help group

The `group` command creates a group of resources.

Usage:
...............
        group <name> <rsc> [<rsc>...]
          [meta attr_list]
          [params attr_list]

        attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>
...............
Example:
...............
        group internal_www disk0 fs0 internal_ip apache \
          meta target_role=stopped
...............

crm(live)configure# group web webip webserver
crm(live)configure# commit
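The interactive session so far can also be expressed as one-shot crm commands. The sketch below is a dry run by default: it only prints the commands and executes them when APPLY=1 is set. The helper names run and configure_web_cluster are hypothetical; the resource names and the IP address are the ones used above.

```shell
#!/bin/bash
# run: print the command; execute it only when APPLY=1 is set, so the
# plan can be reviewed before anything touches the cluster.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = 1 ]; then "$@"; fi
}

# configure_web_cluster VIP: one-shot equivalents of the interactive
# crm session (STONITH off, virtual IP, httpd, then the group).
configure_web_cluster() {
    local vip=$1
    run crm configure property stonith-enabled=false
    run crm configure primitive webip ocf:heartbeat:IPaddr params ip="$vip"
    run crm configure primitive webserver lsb:httpd
    run crm configure group web webip webserver
}
```

configure_web_cluster 192.168.2.66 prints the four commands; APPLY=1 configure_web_cluster 192.168.2.66 would run them. Grouping webip and webserver is what keeps the IP address and httpd on the same node.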
[Figure 10: screenshot]

Client test:

[Figure 11: screenshot of the site opened from a client browser]

[root@www1 ~]# crm status
============
Last updated: Thu Oct 25 16:34:28 2012
Stack: openais
Current DC: www1.gjp.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ www1.gjp.com www2.gjp.com ]

Resource Group: web
     webip    (ocf::heartbeat:IPaddr):    Started www1.gjp.com
     webserver    (lsb:httpd):    Started www1.gjp.com

Failed actions:
    webserver_monitor_0 (node=www2.gjp.com, call=3, rc=5, status=complete): not installed

Simulate a failure of www1:
[root@www1 ~]# service corosync stop
Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Waiting for corosync services to unload:.......            [  OK  ]

[root@www1 ~]# service httpd status
httpd is stopped

[root@www2 Server]# crm status
============
Last updated: Thu Oct 25 16:43:01 2012
Stack: openais
Current DC: www2.gjp.com - partition WITHOUT quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ www2.gjp.com ]
OFFLINE: [ www1.gjp.com ]

Failed actions:
    webserver_monitor_0 (node=www2.gjp.com, call=3, rc=5, status=complete): not installed

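This error keeps showing up.

As the message indicates, httpd has to be installed on www2. After installing it, corosync must be restarted; otherwise the change is not detected and the error remains.

[root@www2 Server]# service corosync stop
Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Waiting for corosync services to unload:.......            [  OK  ]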
[root@www2 Server]# service corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]
[root@www2 Server]# crm status
============
Last updated: Thu Oct 25 16:47:18 2012
Stack: openais
Current DC: www1.gjp.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ www1.gjp.com www2.gjp.com ]

Resource Group: web
     webip    (ocf::heartbeat:IPaddr):    Started www1.gjp.com
     webserver    (lsb:httpd):    Started www1.gjp.com

Fixing the problem of www2 not taking over the service:

[root@www1 ~]# service corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]
[root@www1 ~]# service httpd status
httpd (pid  5233) is running...

[Figure 12: screenshot]
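The failover tests report "partition WITHOUT quorum" on the surviving node: with two nodes and two expected votes, a single node cannot hold a majority. A small helper (the name partition_has_quorum is hypothetical) makes that state easy to test in scripts.

```shell
#!/bin/bash
# partition_has_quorum: read `crm status` output on stdin; succeed only
# if the "Current DC" line reports "partition with quorum".
# The match is case-sensitive, so "partition WITHOUT quorum" fails.
partition_has_quorum() {
    grep -q '^Current DC: .* - partition with quorum'
}
```

Use it as crm status | partition_has_quorum || echo "no quorum". What a partition without quorum does with its resources is governed by pacemaker's no-quorum-policy property, and two-node clusters are often run with crm configure property no-quorum-policy=ignore. Whether that was set in this walkthrough is not visible in the transcript, so treat it as an assumption rather than a step the original performed.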

Create the site content on www2:

[root@www2 Server]# echo "www2.gjp.com " >/var/www/html/index.html

Once www1 goes down:

[root@www1 ~]# service corosync stop
Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Waiting for corosync services to unload:.......            [  OK  ]

[Figure 13: screenshot of the site opened from the client]

The site is still reachable:

[root@www2 Server]# service httpd status
httpd (pid  4656) is running...

[root@www2 Server]# crm status
============
Last updated: Thu Oct 25 17:12:16 2012
Stack: openais
Current DC: www2.gjp.com - partition WITHOUT quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ www2.gjp.com ]
OFFLINE: [ www1.gjp.com ]

Resource Group: web
     webip    (ocf::heartbeat:IPaddr):    Started www2.gjp.com
     webserver    (lsb:httpd):    Started www2.gjp.com

When www1 recovers, it does not take the resources back, as the following shows:

[root@www1 ~]# service corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]
[root@www1 ~]# crm status
============
Last updated: Thu Oct 25 17:20:44 2012
Stack: openais
Current DC: www2.gjp.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ www1.gjp.com www2.gjp.com ]

Resource Group: web
     webip    (ocf::heartbeat:IPaddr):    Started www2.gjp.com
     webserver    (lsb:httpd):    Started www2.gjp.com

[Figure 14: screenshot of the client browser]

No matter how often the page is refreshed, it is served by www2,

unless the corosync service on www2 goes down:

[root@www2 Server]# service corosync stop
Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Waiting for corosync services to unload:.......            [  OK  ]

[Figures 15 and 16: screenshots]

Configuration on www2.gjp.com:

[root@gjp99 ~]# cat /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=yes
HOSTNAME=www2.gjp.com
[root@gjp99 ~]# hostname www2.gjp.com

[root@www2 ~]# hwclock -s
[root@www2 ~]# clock
Tue 23 Oct 2012 05:20:32 PM CST  -0.018132 seconds

[root@www2 .ssh]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1   localhost.localdomain  localhost
::1        localhost6.localdomain6 localhost6
192.168.2.1     www1.gjp.com      www1
192.168.2.2     www2.gjp.com      www2

[root@www2 ~]# ping www1.gjp.com
PING www1.gjp.com (192.168.2.1) 56(84) bytes of data.
64 bytes from www1.gjp.com (192.168.2.1): icmp_seq=1 ttl=64 time=1.11 ms
64 bytes from www1.gjp.com (192.168.2.1): icmp_seq=2 ttl=64 time=0.506 ms

The names now resolve to each other.

[root@www2 ~]# cat /etc/yum.repos.d/rhel-debuginfo.repo
[rhel-server]
name=Red Hat Enterprise Linux Server
baseurl=file:///mnt/cdrom/Server
enabled=1
gpgcheck=1
gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release

[rhel-cluster]
name=Red Hat Enterprise Linux Cluster
baseurl=file:///mnt/cdrom/Cluster
enabled=1
gpgcheck=1
gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release

Make sure the optical drive is connected:

[Figure 17: screenshot of the optical drive settings]

[root@www2 ~]# mkdir /mnt/cdrom
[root@www2 ~]# mount /dev/cdrom /mnt/cdrom

[root@www2 ~]# yum grouplist all
Loaded plugins: rhnplugin, security
This system is not registered with RHN.
RHN support will be disabled.
Setting up Group Process
rhel-cluster                                                                | 1.3 kB     00:00    
rhel-cluster/primary                                                        | 6.5 kB     00:00    
rhel-server                                                                 | 1.3 kB     00:00    
rhel-server/primary                                                         | 732 kB     00:00    
rhel-cluster/group                                                          | 101 kB     00:00    
rhel-server/group                                                           | 1.0 MB     00:00    
Done

Set up passwordless SSH between the two nodes (both are on the same subnet):

[root@www2 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)?
[root@www2 ~]# cd .ssh/
[root@www2 .ssh]# ls
id_rsa  id_rsa.pub
[root@www2 .ssh]# ssh-copy-id -i id_rsa.pub www1
10
The authenticity of host 'www1 (192.168.2.1)' can't be established.
RSA key fingerprint is 87:be:8b:a4:bd:11:11:10:c2:ec:2d:ef:02:68:f6:0e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'www1,192.168.2.1' (RSA) to the list of known hosts.
root@www1's password:
Now try logging into the machine, with "ssh 'www1'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@www2 .ssh]# scp /etc/yum.repos.d/rhel-debuginfo.repo  www1:/etc/yum.repos.d/
rhel-debuginfo.repo                                              100%  318     0.3KB/s   00:00   
[root@www2 .ssh]# date
Wed Oct 24 11:30:30 CST 2012
[root@www2 .ssh]# ssh www1 'date'
Wed Oct 24 11:30:40 CST 2012
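The two date outputs above can be compared automatically. A minimal sketch, assuming epoch seconds from date +%s on each node; the helper names clock_skew and skew_ok and the 2-second threshold in the example are arbitrary choices, not part of the original setup.

```shell
#!/bin/bash
# clock_skew A B: print the absolute difference between two epoch
# timestamps in seconds, e.g. the output of `date +%s` on each node.
clock_skew() {
    local d=$(( $1 - $2 ))
    echo $(( d < 0 ? -d : d ))
}

# skew_ok A B MAX: succeed if the timestamps differ by at most MAX seconds.
skew_ok() {
    [ "$(clock_skew "$1" "$2")" -le "$3" ]
}
```

With key-based SSH in place: skew_ok "$(date +%s)" "$(ssh www1 date +%s)" 2 || echo "clocks differ". The SSH round trip adds a little to the measured difference, so very small thresholds will give false alarms.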

[root@www1 ~]# ssh-keygen -t rsa

[root@www1 .ssh]# ssh-copy-id -i id_rsa.pub www2

 

Upload the required packages:

[Figure 18: screenshot of the package upload]

[root@www2 ~]# mount /dev/cdrom /mnt/cdrom
mount: block device /dev/cdrom is write-protected, mounting read-only
[root@www2 ~]# yum localinstall -y *.rpm --nogpgcheck

 

Verify that the corosync engine started properly

[root@www2 ~]#  grep -i  -e "corosync cluster engine" -e "configuration file" /var/log/messages
Oct 24 11:09:03 www2 smartd[3259]: Opened configuration file /etc/smartd.conf
Oct 24 11:09:03 www2 smartd[3259]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Oct 24 17:09:07 www2 corosync[28610]:   [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Oct 24 17:09:07 www2 corosync[28610]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

[root@www2 Server]# yum install httpd -y

 

DRBD configuration:

Configuration on www1:

[root@www1 ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
e
Selected partition 4
First cylinder (1354-2610, default 1354):
Using default value 1354
Last cylinder or +size or +sizeM or +sizeK (1354-2610, default 2610):
Using default value 2610

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1288    10241437+  83  Linux
/dev/sda3            1289        1353      522112+  82  Linux swap / Solaris
/dev/sda4            1354        2610    10096852+   5  Extended

Command (m for help): n
First cylinder (1354-2610, default 1354): p
First cylinder (1354-2610, default 1354):
Using default value 1354
Last cylinder or +size or +sizeM or +sizeK (1354-2610, default 2610): +2g

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1288    10241437+  83  Linux
/dev/sda3            1289        1353      522112+  82  Linux swap / Solaris
/dev/sda4            1354        2610    10096852+   5  Extended
/dev/sda5            1354        1597     1959898+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

[root@www1 ~]# partprobe /dev/sda
[root@www1 ~]# cat /proc/partitions
major minor  #blocks  name

   8     0   20971520 sda
   8     1     104391 sda1
   8     2   10241437 sda2
   8     3     522112 sda3
   8     4          0 sda4
   8     5    1959898 sda5

Do the same on node 2.

Install drbd, which provides the replicated storage.

Choose the package versions that match your system. The ones used here are:

drbd83-8.3.8-1.el5.centos.i386.rpm

kmod-drbd83-8.3.8-1.el5.centos.i686.rpm

[root@www1 ~]# yum localinstall -y drbd83-8.3.8-1.el5.centos.i386.rpm --nogpgcheck

[root@www1 ~]# yum localinstall -y kmod-drbd83-8.3.8-1.el5.centos.i686.rpm --nogpgcheck

Do the same on node 2.

[root@www1 ~]# cp /usr/share/doc/drbd83-8.3.8/drbd.conf  /etc
cp: overwrite `/etc/drbd.conf'? y      (the existing file must be overwritten)
[root@www1 ~]# scp /etc/drbd.conf  www2:/etc/

[root@www1 ~]# vim /etc/drbd.d/global_common.conf
[root@www1 ~]# cat /etc/drbd.d/global_common.conf

global {
        usage-count yes;
        # minor-count dialog-refresh disable-ip-verification
}

common {
        protocol C;

        startup {
                wfc-timeout  120;
                degr-wfc-timeout 120;
         }
        disk {
                  on-io-error detach;
                  fencing resource-only;

          }
        net {
                cram-hmac-alg "sha1";
                shared-secret  "mydrbdlab";
         }
        syncer {
                  rate  100M;
         }

}

[root@www1 ~]# vim /etc/drbd.d/web.res
[root@www1 ~]# cat /etc/drbd.d/web.res
resource  web {
        on www1.gjp.com {
        device   /dev/drbd0;
        disk    /dev/sda5;
        address  192.168.2.1:7789;
        meta-disk       internal;
        }  

        on www2.gjp.com {
        device   /dev/drbd0;
        disk    /dev/sda5;
        address  192.168.2.2:7789;
        meta-disk       internal;
        }  
}
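DRBD matches the host names in the "on" sections against each node's uname -n, so a mismatch there is worth ruling out before initializing. The sketch below uses hypothetical helper names (drbd_res_hosts, host_in_res) and a simple awk match that assumes the "on <host> {" layout shown above.

```shell
#!/bin/bash
# drbd_res_hosts FILE: print the host name of every "on <host> {" section.
drbd_res_hosts() {
    awk '$1 == "on" { print $2 }' "$1"
}

# host_in_res FILE HOST: succeed if HOST has an "on" section in FILE
# (exact, whole-line match on the host name).
host_in_res() {
    drbd_res_hosts "$1" | grep -qxF "$2"
}
```

On each node: host_in_res /etc/drbd.d/web.res "$(uname -n)" || echo "this host is not listed in web.res"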

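Begin initialization

Run the following on both nodes:

drbdadm   create-md web

Then start the service on both nodes:

service drbd start

and check the status (cat /proc/drbd, shown further below). On www1:

[root@www1 ~]# drbdadm create-md web

[root@www1 ~]# service drbd start

The service must be started on both nodes at about the same time; otherwise the first node waits for its peer.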

 

Configuration on www2:

[root@www2 Server]# fdisk /dev/sda

The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1288    10241437+  83  Linux
/dev/sda3            1289        1353      522112+  82  Linux swap / Solaris

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
e
Selected partition 4
First cylinder (1354-2610, default 1354):
Using default value 1354
Last cylinder or +size or +sizeM or +sizeK (1354-2610, default 2610):
Using default value 2610

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1288    10241437+  83  Linux
/dev/sda3            1289        1353      522112+  82  Linux swap / Solaris
/dev/sda4            1354        2610    10096852+   5  Extended

Command (m for help): n
First cylinder (1354-2610, default 1354):
Using default value 1354
Last cylinder or +size or +sizeM or +sizeK (1354-2610, default 2610): +2g

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1288    10241437+  83  Linux
/dev/sda3            1289        1353      522112+  82  Linux swap / Solaris
/dev/sda4            1354        2610    10096852+   5  Extended
/dev/sda5            1354        1597     1959898+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

[root@www2 Server]# partprobe /dev/sda
[root@www2 Server]# cat /proc/partitions
major minor  #blocks  name

   8     0   20971520 sda
   8     1     104391 sda1
   8     2   10241437 sda2
   8     3     522112 sda3
   8     4          0 sda4
   8     5    1959898 sda5

[root@www1 ~]# scp drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm www2:/root
root@www2's password:
drbd83-8.3.8-1.el5.centos.i386.rpm              100%  217KB 216.7KB/s   00:00   
kmod-drbd83-8.3.8-1.el5.centos.i686.rpm         100%  123KB 123.0KB/s   00:00 

[root@www2 ~]# rpm -ivh drbd83-8.3.8-1.el5.centos.i386.rpm
warning: drbd83-8.3.8-1.el5.centos.i386.rpm: Header V3 DSA signature: NOKEY, key ID e8562897
Preparing...                ########################################### [100%]
   1:drbd83                 warning: /etc/drbd.conf created as /etc/drbd.conf.rpmnew
########################################### [100%]
[root@www2 ~]# rpm -ivh kmod-drbd83-8.3.8-1.el5.centos.i686.rpm
warning: kmod-drbd83-8.3.8-1.el5.centos.i686.rpm: Header V3 DSA signature: NOKEY, key ID e8562897
Preparing...                ########################################### [100%]
   1:kmod-drbd83            ########################################### [100%]
[root@www2 ~]# cp /usr/share/doc/drbd83-8.3.8/drbd.conf   /etc/
cp: overwrite `/etc/drbd.conf'? y

[root@www2 ~]# scp www1:/etc/drbd.d/global_common.conf  /etc/drbd.d/global_common.conf
global_common.conf                              100%  505     0.5KB/s   00:00

[root@www2 ~]# scp www1:/etc/drbd.d/web.res  /etc/drbd.d/web.res
web.res                                         100%  348     0.3KB/s   00:00

[root@www2 ~]# drbdadm   create-md web

[root@www2 ~]# service drbd start
Starting DRBD resources: [
web
Found valid meta data in the expected location, 2006929408 bytes into /dev/sda5.
d(web) s(web) n(web) ].

[root@www1 ~]# service drbd start
Starting DRBD resources: [ ].
[root@www1 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:195980

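Both nodes are in the Secondary role and the data has not been synchronized yet.

The status can also be checked with:

drbd-overview

Promote www1 to primary, overwriting the peer's data, which starts the initial synchronization:

[root@www1 ~]# drbdadm   -- --overwrite-data-of-peer primary web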

[root@www1 ~]# vim /etc/drbd.d/global_common.conf

The synchronization speed can be adjusted through the rate setting in the syncer section.

[root@www1 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:259716 nr:0 dw:0 dr:267904 al:0 bm:15 lo:1 pe:31 ua:256 ap:0 ep:1 wo:b oos:1701048
    [=>..................] sync'ed: 13.4% (1701048/1959800)K delay_probe: 25
    finish: 0:00:37 speed: 45,120 (23,520) K/sec
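As a rough cross-check of the numbers above, dividing the out-of-sync amount (oos, in KB) by the current speed (in KB/s) gives the remaining time. This is only a simple estimate with hypothetical helper names; DRBD's own calculation may differ, although here the result agrees with the finish value it printed.

```shell
#!/bin/bash
# sync_eta_seconds OOS_KB SPEED_KB_PER_S: whole seconds left, rounded down.
sync_eta_seconds() {
    echo $(( $1 / $2 ))
}

# format_hms SECONDS: print the duration as H:MM:SS.
format_hms() {
    printf '%d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}
```

With the values shown, format_hms "$(sync_eta_seconds 1701048 45120)" prints 0:00:37.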

[root@www1 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:1959800 nr:0 dw:0 dr:1959800 al:0 bm:120 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

[root@www1 ~]# drbd-overview
  0:web  Connected Primary/Secondary UpToDate/UpToDate C r----
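To wait for the initial sync in a script rather than re-running cat by hand, the cs, ro and ds tokens can be parsed out of /proc/drbd. A minimal sketch with hypothetical function names, assuming the 8.3-style line format shown above and a single device.

```shell
#!/bin/bash
# drbd_field NAME: read /proc/drbd-style text on stdin and print the
# value of the NAME: token (cs, ro or ds) from the first device line.
drbd_field() {
    awk -v key="$1" '
        /^ *[0-9]+: cs:/ {
            for (i = 1; i <= NF; i++)
                if (index($i, key ":") == 1) {
                    print substr($i, length(key) + 2)
                    exit
                }
        }
    '
}

# drbd_in_sync: succeed when the device is Connected and both sides
# report UpToDate.
drbd_in_sync() {
    local text
    text=$(cat)
    [ "$(printf '%s\n' "$text" | drbd_field cs)" = Connected ] &&
    [ "$(printf '%s\n' "$text" | drbd_field ds)" = UpToDate/UpToDate ]
}
```

For example: until drbd_in_sync < /proc/drbd; do sleep 5; done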

[root@www2 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1959800
[root@www2 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
    ns:0 nr:1959800 dw:1959800 dr:0 al:0 bm:120 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:

Create the filesystem (on the primary node only)

mkfs -t ext3  -L drbdweb  /dev/drbd0

[root@www1 ~]# mkfs -t ext3  -L drbdweb  /dev/drbd0
mke2fs 1.39 (29-May-2006)
Filesystem label=drbdweb
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
245280 inodes, 489950 blocks
24497 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=503316480
15 block groups
32768 blocks per group, 32768 fragments per group
16352 inodes per group
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912

Writing inode tables: done                           
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 37 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

[root@www1 ~]# mkdir /web
[root@www1 ~]# mount /dev/drbd0 /web/
[root@www1 ~]# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/hdc on /mnt/cdrom type iso9660 (ro)
/dev/drbd0 on /web type ext3 (rw)

[root@www1 ~]# cd /web
[root@www1 web]# echo "web1 " >index.html
[root@www1 web]# ll
total 20
-rw-r--r-- 1 root root     6 Oct 25 21:11 index.html
drwx------ 2 root root 16384 Oct 25 20:57 lost+found

[root@www2 ~]# mkdir /web2
[root@www2 ~]# mount /dev/drbd0 /web2
mount: block device /dev/drbd0 is write-protected, mounting read-only
mount: Wrong medium type

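The secondary node has no access to the device: while a DRBD resource is in the Secondary role it cannot be mounted, not even read-only.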

[root@www1 ~]# umount /web
[root@www1 ~]# drbdadm secondary web
[root@www1 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r----
    ns:2024140 nr:0 dw:64340 dr:1959937 al:24 bm:135 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

[root@www2 ~]# drbdadm primary web
[root@www2 ~]# mount /dev/drbd0 /web2
[root@www2 ~]# ll /web2
total 20
-rw-r--r-- 1 root root     6 Oct 25 21:11 index.html
drwx------ 2 root root 16384 Oct 25 20:57 lost+found

[root@www2 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:40 nr:2024140 dw:2024180 dr:221 al:1 bm:120 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
[root@www2 ~]# cd /web2
[root@www2 web2]# touch gjp.txt
[root@www2 web2]# ll
total 20
-rw-r--r-- 1 root root     0 Oct 25 21:16 gjp.txt
-rw-r--r-- 1 root root     6 Oct 25 21:11 index.html
drwx------ 2 root root 16384 Oct 25 20:57 lost+found

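Note: to switch back so that www1 is primary and www2 is secondary, you must first unmount the mount point on www2, and only then change the roles.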
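
The manual switchover order used above (unmount, demote on the old primary, then promote and mount on the new one) can be sketched as two small helpers. The function names and the DRY_RUN switch are inventions for this example; with DRY_RUN=1, which is the default here, the commands are only printed.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default) a command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run on the node that currently holds the Primary role.
# $1 = DRBD resource name, $2 = mount point
release_primary() {
    run umount "$2" && run drbdadm secondary "$1"
}

# Run on the node that should become Primary, after the peer was demoted.
# $1 = DRBD resource name, $2 = device, $3 = mount point
take_primary() {
    run drbdadm primary "$1" && run mount "$2" "$3"
}
```

To move the resource back, one would call `release_primary web /web2` on www2 first and then `take_primary web /dev/drbd0 /web` on www1, setting DRY_RUN=0 once the printed commands look right.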

[root@www1 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:2024140 nr:96 dw:64436 dr:1959937 al:24 bm:135 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

[root@www1 ~]# cd /var/www/html
[root@www1 html]# ll
total 4
-rw-r--r-- 1 root root  0 Oct 25 16:14 gjp1
-rw-r--r-- 1 root root 13 Oct 25 16:23 index.html
[root@www1 html]# mv index.html /web/

mv: overwrite `/web/index.html'? y

Overwriting is required: the earlier index.html was just a throwaway test file, not the real site.

[root@www1 html]# cd /web/
[root@www1 web]# ll
total 20
-rw-r--r-- 1 root root     0 Oct 25 21:16 gjp.txt
-rw-r--r-- 1 root root    13 Oct 25 16:23 index.html
drwx------ 2 root root 16384 Oct 25 20:57 lost+found

[root@www1 web]# vim /etc/httpd/conf/httpd.conf

[image]

Change it to:

[image]

On www2, change it to:

[image]

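Next, have corosync manage drbd automatically

Once the cluster mounts the drbd device automatically, nothing else has to be done by hand, because the website content already lives under the mount point /web.

The required changes are as follows:

Both corosync nodes share the same cluster configuration, so the mount point must be identical on both machines.

That is, create the mount point /web on www2 and change the default site directory in its httpd.conf to /web.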
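
The httpd.conf screenshots did not survive, so the following is only a sketch of how the site directory could be switched from the command line. It assumes the stock layout with a `DocumentRoot "/var/www/html"` line and a matching `<Directory "/var/www/html">` block; set_docroot is a made-up helper name.

```shell
#!/bin/sh
# set_docroot FILE NEW_ROOT [OLD_ROOT]
# Rewrites the DocumentRoot directive and the matching <Directory> header.
# OLD_ROOT defaults to /var/www/html (an assumption about the stock config).
set_docroot() {
    conf=$1
    new=$2
    old=${3:-/var/www/html}
    # sed keeps a copy of the original file with a .bak suffix.
    sed -i.bak \
        -e "s|^DocumentRoot .*|DocumentRoot \"$new\"|" \
        -e "s|^<Directory \"$old\">|<Directory \"$new\">|" \
        "$conf"
}
```

On www2 this would be used as `mkdir -p /web` followed by `set_docroot /etc/httpd/conf/httpd.conf /web`.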

How is drbd tied into corosync?

Add drbd as a resource of the cluster that corosync runs.

Add it with the following commands:

crm configure primitive drbd_web_FS ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/web" fstype="ext3"

crm configure primitive httpd_drbd_web ocf:heartbeat:drbd params drbd_resource="web" op monitor interval="60s" role="Master" timeout="40s" op monitor interval="70s" role="Slave" timeout="40s"

crm configure master MS_Webdrbd httpd_drbd_web meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"

crm configure colocation drbd_web_FS_on_MS_Webdrbd inf: drbd_web_FS MS_Webdrbd:Master

crm configure order drbd_web_FS_after_MS_Webdrbd inf: MS_Webdrbd:promote drbd_web_FS:start

crm configure property no-quorum-policy="ignore"
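
In the commands above, the colocation constraint keeps drbd_web_FS on whichever node holds the Master role of MS_Webdrbd, and the order constraint makes the cluster promote drbd before it starts the file system. A small sanity check that both constraints really ended up in the configuration might look like this; cib_has is a hypothetical helper that reads the output of `crm configure show` on standard input.

```shell
#!/bin/sh
# cib_has KIND ID: succeed if the configuration text on stdin contains
# an object of the given kind (colocation, order, primitive, ...) and id.
cib_has() {
    awk -v k="$1" -v id="$2" '
        $1 == k && $2 == id { found = 1 }
        END { exit !found }'
}
```

Typical use would be `crm configure show | cib_has colocation drbd_web_FS_on_MS_Webdrbd && echo ok`, and likewise for the order constraint.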

View the configuration:

[Figure 19]

[root@www1 ~]# cd /etc/drbd.d/
[root@www1 drbd.d]# vim global_common.conf

global {
        usage-count no;   # note: set this to no
        # minor-count dialog-refresh disable-ip-verification
}

common {
        protocol C;

handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f"; 
                fence-peer "/usr/lib/drbd/crm-fence-peer.sh"; 
                split-brain "/usr/lib/drbd/notify-split-brain.sh root"; 
                out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root"; 
                before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k"; 
                after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
}
        startup {
                wfc-timeout  120;
                degr-wfc-timeout 120;
         }  
        disk {
                  on-io-error detach;

                 fencing resource-only;

          }
        net {
                cram-hmac-alg "sha1";
                shared-secret  "mydrbdlab";
         }
        syncer {
                  rate  100M;
         }

}

 

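Because resource fencing and split-brain handling are enabled in /etc/drbd.d/global_common.conf, a location constraint appears automatically in the cib, the crm configuration. When the primary node goes down, this constraint forbids the secondary node from being promoted to primary, which avoids split-brain and resource contention when the old primary comes back. Here, however, the only goal is to verify that the resources can move between nodes, so we delete this location constraint: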

[root@www1 drbd.d]# crm configure edit

[image]

These two lines must be deleted on both nodes!

node www1.gjp.com \
        attributes standby="on"  
node www2.gjp.com \
        attributes standby="off"
primitive drbd_web_FS ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/web" fstype="ext3"
primitive httpd_drbd_web ocf:heartbeat:drbd \
        params drbd_resource="web" \
        op monitor interval="60s" role="Master" timeout="40s" \
        op monitor interval="70s" role="Slave" timeout="40s"
primitive webip ocf:heartbeat:IPaddr \
        params ip="192.168.2.66"
primitive webserver lsb:httpd
group web webip webserver
ms MS_Webdrbd httpd_drbd_web \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation drbd_web_FS_on_MS_Webdrbd inf: drbd_web_FS MS_Webdrbd:Master
order drbd_web_FS_after_MS_Webdrbd inf: MS_Webdrbd:promote drbd_web_FS:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
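
Instead of deleting the fencing constraint by hand in the editor, it can also be located from the shell. The sketch below assumes that crm-fence-peer.sh names its constraint with the prefix drbd-fence-by-handler; that prefix is an assumption, so check the id in your own cib first. fence_constraints is a made-up helper name.

```shell
#!/bin/sh
# fence_constraints [PREFIX]: read `crm configure show` output on stdin and
# print the id of every location constraint whose id starts with PREFIX.
# The default prefix is an assumption about crm-fence-peer.sh.
fence_constraints() {
    prefix=${1:-drbd-fence-by-handler}
    awk -v p="$prefix" '$1 == "location" && index($2, p) == 1 { print $2 }'
}
```

Each id printed by `crm configure show | fence_constraints` could then be removed with `crm configure delete <id>`.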

[root@www1 drbd.d]# crm status
============
Last updated: Sun Oct 28 15:55:51 2012
Stack: openais
Current DC: www1.gjp.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ www1.gjp.com www2.gjp.com ]

Resource Group: web
     webip    (ocf::heartbeat:IPaddr):    Started www1.gjp.com
     webserver    (lsb:httpd):    Started www1.gjp.com
drbd_web_FS    (ocf::heartbeat:Filesystem):    Started www1.gjp.com
Master/Slave Set: MS_Webdrbd [httpd_drbd_web]
     Masters: [ www1.gjp.com ]
     Stopped: [ httpd_drbd_web:1 ]

[root@www1 drbd.d]# service drbd status

drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
m:res  cs            ro               ds                 p  mounted  fstype
0:web  WFConnection  Primary/Unknown  UpToDate/Outdated  C  /web     ext3

[root@www1 drbd.d]# crm status
============
Last updated: Sun Oct 28 16:08:38 2012
Stack: openais
Current DC: www1.gjp.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ www1.gjp.com www2.gjp.com ]

Resource Group: web
     webip    (ocf::heartbeat:IPaddr):    Started www1.gjp.com
     webserver    (lsb:httpd):    Started www1.gjp.com
drbd_web_FS    (ocf::heartbeat:Filesystem):    Started www2.gjp.com
Master/Slave Set: MS_Webdrbd [httpd_drbd_web]
     Masters: [ www2.gjp.com ]
     Slaves: [ www1.gjp.com ]

Split-brain has occurred. It is resolved as follows:

[image]
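
The screenshot with the actual fix is lost. For reference, the usual manual split-brain recovery for drbd 8.3 is sketched below; whether these are exactly the commands the author ran is an assumption. split_brain_cmds is a hypothetical helper that only prints the commands for each side, so nothing is changed until you run them yourself.

```shell
#!/bin/sh
# split_brain_cmds ROLE RESOURCE
# ROLE is "victim" (the node whose changes are thrown away)
# or "survivor" (the node whose data is kept).
split_brain_cmds() {
    case $1 in
        victim)
            echo "drbdadm secondary $2"
            echo "drbdadm -- --discard-my-data connect $2"
            ;;
        survivor)
            # Only needed when the survivor is in the StandAlone state.
            echo "drbdadm connect $2"
            ;;
        *)
            echo "usage: split_brain_cmds victim|survivor RESOURCE" >&2
            return 1
            ;;
    esac
}
```

Choosing the victim is the important decision: everything written on that node since the split is discarded when it reconnects.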

[root@www1 drbd.d]# watch -n 1 'crm status'

[Figure 20]

Synchronization works now. Check the mount points:

[root@www1 drbd.d]# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/drbd0 on /web type ext3 (rw)

[Figure 21]

Check the state on www2:

[root@www2 drbd.d]# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

[root@www2 drbd.d]# service httpd status
httpd is stopped
[root@www2 drbd.d]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
m:res  cs          ro                 ds                 p      mounted  fstype
0:web  StandAlone  Secondary/Unknown  UpToDate/Outdated  r----

Simulate a failure of www1:

[image]

[root@www2 drbd.d]# crm status
============
Last updated: Sun Oct 28 17:25:27 2012
Stack: openais
Current DC: www1.gjp.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Node www1.gjp.com: standby
Online: [ www2.gjp.com ]

Resource Group: web
     webip    (ocf::heartbeat:IPaddr):    Started www2.gjp.com
     webserver    (lsb:httpd):    Started www2.gjp.com
Master/Slave Set: MS_Webdrbd [httpd_drbd_web]
     Masters: [ www2.gjp.com ]
     Stopped: [ httpd_drbd_web:0 ]
drbd_web_FS    (ocf::heartbeat:Filesystem):    Started www2.gjp.com
[root@www2 drbd.d]# service httpd status
httpd (pid  8509) is running...

[Figure 22]

The site is accessible as expected!

eth0      Link encap:Ethernet  HWaddr 00:0C:29:99:12:74 
          inet addr:192.168.2.2  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe99:1274/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:192191 errors:0 dropped:0 overruns:0 frame:0
          TX packets:103068 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:121841514 (116.1 MiB)  TX bytes:13390418 (12.7 MiB)
          Interrupt:67 Base address:0x2000

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:99:12:74 
          inet addr:192.168.2.66  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:67 Base address:0x2000

lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host

[root@www2 drbd.d]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
m:res  cs          ro               ds                 p      mounted  fstype
0:web  StandAlone  Primary/Unknown  UpToDate/Outdated  r----  ext3
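
After a failover like the one above, it is easy to miss a resource that stayed behind on the wrong node. As a small sketch, the node of each resource can be read out of the `crm status` text; resource_node is a made-up helper name.

```shell
#!/bin/sh
# resource_node RESOURCE: read `crm status` output on stdin and print
# the node on which RESOURCE is reported as Started.
resource_node() {
    awk -v r="$1" 'NF >= 3 && $1 == r && $(NF - 1) == "Started" { print $NF }'
}
```

For example, `crm status | resource_node webip` and `crm status | resource_node drbd_web_FS` should print the same node name when the web group and the file system have moved together.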