DRBD + Pacemaker Configuration Management

This process consists of three stages:

I. Compiling and installing DRBD

node1:
Create a new spare partition: /dev/sda5
vi /etc/sysconfig/network-scripts/ifcfg-eth0
Edit it to contain the following lines:
    DEVICE=eth0
    BOOTPROTO=static
    HWADDR=00:0C:29:77:F8:D2
    IPADDR=192.168.0.2
    NETMASK=255.255.255.0
    ONBOOT=yes
vi /etc/sysconfig/network
Edit it to contain the following lines:
    NETWORKING=yes
    NETWORKING_IPV6=no
    HOSTNAME=node1.a.org
hostname node1.a.org
vi /etc/hosts
Add:
    192.168.0.2  node1.a.org node1
    192.168.0.3  node2.a.org node2
scp /etc/hosts 192.168.0.3:/etc
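Before going on, it is worth confirming that the names just added actually resolve; a quick check (purely illustrative, using the node names assumed above):
    ping -c 1 node2.a.org    // run on node1; should get replies from 192.168.0.3
    uname -n                 // should print node1.a.org, the name DRBD and the cluster stack will use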
Download the source package drbd-8.3.10.tar.gz, then build and install it:
    tar xf drbd-8.3.10.tar.gz
    cd drbd-8.3.10
    ./configure
    make rpm
    make km-rpm

    cd /usr/src/redhat/RPMS/i386/
    rpm -ivh drbd*
    modprobe drbd            // load the drbd kernel module
    lsmod | grep drbd        // verify the module is loaded
vi /etc/drbd.d/global_common.conf and add:
global {
        usage-count no;
        # minor-count dialog-refresh disable-ip-verification
}
common {
        protocol C;
        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }
        startup {
                wfc-timeout 120;
                degr-wfc-timeout 120;
        }
        disk {
                on-io-error detach;
                fencing resource-only;
        }
        net {
                cram-hmac-alg "sha1";
                shared-secret "mydrbdlab";
        }
        syncer {
                rate 100M;
        }
}
 
Define a resource in /etc/drbd.d/web.res with the following content:
resource web {
  on node1.a.org {
    device    /dev/drbd0;
    disk      /dev/sda5;
    address   192.168.0.2:7789;
    meta-disk internal;
  }
  on node2.a.org {
    device    /dev/drbd0;
    disk      /dev/sda5;
    address   192.168.0.3:7789;
    meta-disk internal;
  }
}
scp /etc/drbd.d/global_common.conf /etc/drbd.d/web.res node2:/etc/drbd.d/
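With both files in place on the two nodes, it is worth checking that DRBD can parse the configuration before writing any metadata; a minimal sketch using standard drbdadm calls:
    drbdadm dump web               // print the parsed definition of the web resource; errors here point to a config problem
    ssh node2 'drbdadm dump web'   // same check on the peer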
Initialize the resource:
    drbdadm create-md web
Start the service: /etc/init.d/drbd start
node2:
Create a new spare partition: /dev/sda5
vi /etc/sysconfig/network-scripts/ifcfg-eth0
Edit it to contain the following lines (HWADDR must be node2's own MAC address):
    DEVICE=eth0
    BOOTPROTO=static
    HWADDR=00:0C:29:77:F8:D2
    IPADDR=192.168.0.3
    NETMASK=255.255.255.0
    ONBOOT=yes
vi /etc/sysconfig/network
Edit it to contain the following lines:
    NETWORKING=yes
    NETWORKING_IPV6=no
    HOSTNAME=node2.a.org
hostname node2.a.org
Download the source package drbd-8.3.10.tar.gz, then build and install it:
    tar xf drbd-8.3.10.tar.gz
    cd drbd-8.3.10
    ./configure
    make rpm
    make km-rpm

    cd /usr/src/redhat/RPMS/i386/
    rpm -ivh drbd*
    modprobe drbd            // load the drbd kernel module
    lsmod | grep drbd        // verify the module is loaded
Initialize the resource:
    drbdadm create-md web
Start the service: /etc/init.d/drbd start
Set the current node as the primary: drbdadm -- --overwrite-data-of-peer primary web
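Once one node has been promoted, the initial full synchronization starts; a short sketch of how to watch it and then put a filesystem on the device (the ext3 filesystem and the /web mount point are assumptions for illustration, not part of the original steps):
    watch -n 1 'cat /proc/drbd'    // follow the resync until both sides report UpToDate/UpToDate
    drbd-overview                  // summary view of the resource state
    mke2fs -j /dev/drbd0           // create an ext3 filesystem on the DRBD device (Primary node only)
    mkdir /web && mount /dev/drbd0 /web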
II. Installing OpenAIS

node1:
Set up key-based ssh communication between the two nodes:
    ssh-keygen -t rsa
    ssh-copy-id -i .ssh/id_rsa.pub [email protected]
    scp /etc/hosts 192.168.0.12:/etc/
Install the following packages:
    cluster-glue
    cluster-glue-libs
    heartbeat
    openaislib
    resource-agents
    corosync
    heartbeat-libs
    pacemaker
    corosynclib
    libesmtp
    pacemaker-libs
 yum --nogpgcheck -y localinstall *.rpm
Edit the configuration file:
    cd /etc/corosync
    cp corosync.conf.example corosync.conf
    vi corosync.conf and add:

service {
    ver:  0
    name: pacemaker
}
aisexec {
    user:  root
    group: root
}
// In this file, bindnetaddr should be set to the network address the nodes share: 192.168.0.0
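For reference, bindnetaddr lives in the totem/interface block of corosync.conf; a minimal sketch of what that block typically looks like (mcastaddr and mcastport below are the defaults from the shipped example file, not values specific to this setup):

totem {
    version: 2
    secauth: on                     # use the authkey generated in the next step
    interface {
        ringnumber:  0
        bindnetaddr: 192.168.0.0    # network address of the cluster-facing interface
        mcastaddr:   226.94.1.1
        mcastport:   5405
    }
}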
mkdir /var/log/cluster
ssh node2 'mkdir /var/log/cluster'
Generate the authentication key used for inter-node communication:
    corosync-keygen
scp -p corosync.conf authkey node2:/etc/corosync/    // copy these two files over to node2
Start the service: /etc/init.d/corosync start
Verify that nothing went wrong:
Check whether the corosync engine started properly:
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/messages
Jun 14 19:02:08 node1 corosync[5103]:   [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Jun 14 19:02:08 node1 corosync[5103]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jun 14 19:02:08 node1 corosync[5103]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
Jun 14 19:03:49 node1 corosync[5120]:   [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Jun 14 19:03:49 node1 corosync[5120]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Check whether the initial membership notifications went out correctly:
# grep  TOTEM  /var/log/messages
Jun 14 19:03:49 node1 corosync[5120]:   [TOTEM ] Initializing transport (UDP/IP).
Jun 14 19:03:49 node1 corosync[5120]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 14 19:03:50 node1 corosync[5120]:   [TOTEM ] The network interface [192.168.0.5] is now up.
Jun 14 19:03:50 node1 corosync[5120]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors were produced during startup:
# grep ERROR: /var/log/messages | grep -v unpack_resources
Check whether pacemaker started properly:
# grep pcmk_startup /var/log/messages
Jun 14 19:03:50 node1 corosync[5120]:   [pcmk  ] info: pcmk_startup: CRM: Initialized
Jun 14 19:03:50 node1 corosync[5120]:   [pcmk  ] Logging: Initialized pcmk_startup
Jun 14 19:03:50 node1 corosync[5120]:   [pcmk  ] info: pcmk_startup: Maximum core file size is: 4294967295
Jun 14 19:03:50 node1 corosync[5120]:   [pcmk  ] info: pcmk_startup: Service: 9
Jun 14 19:03:50 node1 corosync[5120]:   [pcmk  ] info: pcmk_startup: Local hostname: node1.a.org
If all of the above looks fine, node2 can be started:
  ssh node2 -- /etc/init.d/corosync start
  crm status                                     // check the startup state of the nodes
  crm configure property stonith-enabled=false   // disable stonith; without a stonith device, leaving it on can cause errors
  crm configure show                             // view the current configuration
Add a resource: crm configure primitive WebIP ocf:heartbeat:IPaddr params ip=192.168.0.10
crm status                                       // verify that the resource has started on node1.a.org
crm configure property no-quorum-policy=ignore   // allows the resources to keep running on node2.a.org when node1.a.org goes offline
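A quick way to confirm that behaviour is to take node1 out of service and watch the address move; a purely illustrative test, not part of the original procedure:
    crm node standby node1.a.org    // put node1 in standby
    crm status                      // WebIP should now be running on node2.a.org
    crm node online node1.a.org     // bring node1 back into service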
Configure this cluster for web (httpd) high availability:
    yum -y install httpd
crm configure primitive WebSite lsb:httpd        // add the httpd service as a cluster resource
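The web server is only useful on the node that also holds the cluster IP, so the two resources are normally tied together; a sketch of the constraints that would do this (the constraint names below are made up for illustration):
    crm configure colocation website-with-ip inf: WebSite WebIP    // keep httpd on the node holding WebIP
    crm configure order httpd-after-ip inf: WebIP WebSite          // start the IP before httpd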
node2:
vi /etc/sysconfig/network-scripts/ifcfg-eth0
Edit it to contain the following lines (HWADDR must be node2's own MAC address):
    DEVICE=eth0
    BOOTPROTO=static
    HWADDR=00:0C:29:77:F8:D2
    IPADDR=192.168.0.12
    NETMASK=255.255.255.0
    ONBOOT=yes
vi /etc/sysconfig/network
Edit it to contain the following lines:
    NETWORKING=yes
    NETWORKING_IPV6=no
    HOSTNAME=node2.a.org
hostname node2.a.org
Set up key-based ssh communication between the two nodes:
    ssh-keygen -t rsa
    ssh-copy-id -i .ssh/id_rsa.pub [email protected]
yum -y install httpd
III. DRBD + Pacemaker
 
1. Check the current cluster configuration and make sure the global properties appropriate for a two-node cluster are in place:
# crm configure show
node node1.a.org
node node2.a.org
property $id="cib-bootstrap-options" \
 dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
 cluster-infrastructure="openais" \
 expected-quorum-votes="2" \
 stonith-enabled="false" \
 last-lrm-refresh="1308059765" \
 no-quorum-policy="ignore"
In the output above, make sure stonith-enabled and no-quorum-policy are present with the values shown. If they are not, set them with the following commands:
# crm configure property stonith-enabled=false
# crm configure property no-quorum-policy=ignore
2. Define the already configured DRBD device /dev/drbd0 as a cluster service.
1) As required for cluster-managed services, first make sure the drbd service is stopped on both nodes and will not start automatically at boot:
# drbd-overview
 0:web Unconfigured . . . .
# chkconfig drbd off
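The same has to hold on the second node; for example (assuming the drbd init script is installed there as well):
# ssh node2 -- '/etc/init.d/drbd stop; chkconfig drbd off'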
2) Configure drbd as a cluster resource:
The resource agent (RA) for drbd is provided in the OCF class under the linbit provider, at /usr/lib/ocf/resource.d/linbit/drbd. The RA and its metadata can be viewed with the following commands:
# crm ra classes
heartbeat
lsb
ocf / heartbeat linbit pacemaker
stonith
# crm ra list ocf linbit
drbd
# crm ra info ocf:linbit:drbd
This resource agent manages a DRBD resource
as a master/slave resource. DRBD is a shared-nothing replicated storage
device. (ocf:linbit:drbd)
Master/Slave OCF Resource Agent for DRBD
Parameters (* denotes required, [] the default):
drbd_resource* (string): drbd resource name
 The name of the drbd resource from the drbd.conf file.
drbdconf (string, [/etc/drbd.conf]): Path to drbd.conf
 Full path to the drbd.conf file.
Operations' defaults (advisory minimum):
 start timeout=240
 promote timeout=90
 demote timeout=90
 notify timeout=90
 stop timeout=100
 monitor_Slave interval=20 timeout=20 start-delay=1m
 monitor_Master interval=10 timeout=20 start-delay=1m

drbd must run on both nodes at the same time, but only one node may be Master (primary/secondary model) while the other is Slave. It is therefore a rather special cluster resource: a multi-state clone, in which the nodes are differentiated into Master and Slave roles, and both nodes must start out in the Slave state when the service first comes up.
[root@node1 ~]# crm
crm(live)# configure
crm(live)configure# primitive webdrbd ocf:linbit:drbd params drbd_resource=web op monitor role=Master interval=50s timeout=30s op monitor role=Slave interval=60s timeout=30s
crm(live)configure# master MS_Webdrbd webdrbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
crm(live)configure# show webdrbd
primitive webdrbd ocf:linbit:drbd \
 params drbd_resource="web" \
 op monitor interval="15s"
crm(live)configure# show MS_Webdrbd
ms MS_Webdrbd webdrbd \
 meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
crm(live)configure# verify
crm(live)configure# commit

Check the current running status of the cluster:
# crm status
============
Last updated: Fri Jun 17 06:24:03 2011
Stack: openais
Current DC: node2.a.org - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node2.a.org node1.a.org ]
 Master/Slave Set: MS_Webdrbd
 Masters: [ node2.a.org ]
 Slaves: [ node1.a.org ]
From the information above, the Primary node for the drbd service is currently node2.a.org and the Secondary is node1.a.org. This can also be verified on node2 with the following command, which shows whether the local host has become the Primary for the web resource:
# drbdadm role web
Primary/Secondary
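To actually put the replicated device to use, a cluster-managed filesystem is typically layered on top of the Master and tied to it with constraints; a sketch of what that could look like (the /www mount point, the ext3 type, and the resource/constraint names are assumptions for illustration, not taken from the text above):

crm(live)configure# primitive WebFS ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/www" fstype="ext3"
crm(live)configure# colocation WebFS_on_MS_Webdrbd inf: WebFS MS_Webdrbd:Master
crm(live)configure# order WebFS_after_MS_Webdrbd inf: MS_Webdrbd:promote WebFS:start
crm(live)configure# verify
crm(live)configure# commit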
        
