corosync configuration in detail

 

Common cluster-stack combinations:

corosync (votes / votequorum)

cman + corosync

cman + rgmanager, cman + pacemaker

corosync + pacemaker (the combination used in this article)

 

Prerequisites

1) This setup uses two test nodes, hadoop1.abc.com and hadoop2.abc.com, with IP addresses 192.168.1.3 and 192.168.1.4 respectively;

2) The cluster service is Apache's httpd service;

3) The web service address (the VIP) is 192.168.1.12;

4) The OS is CentOS 6.4, 64-bit

 

1. Preparation

 

To configure a Linux host as an HA cluster node, the following preparation is usually required:

 

1) Name resolution must work for every node's host name and IP address, and each node's host name must match the output of "uname -n". Therefore, make sure /etc/hosts on both nodes contains the following:

192.168.1.3   hadoop1.abc.com hadoop1

192.168.1.4   hadoop2.abc.com hadoop2

 

To keep these host names across reboots, also run commands like the following on the respective nodes:

 

Node1:

# sed -i 's@\(HOSTNAME=\).*@\1hadoop1.abc.com@g'  /etc/sysconfig/network

# hostname hadoop1.abc.com

 

Node2:

# sed -i 's@\(HOSTNAME=\).*@\1hadoop2.abc.com@g' /etc/sysconfig/network

# hostname hadoop2.abc.com
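Since pacemaker identifies nodes by their "uname -n" name, it is worth checking that this name actually resolves through /etc/hosts before going further. A minimal sketch (the check_hosts_entry helper is our own name, not a standard tool):

```shell
# check_hosts_entry NAME [FILE] - succeed if NAME appears as a whole word
# in FILE (default /etc/hosts), fail otherwise.
check_hosts_entry() {
    local name=$1 file=${2:-/etc/hosts}
    grep -qw "$name" "$file"
}

# On each node, after editing /etc/hosts:
#   check_hosts_entry "$(uname -n)" || echo "uname -n does not resolve via /etc/hosts"
```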

 

2. Install pacemaker

[root@hadoop1 corosync]# yum install pacemaker

[root@hadoop2 corosync]# yum install pacemaker

3. Configure corosync

[root@hadoop1 ~]# yum install corosync

[root@hadoop1 ~]# cd /etc/corosync/

[root@hadoop1 corosync]# ll

total 16

-rw-r--r--. 1 root root 2663 Oct 15  2014 corosync.conf.example

-rw-r--r--. 1 root root 1073 Oct 15  2014 corosync.conf.example.udpu

drwxr-xr-x. 2 root root 4096 Oct 15  2014 service.d

drwxr-xr-x. 2 root root 4096 Oct 15  2014 uidgid.d

[root@hadoop1 corosync]# cp corosync.conf.example corosync.conf

[root@hadoop1 corosync]# vim corosync.conf

Next, edit corosync.conf and add the following, which makes corosync start pacemaker automatically:

service {

  ver:  0

  name: pacemaker

  # use_mgmtd: yes

}

 

aisexec {

  user: root

  group:  root

}

Also set the IP address after bindnetaddr in this file to the network address of the subnet your NIC is on. Our two nodes are on the 192.168.1.0 network, so set it as follows:

bindnetaddr: 192.168.1.0
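The network address expected by bindnetaddr is simply the bitwise AND of the node's IP address and its netmask. A small helper to compute it (a sketch; network_addr is our own name, not a standard tool):

```shell
# network_addr IP NETMASK - print the network address (IP AND NETMASK),
# which is the value corosync expects for bindnetaddr.
network_addr() {
    local i1 i2 i3 i4 m1 m2 m3 m4
    IFS=. read -r i1 i2 i3 i4 <<< "$1"
    IFS=. read -r m1 m2 m3 m4 <<< "$2"
    echo "$((i1 & m1)).$((i2 & m2)).$((i3 & m3)).$((i4 & m4))"
}

network_addr 192.168.1.3 255.255.255.0   # prints 192.168.1.0
```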

 

4. Install crmsh

As of 6.4, RHEL no longer ships crmsh, the command-line cluster configuration tool, and provides pcs instead. If you are used to the crm command, you can download the packages and install them yourself. crmsh depends on pssh, so download that as well.

[root@hadoop1 ~]# cd /etc/yum.repos.d/

[root@hadoop1 yum.repos.d]# wget http://download.opensuse.org/repositories/network:ha-clustering:Stable/RedHat_RHEL-6/network:ha-clustering:Stable.repo

[root@hadoop1 yum.repos.d]# yum install crmsh

[root@hadoop1 yum.repos.d]# yum install pssh

 

[root@hadoop1 corosync]# ll

total 28

-rw-r--r--. 1 root root  989 Jul 14 19:05 \

-r--------. 1 root root  128 Jul 14 19:30 authkey    # the authkey file has been generated

-rw-r--r--. 1 root root 2811 Jul 14 19:15 corosync.conf

-rw-r--r--. 1 root root 2663 Oct 15  2014 corosync.conf.example

-rw-r--r--. 1 root root 1073 Oct 15  2014 corosync.conf.example.udpu

drwxr-xr-x. 2 root root 4096 Oct 15  2014 service.d

drwxr-xr-x. 2 root root 4096 Oct 15  2014 uidgid.d
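The authkey listed above is the Totem authentication key required when secauth is on; if it does not exist yet, generate it once with corosync-keygen (which may block while gathering entropy from /dev/random). A quick sanity check of an existing key (check_authkey is our own helper name):

```shell
# check_authkey [FILE] - verify the corosync auth key looks sane:
# a regular file of exactly 128 bytes with mode 400 (root-only read).
check_authkey() {
    local key=${1:-/etc/corosync/authkey}
    [ -f "$key" ] || { echo "missing: $key" >&2; return 1; }
    [ "$(stat -c %s "$key")" -eq 128 ] && [ "$(stat -c %a "$key")" = "400" ]
}

# To create the key in the first place (run once, then copy to the other node):
#   corosync-keygen
```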

 

Copy corosync.conf and authkey to hadoop2:

[root@hadoop1 corosync]# scp -p authkey corosync.conf hadoop2:/etc/corosync/

authkey                                       100%  128     0.1KB/s   00:00    

corosync.conf                                 100% 2811     2.8KB/s   00:00 

 

5. Start corosync

[root@hadoop1 corosync]# service corosync start

Starting Corosync Cluster Engine (corosync):               [  OK  ]

[root@hadoop1 corosync]# ssh hadoop2 'service corosync start'

Starting Corosync Cluster Engine (corosync): [  OK  ]

[root@hadoop1 corosync]# 

 

Check whether the corosync engine started correctly:

[root@hadoop1 cluster]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log 

Jul 14 19:36:33 corosync [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.

Jul 14 19:36:33 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

 

Check whether the initial membership notifications were sent out correctly:

[root@hadoop1 cluster]# grep  TOTEM  /var/log/cluster/corosync.log

Jul 14 19:36:33 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

Jul 14 19:36:33 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Jul 14 19:36:33 corosync [TOTEM ] The network interface [192.168.1.3] is now up.

Jul 14 19:36:33 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

 

 

Check whether any errors occurred during startup. The error below says that pacemaker will soon no longer run as a corosync plugin, and that cman is therefore recommended as the cluster infrastructure; it can safely be ignored here.

[root@hadoop1 cluster]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources

Jul 14 19:36:33 corosync [pcmk  ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.

Jul 14 19:36:33 corosync [pcmk  ] ERROR: process_ais_conf:  Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN

 

 

Check whether pacemaker started correctly:

[root@hadoop1 cluster]# grep pcmk_startup /var/log/cluster/corosync.log 

Jul 14 19:36:33 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized

Jul 14 19:36:33 corosync [pcmk  ] Logging: Initialized pcmk_startup

Jul 14 19:36:33 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615

Jul 14 19:36:33 corosync [pcmk  ] info: pcmk_startup: Service: 9

Jul 14 19:36:33 corosync [pcmk  ] info: pcmk_startup: Local hostname: hadoop1.abc.com


If all of the commands above ran without problems, corosync on hadoop2 can be started with:

[root@hadoop1 ~]# ssh hadoop2 -- /etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync): [  OK  ]

 


Note: start corosync on hadoop2 from hadoop1 using the command above; do not start it directly on hadoop2. Below are the relevant logs on node1.

 [root@hadoop1 ~]# tail /var/log/cluster/corosync.log
Jul 15 15:44:28 [1771] hadoop1.abc.com    pengine:     info: determine_online_status:  Node hadoop2.abc.com is online
Jul 15 15:44:28 [1771] hadoop1.abc.com    pengine:   notice: stage6:  Delaying fencing operations until there are resources to manage
Jul 15 15:44:28 [1772] hadoop1.abc.com       crmd:     info: do_state_transition:  State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jul 15 15:44:28 [1772] hadoop1.abc.com       crmd:     info: do_te_invoke:  Processing graph 6 (ref=pe_calc-dc-1436946268-37) derived from /var/lib/pacemaker/pengine/pe-input-14.bz2
Jul 15 15:44:28 [1772] hadoop1.abc.com       crmd:   notice: run_graph:  Transition 6 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-14.bz2): Complete
Jul 15 15:44:28 [1772] hadoop1.abc.com       crmd:     info: do_log:  FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Jul 15 15:44:28 [1772] hadoop1.abc.com       crmd:   notice: do_state_transition:  State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jul 15 15:44:28 [1771] hadoop1.abc.com    pengine:   notice: process_pe_message:  Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-14.bz2
Jul 15 15:44:28 [1771] hadoop1.abc.com    pengine:   notice: process_pe_message:  Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
Jul 15 15:44:33 [1767] hadoop1.abc.com        cib:     info: cib_process_ping:  Reporting our current digest to hadoop1.abc.com: 24973b4c6ef4c32f7c580bdd07cc1753 for 0.5.28 (0x277e390 0)

 

If crmsh is installed, the startup status of the cluster nodes can be viewed with:

[root@hadoop1 ~]# crm status
Last updated: Wed Jul 15 15:49:09 2015
Last change: Wed Jul 15 15:37:07 2015
Stack: classic openais (with plugin)
Current DC: hadoop1.abc.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
0 Resources configured


Online: [ hadoop1.abc.com hadoop2.abc.com ]

 

6. Configure the cluster's working properties: disable stonith

corosync enables stonith by default, but the current cluster has no stonith device, so this default configuration is not yet usable. This can be verified with the following command:

[root@hadoop1 ~]# crm_verify -L -V
   error: unpack_resources:  Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:  Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:  NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid


stonith can be disabled first with the following commands:

[root@hadoop1 ~]# crm configure
crm(live)configure# property  stonith-enabled=false
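Note that crmsh configuration changes take effect only after they are committed; the typical sequence looks like this (a sketch):

```
crm(live)configure# property stonith-enabled=false
crm(live)configure# verify
crm(live)configure# commit
```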

 

View the current configuration with:

[root@hadoop1 ~]# crm configure show
node hadoop1.abc.com
node hadoop2.abc.com
property cib-bootstrap-options: \
 dc-version=1.1.11-97629de \
 cluster-infrastructure="classic openais (with plugin)" \
 expected-quorum-votes=2 \
 stonith-enabled=false

 

7. Add cluster resources

corosync supports heartbeat, LSB, OCF, and other resource-agent classes; LSB and OCF are the most commonly used, while the stonith class is dedicated to configuring stonith devices.

The resource-agent classes supported by the current cluster can be viewed with:

[root@hadoop1 ~]# crm ra
crm(live)ra# help

        cd             Navigate the level structure
        classes        List classes and providers
        help           Show help (help topics for list of topics)
        info           Show meta data for a RA
        list           List RA for a class (and provider)
        ls             List levels and commands
        providers      Show providers for a RA and a class
        quit           Exit the interactive shell
        up             Go back to previous level

 

List the classes:

crm(live)ra# classes
lsb
ocf / heartbeat pacemaker
service
stonith

 

To see the list of all resource agents in a given class, use commands like the following:

# crm ra list lsb
# crm ra list ocf heartbeat
# crm ra list ocf pacemaker
# crm ra list stonith 

crm(live)ra# list ocf
CTDB                ClusterMon          Delay               Dummy
Filesystem          HealthCPU           HealthSMART         IPaddr
IPaddr2             IPsrcaddr           LVM                 MailTo
Route               SendArp             Squid               Stateful
SysInfo             SystemHealth        VirtualDomain       Xinetd
apache              conntrackd          controld            db2
dhcpd               ethmonitor          exportfs            iSCSILogicalUnit
mysql               named               nfsnotify           nfsserver
pgsql               ping                pingd               postfix
remote              rsyncd              symlink             tomcat

 

# crm ra info [class:[provider:]]resource_agent

For example: crm(live)ra# info ocf:heartbeat:IPaddr

 

8. Create an IP address resource for the web cluster, to be used when providing the web service through the cluster. This can be done as follows:

 

Syntax:
primitive <rsc> [<class>:[<provider>:]]<type>
          [params attr_list]
          [operations id_spec]
            [op op_type [<attribute>=<value>...] ...]

op_type :: start | stop | monitor

 

Example:
 primitive apcfence stonith:apcsmart \
          params ttydev=/dev/ttyS0 hostlist="node1 node2" \
          op start timeout=60s \
          op monitor interval=30m timeout=60s

 

In practice:

crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.1.12

crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node hadoop1.abc.com
node hadoop2.abc.com
primitive webip IPaddr \
 params ip=192.168.1.12
property cib-bootstrap-options: \
 dc-version=1.1.11-97629de \
 cluster-infrastructure="classic openais (with plugin)" \
 expected-quorum-votes=2 \
 stonith-enabled=false

 

[root@hadoop1 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:50:3b:a4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.3/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.12/24 brd 192.168.1.255 scope global secondary eth0
    inet6 fe80::20c:29ff:fe50:3ba4/64 scope link
       valid_lft forever preferred_lft forever

 

[root@hadoop2 ~]# ssh hadoop1 '/etc/init.d/corosync stop'
Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Waiting for corosync services to unload:.[  OK  ]
[root@hadoop2 ~]# crm status
Last updated: Wed Jul 15 23:07:07 2015
Last change: Wed Jul 15 21:53:01 2015
Stack: classic openais (with plugin)
Current DC: hadoop1.abc.com - partition WITHOUT quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
1 Resources configured


Online: [ hadoop2.abc.com ]
OFFLINE: [ hadoop1.abc.com ]

The information above shows that hadoop1.abc.com is offline, yet the webip resource did not start on hadoop2.abc.com. This is because the cluster state is "WITHOUT quorum": quorum has been lost, and the cluster itself no longer meets the conditions for normal operation, which is unreasonable for a cluster of only two nodes. We can therefore tell the cluster to ignore the quorum check with the following commands:

[root@hadoop2 ~]# crm
crm(live)# configure
crm(live)configure# property no-quorum-policy=ignore

 

Once hadoop1.abc.com is started again, the webip resource will very likely move back from hadoop2.abc.com to hadoop1.abc.com. Each such move between nodes leaves the resource briefly inaccessible, so we sometimes want a resource that has failed over because of a node failure to stay where it is even after the original node recovers. This can be achieved by defining resource stickiness, which can be set either when a resource is created or afterwards.

Resource stickiness values and their effects:
0: the default. The resource is placed at the most suitable location in the system, which means it moves whenever a node with "better" or worse load capacity becomes available. This is essentially automatic failback, except that the resource may move to a node other than the one it previously ran on.
Greater than 0: the resource prefers to stay where it is, but will move if a more suitable node becomes available; the higher the value, the stronger the preference to stay.
Less than 0: the resource prefers to move away from its current location; the higher the absolute value, the stronger that preference.
INFINITY: unless the resource is forced off because the node can no longer run it (node shutdown, node standby, migration-threshold reached, or a configuration change), it always stays where it is; this essentially disables automatic failback.
-INFINITY: the resource always moves away from its current location.
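Stickiness is set through a resource's meta attributes; for example, when (re)defining a resource (a sketch; 100 is an arbitrary positive score):

```
crm(live)configure# primitive webip ocf:heartbeat:IPaddr \
    params ip=192.168.1.12 \
    meta resource-stickiness=100
crm(live)configure# verify
crm(live)configure# commit
```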

9. Using the IP address resource configured above, build an active/passive web (httpd) service cluster

To turn this cluster into a web (httpd) server cluster, httpd must first be installed on each node, with each node configured to serve a local test page.

[root@hadoop1 ~]# echo "<h1>hadoop1</h1>">/var/www/html/index.html
[root@hadoop1 ~]# service httpd stop
Stopping httpd:                                            [FAILED]
[root@hadoop1 ~]# service httpd start
Starting httpd:                                            [  OK  ]
[root@hadoop1 ~]# service httpd stop
Stopping httpd:                                            [  OK  ]
[root@hadoop1 ~]# chkconfig httpd off

 

 

crm(live)configure# primitive webserver lsb:httpd
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node hadoop1.abc.com
node hadoop2.abc.com
primitive webip IPaddr \
 params ip=192.168.1.12
primitive webserver lsb:httpd
property cib-bootstrap-options: \
 dc-version=1.1.11-97629de \
 cluster-infrastructure="classic openais (with plugin)" \
 expected-quorum-votes=2 \
 stonith-enabled=false

 

Next, add the httpd service as a cluster resource. Two resource-agent classes can manage httpd: lsb and ocf:heartbeat; for simplicity, we use the lsb class here.

First, view the syntax of the lsb httpd resource agent with:

crm(live)# ra info lsb:httpd
start and stop Apache HTTP Server (lsb:httpd)

The Apache HTTP Server is an efficient and extensible  \
         server implementing the current HTTP standards.

Operations' defaults (advisory minimum):

    start         timeout=15
    stop          timeout=15
    status        timeout=15
    restart       timeout=15
    force-reload  timeout=15
    monitor       timeout=15 interval=15

Next, create the resource WebSever:

crm(live)# configure primitive WebSever lsb:httpd
crm(live)# configure
crm(live)configure# verify
crm(live)configure# commit
INFO: apparently there is nothing to commit
INFO: try changing something first
crm(live)configure# show
node hadoop1.abc.com
node hadoop2.abc.com
primitive WebSever lsb:httpd
primitive webip IPaddr \
 params ip=192.168.1.12
primitive webserver lsb:httpd
property cib-bootstrap-options: \
 dc-version=1.1.11-97629de \
 cluster-infrastructure="classic openais (with plugin)" \
 expected-quorum-votes=2 \
 stonith-enabled=false

 

The duplicate WebSever resource should be stopped first, then deleted (with delete WebSever in configure mode):

[root@hadoop1 ~]# crm status
Last updated: Thu Jul 16 01:15:18 2015
Last change: Thu Jul 16 01:11:16 2015
Stack: classic openais (with plugin)
Current DC: hadoop2.abc.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured


Online: [ hadoop1.abc.com hadoop2.abc.com ]

 webip (ocf::heartbeat:IPaddr): Started hadoop1.abc.com
 webserver (lsb:httpd): Started hadoop2.abc.com
 WebSever (lsb:httpd): Started hadoop2.abc.com

 

crm(live)resource# stop WebSever

crm(live)resource# status WebSever
resource WebSever is NOT running

 

Verify:

[root@hadoop1 ~]# crm status
Last updated: Thu Jul 16 05:11:27 2015
Last change: Thu Jul 16 05:08:50 2015
Stack: classic openais (with plugin)
Current DC: hadoop2.abc.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured


Online: [ hadoop1.abc.com hadoop2.abc.com ]

 webip (ocf::heartbeat:IPaddr): Started hadoop1.abc.com
 webserver (lsb:httpd): Started hadoop2.abc.com

 

Put hadoop1 into standby:

crm(live)node# standby hadoop1.abc.com
crm(live)node# cd ..
crm(live)# status
Last updated: Thu Jul 16 05:28:46 2015
Last change: Thu Jul 16 05:28:32 2015
Stack: classic openais (with plugin)
Current DC: hadoop2.abc.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured


Node hadoop1.abc.com: standby
Online: [ hadoop2.abc.com ]

 webip (ocf::heartbeat:IPaddr): Started hadoop2.abc.com
 webserver (lsb:httpd): Started hadoop2.abc.com

 

At this point, browsing to 192.168.1.12 shows the page served by hadoop2.

 

 

10. Define a colocation constraint

crm(live)configure# colocation webserver_with_webip -inf: webserver webip

Note: the -inf: score used here keeps webserver and webip on different nodes. If the intent is for the web server and the VIP to run on the same node, as is usual for an active/passive web cluster, use inf: instead.

crm(live)configure# show xml

<?xml version="1.0" ?>
<cib num_updates="2" dc-uuid="hadoop2.abc.com" crm_feature_set="3.0.9" validate-with="pacemaker-2.0" epoch="20" admin_epoch="0" cib-last-written="Thu Jul 16 05:42:29 2015" have-quorum="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.11-97629de"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="classic openais (with plugin)"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
        <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="hadoop2.abc.com" uname="hadoop2.abc.com">
        <instance_attributes id="nodes-hadoop2.abc.com">
          <nvpair id="nodes-hadoop2.abc.com-standby" name="standby" value="off"/>
        </instance_attributes>
      </node>
      <node id="hadoop1.abc.com" uname="hadoop1.abc.com">
        <instance_attributes id="nodes-hadoop1.abc.com">
          <nvpair id="nodes-hadoop1.abc.com-standby" name="standby" value="on"/>
        </instance_attributes>
      </node>
    </nodes>
    <resources>
      <primitive id="webip" class="ocf" provider="heartbeat" type="IPaddr">
        <instance_attributes id="webip-instance_attributes">
          <nvpair name="ip" value="192.168.1.12" id="webip-instance_attributes-ip"/>
        </instance_attributes>
      </primitive>
      <primitive id="webserver" class="lsb" type="httpd"/>
    </resources>
    <constraints>  <!-- the rsc_colocation element below is the constraint just defined -->
      <rsc_colocation id="webserver_with_webip" score="-INFINITY" rsc="webserver" with-rsc="webip"/>
    </constraints>
  </configuration>
</cib>

 

11. Define an order constraint

Start webip before webserver:

crm(live)configure# order webip_befor_webserver  Mandatory: webip:start webserver
crm(live)configure# show xml

<?xml version="1.0" ?>
<cib num_updates="1" dc-uuid="hadoop2.abc.com" crm_feature_set="3.0.9" validate-with="pacemaker-2.0" epoch="22" admin_epoch="0" cib-last-written="Thu Jul 16 06:03:57 2015" have-quorum="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.11-97629de"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="classic openais (with plugin)"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
        <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="hadoop2.abc.com" uname="hadoop2.abc.com">
        <instance_attributes id="nodes-hadoop2.abc.com">
          <nvpair id="nodes-hadoop2.abc.com-standby" name="standby" value="off"/>
        </instance_attributes>
      </node>
      <node id="hadoop1.abc.com" uname="hadoop1.abc.com">
        <instance_attributes id="nodes-hadoop1.abc.com">
          <nvpair id="nodes-hadoop1.abc.com-standby" name="standby" value="on"/>
        </instance_attributes>
      </node>
    </nodes>
    <resources>
      <primitive id="webip" class="ocf" provider="heartbeat" type="IPaddr">
        <instance_attributes id="webip-instance_attributes">
          <nvpair name="ip" value="192.168.1.12" id="webip-instance_attributes-ip"/>
        </instance_attributes>
      </primitive>
      <primitive id="webserver" class="lsb" type="httpd"/>
    </resources>
    <constraints>
      <rsc_order id="webip_befor_webserver" kind="Mandatory" first="webip" first-action="start" then="webserver"/>
      <rsc_colocation id="webserver_with_webip" score="-INFINITY" rsc="webserver" with-rsc="webip"/>
    </constraints>
  </configuration>
</cib>
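For an active/passive setup where webip and webserver must run together and start in order, a resource group is a compact alternative to separate colocation and order constraints: group members are implicitly colocated and started in the order listed. A sketch, not applied in this walkthrough:

```
crm(live)configure# group webservice webip webserver
crm(live)configure# verify
crm(live)configure# commit
```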

 

12. Prefer running on the hadoop2 node

crm(live)configure# location webserver_hadoop2 webserver 200: hadoop2.abc.com
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd ..

 

13. Resource default attributes can also be defined
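For example, a cluster-wide default stickiness, applied to every resource that does not set its own, can be defined via rsc_defaults (a sketch; 100 is an arbitrary value):

```
crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# verify
crm(live)configure# commit
```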

14. Define monitoring

crm(live)resource# stop webip
crm(live)resource# stop webserver
crm(live)resource# status
 webip (ocf::heartbeat:IPaddr): Stopped
 webserver (lsb:httpd): Stopped

If a resource has been shut down outside the cluster's control, clean up its state afterwards:

crm(live)resource# cleanup webip
Cleaning up webip on hadoop1.abc.com
Cleaning up webip on hadoop2.abc.com
Waiting for 2 replies from the CRMd.. OK
crm(live)resource# cleanup webserver
Cleaning up webserver on hadoop1.abc.com
Cleaning up webserver on hadoop2.abc.com
Waiting for 2 replies from the CRMd.. OK

crm(live)resource# cd ..
crm(live)# configure
crm(live)configure# help monitor
crm(live)configure#
crm(live)configure# monitor webserver 20s:10s
crm(live)configure# verify
WARNING: webserver: specified timeout 10s for monitor is smaller than the advised 15
crm(live)configure# edit    # change the monitor op to: interval=30s timeout=15s
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
crm(live)configure# cd ..
crm(live)# status
Last updated: Thu Jul 16 23:13:42 2015
Last change: Thu Jul 16 22:43:40 2015
Stack: classic openais (with plugin)
Current DC: hadoop2.abc.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured


Online: [ hadoop1.abc.com hadoop2.abc.com ]
crm(live)# resource
crm(live)resource# start webip
crm(live)resource# start webserver

[root@hadoop2 ~]# ss -tnl | grep 80
LISTEN     0      128                      :::80                      :::* 

 

[root@hadoop2 ~]# service httpd stop
Stopping httpd:                                            [  OK  ]
[root@hadoop2 ~]# tail -f /var/log/cluster/corosync.log
Jul 16 23:23:03 [7736] hadoop2.abc.com        cib:     info: cib_perform_op:  Diff: --- 0.61.25 2
Jul 16 23:23:03 [7736] hadoop2.abc.com        cib:     info: cib_perform_op:  Diff: +++ 0.61.26 (null)
Jul 16 23:23:03 [7736] hadoop2.abc.com        cib:     info: cib_perform_op:  +  /cib:  @num_updates=26
Jul 16 23:23:03 [7736] hadoop2.abc.com        cib:     info: cib_perform_op:  +  /cib/status/node_state[@id='hadoop2.abc.com']/lrm[@id='hadoop2.abc.com']/lrm_resources/lrm_resource[@id='webserver']/lrm_rsc_op[@id='webserver_monitor_30000']:  @transition-key=1:45:0:5c125c03-7d52-4d11-b5ee-ec4bc424ed07, @transition-magic=0:0;1:45:0:5c125c03-7d52-4d11-b5ee-ec4bc424ed07, @call-id=68, @last-rc-change=1437060183, @exec-time=31
Jul 16 23:23:03 [7736] hadoop2.abc.com        cib:     info: cib_process_request:  Completed cib_modify operation for section status: OK (rc=0, origin=hadoop2.abc.com/crmd/278, version=0.61.26)
Jul 16 23:23:03 [7741] hadoop2.abc.com       crmd:     info: match_graph_event:  Action webserver_monitor_30000 (1) confirmed on hadoop2.abc.com (rc=0)
Jul 16 23:23:03 [7741] hadoop2.abc.com       crmd:   notice: run_graph:  Transition 45 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-95.bz2): Complete
Jul 16 23:23:03 [7741] hadoop2.abc.com       crmd:     info: do_log:  FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Jul 16 23:23:03 [7741] hadoop2.abc.com       crmd:   notice: do_state_transition:  State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jul 16 23:23:08 [7736] hadoop2.abc.com        cib:     info: cib_process_ping:  Reporting our current digest to hadoop2.abc.com: ae8ef3d1bb7af4518c2c6ce7c4db1f08 for 0.61.26 (0x1b35dc0 0)
^C
[root@hadoop2 ~]# ss -tnl
State      Recv-Q Send-Q                                                                     Local Address:Port                                                                       Peer Address:Port
LISTEN     0      128                                                                                   :::34476                                                                                :::*    
LISTEN     0      128                                                                                   :::111                                                                                  :::*    
LISTEN     0      128                                                                                    *:111                                                                                   *:*    
LISTEN     0      128                                                                                   :::80                                                                                   :::*    
LISTEN     0      128                                                                                   :::22                                                                                   :::*    
LISTEN     0      128                                                                                    *:22                                                                                    *:*    
LISTEN     0      128                                                                            127.0.0.1:631                                                                                   *:*    
LISTEN     0      128                                                                                  ::1:631                                                                                  :::*    
LISTEN     0      100                                                                                  ::1:25                                                                                   :::*    
LISTEN     0      100                                                                            127.0.0.1:25                                                                                    *:*    
LISTEN     0      128                                                                                    *:42907                     

 

 

Define one more VIP address

crm(live)# configure
crm(live)configure# primitive vip ocf:heartbeat:IP
ocf:heartbeat:IPaddr     ocf:heartbeat:IPaddr2    ocf:heartbeat:IPsrcaddr 
crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=192.168.1.13 monitor interval=30s timeout=15s
ERROR: syntax in primitive: Unknown arguments: monitor interval=30s timeout=15s near <monitor> parsing 'primitive vip ocf:heartbeat:IPaddr params ip=192.168.1.13 monitor interval=30s timeout=15s'
crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=192.168.1.13 op monitor interval=30s timeout=15s
crm(live)configure# verify
WARNING: vip: specified timeout 15s for monitor is smaller than the advised 20s
crm(live)configure# delete vip
crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=192.168.1.13 op monitor interval=30s timeout=20s
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node hadoop1.abc.com \
        attributes standby=off
node hadoop2.abc.com \
        attributes standby=off
primitive vip IPaddr \
        params ip=192.168.1.13 \
        op monitor interval=30s timeout=20s
primitive webip IPaddr \
        params ip=192.168.1.12 \
        meta target-role=Started
primitive webserver lsb:httpd \
        meta target-role=Started \
        op monitor interval=30s timeout=15s
location webip_on_hadoop2 webip 200: hadoop2.abc.com
location webserver_on_hadoop2 webserver 200: hadoop2.abc.com
order webip_befor_webserver Mandatory: webip:start webserver
property cib-bootstrap-options: \
        dc-version=1.1.11-97629de \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        last-lrm-refresh=1437057604
