Newbie Learns Linux, Note 089: corosync+pacemaker




Overview

corosync

pacemaker

crmsh






What is High Availability?

Simple Equation: A=MTBF/(MTBF+MTTR)

MTBF = mean time between failures (how long the system runs without failing)

MTTR = mean time to repair (how long it takes to fix a failure)

A = probability that the system is providing service at a random point in time (ranging from 0 to 1)
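A quick way to get a feel for the equation is to plug in numbers. The Python sketch below does that; the function names are hypothetical and the MTBF/MTTR figures are made-up example inputs, not measurements from any real system.

```python
# Availability: A = MTBF / (MTBF + MTTR), as in the equation above.
# The numbers used below are illustrative inputs, not real measurements.

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(mtbf_hours, mttr_hours):
    """Return A = MTBF / (MTBF + MTTR), a probability between 0 and 1."""
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTBF and MTTR must be non-negative")
    if mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR cannot both be zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def yearly_downtime_hours(a):
    """Expected number of hours per year the service is unavailable."""
    return (1 - a) * HOURS_PER_YEAR


if __name__ == "__main__":
    a = availability(999, 1)  # fails every ~999 h, 1 h to repair
    print(f"A = {a:.3f}, about {yearly_downtime_hours(a):.2f} h of downtime per year")
    # Same MTBF, but recovery takes 6 minutes instead of an hour.
    print(f"A = {availability(999, 0.1):.5f}")
```

For MTBF = 999 hours and MTTR = 1 hour, A = 999/1000 = 0.999, which works out to about 8.76 hours of downtime a year. Holding MTBF fixed and cutting MTTR pushes A toward 1, which is one way to read what fast failover in a cluster buys you.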





RHEL 5.x RHCS: openais, cman, rgmanager

RHEL 6.x RHCS: corosync

corosync: Messaging Layer

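openais: now runs as a set of plugins on top of corosync

openais: implements AIS (the SA Forum's Application Interface Specification)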




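corosync is software that provides cluster messaging: it is the Messaging Layer that carries information between the cluster nodes.

pacemaker is a CRM (cluster resource manager); it works on top of corosync or heartbeat v3 to manage cluster resources.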

SUSE Linux Enterprise Server: Hawk, a web GUI

LCMC (Linux Cluster Management Console): a GUI tool, worth learning on your own


RHCS (Red Hat Cluster Suite)

conga: luci (the management console) / ricci (the agent on each cluster node); luci provides the web GUI



keepalived: based on VRRP; typically deployed as a two-node (master/backup) pair




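Configuring a corosync cluster

Prerequisites: time synchronization between the nodes

SSH mutual trust (key-based, passwordless login between the nodes)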




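1. Install pacemaker and corosync

# yum install pacemaker corosync

# yum install crmsh

(Red Hat does not currently ship crmsh. Unofficial packages can be found in the openSUSE repositories, and a source package is also available.)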




2. Configure corosync

/etc/corosync/corosync.conf


Add the following to corosync.conf:

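service {
 # ver: 0 means corosync starts the pacemaker daemons itself (plugin mode)
 ver:  0
 name: pacemaker
 # use_mgmtd: yes
}

aisexec {
 # user and group the AIS executive runs as
 user: root
 group:  root
}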


Generate the authentication key (corosync-keygen writes it to /etc/corosync/authkey):

# corosync-keygen


Then copy the key and the configuration file to the other node (run from /etc/corosync):

# scp authkey corosync.conf node2:/etc/corosync/


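Stop NetworkManager and keep it from starting at boot; it is commonly disabled on cluster nodes so that it does not reconfigure the interfaces the cluster relies on:

# service NetworkManager stop

# chkconfig NetworkManager off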


corosync can now be started:

# service corosync start




Verify that corosync started correctly


Check that the corosync engine started normally:

# grep -e "Corosync Cluster Engine" /var/log/cluster/corosync.log 

# grep -e "configuration file" /var/log/cluster/corosync.log

Jun 14 19:02:08 node1 corosync[5103]:   [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.

Jun 14 19:02:08 node1 corosync[5103]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

Jun 14 19:02:08 node1 corosync[5103]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1397.

Jun 14 19:03:49 node1 corosync[5120]:   [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.

Jun 14 19:03:49 node1 corosync[5120]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
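The grep checks above can also be scripted. Below is a minimal Python sketch, assuming the message wording of the corosync 1.2.7 log shown here (other versions may phrase it differently). It treats a corosync PID as running if it logged "started and ready to provide service" and never logged "exiting with status". The function and variable names are hypothetical. Note that the sample itself contains a failed start: PID 5103 exited with status 8 at 19:02:08, and only the 19:03:49 start (PID 5120) is the live engine, so it pays to look at which start was not followed by an exit rather than at any match.

```python
import re

# Message patterns taken from the corosync 1.2.7 log lines shown above;
# other corosync versions may word them differently (an assumption to check).
PID_RE = re.compile(r"corosync\[(\d+)\]")
START_RE = re.compile(r"Corosync Cluster Engine \(.*\): started and ready to provide service")
EXIT_RE = re.compile(r"Corosync Cluster Engine exiting with status (\d+)")


def engine_runs(lines):
    """Return (running_pids, exits), where exits maps pid -> exit status.

    A pid counts as running if it logged a start message and no exit message.
    """
    started = []
    exits = {}
    for line in lines:
        pid_match = PID_RE.search(line)
        if pid_match is None:
            continue
        pid = int(pid_match.group(1))
        if START_RE.search(line):
            started.append(pid)
            continue
        exit_match = EXIT_RE.search(line)
        if exit_match:
            exits[pid] = int(exit_match.group(1))
    running = [pid for pid in started if pid not in exits]
    return running, exits


# The sample log lines from this note.
SAMPLE_LOG = [
    "Jun 14 19:02:08 node1 corosync[5103]:   [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.",
    "Jun 14 19:02:08 node1 corosync[5103]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.",
    "Jun 14 19:02:08 node1 corosync[5103]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1397.",
    "Jun 14 19:03:49 node1 corosync[5120]:   [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.",
    "Jun 14 19:03:49 node1 corosync[5120]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.",
]

if __name__ == "__main__":
    running, exits = engine_runs(SAMPLE_LOG)
    print("running engine PIDs:", running)
    print("earlier exits (pid -> status):", exits)
```

Against the real log, pass the open file instead of the sample: with open("/var/log/cluster/corosync.log") as f: print(engine_runs(f)).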


Check that the initial membership notifications were sent:

# grep  TOTEM  /var/log/cluster/corosync.log

Jun 14 19:03:49 node1 corosync[5120]:   [TOTEM ] Initializing transport (UDP/IP).

Jun 14 19:03:49 node1 corosync[5120]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Jun 14 19:03:50 node1 corosync[5120]:   [TOTEM ] The network interface [172.16.100.11] is now up.

Jun 14 19:03:50 node1 corosync[5120]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.


Check whether any errors occurred during startup:

# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
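The same error check as a small Python function, assuming the log has been opened or read into a list of lines. It keeps lines containing "ERROR:" and drops those that mention unpack_resources, the same two substring tests the grep pipeline performs. The function name is hypothetical.

```python
def startup_errors(lines, ignore=("unpack_resources",)):
    """Equivalent of: grep ERROR: corosync.log | grep -v unpack_resources

    Both tests are plain, case-sensitive substring matches, which is what
    grep does for patterns without regular-expression metacharacters.
    """
    return [
        line.rstrip("\n")
        for line in lines
        if "ERROR:" in line and not any(word in line for word in ignore)
    ]
```

For example: with open("/var/log/cluster/corosync.log") as f: print(startup_errors(f)). An empty list means the pipeline would have printed nothing.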


Check that pacemaker started normally:

# grep pcmk_startup /var/log/cluster/corosync.log

Jun 14 19:03:50 node1 corosync[5120]:   [pcmk  ] info: pcmk_startup: CRM: Initialized

Jun 14 19:03:50 node1 corosync[5120]:   [pcmk  ] Logging: Initialized pcmk_startup

Jun 14 19:03:50 node1 corosync[5120]:   [pcmk  ] info: pcmk_startup: Maximum core file size is: 4294967295

Jun 14 19:03:50 node1 corosync[5120]:   [pcmk  ] info: pcmk_startup: Service: 9

Jun 14 19:03:50 node1 corosync[5120]:   [pcmk  ] info: pcmk_startup: Local hostname: node1.magedu.com


If none of the commands above shows a problem, start corosync on node2 with the following command:

# ssh node2 -- /etc/init.d/corosync start


Note: start node2 by running the command above on node1; do not start it directly on node2.


Use the following command to check the status of the cluster nodes:

# crm status

============

Last updated: Tue Jun 14 19:07:06 2011

Stack: openais

Current DC: node1.magedu.com - partition with quorum

Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87

2 Nodes configured, 2 expected votes

0 Resources configured.

============


Online: [ node1.magedu.com node2.magedu.com ]


The output shows that both nodes are online and the cluster is working normally.
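To check the same thing from a script, the output of crm status can be parsed. This Python sketch assumes the text layout shown above (Pacemaker 1.0.11); newer releases format crm status differently, so the patterns would need adjusting. It counts the cluster as healthy when every configured node is listed as Online and the DC reports a partition with quorum. All names in it are hypothetical.

```python
import re

# Patterns based on the Pacemaker 1.0.11 output shown above; newer releases
# word some of these lines differently, so treat them as a starting point.
CONFIGURED_RE = re.compile(r"^(\d+) nodes configured", re.IGNORECASE)
ONLINE_RE = re.compile(r"^Online:\s*\[\s*(.*?)\s*\]")


def parse_crm_status(text):
    """Return (configured_nodes, online_nodes, has_quorum) from crm status text."""
    configured = None
    online = []
    has_quorum = False
    for raw in text.splitlines():
        line = raw.strip()
        configured_match = CONFIGURED_RE.match(line)
        online_match = ONLINE_RE.match(line)
        if configured_match:
            configured = int(configured_match.group(1))
        elif online_match:
            online = online_match.group(1).split()
        elif line.startswith("Current DC:"):
            has_quorum = "partition with quorum" in line
    return configured, online, has_quorum


def cluster_looks_healthy(text):
    """True if all configured nodes are online and the partition has quorum."""
    configured, online, has_quorum = parse_crm_status(text)
    return configured is not None and len(online) == configured and has_quorum


# The crm status output from this note.
SAMPLE_STATUS = """\
============
Last updated: Tue Jun 14 19:07:06 2011
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ node1.magedu.com node2.magedu.com ]
"""

if __name__ == "__main__":
    print(parse_crm_status(SAMPLE_STATUS))
    print("healthy:", cluster_looks_healthy(SAMPLE_STATUS))
```

In practice the text would come from running crm status on a cluster node (for example through subprocess); the sample string here is simply the output above.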


Run ps auxf to see the processes that corosync started:

root      4665  0.4  0.8  86736  4244 ?        Ssl  17:00   0:04 corosync

root      4673  0.0  0.4  11720  2260 ?        S    17:00   0:00  \_ /usr/lib/heartbeat/stonithd

101       4674  0.0  0.7  12628  4100 ?        S    17:00   0:00  \_ /usr/lib/heartbeat/cib

root      4675  0.0  0.3   6392  1852 ?        S    17:00   0:00  \_ /usr/lib/heartbeat/lrmd

101       4676  0.0  0.4  12056  2528 ?        S    17:00   0:00  \_ /usr/lib/heartbeat/attrd

101       4677  0.0  0.5   8692  2784 ?        S    17:00   0:00  \_ /usr/lib/heartbeat/pengine

101       4678  0.0  0.5  12136  3012 ?        S    17:00   0:00  \_ /usr/lib/heartbeat/crmd
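The ps auxf check can be scripted too. This Python sketch assumes the ps auxf column layout shown above and takes the expected daemon names from this sample (plugin mode, ver: 0); later Pacemaker releases renamed these daemons, so the expected set is an assumption tied to this version. It only collects processes listed under the corosync process and does not handle deeper tree markers such as "|". The function names are hypothetical.

```python
# Daemons shown under corosync in the ps auxf output above (this Pacemaker
# version, plugin mode); later releases use different names.
EXPECTED_DAEMONS = {"stonithd", "cib", "lrmd", "attrd", "pengine", "crmd"}


def corosync_children(ps_text):
    """Program names listed as '\\_ /path/prog' directly below a corosync line."""
    names = set()
    inside = False
    for line in ps_text.splitlines():
        # ps auxf has 10 fixed columns; everything after them is COMMAND.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10]
        if command.startswith("\\_ "):
            if inside:
                program = command[3:].split()[0]
                names.add(program.rsplit("/", 1)[-1])
        else:
            # A top-level process: collect children only if it is corosync.
            inside = command.split()[0].rsplit("/", 1)[-1] == "corosync"
    return names


def missing_daemons(ps_text, expected=EXPECTED_DAEMONS):
    """Expected daemons that do not appear under corosync, sorted by name."""
    return sorted(expected - corosync_children(ps_text))


# The ps auxf output from this note.
SAMPLE_PS = r"""root      4665  0.4  0.8  86736  4244 ?        Ssl  17:00   0:04 corosync
root      4673  0.0  0.4  11720  2260 ?        S    17:00   0:00  \_ /usr/lib/heartbeat/stonithd
101       4674  0.0  0.7  12628  4100 ?        S    17:00   0:00  \_ /usr/lib/heartbeat/cib
root      4675  0.0  0.3   6392  1852 ?        S    17:00   0:00  \_ /usr/lib/heartbeat/lrmd
101       4676  0.0  0.4  12056  2528 ?        S    17:00   0:00  \_ /usr/lib/heartbeat/attrd
101       4677  0.0  0.5   8692  2784 ?        S    17:00   0:00  \_ /usr/lib/heartbeat/pengine
101       4678  0.0  0.5  12136  3012 ?        S    17:00   0:00  \_ /usr/lib/heartbeat/crmd
"""

if __name__ == "__main__":
    print("missing daemons:", missing_daemons(SAMPLE_PS))
```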



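crm: an interactive shell for managing cluster resources

Sublevels (submodes):

resource: resource management

status: view the cluster status

configure: edit the cluster configuration

group: (inside configure) define a resource group


Looking up usage:

help: show help for the current level or a command

meta: show a resource agent's metadata and parameters (in the ra level, e.g. ra meta)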



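When resource stickiness is greater than the score of a location constraint, the constraint is effectively overridden: the resource stays on the node where it is currently running instead of moving to the node the constraint prefers.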
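A simplified Python model of that rule, assuming each node's score for a resource is its location-constraint score plus the stickiness bonus on the node where the resource is currently running. Real Pacemaker placement also weighs colocation and order constraints, failcounts, and INFINITY scores, none of which are modeled here; the function names and the tie-breaking rule are hypothetical.

```python
def placement_scores(current, location_scores, stickiness):
    """Score every candidate node for a single resource.

    Simplified model (an assumption, not Pacemaker's full algorithm):
    score = location-constraint score for that node (0 if none)
            + resource-stickiness if the resource is already running there.
    """
    nodes = set(location_scores)
    if current is not None:
        nodes.add(current)
    return {
        node: location_scores.get(node, 0) + (stickiness if node == current else 0)
        for node in nodes
    }


def choose_node(current, location_scores, stickiness):
    """Pick the highest-scoring node; ties go to the alphabetically first name."""
    scores = placement_scores(current, location_scores, stickiness)
    return max(sorted(scores), key=scores.get)


if __name__ == "__main__":
    prefer_node1 = {"node1.magedu.com": 100}  # location constraint score
    # The resource failed over to node2 earlier and node1 is back online.
    for stickiness in (200, 50):
        print(stickiness, choose_node("node2.magedu.com", prefer_node1, stickiness))
```

Suppose the resource failed over to node2 and node1 has come back, with a location constraint scoring node1 at 100. With a stickiness of 200, node2 totals 200 against node1's 100, so the resource stays put; with a stickiness of 50, node1 wins 100 to 50 and the resource moves back.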





Next up are the management tools:

CLI:

crm (crmsh)

pcs (its pcsd daemon also provides a web GUI)

GUI:

lcmc




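crmsh is a command-line interface for managing the cluster:

adding resources

adding nodes, and so on. I won't go into detail here; that's for the next note, haha. (Honestly, I'm just not very good with it yet, haha.)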




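Chinese New Year is almost here. I spent the past few days just hunting for rpm packages, so there were no note updates and the gap since the last note is clearly too long.

I'll continue once I'm home. Keep going!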