Linux for Beginners, Note 089: corosync + pacemaker
Overview
corosync
pacemaker
crmsh
What is High Availability?
Simple Equation: A=MTBF/(MTBF+MTTR)
MTBF = mean time between failures (average time the system runs without failing)
MTTR = mean time to repair (average time needed to repair a failure)
A = probability system will provide service at a random time
(ranging from 0 to 1)
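As a quick worked example of the equation above, here is a minimal sketch that evaluates A with awk. The function name `availability` and the sample figures are made up for illustration; both arguments just need to share the same time unit.

```shell
#!/bin/sh
# availability MTBF MTTR: print A = MTBF / (MTBF + MTTR) to six decimals.
# Both arguments must use the same time unit (hours here).
availability() {
    awk -v mtbf="$1" -v mttr="$2" 'BEGIN { printf "%.6f\n", mtbf / (mtbf + mttr) }'
}

# Hypothetical figures: one failure every 999 hours on average, 1 hour to repair.
availability 999 1
```

With these figures A is 0.999, i.e. the usual "three nines".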
RHEL 5.x RHCS: openais, cman, rgmanager
RHEL 6.x RHCS: corosync
corosync: Messaging Layer
openais plugin
openais: AIS
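corosync is the Messaging Layer: software that passes messages between cluster nodes and collects cluster membership information
pacemaker is a CRM (cluster resource manager); it works on top of corosync or heartbeat v3 to manage cluster resources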
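SUSE Linux Enterprise Server: Hawk, web GUI
LCMC (Linux Cluster Management Console): a GUI, worth exploring on your own
RHCS (Red Hat Cluster Suite)
conga (luci = management console / ricci = agent on the cluster nodes); luci is the web GUI
keepalived: VRRP, typically used for 2-node setups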
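Configuring a corosync cluster
Prerequisite: time synchronization between the nodes
Prerequisite: mutual SSH trust (key-based login between the nodes)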
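The two prerequisites above could be handled roughly like this. This is only a sketch under assumptions: the NTP server name `ntp.example.com` is a placeholder, the helper names `run` and `prepare_node` are made up, and the peer is assumed to be reachable as `node2`. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# prepare_node PEER NTP_SERVER: sync the clock, then push root's key to PEER.
prepare_node() {
    peer="$1"; ntp_server="$2"
    run ntpdate "$ntp_server"
    # Create a key pair only if none exists yet (empty passphrase).
    [ -f "$HOME/.ssh/id_rsa" ] || run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@$peer"
}

# ntp.example.com is a placeholder, not a server from this note.
prepare_node node2 ntp.example.com
```

The same steps have to be repeated on the other node (pointing back at the first one) so that the trust works in both directions.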
1. Install pacemaker and corosync
yum install pacemaker corosync
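yum install crmsh (not currently in the official repositories; an unofficial build can be found via openSUSE, and a source package is available)
2. Configure corosync
/etc/corosync/corosync.conf
Add the following to corosync.conf: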
service {
ver: 0
name: pacemaker
# use_mgmtd: yes
}
aisexec {
user: root
group: root
}
# corosync-keygen
Then copy the key and the configuration file to the other node:
# scp authkey corosync.conf [email protected]:/etc/corosync/
# service NetworkManager stop
# chkconfig NetworkManager off
corosync can now be started:
# service corosync start
Verify that corosync started correctly
Check that the corosync engine started normally:
# grep -e "Corosync Cluster Engine" /var/log/cluster/corosync.log
# grep -e "configuration file" /var/log/cluster/corosync.log
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Check that the initial membership notifications were sent:
# grep TOTEM /var/log/cluster/corosync.log
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transport (UDP/IP).
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] The network interface [172.16.100.11] is now up.
Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors occurred during startup:
# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Check that pacemaker started normally:
# grep pcmk_startup /var/log/cluster/corosync.log
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: CRM: Initialized
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] Logging: Initialized pcmk_startup
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Service: 9
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Local hostname: node1.magedu.com
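The grep checks above can be bundled into one helper. This is a minimal sketch: the function name `check_corosync_log` is made up, and the demo runs against a temporary file holding a few of the log lines quoted in this note rather than the real /var/log/cluster/corosync.log.

```shell
#!/bin/sh
# check_corosync_log FILE: report whether each startup marker appears in FILE.
check_corosync_log() {
    log="$1"
    for marker in "Corosync Cluster Engine" "configuration file" "TOTEM" "pcmk_startup"; do
        if grep -q -e "$marker" "$log"; then
            echo "OK      $marker"
        else
            echo "MISSING $marker"
        fi
    done
    # Count ERROR: lines, ignoring unpack_resources as the note does.
    errors=$(awk '/ERROR:/ && !/unpack_resources/ { n++ } END { print n + 0 }' "$log")
    echo "errors: $errors"
}

# Demo on a temporary copy of three log lines quoted in this note.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transport (UDP/IP).
EOF
check_corosync_log "$sample"
rm -f "$sample"
```

On a real node the call would be `check_corosync_log /var/log/cluster/corosync.log`; a MISSING line or a non-zero error count is the cue to read the log by hand.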
If all the commands above ran without problems, start corosync on node2 with the following command:
# ssh node2 -- /etc/init.d/corosync start
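Note: start node2 from node1 using the command above; do not start it directly on node2.
Use the following command to view the startup status of the cluster nodes: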
# crm status
============
Last updated: Tue Jun 14 19:07:06 2011
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ node1.magedu.com node2.magedu.com ]
The output above shows that both nodes have started normally and that the cluster is in a working state.
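For scripting, the same check can be reduced to counting the names on the "Online:" line. A small sketch, assuming the `crm status` layout shown above; the function name `online_nodes` is made up, and the demo feeds it text copied from this note instead of calling `crm`.

```shell
#!/bin/sh
# online_nodes: read `crm status` output on stdin and print the number of
# node names listed between the brackets of the "Online:" line.
online_nodes() {
    awk '/^Online:/ {
        n = 0
        for (i = 1; i <= NF; i++)
            if ($i != "Online:" && $i != "[" && $i != "]") n++
        print n
        found = 1
    }
    END { if (!found) print 0 }'
}

# Demo with the status text quoted in this note.
online_nodes <<'EOF'
2 Nodes configured, 2 expected votes
Online: [ node1.magedu.com node2.magedu.com ]
EOF
```

On a live cluster this would be used as `crm status | online_nodes` and compared with the expected node count.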
Running ps auxf shows the processes started by corosync:
root 4665 0.4 0.8 86736 4244 ? Ssl 17:00 0:04 corosync
root 4673 0.0 0.4 11720 2260 ? S 17:00 0:00 \_ /usr/lib/heartbeat/stonithd
101 4674 0.0 0.7 12628 4100 ? S 17:00 0:00 \_ /usr/lib/heartbeat/cib
root 4675 0.0 0.3 6392 1852 ? S 17:00 0:00 \_ /usr/lib/heartbeat/lrmd
101 4676 0.0 0.4 12056 2528 ? S 17:00 0:00 \_ /usr/lib/heartbeat/attrd
101 4677 0.0 0.5 8692 2784 ? S 17:00 0:00 \_ /usr/lib/heartbeat/pengine
101 4678 0.0 0.5 12136 3012 ? S 17:00 0:00 \_ /usr/lib/heartbeat/crmd
The crm interactive shell for resource management
Sub-modes
resource    resource management
status      view cluster status
configure
group
Looking up usage
help
meta
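When a resource's stickiness is greater than the score of a location constraint, the location constraint no longer takes effect: the resource stays on its current node.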
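The rule above can be illustrated with a deliberately simplified two-node model. It is not how pacemaker's policy engine is implemented: the function name `placement` is made up, only one stickiness value and one location score are compared, and treating a tie as "stay" is an assumption of the sketch.

```shell
#!/bin/sh
# placement CURRENT PREFERRED STICKINESS LOCATION_SCORE
# Simplified model: the current node scores STICKINESS, the node named in the
# location constraint scores LOCATION_SCORE, and the higher score wins.
# A tie leaves the resource where it is (an assumption of this sketch).
placement() {
    current="$1"; preferred="$2"; stickiness="$3"; location="$4"
    if [ "$current" = "$preferred" ] || [ "$stickiness" -ge "$location" ]; then
        echo "$current"
    else
        echo "$preferred"
    fi
}

placement node1 node2 200 100   # stickiness wins, resource stays on node1
placement node1 node2 50 100    # location wins, resource moves to node2
```

The scores 200, 100 and 50 are arbitrary illustration values.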
Next come the tools themselves:
cli
crm
pcs(web-gui)
gui
lcmc
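crmsh is a command-line interface for managing the cluster
Adding resources
Adding nodes, and so on. I won't go into detail here; that's for the next installment, haha. Honestly, I'm just not very fluent with it yet, haha.
The New Year holiday is almost here. These past few days went entirely to hunting for rpm packages, so no notes got written and the gap between updates is noticeably long.
I'll pick this up again once I'm home. Keep going!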