weixin_34245749

Pacemaker+corosync实现高可用集群

一：Pacemaker和corosync概述：

Pacemaker(心脏起搏器），是一个集群管理资源器。但是其不提供心跳信息。pacemaker是一个延续的CRM。Pacemaker到了V3的版本以后

拆分了多个项目，其中pacemaker就是拆分出来的资源管理器。

Heart 3.0拆分之后的组成部分：

*Heartbeat:将原来的消息通信层独立为heartbeat项目，新的heartbeat只负责维护集群各个节点的信息以及他们之间的通信。

*Cluster Glue:相当于一个中间层，它用来将Heartbeat和pacemaker关联起来，主要包含2个部分，即：LRM和STONITH

*Resource Agent:用来控制服务启停，监控服务状态的脚本集合，这些脚本将被LRM调用从而实现各种资源启动，停止，监控等。

*pacemaker：也就是Cluster Resource Manager(简称CRM),用来管理整个HA的控制中心，客户端通过pacemaker来配置管理监控

整个集群。

Pacemaker特点：

&主机和应用程序级别的故障检测和恢复。

&几乎支持任何冗余设置

&同时支持多种集群配置模式

&配置策略处理法定人数损失

&支持应用启动和关机顺序

&支持多种模式的应用程序（如主/从）

&可以测试任何故障或集群的状态

集群组件说明：

*stonith：心跳系统

*LRMD:本地资源管理守护进程。它提供了一个通用的接口支持的资源类型。直接调用资源代理

*pengine:政策引擎。根据当前状态和配置集群计算的下一个状态。产生一个过渡图。包含行动和

依赖关系的列表。

*CIB:集群信息库。包含所有集群选项，节点，资源，他们彼此之间的关系和现状的定义，同步更新到

所有集群节点。 CIB使用XML表示集群的集群中的所有资源的配置和当前状态。CIB的内容会被自动在整个集群中同步

*CRMD:集群资源管理守护进程。主要是消息代理的PENGINE和LRM,还选举一个领导者（DC)统筹活动的集群。

*OPENAIS:OpenAIS的消息和成员层。

*Heartbe：心跳消息层。OpenAIS的一种替代。

*CCM：共识集群成员

Corosync最初只是用来演示OpenAIS集群框架接口规范的一个应用，可以实现HA心跳信息传输功能，RHCS集群套件就是基于corosync

实现。corosync只提供了message layer（即实现Heartbeat+CCM),没有直接提供CRM，一般使用Pacemaker进行资源管理。

Pacemaker是一个开源的高可用资源管理器（CRM),位于HA架构中资源管理，资源代理（RA）这个层次，它不能提供底层心跳信息传递的功能。

要想与对方节点通信需要借助底层的心跳传递服务器，将信息通告给对方。

Corosync主要就是实现集群中Message layer层的功能：完成集群心跳及事务信息的传递,Pacemaker主要实现的是管理集群中的资源(CRM),真正

启用，停止集群中的服务是RA（资源代理）这个子组件。RA的类别有下面几种类型LSB:位于/etc/rc.d/init.d/目录下。至少支持start，stop

restart，status，reload，force-reload等命令。

OCF:/usr/lib/ocf/resource.d/provider/,类似于LSB脚本，但支持start stop status monitor,meta-data;

STONITH：调用stonith设备的功能

systemd:unit file, /usr/lib/systemd/system/这类服务必须设置成开机自启动（enable）。

service:调用用户的自定义脚本

二：部署Pacemaker+corosync

2.1安装软件包

pacemaker依赖corosync，安装pacemaker包会连带安装corosync包；yum -y install pacemaker

[root@node2 ~]# yum -y install pacemaker;ssh root@node1 'yum -y install pacemaker'

[root@node2 ~]# rpm -ql corosync

/etc/corosync

/etc/corosync/corosync.conf.example #配置文件模板

/etc/corosync/corosync.conf.example.udpu

/etc/corosync/service.d

/etc/corosync/uidgid.d

/etc/dbus-1/system.d/corosync-signals.conf

/etc/rc.d/init.d/corosync #服务脚本

/etc/rc.d/init.d/corosync-notifyd

/etc/sysconfig/corosync-notifyd

/usr/bin/corosync-blackbox

/usr/libexec/lcrso

/usr/libexec/lcrso/coroparse.lcrso

...

/usr/sbin/corosync

/usr/sbin/corosync-cfgtool

/usr/sbin/corosync-cpgtool

/usr/sbin/corosync-fplay

/usr/sbin/corosync-keygen #生成节点间通信时用到的认证密钥文件，默认从/dev/random读随机数

/usr/sbin/corosync-notifyd

/usr/sbin/corosync-objctl

/usr/sbin/corosync-pload

/usr/sbin/corosync-quorumtool

/usr/share/doc/corosync-1.4.7

...

/var/lib/corosync

/var/log/cluster #日志文件目录

2.2◆安装crmsh

RHEL自6.4起不再提供集群的命令行配置工具crmsh，默认提供的是pcs；本例中使用crmsh，crmsh依赖于pssh，因此需要一并下载安装

[root@node2 ~]# yum -y install pssh-2.3.1-2.el6.x86_64.rpm crmsh-1.2.6-4.el6.x86_64.rpm

...

Installed:

crmsh.x86_64 0:1.2.6-4.el6 pssh.x86_64 0:2.3.1-2.el6

Dependency Installed:

python-dateutil.noarch 0:1.4.1-6.el6 redhat-rpm-config.noarch 0:9.0.3-44.el6.centos

Complete!

2.3◆配置corosync

cd /etc/corosync/

cp corosync.conf.example corosync.conf

vim corosync.conf，在其中加入：

service { #以插件化方式调用pacemaker

ver: 0

name: pacemaker

# use_mgmtd: yes

}

[root@node2 ~]# cd /etc/corosync/

[root@node2 corosync]# cp corosync.conf.example corosync.conf

[root@node2 corosync]# vim corosync.conf

# Please read the corosync.conf.5 manual page

compatibility: whitetank

totem {

version: 2

secauth: on #是否进行消息认证；若启用，使用corosync-keygen生成密钥文件

threads: 0

interface {

ringnumber: 0

bindnetaddr: 192.168.30.0 #接口绑定的网络地址

mcastaddr: 239.255.10.1 #传递心跳信息所使用的组播地址

mcastport: 5405

ttl: 1

}

logging {

fileline: off

to_stderr: no

to_logfile: yes

logfile: /var/log/cluster/corosync.log #日志路径

to_syslog: no

debug: off

timestamp: on #是否记录时间戳；当日志量很大时关闭该项可提高性能

logger_subsys {

subsys: AMF

debug: off

}

#下面这段表示以插件的方式调用pacemaker

service {

ver: 0

name: pacemaker

# use_mgmtd: yes

}

2.4◆启动corosync

service corosync start

查看corosync引擎是否正常启动，是否正常读取配置文件：

grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log

查看初始化成员节点通知是否正常发出：

grep TOTEM /var/log/cluster/corosync.log

检查启动过程中是否有错误产生：

grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources

查看pacemaker是否正常启动：

grep pcmk_startup /var/log/cluster/corosync.log

[root@node2 ~]# service corosync start;ssh root@node1 'service corosync start'

Starting Corosync Cluster Engine (corosync): [ OK ]

[root@node2 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log

Apr 28 02:03:08 corosync [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.

Apr 28 02:03:08 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

[root@node2 ~]# grep TOTEM /var/log/cluster/corosync.log

Apr 28 02:03:08 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

Apr 28 02:03:08 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Apr 28 02:03:08 corosync [TOTEM ] The network interface [192.168.30.20] is now up.

Apr 28 02:03:08 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

Apr 28 02:03:11 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

Apr 28 02:04:10 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

[root@node2 ~]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources #以下错误提示可忽略

Apr 28 02:03:08 corosync [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.

Apr 28 02:03:08 corosync [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN

Apr 28 02:03:13 corosync [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=7953, core=true)

...

[root@node2 ~]# grep pcmk_startup /var/log/cluster/corosync.log

Apr 28 02:03:08 corosync [pcmk ] info: pcmk_startup: CRM: Initialized

Apr 28 02:03:08 corosync [pcmk ] Logging: Initialized pcmk_startup

Apr 28 02:03:08 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615

Apr 28 02:03:08 corosync [pcmk ] info: pcmk_startup: Service: 9

Apr 28 02:03:08 corosync [pcmk ] info: pcmk_startup: Local hostname: node2

◆配置接口crmsh的启动命令是crm，其使用方式有两种：

命令行模式，例如 # crm ra list ocf

交互式模式，例如：

# crm

crm(live)# ra

crm(live)ra# list ocf

或者：

# crm

crm(live)# ra list ocf

help：查看帮助信息

end/cd：切回上一级

exit/quit：退出程序

常用子命令：

①status: 查看集群状态

②resource：

start, stop, restart

promote/demote：提升/降级一个主从资源

cleanup：清理资源状态

migrate：将资源迁移到另外一个节点上

③configure：

primitive, group, clone, ms/master（主从资源）

具体用法可使用help命令查看，如crm(live)configure# help primitive

示例：

primitive webstore ocf:Filesystem params device=172.16.100.6:/web/htdocs directory=/var/www/html fstype=nfs op monitor interval=20s timeout=30s

group webservice webip webserver

location, collocation, order

示例：

colocation webserver_with_webip inf: webserver webip

order webip_before_webserver mandatory: webip webserver #mandatory也可换成inf

location webip_on_node2 webip rule inf: #uname eq node2

或location webip_on_node2 webip inf: node2

monitor #pacemaker具有监控资源的功能

monitor [:] [:]

例如：monitor webip 30s:20s

very：CIB语法验证

commit：将更改后的信息提交写入CIB（集群信息库）

注意：配置完后要记得very和commit

show：显示CIB对象

edit：直接以vim模式编辑CIB对象

refresh：重新读取CIB信息

delete：删除CIB对象

erase：擦除所有配置

④node：

standby：让节点离线，强制其成为备节点

online：让节点重新上线

fence：隔离节点

clearstate：清理节点状态信息

delete：删除一个节点

⑤ra：

classes：查看资源代理有哪些种类

有四种：lsb, ocf, service, stonith

list []：列出资源代理

例如：

list ocf #列出ocf类型的资源代理

list ocf linbit #列出ocf类型中，由linbit提供的资源代理

meta/info [:[:]] #查看一个资源代理的元数据，主要是查看其可用参数

例如：info ocf:linbit:drbd

或 info ocf:drbd

或 info drbd

providers []：显示指定资源代理的提供者

例如：providers apache

crm(live)# help #查看有哪些子命令或获取帮助信息

This is crm shell, a Pacemaker command line interface.

Available commands:

cib manage shadow CIBs

resource resources management #资源管理

configure CRM cluster configuration #集群配置

node nodes management #节点管理

options user preferences

history CRM cluster history

site Geo-cluster support

ra resource agents information center #资源代理信息

status show cluster status #显示集群状态

help,? show help (help topics for list of topics)

end,cd,up go back one level

quit,bye,exit exit the program #退出

crm(live)# status #查看集群状态

Last updated: Fri Apr 29 00:19:36 2016

Last change: Thu Apr 28 22:41:38 2016

Stack: classic openais (with plugin)

Current DC: node2 - partition with quorum

Version: 1.1.11-97629de

2 Nodes configured, 2 expected votes

0 Resources configured

Online: [ node1 node2 ]

crm(live)# configure

crm(live)configure# help

...

Commands for resources are: #可配置的资源类型

- `primitive`

- `monitor`

- `group`

- `clone`

- `ms`/`master` (master-slave)

In order to streamline large configurations, it is possible to

define a template which can later be referenced in primitives:

- `rsc_template`

In that case the primitive inherits all attributes defined in the

template.

There are three types of constraints: #可定义的约束

- `location`

- `colocation`

- `order`

...

crm(live)configure# help primitive #查看使用帮助

...

Usage:

...............

primitive {[:[:]]|@

Pacemaker+corosync实现高可用集群

你可能感兴趣的:(Pacemaker+corosync实现高可用集群)