corosync is a Messaging Layer. Combined with pacemaker, it is used on various Linux systems to make services highly available. You can google corosync's history on your own. If the principles of HA clustering are unclear, see the earlier summary article.
## Implementing httpd high availability
The plan:
Machine 1: IP address = 172.16.100.7
Machine 2: IP address = 172.16.100.2
Machine 3: time server, IP = 172.16.0.1
We will use 172.16.100.1 as the address that serves clients, i.e. the VIP.
1. Install the web service on both machines and put a test page in the document root.
yum install httpd -y
chkconfig httpd off ## disable httpd autostart; the cluster will manage the service
cd /var/www/html
vim 1.html ## to make failover visible, give both machines a page with the same name (1.html) but different content; anything will do
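As a sketch of that test-page step: embedding each node's own hostname in the page makes it obvious which machine answered after a failover. `DOCROOT` stands in for `/var/www/html` (it defaults to a temp dir here so the sketch runs anywhere without touching the real docroot).

```shell
# Run on each node with DOCROOT=/var/www/html; the hostname in the body
# identifies the serving node after a failover.
DOCROOT=${DOCROOT:-$(mktemp -d)}
echo "<h1>served from $(hostname)</h1>" > "$DOCROOT/1.html"
cat "$DOCROOT/1.html"
```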
2. Install corosync and pacemaker, plus crmsh, the command-line tool we will use to configure the cluster.
yum info corosync ## check whether corosync is installed
yum install corosync ## install corosync
yum info pacemaker ## check whether pacemaker is installed
yum install pacemaker ## install pacemaker
vim /etc/yum.repos.d/HA.repo ## add the yum repository used to install crmsh; the mirror can be slow, so rerun the install if it times out
[network_ha-clustering_Stable]
name=Stable High Availability/Clustering packages (CentOS_CentOS-6)
type=rpm-md
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/
gpgcheck=1
gpgkey=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6//repodata/repomd.xml.key
enabled=1
yum install crmsh -y ## install crmsh
3. Set the hostnames on both machines. This is required: cluster communication is based on them.
hostname node1.magedu.com ## run on machine 1; takes effect immediately but not across reboots
vim /etc/sysconfig/network ## run on machine 1; persists across reboots
hostname node2.magedu.com ## run on machine 2; takes effect immediately but not across reboots
vim /etc/sysconfig/network ## run on machine 2; persists across reboots
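On CentOS 6 the persistent hostname lives in /etc/sysconfig/network; as a sketch, that file on machine 1 would contain something like:

```
NETWORKING=yes
HOSTNAME=node1.magedu.com
```

(and HOSTNAME=node2.magedu.com on machine 2).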
4. Configure name resolution on both machines (same steps on both). The hostname <=> IP mapping must not rely on DNS, so put the entries in /etc/hosts.
vim /etc/hosts
172.16.100.7 node1.magedu.com
172.16.100.2 node2.magedu.com
5. Assign the planned IPs to the two machines, then ping each other to confirm they are reachable.
ifconfig eth0 172.16.100.7/16 ## set machine 1's IP
ifconfig eth0 172.16.100.2/16 ## set machine 2's IP
ping node2.magedu.com ## from node1
6. Set up passwordless SSH trust between the nodes.
ssh-keygen -t rsa -P '' ## generate a keypair; run on machine 1
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected] ## copy the public key to machine 2
Then repeat the same steps on machine 2 to copy its public key to machine 1.
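A sketch of the node2 side of this step, assuming the node1 address from the plan above. The key is written to a scratch path here so the sketch is side-effect free; on the real node2, omit -f to use the default ~/.ssh/id_rsa.

```shell
# Generate a passphrase-less RSA keypair (scratch path for illustration).
KEY=$(mktemp -u)
ssh-keygen -q -t rsa -P '' -f "$KEY"
ls "$KEY" "$KEY.pub"
# ssh-copy-id -i "$KEY.pub" [email protected]   # run this on the real node2
```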
7. Time synchronization: required. The cluster nodes need a common time base to judge whether a node has a problem and whether to isolate it.
service ntpd stop ## stop the ntpd service on both machines
chkconfig ntpd off ## disable ntpd autostart on both machines
ntpdate 172.16.0.1 ## sync time from 172.16.0.1; any machine set up with this IP as an NTP server will do
If the time sync fails, see: http://www.blogjava.net/spray/archive/2008/07/10/213964.html
crontab -e ## on both machines, add a cron job to sync every 5 minutes (required); crontab -l lists the jobs
*/5 * * * * /usr/sbin/ntpdate 172.16.0.1 &> /dev/null
8. Configure corosync; the configuration file contents follow. If your English is good, see man corosync.conf.
vim /etc/corosync/corosync.conf ## edit the configuration file
# Please read the corosync.conf.5 manual page
compatibility: whitetank
totem {
version: 2 ## the configuration file format version; do not change this
# secauth: Enable mutual node authentication. If you choose to
# enable this ("on"), then do remember to create a shared
# secret with "corosync-keygen".
secauth: on ## enable inter-node authentication so other hosts cannot join the cluster
threads: 2 ## number of concurrent threads; generally worth adjusting on a single-core CPU, no need to change on multi-core
# interface: define at least one interface to communicate
# over. If you define more than one interface stanza, you must
# also set rrp_mode.
interface {
# Rings must be consecutively numbered, starting at 0.
ringnumber: 0
# This is normally the *network* address of the
# interface to bind to. This ensures that you can use
# identical instances of this configuration file
# across all your cluster nodes, without having to
# modify this option.
bindnetaddr: 172.16.0.0 ## the network the cluster communicates on
# However, if you have multiple physical network
# interfaces configured for the same subnet, then the
# network address alone is not sufficient to identify
# the interface Corosync should bind to. In that case,
# configure the *host* address of the interface
# instead:
# bindnetaddr: 192.168.1.1
# When selecting a multicast address, consider RFC
# 2365 (which, among other things, specifies that
# 239.255.x.x addresses are left to the discretion of
# the network administrator). Do not reuse multicast
# addresses across multiple Corosync clusters sharing
# the same network.
mcastaddr: 239.255.1.1 ## the multicast address the cluster nodes use to talk to each other; look up which multicast ranges are usable (see RFC 2365)
# Corosync uses the port you specify here for UDP
# messaging, and also the immediately preceding
# port. Thus if you set this to 5405, Corosync sends
# messages over UDP ports 5405 and 5404.
mcastport: 5405 ## multicast port; the default is fine
# Time-to-live for cluster communication packets. The
# number of hops (routers) that this ring will allow
# itself to pass. Note that multicast routing must be
# specifically enabled on most network routers.
ttl: 1
}
}
logging { ## where logs are stored; self-explanatory
# Log the source file and line where messages are being
# generated. When in doubt, leave off. Potentially useful for
# debugging.
fileline: off
# Log to standard error. When in doubt, set to no. Useful when
# running in the foreground (when invoking "corosync -f")
to_stderr: no
# Log to a log file. When set to "no", the "logfile" option
# must not be set.
to_logfile: yes
logfile: /var/log/cluster/corosync.log
# Log to the system log daemon. When in doubt, set to yes.
to_syslog: yes
# Log debug messages (very verbose). When in doubt, leave off.
debug: off
# Log messages with time stamps. When in doubt, set to on
# (unless you are only logging to syslog, where double
# timestamps can be annoying).
timestamp: on
logger_subsys {
subsys: AMF
debug: off
}
}
amf {
mode: disabled
}
service { ## once corosync is up, start pacemaker as well
ver: 0
name: pacemaker
}
9. Generate the authentication key and give every node a copy.
corosync-keygen ## writes /etc/corosync/authkey (mode 0400); it may pause while gathering entropy
cd /etc/corosync
scp -p authkey corosync.conf node2.magedu.com:/etc/corosync/ ## -p preserves the restrictive mode on authkey
10. Start corosync.
service iptables stop ## stop the firewall
setenforce 0 ## put SELinux in permissive mode so it does not interfere
On node1: service corosync start
On node2, run from node1: ssh node2.magedu.com 'service corosync start'
11. Check the startup for errors.
(1) Check that the corosync engine started normally:
[root@node1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Aug 17 17:31:20 corosync [MAIN  ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Aug 17 17:31:20 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
(2) Check that the initial membership notifications went out normally:
[root@node1 ~]# grep TOTEM /var/log/cluster/corosync.log
Aug 17 17:31:20 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Aug 17 17:31:20 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Aug 17 17:31:21 corosync [TOTEM ] The network interface [192.168.1.201] is now up.
Aug 17 17:31:21 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
(3) Check whether any errors were produced during startup:
[root@node1 ~]# grep ERROR: /var/log/cluster/corosync.log
Aug 17 17:31:21 corosync [pcmk  ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Aug 17 17:31:21 corosync [pcmk  ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
(4) Check that pacemaker started normally:
[root@node1 ~]# grep pcmk_startup /var/log/cluster/corosync.log
Aug 17 17:31:21 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
Aug 17 17:31:21 corosync [pcmk  ] Logging: Initialized pcmk_startup
Aug 17 17:31:21 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Aug 17 17:31:21 corosync [pcmk  ] info: pcmk_startup: Service: 9
Aug 17 17:31:21 corosync [pcmk  ] info: pcmk_startup: Local hostname: node1.test.com
12. Add the cluster resources with crm.
crm has two modes:
interactive:
configuration takes effect only after you run the commit command
batch:
takes effect immediately
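As a sketch of the two modes, the same property can be set either way (the prompts shown are how crm on CentOS 6 typically renders an interactive session):

```
# Batch mode: runs from the shell and takes effect immediately
crm configure property stonith-enabled=false

# Interactive mode: changes are staged until you commit
crm
crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# verify
crm(live)configure# commit
```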
crm ## enter the crm shell
configure ## enter the configure submenu
primitive webip ocf:heartbeat:IPaddr params ip=172.16.100.1 nic=eth0 cidr_netmask=16 ## add the VIP resource (the address that serves clients); note: quote any parameter value that contains spaces
verify ## check the configuration for problems
commit ## only commit makes it take effect; this is the "interactive" mode
show ## review the configuration
show xml ## view the configuration in XML format
primitive httpd lsb:httpd op start timeout=20 ## add the httpd resource
show ## review the resources
commit ## commit to take effect
cd ..
status ## see how each resource is running
configure ## back into the configure submenu
group webservice webip httpd ## define a resource group
verify
cd ..
status ## check the status; both resources should now run on one node
configure
property no-quorum-policy=ignore ## with only two nodes, the surviving node has no quorum when the other goes down, so set the policy to ignore; in practice, prefer an odd number of nodes, or at least more of them
cd ..
node standby ## test: put one node in standby
status ## check whether the resources moved to the other node
node online ## bring the standby node back
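The quorum arithmetic behind the no-quorum-policy setting above can be checked quickly: a partition has quorum only with a strict majority of votes, i.e. floor(n/2)+1.

```shell
# With n=2 nodes, a majority takes floor(2/2)+1 = 2 votes, so the single
# surviving node (1 vote) can never reach quorum -- hence "ignore".
n=2
echo $(( n / 2 + 1 ))
```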
#### The steps above made httpd highly available by using a resource group to keep the resources on one node. Below we achieve the same with resource constraints.
resource stop webservice
configure delete webservice ## delete the resource group
commit ## commit to take effect
status ## check the cluster resource status
configure colocation httpd_with_webip inf: httpd webip ## add a colocation constraint so httpd and webip run on the same node
verify ## check
commit ## commit
status ## check the status
configure order webip_before_httpd mandatory: webip httpd ## add an order constraint so the node gets the IP before httpd starts
verify
status ## check the status
configure location webip_on_node1 webip rule 100: #uname eq node1.magedu.com ## add a location constraint so the service prefers to run on node1
verify
commit
node standby ## put a node in standby
status
node online ## bring the stopped node back; did the resources return?
Common commands:
verify ## check the configuration for problems
crm configure property stonith-enabled=false
property stonith-enabled=false
verify
commit
Quote values that contain spaces:
primitive webip ocf:heartbeat:IPaddr params ip=172.16.100.1 nic=eth0 cidr_netmask=16
verify
show
show xml
stop webip ## stop a resource
resources
list
start webip
list
migrate ## migrate a resource
crm_mon
ra
providers httpd ## see which provider supplies the httpd RA
classes
list lsb
meta lsb:httpd ## view its metadata
configure
primitive httpd lsb:httpd op start timeout=20
show
verify
commit
show
crm status
group webservice webip httpd
verify
crm status
crm node standby / online
crm status
crm configure property no-quorum-policy=ignore
crm show
resource
stop webservice
list
cleanup webservice
cleanup webip
cleanup httpd
node clearstate node1.magedu.com ## clearstate lives under the node submenu
node clearstate node2.magedu.com
resource
start webservice
edit
verify
show
quit
resource
migrate ## migrate a resource
commit
crm status
crm node standby
crm resource
stop webservice
configure
delete webservice
show
commit
configure
help colocation
colocation httpd_with_webip inf: httpd webip
show xml
verify
commit
crm status
order webip_before_httpd mandatory: webip httpd
show xml
commit
crm status
crm node standby
crm_mon
crm node online
location webip_on_node1 webip rule 100: #uname eq node1.magedu.com
show xml
verify
commit
crm status
crm node standby
crm status
crm configure
rsc_defaults resource-stickiness=200 ## set the default resource stickiness
verify
commit
crm node standby
crm node online
ra
meta ocf:heartbeat:Filesystem
Appendix:
RHEL 6.x RHCS: corosync
RHEL 5.x RHCS: openais, cman, rgmanager
corosync: Messaging Layer
openais: AIS
corosync --> pacemaker
SUSE Linux Enterprise Server: Hawk, WebGUI
LCMC: Linux Cluster Management Console
RHCS: Conga (luci/ricci)
webGUI
keepalived: VRRP, 2 nodes
Reference: http://freeloda.blog.51cto.com/2033581/1275528