HA Cluster Basics and Implementing HA with heartbeat


Environment

node1:192.168.1.121 CentOS6.7

node2:192.168.1.122 CentOS6.7

node3:192.168.1.123 CentOS6.7

VIP: 192.168.1.88


Preparation

   # cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.1.121           node1

192.168.1.122           node2

192.168.1.123           node3 

   #  ssh-keygen -t rsa -P ''

   #  ssh-copy-id -i ~/.ssh/id_rsa.pub node1

   #  ssh-copy-id -i ~/.ssh/id_rsa.pub node2

   #  ssh-copy-id -i ~/.ssh/id_rsa.pub node3

   #  rpm -ivh epel-release-latest-6.noarch.rpm 

   #  yum -y install ansible

   # cat /etc/ansible/hosts 

[ha]

192.168.1.121

192.168.1.122

192.168.1.123

   #  ansible ha -m copy -a 'src=/etc/hosts dest=/etc'

   #  ansible ha -m shell -a 'ntpdate 192.168.1.62'

   # ansible ha -m cron -a 'minute="*/3" job="/usr/sbin/ntpdate 192.168.1.62" name="ntpdate"'


   

01 HA Cluster and Corosync

   

[root@node1 ~]# yum info corosync

Loaded plugins: fastestmirror, refresh-packagekit, security

Determining fastest mirrors

epel/metalink                                            | 5.8 kB     00:00     

 * base: mirrors.163.com

 * epel: mirrors.tuna.tsinghua.edu.cn

 * extras: mirrors.zju.edu.cn

 * updates: mirrors.163.com

base                                                     | 3.7 kB     00:00     

extras                                                   | 3.4 kB     00:00     

updates                                                  | 3.4 kB     00:00     

updates/primary_db                                       | 2.6 MB     00:00     

Available Packages

Name        : corosync

Arch        : x86_64

Version     : 1.4.7

Release     : 5.el6

Size        : 216 k

Repo        : base

Summary     : The Corosync Cluster Engine and Application Programming Interfaces

URL         : http://ftp.corosync.org

License     : BSD

Description : This package contains the Corosync Cluster Engine Executive,

            : several default APIs and libraries, default configuration files,

            : and an init script.

[root@node1 ~]# yum -y install corosync pacemaker

[root@node2 ~]# yum -y install corosync pacemaker


[root@node1 ~]# cd /etc/corosync/

[root@node1 corosync]# cp corosync.conf.example corosync.conf

[root@node1 corosync]# vim corosync.conf

1. Enable secure authentication. Change

     secauth: off

to

     secauth: on

2. Set the bind network address:

bindnetaddr: 192.168.1.0   # no change needed for this test


3. Append at the end of the file:

service {

ver:    0   

name:   pacemaker

use_mgmtd:  yes 

}


aisexec {

user:   root

group:  root

}
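Putting the edits above together, the modified portion of corosync.conf looks roughly like this (a sketch: the totem values shown are the defaults carried over from corosync.conf.example; only secauth and the two appended stanzas actually change):

```
totem {
    version: 2
    secauth: on                  # changed from off
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0 # network address of the cluster NICs
        mcastaddr: 239.255.1.1   # multicast address from the example file
        mcastport: 5405
    }
}

service {
    ver:        0                # run pacemaker as a corosync plugin
    name:       pacemaker
    use_mgmtd:  yes
}

aisexec {
    user:  root
    group: root
}
```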


Verify that the NIC supports MULTICAST:

ip link show
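A quick scripted check. Here eth0 is a placeholder interface name, and the sample line stands in for live `ip link show` output so the logic can be shown self-contained:

```shell
# On a live node you would run:
#   ip link show eth0 | grep -q MULTICAST && echo "multicast supported"
# Below, the same test runs against a sample line of `ip link show` output:
sample='2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP'
case "$sample" in
  *MULTICAST*) echo "multicast supported" ;;
  *)           echo "multicast NOT supported" ;;
esac
```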


Generate the authentication key file

[root@node1 corosync]# corosync-keygen

Corosync Cluster Engine Authentication key generator.

Gathering 1024 bits for key from /dev/random.

Press keys on your keyboard to generate entropy.

Writing corosync key to /etc/corosync/authkey.


# Note: if the entropy pool cannot supply the 1024 bits needed to generate the key file, generate some random activity (keyboard or disk I/O) on the server.
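One common workaround (a sketch; run it in a second terminal while corosync-keygen is blocked waiting on /dev/random) is to create disk activity so interrupts feed the entropy pool:

```shell
# Hash a batch of files to generate disk reads; the resulting interrupts
# feed /dev/random. The path /etc is just an example; any directory with
# many files works.
find /etc -type f -name '*.conf' -exec md5sum {} \; > /dev/null 2>&1
echo "entropy activity done"
```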


[root@node1 corosync]# ll

total 24

-r-------- 1 root root  128 Oct 10 09:49 authkey


[root@node1 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/

[root@node1 corosync]# ansible ha -m shell -a 'service corosync start'


Check that the Corosync engine started correctly:

[root@node1 corosync]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log

Oct 10 09:55:06 corosync [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.

Oct 10 09:55:06 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.


Check that the initial membership notifications were sent:

[root@node1 corosync]# grep  TOTEM  /var/log/cluster/corosync.log

Oct 10 09:55:06 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

Oct 10 09:55:06 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Oct 10 09:55:06 corosync [TOTEM ] The network interface [192.168.1.121] is now up.


Check for errors during startup. The errors below say that pacemaker will soon no longer run as a corosync plugin, and that CMAN is therefore recommended as the cluster infrastructure layer; in this setup they can be safely ignored.

[root@node1 corosync]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources

Oct 10 09:55:06 corosync [pcmk  ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.

Oct 10 09:55:06 corosync [pcmk  ] ERROR: process_ais_conf:  Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN

Oct 10 09:55:07 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child process mgmtd exited (pid=3434, rc=100)


Check that pacemaker started correctly:

[root@node1 corosync]# grep pcmk_startup /var/log/cluster/corosync.log

Oct 10 09:55:06 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized

Oct 10 09:55:06 corosync [pcmk  ] Logging: Initialized pcmk_startup

Oct 10 09:55:06 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615

Oct 10 09:55:06 corosync [pcmk  ] info: pcmk_startup: Service: 9

Oct 10 09:55:06 corosync [pcmk  ] info: pcmk_startup: Local hostname: node1


[root@node1 ~]# yum --nogpgcheck localinstall crmsh-2.1-1.6.x86_64.rpm pssh-2.0-1.el6.rf.noarch.rpm

[root@node2 ~]# yum --nogpgcheck localinstall crmsh-2.1-1.6.x86_64.rpm pssh-2.0-1.el6.rf.noarch.rpm


02 Configuring pacemaker with crmsh

crmsh could not be configured successfully, so this experiment could not be completed.

03 DRBD Basics and Deployment



Environment

node1:192.168.1.151 CentOS6.5

node2:192.168.1.152 CentOS6.5


Prerequisites: time synchronization and hostname-based access between the nodes.

Create a 5 GB disk partition on each of the two hosts (note: do not format it).

[root@node1 ~]# rpm -ivh drbd-8.4.3-33.el6.x86_64.rpm drbd-kmdl-2.6.32-431.el6-8.4.3-33.el6.x86_64.rpm 

[root@node2 ~]# rpm -ivh drbd-8.4.3-33.el6.x86_64.rpm drbd-kmdl-2.6.32-431.el6-8.4.3-33.el6.x86_64.rpm

[root@node1 ~]# rpm -qa | grep drbd

drbd-kmdl-2.6.32-431.el6-8.4.3-33.el6.x86_64

drbd-8.4.3-33.el6.x86_64

[root@node1 ~]# cd /etc/drbd.d/

[root@node1 drbd.d]# vim global_common.conf 

Change

usage-count yes;

to

usage-count no;

In the disk {} section, add:

on-io-error detach;

In the net {} section, add:

cram-hmac-alg "sha1";

shared-secret "mydrbdshared123";

After the net {} section, add:

syncer  {

rate 500M;

}  
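For reference, the edited sections of global_common.conf then look roughly like this (a sketch assembled from the changes above; the other defaults shipped in the file are omitted):

```
global {
    usage-count no;             # changed from yes (opt out of usage statistics)
}

common {
    disk {
        on-io-error detach;     # detach the backing device on I/O errors
    }

    net {
        cram-hmac-alg "sha1";
        shared-secret "mydrbdshared123";
    }

    syncer {
        rate 500M;              # cap resynchronization bandwidth
    }
}
```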


[root@node1 drbd.d]# vim mystore.res

resource mystore {

device /dev/drbd0;

disk /dev/sda4;

meta-disk internal;

on node1    {

address 192.168.1.151:7789;

}

on node2    {

address 192.168.1.152:7789;

}

}


[root@node1 ~]# scp -r /etc/drbd.* node2:/etc

drbd.conf                                     100%  133     0.1KB/s   00:00    

global_common.conf                            100% 1942     1.9KB/s   00:00    

mystore.res                                   100%  169     0.2KB/s   00:00

[root@node1 ~]# drbdadm create-md mystore

[root@node2 ~]# drbdadm create-md mystore

Start DRBD:

[root@node1 ~]# service drbd start

[root@node2 ~]# service drbd start

Check DRBD's running status:

[root@node1 ~]# cat /proc/drbd 

Promote the current node to primary:

[root@node1 ~]# drbdadm primary --force mystore

Watch the node synchronization progress:

[root@node1 ~]# watch -n1 'cat /proc/drbd'
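The fields worth watching in /proc/drbd are cs: (connection state), ro: (local/peer roles), and ds: (disk states). A small parsing sketch, using a sample status line rather than output from a live node:

```shell
# Sample /proc/drbd resource line (illustrative, not from a real node)
line='0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----'
cs=$(echo "$line" | grep -o 'cs:[A-Za-z]*')   # connection state
ro=$(echo "$line" | grep -o 'ro:[A-Za-z/]*')  # local/peer roles
echo "$cs $ro"
# prints: cs:SyncSource ro:Primary/Secondary
```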

[root@node1 ~]# mke2fs -t ext4 /dev/drbd0 

[root@node1 ~]# mount /dev/drbd0 /mnt

[root@node1 ~]# cd /mnt/

[root@node1 mnt]# ls

lost+found

[root@node1 mnt]# cp /etc/issue .

Switch the primary/secondary roles:

[root@node1 mnt]# cd

[root@node1 ~]# umount /mnt

[root@node1 ~]# drbdadm secondary mystore   # demote this node to secondary

[root@node1 ~]# drbd-overview 

0:mystore/0  Connected Secondary/Secondary UpToDate/UpToDate C r----- 

[root@node2 ~]# drbdadm primary mystore

[root@node2 ~]# mount /dev/drbd0 /mnt

[root@node2 ~]# cd /mnt/

[root@node2 mnt]# ls

issue  lost+found

[root@node2 mnt]# vim issue

Add a line, for example:

hello drbd   # the content is arbitrary

Switch the primary/secondary roles back and check the added content on the new primary; it matches.

04 HA MySQL with DRBD

[root@node1 ~]# ansible ha -m shell -a 'service drbd stop'

[root@node1 ~]# ansible ha -m shell -a 'chkconfig drbd off'

Due to the crm issue, this test did not succeed.

05 Corosync/Pacemaker Cluster and pcs


[root@node1 ~]# ansible ha -m shell -a 'yum -y install corosync pacemaker'

[root@node1 ~]# cd /etc/corosync/

[root@node1 corosync]# cp corosync.conf.example corosync.conf

[root@node1 corosync]# vim corosync.conf

Change

secauth: off

to

secauth: on

Comment out `to_syslog: yes` by prefixing it with #.

Append at the end of the file:

service {

ver: 0

name: pacemaker

}

Generate the key file:

[root@node1 corosync]# corosync-keygen 

[root@node1 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/

[root@node1 corosync]# service corosync start

[root@node2 corosync]# service corosync start

[root@node1 corosync]# yum -y install pcs

[root@node2 ~]# yum -y install pcs

[root@node1 corosync]# pcs status

[root@node1 corosync]# pcs property set stonith-enabled=false

[root@node1 corosync]# pcs property set no-quorum-policy=ignore

[root@node1 ~]# service pcsd start

[root@node2 ~]# service pcsd start

Define the webip resource:

[root@node1 corosync]# pcs resource create webip ocf:heartbeat:IPaddr params ip=192.168.1.88 op monitor interval=10s timeout=20s

[root@node1 corosync]# pcs status

Cluster name: 

Last updated: Wed Oct 12 13:22:01 2016          Last change: Wed Oct 12 13:21:48 2016 by root via cibadmin on node1

Stack: classic openais (with plugin)

Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 1 resource configured, 2 expected votes


Online: [ node1 node2 ]


Full list of resources:


webip  (ocf::heartbeat:IPaddr):        Started node1

 

 

Define the webserver resource:

[root@node1 corosync]# ansible ha -m shell -a 'yum -y install httpd'

[root@node1 corosync]# vim /var/www/html/index.html

node1

[root@node1 corosync]# chkconfig httpd off

[root@node2 ~]# vim /var/www/html/index.html

node2

[root@node2 ~]# chkconfig httpd off

[root@node1 ~]# pcs resource create webserver lsb:httpd op monitor interval=20s timeout=30s

[root@node1 ~]# pcs status

Cluster name: 

Last updated: Wed Oct 12 13:30:34 2016          Last change: Wed Oct 12 13:30:14 2016 by root via cibadmin on node1

Stack: classic openais (with plugin)

Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 2 resources configured, 2 expected votes


Online: [ node1 node2 ]


Full list of resources:


webip  (ocf::heartbeat:IPaddr):        Started node1

webserver      (lsb:httpd):    Started node2

Add constraints

1) Colocation constraint

[root@node1 ~]# pcs constraint colocation add webserver with webip

[root@node1 ~]# pcs status

Cluster name: 

Last updated: Wed Oct 12 14:34:11 2016          Last change: Wed Oct 12 14:33:50 2016 by root via cibadmin on node1

Stack: classic openais (with plugin)

Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 2 resources configured, 2 expected votes


Online: [ node1 node2 ]

2) Order constraint

[root@node1 ~]# pcs constraint order webip then webserver

Adding webip webserver (kind: Mandatory) (Options: first-action=start then-action=start)

3) Location constraint

[root@node1 ~]# pcs constraint location webip prefers node1=300

View the constraints

1) Order constraint

[root@node1 ~]# pcs constraint order show

Ordering Constraints:

 start webip then start webserver (kind:Mandatory)

 

2) Colocation constraint

[root@node1 ~]# pcs constraint colocation show

Colocation Constraints:

 webserver with webip (score:INFINITY)

 

3) Location constraint

[root@node1 ~]# pcs constraint location show

Location Constraints:

 Resource: webip

Enabled on: node1 (score:300)
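As a design note: the colocation and order constraints above can also be expressed by putting both resources into a single group, which keeps them on the same node and starts them in listed order. A sketch, assuming the same resource names and a live cluster (webservice is a hypothetical group name):

```shell
# Group webip and webserver: same node, started in this order (webip first)
pcs resource group add webservice webip webserver
pcs status
```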

Put node1 into standby (offline):

[root@node1 ~]# pcs cluster standby node1

[root@node1 ~]# pcs status

Cluster name: 

Last updated: Wed Oct 12 14:49:08 2016          Last change: Wed Oct 12 14:48:54 2016 by root via crm_attribute on node1

Stack: classic openais (with plugin)

Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 2 resources configured, 2 expected votes


Node node1: standby

Online: [ node2 ]


Full list of resources:


webip  (ocf::heartbeat:IPaddr):        Started node2

webserver      (lsb:httpd):    Started node2

Bring the standby node back online:

[root@node1 ~]# pcs cluster unstandby node1

[root@node1 ~]# pcs status

Cluster name: 

Last updated: Wed Oct 12 14:50:46 2016          Last change: Wed Oct 12 14:50:37 2016 by root via crm_attribute on node1

Stack: classic openais (with plugin)

Current DC: node2 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum

2 nodes and 2 resources configured, 2 expected votes


Online: [ node1 node2 ]


Full list of resources:


webip  (ocf::heartbeat:IPaddr):        Started node1

webserver      (lsb:httpd):    Started node1

[root@node1 ~]# vim /usr/lib/python2.6/site-packages/pcs/utils.py

[root@node1 src]# ls *rpm

crmsh-1.2.6-4.el6.x86_64.rpm  pssh-2.3.1-2.el6.x86_64.rpm

[root@node1 src]# yum localinstall crmsh-1.2.6-4.el6.x86_64.rpm pssh-2.3.1-2.el6.x86_64.rpm --nogpgcheck -y

The test succeeded.