Introduction
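A high-availability (HA) cluster consists of two or more nodes working together. An HA stack is made up of four basic layers:
1. The bottom layer is the messaging and infrastructure layer (Messaging Layer). It carries heartbeat messages between nodes, which is why it is also called the heartbeat layer. Heartbeats can be sent by broadcast, multicast, or unicast.
2. The second layer is the membership layer (Membership). Its main job is to build a complete membership view from the information supplied by the first layer, using a cluster consensus membership service (CCM or CCS). This layer links the layers above and below it. Upward, it turns the lower layer's information into a membership map and passes it up, so that every node knows the state of the others. Downward, it carries out decisions made by the upper layer, such as isolating (fencing) a device.
3. The third layer is the resource allocation layer (Resource Allocation), which actually implements the cluster services. Every node runs a Cluster Resource Manager (CRM), which provides the core components for high availability, including resource definitions and attributes. On each node the CRM maintains a CIB (Cluster Information Base, an XML document) and an LRM (Local Resource Manager).
Only the CIB on the DC (Designated Coordinator, the node that coordinates the cluster) can be modified; the CIBs on all other nodes are copies of the DC's. The LRM is the component that actually starts and stops resources on the local node, as instructed by the CRM. When a node fails, the DC uses the PE (Policy Engine) and the TE (Transition Engine) to decide whether other nodes should take over its resources.
4. The fourth layer is the resource agent layer (Resource Agent). A cluster resource agent is a script that starts and stops one cluster resource on the local node and reports its status. Resource agents come in these classes: LSB (scripts in /etc/init.d), OCF (a more complete standard than LSB), and Legacy heartbeat (the resource management of Heartbeat v1).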
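The bottom two layers split the work as follows. The messaging layer only records when each node was last heard from, and the membership layer derives the set of live nodes from those timestamps. The Python toy model below illustrates that split. It is a minimal sketch under assumed names: `MembershipView`, `heartbeat`, `members` and the 3-second timeout are all hypothetical. It does not reflect how Corosync actually implements or exposes membership.

```python
from dataclasses import dataclass, field


@dataclass
class MembershipView:
    """Hypothetical heartbeat bookkeeping; not Corosync's real data structures."""
    timeout: float  # seconds of silence after which a node is dropped
    last_seen: dict = field(default_factory=dict)  # node name -> time of last heartbeat

    def heartbeat(self, node: str, now: float) -> None:
        # Messaging layer: a heartbeat from `node` arrived at time `now`.
        self.last_seen[node] = now

    def members(self, now: float) -> set:
        # Membership layer: nodes heard from within `timeout` form the membership.
        return {n for n, t in self.last_seen.items() if now - t <= self.timeout}


if __name__ == "__main__":
    view = MembershipView(timeout=3.0)
    view.heartbeat("server12.neo.com", now=0.0)
    view.heartbeat("server13.neo.com", now=0.0)
    view.heartbeat("server12.neo.com", now=4.0)  # server13 has gone silent
    print(sorted(view.members(now=4.0)))  # ['server12.neo.com']
```

In a real stack, the upper layers react to changes in this set. For example, the DC recomputes where resources should run when a node drops out.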
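System environment: CentOS/RHEL 6 (kernel 2.6.32-358.el6.x86_64) on two nodes, server12.neo.com (192.168.100.12) and server13.neo.com (192.168.100.13), each with a spare disk /dev/sdb. The cluster VIP is 192.168.100.100.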
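Installation
Preparation:
1. Configure the EPEL repository;
2. Synchronize the time on both nodes;
3. Configure /etc/hosts on both nodes so that they can reach each other by hostname;
4. Optionally, set up Ansible for batch deployment.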
I. Install DRBD
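DRBD consists of two parts: a kernel module (drbd-kmdl) and the user-space management tools (drbd). The DRBD kernel module has been part of the mainline Linux kernel since 2.6.33. On kernel 2.6.33 or later, you only need to install the management tools. On older kernels, you must install both the kernel module and the tools, and their version numbers must match.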
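The version rule above can be checked mechanically. The sketch below assumes that comparing the numeric prefix of `uname -r` against 2.6.33 is enough. Distribution kernels may still be built without the module, so treat a `False` result as a hint rather than a guarantee. The helper name `needs_drbd_kmod` is hypothetical.

```python
import re

# DRBD was merged into the mainline kernel in 2.6.33.
MAINLINE_DRBD = (2, 6, 33)


def needs_drbd_kmod(release: str) -> bool:
    """Return True when the kernel predates mainline DRBD (a separate drbd-kmdl is needed)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # Missing patch level (e.g. "3.10") counts as 0.
    version = tuple(int(part) if part else 0 for part in m.groups())
    return version < MAINLINE_DRBD


if __name__ == "__main__":
    for release in ("2.6.32-358.el6.x86_64", "2.6.32-431.20.3.el6.x86_64"):
        print(release, "needs drbd-kmdl:", needs_drbd_kmod(release))
```

On a live node, pass `platform.release()`, which returns the same string as `uname -r` on Linux. Both kernels used in this article report `True`. This matches the kernel-upgrade path below, which still installs a `drbd-kmdl` package for 2.6.32-431.20.3.el6.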
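After a long search, I could not find a DRBD kernel module package for the 2.6.32-358.el6.x86_64 kernel. That leaves two options: compile DRBD from source, or upgrade the kernel and install the RPM packages built for the new kernel version. Both are described below.
1. Compile and install DRBD from source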
1) Install the user-space tools:
    # cd /tmp
    # wget http://oss.linbit.com/drbd/8.4/drbd-8.4.1.tar.gz
    # tar xzf drbd-8.4.1.tar.gz
    # cd drbd-8.4.1
    # ./configure --prefix=/usr/local/drbd --with-km
    # make KDIR=/usr/src/kernels/2.6.32-358.el6.x86_64/
    # make install
    # mkdir -p /usr/local/drbd/var/run/drbd
    # cp /usr/local/drbd/etc/rc.d/init.d/drbd /etc/rc.d/init.d
    # chkconfig --add drbd
    # chkconfig drbd on
2) Build and install the drbd kernel module:
    # cd drbd
    # make clean
    # make KDIR=/usr/src/kernels/2.6.32-358.el6.x86_64/
    # cp drbd.ko /lib/modules/`uname -r`/kernel/lib/
    # depmod
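2. Upgrade the kernel and install from RPM packages
    # yum update kernel
    # cat /boot/grub/grub.conf
    kernel /vmlinuz-2.6.32-431.20.3.el6.x86_64
After confirming that grub.conf boots the new kernel, reboot and install the DRBD packages that match it:
    # init 6
    # yum install drbd-8.4.3-33.el6.x86_64.rpm drbd-kmdl-2.6.32-431.20.3.el6-8.4.3-33.el6.x86_64.rpm -y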
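3. Prepare an empty 5 GB partition (/dev/sdb1 on both nodes), for example with a script like this:
    # cat fdisk.sh
    #!/bin/bash
    # n = new partition, p = primary, 1 = partition number,
    # empty line = accept the default first cylinder, +5G = size, w = write the table
    fdisk /dev/sdb <<EOF
    n
    p
    1

    +5G
    w
    EOF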
II. Configure the DRBD service
    # cat /etc/drbd.d/global_common.conf
    global {
        usage-count no;
    }
    common {
        protocol C;    # A: asynchronous; B: semi-synchronous; C: synchronous
        handlers {
            pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
            pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
            local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        }
        startup {
        }
        options {
        }
        disk {
            on-io-error detach;
        }
        net {
            cram-hmac-alg "sha1";
            shared-secret "asdasdsdfsd";
        }
        syncer {
            rate 200M;
        }
    }
    # vim /etc/drbd.d/mystore.res
    resource mystore {
        on server12.neo.com {
            device /dev/drbd0;
            disk /dev/sdb1;
            address 192.168.100.12:7789;
            meta-disk internal;
        }
        on server13.neo.com {
            device /dev/drbd0;
            disk /dev/sdb1;
            address 192.168.100.13:7789;
            meta-disk internal;
        }
    }
The configuration must be identical on both nodes, so copy it to server13:
    # scp -p /etc/drbd.d/{global_common.conf,mystore.res} server13:/etc/drbd.d/
Then run the following on both nodes. DRBD is not enabled at boot because the cluster will start it:
    # chkconfig --add drbd
    # chkconfig drbd off
    # drbdadm create-md mystore
    # service drbd start
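The comment on `protocol C` is easier to read as a rule about when DRBD reports a write on the primary as complete:

- **Protocol A:** the write completes once the local disk write is done and the data is in the local TCP send buffer.
- **Protocol B:** the write completes once, in addition, the peer has received the data.
- **Protocol C:** the write completes once, in addition, the peer has written the data to its disk.

The event names below are made up for illustration. The sketch models only this acknowledgement rule, not DRBD itself.

```python
from enum import Enum


class Protocol(Enum):
    A = "asynchronous"
    B = "semi-synchronous (memory synchronous)"
    C = "synchronous"


# Events (hypothetical names) that must have happened before a write on the
# primary is reported complete to the application.
ACK_CONDITIONS = {
    Protocol.A: {"local_disk_written", "queued_in_tcp_send_buffer"},
    Protocol.B: {"local_disk_written", "peer_received"},
    Protocol.C: {"local_disk_written", "peer_disk_written"},
}


def write_completed(protocol: Protocol, events: set) -> bool:
    """True once every event required by `protocol` has been observed."""
    return ACK_CONDITIONS[protocol] <= events


if __name__ == "__main__":
    seen = {"local_disk_written", "queued_in_tcp_send_buffer", "peer_received"}
    for p in Protocol:
        print(p.name, write_completed(p, seen))
```

This setup uses protocol C. Under C, any write the application has seen complete is already on both disks, so a failover to server13 cannot lose it. Under protocol A, writes still sitting in the send buffer can be lost if the primary fails.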
Promote one node to primary, then format the device and test mounting it on each node in turn:
    [root@server12 ~]# drbdadm primary --force mystore
The first time a node is promoted to primary, --force is required. Checking the status again shows that the initial synchronization has started:
    [root@server12 ~]# drbd-overview
      0:web  SyncSource Primary/Secondary UpToDate/Inconsistent C r----
        [============>.......] sync'ed: 66.2% (172140/505964)K delay_probe: 35
    [root@server12 ~]# mkfs.ext4 /dev/drbd0
    [root@server12 ~]# mkdir -p /mydata
    [root@server12 ~]# mount /dev/drbd0 /mydata
    [root@server12 ~]# mkdir /mydata/data
    [root@server12 ~]# umount /mydata
    [root@server12 ~]# drbdadm secondary mystore
    [root@server13 ~]# mkdir -p /mydata
    [root@server13 ~]# drbdadm primary mystore
    [root@server13 ~]# mount /dev/drbd0 /mydata
    [root@server13 ~]# ls /mydata
    data  lost+found
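You can script against `drbd-overview` output, for example to wait until the initial sync has finished. The sketch below parses a line in the format shown above. The regular expressions are an assumption based on that single sample, and the field layout may differ between DRBD versions. `parse_overview` and `is_fully_synced` are hypothetical helpers.

```python
import re

OVERVIEW_RE = re.compile(
    r"(?P<minor>\d+):(?P<name>\S+)\s+(?P<cstate>\S+)\s+"
    r"(?P<local_role>\w+)/(?P<peer_role>\w+)\s+"
    r"(?P<local_disk>\w+)/(?P<peer_disk>\w+)"
)
SYNC_RE = re.compile(r"sync'ed:\s*(?P<pct>[\d.]+)%")


def parse_overview(line: str) -> dict:
    """Split one drbd-overview status line into named fields."""
    match = OVERVIEW_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised drbd-overview line: {line!r}")
    info = match.groupdict()
    info["minor"] = int(info["minor"])
    sync = SYNC_RE.search(line)
    info["synced_pct"] = float(sync.group("pct")) if sync else None  # None when not syncing
    return info


def is_fully_synced(info: dict) -> bool:
    """Both replicas must be UpToDate before the secondary is a usable failover target."""
    return info["local_disk"] == "UpToDate" and info["peer_disk"] == "UpToDate"


if __name__ == "__main__":
    sample = ("0:web SyncSource Primary/Secondary UpToDate/Inconsistent C r---- "
              "[============>.......] sync'ed: 66.2% (172140/505964)K delay_probe: 35")
    info = parse_overview(sample)
    print(info["local_role"], info["peer_disk"], info["synced_pct"], is_fully_synced(info))
```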
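III. Install Corosync
In EL6 (CentOS/RHEL 6), the interactive crm command-line shell was removed from the official packages, so the crmsh package has to be obtained separately from the Internet:
    # yum install corosync pacemaker -y
    # yum install crmsh -y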
IV. Configure the Corosync service
    # cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
    # cat /etc/corosync/corosync.conf
    compatibility: whitetank
    totem {
        version: 2
        secauth: on
        threads: 0
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.100.0
            mcastaddr: 226.94.112.12
            mcastport: 5405
            ttl: 1
        }
    }
    logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: no
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: off
        logger_subsys {
            subsys: AMF
            debug: off
        }
    }
    amf {
        mode: disabled
    }
    service {
        ver: 0
        name: pacemaker
    }
    # corosync-keygen
    # scp -p /etc/corosync/{corosync.conf,authkey} server13:/etc/corosync/
    # chkconfig --add corosync
    # chkconfig corosync off
Start Corosync on both nodes. With ver: 0, Corosync starts Pacemaker itself as a plugin:
    # service corosync start
    # ssh server13 'service corosync start'
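Two settings in the `totem.interface` block are easy to get wrong:

- `bindnetaddr` has to fall inside the network that each node's cluster interface is on.
- `mcastaddr` has to be a multicast address.

The sketch below checks both with Python's `ipaddress` module. It assumes the /24 prefix that `ip addr show` reports for eth0 later in this article. It only sanity-checks the values and does not reimplement how Corosync selects an interface. `check_totem_interface` is a hypothetical name.

```python
import ipaddress


def check_totem_interface(node_cidr: str, bindnetaddr: str, mcastaddr: str) -> list:
    """Return a list of problems (empty when the settings look consistent)."""
    problems = []
    network = ipaddress.ip_interface(node_cidr).network  # e.g. 192.168.100.0/24
    if ipaddress.ip_address(bindnetaddr) not in network:
        problems.append(f"bindnetaddr {bindnetaddr} is not inside {network}")
    if not ipaddress.ip_address(mcastaddr).is_multicast:
        problems.append(f"mcastaddr {mcastaddr} is not a multicast address")
    return problems


if __name__ == "__main__":
    for node in ("192.168.100.12/24", "192.168.100.13/24"):
        print(node, check_totem_interface(node, "192.168.100.0", "226.94.112.12"))
```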
V. Install MySQL
Install from the binary tarball:
    [root@server13 ~]# tar xf mysql-5.6.13-linux-glibc2.5-x86_64.tar.gz -C /usr/local/
    [root@server13 ~]# groupadd -g 3306 -r mysql
    [root@server13 ~]# useradd -g mysql -r -d /mydata/data mysql
    [root@server13 ~]# mkdir -p /mydata
    [root@server13 ~]# ln -s /usr/local/mysql-5.6.13-linux-glibc2.5-x86_64 /usr/local/mysql
    [root@server13 ~]# cd /usr/local/mysql
    [root@server13 mysql]# chown -R root.mysql .
    [root@server13 mysql]# scripts/mysql_install_db --user=mysql --datadir=/mydata/data
    [root@server13 mysql]# cat my.cnf
    [mysqld]
    datadir = /mydata/data
    server_id = 1
    socket = /tmp/mysql.sock
    log_bin = mysql.bin
    innodb_file_per_table = on
    [root@server13 mysql]# cp -p support-files/mysql.server /etc/init.d/mysqld
    [root@server13 mysql]# chkconfig --add mysqld
    [root@server13 mysql]# chkconfig mysqld off
    [root@server13 mysql]# service mysqld start
    [root@server13 mysql]# cat /etc/profile.d/mysql.sh
    [root@server13 mysql]# . /etc/profile.d/mysql.sh
Repeat the installation on server12: binaries, mysql user and group, my.cnf, and the init script. Skip mysql_install_db there, because the data directory lives on the shared DRBD device.
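MySQL option files require every option to sit under a group header such as `[mysqld]`; options that appear before any header are rejected. In addition, the data directory must be on the DRBD mount so that it fails over with the disk. The sketch below checks both points for the `my.cnf` above using Python's `configparser`. That module is only an approximation of MySQL's own option-file parser: it rejects duplicate keys, for instance, and does not understand `!include`. The helper names are hypothetical.

```python
import configparser
from pathlib import PurePosixPath

MY_CNF = """\
[mysqld]
datadir = /mydata/data
server_id = 1
socket = /tmp/mysql.sock
log_bin = mysql.bin
innodb_file_per_table = on
"""


def read_mysqld_options(text: str) -> dict:
    """Parse the [mysqld] group of a simple option file (no !include, no duplicate keys)."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)  # raises MissingSectionHeaderError if [mysqld] is missing
    return dict(parser["mysqld"])


def datadir_on_shared_storage(options: dict, mount_point: str = "/mydata") -> bool:
    """True if datadir lives under the DRBD mount point (path-wise, Python 3.9+)."""
    return PurePosixPath(options["datadir"]).is_relative_to(mount_point)


if __name__ == "__main__":
    opts = read_mysqld_options(MY_CNF)
    print(opts["datadir"], datadir_on_shared_storage(opts))
```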
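Cluster resource configuration
I. Configure the DRBD resource and its master/slave attributes
This two-node cluster has no fencing device, and it loses quorum as soon as one node fails. First disable STONITH and tell Pacemaker to ignore loss of quorum; both properties appear in the configuration shown in the verification step. Then define DRBD as a master/slave resource:
    crm(live)configure# property stonith-enabled=false
    crm(live)configure# property no-quorum-policy=ignore
    crm(live)configure# primitive drbd ocf:linbit:drbd params drbd_resource=mystore op monitor role=Master interval=10 timeout=30 op monitor role=Slave interval=30 timeout=30 op start timeout=240 op stop timeout=100 on-fail=restart
    crm(live)configure# master MS_drbd drbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
    crm(live)configure# verify
    crm(live)configure# commit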
II. Configure the filesystem resource (FS)
    crm(live)configure# primitive FS ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/mydata fstype=ext4 op start timeout=60 op stop timeout=60
    crm(live)configure# colocation FS-ON-MS_drbd inf: FS MS_drbd:Master
    crm(live)configure# order MS_drbd-before-FS inf: MS_drbd:promote FS:start
    crm(live)configure# verify
    crm(live)configure# commit
III. Configure the mysqld resource
    crm(live)configure# primitive mysqld lsb:mysqld op monitor interval=10 timeout=30 on-fail=restart
    crm(live)configure# colocation mysqld-ON-FS inf: mysqld FS
    crm(live)configure# order FS-before-mysqld inf: FS:start mysqld
    crm(live)configure# verify
    crm(live)configure# commit
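The two `inf:` colocation constraints form a chain: mysqld follows FS, and FS follows the DRBD master. As a result, both resources always end up on whichever node holds the Master role. The sketch below follows such a chain. It treats every colocation as mandatory and ignores scores, stickiness and the rest of Pacemaker's placement logic. `resolve_node` is a hypothetical helper.

```python
# Mandatory colocations from the constraints above: dependent -> what it must run with.
COLOCATIONS = {
    "FS": "MS_drbd:Master",
    "mysqld": "FS",
}


def resolve_node(resource, colocations, master_node, anchor="MS_drbd:Master"):
    """Follow 'runs with' links from `resource` until reaching `anchor`."""
    seen = set()
    current = resource
    while current in colocations:
        if current in seen:
            raise ValueError(f"colocation loop at {current!r}")
        seen.add(current)
        current = colocations[current]
    if current != anchor:
        raise ValueError(f"{resource!r} is not tied to {anchor}")
    return master_node  # everything on the chain lands where the Master role is


if __name__ == "__main__":
    for rsc in ("FS", "mysqld"):
        print(rsc, "->", resolve_node(rsc, COLOCATIONS, master_node="server13.neo.com"))
```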
IV. Configure the VIP
    crm(live)configure# primitive vip ocf:heartbeat:IPaddr2 params ip=192.168.100.100 iflabel=1
    crm(live)configure# colocation vip-ON-mysqld inf: vip mysqld
    crm(live)configure# order mysqld-before-vip inf: mysqld:start vip
    crm(live)configure# verify
    crm(live)configure# commit
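The three `order` constraints also form a chain. Where the second resource has no action, as in `FS:start mysqld`, Pacemaker uses the first resource's action, so that constraint reads as mysqld:start. A topological sort over the (first, then) pairs recovers the start sequence. Mandatory order constraints are symmetrical by default, so resources are stopped in the reverse order. This sketch uses Python 3.9's `graphlib` and models only the ordering, not how Pacemaker schedules actions. The function names are hypothetical.

```python
from graphlib import TopologicalSorter

# (first, then) pairs from the three order constraints above;
# "FS:start mysqld" is written out as "mysqld:start".
ORDERS = [
    ("MS_drbd:promote", "FS:start"),
    ("FS:start", "mysqld:start"),
    ("mysqld:start", "vip:start"),
]


def start_sequence(orders):
    """Return the actions in an order that satisfies every (first, then) pair."""
    sorter = TopologicalSorter()
    for first, then in orders:
        sorter.add(then, first)  # `first` must run before `then`
    return list(sorter.static_order())


def stop_sequence(orders):
    """Resources in the order they are taken down: the start order reversed."""
    return [action.split(":")[0] for action in reversed(start_sequence(orders))]


if __name__ == "__main__":
    print(" -> ".join(start_sequence(ORDERS)))  # MS_drbd:promote -> FS:start -> mysqld:start -> vip:start
    print(" -> ".join(stop_sequence(ORDERS)))   # vip -> mysqld -> FS -> MS_drbd
```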
V. Verification
The final configuration (crm configure show):
    node server12.neo.com
    node server13.neo.com
    primitive FS ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/mydata" fstype="ext4" \
        op start timeout="60" interval="0" \
        op stop timeout="60" interval="0"
    primitive drbd ocf:linbit:drbd \
        params drbd_resource="mystore" \
        op monitor role="Master" interval="10" timeout="30" \
        op monitor role="Slave" interval="30" timeout="30" \
        op start timeout="240" interval="0" \
        op stop timeout="100" on-fail="restart" interval="0"
    primitive mysqld lsb:mysqld \
        op monitor interval="10" timeout="30" on-fail="restart"
    primitive vip ocf:heartbeat:IPaddr2 \
        params ip="192.168.100.100" iflabel="1"
    ms MS_drbd drbd \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
    colocation FS-ON-MS_drbd inf: FS MS_drbd:Master
    colocation mysqld-ON-FS inf: mysqld FS
    colocation vip-ON-mysqld inf: vip mysqld
    order FS-before-mysqld inf: FS:start mysqld
    order MS_drbd-before-FS inf: MS_drbd:promote FS:start
    order mysqld-before-vip inf: mysqld:start vip
    property $id="cib-bootstrap-options" \
        dc-version="1.1.10-14.el6_5.3-368c726" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1406251626"
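The `expected-quorum-votes="2"` and `no-quorum-policy="ignore"` properties in this dump belong together. A partition has quorum only when it holds strictly more than half the expected votes. When one of two nodes fails, the survivor holds 1 of 2 votes and has no quorum. With the default policy (`stop`), Pacemaker would then stop the resources instead of failing them over. The sketch below implements that simplified majority rule, assuming one vote per node. `has_quorum` is a hypothetical helper.

```python
def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """Simplified majority rule: strictly more than half of the expected votes."""
    return votes_present > expected_votes // 2


if __name__ == "__main__":
    # Two-node cluster from this article: expected-quorum-votes="2", one vote per node.
    for present in (2, 1):
        print(f"{present} of 2 votes -> quorum: {has_quorum(present, 2)}")
```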
    # crm status
    Last updated: Fri Jul 25 17:47:17 2014
    Last change: Fri Jul 25 17:35:49 2014 via cibadmin on server12.neo.com
    Stack: classic openais (with plugin)
    Current DC: server12.neo.com - partition with quorum
    Version: 1.1.10-14.el6_5.3-368c726
    2 Nodes configured, 2 expected votes
    5 Resources configured

    Online: [ server12.neo.com server13.neo.com ]

     Master/Slave Set: MS_drbd [drbd]
         Masters: [ server12.neo.com ]
         Slaves: [ server13.neo.com ]
     FS (ocf::heartbeat:Filesystem): Started server12.neo.com
     mysqld (lsb:mysqld): Started server12.neo.com
     vip (ocf::heartbeat:IPaddr2): Started server12.neo.com
    [root@server12 ~]# netstat -tunlp | grep mysqld
    tcp        0      0 :::3306                     :::*                        LISTEN      5781/mysqld
    [root@server12 ~]# ip addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 00:0c:29:9c:19:4c brd ff:ff:ff:ff:ff:ff
        inet 192.168.100.12/24 brd 192.168.100.255 scope global eth0
        inet 192.168.100.100/24 brd 192.168.100.255 scope global secondary eth0:1
        inet6 fe80::20c:29ff:fe9c:194c/64 scope link
           valid_lft forever preferred_lft forever
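The check done by eye above, that every resource is started on the node holding the DRBD Master role, can also be scripted. The sketch below scrapes text in the format of this Pacemaker 1.1.10 output. The regular expressions are assumptions based on that sample and may need adjusting for other versions, where machine-readable output would be a sturdier source. The function names are hypothetical.

```python
import re

RESOURCE_RE = re.compile(
    r"^\s*(?P<rsc>\S+)\s+\((?P<agent>[^)]+)\):\s+Started\s+(?P<node>\S+)", re.MULTILINE
)
MASTERS_RE = re.compile(r"^\s*Masters:\s*\[(?P<nodes>[^\]]*)\]", re.MULTILINE)

SAMPLE_STATUS = """\
Online: [ server12.neo.com server13.neo.com ]

 Master/Slave Set: MS_drbd [drbd]
     Masters: [ server12.neo.com ]
     Slaves: [ server13.neo.com ]
 FS     (ocf::heartbeat:Filesystem):    Started server12.neo.com
 mysqld (lsb:mysqld):   Started server12.neo.com
 vip    (ocf::heartbeat:IPaddr2):       Started server12.neo.com
"""


def parse_crm_status(text: str):
    """Return ({resource: node}, [master nodes]) from `crm status` text."""
    started = {m["rsc"]: m["node"] for m in RESOURCE_RE.finditer(text)}
    masters_match = MASTERS_RE.search(text)
    masters = masters_match["nodes"].split() if masters_match else []
    return started, masters


def all_on_master(text: str) -> bool:
    """True if every started primitive runs on the single DRBD master."""
    started, masters = parse_crm_status(text)
    return len(masters) == 1 and all(node == masters[0] for node in started.values())


if __name__ == "__main__":
    print(parse_crm_status(SAMPLE_STATUS))
    print("all on master:", all_on_master(SAMPLE_STATUS))
```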
Simulate a failure by putting server12.neo.com into standby, so that server13.neo.com becomes the primary node:
    crm(live)# node standby server12.neo.com
    crm(live)# status
    Last updated: Fri Jul 25 17:50:13 2014
    Last change: Fri Jul 25 17:49:36 2014 via crm_attribute on server12.neo.com
    Stack: classic openais (with plugin)
    Current DC: server12.neo.com - partition with quorum
    Version: 1.1.10-14.el6_5.3-368c726
    2 Nodes configured, 2 expected votes
    5 Resources configured

    Node server12.neo.com: standby
    Online: [ server13.neo.com ]

     Master/Slave Set: MS_drbd [drbd]
         Masters: [ server13.neo.com ]
         Stopped: [ server12.neo.com ]
     FS (ocf::heartbeat:Filesystem): Started server13.neo.com
     mysqld (lsb:mysqld): Started server13.neo.com
     vip (ocf::heartbeat:IPaddr2): Started server13.neo.com
    [root@server13 ~]# mysql
    mysql> show databases;
    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | mydb               |
    | mysql              |
    | performance_schema |
    | test               |
    +--------------------+
    5 rows in set (0.03 sec)
    [root@server13 ~]# ip addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 00:0c:29:66:20:47 brd ff:ff:ff:ff:ff:ff
        inet 192.168.100.13/24 brd 192.168.100.255 scope global eth0
        inet 192.168.100.100/24 brd 192.168.100.255 scope global secondary eth0:1
        inet6 fe80::20c:29ff:fe66:2047/64 scope link
           valid_lft forever preferred_lft forever