DRBD (Distributed Replicated Block Device) turns a disk or partition on each of two hosts into a mirrored device, similar in principle to RAID 1, except that the primary node's data is actively replicated over the network to the secondary node. If the primary node fails, the secondary can take over as primary, and because the device is a mirror, no data is lost. Corosync's role here is to let Pacemaker manage DRBD as a highly available resource so that the primary and secondary roles can switch between the nodes automatically. Since DRBD uses each node's own local disks, it is a cost-saving option for scenarios that would otherwise require shared storage.
I. Experiment plan:
1. The experiment uses two nodes, node1 and node2, running RedHat 5.8 (x86).
node1's IP address: 192.168.1.65
node2's IP address: 192.168.1.66
Virtual IP (VIP): 192.168.1.60
2. Install the DRBD packages, configure and start DRBD, and verify that DRBD works correctly.
3. Install MySQL on each node, using the mysql-5.5.42 generic binary tarball, and test logging in.
4. Install corosync and pacemaker, then configure DRBD and MySQL as cluster resources to build a MySQL high-availability cluster.
II. Experiment steps:
1. Install and configure DRBD
(1) Preparation
a. Set the hostnames and make sure the change survives a reboot
node1:
# sed -i 's@\(HOSTNAME=\).*@\1node1.test.com@' /etc/sysconfig/network
# hostname node1.test.com
node2:
# sed -i 's@\(HOSTNAME=\).*@\1node2.test.com@' /etc/sysconfig/network
# hostname node2.test.com
b. Make sure both hostnames resolve correctly on both nodes by editing /etc/hosts:
192.168.1.65 node1.test.com node1
192.168.1.66 node2.test.com node2
c. Set up SSH key-based authentication (mutual trust between the two nodes)
node1:
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
node2:
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
d. Download and install the DRBD kernel module and userland tools. DRBD 8.3 is used here; the packages are available at http://mirrors.sohu.com/centos/5/extras/i386/RPMS/ (these are CentOS 5 packages; download the versions matching your platform if it differs):
drbd83-8.3.8-1.el5.centos.i386.rpm (userland command-line tools)
kmod-drbd83-8.3.8-1.el5.centos.i686.rpm (DRBD kernel module)
Install them on both node1 and node2:
# yum -y --nogpgcheck localinstall drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm
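Optionally, you can confirm right away that the kernel module matches the running kernel (the drbd init script will load the module automatically later anyway):
# modprobe drbd
# lsmod | grep drbd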
e. Copy DRBD's sample file into place as the configuration file (doing this on node1 is enough):
# cp /usr/share/doc/drbd83-8.3.8/drbd.conf /etc
f. Configure /etc/drbd.d/global_common.conf as follows:
global {
        usage-count no;
        # minor-count dialog-refresh disable-ip-verification
}

common {
        protocol C;

        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }

        startup {
                #wfc-timeout 120;
                #degr-wfc-timeout 120;
        }

        disk {
                on-io-error detach;
                #fencing resource-only;
        }

        net {
                cram-hmac-alg "sha1";
                shared-secret "mydrbdlab";
        }

        syncer {
                rate 300M;
        }
}
g. Define a resource in /etc/drbd.d/mydrbd.res with the following content:
resource mydrbd {
        device /dev/drbd0;
        meta-disk internal;
        disk /dev/vg_root/mydrbd;
        on node1.test.com {
                address 192.168.1.65:7789;
        }
        on node2.test.com {
                address 192.168.1.66:7789;
        }
}
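The resource above uses /dev/vg_root/mydrbd as its backing device, so that logical volume has to exist on both nodes before the resource is initialized. A minimal sketch, assuming a volume group named vg_root with enough free space (the 1G size is illustrative):
# lvcreate -n mydrbd -L 1G vg_root                # size is illustrative; adjust to your disks
# ssh node2 'lvcreate -n mydrbd -L 1G vg_root'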
h. Copy the configuration files from node1 to node2 so that both nodes have identical configuration:
# scp -r /etc/drbd.* node2:/etc
i. Initialize the resource and start the service on both nodes:
# drbdadm create-md mydrbd
# service drbd start
j. Check the status:
# drbd-overview
  0:mydrbd  Connected Secondary/Secondary Inconsistent/Inconsistent C r----
At this point both nodes are still in the Inconsistent state, so one node has to be promoted to primary manually. On node1, run:
# drbdadm -- --overwrite-data-of-peer primary mydrbd
Check the status again:
# drbd-overview
  0:mydrbd  SyncSource Primary/Secondary UpToDate/Inconsistent C r----
        [============>.......] sync'ed: 86.2% (17240/1024300)K delay_probe: 25
After the synchronization has finished, check once more:
# drbd-overview
0:mydrbd Connected Primary/Secondary UpToDate/UpToDate C r----
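If you prefer, the synchronization progress can also be followed in /proc/drbd:
# watch -n 1 'cat /proc/drbd'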
k. Create a filesystem on the primary node. Only the primary node can mount and read/write the DRBD device; the secondary cannot mount it.
# mkfs.ext3 /dev/drbd0
# mkdir /mydata
# mount /dev/drbd0 /mydata
As a test, copy /etc/issue to /mydata:
# cp /etc/issue /mydata
Unmount the filesystem on node1 and demote it to secondary, promote node2 to primary, and then check whether the issue file is present:
node1:
# umount /mydata
# drbdadm secondary mydrbd
node2:
# drbdadm primary mydrbd
# mkdir /mydata
# mount /dev/drbd0 /mydata
Run ls /mydata and check whether the issue file copied earlier is present. If it is, the DRBD setup is complete.
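On node2, the listing should look roughly like this (lost+found was created by mkfs.ext3):
# ls /mydata
issue  lost+found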
2. Install and configure mysql-5.5.42 on node1 and node2
Preparation: create the mysql group and user, and create the data directory
# groupadd -g 3306 mysql
# useradd -g 3306 -u 3306 -s /sbin/nologin -M mysql
# mkdir /mydata/data
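Since mysqld will run as the mysql user, the data directory on the DRBD-backed mount should also be owned by that user (run this on whichever node currently has /mydata mounted):
# chown -R mysql:mysql /mydata/data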
(1) Download mysql-5.5.42 and extract it to /usr/local:
# tar xf mysql-5.5.42-linux2.6-i686.tar.gz -C /usr/local
(2) Change into /usr/local and create a symlink:
# cd /usr/local
# ln -sv mysql-5.5.42-linux2.6-i686 mysql
# cd mysql
(3) Fix the ownership of the mysql directory:
# chown -R root:mysql ./*
(4) Initialize the mysql database:
# scripts/mysql_install_db --user=mysql --datadir=/mydata/data
(5) Copy and adjust the mysql configuration file and init script:
# cp support-files/my-large.cnf /etc/my.cnf
# cp support-files/mysql.server /etc/init.d/mysqld
# vim /etc/my.cnf
Add the following under the [mysqld] section:
datadir = /mydata/data/
innodb_file_per_table = 1
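The test below calls the mysql client directly; if /usr/local/mysql/bin is not already in PATH on your nodes, one way to add it (an assumption, not required if you use the full path to the client):
# echo 'export PATH=/usr/local/mysql/bin:$PATH' > /etc/profile.d/mysql.sh
# . /etc/profile.d/mysql.sh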
(6) Start mysql for a test and create a test database named testdb (make sure DRBD is running before the test; test on each node in turn):
# service mysqld start
# mysql
mysql> create database testdb;
mysql> \q
(7) When the test is finished, stop the mysql service and make sure it does not start at boot:
# service mysqld stop
# chkconfig mysqld off
# chkconfig mysqld --list
Make sure the mysql setup has been completed on both nodes.
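Since Pacemaker will take over management of DRBD in the next step, it is also sensible to stop the drbd service and keep it from starting at boot; a small sketch, to be run on both node1 and node2:
# service drbd stop        # unmount /mydata first on whichever node still has it mounted
# chkconfig drbd off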
3. Install and configure corosync, and build the DRBD-based MySQL high-availability service
(1) Download and install corosync and pacemaker. Download location (32-bit): http://clusterlabs.org/rpm/epel-5/i386/
Required packages: cluster-glue, cluster-glue-libs, heartbeat, resource-agents, corosync, heartbeat-libs, pacemaker, corosynclib, libesmtp, pacemaker-libs. (The following configuration is done on node1.)
# yum -y --nogpgcheck localinstall /root/cluster/*.rpm
(2) Copy the template configuration file into /etc/corosync/ and adjust it:
# cd /etc/corosync
# cp corosync.conf.example corosync.conf
# vim corosync.conf
Change the following settings:
secauth: on
bindnetaddr: 192.168.1.0
Append the following sections:
service {
        ver: 0
        name: pacemaker
}
aisexec {
        user: root
        group: root
}
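For reference, the relevant part of the resulting corosync.conf should look roughly like this (the mcastaddr/mcastport values shown are the sample file's defaults and may differ in your copy):
totem {
        version: 2
        secauth: on
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}
service {
        ver: 0
        name: pacemaker
}
aisexec {
        user: root
        group: root
}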
(3) Generate the key used for authenticating communication between the nodes:
# corosync-keygen
(4) Copy corosync.conf and authkey to node2:
# scp -p corosync.conf authkey node2:/etc/corosync/
(5) Create the directory for the corosync logs:
# mkdir /var/log/cluster
# ssh node2 'mkdir /var/log/cluster'
(6) Try starting corosync on node1:
# /etc/init.d/corosync start
Check whether the corosync engine started correctly:
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log Jul 28 12:17:28 corosync [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service. Jul 28 12:17:28 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'. Jul 28 22:32:06 corosync [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:170. Jul 28 22:32:46 corosync [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service. Jul 28 22:32:46 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Check that the initial membership notifications were sent out correctly:
# grep TOTEM /var/log/cluster/corosync.log
Jul 28 12:17:28 corosync [TOTEM ] Initializing transport (UDP/IP).
Jul 28 12:17:28 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul 28 12:17:29 corosync [TOTEM ] The network interface [192.168.1.65] is now up.
Jul 28 12:17:30 corosync [TOTEM ] Process pause detected for 913 ms, flushing membership messages.
Jul 28 12:17:30 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors occurred during startup:
# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Check whether pacemaker started correctly:
# grep pcmk_startup /var/log/cluster/corosync.log
Jul 28 12:17:30 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Jul 28 12:17:30 corosync [pcmk ] Logging: Initialized pcmk_startup
Jul 28 12:17:30 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Jul 28 12:17:30 corosync [pcmk ] info: pcmk_startup: Service: 9
Jul 28 12:17:30 corosync [pcmk ] info: pcmk_startup: Local hostname: node1.test.com
Jul 28 22:32:49 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
If corosync and pacemaker started without problems on node1, start the corresponding services on node2:
# ssh node2 '/etc/init.d/corosync start'
(7) Check the cluster status:
# crm_mon
============
Last updated: Fri Jul 31 22:52:48 2015
Stack: openais
Current DC: node1.test.com - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ node1.test.com node2.test.com ]
(8) Disable STONITH. There is no STONITH device in this setup, so it has to be disabled:
# crm configure property stonith-enabled=false
(9) Set the policy for what the cluster should do when it does not have quorum:
# crm configure property no-quorum-policy=ignore
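Both properties can be double-checked from the crm shell:
# crm configure show    # stonith-enabled and no-quorum-policy should now appear among the cluster properties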
(10) Define the resources
a. Define the DRBD master/slave resource:
# crm
crm(live)# configure
crm(live)configure# primitive mysqldrbd ocf:heartbeat:drbd params drbd_resource=mydrbd op start timeout=240 op stop timeout=100 op monitor interval=20 role=Master timeout=30 op monitor interval=30 role=Slave timeout=30
crm(live)configure# ms ms_mysqldrbd mysqldrbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
crm(live)configure# verify
crm(live)configure# commit
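After the commit, the master/slave set should show up in the cluster status with one node promoted; the relevant part of the output looks roughly like this (trimmed):
# crm status
 Master/Slave Set: ms_mysqldrbd
     Masters: [ node1.test.com ]
     Slaves: [ node2.test.com ]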
b. Define the resource that mounts the filesystem automatically:
crm(live)# configure
crm(live)configure# primitive mystore ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/mydata fstype=ext3 op start timeout=60 op stop timeout=60
c. Define the resource constraints: a colocation constraint so that the filesystem runs on the DRBD master, and an ordering constraint so that it is mounted only after the promotion. A typical sketch (the constraint names are illustrative):
crm(live)configure# colocation mystore_with_ms_mysqldrbd inf: mystore ms_mysqldrbd:Master
crm(live)configure# order mystore_after_ms_mysqldrbd mandatory: ms_mysqldrbd:promote mystore:start
crm(live)configure# verify
crm(live)configure# commit
d. Define the mysql resource, colocated with the filesystem and started after it. A typical sketch (the resource and constraint names are illustrative; lsb:mysqld refers to the init script installed earlier):
crm(live)configure# primitive mysqld lsb:mysqld
crm(live)configure# colocation mysqld_with_mystore inf: mysqld mystore
crm(live)configure# order mysqld_after_mystore mandatory: mystore mysqld:start
crm(live)configure# verify
crm(live)configure# commit
e. Define the VIP:
crm(live)configure# primitive myip ocf:heartbeat:IPaddr params ip=192.168.1.60 nic=eth0 cidr_netmask=255.255.255.0
crm(live)configure# colocation myip_with_ms_mysqldrbd inf: ms_mysqldrbd:Master myip
(the VIP has to stay with the DRBD master node; the start order does not matter here)
crm(live)configure# verify
crm(live)configure# commit
At this point the MySQL high-availability cluster based on DRBD and corosync is complete; you can switch the node roles to verify that the service fails over correctly.
# crm node standby
# crm node online
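Between the standby and online steps it is worth verifying that the resources really moved and that the data is still there on the new master (a small sketch; the mysql client path assumes the layout used above):
# crm status                                                    # mystore, mysqld and myip should now run on node2
# ssh node2 '/usr/local/mysql/bin/mysql -e "show databases;"'   # testdb should still be listed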