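HA concepts:
Availability = MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR is the mean time to repair. A common target is five to six nines of availability.
Availability can be improved by shortening the repair time.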
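The effect of repair time on the formula above can be checked with a few lines of Python. This is only a sketch: the function names and the figures (one failure every 1000 hours, a one-hour manual repair versus a 36-second failover) are made up for illustration.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected downtime per (365-day) year for a given availability."""
    return (1.0 - avail) * 365 * 24 * 60


# Hypothetical numbers: the service fails once every 1000 hours.
slow = availability(1000, 1.0)    # one hour to repair by hand
fast = availability(1000, 0.01)   # 36 seconds to fail over to a standby

print(f"manual repair: {slow:.5f}")
print(f"failover:      {fast:.5f}")
print(f"five nines allows about {downtime_minutes_per_year(0.99999):.2f} minutes of downtime per year")
```

With the same failure rate, cutting the repair time from an hour to tens of seconds moves the result from about three nines to about five nines, which is why the rest of these notes focus on automatic failover.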
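How to shorten the repair time: provide a standby host and implement failover.
1. Move the IP address, i.e. use a floating IP.
2. Move the service.
In some cases session-tracking information and data must be moved as well:
A. Synchronize with rsync + inotify.
B. Use shared storage.
How does the standby host know that the primary node is unavailable?
The primary node periodically sends a heartbeat to the standby node; when the heartbeats stop arriving, the standby treats the primary as failed.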
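The heartbeat check can be pictured as a simple timeout on the standby side. The sketch below is not how any particular HA product implements it; the class name, the "three missed beats" threshold and the explicit `now` arguments (used instead of a real clock so the logic is easy to follow) are all assumptions for illustration.

```python
class HeartbeatMonitor:
    """Standby-side view of the primary, based only on heartbeat arrival times."""

    def __init__(self, interval: float, max_missed: int = 3, now: float = 0.0):
        # Declare the primary dead after max_missed intervals of silence.
        self.timeout = interval * max_missed
        self.last_seen = now

    def beat(self, now: float) -> None:
        """Record that a heartbeat arrived at time `now` (seconds)."""
        self.last_seen = now

    def primary_is_down(self, now: float) -> bool:
        """True once no heartbeat has been seen for longer than the timeout."""
        return now - self.last_seen > self.timeout


monitor = HeartbeatMonitor(interval=1.0, max_missed=3, now=0.0)
monitor.beat(now=1.0)                      # last heartbeat arrives at t=1s
print(monitor.primary_is_down(now=3.5))    # 2.5s of silence: still considered up
print(monitor.primary_is_down(now=4.5))    # 3.5s of silence: start failover
```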
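Additional notes:
SAN: block-level interface (it looks like a local disk). If two back-end hosts mount the same SAN volume and modify the same file at the same moment, the file system may be corrupted.
NAS: file-system-level interface.
When the back-end hosts share SAN storage and one host fails, a fencing device is usually used to make sure the failed host really cannot run, so that it does not recover on its own and contend for the storage.
STONITH: Shoot The Other Node In The Head, i.e. cut the failed node's power.
The hosts reach the shared storage through a switch, so disabling the failed host's storage-facing port on the switch achieves the same isolation.
Failback: reclaiming resources after a failure. The standby host may be much less powerful, so once the primary node is repaired it should take its resources back promptly.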
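HA cluster implementations:
VRRP implementation: keepalived
Full HA clusters based on AIS: RHCS (cman), heartbeat, corosync
VRRP: Virtual Router Redundancy Protocol
What is VRRP?
VRRP presents several physical routers as one or more virtual routers. Each virtual router has its own identifier (VRID), and the physical devices behind it are divided into a master and one or more backups. The master keeps sending advertisements that carry its heartbeat and priority. A backup uses these advertisements to judge whether the master is healthy and compares priorities to decide whether to take over the master's resources; whether it actually does so depends on whether it runs in preemptive or non-preemptive mode.
Preemptive: take over whenever your own priority is higher than the current master's.
Non-preemptive: even with a higher priority, do not take over as long as the current master is working.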
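The two modes differ only in what a higher-priority node does while the current master is still healthy. A minimal sketch of that decision follows; the function and parameter names are invented here, and it ignores protocol details such as tie-breaking between equal priorities.

```python
def should_take_over(my_priority: int, master_priority: int,
                     master_alive: bool, preempt: bool) -> bool:
    """Decide whether this node should claim the master role."""
    if not master_alive:
        # No advertisements from the master: take over in either mode.
        return True
    # The master is healthy: only a preemptive, higher-priority node takes over.
    return preempt and my_priority > master_priority


# A repaired node with priority 100 comes back while a priority-98 node is master.
print(should_take_over(100, 98, master_alive=True, preempt=True))    # preemptive
print(should_take_over(100, 98, master_alive=True, preempt=False))   # non-preemptive
```

The first call returns True and the second False; if the master stops advertising, both modes take over.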
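VRRP moves only the IP address.
Terms:
Virtual router
Virtual router identifier: VRID (1-255)
Physical router:
Master: the active device
Backup: a standby device
Priority: decides which router becomes master (higher wins)
VIP: virtual IP
VMAC: virtual MAC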
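In the VRRP specification the virtual MAC for IPv4 is derived from the VRID: the fixed prefix 00-00-5E-00-01 followed by the VRID as one byte. The helper below is a small illustration of that rule (the function name is made up). Keepalived does not use a virtual MAC unless asked to via its `use_vmac` option; by default it keeps the interface's own MAC and announces the VIP with gratuitous ARP.

```python
def vrrp_virtual_mac(vrid: int) -> str:
    """Return the IPv4 VRRP virtual MAC address for a given VRID."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    # Fixed prefix 00:00:5e:00:01, last byte is the VRID in hexadecimal.
    return f"00:00:5e:00:01:{vrid:02x}"


print(vrrp_virtual_mac(51))   # VRID 51 is 0x33
```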
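Keepalived:
A software implementation of VRRP that runs only on Linux and provides IP address failover.
Generates ipvs rules from its configuration.
Monitors the health of the back-end hosts.
Originally designed to make LVS highly available.
Components:
VRRP stack, ipvs wrapper, checkers, configuration file parser, I/O multiplexer, memory management.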
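Prerequisites for configuring an HA cluster:
1. Time is synchronized across all nodes.
2. iptables and SELinux do not get in the way.
3. Nodes can reach each other by host name (not required for keepalived), for example through the local hosts file.
4. Nodes can communicate over ssh with key-based authentication (not required).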
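Installing and configuring keepalived:
On CentOS 6.4 and later it is provided by the base repository.
Program environment:
Main configuration file: /etc/keepalived/keepalived.conf
GLOBAL CONFIGURATION
Global definitions
Static routes and static addresses
VRRPD CONFIGURATION
VRRP synchronization groups
VRRP instances, each corresponding to one virtual router
LVS CONFIGURATION
The virtual servers (vs) and real servers (rs) of the ipvs cluster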
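Configuration file example:
Global configuration section:
global_defs {
    notification_email {
        root@localhost                          # recipient address
    }
    notification_email_from [email protected]   # sender address
    smtp_server 192.168.200.1                   # mail server address
    smtp_connect_timeout 30                     # timeout for connecting to the mail server, in seconds
    router_id LVS_DEVEL                         # identifier of this physical router
    vrrp_mcast_group4 224.0.0.18                # IPv4 multicast group, any address in 224.0.0.0-239.255.255.255 (224.0.0.18 is the VRRP default)
}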
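Virtual router section:
vrrp_instance VI_1 {                # define a virtual router instance
    state MASTER                    # initial state: MASTER or BACKUP
    interface eth0                  # interface that sends the advertisements (heartbeat)
    virtual_router_id 51            # VRID, identical on every node of this virtual router
    priority 100                    # priority
    advert_int 1                    # advertisement interval in seconds
    authentication {                # authentication
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {             # VIP
        192.168.200.16/24 dev eth0  # "dev" names the interface that carries the VIP
    }
}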
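Other available directives:
nopreempt: non-preemptive mode; the instance's initial state must be BACKUP for it to take effect
preempt_delay 300: in preemptive mode, the delay in seconds after a node comes up before it triggers a new election
Notification scripts:
notify_master: script run when this node becomes master
notify_backup: script run when this node becomes backup
notify_fault: script run when this node enters the fault state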
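A notification script can be any executable. As a rough sketch, a single Python script could serve all three directives if the state is passed as an argument, for example `notify_master "/etc/keepalived/notify.py master"` (and likewise for backup and fault). The path, the script name and the message format are assumptions made for this example, not something keepalived prescribes; the script simply writes one line to syslog.

```python
#!/usr/bin/env python3
import socket
import sys
import syslog
from datetime import datetime

VALID_STATES = ("master", "backup", "fault")


def format_message(state: str, host: str, when: datetime) -> str:
    """Build the one-line notice that is logged on a state change."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown state: {state!r}")
    return f"{when:%Y-%m-%d %H:%M:%S} {host}: keepalived changed to {state}"


def main(args: list) -> int:
    """args is the list of command-line arguments after the script name."""
    if len(args) != 1 or args[0] not in VALID_STATES:
        print("usage: notify.py {master|backup|fault}", file=sys.stderr)
        return 1
    message = format_message(args[0], socket.gethostname(), datetime.now())
    syslog.syslog(syslog.LOG_NOTICE, message)
    return 0


# Only act when keepalived (or a user) passes a state argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

The script must be executable by the user keepalived runs its scripts as; sending mail instead of logging would be a drop-in change inside `main`.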
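Practice: IP address failover with a VRRP master/backup pair
Environment:
Time is synchronized across the nodes, and iptables and SELinux do not get in the way.
Node1 (CentOS 7) 172.18.20.7 MASTER
Node2 (CentOS 7) 172.18.20.8 BACKUP
VIP 172.18.20.100/16
vrrp_mcast_group4 224.0.0.100
Configure node1: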
[root@localhost keepalived]# cat keepalived.conf
! Configuration File for keepalived
global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id node1
    vrrp_mcast_group4 224.0.0.100
}
vrrp_instance myroute {
    state MASTER
    interface ens33
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass ck2384
    }
    virtual_ipaddress {
        172.18.20.100/16 dev ens33
    }
}
Configure node2:
[root@localhost keepalived]# cat keepalived.conf
! Configuration File for keepalived
global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id node2
    vrrp_mcast_group4 224.0.0.100
}
vrrp_instance myroute {
    state BACKUP
    interface eno16777736
    virtual_router_id 51
    priority 98
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass ck2384
    }
    virtual_ipaddress {
        172.18.20.100/16 dev eno16777736
    }
}
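How long a backup waits before declaring the master dead follows from priority and advert_int. In VRRPv2 (RFC 3768) the master-down interval is three advertisement intervals plus a skew time of (256 - priority) / 256 seconds, so higher-priority backups time out slightly earlier. The sketch below applies that formula to node2's settings; the function names are invented here, and keepalived's actual timers may differ in detail.

```python
def skew_time(priority: int) -> float:
    """VRRPv2 skew time in seconds: (256 - priority) / 256."""
    return (256 - priority) / 256


def master_down_interval(priority: int, advert_int: float) -> float:
    """Seconds of silence after which a backup declares the master down."""
    return 3 * advert_int + skew_time(priority)


# node2: priority 98, advert_int 1
print(master_down_interval(98, 1))
```

For node2 this gives roughly 3.6 seconds, which is in line with the log in the test that follows, where node2 enters BACKUP at 15:00:36 and transitions to MASTER at 15:00:40.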
Test:
1. Start node2 first. With no master advertising, it moves from BACKUP to MASTER and takes the VIP.
[root@node2 ~]# systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2017-05-30 15:00:36 CST; 6s ago
Process: 5004 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 5005 (keepalived)
CGroup: /system.slice/keepalived.service
├─5005 /usr/sbin/keepalived -D
├─5006 /usr/sbin/keepalived -D
└─5007 /usr/sbin/keepalived -D
May 30 15:00:36 node2 Keepalived_vrrp[5007]: Opening file '/etc/keepalived/keepalived.conf'.
May 30 15:00:36 node2 Keepalived_vrrp[5007]: Configuration is using : 62909 Bytes
May 30 15:00:36 node2 Keepalived_vrrp[5007]: Using LinkWatch kernel netlink reflector...
May 30 15:00:36 node2 Keepalived_vrrp[5007]: VRRP_Instance(myroute) Entering BACKUP STATE
May 30 15:00:36 node2 Keepalived_vrrp[5007]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
May 30 15:00:40 node2 Keepalived_vrrp[5007]: VRRP_Instance(myroute) Transition to MASTER STATE
May 30 15:00:41 node2 Keepalived_vrrp[5007]: VRRP_Instance(myroute) Entering MASTER STATE
May 30 15:00:41 node2 Keepalived_vrrp[5007]: VRRP_Instance(myroute) setting protocol VIPs.
May 30 15:00:41 node2 Keepalived_vrrp[5007]: VRRP_Instance(myroute) Sending gratuitous ARPs on eno16777736 for 172.18.20.100
May 30 15:00:41 node2 Keepalived_healthcheckers[5006]: Netlink reflector reports IP 172.18.20.100 added
[root@node2 ~]# ip a l
1: lo: mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno16777736: mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:cf:ba:90 brd ff:ff:ff:ff:ff:ff
inet 172.18.20.8/16 brd 172.18.255.255 scope global eno16777736
valid_lft forever preferred_lft forever
inet 172.18.20.100/16 scope global secondary eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fecf:ba90/64 scope link
valid_lft forever preferred_lft forever
2. Start node1. Having the higher priority, it takes over as master and the VIP is now configured on node1.
[root@node1 ~]# systemctl start keepalived
[root@node1 ~]# systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2017-05-30 03:02:18 EDT; 4s ago
Process: 49646 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 49647 (keepalived)
CGroup: /system.slice/keepalived.service
├─49647 /usr/sbin/keepalived -D
├─49648 /usr/sbin/keepalived -D
└─49649 /usr/sbin/keepalived -D
May 30 03:02:18 node1 Keepalived_vrrp[49649]: Opening file '/etc/keepalived/keepalived.conf'.
May 30 03:02:18 node1 Keepalived_vrrp[49649]: Configuration is using : 62887 Bytes
May 30 03:02:18 node1 Keepalived_vrrp[49649]: Using LinkWatch kernel netlink reflector...
May 30 03:02:18 node1 Keepalived_vrrp[49649]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
May 30 03:02:19 node1 Keepalived_vrrp[49649]: VRRP_Instance(myroute) Transition to MASTER STATE
May 30 03:02:19 node1 Keepalived_vrrp[49649]: VRRP_Instance(myroute) Received lower prio advert, forcing new election
May 30 03:02:20 node1 Keepalived_vrrp[49649]: VRRP_Instance(myroute) Entering MASTER STATE
May 30 03:02:20 node1 Keepalived_vrrp[49649]: VRRP_Instance(myroute) setting protocol VIPs.
May 30 03:02:20 node1 Keepalived_vrrp[49649]: VRRP_Instance(myroute) Sending gratuitous ARPs on ens33 for 172.18.20.100
May 30 03:02:20 node1 Keepalived_healthcheckers[49648]: Netlink reflector reports IP 172.18.20.100 added
[root@node1 ~]# ip a l
1: lo: mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:e4:53:15 brd ff:ff:ff:ff:ff:ff
inet 172.18.20.7/16 brd 172.18.255.255 scope global ens33
valid_lft forever preferred_lft forever
inet 172.18.252.52/16 brd 172.18.255.255 scope global secondary dynamic ens33
valid_lft 71521sec preferred_lft 71521sec
inet 172.18.20.100/16 scope global secondary ens33
valid_lft forever preferred_lft forever
inet6 fe80::87aa:10b2:67e3:70d0/64 scope link
valid_lft forever preferred_lft forever
3. Capture packets on node2; the advertisements sent by the master are visible.
[root@node2 ~]# tcpdump -i eno16777736 -nn host 224.0.0.100
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eno16777736, link-type EN10MB (Ethernet), capture size 65535 bytes
15:04:21.217999 IP 172.18.20.7 > 224.0.0.100: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:04:22.221621 IP 172.18.20.7 > 224.0.0.100: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:04:23.227296 IP 172.18.20.7 > 224.0.0.100: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
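When checking a capture like this by script, the fields of interest (source, VRID, priority, interval) can be pulled out of each line with a regular expression. The sketch below is written against the exact line format shown above; other tcpdump versions or verbosity levels may print advertisements differently, and the function name is made up.

```python
import re

ADVERT_RE = re.compile(
    r"IP (?P<src>\d+(?:\.\d+){3}) > (?P<group>\d+(?:\.\d+){3}): "
    r"VRRPv(?P<version>\d+), Advertisement, vrid (?P<vrid>\d+), "
    r"prio (?P<prio>\d+), authtype (?P<authtype>\w+), intvl (?P<intvl>\d+)s"
)


def parse_advert(line: str):
    """Return the advertisement fields as a dict, or None if the line does not match."""
    match = ADVERT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("version", "vrid", "prio", "intvl"):
        fields[key] = int(fields[key])
    return fields


line = ("15:04:21.217999 IP 172.18.20.7 > 224.0.0.100: VRRPv2, Advertisement, "
        "vrid 51, prio 100, authtype simple, intvl 1s, length 20")
advert = parse_advert(line)
print(advert["src"], advert["vrid"], advert["prio"], advert["intvl"])
```

For the captured line this yields source 172.18.20.7, VRID 51, priority 100 and a 1-second interval, matching node1's configuration.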