HA Cluster: keepalived

HA concepts:

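Availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair. The usual target is five to six nines (99.999% to 99.9999%).
Availability is improved by shortening the repair time.
How to shorten the repair time: provide a standby host and implement failover.
1. Move the IP address, i.e. a floating IP.
2. Move the service.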
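
To make the availability formula concrete, here is a small shell sketch that computes availability from MTBF and MTTR, plus the yearly downtime budget for a given number of nines. The function names are illustrative, and awk is assumed to be installed.

```shell
#!/bin/sh
# availability MTBF MTTR: print availability as a percentage.
# Both arguments must use the same time unit (for example hours).
availability() {
    awk -v mtbf="$1" -v mttr="$2" \
        'BEGIN { printf "%.4f\n", 100 * mtbf / (mtbf + mttr) }'
}

# downtime_minutes_per_year NINES: downtime allowed in a 365-day year
# for an availability of NINES nines (5 means 99.999%).
downtime_minutes_per_year() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 365 * 24 * 60 / (10 ^ n) }'
}
```

For example, `availability 999 1` prints 99.9000, and `downtime_minutes_per_year 5` prints 5.26, so five nines leaves a little over five minutes of downtime per year. That is why failover has to be automatic.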

In some cases session (tracking) information and data must be moved as well:
A. Synchronize with rsync + inotify
B. Use shared storage
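
Option A can be sketched roughly as follows, assuming the inotify-tools package (which provides `inotifywait`) and rsync are installed on the primary node. The function name, the directory and the destination host are hypothetical.

```shell
#!/bin/sh
# sync_on_change SRC_DIR DEST: re-run rsync whenever SRC_DIR changes.
# DEST is an rsync destination such as backup-host:/srv/data/ (hypothetical).
sync_on_change() {
    src="$1"
    dest="$2"
    # -m: keep watching, -r: recurse into subdirectories, -q: print events only.
    inotifywait -m -r -q -e close_write,create,delete,move "$src" |
    while read -r _event; do
        # -a: preserve attributes, -z: compress, --delete: mirror removals.
        rsync -az --delete "$src"/ "$dest"
    done
}

# Example (not executed here):
# sync_on_change /srv/data backup-host:/srv/data/
```

This runs one rsync per event, which is fine for a quiet directory; a busy one would probably need events batched before each sync.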
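How does the standby host know that the primary node is unavailable?
The primary node periodically sends heartbeat messages to the standby node.
Notes:
SAN: block-level interface (it looks like a disk). If two back-end hosts mount the same SAN volume and operate on the same file at the same moment, the file system may be corrupted.
NAS: file-system-level interface.
When the back-end hosts share SAN storage and one host fails, a fencing device is usually needed so that the failed host cannot contend for the storage if it recovers on its own; fencing guarantees that the faulty host really is out of service.
STONITH: Shoot The Other Node In The Head, i.e. cut its power.
The hosts reach the shared storage through a switch, so cutting the host's storage link at the switch achieves the same result.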

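Failback: the standby host may be much less powerful than the primary, so once the primary node is repaired it should take the resources back immediately.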

HA Cluster implementations:

Implementation of the VRRP protocol: keepalived
Full-featured HA clusters based on the AIS specification: RHCS (cman), heartbeat, corosync

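VRRP: Virtual Router Redundancy Protocol

What is VRRP?
VRRP presents several physical routers as one or more virtual routers. Each virtual router has its own identifier (VRID), and the physical devices inside it are divided into a master and backups. The master continuously sends heartbeat and priority information to the backups. A backup uses the heartbeat to decide whether the master has failed, and uses the priority to decide whether to take over the master's resources; whether it actually takes over depends on whether it runs in preemptive or non-preemptive mode.
Preemptive: a node whose priority is higher than the other's takes over.
Non-preemptive: even with a higher priority, a node does not take over as long as the other node is working normally.
VRRP moves only the IP address.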
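
The takeover decision described above can be condensed into a small shell sketch. This is a simplified model for illustration, not keepalived's actual election code; the function name and its arguments are made up.

```shell
#!/bin/sh
# should_take_over LOCAL_PRIO MASTER_PRIO MASTER_ALIVE PREEMPT
# Prints "yes" if this backup should become master, "no" otherwise.
# MASTER_ALIVE and PREEMPT are the strings "yes" or "no".
should_take_over() {
    local_prio="$1"
    master_prio="$2"
    master_alive="$3"
    preempt="$4"

    # No heartbeat from the master: take over regardless of mode.
    if [ "$master_alive" != "yes" ]; then
        echo yes
        return 0
    fi
    # Master is alive: only a preemptive node with a higher priority takes over.
    if [ "$preempt" = "yes" ] && [ "$local_prio" -gt "$master_prio" ]; then
        echo yes
        return 0
    fi
    echo no
}
```

With priorities 100 (local) and 98 (current master), `should_take_over 100 98 yes yes` prints yes, while `should_take_over 100 98 yes no` prints no because the node is in non-preemptive mode.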

Terms:
Virtual router
VRID: virtual router identifier (1-255)
Physical router:
Master: the active device
Backup: the standby device
Priority: election priority
VIP: virtual IP
VMAC: virtual MAC
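
The virtual MAC is derived from the VRID: the VRRP specification reserves 00:00:5e:00:01:XX for IPv4 virtual routers, where XX is the VRID in hexadecimal. The helper below is an illustrative sketch (the function name is made up). As far as I know, keepalived uses such a MAC only when `use_vmac` is configured and otherwise keeps the interface's own MAC and announces the VIP with gratuitous ARP.

```shell
#!/bin/sh
# vrrp_vmac VRID: print the IPv4 VRRP virtual MAC address for a VRID.
vrrp_vmac() {
    vrid="$1"
    # Reject anything that is not a plain decimal number.
    case "$vrid" in
        ''|*[!0-9]*) echo "VRID must be a number" >&2; return 1 ;;
    esac
    if [ "$vrid" -lt 1 ] || [ "$vrid" -gt 255 ]; then
        echo "VRID must be in the range 1-255" >&2
        return 1
    fi
    printf '00:00:5e:00:01:%02x\n' "$vrid"
}
```

For VRID 51, as used in the configurations in this article, `vrrp_vmac 51` prints 00:00:5e:00:01:33.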
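Keepalived:
A software implementation of the VRRP protocol; it runs only on Linux and provides IP address failover.
It generates ipvs rules from its configuration.
It monitors the health of the back-end hosts.
It was originally written to make LVS highly available.
Components:
VRRP stack, ipvs wrapper, checkers, configuration file parser, I/O multiplexer, memory management component.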

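Prerequisites for configuring an HA cluster:

1. Time is synchronized across all nodes.
2. iptables and SELinux do not get in the way.
3. Nodes can reach each other by host name, e.g. through the local hosts file (not required for keepalived).
4. Nodes can communicate over SSH with key-based authentication (not required).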

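Installing and configuring keepalived:

On CentOS 6.4 and later, keepalived is available from the base repository.
Program environment:
Main configuration file: /etc/keepalived/keepalived.conf
GLOBAL CONFIGURATION
Global definitions
Static routes and addresses
VRRPD CONFIGURATION
VRRP synchronization groups
VRRP instances; each vrrp_instance corresponds to one virtual router
LVS CONFIGURATION
The virtual servers (VS) and real servers (RS) of the ipvs cluster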

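Sample configuration:

Global configuration section:
global_defs {
   notification_email {
     root@localhost    # recipient address
   }
   notification_email_from [email protected]    # sender address
   smtp_server 192.168.200.1    # mail server address
   smtp_connect_timeout 30    # timeout for connecting to the mail server
   router_id LVS_DEVEL    # identifier of this physical router
   vrrp_mcast_group4 224-239.x.x.x    # IPv4 multicast group, anywhere in 224.0.0.0 to 239.255.255.255
}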
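
Since `vrrp_mcast_group4` must be an IPv4 multicast address, a quick sanity check before editing the configuration might look like the following. It is a sketch with a made-up function name, and it only inspects the first octet rather than validating the whole address.

```shell
#!/bin/sh
# is_multicast_v4 ADDR: succeed if the first octet of ADDR is 224-239.
# Only the first octet is checked; the rest of the address is not validated.
is_multicast_v4() {
    first="${1%%.*}"
    case "$first" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$first" -ge 224 ] && [ "$first" -le 239 ]
}
```

For example, `is_multicast_v4 224.0.0.100 && echo ok` prints ok, while a unicast address such as 172.18.20.7 is rejected.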

Virtual router section:
vrrp_instance VI_1 {    # define a virtual router instance
    state MASTER
    interface eth0    # interface used to send heartbeats
    virtual_router_id 51    # ID of the virtual router
    priority 100    # priority
    advert_int 1    # advertisement interval
    authentication {    # authentication
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {    # VIP addresses
        192.168.200.16/24 dev eno16777736    # the interface the VIP is added to
    }
}
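Other useful directives:
nopreempt: non-preemptive mode
preempt_delay 300: in preemptive mode, the delay after a node comes online before it triggers a new election
Notification scripts:
notify_master: script run when this node becomes the master
notify_backup: script run when this node becomes a backup
notify_fault: script run when this node enters the fault state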
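
A notification script for these three hooks could be sketched as follows. The script path, the message wording and the use of `mail` are assumptions for illustration; only the directive names come from keepalived.

```shell
#!/bin/sh
# notify_message STATE HOST: build the notification text for a state change.
notify_message() {
    case "$1" in
        master|backup|fault)
            printf '%s: keepalived entered %s state\n' "$2" "$1"
            ;;
        *)
            echo "usage: notify_message {master|backup|fault} HOST" >&2
            return 1
            ;;
    esac
}

# In keepalived.conf the script would be wired up like this
# (/etc/keepalived/notify.sh is a hypothetical path):
#   notify_master "/etc/keepalived/notify.sh master"
#   notify_backup "/etc/keepalived/notify.sh backup"
#   notify_fault  "/etc/keepalived/notify.sh fault"
#
# and notify.sh would end with something like:
#   notify_message "$1" "$(hostname)" | mail -s "keepalived state change" root@localhost
```

Building the message in its own function keeps the text testable without a mail server; how it is delivered (mail, syslog, a webhook) is left open.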

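Practice: IP address failover with the VRRP master/backup model:

Environment:
Time is synchronized on all nodes, and iptables and SELinux do not get in the way.
Node1 (CentOS 7) 172.18.20.7 MASTER
Node2 (CentOS 7) 172.18.20.8 BACKUP
VIP 172.18.20.100/16
vrrp_mcast_group4 224.0.0.100

Configure node 1: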

[root@localhost keepalived]# cat keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
    root@localhost
   }
    
   notification_email_from keepalived@localhost
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id node1
   vrrp_mcast_group4 224.0.0.100
}

vrrp_instance myroute {
    state MASTER
    interface ens33
    virtual_router_id 51
    priority 100
    advert_int 1 
    authentication { 
        auth_type PASS
        auth_pass ck2384
    }
    virtual_ipaddress {
        172.18.20.100/16 dev ens33
    }
}
Configure node 2:

[root@localhost keepalived]# cat keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
    root@localhost
   }
    
   notification_email_from keepalived@localhost
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id node2
   vrrp_mcast_group4 224.0.0.100
}

vrrp_instance myroute {
    state BACKUP
    interface eno16777736
    virtual_router_id 51
    priority 98
    advert_int 1 
    authentication { 
        auth_type PASS
        auth_pass ck2384
    }
    virtual_ipaddress {
        172.18.20.100/16 dev eno16777736
    }
}
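
The two files must agree on `virtual_router_id`, `auth_pass` and `vrrp_mcast_group4`, and should differ in `priority`. A rough way to check this is sketched below; the helper names and file names are made up, and the parsing only handles simple `directive value` lines like the ones above.

```shell
#!/bin/sh
# conf_value FILE KEY: print the first value given to directive KEY in FILE.
conf_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# same_setting FILE_A FILE_B KEY: succeed if both files give KEY the same value.
same_setting() {
    [ "$(conf_value "$1" "$3")" = "$(conf_value "$2" "$3")" ]
}

# Example (not executed here), with copies of both nodes' files:
# for key in virtual_router_id auth_pass vrrp_mcast_group4; do
#     same_setting node1.conf node2.conf "$key" || echo "mismatch: $key"
# done
```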

Testing:

1. Start node2 first. With no other node advertising, it becomes the master and takes the VIP.
[root@node2 ~]# systemctl status keepalived 
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-05-30 15:00:36 CST; 6s ago
  Process: 5004 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 5005 (keepalived)
   CGroup: /system.slice/keepalived.service
           ├─5005 /usr/sbin/keepalived -D
           ├─5006 /usr/sbin/keepalived -D
           └─5007 /usr/sbin/keepalived -D

May 30 15:00:36 node2 Keepalived_vrrp[5007]: Opening file '/etc/keepalived/keepalived.conf'.
May 30 15:00:36 node2 Keepalived_vrrp[5007]: Configuration is using : 62909 Bytes
May 30 15:00:36 node2 Keepalived_vrrp[5007]: Using LinkWatch kernel netlink reflector...
May 30 15:00:36 node2 Keepalived_vrrp[5007]: VRRP_Instance(myroute) Entering BACKUP STATE
May 30 15:00:36 node2 Keepalived_vrrp[5007]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
May 30 15:00:40 node2 Keepalived_vrrp[5007]: VRRP_Instance(myroute) Transition to MASTER STATE
May 30 15:00:41 node2 Keepalived_vrrp[5007]: VRRP_Instance(myroute) Entering MASTER STATE
May 30 15:00:41 node2 Keepalived_vrrp[5007]: VRRP_Instance(myroute) setting protocol VIPs.
May 30 15:00:41 node2 Keepalived_vrrp[5007]: VRRP_Instance(myroute) Sending gratuitous ARPs on eno16777736 for 172.18.20.100
May 30 15:00:41 node2 Keepalived_healthcheckers[5006]: Netlink reflector reports IP 172.18.20.100 added
[root@node2 ~]# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:cf:ba:90 brd ff:ff:ff:ff:ff:ff
    inet 172.18.20.8/16 brd 172.18.255.255 scope global eno16777736
       valid_lft forever preferred_lft forever
    inet 172.18.20.100/16 scope global secondary eno16777736
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fecf:ba90/64 scope link 
       valid_lft forever preferred_lft forever
2. Start node1. Its priority is higher, so it forces a new election and the VIP is now configured on the master, node1.

[root@node1 ~]# systemctl start keepalived 
[root@node1 ~]# systemctl status keepalived 
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-05-30 03:02:18 EDT; 4s ago
  Process: 49646 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 49647 (keepalived)
   CGroup: /system.slice/keepalived.service
           ├─49647 /usr/sbin/keepalived -D
           ├─49648 /usr/sbin/keepalived -D
           └─49649 /usr/sbin/keepalived -D

May 30 03:02:18 node1 Keepalived_vrrp[49649]: Opening file '/etc/keepalived/keepalived.conf'.
May 30 03:02:18 node1 Keepalived_vrrp[49649]: Configuration is using : 62887 Bytes
May 30 03:02:18 node1 Keepalived_vrrp[49649]: Using LinkWatch kernel netlink reflector...
May 30 03:02:18 node1 Keepalived_vrrp[49649]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
May 30 03:02:19 node1 Keepalived_vrrp[49649]: VRRP_Instance(myroute) Transition to MASTER STATE
May 30 03:02:19 node1 Keepalived_vrrp[49649]: VRRP_Instance(myroute) Received lower prio advert, forcing new election
May 30 03:02:20 node1 Keepalived_vrrp[49649]: VRRP_Instance(myroute) Entering MASTER STATE
May 30 03:02:20 node1 Keepalived_vrrp[49649]: VRRP_Instance(myroute) setting protocol VIPs.
May 30 03:02:20 node1 Keepalived_vrrp[49649]: VRRP_Instance(myroute) Sending gratuitous ARPs on ens33 for 172.18.20.100
May 30 03:02:20 node1 Keepalived_healthcheckers[49648]: Netlink reflector reports IP 172.18.20.100 added
[root@node1 ~]# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:e4:53:15 brd ff:ff:ff:ff:ff:ff
    inet 172.18.20.7/16 brd 172.18.255.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 172.18.252.52/16 brd 172.18.255.255 scope global secondary dynamic ens33
       valid_lft 71521sec preferred_lft 71521sec
    inet 172.18.20.100/16 scope global secondary ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::87aa:10b2:67e3:70d0/64 scope link 
       valid_lft forever preferred_lft forever
3. Capture packets on node2; the master's advertisements are visible.
[root@node2 ~]# tcpdump -i eno16777736 -nn host 224.0.0.100
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eno16777736, link-type EN10MB (Ethernet), capture size 65535 bytes
15:04:21.217999 IP 172.18.20.7 > 224.0.0.100: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:04:22.221621 IP 172.18.20.7 > 224.0.0.100: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
15:04:23.227296 IP 172.18.20.7 > 224.0.0.100: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
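
The manual `ip a l` checks in steps 1 and 2 can also be scripted. The filter below is a small sketch (the function name is made up) that reads `ip addr` output on standard input and reports whether the VIP is present.

```shell
#!/bin/sh
# holds_vip VIP: read "ip addr" output on stdin and succeed if VIP is configured.
holds_vip() {
    # Match "inet <VIP>/" literally so 172.18.20.10 does not match 172.18.20.100.
    grep -qF "inet $1/"
}

# Example (not executed here), run on either node:
# ip -4 -o addr show dev ens33 | holds_vip 172.18.20.100 && echo "VIP is on this node"
```

Running the example on both nodes before and after stopping keepalived on the master shows the VIP moving from one node to the other.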
