High-Availability Clusters
LB (load balancing): LVS, nginx (http/upstream, stream/upstream)
HP (high performance)
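HA Cluster (high availability cluster): a cluster is a group of computers that together present a set of network resources to users as a single whole; each individual computer is a node of the cluster. The main job of HA cluster software is to automate failure detection and service switchover. An HA cluster with only two nodes is also called dual-machine hot standby: two servers back each other up, so that when one fails the other takes over its services and the system keeps serving clients without manual intervention. Hot standby is only one kind of HA cluster; HA cluster systems can also support more than two nodes and offer more, and more advanced, features than hot standby, which better meets users' changing requirements.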
Resource: a "component" that makes up an HA cluster (e.g. a VIP, a service process, shared storage)
Measuring the availability of an HA cluster
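HA = MTTF / (MTTF + MTTR) * 100%
MTTF: mean time to failure;
MTTR: mean time to repair;
SPOF: Single Point of Failure;
Availability levels:
* 99%: no more than 4 days of downtime per year
* 99.9%: no more than 10 hours of downtime per year
* 99.99%: no more than 1 hour of downtime per year
* 99.999%: no more than 6 minutes of downtime per year
**One way to raise availability is to lower MTTR, i.e. to use redundancy.**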
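The formula and the downtime bounds above can be checked with a few lines of shell. This is a sketch assuming bash with awk available and a 365-day year (525600 minutes); `availability` and `max_downtime` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: availability from MTTF/MTTR, and the yearly downtime budget
# for a given availability percentage (year = 365 days = 525600 minutes).

# availability MTTF MTTR -> percentage with 3 decimals
availability() {
    awk -v f="$1" -v r="$2" 'BEGIN { printf "%.3f\n", f / (f + r) * 100 }'
}

# max_downtime PERCENT -> allowed downtime per year in minutes (2 decimals)
max_downtime() {
    awk -v p="$1" 'BEGIN { printf "%.2f\n", (100 - p) / 100 * 525600 }'
}

for p in 99 99.9 99.99 99.999; do
    echo "$p% -> $(max_downtime "$p") min/year"
done
```

It prints 5256.00 minutes (about 3.65 days) for 99% and 5.26 minutes for 99.999%, consistent with the bounds listed.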
System failures
Hardware failures: design defects, wear-out, natural disasters, ...
Software failures: design defects, ...
HA models
Dual-master (active/active): active <---> heartbeat <---> active
Master/backup (active/passive): active ----> heartbeat ----> passive
High availability means availability of the "service". An HA nginx service, for example, consists of: VIP / nginx process [/ shared storage]
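When building a cluster, an odd number of nodes is generally used, so that when the network splits one side can still hold a majority of the votes.
Network partitions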
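Fencing (isolation) devices:
node-level fencing: STONITH = Shoot The Other Node In The Head; cut the power to the failed node (power-cycle it);
resource-level fencing: fence the failed node off by shutting down the network port it uses to access the storage;
quorum: "with quorum" means holding more than 1/2 of the votes; "without quorum" means holding 1/2 of the votes or fewer.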
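The quorum rule (strictly more than half of the votes) is also why odd node counts are preferred. A minimal sketch, assuming one vote per node; `has_quorum` is a hypothetical helper, not part of any cluster stack.

```shell
#!/usr/bin/env bash
# Sketch: quorum check, assuming every node carries exactly one vote.

# has_quorum VOTES_HELD TOTAL_VOTES -> exit status 0 if the partition has quorum
has_quorum() {
    local held=$1 total=$2
    # strictly more than half: 2 * held > total
    (( held * 2 > total ))
}

# 4 nodes split 2/2: neither side has quorum.
# 5 nodes split 3/2: exactly one side keeps quorum.
for split in "2 4" "3 5" "2 5"; do
    read -r held total <<< "$split"
    if has_quorum "$held" "$total"; then
        echo "$held of $total votes: with quorum"
    else
        echo "$held of $total votes: without quorum"
    fi
done
```

With 4 nodes split 2/2 neither half may keep the resources, whereas with 5 nodes a two-way split always leaves one side with at least 3 votes.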
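failover: when a resource fails, moving that resource to another node;
failback: after a resource's primary node has failed and then been repaired and brought back online, moving the resource back to it from the node it was transferred to.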
HA cluster implementations:
VRRP implementation: keepalived
AIS (full-featured HA cluster stacks): RHCS (cman), heartbeat, corosync
keepalived
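keepalived implements high availability on top of VRRP.
VRRP: Virtual Router Redundancy Protocol
Terminology:
* virtual router: Virtual Router
* virtual router identifier: VRID (1-255)
* physical routers:
  master: the active device; it periodically sends heartbeat advertisements telling the backup nodes its working state;
  backup: the standby device;
  priority: decides which physical router becomes master (the highest priority wins);
* VIP: Virtual IP
* VMAC: Virtual MAC (00-00-5E-00-01-VRID)
* advertisements: heartbeat, priority, etc.; sent periodically;
* preemptive vs. non-preemptive mode (in preemptive mode a returning higher-priority router takes the master role back);
* security: authentication (3 modes): none; simple password; MD5 (IPsec AH)
* working models: master/master, master/backup
keepalived: a software implementation of VRRP, originally designed to make ipvs services highly available:
* uses VRRP to float the address (VIP) between nodes;
* generates ipvs rules (predefined in the configuration file) on the node that holds the VIP;
* health-checks each RS of the ipvs cluster;
* provides a script-calling interface: the scripts it runs carry out user-defined actions and thereby influence cluster behavior.
Core components of keepalived: vrrp stack, ipvs wrapper, checkers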
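The virtual MAC is derived directly from the VRID: its last octet is the VRID in hexadecimal. A small sketch covering the IPv4 prefix 00-00-5E-00-01; `vrrp_vmac` is a hypothetical helper, and keepalived typically announces the VIP with the interface's own MAC unless `use_vmac` is configured.

```shell
#!/usr/bin/env bash
# Sketch: build the IPv4 VRRP virtual MAC (00-00-5E-00-01-{VRID}) from a VRID.

vrrp_vmac() {
    local vrid=$1
    # VRID must be a decimal integer in 1..255 (10# avoids octal parsing of "08")
    if ! [[ $vrid =~ ^[0-9]+$ ]] || (( 10#$vrid < 1 || 10#$vrid > 255 )); then
        echo "invalid VRID: $vrid" >&2
        return 1
    fi
    printf '00-00-5E-00-01-%02X\n' "$((10#$vrid))"
}

vrrp_vmac 88    # VRID used in the single-master example
```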
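Installing and configuring an HA cluster
Prerequisites
* Clocks on all nodes must be synchronized (ntp, chrony);
* Make sure iptables and SELinux do not get in the way;
* Nodes can reach each other by hostname (not required for keepalived); /etc/hosts is normally used for name resolution;
* root on each node can log in to the others over SSH with key-based authentication (not required).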
Installing and configuring keepalived
Disable SELinux and iptables; check their current state with:
iptables -vnL
getenforce
Program environment
Main configuration file: /etc/keepalived/keepalived.conf
Main program: /usr/sbin/keepalived
Unit file: keepalived.service
Environment file of the unit: /etc/sysconfig/keepalived
Configuration file structure:
THE HIERARCHY
GLOBAL CONFIGURATION
    global definitions
    static routes/addresses
VRRPD CONFIGURATION
    vrrp synchronization groups: VRRP sync groups
    vrrp instances: each vrrp instance is one VRRP virtual router
LVS CONFIGURATION
    virtual server groups
    virtual servers: the VS and RSs of an ipvs cluster
Single-master configuration example
node1:
global_defs {
    notification_email {
        root@localhost
    }
    # sender address of notification mail
    notification_email_from keepalived@localhost
    # SMTP server address
    smtp_server 127.0.0.1
    # SMTP connection timeout (seconds)
    smtp_connect_timeout 30
    # ID of this node
    router_id node1
    # multicast group for VRRP advertisements
    vrrp_mcast_group4 224.0.0.80
}
vrrp_instance VI_1 {
    state MASTER
    interface eno16777736
    virtual_router_id 88
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass fajinide
    }
    virtual_ipaddress {
        172.16.80.80/16 dev eno16777736 label eno16777736:0
    }
    track_interface {
        eno16777736
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}
node2:
global_defs {
    notification_email {
        root@localhost
    }
    # sender address of notification mail
    notification_email_from keepalived@localhost
    # SMTP server address
    smtp_server 127.0.0.1
    # SMTP connection timeout (seconds)
    smtp_connect_timeout 30
    # ID of this node
    router_id node2
    # multicast group for VRRP advertisements
    vrrp_mcast_group4 224.0.0.80
}
vrrp_instance VI_1 {
    state BACKUP
    interface eno16777736
    virtual_router_id 88
    priority 98
    # advertisement interval (seconds)
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass fajinide
    }
    virtual_ipaddress {
        # the virtual address (VIP)
        172.16.80.80/16 dev eno16777736 label eno16777736:0
    }
    track_interface {
        eno16777736
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}
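Configuration syntax:
Defining a virtual router:
vrrp_instance <STRING> {
    ....
}
Instance-specific parameters:
state MASTER|BACKUP: this node's initial state in this virtual router; only one node may be MASTER, all others should be BACKUP;
interface IFACE_NAME: the physical interface this virtual router is bound to;
virtual_router_id VRID: the unique identifier of this virtual router, range 1-255;
priority 100: this host's priority in this virtual router, range 1-254;
advert_int 1: interval between VRRP advertisements (seconds);
authentication {
    auth_type AH|PASS
    auth_pass <PASSWORD>
}
virtual_ipaddress {
    <IPADDR>/<MASK> brd <IPADDR> dev <STRING> scope <SCOPE> label <LABEL>
}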
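The range rules for these parameters (a single MASTER, priority 1-254, and a VRID that the VRRP RFC limits to 1-255) can be checked before a config is deployed. A rough sketch with a hypothetical `check_vrrp_params` helper; it validates values passed as arguments and does not parse keepalived.conf.

```shell
#!/usr/bin/env bash
# Sketch: validate per-instance values before writing them into keepalived.conf.
# Usage: check_vrrp_params STATE VRID PRIORITY

check_vrrp_params() {
    local state=$1 vrid=$2 prio=$3 v
    case $state in
        MASTER|BACKUP) ;;
        *) echo "state must be MASTER or BACKUP: $state" >&2; return 1 ;;
    esac
    for v in "$vrid" "$prio"; do
        [[ $v =~ ^[0-9]+$ ]] || { echo "not an integer: $v" >&2; return 1; }
    done
    (( 10#$vrid >= 1 && 10#$vrid <= 255 )) \
        || { echo "virtual_router_id out of range: $vrid" >&2; return 1; }
    (( 10#$prio >= 1 && 10#$prio <= 254 )) \
        || { echo "priority out of range: $prio" >&2; return 1; }
    return 0
}

check_vrrp_params MASTER 88 100 && echo "node1 VI_1: ok"
check_vrrp_params BACKUP 88 98 && echo "node2 VI_1: ok"
```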
Example notification script
#!/bin/bash
#
contact='root@localhost'

# mail $contact about the VRRP state transition ($1 = new state)
notify() {
    mailsubject="$(hostname) to be $1, vip floating"
    mailbody="$(date +'%F %T'): vrrp transition, $(hostname) changed to be $1"
    echo "$mailbody" | mail -s "$mailsubject" "$contact"
}

case $1 in
master)
    notify master
    ;;
backup)
    notify backup
    ;;
fault)
    notify fault
    ;;
*)
    echo "Usage: $(basename "$0") {master|backup|fault}"
    exit 1
    ;;
esac
Dual-master model example
server1 configuration:
global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id node1
    vrrp_mcast_group4 224.0.100.19
}
vrrp_instance VI_1 {
    state MASTER
    interface eno16777736
    virtual_router_id 14
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 571f97b2
    }
    virtual_ipaddress {
        10.1.0.91/16 dev eno16777736
    }
}
vrrp_instance VI_2 {
    state BACKUP
    interface eno16777736
    virtual_router_id 15
    priority 98
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass ab8f07b2
    }
    virtual_ipaddress {
        10.1.0.92/16 dev eno16777736
    }
}
server2 configuration:
global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id node2
    vrrp_mcast_group4 224.0.100.19
}
vrrp_instance VI_1 {
    state BACKUP
    interface eno16777736
    virtual_router_id 14
    priority 98
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 571f97b2
    }
    virtual_ipaddress {
        10.1.0.91/16 dev eno16777736
    }
}
vrrp_instance VI_2 {
    state MASTER
    interface eno16777736
    virtual_router_id 15
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass ab8f07b2
    }
    virtual_ipaddress {
        10.1.0.92/16 dev eno16777736
    }
}
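Note: on server2 the state and priority of each instance are swapped relative to server1, while each instance's virtual_router_id, auth_pass and VIP stay identical on both nodes.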
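Because the two dual-master nodes differ only in state and priority, their instance blocks can be rendered from one template to avoid mismatched VRIDs or passwords. A sketch assuming the interface, VRIDs, passwords and VIPs of the example above; `render_instance` and `render_node` are hypothetical helpers, and the output is printed rather than written to /etc/keepalived.

```shell
#!/usr/bin/env bash
# Sketch: render vrrp_instance blocks for the dual-master example.
IFACE=eno16777736

# render_instance NAME STATE VRID PRIORITY AUTH_PASS VIP
render_instance() {
    cat <<EOF
vrrp_instance $1 {
    state $2
    interface $IFACE
    virtual_router_id $3
    priority $4
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass $5
    }
    virtual_ipaddress {
        $6 dev $IFACE
    }
}
EOF
}

# node1 is MASTER for VI_1 and BACKUP for VI_2; node2 is the mirror image.
render_node() {
    local hi=100 lo=98
    if [[ $1 == node1 ]]; then
        render_instance VI_1 MASTER 14 "$hi" 571f97b2 10.1.0.91/16
        render_instance VI_2 BACKUP 15 "$lo" ab8f07b2 10.1.0.92/16
    else
        render_instance VI_1 BACKUP 14 "$lo" 571f97b2 10.1.0.91/16
        render_instance VI_2 MASTER 15 "$hi" ab8f07b2 10.1.0.92/16
    fi
}

render_node node2
```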
Defining virtual servers
Syntax: virtual_server IP PORT | virtual_server fwmark INT
{
    ...
    real_server IP PORT {
        ...
    }
    ...
}
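A fwmark virtual server only receives packets that were marked beforehand, typically by an iptables rule in the mangle table on the director. A dry-run sketch that prints such a rule; the VIP 172.16.0.80, the ports 80,443 and the helper name `fwmark_rule` are assumptions for illustration, with mark 3 matching the `virtual_server fwmark 3` example further on.

```shell
#!/usr/bin/env bash
# Sketch (dry run): print the iptables command that marks traffic for a
# "virtual_server fwmark N" definition. Nothing is applied to the system.

# fwmark_rule VIP PORTS(comma-separated) MARK
fwmark_rule() {
    local vip=$1 ports=$2 mark=$3
    echo "iptables -t mangle -A PREROUTING -d $vip -p tcp -m multiport --dports $ports -j MARK --set-mark $mark"
}

fwmark_rule 172.16.0.80 80,443 3
```

The printed command would be run as root on every director; grouping several ports under one mark is what lets persistence apply across them.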
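Common parameters:
Shared (virtual_server-level) settings:
delay_loop <INT>: interval between health-check polls, in seconds;
lb_algo rr|wrr|lc|wlc|lblc|sh|dh: the scheduling algorithm;
lb_kind NAT|DR|TUN: the LVS cluster (forwarding) type;
persistence_timeout <INT>: persistent-connection timeout, in seconds;
protocol TCP: the service protocol (TCP in all examples here);
sorry_server <IPADDR> <PORT>: the fallback server used when all RSs are down;
real_server section:
real_server <IPADDR> <PORT> {
    weight <INT>: weight of this RS;
    notify_up <STRING>|<QUOTED-STRING>: script to run when the RS comes online;
    notify_down <STRING>|<QUOTED-STRING>: script to run when the RS goes offline;
    HTTP_GET|SSL_GET|TCP_CHECK { ... }: health-check method for this RS;
}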
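HTTP_GET|SSL_GET: application-layer checks
HTTP_GET|SSL_GET {
    url {
        path <URL_PATH>: the URL to monitor;
        status_code <INT>: the response code that counts as healthy;
        digest <STRING>: the checksum of the response body that counts as healthy;
        **Only one of status_code and digest is needed.**
    }
    nb_get_retry <INT>: number of retries; usually set;
    delay_before_retry <INT>: delay before each retry; usually set;
    connect_ip <IP ADDRESS>: the RS IP address the health-check request is sent to;
    connect_port <PORT>: the RS port the health-check request is sent to;
    bindto <IP ADDRESS>: source address used for health-check requests;
    bind_port <PORT>: source port used for health-check requests;
    connect_timeout <INTEGER>: connection timeout; usually set;
}
TCP_CHECK: transport-layer check
TCP_CHECK {
    connect_ip <IP ADDRESS>: the RS IP address the health-check request is sent to;
    connect_port <PORT>: the RS port the health-check request is sent to;
    bindto <IP ADDRESS>: source address used for health-check requests;
    bind_port <PORT>: source port used for health-check requests;
    connect_timeout <INTEGER>: connection timeout;
}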
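The TCP_CHECK idea (connect to connect_ip:connect_port within connect_timeout, retrying before declaring the RS down) can be imitated from the shell when debugging why an RS was removed. A sketch assuming bash's /dev/tcp support and coreutils `timeout`; it is not keepalived's own checker, and `tcp_probe` is a hypothetical name whose retry counting only approximates keepalived's.

```shell
#!/usr/bin/env bash
# Sketch: imitate a TCP_CHECK with retries.
# tcp_probe IP PORT CONNECT_TIMEOUT RETRIES DELAY_BEFORE_RETRY
tcp_probe() {
    local ip=$1 port=$2 ctimeout=$3 retries=$4 delay=$5 attempt
    for (( attempt = 0; attempt <= retries; attempt++ )); do
        # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT
        if timeout "$ctimeout" bash -c "exec 3<>/dev/tcp/$ip/$port" 2>/dev/null; then
            echo "$ip:$port UP"
            return 0
        fi
        # wait before the next attempt, but not after the last one
        if (( attempt < retries )); then
            sleep "$delay"
        fi
    done
    echo "$ip:$port DOWN"
    return 1
}

# e.g. tcp_probe 10.1.0.69 80 1 3 1   (an RS from the ipvs example)
```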
Example: highly available ipvs cluster (node1)
! Configuration File for keepalived
global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id node1
    vrrp_mcast_group4 224.0.100.19
}
vrrp_instance VI_1 {
    state MASTER
    interface eno16777736
    virtual_router_id 14
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 571f97b2
    }
    virtual_ipaddress {
        10.1.0.93/16 dev eno16777736
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}
virtual_server 10.1.0.93 80 {
    delay_loop 3
    lb_algo rr
    lb_kind DR
    protocol TCP
    sorry_server 127.0.0.1 80
    real_server 10.1.0.69 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                status_code 200
            }
            connect_timeout 1
            nb_get_retry 3
            delay_before_retry 1
        }
    }
    real_server 10.1.0.71 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                status_code 200
            }
            connect_timeout 1
            nb_get_retry 3
            delay_before_retry 1
        }
    }
}
Another example (two VIPs in dual-master mode plus a fwmark virtual server), node1 configuration:
! Configuration File for keepalived
global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from kaadmin@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id node1
    vrrp_mcast_group4 224.0.100.67
}
vrrp_instance VI_1 {
    state MASTER
    interface eno16777736
    virtual_router_id 44
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass f1bf7fde
    }
    virtual_ipaddress {
        172.16.0.80/16 dev eno16777736 label eno16777736:0
    }
    track_interface {
        eno16777736
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}
vrrp_instance VI_2 {
    state BACKUP
    interface eno16777736
    virtual_router_id 45
    priority 98
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass f2bf7ade
    }
    virtual_ipaddress {
        172.16.0.90/16 dev eno16777736 label eno16777736:1
    }
    track_interface {
        eno16777736
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}
virtual_server fwmark 3 {
    delay_loop 2
    lb_algo rr
    lb_kind DR
    nat_mask 255.255.0.0
    protocol TCP
    sorry_server 127.0.0.1 80
    real_server 172.16.0.69 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                status_code 200
            }
            connect_timeout 2
            nb_get_retry 3
            delay_before_retry 3
        }
    }
    real_server 172.16.0.6 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                status_code 200
            }
            connect_timeout 2
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}
keepalived can call external helper scripts to monitor resources and dynamically adjust the instance priority based on the result.
This takes two steps: (1) define a script; (2) reference it from a vrrp_instance.
vrrp_script <SCRIPT_NAME> {
    script "<COMMAND>"
    interval INT
    weight -INT
}
track_script {
    SCRIPT_NAME_1
    SCRIPT_NAME_2
    ...
}
Example: highly available nginx service
! Configuration File for keepalived
global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id node1
    vrrp_mcast_group4 224.0.100.19
}
# creating /etc/keepalived/down lowers the priority (manual maintenance switch)
vrrp_script chk_down {
    script "[[ -f /etc/keepalived/down ]] && exit 1 || exit 0"
    interval 1
    weight -5
}
# killall -0 only tests whether an nginx process exists
vrrp_script chk_nginx {
    script "killall -0 nginx && exit 0 || exit 1"
    interval 1
    weight -5
}
vrrp_instance VI_1 {
    state MASTER
    interface eno16777736
    virtual_router_id 14
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 571f97b2
    }
    virtual_ipaddress {
        10.1.0.93/16 dev eno16777736
    }
    track_script {
        chk_down
        chk_nginx
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}
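How `weight` changes the election: keepalived adds a positive weight to the instance priority while the script succeeds, and applies a negative weight while it fails. In the nginx example the MASTER (priority 100, weight -5) drops to 95 when nginx dies, so assuming the backup runs with priority 98 as in the earlier examples, the VIP moves. A sketch of that arithmetic with a hypothetical `effective_priority` helper (it ignores keepalived's clamping to the valid priority range):

```shell
#!/usr/bin/env bash
# Sketch: effective VRRP priority after tracked scripts are evaluated.
# effective_priority BASE "WEIGHT:RESULT" ...   (RESULT: 0 = success, else failure)
effective_priority() {
    local prio=$1 spec weight result
    shift
    for spec in "$@"; do
        weight=${spec%%:*}
        result=${spec##*:}
        if (( weight > 0 && result == 0 )); then
            (( prio += weight ))    # positive weight: added while the script succeeds
        elif (( weight < 0 && result != 0 )); then
            (( prio += weight ))    # negative weight: applied while the script fails
        fi
    done
    echo "$prio"
}

# MASTER with chk_down OK and chk_nginx failing (both weight -5):
effective_priority 100 "-5:0" "-5:1"    # prints 95
```

Choosing a weight larger than the gap between master and backup priorities (here 100 - 98 = 2) is what makes a failed check trigger a failover.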