keepalived configuration explained, one neat configuration trick & installation and pitfalls

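Everything below has been confirmed by experiment and is accurate. If you want the precise meaning of every parameter, read the official documentation.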

 

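Problems solved:

1. When one node goes down, both VIPs float to the surviving node.

2. When that node recovers, the service needs some time to settle, so the floating IP should not move back immediately.

3. If the service on a node fails, the node must hand over its VIP on its own initiative.

4. Once the service on that node is healthy again, it must not seize the VIP back right away; it takes over only after a transition period.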

 

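      preempt_delay 300 # Meaning: I am currently in the BACKUP state and I see that the current master is weaker than me, i.e. its priority is lower than mine. I do not preempt immediately; I wait five minutes (300 seconds) first.

I am configured as BACKUP, but because my priority is higher I am the one actually holding power, i.e. the real master. When I detect that my service has died, I lower my own priority and let the node configured as MASTER take over. This is also why the high-priority node is declared as state BACKUP: keepalived honours preempt_delay only when the initial state is BACKUP.
Once my priority is back up, I still do not take over immediately; I wait a while first.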
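
The preemption rule just described can be written down as a tiny decision function. This is only a simplified model in portable shell, not keepalived's actual logic: the function name should_preempt is made up for illustration, the priorities 110 and 100 are illustrative values, and a tie in priority (which VRRP resolves by comparing IP addresses) is simply treated as "stay backup".

```shell
# should_preempt MY_PRIO MASTER_PRIO SECONDS_WAITED DELAY
# Hypothetical helper modelling preempt_delay for a node in BACKUP state.
should_preempt() {
  if [ "$1" -le "$2" ]; then
    # Not strictly higher than the current master: never preempt.
    echo "stay-backup"
  elif [ "$3" -lt "$4" ]; then
    # Higher priority, but still inside the preempt_delay window.
    echo "wait"
  else
    # Higher priority and the delay has elapsed.
    echo "preempt"
  fi
}

should_preempt 110 100 60 300    # prints: wait
should_preempt 110 100 300 300   # prints: preempt
should_preempt 90 100 600 300    # prints: stay-backup
```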

# Ignore VRRP interface faults (default unset)
dont_track_primary    # Faults on the interface are ignored. Without this option keepalived monitors the link, enters the FAULT state as soon as the link goes down, and gives up all floating IPs.

# optional, monitor these as well. go to FAULT state if any of these go down if unweighted.
# When a weight is specified in track_interface, instead of setting the vrrp instance to the FAULT state in case of failure, its priority will be
# increased by the weight when the interface is up (for positive weights), or decreased by the weight's absolute value when the interface is down
# (for negative weights), unless reverse is specified, in which case the direction of adjustment of the priority is reversed.
# The weight must be comprised between -253 and +253 inclusive. 0 is the default behaviour which means that a failure implies a
# FAULT state. The common practice is to use positive weights to count a limited number of good services so that the server with the highest count
# becomes master. Negative weights are better to count unexpected failures among a high number of interfaces, as it will not saturate even with high
# number of interfaces. Use reverse to increase priority if an interface is down
track_interface {
    eth0
    eth1
    eth2 weight <-253..253> [reverse]
    ...
}

 

vrrp_script <SCRIPT_NAME> {
           # path to the script, or the inline script itself
           script <STRING>|<QUOTED-STRING>

           # how often the script is run, in seconds
           interval <INTEGER>

           # seconds after which script is considered to have failed
           timeout <INTEGER>   # if the script has not returned by then, it is treated as timed out and counted as failed

           # adjust priority by this weight, (default: 0). For description of reverse, see track_script.
           # 'weight 0 reverse' will cause the vrrp instance to be down when the script is up, and vice versa.
           weight <INTEGER:-253..253> [reverse]

           # required number of successes for OK transition
           rise <INTEGER>

           # required number of failures for KO transition
           fall <INTEGER>

           # user (and optionally group) the script is run as
           user USERNAME [GROUPNAME]

           # assume the script starts out in the failed state
           init_fail
       }
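
To make rise and fall concrete, here is a minimal sketch that replays a sequence of script exit codes and reports the resulting state. It is a simplified model that counts consecutive results; keepalived keeps its own internal counter, so treat this as an illustration of the idea rather than the real implementation. The function name script_state is hypothetical.

```shell
# script_state RISE FALL INITIAL_STATE(OK|KO) EXIT_CODE...
# The ( ) function body runs in a subshell, so its variables stay local.
script_state() (
  rise=$1 fall=$2 state=$3
  shift 3
  ok_run=0 ko_run=0
  for rc in "$@"; do
    if [ "$rc" -eq 0 ]; then
      # A success extends the success streak and resets the failure streak.
      ok_run=$((ok_run + 1)); ko_run=0
      if [ "$state" = KO ] && [ "$ok_run" -ge "$rise" ]; then state=OK; fi
    else
      # A failure extends the failure streak and resets the success streak.
      ko_run=$((ko_run + 1)); ok_run=0
      if [ "$state" = OK ] && [ "$ko_run" -ge "$fall" ]; then state=KO; fi
    fi
  done
  echo "$state"
)

script_state 1 2 OK 1        # one failure with fall=2      -> prints: OK
script_state 1 2 OK 1 1      # two failures in a row        -> prints: KO
script_state 1 2 OK 1 0 1    # failures not in a row        -> prints: OK
script_state 2 2 KO 0 0      # init_fail-style start, rise=2 -> prints: OK
```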

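Using weight, rise and fall together

A positive weight means that successes will add to the priority of all VRRP instances which monitor it.
On the opposite, a negative weight will be subtracted from the initial priority in case of failures.
Explanation: rise pairs with a positive weight. If the script succeeds (exit status 0) rise times in a row, the priority is raised by weight.
fall pairs with a negative weight. If the script fails (non-zero exit status) fall times in a row, the priority is lowered by |weight|.
The remaining combinations do not adjust the priority: a positive weight never pushes it below the configured value, and a negative weight never lifts it above.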
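
The rule above is simple arithmetic, sketched here as a shell function. The name effective_priority is made up; 110 and -20 are the values used in the node 1 configuration later in this post, and 100 with +20 is purely illustrative. The sketch ignores weight 0 (which leads to the FAULT state instead of a priority change) and any clamping of the result.

```shell
# effective_priority BASE WEIGHT SCRIPT_STATE(OK|KO)
# The ( ) function body runs in a subshell, so its variables stay local.
effective_priority() (
  base=$1 weight=$2 state=$3
  prio=$base
  if [ "$weight" -gt 0 ] && [ "$state" = OK ]; then
    # Positive weight: a healthy script adds the weight.
    prio=$((base + weight))
  elif [ "$weight" -lt 0 ] && [ "$state" = KO ]; then
    # Negative weight: a failed script subtracts |weight|.
    prio=$((base + weight))
  fi
  echo "$prio"
)

effective_priority 110 -20 OK   # prints: 110
effective_priority 110 -20 KO   # prints: 90
effective_priority 100 20 OK    # prints: 120
effective_priority 100 20 KO    # prints: 100
```

With these numbers, a failing check on the priority-110 node drops it to 90, which is below a peer at priority 100, so the peer takes over.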

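Node 1:

vrrp_script chkBackup {
  script "ps -fe|grep tranproxy |grep -v grep; [[ $? -eq 0 ]] && (/usr/local/bin/socket-client.out 172.18.1.10 14000; [[ $? -eq 0 ]] && exit 0 || exit 1) || exit 1"       ## check whether the process exists; if it does, check connectivity; return 0 if it is reachable, return 1 if the process is missing or unreachable
  interval 30
  fall 2 ## lower the priority only after 2 KOs: two runs returning 1 (process missing or unreachable) drop the priority by 20
  weight -20
  user root
}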

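vrrp_instance VI_1 {
  state BACKUP
  interface eno2 ### the interface VRRP packets are sent on; a dedicated pair of interfaces can serve as the heartbeat link. Be careful here: many blog posts copied from one another claim this is the interface the VIP is bound to, which is wrong and misleading
  unicast_src_ip 182.168.1.30 ### source address of the packets sent from eno2; use this if you want them to carry a different IP
  unicast_peer {
    182.168.1.245
  }

  dont_track_primary ### also very important: heartbeat links are usually direct cables between master and backup. When the master loses power (it must really be without power), the heartbeat interface on the backup goes link DOWN, keepalived enters the FAULT state and gives up all VIPs
  virtual_ipaddress {
    192.168.1.33/24 brd 192.168.1.255 dev eno1 label eno1:1 ## the interface the VIP is really bound to is configured here; if you do not specify one, it ends up on the interface named by "interface" above
  }
  virtual_router_id 1
  priority 110 ## higher priority, so I am the real master
  track_script {
    chkBackup # when I detect that my own service is down I lower my priority at once, and the node configured as MASTER takes over immediately
  }
  preempt_delay 300 ## when I see a master whose priority is lower than mine, I do not take over at once but only after 5 minutes
}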
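
The one-line check inside chkBackup is hard to read and maintain. A minimal sketch of the same two-step test as a reusable function follows, written in portable shell. The function name check_service is hypothetical, and it assumes the process name is exactly tranproxy (the original greps the full command line instead, so adjust if the real name differs) and that pgrep is installed.

```shell
# check_service PROCESS_NAME PROBE_COMMAND [ARGS...]
# Returns 0 only if the process exists AND the probe command succeeds.
check_service() {
  # pgrep -x matches the exact process name, so no "grep -v grep" is needed.
  pgrep -x "$1" >/dev/null 2>&1 || return 1
  shift
  # Run the connectivity probe; any non-zero exit counts as a failure.
  "$@" >/dev/null 2>&1 || return 1
}

# Possible use in a script referenced from vrrp_script (values from this post):
#   check_service tranproxy /usr/local/bin/socket-client.out 172.18.1.10 14000
#   exit $?
```

Keeping the check in a script file also means it can be run by hand with the same arguments when debugging.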

 

 

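Node 2:

Global configuration on node 2 (node 1 is similar); this one is used as the example
global_defs {
  notification_email {
    [email protected]
  }
  notification_email_from [email protected]
  smtp_server 127.0.0.1
  smtp_connect_timeout 30
  router_id k-two2-fst-hx ## must be unique within the LAN; the hostname is the usual choice. wxy: a company may run several test environments whose hostnames are identical, so it is better not to use the hostname directly
  script_user root
  enable_script_security
}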

 

Instance configuration on node 2, with one instance taken as the example
vrrp_instance VI_1 {
  state MASTER
  interface eno2
  unicast_src_ip 182.168.1.30
  unicast_peer {
    182.168.1.245
  }
  virtual_router_id 1 ## virtual router ID; the two VRRP instances of a pair use the same ID. I have not looked further into what exactly it means
  priority 100
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 11111
  }
  virtual_ipaddress {
    192.168.1.33/24 brd 192.168.1.255 dev eno1 label eno1:0
  }
}

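Appendix: the VRRP packet exchange. The capture shows that addresses on the 182 network (eno2) are used to exchange the VIP of the 192 network (eno1).

[Figure 1: packet capture of the VRRP exchange]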

 

 

 

Pitfalls you may hit when writing the script:

vrrp_script chkBackup {
  script "./keepalived_script.sh 172.18.1.10"
  interval 10
  fall 2 ## lower the priority only after 2 KOs
  weight -20
  user root
}

 

Error 1: Disabling track script chkBackup since not found/accessible

Cause: a relative path cannot be used; use an absolute path instead:

           script "/etc/keepalived/keepalived_script.sh 172.18.1.10"

 

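Error 2: Error exec-ing command '/etc/keepalived/keepalived_script.sh', error 8: Exec format error

             Yet running the script by hand worked fine.

Cause: by hand I ran it as # bash /etc/keepalived/keepalived_script.sh 172.18.1.10, which names the interpreter explicitly. keepalived executes the file directly,

           so the first line of the script must be: #!/bin/bash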
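
Both errors above can be caught before keepalived is restarted. The following is a minimal pre-flight sketch in portable shell; the function name check_track_script and its messages are made up for illustration and are not part of keepalived.

```shell
# check_track_script PATH
# Prints the first problem found, or "ok". The ( ) body runs in a subshell,
# so "exit" only ends the function, not the calling shell.
check_track_script() (
  path=$1
  case $path in
    /*) ;;                                         # absolute path: fine
    *) echo "not an absolute path"; exit 1 ;;      # would trigger error 1
  esac
  if [ ! -x "$path" ]; then
    echo "missing or not executable"; exit 1
  fi
  first_line=
  IFS= read -r first_line < "$path" || true        # tolerate an empty file
  case $first_line in
    '#!/'* | '#! /'*) ;;                           # shebang with absolute interpreter
    *) echo "no valid shebang line"; exit 1 ;;     # would trigger error 2
  esac
  echo "ok"
)

# Example with the path used in this post:
#   check_track_script /etc/keepalived/keepalived_script.sh
```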

 

Error 3: the local node is not assigned the VIP; the log shows

Keepalived_vrrp[1884]: Assigned address 182.168.1.245 for interface enp5s0
Aug 20 11:37:31 one1-fst-hx Keepalived_vrrp[1884]: Assigned address fe80::fafd:41aa:f8d4:c6a4 for interface enp5s0
Aug 20 11:37:31 one1-fst-hx Keepalived_vrrp[1884]: (VI_1) entering FAULT state
Aug 20 11:37:31 one1-fst-hx Keepalived_vrrp[1884]: (VI_2) entering FAULT state

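Analysis: this puzzled me. The state should be either MASTER or BACKUP, so why FAULT?

Cause 1: a network problem; an interface that a VIP is to be bound to is not available, as shown below

Details:

virtual_ipaddress {
192.168.1.51/24 brd 192.168.1.255 dev eno1 label eno1:0    --- to be bound to eno1
192.168.2.51/24 brd 192.168.1.255 dev ens1f0 label ens1f0:0      --- to be bound to ens1f0
}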

 

[root@two2-asm-hx keepalived]# ip link
2: eno1: mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000    ----- VIP-bound interface 1
link/ether ac:1f:6b:d6:0d:ac brd ff:ff:ff:ff:ff:ff
3: eno2: mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000    --- heartbeat interface
link/ether ac:1f:6b:d6:0d:ad brd ff:ff:ff:ff:ff:ff
4: ens1f0: mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000    --- VIP-bound interface 2
link/ether 00:1b:21:bf:5c:3c brd ff:ff:ff:ff:ff:ff

 

9月 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: Opening file '/etc/keepalived/keepalived.conf'.
9月 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: Assigned address 182.168.1.184 for interface eno2
9月 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: (VI_1) entering FAULT state
9月 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: (VI_2) entering FAULT state
9月 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: Registering gratuitous ARP shared channel
9月 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: (VI_1) removing VIPs.
9月 24 22:30:02 two2-asm-hx Keepalived_vrrp[22859]: (VI_2) removing VIPs.

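Summary: because not all of the interfaces that VIPs are bound to are up, keepalived considers the node faulty, gives up, and holds no VIP.

           Fix: make sure yourself that every interface you need is up. I do not know whether configuring track_interface could work around this; a quick test suggested it does not, but I did not test it thoroughly.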
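
Since the fix is to make sure every bound interface is up, a small check run before starting keepalived can save a round of log reading. This is a minimal sketch in portable shell that reads the kernel's operstate files under /sys/class/net; the function name list_down_interfaces and the SYSFS_NET override (useful for testing) are assumptions made for this example. Some virtual interfaces report "unknown" even when usable, so read the output with that in mind.

```shell
# list_down_interfaces IFACE...
# Prints every named interface whose operstate is not "up".
# The ( ) function body runs in a subshell, so its variables stay local.
list_down_interfaces() (
  root=${SYSFS_NET:-/sys/class/net}
  for ifc in "$@"; do
    state=missing
    if [ -r "$root/$ifc/operstate" ]; then
      read -r state < "$root/$ifc/operstate" || true
    fi
    [ "$state" = up ] || echo "$ifc ($state)"
  done
)

# Example with the interfaces from the configuration above:
#   list_down_interfaces eno1 ens1f0
```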

 

Cause 2: the heartbeat interface is down

9月 24 20:07:37 two2-asm-hx Keepalived_vrrp[14273]: Netlink reports eno2 down   ----- because the heartbeat interface went down
9月 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: Netlink reports ens1f0 down
9月 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_1) Entering FAULT STATE
9月 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_1) sent 0 priority
9月 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_1) removing VIPs.
9月 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_2) Entering FAULT STATE
9月 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_2) sent 0 priority
9月 24 20:07:38 two2-asm-hx Keepalived_vrrp[14273]: (VI_2) removing VIPs

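Detail 1: why would the heartbeat interface go down? One scenario: the heartbeat link is a direct cable, so when the other end loses power, the link on this end also shows as DOWN.

 

Detail 2:

9月 24 22:10:42 two2-asm-hx Keepalived_vrrp[12568]: Netlink reports eno2 down    ---- once the link is found to be broken
9月 24 22:10:46 two2-asm-hx Keepalived_vrrp[12568]: Deassigned address 182.168.1.184 from interface eno2     --- the IP address on the heartbeat interface is removed
9月 24 22:11:04 two2-asm-hx Keepalived_vrrp[12568]: Netlink reports eno2 up       --- once the link is found to be OK again
9月 24 22:11:04 two2-asm-hx Keepalived_vrrp[12568]: Assigned address 182.168.1.184 for interface eno2      -- the address is added back

Summary: in this situation ARP can no longer be sent out. The behaviour can be changed by adding the option: dont_track_primary

          With that option set, the log shows that the interface is still detected as down, but the floating IPs are left unchanged.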

 

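wxy: actually, this so-called IP removal is described from keepalived's point of view. Once the link is down, would the kernel remove the IP anyway, even without keepalived?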

 

------------------------------ Separator: next come the installation and the pitfalls hit along the way ------------------------------

I. Installation (OpenSIPS)

Reference: https://blog.csdn.net/yiyangtime/article/details/84899536

Points that need special attention:

1) The reference contains one error. To create the database, the command to use is

/usr/local/sbin/opensipsdbctl create

 

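2) When configuring through the configuration menu, the following applies
 /usr/local/sbin/osipsconfig
Authentication can be removed. If you keep it, add accounts as shown below and then add the account on the client: user/password = 1000/1000, which amounts to logging in.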

./opensipsctl add 1000 1000
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
new user '1000' added
[root@mail sbin]# ./opensipsctl add 2000 2000
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
new user '2000' added

3) As for TCP, decide from your own situation whether to use it; the current clients all use UDP

 

II. Pitfalls

Pitfall 1: startup fails

[root@89 sbin]# ./opensipsctl start
INFO: Starting OpenSIPS :

ERROR: PID file /var/run/opensips.pid does not exist -- OpenSIPS start failed

Cause 1: after various experiments, this turned out to be how debug mode behaves; with debug turned off it works

Cause 2:
tail -f /var/log/messages
Sep 24 21:06:16 mail ./opensips[66657]: ERROR:db_mysql:db_mysql_connect: driver error(1045): Access denied for user 'opensips'@'localhost' (using password: YES)
Sep 24 21:06:16 mail ./opensips[66657]: ERROR:db_mysql:db_mysql_new_connection: initial connect failed
Sep 24 21:06:16 mail ./opensips[66657]: ERROR:core:db_do_init: could not add connection to the pool
Sep 24 21:06:16 mail ./opensips[66657]: ERROR:uri:mod_init: Could not connect to database
Sep 24 21:06:16 mail ./opensips[66657]: ERROR:core:init_mod: failed to initialize module uri
Sep 24 21:06:16 mail ./opensips[66657]: ERROR:core:main: error while initializing modules
Sep 24 21:06:16 mail ./opensips[66657]: INFO:core:cleanup: cleanup
Sep 24 21:06:16 mail ./opensips[66657]: NOTICE:core:main: Exiting....
Sep 24 21:06:16 mail opensips: INFO:core:daemonize: pre-daemon process exiting with -1

So the database had not been created, or had been created incorrectly, precisely because the reference document got the command wrong.
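
In the log above the first ERROR line already names the real problem, and the lines after it are follow-on failures. A minimal sketch that pulls out that first line and adds a hint for this particular case is shown below. The function name diagnose_opensips_start and the hint text are made up for illustration; the log path is the one used in this post.

```shell
# diagnose_opensips_start LOGFILE
# Prints the first ERROR line logged by opensips and, for the
# "Access denied" case seen above, a short hint.
# The ( ) body runs in a subshell, so "exit" only ends the function.
diagnose_opensips_start() (
  line=$(grep -m1 'opensips.*ERROR:' "$1") || {
    echo "no opensips ERROR line found"
    exit 0
  }
  echo "$line"
  case $line in
    *"Access denied for user"*)
      echo "hint: database login rejected; check that the database and its user were created"
      ;;
  esac
)

# Example:
#   diagnose_opensips_start /var/log/messages
```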


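Pitfall 2: client connection times out
Troubleshooting: at first I captured only UDP traffic and saw the client's registration requests with no reply, so I assumed the OpenSIPS installation was broken and went through reinstalling and various other steps.
Then it suddenly occurred to me that the capture should be taken without a filter.

Solution: the full capture showed that there was a reply after all, an ICMP packet: host unreachable, host administratively prohibited.
That pointed to iptables: even though the firewall had been turned off, filtering was in fact still in effect, so I added
# iptables -t filter -I INPUT -p udp --dport 5060 -j ACCEPT
Problem solved.
Or: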
systemctl stop iptables.service
systemctl disable iptables.service

/usr/local/opensips/sbin/opensipsctl start
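
If the rule is added from a setup script that may run more than once, it helps to check for it first so that duplicates do not pile up. The sketch below uses iptables -C (check) before iptables -I (insert). The function name open_udp_port and the IPTABLES override are assumptions for this example, the command needs root, and a rule added this way lives only in memory until it is saved.

```shell
# open_udp_port PORT
# Inserts an ACCEPT rule for the UDP port unless an identical rule exists.
open_udp_port() {
  ipt=${IPTABLES:-iptables}
  # -C returns non-zero when no matching rule is present.
  if ! "$ipt" -C INPUT -p udp --dport "$1" -j ACCEPT 2>/dev/null; then
    "$ipt" -I INPUT -p udp --dport "$1" -j ACCEPT
  fi
}

# Example for the SIP port used above (run as root):
#   open_udp_port 5060
```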


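Pitfall 3: for any other failure, first check whether the firewall is turned off!!!
If the firewall was still on when the reply binding was created, nothing can be sent out.
Turning the firewall off afterwards does not help; sending still fails.
So turn the firewall off before configuring UDP.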

 
