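Monitoring HAProxy with Zabbix
http://john88wang.blog.51cto.com
We deploy HAProxy + Keepalived as the load-balancing, high-availability front end for our game servers, so the state of HAProxy needs to be monitored in real time.
This article uses HAProxy version 1.4.24.
References: the official documentation at http://cbonte.github.io/haproxy-dconv/configuration-1.4.html and the following repositories: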
https://github.com/olindata/tribily-zabbix-templates/tree/master/App_HAProxy
https://github.com/jlyheden/zabbix_scripts/tree/master/haproxy
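1. How the monitoring works
HAProxy exposes its status information through an HTTP page and through a Unix stats socket, and both can export it in CSV format.
The HTTP page can be viewed with a URL such as http://10.10.41.100/status;csv
The Unix socket can be queried with:
echo "show info;show stat" | sudo socat stdio unix-connect:/tmp/haproxy
This article mainly uses the second method to obtain the HAProxy status information.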
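Configure the stats socket in haproxy.cfg:
stats socket /tmp/haproxy level admin
The keyword level can be followed by one of user, operator or admin:
user is the lowest privilege level and can only see non-sensitive information.
operator can see all information but can only change non-sensitive settings.
admin can see and change everything, so use it with care.
Sending a command the socket does not recognise, such as show help, makes HAProxy list the commands it supports:
$ echo "show help" | /usr/bin/sudo /usr/bin/socat stdio unix-connect:/tmp/haproxy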
Unknown command. Please enter one of the following commands only :
clear counters : clear max statistics counters (add 'all' for all counters)
help : this message
prompt : toggle interactive mode with prompt
quit : disconnect
show info : report information about the running process
show stat : report counters for each proxy and server
show errors : report last request and response errors for each proxy
show sess [id] : report the list of current sessions or dump this session
get weight : report a server's current weight
set weight : change a server's weight
set timeout : change a timeout setting
disable server : set a server in maintenance mode
enable server : re-enable a server that was previously in maintenance mode
show info reports information about the running HAProxy process:
Name: HAProxy
Version: 1.4.24
Release_date: 2013/06/17
Nbproc: 1
Process_num: 1
Pid: 7020
Uptime: 110d 16h25m55s
Uptime_sec: 9563155
Memmax_MB: 0
Ulimit-n: 131101
Maxsock: 131101
Maxconn: 65536
Maxpipes: 0
CurrConns: 14
PipesUsed: 0
PipesFree: 0
Tasks: 26
Run_queue: 1
node: master_loadbalance1
description: lb1
show stat reports the counters for each proxy and server:
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,
login_game_pool,FRONTEND,,,24,868,2000,196721023,87244966860,121969199234,0,0,171448,,,,,OPEN,,,,,,,,,1,1,0,,,,0,95,0,628,,,,0,195071390,0,1619236,28338,2034,,93,611,196721000,,,
login_pool,web1_80,0,0,0,38,2000,8333681,2356031055,2827436427,,0,,0,3,2211,11,UP,30,1,0,902,0,9558963,0,,1,2,1,,8329209,,2,1,,199,L7OK,200,1,20,7967292,0,361648,7,0,0,,,,136,0,
login_pool,web2_80,0,0,0,63,2000,8333998,2358035705,2826639220,,0,,1,6,2281,13,UP,30,1,0,861,0,9558963
The fields, numbered from 0, are:
0. pxname: proxy name
1. svname: service name (FRONTEND for frontend, BACKEND for backend, any name for server)
2. qcur: current queued requests
3. qmax: max queued requests
4. scur: current sessions
5. smax: max sessions
6. slim: sessions limit
7. stot: total sessions
8. bin: bytes in
9. bout: bytes out
10. dreq: denied requests
11. dresp: denied responses
12. ereq: request errors
13. econ: connection errors
14. eresp: response errors (among which srv_abrt)
15. wretr: retries (warning)
16. wredis: redispatches (warning)
17. status: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)
18. weight: server weight (server), total weight (backend)
19. act: server is active (server), number of active servers (backend)
20. bck: server is backup (server), number of backup servers (backend)
21. chkfail: number of failed checks
22. chkdown: number of UP->DOWN transitions
23. lastchg: last status change (in seconds)
24. downtime: total downtime (in seconds)
25. qlimit: queue limit
26. pid: process id (0 for first instance, 1 for second, ...)
27. iid: unique proxy id
28. sid: service id (unique inside a proxy)
29. throttle: warm up status
30. lbtot: total number of times a server was selected
31. tracked: id of proxy/server if tracking is enabled
32. type (0=frontend, 1=backend, 2=server, 3=socket)
33. rate: number of sessions per second over last elapsed second
34. rate_lim: limit on new sessions per second
35. rate_max: max number of new sessions per second
36. check_status: status of last health check, one of:
    UNK -> unknown
    INI -> initializing
    SOCKERR -> socket error
    L4OK -> check passed on layer 4, no upper layers testing enabled
    L4TMOUT -> layer 1-4 timeout
    L4CON -> layer 1-4 connection problem, for example "Connection refused" (tcp rst) or "No route to host" (icmp)
    L6OK -> check passed on layer 6
    L6TOUT -> layer 6 (SSL) timeout
    L6RSP -> layer 6 invalid response - protocol error
    L7OK -> check passed on layer 7
    L7OKC -> check conditionally passed on layer 7, for example 404 with disable-on-404
    L7TOUT -> layer 7 (HTTP/SMTP) timeout
    L7RSP -> layer 7 invalid response - protocol error
    L7STS -> layer 7 response error, for example HTTP 5xx
37. check_code: layer5-7 code, if available
38. check_duration: time in ms took to finish last health check
39. hrsp_1xx: http responses with 1xx code
40. hrsp_2xx: http responses with 2xx code
41. hrsp_3xx: http responses with 3xx code
42. hrsp_4xx: http responses with 4xx code
43. hrsp_5xx: http responses with 5xx code
44. hrsp_other: http responses with other codes (protocol error)
45. hanafail: failed health checks details
46. req_rate: HTTP requests per second over last elapsed second
47. req_rate_max: max number of HTTP requests per second observed
48. req_tot: total number of HTTP requests received
49. cli_abrt: number of data transfers aborted by the client
50. srv_abrt: number of data transfers aborted by the server (inc. in eresp)
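Because the first line of the show stat output names every column, the CSV can also be read by field name rather than by position. The following is only a minimal sketch of that idea, not part of the original scripts: the function name parse_show_stat is hypothetical, and SAMPLE_STAT is a shortened copy of the output above (first eight columns only) used purely for illustration.

```python
import csv
import io


def parse_show_stat(text):
    """Parse the CSV printed by `show stat` into a list of dicts keyed by field name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("# "):
        raise ValueError("expected a '# pxname,svname,...' header line")
    # Strip the leading "# " so DictReader sees plain column names.
    lines[0] = lines[0][2:]
    rows = []
    for row in csv.DictReader(io.StringIO("\n".join(lines))):
        # The trailing comma on each line produces an empty-named column; drop it.
        row.pop("", None)
        rows.append(row)
    return rows


# Shortened sample (assumption: only the first eight columns are kept here).
SAMPLE_STAT = """# pxname,svname,qcur,qmax,scur,smax,slim,stot,
login_game_pool,FRONTEND,,,24,868,2000,196721023,
login_pool,web1_80,0,0,0,38,2000,8333681,
"""

if __name__ == "__main__":
    for row in parse_show_stat(SAMPLE_STAT):
        print(row["pxname"], row["svname"], row["scur"], row["stot"])
```

Reading by name keeps a collector working if a later HAProxy version appends columns, which position-based lookups do not guarantee.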
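Note that if HAProxy runs in multi-process mode (nbproc set to a value other than 1), each process can answer on the socket with its own statistics, so the numbers you see switch between processes from one query to the next.
2. Writing the monitoring scripts
There are three monitoring scripts:
haproxy_info.sh collects basic information about HAProxy.
haproxy_pool_discovery.py lets Zabbix discover each pool through LLD (low-level discovery), for example
login_pool:BACKEND, login_pool:web1_80 and so on. With low-level discovery, the state of each backend host is monitored dynamically, following the backend hosts defined in the configuration file.
haproxy_stat.sh sends the show stat command to the stats socket and collects the value of each counter. The script checks the second field (svname) before reading a counter, because some fields exist only for FRONTEND or BACKEND rows, while others exist for every row except FRONTEND and BACKEND.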
haproxy_info.sh
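#!/bin/bash
# This script is used for getting haproxy info such as version, uptime and number of processes
# Usage: haproxy_info.sh <key>, where <key> is a name printed by "show info" (e.g. Version, Uptime_sec)
metric=$1
stats_socket=/tmp/haproxy
info_file=/tmp/haproxy_info.csv

echo "show info" | /usr/bin/sudo /usr/bin/socat unix-connect:$stats_socket stdio > $info_file
# Match the key exactly, so that e.g. Uptime does not also return Uptime_sec
awk -F': ' -v m="$metric" '$1==m {print $2}' $info_file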
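The show info output is a list of "Key: value" lines, so it can be loaded into a dictionary and queried by exact key. This is a minimal sketch under that assumption; parse_show_info and SAMPLE_INFO are hypothetical names, and the sample is an excerpt of the output shown earlier.

```python
def parse_show_info(text):
    """Turn the 'Key: value' lines printed by `show info` into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines or lines without a colon
            info[key.strip()] = value.strip()
    return info


# Excerpt of the `show info` output (assumption: a few lines are enough to illustrate).
SAMPLE_INFO = """Name: HAProxy
Version: 1.4.24
Uptime: 110d 16h25m55s
Uptime_sec: 9563155
CurrConns: 14
"""

if __name__ == "__main__":
    info = parse_show_info(SAMPLE_INFO)
    # Exact-key lookup keeps Uptime and Uptime_sec distinct.
    print(info["Uptime"])
    print(info["Uptime_sec"])
```

A pattern search for Uptime would match both the Uptime and Uptime_sec lines, which is why an exact-key lookup is the safer choice for a Zabbix item.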
haproxy_pool_discovery.py
socat must be installed, and the Zabbix agent user needs sudo permission to run socat.
Run visudo and change the sudoers file
as follows:
#
# Disable "ssh hostname sudo <cmd>", because it will show the password in clear.
# You have to run "ssh -t hostname sudo <cmd>".
#
Defaults !requiretty
zabbixagent ALL=(root) NOPASSWD:/usr/bin/socat
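#!/usr/bin/python
# This script discovers the HAProxy pools (pxname:svname pairs) on this server
import subprocess
import json

# Query the stats socket, drop the header and blank lines, keep "pxname:svname"
args = '''echo "show stat"|sudo socat stdio unix-connect:/tmp/haproxy|egrep -v '^#|^$'|awk -F',' '{print $1":"$2}' '''
# universal_newlines=True returns text instead of bytes, so split('\n') works on Python 2.7 and 3
t = subprocess.Popen(args, shell=True, stdout=subprocess.PIPE, universal_newlines=True).communicate()[0]

pools = []
for pool in t.split('\n'):
    if len(pool) != 0:
        pools.append({'{#POOL_NAME}': pool})
# print(...) with a single argument behaves the same on Python 2 and 3
print(json.dumps({'data': pools}, indent=4, separators=(',', ':')))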
Output:
{
    "data":[
        {
            "{#POOL_NAME}":"login_game_pool:FRONTEND"
        },
        {
            "{#POOL_NAME}":"login_pool:web1_80"
        },
        {
            "{#POOL_NAME}":"login_pool:web2_80"
        },
        {
            "{#POOL_NAME}":"login_pool:BACKEND"
        }
    ]
}
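The discovery step boils down to turning each CSV row into one {#POOL_NAME} entry. As a rough sketch, the same JSON can be built in pure Python from the raw show stat text, without egrep and awk. The function name build_discovery is hypothetical, and the text is assumed to have been read from the stats socket already.

```python
import json


def build_discovery(stat_csv):
    """Build Zabbix LLD JSON with one {#POOL_NAME} entry per `show stat` row."""
    pools = []
    for line in stat_csv.splitlines():
        # Skip the "# pxname,svname,..." header and blank lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split(",")
        if len(fields) < 2:
            continue
        pools.append({"{#POOL_NAME}": fields[0] + ":" + fields[1]})
    return json.dumps({"data": pools}, indent=4, separators=(",", ":"))


if __name__ == "__main__":
    # Tiny illustrative input (assumption: columns after svname are omitted).
    sample = "# pxname,svname,\nlogin_game_pool,FRONTEND,\nlogin_pool,web1_80,\n"
    print(build_discovery(sample))
```

Zabbix only accepts the discovery rule if the JSON is valid, so generating it with json.dumps rather than by string concatenation avoids problems such as a trailing comma after the last element.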
haproxy_stat.sh
#!/bin/bash
# Usage: haproxy_stat.sh <pxname>:<svname> <metric>
# Example: haproxy_stat.sh login_game_pool:FRONTEND scur
pool_name=$(echo "$1" | awk -F':' '{print $1}')
server_name=$(echo "$1" | awk -F':' '{print $2}')
metric=$2
stat_socket=/tmp/haproxy
stat_file=/tmp/haproxy_stat.csv

echo "show stat" | sudo socat stdio unix-connect:$stat_socket > $stat_file

# Print CSV column $1 (counted from 1, as awk does) of the row matching pool and server
field() {
    awk -F"," -v p="$pool_name" -v s="$server_name" -v n="$1" '$1==p && $2==s {print $n}' "$stat_file"
}
is_frontend() { [ "$server_name" == "FRONTEND" ]; }
is_proxy() { [ "$server_name" == "FRONTEND" -o "$server_name" == "BACKEND" ]; }

case $metric in
    qcur)         # current queued requests; FRONTEND has no such field
        if is_frontend; then echo 0; else field 3; fi ;;
    qmax)         # max queued requests; FRONTEND has no such field
        if is_frontend; then echo 0; else field 4; fi ;;
    scur)         # current sessions
        field 5 ;;
    smax)         # max sessions
        field 6 ;;
    slim)         # sessions limit
        field 7 ;;
    stot|stol)    # total sessions (the original misspelling stol is still accepted)
        field 8 ;;
    bin)          # bytes in
        field 9 ;;
    bout)         # bytes out
        field 10 ;;
    dreq)         # denied requests; only FRONTEND and BACKEND have this field
        if is_proxy; then field 11; else echo 0; fi ;;
    dresp)        # denied responses
        field 12 ;;
    ereq)         # request errors; only FRONTEND has this field
        if is_frontend; then field 13; else echo 0; fi ;;
    econ)         # connection errors; FRONTEND has no such field
        if is_frontend; then echo 0; else field 14; fi ;;
    eresp)        # response errors; FRONTEND has no such field
        if is_frontend; then echo 0; else field 15; fi ;;
    status)       # status
        field 18 ;;
    chkfail)      # number of failed checks; FRONTEND and BACKEND have no such field
        if is_proxy; then echo 0; else field 22; fi ;;
    chkdown)      # number of UP->DOWN transitions; FRONTEND returns 0
        if is_frontend; then echo 0; else field 23; fi ;;
    lastchg)      # last status change in seconds; FRONTEND returns 0
        if is_frontend; then echo 0; else field 24; fi ;;
    downtime)     # total downtime in seconds; FRONTEND returns 0
        if is_frontend; then echo 0; else field 25; fi ;;
    lbtot)        # total number of times a server was selected; FRONTEND has no such field
        if is_frontend; then echo 0; else field 31; fi ;;
    rate)         # number of sessions per second over last elapsed second
        field 34 ;;
    rate_lim|rate_limit)  # limit on new sessions per second; only FRONTEND has this field
        if is_frontend; then field 35; else echo 0; fi ;;
    rate_max)     # max number of new sessions per second
        field 36 ;;
    check_status) # status of last health check; servers only
        if is_proxy; then echo "NULL"; else field 37; fi ;;
    hrsp_1xx)     # http responses with 1xx code
        field 40 ;;
    hrsp_2xx)     # http responses with 2xx code
        field 41 ;;
    hrsp_3xx)     # http responses with 3xx code
        field 42 ;;
    hrsp_4xx)     # http responses with 4xx code
        field 43 ;;
    hrsp_5xx)     # http responses with 5xx code
        field 44 ;;
    req_rate)     # HTTP requests per second over last elapsed second; only FRONTEND, others return 0
        if is_frontend; then field 47; else echo 0; fi ;;
    req_rate_max) # max number of HTTP requests per second observed; only FRONTEND, others return 0
        if is_frontend; then field 48; else echo 0; fi ;;
    req_tot)      # total number of HTTP requests received; only FRONTEND, others return 0
        if is_frontend; then field 49; else echo 0; fi ;;
    *)
        echo "please input the correct argument" ;;
esac
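The per-metric branching can also be expressed as a small lookup table: for each metric, the column to read, the kinds of row that report it, and the value to return otherwise. The sketch below illustrates that idea for a handful of metrics only; RULES, row_kind and lookup are hypothetical names, and the column numbers are the 0-based field numbers from the field list in section 1.

```python
def row_kind(svname):
    """Classify a row the same way the shell script does: by its svname."""
    return svname if svname in ("FRONTEND", "BACKEND") else "SERVER"


ALL_KINDS = {"FRONTEND", "BACKEND", "SERVER"}

# metric -> (0-based CSV column, row kinds that report it, fallback value)
RULES = {
    "qcur": (2, {"BACKEND", "SERVER"}, "0"),
    "scur": (4, ALL_KINDS, "0"),
    "ereq": (12, {"FRONTEND"}, "0"),
    "chkfail": (21, {"SERVER"}, "0"),
    "check_status": (36, {"SERVER"}, "NULL"),
    "req_rate": (46, {"FRONTEND"}, "0"),
}


def lookup(stat_csv, pool, server, metric):
    """Return one metric for pool:server from raw `show stat` CSV text."""
    column, kinds, fallback = RULES[metric]
    if row_kind(server) not in kinds:
        return fallback
    for line in stat_csv.splitlines():
        fields = line.split(",")
        if len(fields) > column and fields[0] == pool and fields[1] == server:
            return fields[column]
    return None  # no matching row in the CSV
```

For example, lookup(text, "login_pool", "web1_80", "check_status") reads column 36 of that server's row, while the same metric requested for a FRONTEND row returns the fallback "NULL" without touching the CSV.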
3. Zabbix agent configuration changes
Add haproxy_status.conf under /data/zabbix/etc/zabbix_agentd.conf.d/:
### Option: UserParameter
#       User-defined parameter to monitor. There can be several user-defined parameters.
#       Format: UserParameter=<key>,<shell command>
#       See 'zabbix_agentd' directory for examples.
#
# Mandatory: no
# Default:
# UserParameter=
UserParameter=haproxy.info[*],/usr/local/zabbix/bin/haproxy_info.sh $1
UserParameter=haproxy.discovery,/usr/bin/python /usr/local/zabbix/bin/haproxy_pool_discovery.py
UserParameter=haproxy.stat[*],/usr/local/zabbix/bin/haproxy_stat.sh $1 $2
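4. Adding the Zabbix template
See the attachment for the full template.
http://john88wang.blog.51cto.com/2165294/1568541
Monitoring DRBD status with Zabbix
http://john88wang.blog.51cto.com/2165294/1584572
Our production MySQL high-availability setup is built on DRBD + Heartbeat + MySQL, so monitoring DRBD is just as important.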
I. How the monitoring works
1. Using drbd-overview
$ drbd-overview
0:??not-found?? Connected Primary/Secondary UpToDate/UpToDate C r----- /database ext4 50G 3.7G 44G 8%
Without root privileges the resource name is not shown.
$ sudo drbd-overview
0:r0 Connected Primary/Secondary UpToDate/UpToDate C r----- /database ext4 50G 3.7G 44G 8%
2. Reading /proc/drbd
$ cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2013-09-27 16:00:43
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1200291012 nr:7644 dw:1200298728 dr:1575405 al:19036 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
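The C in this position is the replication protocol. Here it is protocol C; it can also be B or A.
The I/O state flags follow: six flags that describe the state of I/O operations on this resource.
r-----
1. I/O suspension. Either r for running I/O or s for suspended I/O.
2. Serial resynchronization. Normally -.
3. Peer-initiated sync suspension. Normally -.
4. Locally initiated sync suspension. Normally -.
5. Locally blocked I/O. Normally -.
6. Activity Log update suspension. Normally -.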
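cs (connection state): the connection state of the resource.
Possible connection states include:
StandAlone
Disconnecting
Unconnected
BrokenPipe
NetworkFailure
Connected (the normal state)
and others.
ds (disk states): the state of the disks.
The local disk state is shown first, then the disk state of the remote host. Each of them can be one of:
Diskless
Attaching
Failed
Negotiating
Inconsistent
Outdated
DUnknown
Consistent
UpToDate (the data is consistent and in sync; this is the normal state)
ro (roles): the role of the resource.
Primary: readable and writable
Secondary: neither readable nor writable
Unknown: this state only ever appears for the remote host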
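ns (network send): volume of data sent to the peer, in KBytes.
nr (network receive): volume of data received from the peer, in KBytes.
dw (disk write): volume of data written to the local disk, in KBytes.
dr (disk read): volume of data read from the local disk, in KBytes.
al (activity log): number of updates of the activity log area of the DRBD metadata.
bm (bitmap): number of updates of the bitmap area of the DRBD metadata.
lo (local count): number of open requests to the local I/O subsystem issued by DRBD.
pe (pending): number of requests sent to the peer that have not yet been answered.
ua (unacknowledged): number of requests received by the peer over the network that have not yet been answered.
ap (application pending): number of block I/O requests forwarded to DRBD, but not yet answered by DRBD.
ep (epochs): number of epoch objects. Usually 1. Might increase under I/O load when using either the barrier or the none write ordering method.
wo (write order): currently used write ordering method: b (barrier), f (flush), d (drain) or n (none).
oos (out of sync): amount of storage currently out of sync, in Kibibytes.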
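Since almost every item in /proc/drbd has the form key:value, the whole status can be collected in one pass. The following is a minimal sketch under the same single-resource assumption this article makes; parse_proc_drbd and SAMPLE_PROC_DRBD are hypothetical names, and the sample is the /proc/drbd output shown above.

```python
import re


def parse_proc_drbd(text):
    """Collect every key:value token (cs, ro, ds, ns, ...) from /proc/drbd text."""
    status = {}
    for line in text.splitlines():
        stripped = line.strip()
        # The version and GIT-hash lines carry no per-resource counters.
        if stripped.startswith(("version:", "GIT-hash:")):
            continue
        for key, value in re.findall(r"\b([a-z]+):(\S+)", stripped):
            status[key] = value
    return status


SAMPLE_PROC_DRBD = """version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2013-09-27 16:00:43
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1200291012 nr:7644 dw:1200298728 dr:1575405 al:19036 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
"""

if __name__ == "__main__":
    status = parse_proc_drbd(SAMPLE_PROC_DRBD)
    print(status["cs"], status["ro"], status["ds"], status["oos"])
```

With several resources the keys would overwrite each other, so this sketch would need one dictionary per resource block in that case.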
3. Using service drbd status
$ sudo service drbd status
drbd driver loaded OK; device status:
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2013-09-27 16:00:43
m:res  cs         ro                 ds                 p  mounted    fstype
0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /database  ext4
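II. Writing the monitoring script
A production server usually defines only one resource, which keeps maintenance simple. This section therefore covers monitoring a single DRBD resource; with several resources you can use Zabbix low-level discovery instead.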
drbd_status.sh
#!/bin/bash
# Gather DRBD status via /proc/drbd
# $ cat /proc/drbd
# version: 8.3.16 (api:88/proto:86-97)
# GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2013-09-27 16:00:43
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
#     ns:1202588332 nr:7644 dw:1202596048 dr:1575405 al:19216 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
# Assumes only one resource is defined; the default is r0
status_file=/proc/drbd
metric=$1

# Print the value of a "key:value" token such as cs:Connected or ns:1202588332
token() {
    tr -s ' ' '\n' < "$status_file" | awk -F":" -v k="$1" '$1==k {print $2; exit}'
}

case $metric in
    version)
        awk '/^version:/ {print $2}' "$status_file" ;;
    name)       # minor number of the resource line, e.g. 0
        awk '/cs:/ {print $1}' "$status_file" | tr -d ":" ;;
    protocol)   # replication protocol: fifth column of the resource line
        awk '/cs:/ {print $5}' "$status_file" ;;
    cs|ro|ds|ns|nr|dw|dr|al|bm|lo|pe|ua|ap|ep|wo|oos)
        token "$metric" ;;
    *)
        echo "unknown parameters" ;;
esac
Add the Zabbix sub-configuration file drbd_status.conf:
UserParameter=drbd.status[*],/usr/local/zabbix/bin/drbd_status.sh $1
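III. Adding the monitoring template
Pay attention to the trigger expression. In this Zabbix syntax # means "not equal" and & means logical AND, so the trigger fires when the ro value contains neither Secondary/Primary nor Primary/Secondary: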
{Template DRBD:drbd.status[ro].str(Secondary/Primary)}#1 & {Template DRBD:drbd.status[ro].str(Primary/Secondary)}#1
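To make the logic of this expression concrete, here is a small sketch that mirrors it in Python. It assumes the Zabbix 2.0 trigger semantics, where str(X) evaluates to 1 when the item value contains X, # is "not equal" and & is logical AND. The function name role_trigger_fires is hypothetical.

```python
def role_trigger_fires(ro_value):
    """Mirror the trigger: str(Secondary/Primary)#1 & str(Primary/Secondary)#1."""
    has_secondary_primary = "Secondary/Primary" in ro_value
    has_primary_secondary = "Primary/Secondary" in ro_value
    # Both substring checks must fail for the trigger to fire.
    return (not has_secondary_primary) and (not has_primary_secondary)


if __name__ == "__main__":
    for value in ("Primary/Secondary", "Secondary/Primary",
                  "Secondary/Secondary", "Primary/Unknown"):
        print(value, role_trigger_fires(value))
```

In other words, the two healthy role pairs keep the trigger quiet, and any other combination, such as both nodes being Secondary or the peer being Unknown, raises the alert.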
References:
http://blog.pandorafms.org/?p=1944
http://drbd.linbit.com/docs/about/
Monitoring LVS connections with Zabbix
I. Environment
zabbix:2.0.6
ipvsadm:1.24
OS:CentOS 6.4 x86
dip:192.168.100.14
rip:192.168.100.22
rip:192.168.100.24
rip:192.168.100.76
rip:192.168.100.101
II. Create the script
[root@lvs-master zabbix]# pwd
/data/zabbix/sbin
[root@lvs-master zabbix]# cat lvs-status.sh
#!/bin/bash
# get lvs connection
# In `ipvsadm -L -n` output, column 5 is ActiveConn and column 6 is InActConn.
# The totals only sum real-server lines (those starting with "->"), so header lines are not counted.
function AllConn {
    sudo /sbin/ipvsadm -L -n | awk '$1=="->" {sum+=$5} END{print sum+0}'
}
function 101Conn {
    sudo /sbin/ipvsadm -L -n | grep '100\.101:' | awk '{print $5}'
}
function 22Conn {
    sudo /sbin/ipvsadm -L -n | grep '100\.22:' | awk '{print $5}'
}
function 24Conn {
    sudo /sbin/ipvsadm -L -n | grep '100\.24:' | awk '{print $5}'
}
function 76Conn {
    sudo /sbin/ipvsadm -L -n | grep '100\.76:' | awk '{print $5}'
}
function AllInConn {
    sudo /sbin/ipvsadm -L -n | awk '$1=="->" {sum+=$6} END{print sum+0}'
}
function 101InConn {
    sudo /sbin/ipvsadm -L -n | grep '100\.101:' | awk '{print $6}'
}
function 22InConn {
    sudo /sbin/ipvsadm -L -n | grep '100\.22:' | awk '{print $6}'
}
function 24InConn {
    sudo /sbin/ipvsadm -L -n | grep '100\.24:' | awk '{print $6}'
}
function 76InConn {
    sudo /sbin/ipvsadm -L -n | grep '100\.76:' | awk '{print $6}'
}
# Run the requested function
$1
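The same counters can be extracted for all real servers in one pass, which avoids one function per IP address. This is only a sketch: parse_ipvsadm and totals are hypothetical names, and SAMPLE_IPVSADM is made-up illustrative output (the port, scheduler, forwarding method and connection counts are assumptions, not values from this environment).

```python
def parse_ipvsadm(text):
    """Return {real_server: (active, inactive)} from `ipvsadm -L -n` output."""
    conns = {}
    for line in text.splitlines():
        parts = line.split()
        # Real-server lines look like: -> 192.168.100.22:80 Route 1 10 20
        # The column header also starts with "->" but has no numeric counters.
        if (len(parts) >= 6 and parts[0] == "->"
                and parts[4].isdigit() and parts[5].isdigit()):
            conns[parts[1]] = (int(parts[4]), int(parts[5]))
    return conns


def totals(conns):
    """Sum ActiveConn and InActConn over all real servers."""
    active = sum(a for a, _ in conns.values())
    inactive = sum(i for _, i in conns.values())
    return active, inactive


# Illustrative output only; the numbers are invented for this example.
SAMPLE_IPVSADM = """IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.100.14:80 rr
  -> 192.168.100.22:80            Route   1      10         20
  -> 192.168.100.24:80            Route   1      30         40
"""

if __name__ == "__main__":
    conns = parse_ipvsadm(SAMPLE_IPVSADM)
    print(conns)
    print(totals(conns))
```

Keying the result by real-server address also makes it a natural input for Zabbix low-level discovery, so new real servers would not require new functions or UserParameter lines.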
III. Edit the configuration file
[root@lvs-master zabbix]# vim zabbix_agentd.conf
### ipvsadm Active #####
UserParameter=lvs.AllConn[*],/etc/zabbix/lvs-status.sh AllConn
UserParameter=lvs.101Conn[*],/etc/zabbix/lvs-status.sh 101Conn
UserParameter=lvs.22Conn,/etc/zabbix/lvs-status.sh 22Conn
UserParameter=lvs.24Conn,/etc/zabbix/lvs-status.sh 24Conn
UserParameter=lvs.76Conn,/etc/zabbix/lvs-status.sh 76Conn
### ipvsadm InActive #####
UserParameter=lvs.AllInConn,/etc/zabbix/lvs-status.sh AllInConn
UserParameter=lvs.101InConn,/etc/zabbix/lvs-status.sh 101InConn
UserParameter=lvs.22InConn,/etc/zabbix/lvs-status.sh 22InConn
UserParameter=lvs.24InConn,/etc/zabbix/lvs-status.sh 24InConn
UserParameter=lvs.76InConn,/etc/zabbix/lvs-status.sh 76InConn
[root@lvs-master zabbix]# chmod +x lvs-status.sh
IV. Troubleshooting
The lvs-status.sh script originally did not call ipvsadm through sudo, so the agent log showed the following:
[root@lvs-master zabbix]# tail -f /tmp/zabbix_agentd.log
Can't initialize ipvs: Permission denied (you must be root)
Are you sure that IP Virtual Server is built in the kernel or as module?
The fix is to edit sudoers with visudo. Comment out requiretty:
[root@lvs-master ~]# visudo
#Defaults requiretty
and add:
zabbix ALL=(ALL) NOPASSWD:/sbin/ipvsadm
Then restart the zabbix_agentd service:
service zabbix_agentd restart
V. Test from the Zabbix server
[root@jumper ~]# zabbix_get -s 192.168.100.14 -p 10050 -k "lvs.AllConn"
2326
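VI. Create the LVS template
Add two applications.
Next, create the items for the monitored keys.
Add the graphs.
Finally, define the trigger thresholds.
The template is in the attachment.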
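VII. Terminology
Understanding ActiveConn and InActConn in ipvsadm
The ActiveConn figure in LVS is easy to misread. It is often very large while the number of active connections on the real servers is not high at all, which looks odd.
ActiveConn is the number of active connections, that is, TCP connections in the ESTABLISHED state.
InActConn is the number of TCP connections in every state other than ESTABLISHED.
Why is the ActiveConn shown by LVS so much higher than the ESTABLISHED count that netstat shows on the real servers?
LVS has its own default timeouts, which you can view with ipvsadm -L --timeout. The defaults are 900 120 300, the timeouts in seconds for TCP, TCPFIN and UDP respectively. In other words, once a TCP connection has passed through LVS, LVS keeps its record for 15 minutes, whether or not the connection is still alive. So if your servers receive a large number of concurrent requests within 15 minutes, you will see this number climb steeply.
Most of the time we look at the LVS connection count because we want to know the real number of connections on each machine. How can we get that?
Once you know how ActiveConn is produced, it is simple. For example, suppose LVS load-balances a website in DR mode and the web servers behind it run nginx. When a request comes in and the application is healthy, a connection is closed within about five seconds at most. In that case you can set ipvsadm --set 5 10 300, which keeps TCP connection records for only 5 seconds. If ActiveConn is currently high, you will see it drop quickly until it is close to the current connection count reported by the nginx status page. You can keep raising or lowering the value 5 until the status connection count on the real servers matches ActiveConn in LVS.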
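To track the timeout values alongside the connection counts, the three numbers can be read from the ipvsadm -L --timeout output. The sketch below assumes the command prints a single line of the form "Timeout (tcp tcpfin udp): 900 120 300"; check that against the output of your ipvsadm version before relying on it. The function name parse_timeouts is hypothetical.

```python
import re


def parse_timeouts(text):
    """Extract the tcp, tcpfin and udp timeouts (in seconds) as a dict."""
    # Assumed line format: "Timeout (tcp tcpfin udp): 900 120 300"
    match = re.search(r"Timeout \(tcp tcpfin udp\):\s*(\d+)\s+(\d+)\s+(\d+)", text)
    if match is None:
        raise ValueError("no timeout line found")
    tcp, tcpfin, udp = (int(v) for v in match.groups())
    return {"tcp": tcp, "tcpfin": tcpfin, "udp": udp}


if __name__ == "__main__":
    timeouts = parse_timeouts("Timeout (tcp tcpfin udp): 900 120 300\n")
    # 900 seconds is the 15 minutes during which LVS keeps a TCP record.
    print(timeouts["tcp"] // 60, "minutes")
```

Exposing the tcp value as a Zabbix item makes it easier to interpret ActiveConn graphs, because a change made with ipvsadm --set shows up next to the resulting drop or rise in the counter.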