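I. Use case
We use servers in several IDC data centers as outer-layer proxies, but users frequently report that the site is slow. By the time we contact them, the slowness is gone. We therefore need to monitor the network quality from each public-facing server to each region, to rule out the server's own network as the cause. Smokeping can do this monitoring, or you can build your own tool based on the principle behind Smokeping.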
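The principle is simple: send a fixed number of pings per round, then record the packet loss and the median round-trip time. Below is a minimal sketch of that summary step. It assumes the per-target output format of fping -C (host : rtt rtt - rtt, with - marking a lost packet); the function name summarize_fping is hypothetical, not part of Smokeping.

```shell
#!/bin/sh
# summarize_fping: read lines of `fping -C N -q HOST` output on stdin and
# print "HOST loss=<percent>% median=<ms>" for each line.
summarize_fping() {
    awk '{
        host = $1; n = 0; sent = 0
        # Fields after "host :" are per-packet RTTs; "-" marks a lost packet.
        for (i = 3; i <= NF; i++) {
            sent++
            if ($i != "-") rtt[++n] = $i + 0
        }
        # Insertion sort of the received RTTs (portable across awk variants).
        for (i = 2; i <= n; i++) {
            v = rtt[i]
            for (j = i - 1; j >= 1 && rtt[j] > v; j--) rtt[j + 1] = rtt[j]
            rtt[j + 1] = v
        }
        loss = (sent > 0) ? (sent - n) * 100 / sent : 100
        if (n == 0)          median = "NaN"
        else if (n % 2 == 1) median = rtt[(n + 1) / 2]
        else                 median = (rtt[n / 2] + rtt[n / 2 + 1]) / 2
        printf "%s loss=%d%% median=%s\n", host, loss, median
    }'
}

# Example (fping writes the -C summary to stderr, hence 2>&1):
#   fping -C 5 -q 219.150.32.132 2>&1 | summarize_fping
```

Running this from cron and appending the result to a log gives a rough picture of loss and latency over time, without the graphs Smokeping provides.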
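II. Installing and using Smokeping
1. Install dependency packages
Smokeping draws its graphs with RRDTool and probes with tools such as fping, curl, and dig. When deployed as standalone instances, each instance needs its own web server; in Master/Slave mode, only the Master needs one.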
yum -y install perl perl-Net-Telnet perl-Net-DNS perl-LDAP perl-libwww-perl \
    perl-RadiusPerl perl-IO-Socket-SSL perl-Socket6 perl-CGI-SpeedyCGI perl-FCGI \
    perl-Time-HiRes perl-ExtUtils-MakeMaker perl-RRD-Simple rrdtool rrdtool-perl \
    curl fping echoping gcc make wget libxml2-devel libpng-devel glib pango \
    pango-devel freetype freetype-devel fontconfig cairo cairo-devel libart_lgpl \
    libart_lgpl-devel mod_fastcgi tcping traceroute
yum -y install httpd httpd-devel
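After the packages are installed, it is worth confirming that the probe tools are actually on the PATH before moving on. A small sketch; the helper name missing_tools is hypothetical:

```shell
#!/bin/sh
# missing_tools: print the name of every given command that is not in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: list whichever of the probe tools are still missing.
#   missing_tools fping curl dig rrdtool
```

Empty output means every listed tool was found.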
2. Install and configure Smokeping
wget http://oss.oetiker.ch/smokeping/pub/smokeping-2.6.11.tar.gz
tar zxvf smokeping-2.6.11.tar.gz
cd smokeping-2.6.11
./setup/build-perl-modules.sh /opt/app/smokeping/thirdparty
./configure --prefix=/opt/app/smokeping
/usr/bin/gmake install
useradd smokeping -s /sbin/nologin
mkdir -p /opt/data/smokeping/{data,cache} /opt/logs/smokeping /opt/run/smokeping
chown -R smokeping:smokeping /opt/app/smokeping/ /opt/data/smokeping/ /opt/run/smokeping/ /opt/logs/smokeping/
Configure httpd. Only a standalone Smokeping instance or the master needs httpd.
/etc/httpd/conf.d/smokeping.conf
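NameVirtualHost *:80

# Default virtual host: reject requests made directly to the IP address.
# Assumption: the container tags (VirtualHost, Location, Directory) were lost
# from the original listing and are reconstructed here.
<VirtualHost *:80>
    <Location />
        Order Allow,Deny
        Deny from all
    </Location>
</VirtualHost>

<VirtualHost *:80>
    ServerName networklatency.xxxxx.com
    ErrorLog /var/log/httpd/smokeping_error.log
    CustomLog /var/log/httpd/smokeping_access.log common
    Alias /cache "/opt/data/smokeping/cache/"
    Alias /cropper "/opt/app/smokeping/htdocs/cropper/"
    Alias /smokeping "/opt/app/smokeping/htdocs/smokeping.fcgi"
    # Assumption: the Directory path is inferred from the Alias targets above.
    <Directory "/opt/app/smokeping/htdocs">
        AllowOverride None
        Options All
        AddHandler cgi-script .fcgi .cgi
        AllowOverride AuthConfig
        Order allow,deny
        Allow from all
        #AuthName "smokeping"
        #AuthType Basic
        #AuthUserFile "/opt/app/smokeping/htdocs/htpasswd"
        #Require valid-user
        DirectoryIndex smokeping.fcgi
    </Directory>
</VirtualHost>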
Edit /etc/httpd/conf/httpd.conf
User smokeping
Group smokeping
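Because Smokeping may be exposed to the public Internet, security needs some attention: smokeping.conf is set up to reject requests made directly to the IP address, so the site can only be reached through its domain name.
Next, configure Smokeping. Only a standalone instance or the master needs a configuration file. A slave does not, because it periodically fetches its configuration from the master.
The etc/examples directory contains several sample configurations that you can adapt as needed.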
*** General ***

owner    = admin
contact  = admin@gmail.com
mailhost = localhost
sendmail = /usr/sbin/sendmail
# NOTE: do not put the Image Cache below cgi-bin
# since all files under cgi-bin will be executed ... this is not
# good for images.
imgcache = /opt/data/smokeping/cache
imgurl   = cache
datadir  = /opt/data/smokeping/data
piddir   = /opt/run/smokeping
cgiurl   = http://xxxx.com/smokeping
smokemail = /opt/app/smokeping/etc/smokemail.dist
tmail     = /opt/app/smokeping/etc/tmail.dist
# specify this to get syslog logging
syslogfacility = local0
# each probe is now run in its own process
# disable this to revert to the old behaviour
concurrentprobes = yes
Adjust imgcache, datadir, piddir, and cgiurl to your environment. The directories must be owned by the user that Smokeping and httpd run as, for example the smokeping user.
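Wrong ownership on these directories is an easy mistake to make, so a quick check before starting the daemon can save some debugging. A sketch, assuming GNU stat (as shipped on CentOS); the helper name not_owned_by is hypothetical:

```shell
#!/bin/sh
# not_owned_by USER DIR...: print every DIR whose owner is not USER.
# Relies on GNU stat (-c %U prints the owner name).
not_owned_by() {
    expected=$1
    shift
    for dir in "$@"; do
        owner=$(stat -c %U "$dir" 2>/dev/null)
        [ "$owner" = "$expected" ] || echo "$dir"
    done
}

# Example, using the directories from the configuration above:
#   not_owned_by smokeping /opt/data/smokeping/cache \
#       /opt/data/smokeping/data /opt/run/smokeping
```

Any path it prints needs a chown (or does not exist yet).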
+ detail
width = 600
height = 200
unison_tolerance = 2

"Last 1 Hour"    1h
"Last 2 Hour"    2h
"Last 3 Hour"    3h
"Last 6 Hours"   6h
"Last 12 Hours"  12h
"Last 1 Day"     1d
"Last 7 Days"    7d
"Last 15 Days"   15d
"Last 30 Days"   30d
Here you can define the time ranges offered for display, such as 1h, 3h, or 1d.
*** Slaves ***
secrets=/opt/app/smokeping/etc/smokeping_secrets.dist
+slave1
display_name=slave1
location=HK
color=382a34

*** Targets ***
slaves = slave1
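The Slaves section defines the available slaves. + adds a slave, display_name is the name shown, location is where the slave sits (for example Hong Kong or Guangdong), and color is the line color used when several slaves are plotted on one graph. The color must be a hex code written with lowercase letters and digits; you can pick one at http://www.colorpicker.com/
The Targets section defines the actual points to probe. Different probes can be assigned to different targets.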
+ Clients
menu = Network monitoring to customer network regions
#host = /Yunying/plat/plat223.255.151.87 /Yunying/plat/plat223.255.151.86 /Yunying/plat/plat103.250.15.6
title = Network monitoring to customer network regions (list)

++ dianxin
menu = China Telecom network monitoring
title = China Telecom network monitoring list

+++ dianxin-hlj
menu = Heilongjiang Telecom
title = Heilongjiang Telecom
host = 219.150.32.132

+++ dianxin-gd
menu = Guangdong Telecom
title = Guangdong Telecom
host = 113.111.211.22

+++ dianxin-gs
menu = Gansu Telecom
title = Gansu Telecom
alerts = someloss
#slaves = boomer slave2
host = 202.100.64.68

+++ dianxin-sh
menu = Shanghai Telecom
title = Shanghai Telecom
alerts = someloss
#slaves = boomer slave2
host = 202.96.209.5

#+++ dianxin-multi
#menu = Combined China Telecom monitoring list
#title = Combined China Telecom monitoring list
#alerts = someloss
#slaves = boomer slave2
#

++ liantong
menu = China Unicom network monitoring
title = China Unicom network monitoring list

+++ liantong-hlj
menu = Heilongjiang Unicom
title = Heilongjiang Unicom
host = 202.97.224.68

+++ liantong-gd
menu = Guangdong Unicom
#slaves = boomer slave2
host = 221.4.66.66

+++ liantong-gs
menu = Gansu Unicom
title = Gansu Unicom
alerts = someloss
#slaves = boomer slave2
host = 221.7.34.10

+++ liantong-sh
menu = Shanghai Unicom
title = Shanghai Unicom
alerts = someloss
#slaves = boomer slave2
host = 210.22.70.3

#+++ liantong-multi
#menu = Combined China Unicom monitoring list
#title = Combined China Unicom monitoring list
#alerts = someloss
#slaves = boomer slave2

++ yidong
menu = China Mobile network monitoring
title = China Mobile network monitoring list

+++ yidong-hlj
menu = Heilongjiang Mobile
title = Heilongjiang Mobile
host = 211.137.241.34

+++ yidong-gd
menu = Guangdong Mobile
#slaves = boomer slave2
host = 211.137.241.34

+++ yidong-gs
menu = Gansu Mobile
title = Gansu Mobile
alerts = someloss
#slaves = boomer slave2
host = 218.203.160.194

+++ yidong-sh
menu = Shanghai Mobile
title = Shanghai Mobile
alerts = someloss
#slaves = boomer slave2
host = 117.131.0.22

#+++ yidong-multi
#menu = Combined China Mobile monitoring list
#title = Combined China Mobile monitoring list
#alerts = someloss
#slaves = boomer slave2

++ jiaoyu
menu = Education network monitoring
title = Education network monitoring list

+++ jiaoyu-qh
menu = Tsinghua University
title = Tsinghua University
host = 166.111.8.28

+++ jiaoyu-sh
menu = Shanghai Jiao Tong University
title = Shanghai Jiao Tong University
alerts = someloss
#slaves = boomer slave2
host = 202.112.26.34

+++ jiaoyu-wh
menu = Wuhan University of Science and Technology
title = Wuhan University of Science and Technology
alerts = someloss
#slaves = boomer slave2
host = 202.114.240.6

+++ jiaoyu-hn
menu = South China Agricultural University
title = South China Agricultural University
alerts = someloss
#slaves = boomer slave2
host = 202.116.160.33

#+++ jiaoyu-multi
#menu = Combined education network monitoring list
#title = Combined education network monitoring list
#alerts = someloss
#slaves = boomer slave2
#host = /Clients/jiaoyu/jiaoyu-qh /Clients/jiaoyu/jiaoyu-sh /Clients/jiaoyu/jiaoyu-wh /Clients/jiaoyu/jiaoyu-hn
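Since every third-level entry in the Targets section has the same shape, a long list can be generated from a flat file instead of being typed by hand. A sketch of one way to do that; the name|menu|host input format and the function names emit_target and emit_targets are assumptions made for this example, not anything Smokeping defines:

```shell
#!/bin/sh
# emit_target NAME MENU HOST: print one third-level Targets entry,
# using MENU for both the menu and the title.
emit_target() {
    printf '+++ %s\nmenu = %s\ntitle = %s\nhost = %s\n\n' "$1" "$2" "$2" "$3"
}

# emit_targets: read "name|menu|host" lines on stdin, one target per line.
emit_targets() {
    while IFS='|' read -r name menu host; do
        # Skip blank lines.
        [ -n "$name" ] || continue
        emit_target "$name" "$menu" "$host"
    done
}

# Example:
#   printf 'dianxin-hlj|Heilongjiang Telecom|219.150.32.132\n' | emit_targets
```

The output can be pasted under the matching ++ group heading in the Targets section.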
smokeping_secrets.dist is the secrets file the master uses when talking to its slaves. Its format is:
slave1:xxxxxx
The slave name in this file must match the name the slave starts with, which defaults to its hostname.
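A mismatch between the slave's name and the entry in the secrets file is easy to overlook. The following sketch checks for an entry on the master; the function name slave_in_secrets is hypothetical, and the file is assumed to use the name:secret format shown above:

```shell
#!/bin/sh
# slave_in_secrets FILE [NAME]: succeed if FILE contains a "NAME:secret" entry.
# NAME defaults to the hostname of the machine running the check.
slave_in_secrets() {
    file=$1
    name=${2:-$(hostname)}
    # Exact match on the first colon-separated field.
    awk -F: -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$file"
}

# Example:
#   slave_in_secrets /opt/app/smokeping/etc/smokeping_secrets.dist slave1 \
#       || echo "slave1 has no entry in the secrets file"
```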
On the slave, a secret.txt file holds the secret used to talk to the master. Its format is:
xxxxxx
It contains only the secret.
Both files must have mode 600 and be owned by the user that runs Smokeping.
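One way to create the slave-side file with the right mode from the start, as a sketch (the function name write_secret is hypothetical):

```shell
#!/bin/sh
# write_secret FILE SECRET: create FILE containing only SECRET, with mode 600.
write_secret() {
    # The restrictive umask keeps the file private even before chmod runs.
    ( umask 077 && printf '%s\n' "$2" > "$1" ) && chmod 600 "$1"
}

# Example:
#   write_secret /opt/app/smokeping/etc/secret.txt xxxxxx
#   chown smokeping /opt/app/smokeping/etc/secret.txt   # needs root
```

The chown step is left outside the function because it requires root, while the rest does not.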
Add the master init script
#! /bin/sh
#
# smokeping-master    Start/Stop smokeping-master
#
# chkconfig: 345 99 99
# description: smokeping master
# processname: smokeping

if [ -f /etc/rc.d/init.d/functions ]; then
    . /etc/rc.d/init.d/functions
fi

name="smokeping-master"
smokeping_bin="/opt/app/smokeping/bin/smokeping"
cache_dir="/opt/data/smokeping/cache"
data_dir="/opt/data/smokeping/data"
smokeping_log="/opt/logs/smokeping/smokeping_master.log"
smokeping_secrets="/opt/app/smokeping/etc/smokeping_secrets.dist"
pid_dir="/opt/run/smokeping"
user="smokeping"

# Set PID to the pid of the running daemon, or leave it empty.
find_smokeping_process () {
    PID=`ps -ef | grep -v grep | grep $smokeping_bin | grep $smokeping_log | awk '{ print $2 }'`
}

start () {
    # Create any missing working directories before starting.
    log_dir=`dirname ${smokeping_log}`
    if [ ! -d $log_dir ]; then
        echo -e "\e[35mLog dir ${log_dir} doesn't exist. Creating\e[0m"
        mkdir -p $log_dir
    fi
    if [ ! -d $cache_dir ]; then
        echo -e "\e[35mCache dir ${cache_dir} doesn't exist.Creating\e[0m"
        mkdir -p $cache_dir
    fi
    if [ ! -d $pid_dir ]; then
        echo -e "\e[35mPid dir ${pid_dir} doesn't exist.Creating\e[0m"
        mkdir -p $pid_dir
    fi
    if [ ! -d $data_dir ]; then
        echo -e "\e[35mData dir ${data_dir} doesn't exist.Creating\e[0m"
        mkdir -p $data_dir
    fi
    chown -R $user $log_dir $cache_dir $pid_dir $smokeping_secrets
    chmod 600 $smokeping_secrets

    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35m$name is already running!\e[0m"
    else
        daemon --user $user ${smokeping_bin} --logfile=${smokeping_log} > /dev/null 2>&1
        find_smokeping_process
        if [ "$PID" != "" ]; then
            echo -e "\e[35mStarting $name SUCCESS\e[0m"
        else
            echo -e "\e[35mStarting $name Failed!!!\e[0m"
        fi
    fi
}

stop () {
    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35mStopping $name\e[0m"
        kill $PID
    else
        echo -e "\e[35m$name is not running yet\e[0m"
    fi
}

case $1 in
start)
    start
    ;;
stop)
    stop
    exit 0
    ;;
reload)
    stop
    sleep 2
    start
    ;;
restart)
    stop
    sleep 2
    start
    ;;
status)
    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35m$name is running: $PID\e[0m"
        exit 0
    else
        echo -e "\e[35m$name is not running\e[0m"
        exit 1
    fi
    ;;
*)
    echo -e "\e[35mUsage: $0 {start|stop|restart|reload|status}\e[0m"
    exit 1
esac
exit 0
Add the slave init script
#! /bin/sh
#
# smokeping-slave    Start/Stop smokeping-slave
#
# chkconfig: 345 99 99
# description: smokeping slave
# processname: smokeping

if [ -f /etc/rc.d/init.d/functions ]; then
    . /etc/rc.d/init.d/functions
fi

name="smokeping-slave"
smokeping_bin="/opt/app/smokeping/bin/smokeping"
cache_dir="/opt/data/smokeping/cache"
smokeping_log="/opt/logs/smokeping/smokeping_slave.log"
master_url="http://networklatency.caipiao88.com/smokeping"
shared_secret="/opt/app/smokeping/etc/secret.txt"
pid_dir="/opt/run/smokeping"
user="smokeping"

# Set PID to the pid of the running daemon, or leave it empty.
find_smokeping_process () {
    PID=`ps -ef | grep -v grep | grep $smokeping_bin | grep $smokeping_log | awk '{ print $2 }'`
}

start () {
    # Create any missing working directories before starting.
    log_dir=`dirname ${smokeping_log}`
    if [ ! -d $log_dir ]; then
        echo -e "\e[35mLog dir ${log_dir} doesn't exist. Creating\e[0m"
        mkdir -p $log_dir
    fi
    if [ ! -d $cache_dir ]; then
        echo -e "\e[35mCache dir ${cache_dir} doesn't exist.Creating\e[0m"
        mkdir -p $cache_dir
    fi
    if [ ! -d $pid_dir ]; then
        echo -e "\e[35mPid dir ${pid_dir} doesn't exist.Creating\e[0m"
        mkdir -p $pid_dir
    fi
    chown -R $user $log_dir $cache_dir $pid_dir $shared_secret

    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35m$name is already running!\e[0m"
    else
        # A slave takes everything it needs from the command line and the master.
        daemon --user $user ${smokeping_bin} --master-url=${master_url} --cache-dir=${cache_dir} --shared-secret=${shared_secret} --pid-dir=${pid_dir} --logfile=${smokeping_log} > /dev/null 2>&1
        find_smokeping_process
        if [ "$PID" != "" ]; then
            echo -e "\e[35mStarting $name SUCCESS\e[0m"
        else
            echo -e "\e[35mStarting $name Failed!!!\e[0m"
        fi
    fi
}

stop () {
    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35mStopping $name\e[0m"
        kill $PID
    else
        echo -e "\e[35m$name is not running yet\e[0m"
    fi
}

case $1 in
start)
    start
    ;;
stop)
    stop
    exit 0
    ;;
reload)
    stop
    sleep 2
    start
    ;;
restart)
    stop
    sleep 2
    start
    ;;
status)
    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35m$name is running: $PID\e[0m"
        exit 0
    else
        echo -e "\e[35m$name is not running\e[0m"
        exit 1
    fi
    ;;
*)
    echo -e "\e[35mUsage: $0 {start|stop|restart|reload|status}\e[0m"
    exit 1
esac
exit 0
Start the Master
service smokeping-master start
Start the Slave
service smokeping-slave start
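III. Deploying Smokeping in production
Production deployment can use the Master-Slave setup: the Master sits behind the firewall, and the Slaves are the servers that run the actual probes.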
[slave 1]     [slave 2]      [slave 3]
    |             |              |
    +-------+     |     +--------+
            |     |     |
            v     v     v
        +---------------+
        |    master     |
        +---------------+
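After a Slave collects its data, it uploads it through the Master's CGI interface, and the Master aggregates and displays it. The Master therefore only needs to be installed by following the steps above. Because there may be many Slaves, it is best to deploy them in bulk with Ansible or SaltStack.
You can build a custom rpm package to simplify deployment.
For how to build the rpm package, see http://john88wang.blog.51cto.com/2165294/1787783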
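For a handful of hosts, a plain ssh loop can stand in for Ansible or SaltStack. The sketch below only prints the commands it would run, so they can be reviewed first. The rpm file name, the root login, and the host list are all assumptions made for illustration; deploy_cmds is a hypothetical helper.

```shell
#!/bin/sh
# deploy_cmds RPM: read hostnames on stdin and print the commands that would
# copy RPM to each host and install it there. Nothing is executed.
deploy_cmds() {
    rpm=$1
    while read -r host; do
        # Skip blank lines and comment lines in the host list.
        case $host in ''|'#'*) continue ;; esac
        echo "scp $rpm root@$host:/tmp/"
        echo "ssh root@$host yum -y localinstall /tmp/$(basename "$rpm")"
    done
}

# Example: review the commands, then run them from a file.
#   deploy_cmds smokeping.rpm < hosts.txt > deploy.sh
#   sh deploy.sh
```

Writing the commands to a file, rather than piping them straight into sh, keeps ssh from consuming the rest of the command list on its standard input.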
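The Master-Slave setup has one drawback. With only three or four slaves it is not a big problem, but when one Master collects data from many slaves, the Smokeping pages become very slow to open. I want to monitor the network quality of every public-facing production server, and with a single Master and everything else as a slave, the Smokeping pages simply would not load. So I dropped the Master-Slave setup and deployed a standalone Smokeping instance on every public-facing server instead. In addition, the production outer-layer proxies all run nginx, so there is no need to deploy Apache just for Smokeping.
By default Nginx cannot run Perl CGI programs; it needs the help of spawn-fcgi.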
yum -y install spawn-fcgi
/etc/init.d/smokeping-fcgi
#!/bin/sh
#
# chkconfig: - 86 14
# description: smokeping-fcgi

exec=/usr/bin/spawn-fcgi
fcgi_port=9007
pid_file=/opt/run/smokeping/smokeping-fcgi.pid
fcgi_user=nobody
fcgi_app=/opt/app/smokeping/bin/smokeping_cgi

# Set pid to the pid of the running FastCGI process, or leave it empty.
find_smokepingfcgi_pid() {
    pid=$(ps -ef | grep smokeping_cgi | grep -v grep | grep $fcgi_user | awk '{print $2}')
}

start() {
    echo -e "\e[035mStarting smokeping-fcgi\e[0m"
    $exec -a 127.0.0.1 -p $fcgi_port -P $pid_file -u $fcgi_user -f $fcgi_app > /dev/null 2>&1
    find_smokepingfcgi_pid
    if [ "$pid" != "" ]; then
        echo -e "\e[035mStarting OK\e[0m"
    else
        echo -e "\e[035mStarting Failed\e[0m"
    fi
}

stop() {
    echo -e "\e[035mShutting down smokeping-fcgi\e[0m"
    find_smokepingfcgi_pid
    if [ "$pid" != "" ]; then
        kill -9 $pid
        rm $pid_file
    else
        echo -e "\e[035msmokeping-fcgi is not running yet\e[0m"
    fi
}

status() {
    find_smokepingfcgi_pid
    if [ "$pid" != "" ]; then
        echo -e "\e[035m smokeping-fcgi is running: pid:$pid\e[0m"
    else
        echo -e "\e[035m smokeping-fcgi is not running\e[0m"
    fi
}

restart() {
    stop
    start
}

case "$1" in
start|stop|restart|status)
    $1
    ;;
*)
    echo "Usage: $0 {start|stop|status|restart}"
    exit 2
    ;;
esac
/etc/init.d/smokeping-master
#! /bin/sh
#
# smokeping-master    Start/Stop smokeping-master
#
# chkconfig: 345 99 99
# description: smokeping master
# processname: smokeping

if [ -f /etc/rc.d/init.d/functions ]; then
    . /etc/rc.d/init.d/functions
fi

name="smokeping-master"
smokeping_app="/opt/app/smokeping"
smokeping_bin="/opt/app/smokeping/bin/smokeping"
cache_dir="/opt/data/smokeping/cache"
data_dir="/opt/data/smokeping/data"
smokeping_log="/opt/logs/smokeping/smokeping_master.log"
smokeping_secrets="/opt/app/smokeping/etc/smokeping_secrets.dist"
pid_dir="/opt/run/smokeping"
user="nobody"

# Set PID to the pid of the running daemon, or leave it empty.
find_smokeping_process () {
    PID=`ps -ef | grep -v grep | grep $smokeping_bin | grep $smokeping_log | awk '{ print $2 }'`
}

start () {
    # Create any missing working directories before starting.
    log_dir=`dirname ${smokeping_log}`
    if [ ! -d $log_dir ]; then
        echo -e "\e[35mLog dir ${log_dir} doesn't exist. Creating\e[0m"
        mkdir -p $log_dir
    fi
    if [ ! -d $cache_dir ]; then
        echo -e "\e[35mCache dir ${cache_dir} doesn't exist.Creating\e[0m"
        mkdir -p $cache_dir
    fi
    if [ ! -d $pid_dir ]; then
        echo -e "\e[35mPid dir ${pid_dir} doesn't exist.Creating\e[0m"
        mkdir -p $pid_dir
    fi
    if [ ! -d $data_dir ]; then
        echo -e "\e[35mData dir ${data_dir} doesn't exist.Creating\e[0m"
        mkdir -p $data_dir
    fi
    # Expose the cache and data directories under the application directory,
    # which is where the nginx setup looks for them.
    ln -sf $cache_dir $smokeping_app
    ln -sf $data_dir $smokeping_app
    chown -R $user $log_dir $cache_dir $pid_dir $smokeping_secrets
    chmod 600 $smokeping_secrets

    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35m$name is already running!\e[0m"
    else
        daemon --user $user ${smokeping_bin} --logfile=${smokeping_log} > /dev/null
        find_smokeping_process
        if [ "$PID" != "" ]; then
            echo -e "\e[35mStarting $name SUCCESS\e[0m"
            service smokeping-fcgi start
        else
            echo -e "\e[35mStarting $name Failed!!!\e[0m"
        fi
    fi
}

stop () {
    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35mStopping $name\e[0m"
        kill $PID
    else
        echo -e "\e[35m$name is not running yet\e[0m"
    fi
}

case $1 in
start)
    start
    ;;
stop)
    stop
    exit 0
    ;;
reload)
    stop
    sleep 2
    start
    ;;
restart)
    stop
    sleep 2
    start
    ;;
status)
    find_smokeping_process
    if [ "$PID" != "" ]; then
        echo -e "\e[35m$name is running: $PID\e[0m"
        exit 0
    else
        echo -e "\e[35m$name is not running\e[0m"
        exit 1
    fi
    ;;
*)
    echo -e "\e[35mUsage: $0 {start|stop|restart|reload|status}\e[0m"
    exit 1
esac
exit 0
Changes to the Smokeping config file
imgcache = /opt/app/smokeping/cache
imgurl   = cache
datadir  = /opt/app/smokeping/data
piddir   = /opt/run/smokeping
cgiurl   = http://networklatency.xxxx.com/smokeping.fcgi
Nginx configuration, smokeping.conf
server {
    listen 80;
    server_name networklatency.xxxx.com;
    root /opt/app/smokeping/htdocs;

    location /cache {
        root /opt/app/smokeping;
    }

    location ~ .*\.fcgi$ {
        include fastcgi_params;
        fastcgi_pass 127.0.0.1:9007;
        fastcgi_index smokeping.fcgi;
        #fastcgi_param SCRIPT_FILENAME /opt/app/smokeping/htdocs/$fastcgi_script_name;
    }
}
Note that the user set in the smokeping-master and smokeping-fcgi scripts must be the same as the user nginx runs as.
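This can be checked mechanically by comparing the user directive in the nginx configuration with the user= or fcgi_user= line in each init script. A sketch; the helper names nginx_user and script_user are hypothetical, and the nginx.conf path in the example is an assumption:

```shell
#!/bin/sh
# nginx_user CONF: print the user named by the first "user" directive in CONF.
nginx_user() {
    awk '$1 == "user" { sub(/;$/, "", $2); print $2; exit }' "$1"
}

# script_user SCRIPT: print the value of the first user= or fcgi_user=
# assignment at the start of a line in SCRIPT, with quotes removed.
script_user() {
    sed -n -e 's/^user=//p' -e 's/^fcgi_user=//p' "$1" | tr -d '"' | head -n 1
}

# Example (the nginx.conf location is an assumption):
#   want=$(nginx_user /etc/nginx/nginx.conf)
#   for s in /etc/init.d/smokeping-master /etc/init.d/smokeping-fcgi; do
#       [ "$(script_user "$s")" = "$want" ] || echo "$s: user differs from nginx"
#   done
```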
What remains is to deploy the Smokeping instances in bulk.
IV. Alternatives to Smokeping
References:
http://blog.coocla.org/smokeping-slave.html
http://blog.coocla.org/smokeping-with-nginx.html
http://oss.oetiker.ch/smokeping/pub/smokeping-2.6.11.tar.gz