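There are three approaches in total.
A. Use check_mrtgtraf, the plugin that ships with Nagios, to monitor NIC traffic
This approach depends on MRTG data, and the Bytes/bits conversion is also somewhat awkward in practice, so I do not recommend it.
Only a brief introduction to check_mrtgtraf here: it periodically reads the MRTG log file to get the current traffic,
as in the example below. I find the plugin simple and limited, and I have stopped using it myself.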
/u/nagios/libexec/check_mrtgtraf -F /var/www/html/mrtg/192.168.0.21_2.log -a AVG -w 300000,300000 -c 400000,400000 -e 1
Traffic WARNING - Avg. In = 295.2 KB/s, Avg. Out = 58.7 KB/s|in=295.211914KB/s;300000.000000;400000.000000;0.000000 in=58.667969KB/s;300000.000000;400000.000000;0.000000
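The Bytes/bits confusion mentioned above comes down to the unit multipliers. The following is a minimal sketch, not part of check_mrtgtraf. It assumes 1 KB = 1024 bytes, which is consistent with the sample output (295.2 KB/s trips the 300000 bytes/s warning threshold only under that assumption), and 1 Kbit = 1000 bits. The function names are hypothetical.

```python
def kbytes_to_bytes(kbytes_per_s: float) -> float:
    """Convert KB/s to bytes/s, assuming 1 KB = 1024 bytes."""
    return kbytes_per_s * 1024


def bytes_to_kbits(bytes_per_s: float) -> float:
    """Convert bytes/s to Kbit/s, assuming 1 Kbit = 1000 bits."""
    return bytes_per_s * 8 / 1000


# Values taken from the sample check_mrtgtraf output above.
avg_in_bytes = kbytes_to_bytes(295.211914)
warn_threshold_bytes = 300000

print(round(avg_in_bytes))                  # incoming rate in bytes/s
print(avg_in_bytes > warn_threshold_bytes)  # True means the WARNING is expected
print(round(bytes_to_kbits(avg_in_bytes), 1))
```

Doing the conversion explicitly like this makes it easier to compare the plugin's thresholds (bytes/s) with graphs that are usually drawn in bits/s.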
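B. Other traffic-monitoring scripts circulating on the web. I tried several and in the end still found them inconvenient.
Do not feel cheated: I only settled on the approach below after trying the two above. It is convenient, quick, and feature-rich.
C. Use the check_snmp_int.pl plugin to monitor the network
Recommended. It is simple, convenient, and has many features, and its traffic figures are fairly accurate (I compared them against MRTG and Cacti pages monitoring the same interfaces).
Reference page: http://nagios.manubulon.com/snmp_int.html
Download: http://nagios.manubulon.com/check_snmp_int.pl
Prerequisite: the host to be monitored must have its SNMP service enabled and reachable.
Environment: both the Nagios monitoring server and the monitored servers run Linux.
1. Download the plugin to the Nagios monitoring server
First make sure the SNMP and Perl packages are installed on the monitoring server, then run the following command to test whether it returns a correct value.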
perl check_snmp_int.pl -H 192.168.0.21 -C zjhcsoft -n eth1 -k -Y -B -w 200,400 -c 0,800
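The options in this command mean:
-H the host to monitor, here 192.168.0.21
-C the SNMP community string, here zjhcsoft
-n the interface to check, here eth1
-k enables the input/output bandwidth check
-Y and -B used together report the interface traffic in bits/s
-w and -c the warning and critical thresholds, each written as in-threshold,out-threshold
Confirm that running it manually returns a correct result like the one below. If the traffic exceeds the -c threshold, you get an alert like this: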
eth1:UP (WARN 5095.5Kbps/CRIT 37443.9Kbps):(1 UP): CRITICAL
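The sample output can be read as one status per direction, with the worst one becoming the overall result. Below is a minimal sketch of that logic, not the plugin's actual code. It assumes that a threshold of 0 disables that check and that a value must be strictly greater than a threshold to trip it. Both assumptions are consistent with the output above for -w 200,400 -c 0,800, but are not verified against the plugin source. All names are hypothetical.

```python
SEVERITY = {"OK": 0, "WARN": 1, "CRIT": 2}


def direction_status(value_kbps: float, warn: float, crit: float) -> str:
    """Status for one direction; a threshold of 0 is treated as 'no check' (assumption)."""
    if crit > 0 and value_kbps > crit:
        return "CRIT"
    if warn > 0 and value_kbps > warn:
        return "WARN"
    return "OK"


def overall_status(in_kbps, out_kbps, warn, crit):
    """warn and crit are (in, out) pairs, mirroring -w in,out and -c in,out."""
    statuses = (
        direction_status(in_kbps, warn[0], crit[0]),
        direction_status(out_kbps, warn[1], crit[1]),
    )
    # The overall result is the most severe of the two directions.
    return statuses, max(statuses, key=SEVERITY.get)


# Values from the sample output: in 5095.5 Kbps, out 37443.9 Kbps.
print(overall_status(5095.5, 37443.9, warn=(200, 400), crit=(0, 800)))
```

With the sample values, the inbound direction can only reach WARN because its critical threshold is 0, while the outbound direction exceeds 800 and drives the overall result.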
2. Once the plugin is confirmed to work, configure Nagios
Edit commands.cfg and create a local command
[root@cacti objects]# vi commands.cfg
# 'check_snmp_int_iftraffic' command definition
define command{
command_name check_snmp_int_iftraffic
command_line $USER1$/check_snmp_int.pl -H $HOSTADDRESS$ -C $ARG1$ -n $ARG2$ -k -Y -B -w $ARG3$ -c $ARG4$
}
Create the service check by editing the configuration file
[root@cacti objects]# vi network_interface_service.cfg
define service{
use generic-service ; Inherit values from a template
host_name web1,web2,web3,web4,web6
service_description Output Interface Bandwidth Usage
check_command check_snmp_int_iftraffic!zjhcsoft!eth2!1200,5000!2000,10000
notifications_enabled 0
}
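The "!"-separated fields of check_command fill $ARG1$ to $ARG4$ in the command definition. The sketch below imitates that substitution to show the command line that ends up being run. It is an illustration only, not how Nagios is implemented. The $USER1$ value /u/nagios/libexec and the host address 192.168.0.21 are assumptions borrowed from earlier examples in this article, and the function name is hypothetical.

```python
def expand_check_command(command_line, check_command, macros):
    """Substitute $MACRO$ and $ARGn$ placeholders in a command_line string."""
    name, *args = check_command.split("!")
    expanded = command_line
    for macro, value in macros.items():
        expanded = expanded.replace(f"${macro}$", value)
    for index, arg in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", arg)
    return name, expanded


COMMAND_LINE = (
    "$USER1$/check_snmp_int.pl -H $HOSTADDRESS$ -C $ARG1$ "
    "-n $ARG2$ -k -Y -B -w $ARG3$ -c $ARG4$"
)
CHECK_COMMAND = "check_snmp_int_iftraffic!zjhcsoft!eth2!1200,5000!2000,10000"
# Assumed values for illustration; real values come from resource.cfg and the host definition.
MACROS = {"USER1": "/u/nagios/libexec", "HOSTADDRESS": "192.168.0.21"}

name, expanded = expand_check_command(COMMAND_LINE, CHECK_COMMAND, MACROS)
print(name)
print(expanded)
```

Running the expanded command by hand as the nagios user is a quick way to debug a service that shows unexpected results in the web interface.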
Verify that the configuration has no errors
[nagios@cacti objects]$ /u/nagios/bin/nagios -v /u/nagios/etc/nagios.cfg
Restart Nagios
[root@cacti objects]# service nagios restart
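3. Check the Nagios web interface and confirm that it shows normal monitoring data
Note: the above is for SNMP v1. If the device uses SNMP v2, add the '-2' option and define a separate local command for v2 in Nagios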
# 'check_snmp_int_iftraffic_v2' command definition
define command{
command_name check_snmp_int_iftraffic_v2
command_line $USER1$/check_snmp_int.pl -H $HOSTADDRESS$ -C $ARG1$ -2 -n $ARG2$ -k -Y -B -w $ARG3$ -c $ARG4$
}
Monitor a network device that uses SNMP v2
define service{
use generic-service ; Inherit values from a template
host_name Netscreen ISG 2000
service_description Output Interface Bandwidth Usage
check_command check_snmp_int_iftraffic_v2!zjhcsoft!ethernet1/1!1200,5000!2000,10000
notifications_enabled 0
}
4. Problems encountered
[root@cacti libexec]# perl check_snmp_int.pl -H 192.168.0.21 -C zjhcsoft -n eth1 -k -w 200,400 -c 0,800
eth1:UP No usable data on file (1 rows) :(1 UP): UNKNOWN
[root@cacti libexec]# perl check_snmp_int.pl -H 192.168.0.21 -C zjhcsoft -n eth1 -k -w 200,400 -c 0,800
eth1:UP No usable data on file (2 rows) :(1 UP): UNKNOWN
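Explanation from the plugin's website:
(In short: the check needs to have been running for more than 5 minutes before it can return a valid result. I did not change the default value; interested readers can dig deeper.)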
No usable data on file (X rows)
Scripts like check_snmp_int need to store data when they get an SNMP counter so they can output readable data like bandwidth, CPU, etc.
For example, to output a bandwidth with an octet counter, check_snmp_int will store data every time it is run. It will also read the previous data, and try to get data old enough to make a correct average. By default, it needs data which was produced 5 minutes ago.
So, when you first run the script, or if you last ran it a long time ago, it won't be able to get data old enough and will report an error (UNKNOWN status) saying there is "no usable data on file (X rows)".
If you leave the 5 minutes default delta value, the script will need data which is:
- At least 4 minutes and 30 seconds old (5 min - 10%)
- At most 15 minutes old (3 * 5 min)
You can change this 5 minutes value using the '-d <delta>' option. The script will then look for data which is at least <delta> - 10% old and at most 3 * <delta> old.
This option only sets the averaging period to <delta> seconds; you can run the service every minute with Nagios, and it will always use the newest value which is at least <delta> - 10% old.
The only thing you must check is that your service runs at least every 15 minutes, or the script will always output "unknown" because the stored value will be too old for it.
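The age window described above can be sketched as follows. This is an illustration of the rule as quoted (at least delta minus 10%, at most 3 times delta), not the plugin's actual code. Treating both bounds as inclusive is an assumption, and the function names are hypothetical.

```python
def usable_window(delta_seconds: int = 300):
    """Return (min_age, max_age) in seconds for stored samples to be usable."""
    min_age = delta_seconds - delta_seconds / 10  # delta minus 10%
    max_age = delta_seconds * 3                   # 3 times delta
    return min_age, max_age


def pick_sample_age(sample_ages, delta_seconds: int = 300):
    """Return the age of the newest usable sample, or None (the UNKNOWN case)."""
    min_age, max_age = usable_window(delta_seconds)
    usable = [age for age in sample_ages if min_age <= age <= max_age]
    return min(usable) if usable else None


# Two runs one minute apart: nothing is old enough yet, as in the UNKNOWN output above.
print(pick_sample_age([60, 120]))
# After several minutes of one-minute checks, the newest sample that is old enough is chosen.
print(pick_sample_age([60, 120, 180, 240, 300, 360]))
# A service that has not run for a long time has only stale data.
print(pick_sample_age([1000]))
```

This is why the first manual runs report "No usable data on file": the state file exists, but none of its rows falls inside the window yet.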