I have recently been tuning our production Nagios monitoring and noticed several warnings like the following:
Warning: Service 'usr Partition' on host 'luckcart_ss02' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Looking at the sample Nagios service configuration:
max_check_attempts 3 ; Re-check the service up to 3 times in order to determine its final (hard) state
normal_check_interval 3 ; Check the service every 3 minutes under normal conditions
retry_check_interval 2 ; Re-check the service every two minutes until a hard state can be determined
contact_groups admins ; Notifications get sent out to everyone in the 'admins' group
notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
notification_interval 10 ; Re-notify about service problems every 10 minutes
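If notification_interval is smaller than the check interval (normal_check_interval), a repeat notification would come due before the next check has run, so it could only report the state from the previous check. Nagios re-sends notifications only after a check has been made, so, as the warning says, the effective notification interval becomes the check interval.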
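The rule in the warning can be sketched in a few lines of Python. This is only an illustration, not Nagios code: the function names are made up for this sketch, both values are in Nagios time units (minutes with the default interval_length of 60), and a notification_interval of 0 is treated as "do not re-notify".

```python
def triggers_warning(check_interval, notification_interval):
    """True when the settings would draw the warning quoted above.

    A notification_interval of 0 means re-notification is turned off,
    so this sketch does not count it as a problem.
    """
    return 0 < notification_interval < check_interval


def min_renotify_gap(check_interval, notification_interval):
    """Lower bound on the time between two repeat notifications.

    Notifications are only re-sent after a check has run, so the gap can
    never be shorter than the check interval. Returns 0 when
    re-notification is disabled.
    """
    if notification_interval == 0:
        return 0
    return max(check_interval, notification_interval)


# Sample values from the configuration: check every 3, re-notify every 10.
print(triggers_warning(3, 10))   # False
# The reverse case is the one Nagios complains about.
print(triggers_warning(10, 3))   # True
print(min_renotify_gap(10, 3))   # 10
```

With the sample values from the configuration (check interval 3, notification interval 10) no warning would result; the warning appears when the two are the other way around.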
So adjust the intervals as needed, then verify the configuration file again:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
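When many services are affected, it can help to pull the host and service names out of the verification output. The sketch below assumes the warning lines have the same wording as the one quoted at the top of this post; the function name and the regular expression are illustrative, and the output is passed in as a string (for example, captured from the command above).

```python
import re

# Matches the interval warning as quoted earlier in this post.
WARNING_RE = re.compile(
    r"Warning: Service '(?P<service>[^']+)' on host '(?P<host>[^']+)'\s+"
    r"has a notification interval less than its check interval!"
)


def find_interval_warnings(verify_output):
    """Return a list of (host, service) pairs named in interval warnings."""
    return [
        (m.group("host"), m.group("service"))
        for m in WARNING_RE.finditer(verify_output)
    ]


sample = (
    "Warning: Service 'usr Partition' on host 'luckcart_ss02' has a "
    "notification interval less than its check interval! Notifications "
    "are only re-sent after checks are made, so the effective notification "
    "interval will be that of the check interval."
)
print(find_interval_warnings(sample))  # [('luckcart_ss02', 'usr Partition')]
```

An empty list after the intervals have been adjusted suggests that this particular warning is gone.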