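The check_mk documentation does not seem to explain how to change the interval of passive checks, but Icinga has a check_interval directive that can be set in a service definition.
In check_mk_objects.cfg, add this directive to the CPU load service:
define service {
  use                   check_mk_passive_perf
  host_name             StaticFileServer
  service_description   CPU load
  check_command         check_mk-cpu.loads
  check_interval        0.05
}
The unit is minutes (with the default interval_length=60), so 0.05 means a 3-second interval.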
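The conversion between the interval value and seconds can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and it assumes interval_length in icinga.cfg is left at its default of 60 seconds per time unit.

```python
# Hypothetical helpers, not part of Icinga or check_mk.
# Assumption: interval_length=60 (the default), i.e. one time unit is 60 seconds.

def interval_to_seconds(check_interval, interval_length=60):
    """Convert a check_interval value (in time units) to seconds."""
    return check_interval * interval_length


def seconds_to_interval(seconds, interval_length=60):
    """Convert a desired interval in seconds to a check_interval value."""
    return seconds / interval_length


if __name__ == "__main__":
    print(round(interval_to_seconds(0.05), 6))  # seconds for check_interval 0.05
    print(round(seconds_to_interval(3), 6))     # check_interval for 3 seconds
```

If interval_length has been changed in icinga.cfg, pass that value as the second argument instead.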
Then restart Icinga:
service icinga restart
The web interface now shows an interval of 3 seconds.
To change the check interval for all services, edit the service template named check_mk_default in conf.d/check_mk_templates.cfg:
# Template used by all other check_mk templates
define service {
  name                          check_mk_default
  register                      0
  active_checks_enabled         1
  passive_checks_enabled        1
  parallelize_check             1
  obsess_over_service           1
  check_freshness               0
  notifications_enabled         1
  event_handler_enabled         0
  flap_detection_enabled        1
  failure_prediction_enabled    1
  process_perf_data             0
  retain_status_information     1
  retain_nonstatus_information  1
  notification_interval         0
  is_volatile                   0
  normal_check_interval         0.05
  retry_check_interval          0.05
  max_check_attempts            1
  notification_options          u,c,w,r,f,s
  notification_period           24X7
  check_period                  24X7
}
Here normal_check_interval and retry_check_interval have both been set to 0.05 minutes.
Then edit icinga.cfg:
command_check_interval=1s
external_command_buffer_slots=32768
Enable logging as well:
log_external_commands=1
log_passive_checks=1
Use grep to filter out the CPU load check entries for one server from the log:
./icinga.log:127508:[1369051243] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127538:[1369051247] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127585:[1369051252] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127631:[1369051257] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127677:[1369051262] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127724:[1369051267] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127770:[1369051272] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127832:[1369051278] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127878:[1369051283] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127909:[1369051287] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127955:[1369051292] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:128002:[1369051297] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:128048:[1369051302] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:128125:[1369051309] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
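Rather than reading the timestamps by eye, the gap between consecutive passive results can be computed from the log entries. The following is a minimal Python sketch, not part of Icinga or check_mk: the function name passive_check_gaps is made up for this example, and the regular expression assumes the "[epoch] PASSIVE SERVICE CHECK: host;service;..." layout seen in the entries above.

```python
import re

# Assumed layout: "[<epoch>] PASSIVE SERVICE CHECK: <host>;<service>;<state>;<output>"
# Any grep prefix such as "./icinga.log:127508:" before the bracket is ignored.
LINE_RE = re.compile(r"\[(\d+)\] PASSIVE SERVICE CHECK: ([^;]+);([^;]+);")


def passive_check_gaps(lines, host, service):
    """Return the gaps in seconds between consecutive passive results
    logged for the given host and service."""
    stamps = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group(2) == host and match.group(3) == service:
            stamps.append(int(match.group(1)))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# First four entries of the excerpt above.
sample = [
    "./icinga.log:127508:[1369051243] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs",
    "./icinga.log:127538:[1369051247] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs",
    "./icinga.log:127585:[1369051252] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs",
    "./icinga.log:127631:[1369051257] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs",
]

if __name__ == "__main__":
    gaps = passive_check_gaps(sample, "StaticFileServer", "CPU load")
    print(gaps)  # [4, 5, 5]
```

Applied to the whole excerpt above, the gaps come out between 4 and 7 seconds (66 seconds over 13 intervals), so somewhat longer than the configured 3 seconds.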
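The above is a global setting: every service check now uses the 3-second interval. But is it possible to change the interval of just one service? I tried putting the following two lines into a single service definition while leaving the global setting at 1 minute:
normal_check_interval 0.05
retry_check_interval 0.05
The log still shows a 60-second interval, even though the web interface already displays 3s: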
Service normal/retry check interval | 3s/3s |
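Conclusions:
1. So far I have only found a way to change the interval globally; changing it for a single service has no effect.
2. The server's CPU load is currently under no pressure, so the practical effect is not visible yet. A stress test is still needed to confirm it.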