Adjusting the check interval of check_mk

I could not find anything in the check_mk documentation about changing the interval of passive checks. However, Icinga has a variable, check_interval, that can be set on a service.

In check_mk_objects.cfg, add this variable to the CPU load service definition:

define service {
  use                           check_mk_passive_perf
  host_name                     StaticFileServer
  service_description           CPU load
  check_command                 check_mk-cpu.loads
  check_interval                0.05
}
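The unit is minutes (strictly, multiples of interval_length in icinga.cfg, which defaults to 60 seconds), so 0.05 here means a 3-second interval.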
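Converting a desired period in seconds into a check_interval value is a single division by interval_length. A minimal sketch, assuming the default interval_length of 60 seconds; the function name interval_value is hypothetical:

```python
def interval_value(seconds: float, interval_length: int = 60) -> float:
    """Convert a desired check period in seconds into check_interval units.

    Icinga multiplies check_interval by interval_length (set in icinga.cfg,
    assumed here to be the default of 60 seconds) to get the period in seconds.
    """
    if seconds <= 0:
        raise ValueError("check period must be positive")
    return seconds / interval_length


if __name__ == "__main__":
    # Print the check_interval value for a few common periods.
    for secs in (3, 30, 60, 300):
        print(f"{secs:>4} s -> check_interval {interval_value(secs):g}")
```

If interval_length has been changed from 60, pass the actual value as the second argument.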

Then restart Icinga:

service icinga restart

The web interface now shows the interval as 3 seconds.


To change the check interval of all services, edit the service template named check_mk_default in conf.d/check_mk_templates.cfg:

# Template used by all other check_mk templates
define service {
  name                            check_mk_default
  register                        0
  active_checks_enabled           1
  passive_checks_enabled          1
  parallelize_check               1
  obsess_over_service             1
  check_freshness                 0
  notifications_enabled           1
  event_handler_enabled           0
  flap_detection_enabled          1
  failure_prediction_enabled      1
  process_perf_data               0
  retain_status_information       1
  retain_nonstatus_information    1
  notification_interval           0
  is_volatile                     0
  normal_check_interval           0.05
  retry_check_interval            0.05
  max_check_attempts              1
  notification_options            u,c,w,r,f,s
  notification_period             24X7
  check_period                    24X7
}
This sets both normal_check_interval and retry_check_interval to 0.05 minutes.
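Editing check_mk_default reaches every service because a service inherits, through its `use` chain, any directive it does not set itself. The sketch below models that lookup. The dictionaries are simplified, hypothetical stand-ins for the real object files, and the intermediate template check_mk_passive is an assumption; it also treats normal_check_interval and retry_check_interval as the older names for check_interval and retry_interval:

```python
from typing import Dict, Optional

# Hypothetical, simplified object definitions: only the directives relevant
# here are modelled. The intermediate template check_mk_passive is assumed.
OBJECTS: Dict[str, Dict[str, str]] = {
    "check_mk_default": {"normal_check_interval": "0.05",
                         "retry_check_interval": "0.05"},
    "check_mk_passive": {"use": "check_mk_default"},
    "check_mk_passive_perf": {"use": "check_mk_passive"},
    "CPU load": {"use": "check_mk_passive_perf"},
}

# Older directive names mapped to their current equivalents.
ALIASES = {"normal_check_interval": "check_interval",
           "retry_check_interval": "retry_interval"}


def canonical(definition: Dict[str, str]) -> Dict[str, str]:
    """Rename alias directives to their canonical names."""
    return {ALIASES.get(key, key): value for key, value in definition.items()}


def resolve(name: Optional[str], directive: str) -> Optional[str]:
    """Follow the `use` chain (single parent only, a simplification) and
    return the first value found for directive, or None if none is set."""
    seen = set()
    while name is not None and name not in seen:
        seen.add(name)
        definition = canonical(OBJECTS[name])
        if directive in definition:
            return definition[directive]
        name = definition.get("use")
    return None


if __name__ == "__main__":
    print(resolve("CPU load", "check_interval"))
```

In this model, setting check_interval directly on CPU load would shadow the template value, while leaving it unset lets the check_mk_default value flow through.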


Next, edit icinga.cfg:

command_check_interval=1s
external_command_buffer_slots=32768

Also enable logging of external commands and passive checks:

log_external_commands=1
log_passive_checks=1
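Assuming the passive results are submitted through Icinga's external command file (the tuning of command_check_interval above suggests that setup), each result arrives as a PROCESS_SERVICE_CHECK_RESULT line. The sketch below only builds such a line; it does not write to the command pipe, whose path depends on the installation, and the function name service_result_command is hypothetical:

```python
import time
from typing import Optional


def service_result_command(host: str, service: str, return_code: int,
                           output: str, timestamp: Optional[int] = None) -> str:
    """Build a PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):  # OK, WARNING, CRITICAL, UNKNOWN
        raise ValueError("return_code must be 0, 1, 2 or 3")
    if timestamp is None:
        timestamp = int(time.time())  # current Unix epoch seconds
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}")


if __name__ == "__main__":
    print(service_result_command("StaticFileServer", "CPU load", 0,
                                 "OK - 15min load 0.05 at 4 CPUs", 1369051243))
```

With log_passive_checks=1, Icinga records each accepted result as a PASSIVE SERVICE CHECK entry with the same host, service, return code and output.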



After restarting, check the log.

Use grep to pull out the CPU load entries for one server:

./icinga.log:127508:[1369051243] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127538:[1369051247] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127585:[1369051252] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127631:[1369051257] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127677:[1369051262] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127724:[1369051267] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127770:[1369051272] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127832:[1369051278] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127878:[1369051283] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127909:[1369051287] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127955:[1369051292] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:128002:[1369051297] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:128048:[1369051302] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:128125:[1369051309] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs

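The later entries are mostly 4 to 6 seconds apart (the last gap is 7 seconds), far shorter than the original 60 seconds, so the change had the intended effect.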
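Instead of reading the timestamps by eye, the gaps can be computed from the grep output. A minimal sketch, assuming the `file:line:[epoch] PASSIVE SERVICE CHECK: host;service;...` format shown above; the function name check_gaps is hypothetical:

```python
import re
from typing import List

# Matches "[1369051243] PASSIVE SERVICE CHECK: host;service;" anywhere in a line.
ENTRY = re.compile(r"\[(\d+)\] PASSIVE SERVICE CHECK: ([^;]*);([^;]*);")


def check_gaps(lines: List[str], host: str, service: str) -> List[int]:
    """Return the seconds between consecutive passive results for host/service."""
    stamps = []
    for line in lines:
        match = ENTRY.search(line)
        if match and match.group(2) == host and match.group(3) == service:
            stamps.append(int(match.group(1)))
    # Pair each timestamp with the next one and subtract.
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    sample = [
        "./icinga.log:127508:[1369051243] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs",
        "./icinga.log:127538:[1369051247] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs",
        "./icinga.log:127585:[1369051252] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs",
    ]
    print(check_gaps(sample, "StaticFileServer", "CPU load"))  # prints [4, 5]
```

Run over all fourteen entries above, this yields gaps of 4, 5, 5, 5, 5, 5, 6, 5, 4, 5, 5, 5 and 7 seconds.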

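The change above is global: every service is now checked on a 3 s interval. But can the interval be changed for just one service? I tried putting the following settings in a single service only, while leaving the global setting at 1 minute: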

  normal_check_interval           0.05
  retry_check_interval            0.05
The log still showed a 60-second interval, even though the web interface already displayed 3 s:

Service normal/retry check interval 3s/3s

Conclusions:

1. So far I have only found a way to change the interval globally; changing it for an individual service has no effect.

2. The servers' CPU load is currently low, so the practical effect is not yet visible. A stress test is still needed to confirm it.

