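Notes on Monit's control file and commands

Monit monitors Unix processes, programs, files, directories and filesystems. For files, directories and filesystems it can detect changes such as a modified timestamp, checksum or size.

Monit provides an HTTP interface, so you can reach it from a browser. Its behavior is controlled by a control file (monitrc).

I finally understand why there are two monitrc files on my system: the default is /usr/local/monit/etc/monitrc, and if that file does not exist Monit falls back to /etc/monit/monitrc. In other words, the former is the default control file and the latter is the fallback.

You can also specify the path of a control file by hand:

 $ monit -c /var/monit/monitrc
Many options can also be set on the command line, but for simplicity it is recommended to set them in the control file.
sudo /usr/local/monit/bin/monit -t runs a syntax check on the control file. If there are no errors it prints the following:
laicb@laicb-HP-ProBook-4416s:~$ sudo /usr/local/monit/bin/monit -t
[sudo] password for laicb: 
Control file syntax OK

~/.monit.state stores Monit's state, which Monit uses to recover after a crash.

~/.monit.id stores Monit's own unique ID.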


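Daemons

A daemon is a special kind of process that runs in the background. It is detached from any controlling terminal and either performs some task periodically or waits
    for events to handle. Daemons are very useful processes.
    Most servers on Linux are implemented as daemons, for example the Internet super-server inetd and the web server httpd.
    Daemons also carry out many system tasks, for example the job scheduler crond and the print daemon lpd.

Once started, Monit runs as a daemon.

This is Monit version 5.3.2

Monit options

The following options are recognized by Monit. However, it is recommended that you set options (when applicable) directly in the .monitrc control file.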

-c file Use this control file

-d n Run Monit as a daemon once per n seconds. Or use "set daemon" in monitrc.

-g name Set group name for start, stop, restart, monitor and unmonitor action.

-l logfile Print log information to this file. Or use "set logfile" in monitrc.

-p pidfile Use this lock file in daemon mode. Or use "set pidfile" in monitrc.

-s statefile Write state information to this file. Or use "set statefile" in monitrc.

-I Do not run in background (needed for run from init)

-t Run syntax check for the control file

-v Verbose mode, work noisy (diagnostic output)

-vv Very verbose mode, same as -v plus log stack-trace on error

-H [filename] Print MD5 and SHA1 hashes of the file or of stdin if the filename is omitted; Monit will exit afterwards

-V Print version number and patch level

-h Print a help text


Once Monit is running, you can control the daemon with the following commands

Once you have Monit running as a daemon process, you can call Monit with one of the following arguments. Monit will then connect to the Monit daemon (on TCP port 127.0.0.1:2812 by default) and ask the Monit daemon to perform the requested action. In other words; calling monit without arguments starts the Monit daemon, and calling monit with arguments enables you to communicate with the Monit daemon process.

start all

    Start all services listed in the control file and enable monitoring for them. If the group option is set (-g), only start and enable monitoring of services in the named group ("all" is not required in this case).
start name

    Start the named service and enable monitoring for it. The name is a service entry name from the monitrc file.
stop all

    Stop all services listed in the control file and disable their monitoring. If the group option is set, only stop and disable monitoring of the services in the named group ("all" is not required in this case).
stop name

    Stop the named service and disable its monitoring. The name is a service entry name from the monitrc file.
restart all

    Stop and start all services. If the group option is set, only restart the services in the named group ("all" is not required in this case).
restart name

    Restart the named service. The name is a service entry name from the monitrc file.
monitor all

    Enable monitoring of all services listed in the control file. If the group option is set, only start monitoring of services in the named group ("all" is not required in this case).
monitor name

    Enable monitoring of the named service. The name is a service entry name from the monitrc file. Monit will also enable monitoring of all services this service depends on.
unmonitor all

    Disable monitoring of all services listed in the control file. If the group option is set, only disable monitoring of services in the named group ("all" is not required in this case).
unmonitor name

    Disable monitoring of the named service. The name is a service entry name from the monitrc file. Monit will also disable monitoring of all services that depend on this service.
status

    Print status information of each service.
summary

    Print a short status summary.
reload

    Reinitialize a running Monit daemon, the daemon will reread its configuration, close and reopen log files.
quit

    Kill the Monit daemon process

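What can Monit do?

You can use Monit to monitor processes. It is especially useful for daemons, such as those started at boot time from /etc/init.d: sendmail, ssh, apache, mysql and so on.

1. Monit can monitor files, directories and filesystems for changes such as timestamp, checksum or size changes. This is also useful for security: if the contents of a file that should never change are modified, its MD5 or SHA1 checksum changes and Monit notices.

2. Monit can monitor network connections to servers, local or remote. TCP, UDP and Unix domain sockets are all supported.

3. Monit can run programs or scripts at certain times and test their exit status, then act on the result, for example by executing an action or sending an alert.

4. Monit can monitor general system resources such as CPU usage, memory and load average.

  {

 Load average (CPU load) does not describe CPU utilization. It is a statistic, taken over a period of time, of the number of processes the CPU is running plus the number waiting for the CPU, that is, the length of the CPU run queue.

}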


LOGGING

Monit will log status and error messages to a log file. Use the set logfile statement in the monitrc control file. To set up Monit to log to its own logfile, use e.g. set logfile /var/log/monit.log. If syslog is given as a value for the -l command-line switch (or the keyword set logfile syslog is found in the control file) Monit will use the syslog system daemon to log messages with a priority assigned to each message based on the context. To turn off logging, simply do not set the logfile in the control file (and of course, do not use the -l switch).
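
The two logging variants described above might look like this in a control file. This is only a sketch: the log path is the one quoted in the text, and log_daemon is one of the facility names listed in the keyword table later in these notes.

```
# Log to Monit's own file
set logfile /var/log/monit.log

# Or, instead of the line above, log through syslog with an explicit facility
# set logfile syslog facility log_daemon
```

Only one set logfile statement should be active at a time, which is why the second variant is commented out.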


 DAEMON MODE

Use

   set daemon n (where n is a number in seconds)

If you do not specify set daemon, Monit runs once and then exits. That can be useful in some situations, but Monit was designed from the start to run as a daemon.
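
A minimal sketch of the statement as it would appear in a control file. The 120-second interval is an arbitrary value chosen for illustration.

```
# Run as a daemon and poll all services every 120 seconds
set daemon 120
```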


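INIT SUPPORT

 set init prevents Monit from turning itself into a daemon and keeps it running as a foreground process. You still need set daemon in the control file to define the poll interval.

Starting Monit from init is the best approach, because it guarantees that there is always a Monit process running on your system.

Alternatively, you can start Monit from crontab.

To start Monit from init, either put set init in the control file or use the -I command-line option.

This is the entry to add to /etc/inittab:

# Run Monit in standard run-levels
mo:2345:respawn:/usr/local/bin/monit -Ic /etc/monitrc
After modifying the init configuration, run the following command to make init re-examine /etc/inittab and start Monit:

telinit q

On systems without telinit, use:

kill -1 1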

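One problem can appear once Monit is started from init: if some monitored services start more slowly than Monit does, Monit concludes that they are not running and sends error alerts.

The Monit FAQ describes how to deal with this.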
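
One commonly used remedy is to delay Monit's first check after startup. The sketch below assumes that the start delay option of set daemon is available in your Monit version; it is not mentioned elsewhere in these notes, and both numbers are illustrative.

```
# Poll every 120 seconds, but wait 240 seconds after Monit starts
# before the first check, so that slow services have time to come up
set daemon 120 with start delay 240
```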


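MONITORING MODE

Monit supports three monitoring modes:

active -- Monit monitors a service and, when problems occur, acts on them: it sends alerts and stops, starts or restarts the service. This is the default mode.

passive -- Monit monitors a service but does not try to fix problems; it still sends alerts.

manual -- Monit enters active mode for a service only if the service was started under Monit's control, for example by running this on the console: monit start sybase

  (Monit will call sybase's start method and enable monitoring)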
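
As a sketch of how a mode is selected, the entry below puts the sybase service from the example above into manual mode. The pidfile and init script paths are hypothetical; only the mode statement is the point here.

```
check process sybase with pidfile /var/run/sybase.pid
   start program = "/etc/init.d/sybase start"
   stop program = "/etc/init.d/sybase stop"
   # monitored only after it was started with "monit start sybase"
   mode manual
```

Replacing manual with passive would keep the alerts but stop Monit from restarting the service itself.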

ALERT MESSAGES

Monit sends an alert email in the following situations:

o A service timed out
 o A service does not exist
 o A service related data access problem
 o A service related program execution problem
 o A service is of invalid object type
 o A program status failed
 o A icmp problem
 o A port connection problem
 o A resource statement match
 o A file checksum problem
 o A file size problem
 o A file/directory timestamp problem
 o A file/directory/filesystem permission problem
 o A file/directory/filesystem uid problem
 o A file/directory/filesystem gid problem
 o An action is done per administrator's request
Monit also sends an alert whenever a monitored object changes. These changes include:
 o Monit started, stopped or reloaded
 o A file checksum changed
 o A file size changed
 o A file content match
 o A file/directory timestamp changed
 o A filesystem mount flags changed
 o A process PID changed
 o A process PPID changed

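Alert statements come in two forms:

Global -- common for all services

Local -- per service

In either form you can declare several alert statements. In other words, you can send different alerts to different addresses.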

Setting a global alert statement

{

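   If a monitored service changes, Monit sends an alert to every recipient in the global list. The syntax of the global alert statement is:

   SET ALERT mail-address [[NOT] {EVENTS}] [MAIL-FORMAT {mail-format}] [REMINDER number]

   Simple usage: set alert foo@bar

  EVENTS, MAIL-FORMAT and REMINDER are described below.

 Setting a local alert statement

 Each service can have its own recipient list:

ALERT mail-address [[NOT] {EVENTS}] [MAIL-FORMAT {mail-format}] [REMINDER number]

Without SET, the statement is local to the service entry it appears in.

Or use NOALERT mail-address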

If you only want to receive certain events for a service, for example only timeout and nonexist events, you can write:

check process myproc with pidfile /var/run/my.pid
   alert foo@bar only on { timeout, nonexist } 
   ...

You can also ask for alerts on every event except some. For example, to be alerted on all events except instance, you can write:

 check system myserver
   alert foo@bar but not on { instance } 
   ...
which is equivalent to
  alert foo@bar on { action
                      checksum
                      connection
                      content
                      data
                      exec
                      fsflags
                      gid
                      icmp
                      invalid
                      nonexist
                      permission
                      pid
                      ppid
                      resource
                      size
                      status
                      timeout
                      timestamp
                      uid
                      uptime }
An instance event is raised when the Monit program itself starts or stops.

You can also send different events to different mail addresses:

 alert foo@bar { nonexist, timeout, resource, icmp, connection }
 alert security@bar on { checksum, permission, uid, gid }
 alert manager@bar
The events that can be used in an alert event filter are:

action,checksum, connection, content, data, exec, fsflags, gid, icmp,instance, invalid, nonexist, permission, pid, ppid, resource, size, status, timeout, timestamp, uid, uptime

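You can use

 noalert appadmin@bar

inside a service entry to stop that address from receiving alerts for the service. A local alert statement also overrides the global one for the same address: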
 set alert foo@bar
 
 check process myfoo with pidfile /var/run/myfoo.pid
   ...
 check process mybar with pidfile /var/run/mybar.pid
   alert foo@bar only on { timeout }
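The configuration above sends all alerts to foo@bar, except that for the mybar service foo@bar receives only timeout alerts. This is how a local statement overrides the global one.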


    $EVENT

     A string describing the event that occurred. The values are
     fixed and are:

     Event:    | Failure state:           | Success state:
     -------------------------------------------------------------------
     ACTION    | "Action done"            | "Action done"
     CHECKSUM  | "Checksum failed"        | "Checksum succeeded"
     CONNECTION| "Connection failed"      | "Connection succeeded"
     CONTENT   | "Content failed"         | "Content succeeded"
     DATA      | "Data access error"      | "Data access succeeded"
     EXEC      | "Execution failed"       | "Execution succeeded"
     FSFLAG    | "Filesystem flags failed"| "Filesystem flags succeeded"
     GID       | "GID failed"             | "GID succeeded"
     ICMP      | "ICMP failed"            | "ICMP succeeded"
     INSTANCE  | "Monit instance changed" | "Monit instance changed not"
     INVALID   | "Invalid type"           | "Type succeeded"
     NONEXIST  | "Does not exist"         | "Exists"
     PERMISSION| "Permission failed"      | "Permission succeeded"
     PID       | "PID failed"             | "PID succeeded"
     PPID      | "PPID failed"            | "PPID succeeded"
     RESOURCE  | "Resource limit matched" | "Resource limit succeeded"
     SIZE      | "Size failed"            | "Size succeeded"
     STATUS    | "Status failed"          | "Status succeeded"
     TIMEOUT   | "Timeout"                | "Timeout recovery"
     TIMESTAMP | "Timestamp failed"       | "Timestamp succeeded"
     UID       | "UID failed"             | "UID succeeded"
     UPTIME    | "Uptime failed"          | "Uptime succeeded"

    $SERVICE

     The service entry name in monitrc

    $DATE

     The current time and date (RFC 822 date style).

    $HOST

     The name of the host Monit is running on

    $ACTION

     The name of the action which was done. Action names are fixed
     and are:

     Action:  | Name:
     --------------------
     ALERT    | "alert"
     EXEC     | "exec"
     RESTART  | "restart"
     START    | "start"
     STOP     | "stop"
     UNMONITOR| "unmonitor"

    $DESCRIPTION

     The description of the error condition

Setting an error reminder

ALERT ... [WITH] REMINDER [ON] number [CYCLES]

For example if you want to be notified each tenth cycle if a service remains in a failed state, you can use:

  alert foo@bar with reminder on 10 cycles

Likewise if you want to be notified on each failed cycle, you can use:

  alert foo@bar with reminder on 1 cycle

Setting a mail server for alert messages

SET MAILSERVER {hostname|ip-address [PORT port]
                [USERNAME username] [PASSWORD password]
                [using SSLV2|SSLV3|TLSV1] [CERTMD5 checksum]}+
                [with TIMEOUT X SECONDS]
                [using HOSTNAME hostname]

set mailserver mail.tildeslash.com, mail.foo.bar port 10025
     username "Rabbi" password "Loew" using tlsv1, localhost
     with timeout 15 seconds

Sending through the QQ Mail server

Mail servers such as 163 or QQ require more authentication details. A locally installed sendmail service has fewer requirements.

First set the mail server:

  set mailserver  smtp.qq.com USERNAME "530765863"  PASSWORD "*********"

Note that the QQ Mail user name is given without @qq.com. Some sources online say that VIP accounts need @vip.qq.com appended; I have not verified this.

This alone is not enough. You also need to set the mail format: the from field, that is the sender, must be [email protected]
set mail-format {
                   from: [email protected]
                   subject: monit alert --  $EVENT $SERVICE
                   message: $EVENT Service $SERVICE
                   Date:        $DATE
                   Action:      $ACTION
                   Host:        $HOST
                   Description: $DESCRIPTION
            Your faithful employee,
            Monit

}

Check your QQ mailbox: a pile of mail is on its way...



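You can list several mail servers, separated by commas. If the first mail server does not respond within the timeout (15 seconds in the example above), Monit tries the second one, then the third, and so on.

By default Monit uses the local host name in the SMTP HELO/EHLO command and in the Message-ID header. Some mail servers, as an anti-spam measure, reject the mail when the host name used in the transaction does not match DNS. The fix is to set the host name explicitly with [using HOSTNAME hostname].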
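
A sketch of the using hostname clause, following the SET MAILSERVER syntax quoted above. Both host names are hypothetical placeholders; use a name that resolves to your Monit host in DNS.

```
# Announce a DNS-resolvable name in HELO/EHLO and the Message-ID header
set mailserver mail.example.com
     with timeout 15 seconds
     using hostname "monit.example.com"
```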


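Setting the event queue

  set eventqueue
      basedir /var/monit
      slots 5000
basedir is the directory in which queued events are stored. slots is optional and limits the size of the queue.
SET EVENTQUEUE BASEDIR <path> [SLOTS <number>]
Why set up a queue? Sometimes the mail server cannot be reached when Monit wants to send an alert. The events are then kept in the event queue and sent again once the mail server becomes available.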


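Service timeout

Monit provides a service timeout mechanism for services that refuse to start or fail to respond for a long time.

IF <number> RESTART <number> CYCLE(S) THEN <action>
If the service is restarted x times within y poll cycles, the given action is executed.

For example:

 if 2 restarts within 3 cycles then unmonitor
If there are 2 restarts within 3 cycles, stop monitoring the service.
 if 5 restarts within 5 cycles then exec "/foo/bar"
If there are 5 restarts within 5 cycles, execute the given program.
 if 7 restarts within 10 cycles then stop

If there are 7 restarts within 10 cycles, stop the service.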
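
For context, here is how a timeout rule might sit inside a complete service entry. This is a sketch: the service name, pidfile and init script paths are hypothetical, and the thresholds are arbitrary.

```
check process myproc with pidfile /var/run/my.pid
   start program = "/etc/init.d/myproc start"
   stop program = "/etc/init.d/myproc stop"
   # give up and stop monitoring if 3 restarts happen within 5 cycles
   if 3 restarts within 5 cycles then unmonitor
```

The rule keeps Monit from restarting a service endlessly when the underlying problem needs manual attention.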



Service tests (SERVICE TESTS)

  Monit provides several tests that can be used inside a "check" service entry. There are two classes of tests. A constant test compares against a fixed condition, such as a number. A variable test watches a value for changes.

Constant test syntax:

IF <TEST> [[<X>] [TIMES WITHIN] <Y> CYCLES] THEN ACTION [ELSE IF SUCCEEDED [[<X>] [TIMES WITHIN] <Y> CYCLES] THEN ACTION]
Variable test syntax:

IF CHANGED <TEST> [[<X>] [TIMES WITHIN] <Y> CYCLES] THEN ACTION
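
The two forms can be sketched side by side in one service entry. The service name, pidfile path and thresholds below are hypothetical; the tests themselves (cpu, pid) are ones covered later in these notes.

```
check process myproc with pidfile /var/run/my.pid
   # constant form: act only if the test fails in 3 of the last 5 cycles
   if cpu > 50% for 3 times within 5 cycles then alert
   # variable form: act whenever the observed value changes
   if changed pid then alert
```

Without a cycle qualifier, a test triggers its action the first time the condition is seen.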
Available actions

In each test you must select the action to be executed from this list:

    ALERT sends the user an alert event on each state change (for constant tests) or on each change (for variable tests).

    RESTART restarts the service and sends an alert. Restart is conducted by first calling the service's registered stop method and then the service's start method.

    START starts the service by calling the service's registered start method and sends an alert.

    STOP stops the service by calling the service's registered stop method and sends an alert. If Monit stops a service it will not be checked by Monit anymore nor restarted again later. To reactivate monitoring of the service again you must explicitly enable monitoring from the web interface or from the console, e.g. 'monit monitor apache'.

    EXEC can be used to execute an arbitrary program and send an alert. If you choose this action you must state the program to be executed and if the program requires arguments you must enclose the program and its arguments in a quoted string. You may optionally specify the uid and gid the executed program should switch to upon start. For instance:

     exec "/usr/local/tomcat/bin/startup.sh"
          as uid nobody and gid nobody

    The uid and gid switch can be useful if the program to be started cannot change to a lesser privileged user and group. This is typically needed for Java Servers. Remember, if Monit is run by the superuser, then all programs executed by Monit will be started with superuser privileges unless the uid and gid extension was used.

    UNMONITOR will disable monitoring of the service and send an alert. The service will not be checked by Monit anymore nor restarted again later. To reactivate monitoring of the service you must explicitly enable monitoring from monit's web interface or from the console using the monitor argument.




}


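Existence test:

      When Monit finds that a file does not exist or a service is not running, its default action is to restart the service.

     Syntax:

IF [DOES] NOT EXIST [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
    action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
check file test.txt with path /home/laicb/test.txt
   if does not exist for 5 cycles then alert
The word after check file (test.txt here) is a service name of your own choosing, and the string after path is the file's path. Note that this entry checks a file: if you give only /home/laicb as the path, Monit reports that the path is not of a valid type. To test a directory, use check directory instead.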

Resource test:

IF resource operator value [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]

resource is a choice of "CPU", "TOTALCPU", "CPU([user|system|wait])", "MEMORY", "SWAP", "CHILDREN", "TOTALMEMORY", "LOADAVG([1min|5min|15min])".
Some resource tests can be used inside a check system entry, some in a check process entry and some in both; see the official documentation for details.

operator is one of "<", ">", "!=", "==" or the corresponding word forms, for example:
 if cpu is greater than 50% for 5 cycles then restart
Besides %, a value can also be given in bytes with a unit such as KB, MB or GB.

action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
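
A sketch of resource tests in both kinds of entry. The service names, pidfile path and all thresholds are hypothetical; the resource names are taken from the list above.

```
# System-wide resources go in a check system entry
check system myserver
   if loadavg (1min) > 4 then alert
   if memory usage > 75% then alert
   if cpu usage (user) > 70% then alert

# Per-process resources go in a check process entry
check process myproc with pidfile /var/run/my.pid
   start program = "/etc/init.d/myproc start"
   stop program = "/etc/init.d/myproc stop"
   if totalmemory > 200 MB for 5 cycles then restart
   if children > 250 then restart
```

The word usage is one of the keywords Monit ignores; it is there only to make the lines read more naturally.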


File checksum test:

This test can only be used inside a check file entry.

The checksum test in constant form is used to verify that a file does not change. Syntax (keywords are in capital):

IF FAILED [MD5|SHA1] CHECKSUM [EXPECT checksum] [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]

The checksum test in variable form is used to watch for file changes. Syntax (keywords are in capital):

IF CHANGED [MD5|SHA1] CHECKSUM [[<X>] <Y> CYCLES] THEN action
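
Both forms are sketched below. The file paths are examples and the 32-character MD5 value is a made-up placeholder; an expected sum for a real file can be printed with monit -H, as listed in the options above.

```
check file httpd.bin with path /usr/local/apache/bin/httpd
   # constant form: fail when the MD5 sum differs from the expected value
   if failed md5 checksum expect 8f7f419955cefa0b33a2ba316cba3659 then alert

check file httpd.conf with path /usr/local/apache/conf/httpd.conf
   # variable form: react to any change of the SHA1 sum
   if changed sha1 checksum then alert
```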

Timestamp test:

IF TIMESTAMP [[operator] value [unit]] [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]

The timestamp statement in variable form is simply to test an existing file or directory for timestamp changes and if changed, execute an action. Syntax (keywords are in capital):

IF CHANGED TIMESTAMP [[<X>] <Y> CYCLES] THEN action

operator is a choice of "<", ">", "!=", "==" in C notation, "GT", "LT", "EQ", "NE" in shell sh notation and "GREATER", "LESS", "EQUAL", "NOTEQUAL" in human readable form (if not specified, default is EQUAL).

A timestamp is one of the times recorded in a file's attributes: creation, modification and access.

Variable form:

 check file httpd.conf with path /usr/local/apache/conf/httpd.conf
   if changed timestamp
      then exec "/usr/local/apache/bin/apachectl graceful"
Constant form:
check file stored.ckp with path /msg-foo/config/stored.ckp
   if timestamp > 1 minute then alert


File size test:

This test can only be used inside a check file entry.

The size test in constant form is used to verify various size conditions. Syntax (keywords are in capital):

IF SIZE [[operator] value [unit]] [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]

The size statement in variable form is simply to test an existing file for size changes and if changed, execute an action. Syntax (keywords are in capital):

IF CHANGED SIZE [[<X>] <Y> CYCLES] THEN action

check file test.txt with path /home/laicb/test.txt
    if does not exist for 5 cycles then alert
    # when no cycle count was given, the status page for this service showed "for 5 times within 5 cycles"
    if changed size for 1 cycles then alert

After the file size changes, the status column shows size changed.


File content test:

The syntax (keywords in capital) for using this test is:

IF [NOT] MATCH {regex|path} [[<X>] <Y> CYCLES] THEN action

(This needs further study.)
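
As a starting point for that study, here is a minimal sketch that follows the syntax above. The log path and the pattern are hypothetical, and the comment reflects my understanding of the manual rather than anything tested here.

```
check file applog with path /var/log/myapp.log
   # alert when a line in the file matches the regular expression
   if match "ERROR" then alert
```

According to the syntax, a path can be given in place of the regex; presumably it names a file that holds the patterns.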


Filesystem flags test:

The syntax for the fsflags statement is:

IF CHANGED FSFLAGS [[<X>] <Y> CYCLES] THEN action

Example:

 check filesystem rootfs with path /
       if changed fsflags then exec "/my/script"
       alert root@localhost

Space test:

The full syntax for the space statement is:

IF SPACE operator value unit [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]

operator is a choice of "<", ">", "!=", "==" in C notation, "gt", "lt", "eq", "ne" in shell sh notation and "greater", "less", "equal", "notequal" in human readable form (if not specified, default is EQUAL).

unit is a choice of "B", "KB", "MB", "GB", "%" or long alternatives "byte", "kilobyte", "megabyte", "gigabyte", "percent".

action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR"
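
A sketch of a space test inside a check filesystem entry. The device name and thresholds are hypothetical.

```
check filesystem datafs with path /dev/sdb1
   # alert once usage stays above 80% for 5 of the last 15 cycles
   if space usage > 80% for 5 times within 15 cycles then alert
   # alert immediately when the filesystem is almost full
   if space usage > 95% then alert
```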

Permission test:

The syntax for the permission statement is:

IF FAILED PERM(ISSION) octalnumber [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]

octalnumber defines permissions for a file, a directory or a filesystem as four octal digits (0-7). Valid range: 0000 - 7777 (you can omit the leading zeros, Monit will add the zeros to the left thus for example "640" is valid value and matches "0640").

action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR"

 check file monit.bin with path "/usr/local/bin/monit"
       if failed permission 0555 then unmonitor

UID test (user identifier):

Monit can monitor the owner user id (uid) of a file object. This test may only be used within a check file, fifo, directory or filesystem service entry in the Monit control file.

The syntax for the uid statement is:

IF FAILED UID user [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
 check file passwd with path /etc/passwd
       if failed uid root then unmonitor
If the owner of /etc/passwd is not root, Monit stops monitoring the file.

GID test:

Monit can monitor the owner group id (gid) of file objects. This test may only be used within a file, fifo, directory or filesystem service entry in the Monit control file.

The syntax for the gid statement is:

IF FAILED GID user [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
 check file shadow with path /etc/shadow
       if failed gid root then unmonitor
If the file's group is not root, Monit stops monitoring the file.

PID test (process identifier):

Monit can test the process identification number (pid) of a process for changes. This test is implicit and Monit will send an alert in the case of failure by default.

The syntax for the pid statement is:

IF CHANGED PID [[<X>] <Y> CYCLES] THEN action
 check process sshd with pidfile /var/run/sshd.pid
       if changed pid then exec "/my/script"

Uptime test:

Syntax (keywords are in capital):

IF UPTIME [[operator] value [unit]] [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]

Example of restarting the process if the uptime exceeded 3 days:

 check process myapp with pidfile /var/run/myapp.pid
    start program = "/etc/init.d/myapp start"
    stop program = "/etc/init.d/myapp stop"
    if uptime > 3 days then restart

 Connection test:

The syntax is quite involved and can be learned step by step.
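
A few common cases are enough to get started. The sketch below shows connection tests as I understand them from the Monit manual; the service names, addresses, ports and paths are hypothetical.

```
check process apache with pidfile /var/run/httpd.pid
   start program = "/etc/init.d/httpd start"
   stop program = "/etc/init.d/httpd stop"
   # TCP connection plus an HTTP-level check on the local web server
   if failed host 127.0.0.1 port 80 protocol http then restart

check host myserver with address 192.168.1.1
   # ping test against a remote host
   if failed icmp type echo count 3 with timeout 3 seconds then alert
   # plain TCP connection test
   if failed port 22 with timeout 15 seconds then alert
```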

Program status test:

You can check the exit status of a program or a script. This test may only be used within a check program service entry in the Monit control file.

An example:

 check program myscript with path "/usr/local/bin/myscript.sh" with timeout 1000 seconds
       if status != 0 then alert
The syntax of the program status statement is:

IF STATUS operator value [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]

Service poll time

There are three variants:

    custom interval based on poll cycle length multiple

          EVERY [number] CYCLES

    test schedule based on cron-style string

          EVERY [cron]

    do-not-test schedule based on cron-style string

          NOT EVERY [cron]
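
The three variants might be used as follows. This is a sketch: the service names and paths are hypothetical, and the cron strings assume the usual five fields (minute, hour, day of month, month, day of week).

```
# Check only every second poll cycle
check process myproc with pidfile /var/run/my.pid
   every 2 cycles

# Run the script only between 08:00 and 19:59, Monday to Friday
check program mycheck with path "/usr/local/bin/mycheck.sh"
   every "* 8-19 * * 1-5"
   if status != 0 then alert

# Skip checks on Sundays between 00:00 and 03:59, e.g. during backups
check process mysqld with pidfile /var/run/mysqld.pid
   not every "* 0-3 * * 0"
```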
MONIT HTTPD

set httpd port 2812 enables the built-in web server, which you can then reach at http://localhost:2812. Any port number can be used.

 set httpd port 2812
     ssl enable
     pemfile /etc/certs/monit.pem
With this configuration you can reach the web server over an SSL-secured connection at https://localhost:2812.

The pemfile, in the example above, holds both the server's private key and certificate. This file should be stored in a safe place on the filesystem and should have strict permissions, that is, no more than 0700.


If you want the httpd to accept connections on one specific address only, you can write:

  set httpd port 2812 and use the address 127.0.0.1

or

  set httpd port 2812 and use the address localhost
Without use the address, connections are accepted on any address.
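
Access can be restricted further with allow statements, which are described in the keyword table below. The sketch assumes a user name and password of my own invention (admin:monit); replace them with your own.

```
set httpd port 2812 and use the address localhost
    allow localhost        # accept connections from localhost only
    allow admin:monit      # require this user name and password
```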



Eight check statements are currently supported:

CHECK PROCESS <unique name> <PIDFILE <path> | MATCHING <regex>>

<path> is the absolute path to the program's pidfile. If the pidfile does not exist or does not contain the pid number of a running process, Monit will call the entry's start method if defined. <regex> is an alternative process specification using pattern matching to process name (command line) from process table instead of pidfile. The first match is used so this form of check is useful for unique pattern matching - the pidfile should be used where possible as it defines expected pid exactly (pattern matching won't be useful for Apache in most cases for example). The pattern can be obtained using monit procmatch ".*" CLI command which lists all processes visible to Monit or using the ps utility. The "procmatch" CLI command can be used to test your pattern as well. If Monit runs in passive mode or the start method is not defined, Monit will just send alerts on errors.
CHECK FILE <unique name> PATH <path>

<path> is the absolute path to the file. If the file does not exist or disappeared, Monit will call the entry's start method if defined, if <path> does not point to a regular file type (for instance a directory), Monit will disable monitoring of this entry. If Monit runs in passive mode or the start method is not defined, Monit will just send alerts on errors.
CHECK FIFO <unique name> PATH <path>

<path> is the absolute path to the fifo. If the fifo does not exist or disappeared, Monit will call the entry's start method if defined, if <path> does not point to a fifo type (for instance a directory), Monit will disable monitoring of this entry. If Monit runs in passive mode or the start method is not defined, Monit will just send alerts on errors.
CHECK FILESYSTEM <unique name> PATH <path>

<path> is the path to the filesystem block special device, mount point, file or a directory which is part of a filesystem. It is recommended to use a block special file directly (for example /dev/hda1 on Linux or /dev/dsk/c0t0d0s1 on Solaris, etc.) If you use a mount point (for example /data), be careful, because if the filesystem is unmounted the test will still be true because the mount point exists.

If the filesystem becomes unavailable, Monit will call the entry's start method if defined. If <path> does not point to a filesystem, Monit will disable monitoring of this entry. If Monit runs in passive mode or the start method is not defined, Monit will just send alerts on errors.
CHECK DIRECTORY <unique name> PATH <path>

<path> is the absolute path to the directory. If the directory does not exist or disappeared, Monit will call the entry's start method if defined, if <path> does not point to a directory, Monit will disable monitoring of this entry. If Monit runs in passive mode or the start method is not defined, Monit will just send alerts on errors.
CHECK HOST <unique name> ADDRESS <host address>

The host address can be specified as a hostname string or as an ip-address string on a dotted decimal format. Such as, tildeslash.com or "64.87.72.95".
CHECK SYSTEM <unique name>

The system name is usually hostname, but any descriptive name can be used. You can use the variable $HOST as the name, which will expand to the hostname. This test allows one to check general system resources such as CPU usage (percent of time spent in user, system and wait), total memory usage or load average. The unique name is used as the system hostname in mail alerts and when M/Monit is configured, then also as initial name of the host entry in M/Monit.
CHECK PROGRAM <unique name> PATH <executable file> [TIMEOUT <number> SECONDS]

<executable file> is the absolute path to the executable program or script. The status test allows one to check the program's exit status. If the program does not finish within <number> seconds, Monit will terminate it. The manual text quoted here gives the default program timeout as "600 seconds (5 minutes)"; the two figures disagree (600 seconds is 10 minutes), so set TIMEOUT explicitly if the value matters.

You can use the words 'if', 'and', 'with(in)', 'has', 'using', 'use', 'on(ly)', 'usage' and 'program(s)' anywhere in the control file to make it read more like English. Monit ignores them when it reads the file.
 Here are the legal global keywords:
 Keyword         Function
 ----------------------------------------------------------------
 set daemon      Set a background poll interval in seconds.
 set init        Set Monit to run from init. Monit will not
                 transform itself into a daemon process.
 set logfile     Name of a file to dump error- and status-
                 messages to. If syslog is specified as the 
                 file, Monit will utilize the syslog daemon
                 to log messages. This can optionally be 
                 followed by 'facility <facility>' where 
                 facility is 'log_local0' - 'log_local7' or 
                 'log_daemon'. If no facility is specified, 
                 LOG_USER is used.
 set mailserver  The mailserver used for sending alert
                 notifications. If the mailserver is not 
                 defined, Monit will try to use 'localhost' 
                 as the smtp-server for sending mail. You 
                 can add more mail servers, if Monit cannot
                 connect to the first server it will try the
                 next server and so on.
 set mail-format Set a global mail format for all alert
                 messages emitted by monit.
 set idfile      Explicitly set the location of the Monit id
                 file. E.g. set idfile /var/monit/id.
 set pidfile     Explicitly set the location of the Monit lock
                 file. E.g. set pidfile /var/run/xyzmonit.pid.
 set statefile   Explicitly set the location of the file Monit 
                 will write state data to. If not set, the
                 default is $HOME/.monit.state. 
 set httpd port  Activates Monit http server at the given 
                 port number.
 ssl enable      Enables ssl support for the httpd server.
                 Requires the use of the pemfile statement.
 ssl disable     Disables ssl support for the httpd server.
                 It is equal to omitting any ssl statement.
 pemfile         Set the pemfile to be used with ssl.
 clientpemfile   Set the pemfile to be used when client
                 certificates should be checked by monit.
 address         If specified, the http server will only 
                 accept connection requests to this address.
                 This statement is an optional part of the
                 set httpd statement.
 allow           Specifies a host or IP address allowed to
                 connect to the http server. Can also specify
                 a username and password allowed to connect
                 to the server. More than one allow statement
                 is allowed. This statement is also an 
                 optional part of the set httpd statement.
 read-only       Set the user defined in username:password
                 to read only. A read-only user cannot change
                 a service from the Monit web interface.
 include         Include a file or files matching the glob string.
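
The global keywords above could be combined into a control-file header along the following lines. This is only a sketch: the poll interval, the file locations other than the idfile example from the table, the mail server name mail.example.org, the sender address, the credentials and the include path are all placeholders chosen for illustration.

```
set daemon 120                          # poll every 120 seconds
set logfile syslog facility log_daemon  # log through the syslog daemon
set idfile /var/monit/id
set pidfile /var/run/monit.pid
set statefile /var/monit/state

# try mail.example.org first, then fall back to localhost
set mailserver mail.example.org, localhost
set mail-format { from: monit@example.org }

set httpd port 2812 and
    use address localhost        # only accept connections to localhost
    allow localhost              # allow connections from localhost
    allow admin:secret           # user with full access
    allow guest:guest read-only  # user who cannot change services

include /etc/monit.d/*
```

Keeping the per-service entries in separate files under the included directory leaves this header short and easy to check with `monit -t`.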

 Here are the legal service entry keywords:
 Keyword         Function
 ----------------------------------------------------------------
 check           Starts an entry and must be followed by the type
                 of monitored service {filesystem|directory|file|host|
                 process|system|program} and a descriptive name for
                 the service.
 pidfile         Specify the process pidfile. Every
                 process must create a pidfile with its
                 current process id. This statement should only
                 be used in a process service entry.
 path            Must be followed by a path to the block
                 special file for filesystem, regular
                 file, directory or a process's pidfile.
 group           Specify a groupname for a service entry.
 start           The program used to start the specified 
                 service. Full path is required. This 
                 statement is optional, but recommended.
 stop            The program used to stop the specified
                 service. Full path is required. This 
                 statement is optional, but recommended.
 pid and ppid    These keywords may be used as standalone
                 statements in a process service entry to
                 override the alert action for change of
                 process pid and ppid.
 uid and gid     These keywords are either 1) an optional part of
                 a start, stop or exec statement. They may be
                 used to specify a user id and a group id the
                 program (process) should switch to upon start.
                 This feature can only be used if the superuser
                 is running monit. 2) uid and gid may also be
                 used as standalone statements in a file service
                 entry to test a file's uid and gid attributes.
 host            The hostname or IP address to test the port
                 at. This keyword can only be used together
                 with a port statement or in the check host
                 statement.
 port            Specify a TCP/IP service port number which 
                 a process is listening on. This statement
                 is also optional. If this statement is not
                 prefixed with a host-statement, localhost is
                 used as the hostname to test the port at.
 type            Specifies the socket type Monit should use when
                 testing a connection to a port. If the type
                 keyword is omitted, tcp is used. This keyword
                 must be followed by either tcp, udp or tcpssl.
 tcp             Specifies that Monit should use a TCP 
                 socket type (stream) when testing a port.
 tcpssl          Specifies that Monit should use a TCP socket
                 type (stream) and the secure socket layer (ssl)
                 when testing a port connection.
 udp             Specifies that Monit should use a UDP socket
                 type (datagram) when testing a port.
 certmd5         The md5 sum of the certificate that an ssl 
                 server has to deliver.
 proto(col)      This keyword specifies the type of service 
                 found at the port. See CONNECTION TESTING
                 for list of supported protocols.
                 You're welcome to write new protocol test
                 modules. If no protocol is specified Monit will
                 use a default test which in most cases is good
                 enough.
 request         Specifies a server request and must come
                 after the protocol keyword mentioned above.
                  - for http it can contain an URL and an
                    optional query string.
                  - other protocols do not support this
                    statement yet
 send/expect     These keywords specify a generic protocol. 
                 Each requires a string, either to be sent or
                 to be matched against (as extended regex if 
                 supported). Send/expect cannot be used 
                 together with the proto(col) statement.
 unix(socket)    Specifies a Unix socket file and used like 
                 the port statement above to test a Unix 
                 domain network socket connection.
 URL             Specify an URL string which Monit will use for
                 connection testing.
 content         Optional sub-statement for the URL statement.
                 Specifies that Monit should test the content
                 returned by the server against a regular 
                 expression.
 timeout x sec.  Define a network port connection timeout. Must
                 be followed by a number in seconds and the 
                 keyword, seconds.
 timeout         Define a service timeout. Must be followed by
                 two digits. The first digit is max number of
                 restarts for the service. The second digit
                 is the cycle interval to test restarts. 
                 This statement is optional.
 alert           Specifies an email address for notification
                 if a service event occurs. Alert can also
                 be postfixed, to only send a message for
                 certain events. See the examples above. More
                 than one alert statement is allowed in an
                 entry. This statement is also optional.
 noalert         Specifies an email address that should not
                 receive alerts. This statement is also
                 optional.
 restart, stop   These keywords may be used as actions for 
 unmonitor,      various test statements. The exec statement is
 start and       special in that it requires a following string
 exec            specifying the program to be executed. You may
                 also specify an UID and GID for the exec 
                 statement. The program executed will then run
                 using the specified user id and group id.
 mail-format     Specifies a mail format for an alert message 
                 This statement is an optional part of the
                 alert statement.
 checksum        Specify that Monit should compute and monitor a
                 file's md5/sha1 checksum. May only be used in a 
                 check file entry.
 expect          Specifies a md5/sha1 checksum string Monit 
                 should expect when testing the checksum. This 
                 statement is an optional part of the checksum 
                 statement.
 timestamp       Specifies an expected timestamp for a file
                 or directory. More than one timestamp statement
                 is allowed. May only be used in a check file or
                 check directory entry.
 changed         Part of a timestamp statement and used as an
                 operator to simply test for a timestamp change.
 every           Validate this entry only at every n poll cycle
                 or per cron specification. Useful in daemon mode
                 when the cycle is short and a service takes some
                 time to start or to suppress monitoring during
                 backup windows.
 mode            Must be followed either by the keyword active,
                 passive or manual. If active, Monit will restart
                 the service if it is not running (this is the
                 default behavior). If passive, Monit will not
                 (re)start the service if it is not running - it
                 will only monitor and send alerts (resource
                 related restart and stop options are ignored
                 in this mode also). If manual, Monit will enter
                 active mode only if a service was started under
                 monit's control otherwise the service isn't
                 monitored.
 cpu             Must be followed by a compare operator, a number 
                 with "%" and an action. This statement is used
                 to check the cpu usage in percent of a process
                 with its children over a number of cycles. If
                 the compare expression matches then the 
                 specified action is executed.
 mem             The equivalent to the cpu token for memory of a 
                 process (w/o children!).  This token must be 
                 followed by a compare operator a number with 
                 unit {B|KB|MB|GB|%|byte|kilobyte|megabyte|
                 gigabyte|percent} and an action.
 swap            Token for system swap usage monitoring. This token
                 must be followed by a compare operator a number with 
                 unit {B|KB|MB|GB|%|byte|kilobyte|megabyte|gigabyte|percent}
                 and an action.
 loadavg         Must be followed by [1min,5min,15min] in (), a 
                 compare operator, a number and an action. This
                 statement is used to check the system load 
                 average over a number of cycles. If the compare 
                 expression matches then the specified action is 
                 executed.
 children        This is the number of child processes spawned by a
                 process. The syntax is the same as above.
 totalmem        The equivalent of mem, except totalmem is an
                 aggregation of memory, not only used by a
                 process but also by all its child
                 processes. The syntax is the same as above.
 space           Must be followed by a compare operator, a
                 number, unit {B|KB|MB|GB|%|byte|kilobyte|
                 megabyte|gigabyte|percent} and an action.
 inode(s)        Must be followed by a compare operator, integer
                 number, optionally by percent sign (if not, the
                 limit is absolute) and an action.
 perm(ission)    Must be followed by an octal number describing
                 the permissions.
 size            Must be followed by a compare operator, a
                 number, unit {B|KB|MB|GB|byte|kilobyte|
                 megabyte|gigabyte} and an action.
 uptime          Must be followed by a compare operator, a
                 number, unit {second(s)|minute(s)|hour(s)|day(s)}
                 and an action.
 depends (on)    Must be followed by the name of a service this
                 service depends on.
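
A few of the keywords in the table (size, uptime, noalert, URL with content) do not appear in the configuration examples later in these notes, so here is a hedged sketch of how they might be written. The service names, paths, addresses, limits and the expected page content are invented for illustration, and the exact forms should be checked against the manual of the Monit version in use.

```
# size and permission tests on a file, with one address excluded from alerts
check file mydb with path /data/mydatabase.db
      if size > 1 GB then alert
      if failed permission 640 then alert
      noalert dev@example.org

# uptime and children tests on a process
check process myapp with pidfile /var/run/myapp.pid
      if uptime > 3 days then alert
      if children > 50 then alert

# URL test with a content check against a regular expression
check host web with address www.example.org
      if failed url http://www.example.org/index.html
         and content == "Welcome"
      then alert
```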
Every process has a pid. Monit's own pid is stored in /var/run/monit.pid, a file that contains a single number.


CONFIGURATION EXAMPLES

The simplest form is just the check statement. In this example we check to see if the server is running and log a message if not:

 check process resin with pidfile /usr/local/resin/srun.pid

Checking process without pidfile:

 check process pager matching "/sbin/dynamic_pager -F /private/var/vm/swapfile"

To have Monit start the server if it's not running, add a start statement:

 check process resin with pidfile /usr/local/resin/srun.pid
       start program = "/usr/local/resin/bin/srun.sh start"
       stop program = "/usr/local/resin/bin/srun.sh stop"

Here's a more advanced example for monitoring an apache web-server listening on the default port numbers for HTTP and HTTPS. In this example Monit will restart apache if it's not accepting connections at the port numbers. The method Monit uses for a process restart is to first execute the stop-program, wait up to 30s for the process to stop, and then execute the start-program and wait up to 30s for it to start. The length of the start or stop timeout can be overridden using the 'timeout' option. If Monit was unable to stop or start the service, a failed alert message will be sent if you have requested alert messages to be sent.

 check process apache with pidfile /var/run/httpd.pid
       start program = "/etc/init.d/httpd start" with timeout 60 seconds
       stop program  = "/etc/init.d/httpd stop"
       if failed port 80 then restart
       if failed port 443 with timeout 15 seconds then restart

This example demonstrates how you can run a program as a specified user (uid) and with a specified group (gid). Many daemon programs will do the uid and gid switch by themselves, but for those programs that do not (e.g. Java programs), Monit's ability to start a program as a certain user can be very useful. In this example we start the Tomcat Java Servlet Engine as the standard nobody user and group. Please note that Monit will only switch uid and gid for a program if the super-user is running Monit, otherwise Monit will simply ignore the request to change uid and gid.

 check process tomcat with pidfile /var/run/tomcat.pid
       start program = "/etc/init.d/tomcat start" 
             as uid nobody and gid nobody
       stop program  = "/etc/init.d/tomcat stop"
             # You can also use id numbers instead and write:
             as uid 99 and with gid 99
       if failed port 8080 then alert

In this example we use udp for connection testing to check if the name-server is running and also use timeout and alert:

 check process named with pidfile /var/run/named.pid
       start program = "/etc/init.d/named start"
       stop program  = "/etc/init.d/named stop"
       if failed port 53 use type udp protocol dns then restart
       if 3 restarts within 5 cycles then timeout

The following example illustrates how to check if the service 'sophie' is answering connections on its Unix domain socket:

 check process sophie with pidfile /var/run/sophie.pid
       start program = "/etc/init.d/sophie start"
       stop  program = "/etc/init.d/sophie stop"
       if failed unix /var/run/sophie then restart

In this example we check an apache web-server running on localhost that answers for several IP-based virtual hosts or vhosts, hence the host statement before port:

 check process apache with pidfile /var/run/httpd.pid
       start "/etc/init.d/httpd start"
       stop  "/etc/init.d/httpd stop"
       if failed host www.sol.no port 80 then alert
       if failed host shop.sol.no port 443 then alert
       if failed host chat.sol.no port 80 then alert
       if failed host www.tildeslash.com port 80 then alert

To make sure that Monit is communicating with an http server, a protocol test can be added:

 check process apache with pidfile /var/run/httpd.pid
       start "/etc/init.d/httpd start"
       stop  "/etc/init.d/httpd stop"
       if failed host www.sol.no port 80 
          protocol HTTP
          then alert

This example shows a different way to check a webserver using the send/expect mechanism:

 check process apache with pidfile /var/run/httpd.pid
       start "/etc/init.d/httpd start"
       stop  "/etc/init.d/httpd stop"
       if failed host www.sol.no port 80 
          send "GET / HTTP/1.0\r\nHost: www.sol.no\r\n\r\n"
          expect "HTTP/[0-9\.]{3} 200 .*\r\n"
          then alert

To make sure that Apache is logging successfully (i.e. no more than 60 percent of child servers are logging), use its mod_status page at www.sol.no/server-status with this special protocol test:

 check process apache with pidfile /var/run/httpd.pid
       start "/etc/init.d/httpd start"
       stop  "/etc/init.d/httpd stop"
       if failed host www.sol.no port 80
       protocol apache-status loglimit > 60% then restart

This configuration can be used to alert you if 25 percent or more of Apache child processes are stuck performing DNS lookups:

 check process apache with pidfile /var/run/httpd.pid
       start "/etc/init.d/httpd start"
       stop  "/etc/init.d/httpd stop"
       if failed host www.sol.no port 80
       protocol apache-status dnslimit > 25% then alert

Here we use an icmp ping test to check if a remote host is up and, if not, send an alert:

 check host www.tildeslash.com with address www.tildeslash.com
       if failed icmp type echo count 5 with timeout 15 seconds
          then alert

In the following example we ask Monit to compute and verify the checksum for the underlying apache binary used by the start and stop programs. If the checksum test should fail, monitoring will be disabled to prevent possibly starting a compromised binary:

 check process apache with pidfile /var/run/httpd.pid
       start program = "/etc/init.d/httpd start"
       stop program  = "/etc/init.d/httpd stop"
       if failed host www.tildeslash.com port 80 then restart
       depends on apache_bin
 check file apache_bin with path /usr/local/apache/bin/httpd
       if failed checksum then unmonitor

In this example we ask Monit to test the checksum for a document on a remote server. If the checksum was changed we send an alert:

 check host tildeslash with address www.tildeslash.com
       if failed port 80 protocol http 
          and request "/monit/dist/monit-4.0.tar.gz"
              with checksum f9d26b8393736b5dfad837bb13780786
       then alert

Here are a couple of tests for some popular communication servers, using the SIP protocol. First we test a FreeSWITCH server and then an Asterisk server:

 check process freeswitch 
    with pidfile /usr/local/freeswitch/log/freeswitch.pid
  start program = "/usr/local/freeswitch/bin/freeswitch -nc -hp"
  stop program = "/usr/local/freeswitch/bin/freeswitch -stop"
  if totalmem > 1000.0 MB for 5 cycles then alert
  if totalmem > 1500.0 MB for 5 cycles then alert
  if totalmem > 2000.0 MB for 5 cycles then restart
  if cpu > 60% for 5 cycles then alert
  if failed port 5060 type udp protocol SIP 
     target [email protected] and maxforward 10 
  then restart
  if 5 restarts within 5 cycles then timeout
 check process asterisk 
   with pidfile /var/run/asterisk/asterisk.pid
   start program = "/usr/sbin/asterisk"
   stop program = "/usr/sbin/asterisk -r -x 'shutdown now'"
   if totalmem > 1000.0 MB for 5 cycles then alert
   if totalmem > 1500.0 MB for 5 cycles then alert
   if totalmem > 2000.0 MB for 5 cycles then restart
   if cpu > 60% for 5 cycles then alert
   if failed port 5060 type udp protocol SIP 
     and target [email protected] maxforward 10 
   then restart
   if 5 restarts within 5 cycles then timeout

Some servers are slow starters, for example Java-based application servers. So if we want to keep the poll cycle low (i.e. < 60 seconds) but allow some services to take their time to start, the every statement is handy:

 check process dynamo with pidfile /etc/dynamo.pid every 2 cycles
       start program = "/etc/init.d/dynamo start"
       stop program  = "/etc/init.d/dynamo stop"
       if failed port 8840 then alert
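
The keyword table earlier also mentions that every accepts a cron specification, which is the form suited to backup windows. The following is a sketch of that form as I understand it; the mysqld entry, its pidfile path and both time ranges are assumptions made for illustration, so verify the cron-style syntax against the manual of your Monit version.

```
# Test dynamo only between 08:00 and 19:59, Monday to Friday
# (fields: minute hour day-of-month month weekday)
check process dynamo with pidfile /etc/dynamo.pid
      every "* 8-19 * * 1-5"
      start program = "/etc/init.d/dynamo start"
      stop program  = "/etc/init.d/dynamo stop"
      if failed port 8840 then alert

# Skip testing during a backup window, Sunday 00:00 to 03:59
check process mysqld with pidfile /var/run/mysqld.pid
      not every "* 0-3 * * 0"
```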

Here is an example where we group together two database entries so you can manage them together, e.g. 'monit -g database start all'. The mode statement is also illustrated in the first entry and has the effect that Monit will not try to (re)start this service if it is not running:

 check process sybase with pidfile /var/run/sybase.pid
       start = "/etc/init.d/sybase start"
       stop  = "/etc/init.d/sybase stop"
       mode passive
       group database
 check process oracle with pidfile /var/run/oracle.pid
       start program = "/etc/init.d/oracle start"
       stop program  = "/etc/init.d/oracle stop"
       mode active # Not necessary really, since it's the default
       if failed port 9001 then restart
       group database

Here is an example to show the usage of the resource checks. It will send an alert when the CPU usage of the http daemon alone rises above 40% for two cycles, or when the CPU usage of the daemon and its child processes rises above 60% for two cycles. Apache is restarted if the total CPU usage is over 80% for five cycles, and stopped if the memory usage is over 100 MB for five cycles or if the machine's 5-minute load average is more than 10 for 8 cycles:

 check process apache with pidfile /var/run/httpd.pid
       start program = "/etc/init.d/httpd start"
       stop program  = "/etc/init.d/httpd stop"
       if cpu > 40% for 2 cycles then alert
       if totalcpu > 60% for 2 cycles then alert
       if totalcpu > 80% for 5 cycles then restart
       if mem > 100 MB for 5 cycles then stop
       if loadavg(5min) greater than 10.0 for 8 cycles then stop

This example demonstrates the timestamp statement with exec and how you may restart apache if its configuration file was changed.

 check file httpd.conf with path /etc/httpd/httpd.conf
       if changed timestamp
          then exec "/etc/init.d/httpd graceful"

In this example we demonstrate usage of the extended alert statement and a file check dependency:

 check process apache with pidfile /var/run/httpd.pid
      start = "/etc/init.d/httpd start"
      stop  = "/etc/init.d/httpd stop"
      alert admin@bar on {nonexist, timeout} 
        with mail-format { 
              from:     bofh@$HOST
              subject:  apache $EVENT - $ACTION
              message:  This event occurred on $HOST at $DATE. 
              Your faithful employee,
              monit
      }
      if failed host www.tildeslash.com  port 80 then restart
      if 3 restarts within 5 cycles then timeout
      depend httpd_bin
      group apache
 check file httpd_bin with path /usr/local/apache/bin/httpd
       alert security@bar on {checksum, timestamp, 
                  permission, uid, gid}
             with mail-format {subject: Alaaarrm! on $HOST}
       if failed checksum 
          and expect 8f7f419955cefa0b33a2ba316cba3659
              then unmonitor
       if failed permission 755 then unmonitor
       if failed uid root then unmonitor
       if failed gid root then unmonitor
       if changed timestamp then alert
       group apache

In this example, we demonstrate usage of the depend statement. In this case, we want to start oracle and apache. However, we've set up apache to use oracle as a back end, and if oracle is restarted, apache must be restarted as well.

 check process apache with pidfile /var/run/httpd.pid
       start = "/etc/init.d/httpd start"
       stop  = "/etc/init.d/httpd stop"
       depends on oracle
 check process oracle with pidfile /var/run/oracle.pid
       start = "/etc/init.d/oracle start"
       stop  = "/etc/init.d/oracle stop"
       if failed port 9001 then restart

Next, we have 2 services, oracle-import and oracle-export, that need to be restarted if oracle is restarted, but are independent of each other.

 check process oracle with pidfile /var/run/oracle.pid
       start = "/etc/init.d/oracle start"
       stop  = "/etc/init.d/oracle stop"
       if failed port 9001 then restart
 check process oracle-import 
      with pidfile /var/run/oracle-import.pid
       start = "/etc/init.d/oracle-import start"
       stop  = "/etc/init.d/oracle-import stop"
       depends on oracle
 check process oracle-export 
      with pidfile /var/run/oracle-export.pid
       start = "/etc/init.d/oracle-export start"
       stop  = "/etc/init.d/oracle-export stop"
       depends on oracle

Finally an example with all statements:

 check process apache with pidfile /var/run/httpd.pid
       start program = "/etc/init.d/httpd start"
       stop program  = "/etc/init.d/httpd stop"
       if 3 restarts within 5 cycles then timeout
       if failed host www.sol.no  port 80 protocol http
          and use the request "/login.cgi"
              then alert
       if failed host shop.sol.no port 443 type tcpssl 
          protocol http and with timeout 15 seconds 
              then restart
       if cpu is greater than 60% for 2 cycles then alert
       if cpu > 80% for 5 cycles then restart
       if totalmem > 100 MB then stop
       if children > 200 then alert
       alert bofh@bar with mail-format {from: [email protected]}
       every 2 cycles
       mode active
       depends on weblogic
       depends on httpd.pid
       depends on httpd.conf
       depends on httpd_bin
       depends on datafs
       group server
 check file httpd.pid with path /usr/local/apache/logs/httpd.pid
       group server
       if timestamp > 7 days then restart
       every 2 cycles
       alert bofh@bar with mail-format {from: [email protected]}
       depends on datafs
 check file httpd.conf with path /etc/httpd/httpd.conf
       group server
       if timestamp was changed 
          then exec "/usr/local/apache/bin/apachectl graceful"
       every 2 cycles
       alert bofh@bar with mail-format {from: [email protected]}
       depends on datafs
 check file httpd_bin with path /usr/local/apache/bin/httpd
       group server
       if failed checksum and expect the sum
          8f7f419955cefa0b33a2ba316cba3659 then unmonitor
       if failed permission 755 then unmonitor
       if failed uid root then unmonitor
       if failed gid root then unmonitor
       if changed size then alert
       if changed timestamp then alert
       every 2 cycles
       alert bofh@bar with mail-format {from: [email protected]}
       alert foo@bar on { checksum, size, timestamp, uid, gid } 
       depends on datafs
 check filesystem datafs with path /dev/sdb1
       group server
       start program  = "/bin/mount /data"
       stop program  =  "/bin/umount /data"
       if failed permission 660 then unmonitor
       if failed uid root then unmonitor
       if failed gid disk then unmonitor
       if space usage > 80 % then alert
       if space usage > 94 % then stop
       if inode usage > 80 % then alert
       if inode usage > 94 % then stop
       alert root@localhost
 check host ftp.redhat.com with address ftp.redhat.com
       if failed icmp type echo with timeout 15 seconds
          then alert 
       if failed port 21 protocol ftp
          then exec "/usr/X11R6/bin/xmessage -display
                     :0 ftp connection failed"
       alert [email protected]
 
 check host www.gnu.org with address www.gnu.org
       if failed port 80 protocol http 
          and request "/pub/gnu/bash/bash-2.05b.tar.gz"
              with checksum 8f7f419955cefa0b33a2ba316cba3659
       then alert
       alert [email protected] with mail-format {
            subject: The gnu server may be hacked again! }
