3. Nagios: How It Works and Configuration in Detail

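1. How Nagios monitors Linux machines

NRPE consists of two parts:

(1) The check_nrpe plugin, which runs on the monitoring host. For server-side installation, see:
(2) The NRPE daemon, which runs on the remote Linux host (usually the monitored machine). For client-side installation, see:

Figure 1

As Figure 1 shows, when Nagios needs to check a service or resource on a remote Linux host, the process is:

1) Nagios runs the check_nrpe plugin; the Nagios configuration files tell it what to check.
2) The check_nrpe plugin connects to the remote NRPE daemon over SSL.
3) The NRPE daemon runs the corresponding Nagios plugin to check the local resource or service.
4) The NRPE daemon returns the result to the check_nrpe plugin, which hands it to Nagios for processing.

Note: the NRPE daemon needs the Nagios plugins to be installed on the monitored remote Linux host; without them the daemon cannot perform any checks.

2. Nagios configuration files in detail

Note: all Nagios configuration files live under /usr/local/nagios/etc. Their roles are as follows:

# Configuration file that controls CGI access
cgi.cfg
# Main Nagios configuration file
nagios.cfg
# resource.cfg defines variables, such as $USER1$, that other files can reference
resource.cfg
# objects is a directory used to define Nagios objects
objects
# servers is a directory I created myself; Nagios can load every configuration file in a directory (this has to be enabled in nagios.cfg)
servers

Files under ./objects:

# Command definitions; commands defined here can be referenced from other files
commands.cfg
# Contacts and contact groups
contacts.cfg
# Configuration for monitoring the local machine
localhost.cfg
# Sample configuration for monitoring a printer (not enabled by default)
printer.cfg
# Sample configuration for monitoring a router (not enabled by default)
switch.cfg
# Template definitions; templates defined here can be referenced from other files
templates.cfg
# Definitions of monitoring time periods
timeperiods.cfg
# Sample configuration for monitoring Windows (not enabled by default)
windows.cfg

Files under ./servers:

# Host group configuration file I created myself
hostgroup.cfg
# Configuration file I created for monitoring a remote Linux host
wiki-l-11.cfg

3. How are the configuration files referenced?

Nagios is mainly used to monitor all kinds of information about a host, including its local resources and the services it offers to the outside. In Nagios each of these is defined as an item (Nagios calls them services; to keep them apart from the services a host provides, I use the word "item" here), and every monitored item is implemented through a command defined in commands.cfg.

To avoid defining the same things over and over, Nagios provides a template configuration file (templates.cfg): shared attributes are defined once as a template and then referenced as often as needed.

Suppose we have a monitoring item that checks whether a machine's web service is working. Which elements do we need? The three most important are: which machine to monitor, which command performs the check, and which contact to notify when something goes wrong.

First we define, in commands.cfg, the commands that check remote services and resources, as well as the commands that send mail. Most of the commands that check remote services and resources are implemented by the scripts under /usr/local/nagios/libexec; the ping check, for example, is check_ping. The usage of any script under /usr/local/nagios/libexec can be displayed with the -h option, for example:

[codesyntax lang="text"]
-----------------------------------------------------------------------------------------
/usr/local/nagios/libexec/check_ping -h
check_ping v1991 (nagios-plugins 1.4.13)
Copyright (c) 1999 Ethan Galstad <[email protected]>
Copyright (c) 2000-2007 Nagios Plugin Development Team <[email protected]>

Use ping to check connection statistics for a remote host.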
Usage: check_ping -H <host_address> -w <wrta>,<wpl>% -c <crta>,<cpl>% [-p packets] [-t timeout] [-4|-6]

Options:
 -h, --help
    Print detailed help screen
 -V, --version
    Print version information
 -4, --use-ipv4
    Use IPv4 connection
 -6, --use-ipv6
    Use IPv6 connection
 -H, --hostname=HOST
    host to ping
 -w, --warning=THRESHOLD
    warning threshold pair
 -c, --critical=THRESHOLD
    critical threshold pair
 -p, --packets=INTEGER
    number of ICMP ECHO packets to send (Default: 5)
 -L, --link
    show HTML in the plugin output (obsoleted by urlize)
 -t, --timeout=INTEGER
    Seconds before connection times out (default: 10)
-----------------------------------------------------------------------------------------
[/codesyntax]

Next we define the contacts and contact groups in contacts.cfg and the monitoring time periods in timeperiods.cfg. Finally, the server monitoring configuration file references the elements defined above to monitor the state of the server.

4. Main configuration files explained

Figure 2

4.1 resource.cfg

The excerpts below quote part of each configuration file for illustration:

[codesyntax lang="text"]
vi /usr/local/nagios/etc/resource.cfg
# Define the $USER1$ variable, which sets the plugin path
$USER1$=/usr/local/nagios/libexec
[/codesyntax]

4.2 commands.cfg

[codesyntax lang="text"]
vi /usr/local/nagios/etc/objects/commands.cfg
# Define the check-host-alive command
define command{
    command_name    check-host-alive    ; command name
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
# $USER1$ above comes from resource.cfg; $HOSTADDRESS$ is a built-in Nagios macro that is filled in
# from the address field of the host being checked. Neither has to be defined in commands.cfg to be used there.
########################################################################
#
# 2008.11.18 add by Stone
# NRPE COMMAND
# Custom check_nrpe command. It must be followed by one argument that tells the NRPE daemon
# on the remote server what to check; for example, check_swap checks the swap partition of the remote machine.
########################################################################
# 'check_nrpe' command definition
define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
[/codesyntax]

4.3 contacts.cfg

[codesyntax lang="text"]
vi /usr/local/nagios/etc/objects/contacts.cfg
# Define a contact
define contact{
    contact_name    nagiosadmin         ; Short name of user
    use             generic-contact     ; Inherit default values from generic-contact template (defined above)
    alias           Nagios Admin        ; Full name of user
    email           [email protected]   ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
}
# The generic-contact template used above is defined in templates.cfg.
# Define a contact group
define contactgroup{
    contactgroup_name   admins
    alias               Nagios Administrators
    members             nagiosadmin     ; several contacts can be listed here, separated by commas
}
[/codesyntax]

4.4 timeperiods.cfg

[codesyntax lang="text"]
vi /usr/local/nagios/etc/objects/timeperiods.cfg
# Define the monitoring time period
define timeperiod{
    timeperiod_name 24x7                ; monitor around the clock (7 x 24 hours)
    alias           24 Hours A Day, 7 Days A Week
    sunday          00:00-24:00
    monday          00:00-24:00
    tuesday         00:00-24:00
    wednesday       00:00-24:00
    thursday        00:00-24:00
    friday          00:00-24:00
    saturday        00:00-24:00
}
[/codesyntax]

4.5 templates.cfg

[codesyntax lang="text"]
vi /usr/local/nagios/etc/objects/templates.cfg
# Define the generic-contact contact template. It is not a real contact; real contacts are defined in contacts.cfg.
define contact{
    name                            generic-contact             ; The name of this contact template
    service_notification_period     24x7                        ; service notifications can be sent anytime
    host_notification_period        24x7                        ; host notifications can be sent anytime
    service_notification_options    w,u,c,r,f,s                 ; send notifications for all service states, flapping events, and scheduled downtime events
    host_notification_options       d,u,r,f,s                   ; send notifications for all host states, flapping events, and scheduled downtime events
    service_notification_commands   notify-service-by-email     ; send service notifications via email
    host_notification_commands      notify-host-by-email        ; send host notifications via email
    register                        0                           ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
}
[/codesyntax]

Parameter notes:

------------------------------------------------------------------------------------------------------------------
service_notification_period 24x7
The time period during which notifications about service problems are sent; this is the period defined in timeperiods.cfg above.

host_notification_period 24x7
The time period during which notifications about host problems are sent; again the period defined in timeperiods.cfg above.

service_notification_options w,u,c,r
Contacts are notified when a service enters one of these four states: w (warning), u (unknown), c (critical), or r (recovered from a problem state). The template above additionally lists f (flapping events) and s (scheduled downtime events).

host_notification_options d,u,r
Contacts are notified when a host enters one of these three states: d (down), u (unreachable), or r (recovered from a problem state). The template above additionally lists f (flapping events) and s (scheduled downtime events).

service_notification_commands notify-service-by-email
The command used to send notifications about service problems is notify-service-by-email. It is defined in commands.cfg and sends mail to the contact.

host_notification_commands notify-host-by-email
As above: host problems are also reported to the contact by mail.
------------------------------------------------------------------------------------------------------------------

[codesyntax lang="text"]
# Define the generic-host host template
define host{
    name                            generic-host    ; The name of this host template
    notifications_enabled           1               ; Host notifications are enabled
    event_handler_enabled           1               ; Host event handler is enabled
    flap_detection_enabled          1               ; Flap detection is enabled
    failure_prediction_enabled      1               ; Failure prediction is enabled
    process_perf_data               1               ; Process performance data
    retain_status_information       1               ; Retain status information across program restarts
    retain_nonstatus_information    1               ; Retain non-status information across program restarts
    notification_period             24x7            ; Send host notifications at any time
    register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
# Define the Linux host template
define host{
    name                    linux-server        ; The name of this host template
    use                     generic-host        ; This template inherits other values from the generic-host template
    check_period            24x7                ; By default, Linux hosts are checked round the clock
    check_interval          5                   ; Actively check the host every 5 minutes
    retry_interval          1                   ; Schedule host check retries at 1 minute intervals
    max_check_attempts      10                  ; Check each Linux host 10 times (max)
    check_command           check-host-alive    ; Default command to check Linux hosts
    notification_period     workhours           ; Linux admins hate to be woken up, so we only notify during the day
                                                ; Note that the notification_period variable is being overridden from
                                                ; the value that is inherited from the generic-host template!
    notification_interval   120                 ; Resend notifications every 2 hours
    notification_options    d,u,r               ; Only send notifications for specific host states
    contact_groups          admins              ; Notifications get sent to the admins by default
    register                0                   ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
[/codesyntax]

4.6 nagios.cfg

[codesyntax lang="text"]
vi /usr/local/nagios/etc/nagios.cfg
# In nagios.cfg, enable loading of the configuration files under /usr/local/nagios/etc/servers/.
cfg_dir=/usr/local/nagios/etc/servers
[/codesyntax]

5. Configuring the files under servers

5.1 Defining the host configuration

# This file monitors a remote Linux host. To monitor more hosts, simply copy it and adjust the copy.
# Keep in mind that the commands used in wiki-l-11.cfg are defined in commands.cfg, and that the commands defined in commands.cfg rely on the plugins (commands) under /usr/local/nagios/libexec.

[codesyntax lang="text"]
vi /usr/local/nagios/etc/servers/wiki-l-11.cfg
# Define the host
define host{
    use         linux-server    ; Name of host template to use
                                ; This host definition will inherit all variables that are defined
                                ; in (or inherited by) the linux-server host template definition.
    host_name   wiki
    alias       Docs
    address     192.168.0.11
}
# Ping the remote Linux host
define service{
    use                     generic-service     ; Name of service template to use
    host_name               wiki
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%  ; check_ping is defined in commands.cfg and takes two arguments; the command and its arguments are separated by !
}
# Check root partition usage on the remote Linux host. The check_nrpe command must be defined
# in /usr/local/nagios/etc/objects/commands.cfg (it is not defined by default).
define service{
    use                     generic-service     ; Name of service template to use
    host_name               wiki
    service_description     Root Partition
    check_command           check_nrpe!check_disk_root
}
# Check the number of logged-in users on the remote Linux host
define service{
    use                     generic-service     ; Name of service template to use
    host_name               wiki
    service_description     Current Users
    check_command           check_nrpe!check_users
}
# Check the load of the remote Linux host
define service{
    use                     generic-service     ; Name of service template to use
    host_name               wiki
    service_description     Current Load
    check_command           check_nrpe!check_load
}
# Check swap usage on the remote Linux host
define service{
    use                     generic-service     ; Name of service template to use
    host_name               wiki
    service_description     Swap Usage
    check_command           check_nrpe!check_swap
}
#
# Check the SSH service of the remote Linux host
define service{
    use                     generic-service     ; Name of service template to use
    host_name               wiki
    service_description     SSH
    check_command           check_ssh
    notifications_enabled   0
}
# Check the HTTP service of the remote Linux host
define service{
    use                     generic-service     ; Name of service template to use
    host_name               wiki
    service_description     HTTP
    check_command           check_http
    notifications_enabled   0
}
[/codesyntax]

5.2 Defining the host group configuration

[codesyntax lang="text"]
vi /usr/local/nagios/etc/servers/hostgroup.cfg
# Define the host group (localhost.cfg contains a similar host group definition; I commented it out to avoid a possible conflict)
define hostgroup{
    hostgroup_name  linux-servers   ; The name of the hostgroup
    alias           Linux Servers   ; Long name of the group
    members         localhost,wiki  ; Comma separated list of hosts that belong to this group
}
#define hostgroup{
#    hostgroup_name  windows-servers ; The name of the hostgroup
#    alias           Windows Servers ; Long name of the group
#    members         print           ; Comma separated list of hosts that belong to this group
#    }
[/codesyntax]

6. Checking the configuration and restarting

# Once the monitored-host configuration files are finished, verify the configuration with the following command:

[codesyntax lang="text"]
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[/codesyntax]

# If no errors are reported, restart Nagios:

[codesyntax lang="text"]
service nagios restart
[/codesyntax]
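
The check_nrpe services in section 5.1 pass names such as check_users and check_disk_root to the NRPE daemon. That only works if the daemon's own configuration on the monitored host maps each name to a plugin command line. Below is a minimal sketch of that client-side file. It assumes NRPE was installed from source, so that its configuration is /usr/local/nagios/etc/nrpe.cfg. The thresholds and the use of / as the root partition are illustrative assumptions, not values taken from this setup.

[codesyntax lang="text"]
# /usr/local/nagios/etc/nrpe.cfg on the monitored host (path assumed)
# Each command[NAME] entry is what "check_nrpe -c NAME" runs on this machine.
# All thresholds below are example values; tune them for your hosts.
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_disk_root]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
[/codesyntax]

A definition can be tried by hand from the monitoring host before the service is added to Nagios, for example with /usr/local/nagios/libexec/check_nrpe -H 192.168.0.11 -c check_load.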

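The PING service in section 5.1 calls check_ping!100.0,20%!500.0,60%, but the check_ping command definition itself is not quoted in this article. As far as I know, the sample commands.cfg shipped with Nagios defines it roughly as below, so treat this as a sketch to compare against your own file rather than an authoritative copy. The values separated by ! are handed to the command as $ARG1$ and $ARG2$.

[codesyntax lang="text"]
# Presumed stock definition in /usr/local/nagios/etc/objects/commands.cfg
define command{
    command_name    check_ping
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}
# With the PING service for host "wiki", the macros expand to:
#   /usr/local/nagios/libexec/check_ping -H 192.168.0.11 -w 100.0,20% -c 500.0,60% -p 5
# i.e. a warning threshold pair of 100.0,20% and a critical threshold pair of 500.0,60%.
[/codesyntax]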