What's New in Nagios Core 4.x



Important: Read through the documentation and the FAQs at support.nagios.com before sending a question to the mailing lists.

Change Log

The change log for Nagios can be found online at https://www.nagios.org/development/history or in the Changelog file in the root directory of the source code distribution.

Changes and New Features

Performance Improvements: The performance improvements in Nagios Core 4 come primarily from the following areas:


  • Core Workers - Core workers are lightweight processes whose only job is to perform checks. Because they are smaller, they spawn much more quickly than the old process, which forked the full Nagios Core. In addition, they communicate with the main Nagios Core process using in-memory techniques, eliminating the disk I/O latencies that could previously slow things down, especially in large installations.

  • Configuration Verification - Configuration verification has been improved so that each configuration item is verified only once. Previously, configuration verification was an O(n^2) operation.

  • Event Queue - The event queue now uses a data structure that has O(log n) insertion time, versus the O(n) insertion time previously. This means that inserting events into the queue uses much less CPU than in Nagios Core 3.

  • Macro Resolution - Macros are now sorted on startup so macro lookup can use a binary search. In addition, the frequently accessed macros $USERx$, $ARGx$, and $HOSTADDRESS$ are given special-case, early lookups.
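
The event queue and macro lookup changes can be illustrated with a minimal Python sketch. It is not the Nagios Core implementation: a binary heap stands in for the O(log n) event queue (the text above does not name the structure used), and a sorted list of a few macro names stands in for the macro table. The class and function names are hypothetical.

```python
import bisect
import heapq


class EventQueue:
    """Illustrative event queue backed by a binary heap (O(log n) insert)."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: events with equal times keep insertion order

    def schedule(self, run_at, name):
        heapq.heappush(self._heap, (run_at, self._seq, name))
        self._seq += 1

    def pop_next(self):
        """Remove and return the event with the earliest run time."""
        run_at, _, name = heapq.heappop(self._heap)
        return run_at, name


# Macro lookup: sort the names once at startup, then binary-search them.
MACRO_NAMES = sorted(["HOSTNAME", "HOSTADDRESS", "SERVICEDESC", "HOSTSTATE"])


def find_macro(name):
    """Return the index of name in MACRO_NAMES, or -1 if it is not present."""
    i = bisect.bisect_left(MACRO_NAMES, name)
    if i < len(MACRO_NAMES) and MACRO_NAMES[i] == name:
        return i
    return -1
```

Events scheduled out of order come back sorted by run time, and each lookup costs O(log n) comparisons instead of a linear scan.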

Object Definitions: The following changes have been made to object definitions:

  • The host address attribute is now optional. The address attribute is set to the host name when it is absent. Most configurations set the host name attribute to the DNS host name, making the address attribute redundant.

  • Both hosts and services now support an hourly value attribute. The hourly value attribute is intended to represent the value of a host or service to an organization and is used by the new minimum value contact attribute.

  • Services now support a parents attribute. A service parent performs a function similar to host parents and can be used in place of service dependencies in simple circumstances.

  • The failure_prediction_enabled flag has been removed from both host and service object definitions.

  • Contacts now support a minimum value attribute. The minimum value attribute is used with the host and service hourly value attributes to determine whether to notify a contact of host and service problems.

  • The host obsess_over_host and the service obsess_over_service attributes can now both use the shortened attribute obsess.
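
As a rough illustration of the attributes above, the following object definitions are a sketch only. The host and contact names, the values, and the generic-host and generic-contact templates are hypothetical, and the spellings hourly_value and minimum_value are assumed from the attribute names described above.

```cfg
# Hypothetical objects; names, templates, and values are illustrative only.
define host {
    use             generic-host        ; assumed template supplying the required settings
    host_name       web01.example.com   ; no address line: the host name is used as the address
    hourly_value    10                  ; assumed spelling of the hourly value attribute
    obsess          1                   ; short form of obsess_over_host
}

define contact {
    use             generic-contact     ; assumed template supplying the required settings
    contact_name    oncall
    minimum_value   5                   ; assumed spelling of the minimum value attribute
}
```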

Object Behavior:

  • Contact Inheritance - According to the documentation, contacts should be inherited from host to service only if the service has no contacts of its own (the same goes for escalations). Previously, the code handled the contact_groups and contacts directives separately, so a service with only 'contacts' specified could still inherit 'contact_groups' from the host. The behavior has been updated to comply with the documentation.

  • Timeperiods - There were several issues processing timeperiods when both exclusions and exceptions were involved. The issues have been corrected.

Configuration: The following changes have been made to the main Nagios Core configuration, nagios.cfg:

  • Because there are many ways to obtain object information, the object information is no longer stored in the object cache if the configuration variable object_cache_file equals '/dev/null'. Setting the variable to '/dev/null' will reduce the disk I/O load.

  • Because there are many ways to obtain status information, the status information is no longer stored in the status data file if the configuration variable status_file equals '/dev/null'. Setting the variable to '/dev/null' will reduce the disk I/O load.

  • There is a new configuration variable, log_current_states, which determines whether current states will be logged in the log files when they are rotated. In Nagios Core 3, this was always the behavior and it is the default in Nagios Core 4. Disabling the logging of current states on log rotation can save considerable disk space for large installations.

  • There is a new configuration variable, check_workers, which specifies how many worker processes are created when Nagios Core starts. If not specified, the number of worker processes is determined by the number of CPUs on the system.

  • There is a new configuration variable, query_socket, which specifies the location of the query handler socket. The default location is /usr/local/nagios/var/rw/nagios.qh.

  • The configuration variables, check_result_reaper_frequency and max_check_result_reaper_time, have been deprecated. Because of the new worker architecture, checks are no longer reaped, but they are fed back to core by the worker processes. As a result, these variables no longer make sense.

  • All file and directory configuration variables in the main nagios.cfg can now use paths that are relative to the location of nagios.cfg.

  • Although rarely used in the past, creating nagios objects in the main nagios.cfg configuration file was allowed. This is now prohibited.
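
The variables described above might be combined in nagios.cfg as in the following sketch. The values are illustrative, not recommendations, and the directory name objects is hypothetical.

```cfg
# nagios.cfg fragment; values are illustrative, not recommendations.

# Skip writing the object cache and status data to disk.
object_cache_file=/dev/null
status_file=/dev/null

# Do not log current states when log files are rotated (enabled by default).
log_current_states=0

# Start four worker processes instead of deriving the count from the CPUs.
check_workers=4

# Location of the query handler socket (this is the default location).
query_socket=/usr/local/nagios/var/rw/nagios.qh

# Relative path, resolved against the location of nagios.cfg.
# "objects" is a hypothetical directory name.
cfg_dir=objects
```

Anything that reads the object cache or status data file, such as the CGIs, would need another source of that information if those two variables are set to /dev/null.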

Macros:

  • Additions - A new macro, $CHECKSOURCE$, has been added which contains information about what process performed a check.

  • Changes - If use_large_installation_tweaks is set, the $HOSTGROUPMEMBERS$ and $SERVICEGROUPMEMBERS$ macros are no longer exported because they can consume the available space for environment variables.

  • Macros are normally available as environment variables when check, event handler, notification, and other commands are run. This can be rather CPU intensive in large Nagios installations, so you can disable the export of environment variables completely with the enable_environment_macros option.

  • More information about macros can be found in the macro documentation.
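
Because environment macros can be disabled, a plugin or notification script is more robust if it accepts macro values as command-line arguments and uses the environment only as a fallback. The Python sketch below assumes the usual NAGIOS_ prefix for exported macros; the function name host_address is hypothetical.

```python
def host_address(argv, environ):
    """Return the host address for a check.

    Prefer an explicit argument (for example, $HOSTADDRESS$ passed on the
    command line) and fall back to the exported environment macro, which
    is absent when enable_environment_macros is disabled.
    """
    if len(argv) > 1:
        return argv[1]
    return environ.get("NAGIOS_HOSTADDRESS")
```

A script would typically call host_address(sys.argv, os.environ) and report an UNKNOWN state if the result is None.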

Query Handler: The query handler is a general-purpose communication mechanism that allows external entities to communicate with Nagios Core in a well-defined manner. As of this writing, all communication with the query handler takes place through a Unix-domain socket whose location is defined by the query_socket configuration variable. More information about the query handler interface, including an introduction to creating a custom query handler, can be found in the source-supplied documentation. There are currently five built-in query handlers:

  • core - provides Nagios Core management and information

  • wproc - provides worker process registration, management and information

  • nerd - provides a subscription service to the Nagios Event Radio Dispatcher (NERD)

  • help - provides help for the query handler

  • echo - implements a basic query handler that simply echoes back the queries sent to it
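
A client for the query handler socket could look roughly like the Python sketch below. The message framing is an assumption to be checked against the source-supplied documentation: the sketch assumes a query is the handler name prefixed with '#' (one-shot) or '@' (persistent), followed by a space, the command, and a terminating NUL byte. The function names build_query and send_query are hypothetical.

```python
import socket


def build_query(handler, command, persistent=False):
    """Build a query message (assumed framing: prefix, handler, command, NUL)."""
    prefix = "@" if persistent else "#"
    return f"{prefix}{handler} {command}".encode() + b"\0"


def send_query(socket_path, handler, command, timeout=5.0):
    """Send a one-shot query and return the response as text.

    Assumes the server closes the connection after answering a one-shot
    query; otherwise recv() raises a timeout error after `timeout` seconds.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(build_query(handler, command))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).rstrip(b"\0").decode(errors="replace")
```

With a running Nagios Core, send_query("/usr/local/nagios/var/rw/nagios.qh", "echo", "hello") would be expected to return the text sent, given the description of the echo handler above.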

Core Workers: Previously, all host and service checks were performed by the full Nagios Core process. This required forking the Nagios Core process for every check. The full Nagios Core process includes a lot of things that are not required to actually perform the check, including check scheduling, downtime handling, processing external commands, etc. As a result, forking the Nagios Core process was much slower than was necessary. When the actual check was run, the forked process again forked a shell to run the check and the shell forked to run the plugin. In addition, disk files were used as the inter-process communication (IPC) mechanism between the forked Nagios process doing the checking and the main Nagios process handling the check results.

In Nagios Core 4, host and service checks are performed by lightweight worker processes. Standard worker processes start up with the main Nagios Core process, and additional, special-purpose workers can be started at any time after Nagios Core starts. If the check command is "simple" (no shell escapes), the worker process can run the command directly, avoiding the 2 additional forks previously required. Also in Nagios Core 4, the worker processes report the check results to the main Nagios Core process using in-memory IPC mechanisms (the query handler interface), eliminating the disk I/O bottleneck that used to be an issue in large installations.

When a worker process registers with the main Nagios Core process, it tells Nagios Core what checks it will handle. This feature allows external authors to create special-purpose workers which are optimized to perform certain checks. A sample special-purpose ping check worker is included with the Nagios Core source code in the worker/ping subdirectory. More information about workers, including an introduction to creating custom workers, can be found in the source-supplied documentation.

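The "simple command" shortcut can be sketched in Python as follows. This is an illustration of the idea, not the worker code: the set of characters treated as needing a shell is an assumption, and the function names are hypothetical.

```python
import subprocess

# Characters treated as requiring a shell. This set is an assumption made
# for illustration; it is not taken from the Nagios Core source.
SHELL_CHARS = set("|&;<>()$`\\\"'*?[]#~!{}\n")


def needs_shell(command):
    """Return True if the command contains any shell metacharacter."""
    return any(ch in SHELL_CHARS for ch in command)


def build_argv(command):
    """Return the argument vector used to execute the command."""
    if needs_shell(command):
        # Complex command: hand the whole string to a shell.
        return ["/bin/sh", "-c", command]
    # Simple command: split on whitespace and execute directly.
    return command.split()


def run_check(command, timeout=10):
    """Run a check command and return (exit status, trimmed stdout)."""
    proc = subprocess.run(build_argv(command), capture_output=True,
                          text=True, timeout=timeout)
    return proc.returncode, proc.stdout.strip()
```

A simple command is executed without an intermediate shell process, which is the saving described above.
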
Nagios Event Radio Dispatcher (NERD): The Nagios Event Radio Dispatcher (NERD) is a query handler based service that streams Nagios Core events to the subscriber. Currently, there are three channels that can be subscribed to: hostchecks, servicechecks and opathchecks.

libnagios: libnagios is a library of functions that can be used by developers of query handlers and worker processes. libnagios currently contains the following components:

  • bitmap - bitmap library for calculating dependency graphs

  • dkhash - dual-keyed hash api

  • fanout - sparsely populated array used for downtime, comments, and worker jobs

  • iobroker - I/O broker library for multiplexing between running tasks and the master nagios process

  • iocache - I/O caching library for bulk-reading requests and parsing them

  • kvvec - key/value library for parsing requests and building responses

  • nsock - socket library for connecting to and communicating through the qh socket

  • nspath - general purpose path library for converting between relative and absolute paths

  • nsutils - small library with worker related utilities

  • pqueue - pqueue library written by Volkan Yazici

  • runcmd - for spawning and reaping commands

  • skiplist - skiplist library used within Nagios Core

  • squeue - for maintaining a queue of the running job's timeouts

  • worker - for utils and stuff nifty to have if you're a worker

Documentation: Documentation of Nagios Core internals is now provided as part of the source distribution. To create an HTML version of this documentation, run 'make dox' from the root of the source distribution tree. The doxygen utilities must be installed to build this documentation.

Tests: A much more complete test suite is now included with the Nagios Core source distribution.

RPM Spec File: The RPM spec file has been completely overhauled to support more current standards.

Deprecated Features:

  • Extended Host and Service Information - The hostextinfo and serviceextinfo objects are now deprecated and should not be used. Support for them will be removed in a future version. The same information can be specified in the host and service objects, respectively.

  • -x/--dont-verify-paths command line option (don't check for circular object paths) - Because configuration checking is now so much faster, the option to skip checking for circular object paths has been deprecated.

  • The following configuration variables have been deprecated: check_result_reaper_frequency, max_check_result_reaper_time, sleep_time, external_command_buffer_slots, command_check_interval.

Obsoleted Features:

  • Failure Prediction - As noted above, the failure_prediction_enabled flag has been removed from both host and service object definitions. Failure prediction was never fully implemented and would require breaking the paradigm that Nagios Core knows nothing about the performance data returned by plugins. Failure prediction is much more appropriately handled by an add-on than by Nagios Core.

  • -o/--dont-verify-objects command line option - This option, while accepted in Nagios Core 3, has been neither advertised nor effective for quite some time. The option has been removed in Nagios Core 4.

  • Embedded Perl - Embedded Perl has historically been the least tested and the most problem-prone part of Nagios Core. A significant part of the issue is that there are so many versions of Perl available. The performance enhancements provided by the new worker process architecture make up for any performance loss due to the removal of embedded Perl. In addition, the worker process architecture makes possible the implementation of a special-purpose worker to persistently load and run Perl plugins. The following configuration variables that were related to embedded Perl have been obsoleted: use_embedded_perl_implicitly, enable_embedded_perl, p1_file.

Miscellaneous:

  • Object IDs - Primarily only of interest to developers, all of the first-class objects now have object IDs.  First-class objects are timeperiod, command, contact, host, service, escalations, dependencies and all kinds of groups. Object IDs are not persistent and are recreated on each restart.