A First Look at zabbix_agent2 Plugins

Overview

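  • zabbix_agent2 is a client that can fully replace zabbix_agent and is considerably more capable than its predecessor.
  • It is written in Go and manages its monitoring capabilities as plugins.
  • It works as a one-stop agent: the official 5.2 release already ships with strong monitoring capabilities.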

zabbix_agent2 Metrics

  • While the agent is running, we can run zabbix_agent2 -R metrics to list the metrics the agent currently supports, along with the runtime status of each plugin:
[Agent]
active: true
capacity: 0/100
tasks: 0
agent.hostname: Returns Hostname from agent configuration.
agent.ping: Returns agent availability check result.
agent.version: Version of Zabbix agent.

[Ceph]
active: false
capacity: 0/100
tasks: 0
ceph.df.details: Returns information about cluster’s data usage and distribution among pools.
ceph.osd.discovery: Returns a list of discovered OSDs.
ceph.osd.dump: Returns usage thresholds and statuses of OSDs.
ceph.osd.stats: Returns aggregated and per OSD statistics.
ceph.ping: Tests if a connection is alive or not.
ceph.pool.discovery: Returns a list of discovered pools.
ceph.status: Returns an overall cluster's status.

[Cpu]
active: true
capacity: 0/100
tasks: 12
system.cpu.discovery: List of detected CPUs/CPU cores, used for low-level discovery.
system.cpu.num: Number of CPUs.
system.cpu.util: CPU utilisation percentage.

[Docker]
active: false
capacity: 0/100
tasks: 0
docker.container_info: Return low-level information about a container.
docker.container_stats: Returns near realtime stats for a given container.
docker.containers: Returns a list of containers.
docker.containers.discovery: Returns a list of containers, used for low-level discovery.
docker.data_usage: Returns information about current data usage.
docker.images: Returns a list of images.
docker.images.discovery: Returns a list of images, used for low-level discovery.
docker.info: Returns information about the docker server.
docker.ping: Pings the server and returns 0 or 1.

[File]
active: true
capacity: 0/100
tasks: 3
vfs.file.cksum: Returns File checksum, calculated by the UNIX cksum algorithm.
vfs.file.contents: Retrieves contents of the file.
vfs.file.exists: Returns if file exists or not.
vfs.file.md5sum: Returns MD5 checksum of file.
vfs.file.regexp: Find string in a file.
vfs.file.regmatch: Find string in a file.
vfs.file.size: Returns file size.
vfs.file.time: Returns file time information.

[Kernel]
active: true
capacity: 0/100
tasks: 2
kernel.maxfiles: Returns maximum number of opened files supported by OS.
kernel.maxproc: Returns maximum number of processes supported by OS.

[Log]
active: false
capacity: 0/100
tasks: 0
log: Log file monitoring.
log.count: Count of matched lines in log file monitoring.
logrt: Log file monitoring with log rotation support.
logrt.count: Count of matched lines in log file monitoring with log rotation support.

[MQTT]
active: false
capacity: 0/100
tasks: 0
mqtt.get: Subscribe to MQTT topics for published messages.

[Memcached]
active: false
capacity: 0/100
tasks: 0
memcached.ping: Test if connection is alive or not.
memcached.stats: Returns output of stats command.

[Memory]
active: true
capacity: 0/100
tasks: 3
vm.memory.size: Returns memory size in bytes or in percentage from total.

[Modbus]
active: false
capacity: 0/100
tasks: 0
modbus.get: Returns a JSON array of the requested values, usage: modbus.get[endpoint,<slave id>,<function>,<address>,<count>,<type>,<endianness>,<offset>].

[Mongo]
active: false
capacity: 0/100
tasks: 0
mongodb.cfg.discovery: Returns a list of discovered config servers.
mongodb.collection.stats: Returns a variety of storage statistics for a given collection.
mongodb.collections.discovery: Returns a list of discovered collections.
mongodb.collections.usage: Returns usage statistics for collections.
mongodb.connpool.stats: Returns information regarding the open outgoing connections from the current database instance to other members of the sharded cluster or replica set.
mongodb.db.discovery: Returns a list of discovered databases.
mongodb.db.stats: Returns statistics reflecting a given database system’s state.
mongodb.jumbo_chunks.count: Returns count of jumbo chunks.
mongodb.oplog.stats: Returns a status of the replica set, using data polled from the oplog.
mongodb.ping: Test if connection is alive or not.
mongodb.rs.config: Returns a current configuration of the replica set.
mongodb.rs.status: Returns a replica set status from the point of view of the member where the method is run.
mongodb.server.status: Returns a database’s state.
mongodb.sh.discovery: Returns a list of discovered shards present in the cluster.

[Mysql]
active: false
capacity: 0/100
tasks: 0
mysql.db.discovery: Returns list of databases in LLD format.
mysql.db.size: Returns size of given database in bytes.
mysql.get_status_variables: Returns values of global status variables.
mysql.ping: Tests if connection is alive or not.
mysql.replication.discovery: Returns replication information in LLD format.
mysql.replication.get_slave_status: Returns replication status.
mysql.version: Returns MySQL version.

[NetIf]
active: true
capacity: 0/100
tasks: 7
net.if.collisions: Returns number of out-of-window collisions.
net.if.discovery: Returns list of network interfaces. Used for low-level discovery.
net.if.in: Returns incoming traffic statistics on network interface.
net.if.out: Returns outgoing traffic statistics on network interface.
net.if.total: Returns sum of incoming and outgoing traffic statistics on network interface.

[Oracle]
active: false
capacity: 0/100
tasks: 0
oracle.archive.discovery: Returns list of archive logs in LLD format.
oracle.archive.info: Returns archive logs statistics.
oracle.cdb.info: Returns CDBs info.
oracle.custom.query: Returns result of a custom query.
oracle.datafiles.stats: Returns data files statistics.
oracle.db.discovery: Returns list of databases in LLD format.
oracle.diskgroups.discovery: Returns list of ASM disk groups in LLD format.
oracle.diskgroups.stats: Returns ASM disk groups statistics.
oracle.fra.stats: Returns FRA statistics.
oracle.instance.info: Returns instance stats.
oracle.pdb.discovery: Returns list of PDBs in LLD format.
oracle.pdb.info: Returns PDBs info.
oracle.pga.stats: Returns PGA statistics.
oracle.ping: Tests if connection is alive or not.
oracle.proc.stats: Returns processes statistics.
oracle.redolog.info: Returns log file information from the control file.
oracle.sessions.stats: Returns sessions statistics.
oracle.sga.stats: Returns SGA statistics.
oracle.sys.metrics: Returns a set of system metric values.
oracle.sys.params: Returns a set of system parameter values.
oracle.ts.discovery: Returns list of tablespaces in LLD format.
oracle.ts.stats: Returns tablespaces statistics.
oracle.user.info: Returns user information.

[Postgres]
active: false
capacity: 0/100
tasks: 0
pgsql.archive: Returns info about size of archive files.
pgsql.autovacuum.count: Returns count of autovacuum workers.
pgsql.bgwriter: Returns JSON for sum of each type of bgwriter statistic.
pgsql.cache.hit: Returns cache hit percent.
pgsql.connections: Returns JSON for sum of each type of connection.
pgsql.custom.query: Returns result of a custom query.
pgsql.db.age: Returns age for specific database.
pgsql.db.bloating_tables: Returns percent of bloating tables for each database.
pgsql.db.discovery: Returns JSON discovery rule with names of databases.
pgsql.db.size: Returns size in bytes for specific database.
pgsql.dbstat: Returns JSON for sum of each type of statistic.
pgsql.dbstat.sum: Returns JSON for sum of each type of statistic for all database.
pgsql.locks: Returns collect all metrics from pg_locks.
pgsql.oldest.xid: Returns age of oldest xid.
pgsql.ping: Tests if connection is alive or not.
pgsql.replication.count: Returns number of standby servers.
pgsql.replication.lag.b: Returns replication lag with Master in byte.
pgsql.replication.lag.sec: Returns replication lag with Master in seconds.
pgsql.replication.process: Returns flush lag, write lag and replay lag per each sender process.
pgsql.replication.process.discovery: Returns JSON with application name from pg_stat_replication.
pgsql.replication.recovery_role: Returns postgreSQL recovery role.
pgsql.replication.status: Returns postgreSQL replication status.
pgsql.uptime: Returns uptime.
pgsql.wal.stat: Returns JSON wal by type.

[Proc]
active: false
capacity: 0/100
tasks: 0
proc.cpu.util: Process CPU utilization percentage.

[ProcExporter]
active: false
capacity: 0/100
tasks: 0
proc.mem: Process memory utilization values.

[Redis]
active: false
capacity: 0/100
tasks: 0
redis.config: Returns configuration parameters of Redis server.
redis.info: Returns output of INFO command.
redis.ping: Test if connection is alive or not.
redis.slowlog.count: Returns the number of slow log entries since Redis has been started.

[Smart]
active: false
capacity: 0/100
tasks: 0
smart.attribute.discovery: Returns JSON array of smart device attributes.
smart.disk.discovery: Returns JSON array of smart devices.
smart.disk.get: Returns JSON data of smart device.

[Sw]
active: true
capacity: 0/100
tasks: 1
system.sw.packages: Lists installed packages whose name matches the given package regular expression.

[Swap]
active: true
capacity: 0/100
tasks: 3
system.swap.size: Returns Swap space size in bytes or in percentage from total.

[SystemRun]
active: false
capacity: 0/100
tasks: 0
system.run: Run specified command.

[Systemd]
active: false
capacity: 0/100
tasks: 0
systemd.unit.discovery: Returns JSON array of discovered units, usage: systemd.unit.discovery[<type>].
systemd.unit.get: Returns the bulked info, usage: systemd.unit.get[unit,<interface>].
systemd.unit.info: Returns the unit info, usage: systemd.unit.info[unit,<parameter>,<interface>].

[TCP]
active: false
capacity: 0/100
tasks: 0
net.tcp.port: Checks if it is possible to make TCP connection to specified port.
net.tcp.service: Checks if service is running and accepting TCP connections.
net.tcp.service.perf: Checks performance of TCP service.

[UDP]
active: false
capacity: 0/100
tasks: 0
net.udp.service: Checks if service is running and responding to UDP requests.
net.udp.service.perf: Checks performance of UDP service.

[Uname]
active: true
capacity: 0/100
tasks: 3
system.hostname: Returns system host name.
system.sw.arch: Software architecture information.
system.uname: Returns system uname.

[Uptime]
active: true
capacity: 0/100
tasks: 1
system.uptime: Returns system uptime in seconds.

[Users]
active: true
capacity: 0/100
tasks: 1
system.users.num: Returns number of useres logged in.

[VFSDev]
active: true
capacity: 0/100
tasks: 2
vfs.dev.discovery: List of block devices and their type. Used for low-level discovery.
vfs.dev.read: Disk read statistics.
vfs.dev.write: Disk write statistics.

[VfsFs]
active: true
capacity: 0/100
tasks: 13
vfs.fs.discovery: List of mounted filesystems. Used for low-level discovery.
vfs.fs.get: List of mounted filesystems with statistics.
vfs.fs.inode: Disk space in bytes or in percentage from total.
vfs.fs.size: Disk space in bytes or in percentage from total.

[Web]
active: false
capacity: 0/100
tasks: 0
web.page.get: Get content of a web page.
web.page.perf: Loading time of full web page (in seconds).
web.page.regexp: Find string on a web page.

[ZabbixAsync]
active: true
capacity: 0/100
tasks: 7
net.tcp.listen: Checks if this TCP port is in LISTEN state.
net.udp.listen: Checks if this UDP port is in LISTEN state.
sensor: Hardware sensor reading.
system.boottime: Returns system boot time.
system.cpu.intr: Device interrupts.
system.cpu.load: CPU load.
system.cpu.switches: Count of context switches.
system.hw.cpu: CPU information.
system.hw.macaddr: Listing of MAC addresses.
system.localtime: Returns system local time.
system.sw.os: Operating system information.
system.swap.in: Swap in (from device into memory) statistics.
system.swap.out: Swap out (from memory onto device) statistics.

[ZabbixStats]
active: false
capacity: 0/100
tasks: 0
zabbix.stats: Return a set of Zabbix server or proxy internal metrics or return number of monitored items in the queue which are delayed on Zabbix server or proxy.

[ZabbixSync]
active: true
capacity: 0/1
tasks: 2
net.dns: Checks if DNS service is up.
net.dns.record: Performs DNS query.
proc.num: The number of processes.
system.hw.chassis: Chassis information.
system.hw.devices: Listing of PCI or USB devices.
vfs.dir.count: Directory entry count.
vfs.dir.size: Directory size (in bytes).
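  • As the output shows, zabbix_agent2 already supports many kinds of software, grouped by plugin. Each group is managed as a plugin: an unused plugin stays inactive and consumes no resources, and it is activated only when an item from a linked template uses it.
  • Future releases of zabbix_agent2 will probably integrate monitoring for even more software.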
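
To see at a glance which plugins are active, the text printed by zabbix_agent2 -R metrics can be parsed. The following is a minimal sketch that assumes the output keeps the layout shown above (a [Name] header, then active/capacity/tasks fields, then key: description lines). The names pluginStatus and parseMetrics are hypothetical helpers, not part of Zabbix.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// pluginStatus holds what we keep from one "[Name]" section (hypothetical type).
type pluginStatus struct {
	Name   string
	Active bool
	Keys   []string
}

// parseMetrics reads text in the layout shown above and returns one entry
// per plugin section. It assumes the format does not change between versions.
func parseMetrics(out string) []pluginStatus {
	var plugins []pluginStatus
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "":
			continue
		case strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]"):
			// A new plugin section starts here.
			plugins = append(plugins, pluginStatus{Name: strings.Trim(line, "[]")})
		case len(plugins) == 0:
			continue // ignore anything before the first header
		default:
			parts := strings.SplitN(line, ": ", 2)
			if len(parts) != 2 {
				continue
			}
			cur := &plugins[len(plugins)-1]
			switch parts[0] {
			case "active":
				cur.Active = parts[1] == "true"
			case "capacity", "tasks":
				// header fields not needed for this sketch
			default:
				cur.Keys = append(cur.Keys, parts[0]) // an item key
			}
		}
	}
	return plugins
}

func main() {
	sample := `[Agent]
active: true
capacity: 0/100
tasks: 0
agent.ping: Returns agent availability check result.

[Ceph]
active: false
capacity: 0/100
tasks: 0
ceph.ping: Tests if a connection is alive or not.
`
	for _, p := range parseMetrics(sample) {
		fmt.Printf("%-8s active=%v keys=%v\n", p.Name, p.Active, p.Keys)
	}
}
```

In practice the sample string would be replaced by the captured command output.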

Building a Simple Plugin

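  • The official documentation explains how to build a plugin of your own: https://www.zabbix.com/documentation/current/manual/config/items/plugins
    Note that one example in the official documentation contains an obvious error: its return value does not match the function signature.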
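  • A plugin must implement at least one of the plugin interfaces (Exporter, Collector, Runner, Watcher). We pick the simplest one, Exporter.
  • Following the official documentation, we define a Plugin struct that embeds plugin.Base and implement the Export method: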
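// Package name assumed from the import path used when the plugin is added later.
package mydemo

import (
	"errors"
	"time"

	"zabbix.com/pkg/plugin"
)

// Plugin embeds plugin.Base, which provides logging helpers such as Debugf.
type Plugin struct {
	plugin.Base
}

// impl is the single plugin instance that gets registered in init.
var impl Plugin

// Export implements the Exporter interface: key is the item key and
// params holds the parameters passed inside the square brackets.
func (p *Plugin) Export(key string, params []string, ctx plugin.ContextProvider) (result interface{}, err error) {
	switch key {
	case "system.mytime":
		if len(params) > 0 {
			p.Debugf("received %d parameters while expected none", len(params))
			return nil, errors.New("Too many parameters")
		}
		return time.Now().Format(time.RFC3339), nil
	case "system.echo":
		// No length check here: params[0] panics when no parameter is given.
		return params[0], nil
	default:
		return nil, plugin.UnsupportedMetricError
	}
}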

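A short explanation: key is the item key, and params holds the parameters passed to that item. In the code above, system.mytime returns an error if any parameter is supplied. system.echo returns its first parameter; the code does not validate the parameters, so calling it with no parameters fails at runtime with an index-out-of-range panic. In a real project, validate the parameters thoroughly and document them.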
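
The parameter check recommended above could look like the following sketch. So that it compiles without the Zabbix source tree, it uses a plain function export and a local errUnsupported value as stand-ins for the Export method and plugin.UnsupportedMetricError; both names are assumptions made for this illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnsupported stands in for plugin.UnsupportedMetricError in this sketch.
var errUnsupported = errors.New("unsupported metric")

// export mirrors the dispatch in Export above, but validates params
// before indexing into the slice.
func export(key string, params []string) (interface{}, error) {
	switch key {
	case "system.mytime":
		if len(params) > 0 {
			return nil, errors.New("Too many parameters")
		}
		return time.Now().Format(time.RFC3339), nil
	case "system.echo":
		// Guard against an empty slice so params[0] cannot panic.
		if len(params) == 0 {
			return nil, errors.New("Missing parameter")
		}
		return params[0], nil
	default:
		return nil, errUnsupported
	}
}

func main() {
	v, err := export("system.echo", []string{"xx", "dd", "dd"})
	fmt.Println(v, err)

	v, err = export("system.echo", nil)
	fmt.Println(v, err)
}
```

With this guard, a call without parameters returns an ordinary error instead of failing with a panic.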

  • Register the metrics. The package's init function runs automatically and registers them:
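func init() {
	// Arguments after the plugin name are key/description pairs.
	// Each description starts with a capital letter and ends with a period.
	plugin.RegisterMetrics(&impl, "myTime",
		"system.mytime", "Returns time string in RFC 3339 format.",
		"system.echo", "Just for test.")
}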

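impl is the plugin variable defined in the package, and myTime is the plugin (group) name. When registering metrics, item keys and descriptions must be passed in pairs; otherwise a runtime error occurs.
Each description must also start with a capital letter and end with a period. Otherwise the agent panics at startup and cannot run, with errors like these: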

panic: cannot register metric "system.echo" without dot at the end of description: "Just for test!"
panic: cannot register metric "system.echo" with description without capital first letter: "just for test!"
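
The registration rules described above can be checked before building the agent. The sketch below restates them in a hypothetical helper, checkMetrics; it is not part of the Zabbix API, and the real checks inside plugin.RegisterMetrics may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// checkMetrics is a hypothetical pre-flight check for the strings that
// follow the plugin name in a RegisterMetrics call. It applies the rules
// from the text: key/description pairs, a capital first letter, and a
// trailing period.
func checkMetrics(pairs ...string) error {
	if len(pairs) == 0 || len(pairs)%2 != 0 {
		return fmt.Errorf("expected key/description pairs, got %d strings", len(pairs))
	}
	for i := 0; i < len(pairs); i += 2 {
		key, desc := pairs[i], pairs[i+1]
		if desc == "" {
			return fmt.Errorf("metric %q: empty description", key)
		}
		if !unicode.IsUpper([]rune(desc)[0]) {
			return fmt.Errorf("metric %q: description must start with a capital letter: %q", key, desc)
		}
		if !strings.HasSuffix(desc, ".") {
			return fmt.Errorf("metric %q: description must end with a period: %q", key, desc)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkMetrics("system.echo", "Just for test."))
	fmt.Println(checkMetrics("system.echo", "Just for test!"))
	fmt.Println(checkMetrics("system.echo", "just for test."))
}
```

Running such a check in a unit test catches a malformed description before it can stop the agent from starting.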
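  • Add the plugin
    Open plugins_linux.go, the file that lists the plugins to build in.
    Add the import path of our new plugin directory there, and the plugin is included: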
_ "zabbix.com/plugins/yeqing/mydemo"

Build and Install

  • The build process is the same as the official procedure for installing from source, so it is not repeated here.

Verification

  • Run zabbix_agent2 -R metrics to check that our plugin is now included:
[myTime]
active: true
capacity: 0/100
tasks: 1
system.echo: Just for test.
system.mytime: Returns time string in RFC 3339 format.

The plugin we wrote now appears in the list.

  • Verify with an item key on the command line
./zabbix_agent2 -t system.echo[xx,dd,dd]
system.echo[xx,dd,dd]                         [s|xx]

As expected, the item returns its first parameter.
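
To make the mapping from the item key system.echo[xx,dd,dd] to the key and params arguments of Export concrete, here is a deliberately simplified sketch. The function splitKey is hypothetical and is not Zabbix's parser: it ignores quoted parameters and nested brackets, which real item keys may contain.

```go
package main

import (
	"fmt"
	"strings"
)

// splitKey separates an item such as "system.echo[xx,dd,dd]" into the key
// ("system.echo") and its comma-separated parameters. Simplified on
// purpose: no quoting, no nested brackets.
func splitKey(item string) (key string, params []string) {
	open := strings.Index(item, "[")
	if open < 0 || !strings.HasSuffix(item, "]") {
		return item, nil // no parameter list
	}
	key = item[:open]
	inner := item[open+1 : len(item)-1]
	return key, strings.Split(inner, ",")
}

func main() {
	key, params := splitKey("system.echo[xx,dd,dd]")
	fmt.Printf("key=%q params=%q\n", key, params)
	// system.echo returns params[0], which matches the xx in the output above.
	fmt.Println(params[0])
}
```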

  • Verify from the frontend
    Add an item in Zabbix that uses the new key, then check the collected data in the Zabbix frontend.

Summary

  • zabbix_agent2 is indeed powerful. With Go, writing a plugin takes little effort, which makes monitoring more flexible and more capable.
