监控系列讲座(十九)node_exporter详解

1. 概述

node_exporter是我们最常用的exporter之一,我们可以把他认为是一个agent,需要被安装在操作系统之上,然后才能采集到系统的数据。采集Linux类的exporter被称为node_exporter,如果我们使用sidecar的方式来运行的话,他就可以采集pod的信息。

我们在前面的指标采集一章中,已经使用node_exporter做了一个demo,他的采集的指标比zabbix多了好多。其实,这只是他的冰山一角,因为我们配置exporter的时候,使用的是默认启动的方式,他还有很多开关,需要添加参数才能生效。同时,他还可以指定那类指标不收集,以此来节省我们的资源消耗。

2. 详解node_exporter

node_exporter是prometheus官方提供的agent,项目被托管在prometheus的账号之下。我们也可以通过官方提供的链接下载最新的版本。在官方的github里面已经提供了非常详细使用说明,我们这里带大家梳理一下

2.1. 默认启用的参数

官方地址,这些参数就是我们前面做实验的时候默认收集的指标

Name Description OS
arp Exposes ARP statistics from /proc/net/arp. Linux
bcache Exposes bcache statistics from /sys/fs/bcache/. Linux
bonding Exposes the number of configured and active slaves of Linux bonding interfaces. Linux
boottime Exposes system boot time derived from the kern.boottime sysctl. Darwin, Dragonfly, FreeBSD, NetBSD, OpenBSD, Solaris
conntrack Shows conntrack statistics (does nothing if no /proc/sys/net/netfilter/ present). Linux
cpu Exposes CPU statistics Darwin, Dragonfly, FreeBSD, Linux, Solaris
cpufreq Exposes CPU frequency statistics Linux, Solaris
diskstats Exposes disk I/O statistics. Darwin, Linux, OpenBSD
edac Exposes error detection and correction statistics. Linux
entropy Exposes available entropy. Linux
exec Exposes execution statistics. Dragonfly, FreeBSD
filefd Exposes file descriptor statistics from /proc/sys/fs/file-nr. Linux
filesystem Exposes filesystem statistics, such as disk space used. Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
hwmon Expose hardware monitoring and sensor data from /sys/class/hwmon/. Linux
infiniband Exposes network statistics specific to InfiniBand and Intel OmniPath configurations. Linux
ipvs Exposes IPVS status from /proc/net/ip_vs and stats from /proc/net/ip_vs_stats. Linux
loadavg Exposes load average. Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris
mdadm Exposes statistics about devices in /proc/mdstat (does nothing if no /proc/mdstat present). Linux
meminfo Exposes memory statistics. Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netclass Exposes network interface info from /sys/class/net/ Linux
netdev Exposes network interface statistics such as bytes transferred. Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netstat Exposes network statistics from /proc/net/netstat. This is the same information as netstat -s. Linux
nfs Exposes NFS client statistics from /proc/net/rpc/nfs. This is the same information as nfsstat -c. Linux
nfsd Exposes NFS kernel server statistics from /proc/net/rpc/nfsd. This is the same information as nfsstat -s. Linux
pressure Exposes pressure stall statistics from /proc/pressure/. Linux (kernel 4.20+ and/or CONFIG_PSI)
rapl Exposes various statistics from /sys/class/powercap. Linux
schedstat Exposes task scheduler statistics from /proc/schedstat. Linux
sockstat Exposes various statistics from /proc/net/sockstat. Linux
softnet Exposes statistics from /proc/net/softnet_stat. Linux
stat Exposes various statistics from /proc/stat. This includes boot time, forks and interrupts. Linux
textfile Exposes statistics read from local disk. The --collector.textfile.directory flag must be set. any
thermal_zone Exposes thermal zone & cooling device statistics from /sys/class/thermal. Linux
time Exposes the current system time. any
timex Exposes selected adjtimex(2) system call stats. Linux
udp_queues Exposes UDP total lengths of the rx_queue and tx_queue from /proc/net/udp and /proc/net/udp6. Linux
uname Exposes system information as provided by the uname system call. Darwin, FreeBSD, Linux, OpenBSD
vmstat Exposes statistics from /proc/vmstat. Linux
xfs Exposes XFS runtime statistics. Linux (kernel 4.4+)
zfs Exposes ZFS performance statistics. Linux, Solaris

可以看到,每个指标都有他对应的操作系统,并不是每种系统都适用的。如果不想收集某个类型的指标,就使用--no-collector.参数,比如

./node_exporter --no-collector.time

2.2. 默认不启用的参数

默认不启用的参数需要通过--collector.参数来启用,官方提供的不启用的参数如下

Name Description OS
buddyinfo Exposes statistics of memory fragments as reported by /proc/buddyinfo. Linux
devstat Exposes device statistics Dragonfly, FreeBSD
drbd Exposes Distributed Replicated Block Device statistics (to version 8.4) Linux
interrupts Exposes detailed interrupts statistics. Linux, OpenBSD
ksmd Exposes kernel and system statistics from /sys/kernel/mm/ksm. Linux
logind Exposes session counts from logind. Linux
meminfo_numa Exposes memory statistics from /proc/meminfo_numa. Linux
mountstats Exposes filesystem statistics from /proc/self/mountstats. Exposes detailed NFS client statistics. Linux
ntp Exposes local NTP daemon health to check time any
processes Exposes aggregate process statistics from /proc. Linux
qdisc Exposes queuing discipline statistics Linux
runit Exposes service status from runit. any
supervisord Exposes service status from supervisord. any
systemd Exposes service and system status from systemd. Linux
tcpstat Exposes TCP connection status information from /proc/net/tcp and /proc/net/tcp6. (Warning: the current version has potential performance issues in high load situations.) Linux
wifi Exposes WiFi device and station statistics. Linux
perf Exposes perf based metrics (Warning: Metrics are dependent on kernel configuration and settings). Linux

如果大家使用的是Centos系,或者ubuntu系的,我们建议大家打开ntp,mountstats,systemd,ntp,tcpstat这几个选项

2.3. 文本收集器

使用--collector.textfile.directory选项可以把文本中的内容也用metrics的方式收集起来,但是用的不多,大家了解就好

2.4. 在prometheus中过滤collector

我们还可以在prometheus中过滤要采集的选项,在exporter的配置文件中使用下面的配置

  params:
    collect[]:
      - foo
      - bar

这个功能可以减少prometheus的压力,但是并不会减少node_exporter采集到的数据。有点鸡肋。。。既然不要这个,就不要采集了嘛。。。

2.5. TLS的配置

我们默认暴露的都是http的请求的,也没有验证,也就是明文传输,很容易存在安全隐患,如果需要对数据加密,我们就要使用参数./node_exporter --web.config=web-config.yml,这个文件的内容如下

tls_server_config:
  # Certificate and key files for server to use to authenticate to client.
  cert_file: 
  key_file: 

  # Server policy for client authentication. Maps to ClientAuth Policies.
  # For more detail on clientAuth options: [ClientAuthType](https://golang.org/pkg/crypto/tls/#ClientAuthType)
  [ client_auth_type:  | default = "NoClientCert" ]

  # CA certificate for client certificate authentication to the server.
  [ client_ca_file:  ]

  # Minimum TLS version that is acceptable.
  [ min_version:  | default = "TLS12" ]

  # Maximum TLS version that is acceptable.
  [ max_version:  | default = "TLS13" ]

  # List of supported cipher suites for TLS versions up to TLS 1.2. If empty,
  # Go default cipher suites are used. Available cipher suites are documented
  # in the go documentation:
  # https://golang.org/pkg/crypto/tls/#pkg-constants
  [ cipher_suites:
    [ -  ] ]

  # prefer_server_cipher_suites controls whether the server selects the
  # client's most preferred ciphersuite, or the server's most preferred
  # ciphersuite. If true then the server's preference, as expressed in
  # the order of elements in cipher_suites, is used.
  [ prefer_server_cipher_suites:  | default = true ]

  # Elliptic curves that will be used in an ECDHE handshake, in preference
  # order. Available curves are documented in the go documentation:
  # https://golang.org/pkg/crypto/tls/#CurveID
  [ curve_preferences:
    [ -  ] ]

http_server_config:
  # Enable HTTP/2 support. Note that HTTP/2 is only supported with TLS.
  # This can not be changed on the fly.
  [ http2:  | default = true ]

# Usernames and hashed passwords that have full access to the web
# server via basic authentication. If empty, no basic authentication is
# required. Passwords are hashed with bcrypt.
basic_auth_users:
  [ :  ... ]

这里面的密码是使用hash来加密的,我们常用的工具是htpasswd,比如,我们要加密密码Passw0rd

htpasswd -nBC 10 "" | tr -d ':\n'
New password: 
Re-type new password: 
$2y$10$6oDcZssS/R6yPy8ixVx7ue8LiPX7CHpNtXxdVGlYkNgzW3CT48TfC%

使用不交互的方式,生成文件

htpasswd -bc /application/prometheus/conf/htpasswd admin Passw0rd

在原有文件中添加一个用户

htpasswd -b /application/prometheus/conf/htpasswd admin Passw0rd

不更新密码文件,只在屏幕上输出用户名和经过加密后的密码

htpasswd -nb admin Passw0rd

2.6. systemd例子

为了便于管理,我们使用systemd来管理这个服务

cat > /etc/systemd/system/node_exporter.service << EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/bin/node_exporter \
          --web.config=web-config.yml \
          --collector.ntp \
          --collector.mountstats \
          --collector.systemd \
          --collector.ntp \
          --collector.tcpstat
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
Restart=always
[Install]
WantedBy=multi-user.target
EOF

为了方便大家学习,请大家加我的微信,我会把大家加到微信群(微信群的二维码会经常变)和qq群821119334,问题答案云原生技术课堂,有问题可以一起讨论

  • 微信公众号 云原生技术课堂

  • 专题讲座

2020 CKA考试视频 真题讲解 https://www.bilibili.com/video/BV167411K7hp

2020 CKA考试指南 https://www.bilibili.com/video/BV1sa4y1479B/

2020年 5月CKA考试真题 https://mp.weixin.qq.com/s/W9V4cpYeBhodol6AYtbxIA

你可能感兴趣的:(监控系列讲座(十九)node_exporter详解)