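Prometheus (written in Go) is an open-source suite that combines monitoring, alerting, and a time series database. It is well suited to monitoring Docker containers, and the popularity of Kubernetes (commonly abbreviated k8s) has driven Prometheus's growth.
Prometheus official website: https://prometheus.io/
Time series data: data that records changes in the state of systems and devices in chronological order.
Main features:
- A multi-dimensional data model
- A flexible query language
- No reliance on distributed storage; single server nodes are autonomous
- Time series are collected over HTTP using a pull model
- A push model is also supported through an intermediary gateway
- Targets are discovered through service discovery or static configuration
- Multiple modes of graphing and dashboard display are supported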
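Prometheus scrapes metrics from instrumented jobs, either directly or, for short-lived jobs, through an intermediary push gateway. It stores all scraped samples locally and runs rules over this data, either to aggregate and record new time series from existing data or to generate alerts. Grafana or other API consumers can be used to visualize the collected data.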
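The rule evaluation described above can be sketched with a small alerting rule file. The file name, group name, alert name, severity label, and 5-minute threshold are illustrative assumptions rather than part of this setup, and the file would also have to be listed under `rule_files:` in prometheus.yml before Prometheus loads it.

```bash
#!/bin/sh
# Sketch: write an example alerting rule file to a scratch directory.
# RULES_DIR and the file name are hypothetical.
RULES_DIR="${RULES_DIR:-$(mktemp -d)}"
RULES_FILE="$RULES_DIR/example_rules.yml"

cat > "$RULES_FILE" <<'EOF'
groups:
  - name: example
    rules:
      # Fire when a scrape target has been unreachable for 5 minutes.
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
EOF

echo "wrote $RULES_FILE"
# If promtool is available, it can validate the file:
#   promtool check rules "$RULES_FILE"
```

`up` is the metric Prometheus records for every scrape target (1 when the scrape succeeded, 0 when it failed), which makes it a convenient basis for a first alert.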
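Prometheus works well for recording any purely numeric time series. It fits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures. In a world of microservices, its support for multi-dimensional data collection and querying is a particular strength.
Prometheus is designed for reliability, to be the system you turn to during an outage so that you can diagnose problems quickly. Each Prometheus server is standalone and does not depend on network storage or other remote services. You can rely on it when other parts of your infrastructure are broken, and you do not need to set up extensive infrastructure to use it.
Prometheus values reliability. Even under failure conditions, you can always view the statistics that are available about your system. If you need 100% accuracy, for example for per-request billing, Prometheus is not a good choice, because the collected data may not be detailed and complete enough. In that case you are better off using another system to collect and analyze the billing data, and Prometheus for the rest of your monitoring.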
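Environment requirements
System | Hostname | IP | Required services |
---|---|---|---|
CentOS 8 | server | 192.168.249.141 | prometheus-2.28.0, grafana-7.5.6 |
CentOS 8 | agent | 192.168.249.145 | node_exporter-1.1.2 |
CentOS 8 | haproxy | 192.168.249.146 | haproxy_exporter-0.12.0 |
Download location for the required software
Deploy Prometheus on the server host
//Download the installation package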
[root@server ~]# ls
anaconda-ks.cfg prometheus-2.28.0.linux-amd64.tar.gz
//Extract the archive
[root@server ~]# tar xf prometheus-2.28.0.linux-amd64.tar.gz
[root@server ~]# ls
anaconda-ks.cfg prometheus-2.28.0.linux-amd64 prometheus-2.28.0.linux-amd64.tar.gz
[root@server ~]# mv prometheus-2.28.0.linux-amd64 /usr/local/prometheus
[root@server ~]# useradd -r -M -s /sbin/nologin prometheus
[root@server ~]# ls /usr/local/
bin etc games include lib lib64 libexec prometheus sbin share src
[root@server ~]# chown -R prometheus:prometheus /usr/local/prometheus/
//Check the main program's help output to see how to start it
[root@server ~]# cd /usr/local/prometheus/
[root@server prometheus]# ls
console_libraries consoles LICENSE NOTICE prometheus prometheus.yml promtool
[root@server prometheus]# ./prometheus --help
usage: prometheus [<flags>]
The Prometheus monitoring server
Flags:
-h, --help Show context-sensitive help (also try --help-long and --help-man).
--version Show application version.
--config.file="prometheus.yml" #this is the flag used to start the server
Prometheus configuration file path.
--web.listen-address="0.0.0.0:9090"
Address to listen on for UI, API, and telemetry.
--web.config.file="" [EXPERIMENTAL] Path to configuration file that can enable TLS or
authentication.
--web.read-timeout=5m Maximum duration before timing out read of the request, and closing idle
connections.
--web.max-connections=512 Maximum number of simultaneous connections.
--web.external-url=<URL> The URL under which Prometheus is externally reachable (for example, if
Prometheus is served via a reverse proxy). Used for generating relative and
absolute links back to Prometheus itself. If the URL has a path portion, it
will be used to prefix all HTTP endpoints served by Prometheus. If omitted,
relevant URL components will be derived automatically.
--web.route-prefix=<path> Prefix for the internal routes of web endpoints. Defaults to path of
--web.external-url.
--web.user-assets=<path> Path to static asset directory, available at /user.
--web.enable-lifecycle Enable shutdown and reload via HTTP request.
--web.enable-admin-api Enable API endpoints for admin control actions.
--web.console.templates="consoles"
Path to the console template directory, available at /consoles.
--web.console.libraries="console_libraries"
Path to the console library directory.
--web.page-title="Prometheus Time Series Collection and Processing Server"
Document title of Prometheus instance.
--web.cors.origin=".*" Regex for CORS origin. It is fully anchored. Example:
'https?://(domain1|domain2)\.com'
--storage.tsdb.path="data/"
Base path for metrics storage.
--storage.tsdb.retention=STORAGE.TSDB.RETENTION
[DEPRECATED] How long to retain samples in storage. This flag has been
deprecated, use "storage.tsdb.retention.time" instead.
--storage.tsdb.retention.time=STORAGE.TSDB.RETENTION.TIME
How long to retain samples in storage. When this flag is set it overrides
"storage.tsdb.retention". If neither this flag nor "storage.tsdb.retention" nor
"storage.tsdb.retention.size" is set, the retention time defaults to 15d. Units
Supported: y, w, d, h, m, s, ms.
--storage.tsdb.retention.size=STORAGE.TSDB.RETENTION.SIZE
[EXPERIMENTAL] Maximum number of bytes that can be stored for blocks. A unit is
required, supported units: B, KB, MB, GB, TB, PB, EB. Ex: "512MB". This flag is
experimental and can be changed in future releases.
--storage.tsdb.no-lockfile
Do not create lockfile in data directory.
--storage.tsdb.allow-overlapping-blocks
[EXPERIMENTAL] Allow overlapping blocks, which in turn enables vertical
compaction and vertical query merge.
--storage.tsdb.wal-compression
Compress the tsdb WAL.
--storage.remote.flush-deadline=<duration>
How long to wait flushing sample on shutdown or config reload.
--storage.remote.read-sample-limit=5e7
Maximum overall number of samples to return via the remote read interface, in a
single query. 0 means no limit. This limit is ignored for streamed response
types.
--storage.remote.read-concurrent-limit=10
Maximum number of concurrent remote read calls. 0 means no limit.
--storage.remote.read-max-bytes-in-frame=1048576
Maximum number of bytes in a single frame for streaming remote read response
types before marshalling. Note that client might have limit on frame size as
well. 1MB as recommended by protobuf by default.
--storage.exemplars.exemplars-limit=100000
[EXPERIMENTAL] Maximum number of exemplars to store in in-memory exemplar
storage total. 0 disables the exemplar storage. This flag is effective only
with --enable-feature=exemplar-storage.
--rules.alert.for-outage-tolerance=1h
Max time to tolerate prometheus outage for restoring "for" state of alert.
--rules.alert.for-grace-period=10m
Minimum duration between alert and restored "for" state. This is maintained
only for alerts with configured "for" time greater than grace period.
--rules.alert.resend-delay=1m
Minimum amount of time to wait before resending an alert to Alertmanager.
--alertmanager.notification-queue-capacity=10000
The capacity of the queue for pending Alertmanager notifications.
--query.lookback-delta=5m The maximum lookback duration for retrieving metrics during expression
evaluations and federation.
--query.timeout=2m Maximum time a query may take before being aborted.
--query.max-concurrency=20
Maximum number of queries executed concurrently.
--query.max-samples=50000000
Maximum number of samples a single query can load into memory. Note that
queries will fail if they try to load more samples than this into memory, so
this also limits the number of samples a query can return.
--enable-feature= ... Comma separated feature names to enable. Valid options: promql-at-modifier,
promql-negative-offset, remote-write-receiver, exemplar-storage,
expand-external-labels. See
https://prometheus.io/docs/prometheus/latest/disabled_features/ for more
details.
--log.level=info Only log messages with the given severity or above. One of: [debug, info, warn,
error]
--log.format=logfmt Output format of log messages. One of: [logfmt, json]
//Start it directly (this runs in the foreground, so check the listening port from a second terminal)
[root@server prometheus]# ./prometheus --config.file="prometheus.yml"
[root@server prometheus]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 *:9090 *:*
LISTEN 0 128 [::]:22 [::]:*
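Besides checking the listening port, the server can be asked whether it is ready to serve traffic. The sketch below polls Prometheus's `/-/ready` endpoint; the `wait_for_prometheus` helper, the `PROM_URL` variable, and the retry count are assumptions made for this walkthrough.

```bash
#!/bin/sh
# Sketch: poll the readiness endpoint until it answers or we give up.
PROM_URL="${PROM_URL:-http://localhost:9090}"

wait_for_prometheus() {
    tries="${1:-10}"
    i=1
    while [ "$i" -le "$tries" ]; do
        # -f makes curl exit non-zero on HTTP errors such as 503.
        if curl -fsS -o /dev/null "$PROM_URL/-/ready"; then
            echo "ready after $i attempt(s)"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "not ready after $tries attempt(s)" >&2
    return 1
}

# Usage: wait_for_prometheus 30
```

A check like this is mostly useful in scripts that start Prometheus and then immediately depend on it.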
//Starting it by hand is tedious; write a service file so that systemd manages it
[root@server ~]# vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
User=prometheus
Restart=on-failure
WorkingDirectory=/usr/local/prometheus/
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target
//The service can now be started with systemctl and enabled at boot
[root@server ~]# systemctl daemon-reload
[root@server ~]# systemctl start prometheus
[root@server ~]# systemctl --now enable prometheus
Created symlink /etc/systemd/system/multi-user.target.wants/prometheus.service → /etc/systemd/system/prometheus.service.
[root@server ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 *:9090 *:*
LISTEN 0 128 [::]:22 [::]:*
LISTEN 0 128 *:3000 *:*
Disable the firewall and SELinux
[root@server ~]# systemctl stop firewalld
[root@server ~]# systemctl disable firewalld
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@server ~]# setenforce 0
[root@server ~]# vim /etc/selinux/config
SELINUX=disabled
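Stopping firewalld entirely is convenient in a lab. A less drastic alternative is to open only the ports this setup uses. The sketch below assumes firewalld is still running; the `run` and `open_ports` helpers and the `DRY_RUN` switch are hypothetical, and with `DRY_RUN=1` the commands are only printed.

```bash
#!/bin/sh
# Sketch: open selected TCP ports in firewalld instead of disabling it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"        # print the command instead of running it
    else
        "$@"
    fi
}

open_ports() {
    for port in "$@"; do
        run firewall-cmd --permanent --add-port="${port}/tcp"
    done
    # Permanent rules take effect only after a reload.
    run firewall-cmd --reload
}

# On the server host: Prometheus (9090) and Grafana (3000).
open_ports 9090 3000
```

On the agent host the equivalent call would be `open_ports 9100`, and on the haproxy host `open_ports 9101`.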
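Install the node_exporter-1.1.2.linux-amd64.tar.gz component on the host to be monitored; it can be downloaded from the official website
//Create the service user and extract the archive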
[root@agent ~]# useradd -r -M -s /sbin/nologin prometheus
[root@agent ~]# ls
anaconda-ks.cfg node_exporter-1.1.2.linux-amd64.tar.gz
[root@agent ~]# tar xf node_exporter-1.1.2.linux-amd64.tar.gz
[root@agent ~]# ls
anaconda-ks.cfg node_exporter-1.1.2.linux-amd64 node_exporter-1.1.2.linux-amd64.tar.gz
[root@agent ~]# mv node_exporter-1.1.2.linux-amd64 /usr/local/node_exporter
[root@agent ~]# chown -R prometheus:prometheus /usr/local/node_exporter/
//Start
[root@agent node_exporter]# nohup /usr/local/node_exporter/node_exporter &
[1] 10337
[root@agent node_exporter]# nohup: ignoring input and appending output to 'nohup.out'
[root@agent node_exporter]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 *:9100 *:*
LISTEN 0 128 [::]:22 [::]:*
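Once port 9100 is listening, the exporter serves its metrics as plain text at `/metrics`. The sketch below shows one way to pull a single value out of that text. `metric_value` is a hypothetical helper, the sample values are made up, and the helper only handles samples without labels.

```bash
#!/bin/sh
# Sketch: extract one metric's value from Prometheus text-format output.
# Reads exposition text on stdin; matches label-less samples only.
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                      # skip HELP/TYPE comment lines
        $1 == name { print $2; exit }      # first exact name match wins
    '
}

# Sample text in the format node_exporter serves (values are made up).
sample='# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_load5 5m load average.
# TYPE node_load5 gauge
node_load5 0.3'

printf '%s\n' "$sample" | metric_value node_load1

# Against the live exporter from this walkthrough:
#   curl -s http://192.168.249.145:9100/metrics | metric_value node_load1
```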
//Alternatively, write a service file so that it starts at boot
[root@agent ~]# vim /etc/systemd/system/node_exporter.service
[Unit]
Description=node_exporter Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
User=prometheus
Restart=on-failure
WorkingDirectory=/usr/local/node_exporter/
ExecStart=/usr/local/node_exporter/node_exporter
[Install]
WantedBy=multi-user.target
[root@agent ~]# systemctl daemon-reload
//Start the service and enable it at boot
[root@agent ~]# systemctl start node_exporter
[root@agent ~]# systemctl enable node_exporter
[root@server ~]# cd /usr/local/prometheus/
[root@server prometheus]# ls
console_libraries consoles data LICENSE NOTICE prometheus prometheus.yml promtool
[root@server prometheus]# vim prometheus.yml
static_configs:
- targets: ['localhost:9090','192.168.249.145:9100'] #add the host to be monitored
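For context, the fragment above sits under `scrape_configs` in prometheus.yml. The sketch below writes a minimal complete file to a scratch directory and validates it when promtool is on the PATH (in this setup it lives at /usr/local/prometheus/promtool). Giving the agent its own job named `node` is an assumption, not something the original configuration does.

```bash
#!/bin/sh
# Sketch: a minimal prometheus.yml with the agent as a separate job.
# Written to a scratch directory so the real config is not touched.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
CONF_FILE="$CONF_DIR/prometheus.yml"

cat > "$CONF_FILE" <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # The job name "node" is an assumption; any descriptive name works.
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.249.145:9100']
EOF

# Validate before restarting, if promtool is available.
if command -v promtool >/dev/null 2>&1; then
    promtool check config "$CONF_FILE"
else
    echo "promtool not found; skipped validation of $CONF_FILE"
fi
```

Separate jobs make it easier to filter by the `job` label in queries later, at the cost of a slightly longer file.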
//Restart the service so that the new configuration is loaded
[root@server prometheus]# systemctl restart prometheus
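A full restart is not the only way to pick up configuration changes: Prometheus also reloads its configuration on SIGHUP, or through an HTTP request when it was started with `--web.enable-lifecycle` (listed in the help output earlier). The sketch below assumes that flag has been added to the service's ExecStart, which this walkthrough does not do; `reload_prometheus` and `PROM_URL` are hypothetical names.

```bash
#!/bin/sh
# Sketch: reload the configuration without restarting the process.
PROM_URL="${PROM_URL:-http://localhost:9090}"

reload_prometheus() {
    # Works only when Prometheus runs with --web.enable-lifecycle.
    if curl -fsS -X POST "$PROM_URL/-/reload"; then
        echo "reload requested via HTTP"
    else
        echo "HTTP reload failed; is --web.enable-lifecycle set?" >&2
        return 1
    fi
}

# Alternative that needs no extra flag: send SIGHUP through systemd.
#   systemctl kill -s HUP prometheus
```

A reload keeps the in-memory state and avoids the short scrape gap that a restart causes.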
//Download the exporter and extract it
[root@haproxy ~]# wget https://github.com/prometheus/haproxy_exporter/releases/download/v0.12.0/haproxy_exporter-0.12.0.linux-amd64.tar.gz
[root@haproxy ~]# ls
anaconda-ks.cfg haproxy_exporter-0.12.0.linux-amd64.tar.gz
[root@haproxy ~]# tar xf haproxy_exporter-0.12.0.linux-amd64.tar.gz
[root@haproxy ~]# ls
anaconda-ks.cfg haproxy_exporter-0.12.0.linux-amd64 haproxy_exporter-0.12.0.linux-amd64.tar.gz
[root@haproxy ~]# mv haproxy_exporter-0.12.0.linux-amd64 /usr/local/haproxy_exporter
[root@haproxy ~]# useradd -r -M -s /sbin/nologin prometheus
[root@haproxy ~]# chown -R prometheus:prometheus /usr/local/haproxy_exporter/
//Write the service file
[root@haproxy ~]# vim /etc/systemd/system/haproxy_exporter.service
[Unit]
Description=haproxy_exporter Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
User=prometheus
Restart=on-failure
WorkingDirectory=/usr/local/haproxy_exporter/
ExecStart=/usr/local/haproxy_exporter/haproxy_exporter
[Install]
WantedBy=multi-user.target
//Start the service and enable it at boot
[root@haproxy ~]# systemctl daemon-reload
[root@haproxy ~]# systemctl start haproxy_exporter
[root@haproxy ~]# systemctl --now enable haproxy_exporter
Created symlink /etc/systemd/system/multi-user.target.wants/haproxy_exporter.service → /etc/systemd/system/haproxy_exporter.service.
[root@haproxy ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
LISTEN 0 128 *:9101 *:*
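The unit file above starts haproxy_exporter with its defaults, so it looks for the HAProxy stats page at the exporter's built-in default address. If the stats page lives elsewhere, the `--haproxy.scrape-uri` flag points the exporter at it. The sketch below generates a systemd drop-in for that; the stats URL is a made-up example, and the drop-in is written to a scratch directory rather than to /etc/systemd/system/haproxy_exporter.service.d/.

```bash
#!/bin/sh
# Sketch: a systemd drop-in that passes the HAProxy stats URL to the exporter.
# STATS_URI is hypothetical; use the stats address of your own HAProxy.
STATS_URI="${STATS_URI:-http://localhost:8404/stats;csv}"
DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)}"
DROPIN_FILE="$DROPIN_DIR/override.conf"

cat > "$DROPIN_FILE" <<EOF
[Service]
# An empty ExecStart= clears the command inherited from the main unit file.
ExecStart=
ExecStart=/usr/local/haproxy_exporter/haproxy_exporter "--haproxy.scrape-uri=${STATS_URI}"
EOF

cat "$DROPIN_FILE"
```

After installing such a drop-in, `systemctl daemon-reload` and a restart of haproxy_exporter would be needed. For Prometheus to scrape the exporter, `192.168.249.146:9101` also has to appear among the targets in prometheus.yml on the server.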
The monitoring data can be viewed in the web interface
The Prometheus web interface shows that a new host has been added
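The same information is available from the command line through the Prometheus HTTP API. The sketch below wraps the instant-query endpoint `/api/v1/query`; `query_prometheus` and `PROM_URL` are hypothetical names, and the server address is the one from the environment table.

```bash
#!/bin/sh
# Sketch: run a PromQL instant query against the HTTP API.
PROM_URL="${PROM_URL:-http://192.168.249.141:9090}"

query_prometheus() {
    # -G with --data-urlencode lets curl escape the PromQL expression.
    curl -fsS -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Usage:
#   query_prometheus 'up'
#   query_prometheus 'node_load1{instance="192.168.249.145:9100"}'
```

The query `up` returns one series per scrape target, with value 1 for targets that were scraped successfully and 0 for those that were not.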
//Download the package from the official website
[root@server ~]# ls
anaconda-ks.cfg grafana-7.5.6-1.x86_64.rpm prometheus-2.28.0.linux-amd64.tar.gz
[root@server ~]# dnf -y install grafana-7.5.6-1.x86_64.rpm
//Start the service
[root@server ~]# systemctl start grafana-server
[root@server ~]# systemctl enable grafana-server
[root@server ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
LISTEN 0 128 *:3000 *:*
LISTEN 0 128 *:9090 *:*
Open the web interface. On first login the default username is admin and the password is admin; you are then asked to set a new password.
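After logging in, Grafana needs a data source before it can chart anything. It can be added in the web interface, or provisioned from a file as sketched below. The file name is an assumption, and the file is written to a scratch directory here; on RPM installs Grafana reads such files from /etc/grafana/provisioning/datasources/ at startup.

```bash
#!/bin/sh
# Sketch: provision Prometheus as a Grafana data source from a file.
PROV_DIR="${PROV_DIR:-$(mktemp -d)}"
PROV_FILE="$PROV_DIR/prometheus.yaml"

cat > "$PROV_FILE" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # Grafana and Prometheus share the server host in this walkthrough.
    url: http://localhost:9090
    isDefault: true
EOF

echo "wrote $PROV_FILE"
```

With `access: proxy`, the Grafana backend makes the requests to Prometheus, so `localhost:9090` resolves on the server host rather than in the browser.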