This article gives you a first hands-on look at Prometheus + Grafana and walks you through setting up the stack yourself.
An exporter is a binary running alongside the application you want to obtain metrics from. The exporter exposes Prometheus metrics, commonly by converting metrics that are exposed in a non-Prometheus format into a format that Prometheus supports.
Prometheus pulls host and application metrics from the HTTP interface an exporter exposes, and we can install different exporters to extend what information is collected.
The Prometheus server simply scrapes the endpoint an exporter provides to obtain the monitoring data it needs.
An exporter's main job is to convert the metrics it collects into the Prometheus format.
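As an illustration, a toy exporter can be sketched with nothing but Python's standard library (the metric name and port here are made up; real exporters are usually built on an official client library):

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_metrics():
    # Produce one gauge in the Prometheus text exposition format.
    load1 = os.getloadavg()[0]  # 1-minute load average of this host
    return (
        "# HELP demo_load1 1-minute load average.\n"
        "# TYPE demo_load1 gauge\n"
        f"demo_load1 {load1}\n"
    )

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = render_metrics().encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

# To actually serve on a (hypothetical) port 9200, uncomment:
# HTTPServer(("127.0.0.1", 9200), MetricsHandler).serve_forever()
```

Pointing a Prometheus scrape job at such an endpoint is all it takes for the metric to be collected.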
Kubernetes provides five service-discovery roles that integrate with Prometheus: node, service, pod, endpoints, and ingress. We will cover them in a follow-up article.
Endpoint
A source of metrics that can be scraped, usually corresponding to a single process.
An endpoint exposes formatted metrics data to the Prometheus server.
A bridge is a component that takes samples from a client library and exposes them to a non-Prometheus monitoring system. For example, the Python, Go, and Java clients can export metrics to Graphite.
A client library is a library in some language (e.g. Go, Java, Python, Ruby) that makes it easy to directly instrument your code, write custom collectors to pull metrics from other systems and expose the metrics to Prometheus.
An instance is a label that uniquely identifies a target in a job.
A collection of targets with the same purpose, for example monitoring a group of like processes replicated for scalability or reliability, is called a job.
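For example, the `up` series Prometheus records on every scrape carries both labels; a `node` job with one target produces a series like (target address illustrative):

```
up{job="node", instance="localhost:9100"} 1
```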
A notification represents a group of one or more alerts, and is sent by the Alertmanager to email, Pagerduty, Slack etc.
Promdash was a native dashboard builder for Prometheus. It has been deprecated and replaced by Grafana.
Prometheus usually refers to the core binary of the Prometheus system. It may also refer to the Prometheus monitoring system as a whole.
PromQL is the Prometheus Query Language. It allows for a wide range of operations including aggregation, slicing and dicing, prediction and joins.
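A couple of illustrative queries (the metric names are standard node_exporter metrics, used here only as examples):

```
# Per-instance 5-minute receive rate for the "node" job
sum by (instance) (rate(node_network_receive_bytes_total{job="node"}[5m]))

# Predict free disk space 4 hours ahead from the last hour's trend
predict_linear(node_filesystem_free_bytes[1h], 4 * 3600)
```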
The Pushgateway persists the most recent push of metrics from batch jobs. This allows Prometheus to scrape their metrics after they have terminated.
The Pushgateway can be thought of as an intermediary. It exists to let ephemeral and batch jobs expose their metrics to Prometheus: because such jobs may not live long enough to be scraped, they push their metrics to the Pushgateway, which then exposes them to Prometheus.
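As a sketch, a batch job can push a metric with nothing more than an HTTP PUT of the text exposition format (the host, port, job name, and metric name below are assumptions for illustration):

```python
from urllib.request import Request, urlopen

def build_push_request(gateway, job, body):
    # The Pushgateway accepts the text exposition format via PUT (replace)
    # or POST (merge) at /metrics/job/<job_name>.
    url = f"http://{gateway}/metrics/job/{job}"
    return Request(url, data=body.encode("utf-8"), method="PUT")

# Hypothetical batch-job metric.
req = build_push_request("localhost:9091", "nightly_backup",
                         "backup_last_success_unixtime 1617700000\n")
# urlopen(req)  # only works when a Pushgateway is actually listening
```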
Remote read is a Prometheus feature that allows transparent reading of time series from other systems (such as long term storage) as part of queries.
Not all systems directly support remote read. A remote read adapter sits between Prometheus and another system, converting time series requests and responses between them.
A remote read endpoint is what Prometheus talks to when doing a remote read.
Remote write is a Prometheus feature that allows sending ingested samples on the fly to other systems, such as long term storage.
Not all systems directly support remote write. A remote write adapter sits between Prometheus and another system, converting the samples in the remote write into a format the other system can understand.
A remote write endpoint is what Prometheus talks to when doing a remote write.
A sample is a single value at a point in time in a time series.
In Prometheus, each sample consists of a float64 value and a millisecond-precision timestamp.
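In the text exposition format a sample may optionally carry its timestamp explicitly, e.g. (hypothetical metric, from the format's documentation style):

```
http_requests_total{method="post"} 1027 1395066363000
```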
A silence in the Alertmanager prevents alerts, with labels matching the silence, from being included in notifications.
A target is the definition of an object to scrape. For example, what labels to apply, any authentication required to connect, or other information that defines how the scrape will occur.
# Run via Docker, mounting a local prometheus.yml into the container
docker run -p 9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
export VERSION=2.26.0
curl -LO https://github.com/prometheus/prometheus/releases/download/v$VERSION/prometheus-$VERSION.linux-amd64.tar.gz
# The tarball contains the binary and a default configuration file
tar zxvf prometheus-$VERSION.linux-amd64.tar.gz
cd prometheus-$VERSION.linux-amd64
./prometheus --config.file=prometheus.yml
Source repository: https://github.com/prometheus/node_exporter
export VERSION=1.1.2
curl -OL https://github.com/prometheus/node_exporter/releases/download/v$VERSION/node_exporter-$VERSION.darwin-amd64.tar.gz
tar -xzf node_exporter-$VERSION.darwin-amd64.tar.gz
cd node_exporter-$VERSION.darwin-amd64
# Start the service
./node_exporter --web.listen-address 127.0.0.1:9100
Visit node_exporter to see the collected metrics:
http://localhost:9100/metrics
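The response is in the text exposition format; a small excerpt looks roughly like this (exact metrics and values vary by host and platform):

```
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 312.4
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 1.52
```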
# my global config
global:
  # How often to scrape targets
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  # How often to evaluate rules
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
# Scrape target configuration
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    # Static targets
    static_configs:
      - targets: ['localhost:9090']
  # Scrape node_exporter metrics
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100', '192.168.1.2:9100', '192.168.1.3:9100'] # three node_exporter targets
Check the configuration file:
go get github.com/prometheus/prometheus/cmd/promtool
promtool check config prometheus.yml
promtool subcommand reference:
# Check config files
check config <config-files>...
Check if the config files are valid or not.
# Check rule files
check rules <rule-files>...
Check if the rule files are valid or not.
# Check metrics
check metrics
Pass Prometheus metrics over stdin to lint them for consistency and correctness.
examples:
$ cat metrics.prom | promtool check metrics
$ curl -s http://localhost:9090/metrics | promtool check metrics
If you are new to Grafana, you can first try it online at https://play.grafana.org/
docker run -d --name=grafana -p 3000:3000 grafana/grafana
Visit http://localhost:3000
The default username and password are admin/admin; on first login the system asks you to change the password, after which you can log in.
The interface after login looks like this:
Supported data sources include Graphite, InfluxDB, OpenTSDB, Prometheus, Elasticsearch, CloudWatch, KairosDB, and others.
Add Prometheus as a data source:
Configuration -> Data Sources -> Add data source -> Prometheus -> Select
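Instead of clicking through the UI, the data source can also be provisioned from a YAML file; a sketch for a default local setup (the file path is an assumption) placed under Grafana's provisioning/datasources directory:

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml (assumed path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```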
Grafana offers many dashboard templates for quickly building monitoring pages.
Browse https://grafana.com/grafana/dashboards
to see the many ready-made templates and pick one that matches whatever you are monitoring.
Since we are monitoring host metrics, I chose the following template.
The red box shows the template's description.
Click Copy ID to Clipboard on the right.
Go to Create -> Import, paste the copied ID (8919 here), and click Load.
Then configure a name and the data source and click Import.
You will see a dashboard like the one below. Pretty cool, right?
For real business use we still need to customize pages to our own requirements.
Detailed operations and more Prometheus internals will be covered in follow-up articles.
Postscript
Article 7 of the 2021 "200 articles" plan.