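Prometheus configuration is written in YAML. This article walks through an example configuration.
The official configuration documentation is the full reference; it is written in English.
# Global settings apply to every configuration item. A setting of the same name inside a specific item overrides the global value.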
global:
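  scrape_interval: 15s
  # Scrape interval: how often the Prometheus server pulls metrics from the configured exporters/endpoints.
  evaluation_interval: 30s
  # Evaluation interval: how often the Prometheus server evaluates its recording and alerting rules.
  # scrape_timeout is not set here, so it keeps the global default (10s).
  external_labels:
    # External labels: custom key-value pairs; more than one is allowed.
    monitor: codelab
    foo: bar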
rule_files:
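# Rule files to load, given as file names or paths. Several entries are allowed, and glob patterns (not full regular expressions) are supported. An example of the rule file format appears later in this article.
- "first.rules"
- "my/*.rules"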
remote_write:
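# Remote write: sends the samples collected by this server to another host. write_relabel_configs can apply relabel actions to selected series before they are sent.
- url: http://remote1/push
  write_relabel_configs:
  - source_labels: [__name__]
    regex: expensive.*
    action: drop
- url: http://remote2/push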
remote_read:
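# Remote read: queries samples from other hosts. required_matchers restricts an endpoint to queries that carry the given label matchers, for example a specific job.
- url: http://remote1/read
  read_recent: true
- url: http://remote3/read
  read_recent: false
  required_matchers:
    job: special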
scrape_configs:
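# Scrape configuration
- job_name: prometheus
  honor_labels: true
  # honor_labels resolves conflicts between labels in the scraped data and labels attached by the server. With true, the labels from the scraped data are kept. With false, the server-side labels win and the conflicting scraped labels are renamed to exported_<label>.
  # scrape_interval is defined by the configured global (15s).
  # scrape_timeout is defined by the global default (10s).
  # metrics_path defaults to '/metrics'
  # scheme defaults to 'http'.
  file_sd_configs:
  # File-based service discovery: targets are read from the listed files or glob paths. Changes to these files are detected automatically, and the files are also re-read every refresh_interval, so no reload is needed. The file format is shown later in this article.
  - files:
    - foo/*.slow.json
    - foo/*.slow.yml
    - single/file.yml
    refresh_interval: 10m
  - files:
    - bar/*.yaml
  static_configs:
  # Static configuration of targets. Each list entry holds a list of endpoints.
  - targets: ['localhost:9090', 'localhost:9191']
    labels:  # target labels: custom key-value pairs, more than one is allowed
      my: label
      your: label
  relabel_configs:  # label rewriting rules
  - source_labels: [job, __meta_dns_name]  # source labels
    regex: (.*)some-[regex]  # regular expression to match
    target_label: job  # label to write to
    replacement: foo-${1}  # replacement value
    # action defaults to 'replace'
  - source_labels: [abc]
    target_label: cde
  - replacement: static
    target_label: abc
  - regex:
    replacement: static
    target_label: abc
  bearer_token_file: valid_token_file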
- job_name: service-x
  basic_auth:  # credentials for targets that require authentication
    username: admin_name
    password: "multiline\nmysecret\ntest"
  scrape_interval: 50s
  scrape_timeout: 5s
  sample_limit: 1000
  metrics_path: /my_path
  scheme: https
  dns_sd_configs:
  - refresh_interval: 15s
    names:
    - first.dns.address.domain.com
    - second.dns.address.domain.com
  - names:
    - first.dns.address.domain.com
    # refresh_interval defaults to 30s.
  relabel_configs:
  - source_labels: [job]
    regex: (.*)some-[regex]
    action: drop
  - source_labels: [__address__]
    modulus: 8
    target_label: __tmp_hash
    action: hashmod
  - source_labels: [__tmp_hash]
    regex: 1
    action: keep
  - action: labelmap
    regex: 1
  - action: labeldrop
    regex: d
  - action: labelkeep
    regex: k
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: expensive_metric.*
    action: drop
- job_name: service-y
  consul_sd_configs:
  - server: 'localhost:1234'
    token: mysecret
    services: ['nginx', 'cache', 'mysql']
    scheme: https
    tls_config:
      ca_file: valid_ca_file
      cert_file: valid_cert_file
      key_file: valid_key_file
      insecure_skip_verify: false
  relabel_configs:
  - source_labels: [__meta_sd_consul_tags]
    separator: ','
    regex: label:([^=]+)=([^,]+)
    target_label: ${1}
    replacement: ${2}
- job_name: service-z
  tls_config:
    cert_file: valid_cert_file
    key_file: valid_key_file
  bearer_token: mysecret
- job_name: service-kubernetes
  kubernetes_sd_configs:
  - role: endpoints
    api_server: 'https://localhost:1234'
    basic_auth:
      username: 'myusername'
      password: 'mysecret'
- job_name: service-kubernetes-namespaces
  kubernetes_sd_configs:
  - role: endpoints
    api_server: 'https://localhost:1234'
    namespaces:
      names:
      - default
- job_name: service-marathon
  marathon_sd_configs:
  - servers:
    - 'https://marathon.example.com:443'
    tls_config:
      cert_file: valid_cert_file
      key_file: valid_key_file
- job_name: service-ec2
  ec2_sd_configs:
  - region: us-east-1
    access_key: access
    secret_key: mysecret
    profile: profile
- job_name: service-azure
  azure_sd_configs:
  - subscription_id: 11AAAA11-A11A-111A-A111-1111A1111A11
    tenant_id: BBBB222B-B2B2-2B22-B222-2BB2222BB2B2
    client_id: 333333CC-3C33-3333-CCC3-33C3CCCCC33C
    client_secret: mysecret
    port: 9100
- job_name: service-nerve
  nerve_sd_configs:
  - servers:
    - localhost
    paths:
    - /monitoring
- job_name: 0123service-xxx
  metrics_path: /metrics
  static_configs:
  - targets:
    - localhost:9090
- job_name: 測試
  metrics_path: /metrics
  static_configs:
  - targets:
    - localhost:9090
- job_name: service-triton
  triton_sd_configs:
  - account: 'testAccount'
    dns_suffix: 'triton.example.com'
    endpoint: 'triton.example.com'
    port: 9163
    refresh_interval: 1m
    version: 1
    tls_config:
      cert_file: testdata/valid_cert_file
      key_file: testdata/valid_key_file
alerting:  # Alertmanager instances that receive alerts; several endpoints may be listed
  alertmanagers:
  - scheme: https
    static_configs:
    - targets:
      - "1.2.3.4:9093"
      - "1.2.3.5:9093"
      - "1.2.3.6:9093"
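The relabel_configs entries in the configuration above are easier to follow with a small model of what they do. The Python sketch below is a simplified illustration, not Prometheus code: the function names are made up for this article, `${1}` handling covers only the simple case, and the hashing detail in `hashmod` (MD5, low 8 bytes read as a big-endian integer) is an assumption about the implementation, so the bucket numbers may not match a real server.

```python
import hashlib
import re


def _joined(labels, source_labels, separator=";"):
    # Source label values are concatenated with the separator (default ';').
    # A label that is missing contributes an empty string.
    return separator.join(labels.get(name, "") for name in source_labels)


def relabel_replace(labels, source_labels, regex, target_label,
                    replacement, separator=";"):
    """Simplified model of the default 'replace' action."""
    value = _joined(labels, source_labels, separator)
    # The regex must match the whole joined value, hence fullmatch.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels stay unchanged
    # Turn ${1}-style references into Python's \g<1> template syntax.
    template = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


def relabel_hashmod(labels, source_labels, modulus, target_label,
                    separator=";"):
    """Simplified model of 'hashmod': store hash(value) % modulus."""
    value = _joined(labels, source_labels, separator)
    digest = hashlib.md5(value.encode("utf-8")).digest()
    # Assumption: only the low 8 bytes of the digest are used.
    bucket = int.from_bytes(digest[8:], "big") % modulus
    result = dict(labels)
    result[target_label] = str(bucket)
    return result


def relabel_keep(labels, source_labels, regex, separator=";"):
    """Simplified model of 'keep': True if the target survives."""
    return re.fullmatch(regex, _joined(labels, source_labels, separator)) is not None
```

For example, with labels `{"job": "api", "__meta_dns_name": "some-r"}`, the first rule of the prometheus job joins the two values into `api;some-r`, matches `(.*)some-[regex]`, and sets `job` to `foo-api;`. The hashmod and keep pair in the service-x job works the same way: every target is assigned one of 8 buckets, and only targets in bucket 1 are kept, which is a common way to shard targets across several Prometheus servers.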
Rule file format:
https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
Example configuration:
groups:  # rule groups
- name: example  # name of the first group
  rules:  # rules in this group; any number of alerting rules may be listed, two are shown here
  - alert: HighErrorRate1
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency
  - alert: HighErrorRate2
    expr: job:request_latency_seconds:mean5m{job="yourjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency
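The `for: 10m` clause in the rules above means an alert does not fire the first time its expression is true. The following Python sketch is a simplified model of that behaviour, written for this article; the function name and the sample data are hypothetical, and real rule evaluation has more details (for example, it works per label set).

```python
def alert_states(samples, threshold, for_seconds):
    """Return the alert state after each rule evaluation.

    samples is a list of (timestamp_seconds, value) pairs, one per
    evaluation. The alert condition is value > threshold.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp
            # The alert fires only once the condition has held for the
            # whole 'for' duration; until then it is pending.
            if timestamp - active_since >= for_seconds:
                states.append("firing")
            else:
                states.append("pending")
        else:
            # Any evaluation where the condition is false resets the clock.
            active_since = None
            states.append("inactive")
    return states
```

With evaluations every 5 minutes, a threshold of 0.5 and `for_seconds=600`, the values 0.4, 0.6, 0.7, 0.8, 0.3 give the states inactive, pending, pending, firing, inactive.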
Target file for file-based service discovery, example in JSON. The file is a list of target groups, and the targets inside each group are also a list:
[
  {
    "targets": [
      "127.0.0.1:9104"
    ],
    "labels": {
      "job": "job1",
      "service": "service1"
    }
  },
  {
    "targets": [
      "127.0.0.1:9105",
      "127.0.0.1:9106",
      "127.0.0.1:9107"
    ],
    "labels": {
      "job": "job2",
      "service": "service2"
    }
  }
]
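Target files like the one above are often generated by a script rather than written by hand. The sketch below shows one possible way to do that in Python; the function name `write_target_groups` and the output path are hypothetical. It writes to a temporary file first and then renames it, so that a reader never sees a half-written file.

```python
import json
import os
import tempfile


def write_target_groups(path, groups):
    """Write a list of target groups as JSON, replacing the file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem. Its '.tmp' suffix does not match a
    # '*.json' glob in file_sd_configs.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(groups, handle, indent=2)
    os.replace(tmp_path, path)


EXAMPLE_GROUPS = [
    {
        "targets": ["127.0.0.1:9104"],
        "labels": {"job": "job1", "service": "service1"},
    },
    {
        "targets": ["127.0.0.1:9105", "127.0.0.1:9106", "127.0.0.1:9107"],
        "labels": {"job": "job2", "service": "service2"},
    },
]
```

A call such as `write_target_groups("targets.json", EXAMPLE_GROUPS)` produces the same structure as the example file, provided the path is one that a `files:` entry in file_sd_configs matches.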
One more point worth noting concerns reloading the Prometheus configuration:
Prometheus can reload its configuration at runtime. If the new configuration is not well-formed, the changes will not be applied. A configuration reload is triggered by sending a SIGHUP to the Prometheus process or sending a HTTP POST request to the /-/reload endpoint (when the --web.enable-lifecycle flag is enabled). This will also reload any configured rule files.
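To reload the configuration by sending a request to that endpoint, Prometheus must be started with the flag shown below. There are two ways to start it.
Start from the command line:
nohup ./prometheus --web.enable-lifecycle --config.file=prometheus.yml &
Or edit the service unit file: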
"/usr/lib/systemd/system/prometheus.service"
# -*- mode: conf -*-
[Unit]
Description=The Prometheus monitoring system and time series database.
Documentation=https://prometheus.io
After=network.target
[Service]
EnvironmentFile=-/etc/default/prometheus
User=prometheus
# Note: --web.enable-lifecycle is not present by default; add it to enable the Lifecycle APIs.
ExecStart=/usr/bin/prometheus \
  --web.enable-lifecycle \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/data \
  --web.console.libraries=/usr/share/prometheus/console_libraries \
  --web.console.templates=/usr/share/prometheus/consoles \
  $PROMETHEUS_OPTS
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
[Install]
WantedBy=multi-user.target
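There are two ways to make Prometheus reload its configuration:
1. Send a SIGHUP signal to the main Prometheus process:
kill -HUP pid
2. Send a POST request to the reload endpoint:
curl -XPOST http://ip:9090/-/reload
# This method only works if Prometheus was started with the --web.enable-lifecycle flag described above.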
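The same POST request can be sent from a script, for example after a deployment tool has rewritten prometheus.yml. The following Python sketch uses only the standard library; the function names and the base URL are assumptions made for illustration, and it expects the server to have been started with --web.enable-lifecycle.

```python
import urllib.error
import urllib.request


def build_reload_request(base_url):
    """Build the POST request for the /-/reload endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    # An empty body is enough; the endpoint only needs the POST method.
    return urllib.request.Request(url, data=b"", method="POST")


def reload_prometheus(base_url, timeout=10):
    """Ask Prometheus to reload its configuration.

    Returns (True, "") on success, or (False, message) when the server
    answers with an HTTP error, e.g. because the lifecycle API is
    disabled or the new configuration is not well-formed.
    """
    request = build_reload_request(base_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200, ""
    except urllib.error.HTTPError as error:
        return False, error.read().decode("utf-8", errors="replace")
```

Calling `reload_prometheus("http://ip:9090")` is then equivalent to the curl command above, with the error text returned to the caller instead of printed.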