Prometheus Source Code Analysis (2): Configuration Files

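I had planned to go straight into the source code of each Prometheus component, but much of the code in the prometheus and alertmanager components is closely tied to their configuration files (prometheus.yml and alertmanager.yml). This part therefore explains the configuration files first.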

For more on what Prometheus can do, see:

https://prometheus.io/docs/introduction/overview/

If you are interested in Prometheus, you are welcome to join QQ group 70860761 to discuss it.

Configuration

prometheus.yml

# my global config
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'codelab-monitor'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  # - "first.rules"
  # - "second.rules"
  - "alert.rules"
  # - "record.rules"

# Scrape configurations: each job below lists the endpoints to scrape.
scrape_configs:
  # The job name is added as a label `job=` to any timeseries scraped from this config.
  - job_name: 'windows-test'

    # Override the global default and scrape targets from this job every 1 second.
    scrape_interval: 1s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['192.168.3.1:9090','192.168.3.120:9090']

  - job_name: 'windows-chenx'

    # Override the global default and scrape targets from this job every 3 seconds.
    scrape_interval: 3s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['192.168.3.1:9091']

Parameter notes:

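  • scrape_interval under global
    How often Prometheus scrapes its targets (exporters, a Pushgateway, and so on). In the configuration above, targets are scraped every 15 seconds unless a job overrides the value.
  • evaluation_interval under global
    How often rules are evaluated. In the configuration above, the configured rule set is evaluated every 15 seconds.
  • external_labels under global
    Adds extra labels to time series and alerts when Prometheus communicates with external systems. They can be used to tell Prometheus servers apart, since several Prometheus servers may send alerts to a single Alertmanager.
  • rule_files
    Lists the rule files to load; each file contains one or more rules.
  • job_name under scrape_configs
    The name of the job. It is added to every scraped series as the job label, which identifies the job the series belongs to.
  • scrape_interval under scrape_configs
    Overrides the scrape_interval set under global for this job.
  • targets under static_configs
    The addresses to scrape metrics from; multiple addresses are separated by commas.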
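
The override rule described above (a job-level scrape_interval replaces the global one) can be illustrated with a short Python sketch. This is not how Prometheus implements it internally. The dictionary simply mirrors the prometheus.yml shown earlier, the job named no-override is a hypothetical addition for illustration, and parse_duration handles only simple single-unit durations such as 15s or 5m.

```python
import re

# Hypothetical in-memory mirror of the prometheus.yml shown above.
config = {
    "global": {"scrape_interval": "15s", "evaluation_interval": "15s"},
    "scrape_configs": [
        {"job_name": "windows-test", "scrape_interval": "1s"},
        {"job_name": "windows-chenx", "scrape_interval": "3s"},
        {"job_name": "no-override"},  # hypothetical job, not in the original file
    ],
}

# Seconds per unit for the simple durations this sketch understands.
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Convert a single-unit duration such as '15s' or '5m' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def effective_scrape_intervals(cfg):
    """Return the scrape interval in seconds that applies to each job."""
    default = cfg["global"]["scrape_interval"]
    return {
        job["job_name"]: parse_duration(job.get("scrape_interval", default))
        for job in cfg["scrape_configs"]
    }


intervals = effective_scrape_intervals(config)
print(intervals)
```

Jobs that set their own scrape_interval keep it, and the job without one falls back to the global 15 seconds.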

alertmanager.yml

global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'nihao206206#'
  # The auth token for Hipchat.
  hipchat_auth_token: '1234556789'
  # Alternative host for Hipchat.
  hipchat_url: 'https://hipchat.foobar.org/'

# The directory from which notification templates are read.
templates: 
- '/etc/alertmanager/template/*.tmpl'

# The root route on which each incoming alert enters.
route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname', 'cluster', 'service']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This ensures that multiple alerts for the same group that start firing
  # shortly after one another are batched together in the first
  # notification.
  group_wait: 30s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5m

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 3h 

  # A default receiver
  receiver: team-X-mails

  # All the above attributes are inherited by all child routes and can be
  # overwritten on each.

  # The child route trees.
  routes:
  # This route performs a regular expression match on alert labels to
  # catch alerts that are related to a list of services.
  - match_re:
      service: ^(foo1|foo2|baz)$
    receiver: team-X-mails
    # The service has a sub-route for critical alerts, any alerts
    # that do not match, i.e. severity != critical, fall-back to the
    # parent node and are sent to 'team-X-mails'
    routes:
    - match:
        severity: critical
      receiver: team-X-pager
  - match:
      service: files
    receiver: team-Y-mails

    routes:
    - match:
        severity: critical
      receiver: team-Y-pager

  # This route handles all alerts coming from a database service. If there's
  # no team to handle it, it defaults to the DB team.
  - match:
      service: database
    receiver: team-DB-pager
    # Also group alerts by affected database.
    group_by: [alertname, cluster, database]
    routes:
    - match:
        owner: team-X
      receiver: team-X-pager
    - match:
        owner: team-Y
      receiver: team-Y-pager


# Inhibition rules allow muting a set of alerts while another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is 
# already critical.
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  # Apply inhibition only if all of these labels have the same values.
  equal: ['alertname', 'cluster', 'service']


receivers:
- name: 'team-X-mails'
  webhook_configs:
  - url: 'http://u2.kugou.net:11770/sendRtxByPost'

- name: 'team-X-pager'
  email_configs:
  - to: '[email protected]'
  pagerduty_configs:
  - service_key: 

- name: 'team-Y-mails'
  email_configs:
  - to: '[email protected]'

- name: 'team-Y-pager'
  pagerduty_configs:
  - service_key: 

- name: 'team-DB-pager'
  pagerduty_configs:
  - service_key: 
- name: 'team-X-hipchat'
  hipchat_configs:
  - auth_token: 
    room_id: 85
    message_format: html
    notify: true

Parameter notes:

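  • global
    smtp_smarthost, smtp_from, smtp_auth_username and smtp_auth_password set the SMTP server address, the sender, and the account credentials used for email notifications.
    hipchat_auth_token is the token used to authenticate against HipChat; hipchat_url sets an alternative HipChat host.
  • templates
    Specifies the template files used to render notifications.
  • route
    group_by: the labels by which alerts are grouped.
    group_wait: how long to wait before sending the first notification for a new group, so that alerts that start firing shortly afterwards land in the same notification.
    group_interval: how long to wait before sending a notification about new alerts added to a group that has already been notified.
    repeat_interval: how long to wait before resending a notification for alerts that are still firing.
  • receiver
    The default receiver for alerts (set under route).
  • routes
    match_re: matches alert labels against regular expressions to select a receiver.
    service: the label being matched. The pattern ^(foo1|foo2|baz)$ is anchored at both ends, so it matches only when the service label is exactly foo1, foo2 or baz.
    receiver: the receiver used when the match succeeds.
  • inhibit_rules
    Defines the conditions under which alerts are inhibited, filtering out unnecessary notifications.
  • receivers
    Defines the concrete receivers, that is, how each notification is actually delivered (email, webhook, PagerDuty, HipChat).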
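
To make the route and routes parameters more concrete, here is a minimal Python sketch of how an alert's labels could be walked through the routing tree from alertmanager.yml. It is a simplified illustration, not Alertmanager's actual code: it stops at the first matching child, ignores grouping and timing, and the function names matches and pick_receiver are made up for this example.

```python
import re


def matches(route, labels):
    """Check a route's 'match' (equality) and 'match_re' (regex) conditions."""
    for name, value in route.get("match", {}).items():
        if labels.get(name, "") != value:
            return False
    for name, pattern in route.get("match_re", {}).items():
        # fullmatch requires the whole label value to match the pattern.
        if re.fullmatch(pattern, labels.get(name, "")) is None:
            return False
    return True


def pick_receiver(route, labels, inherited=None):
    """Descend into the first matching child; otherwise stay on this route."""
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        if matches(child, labels):
            return pick_receiver(child, labels, receiver)
    return receiver


# The routing tree from alertmanager.yml, reduced to matchers and receivers.
root = {
    "receiver": "team-X-mails",
    "routes": [
        {"match_re": {"service": "^(foo1|foo2|baz)$"},
         "receiver": "team-X-mails",
         "routes": [{"match": {"severity": "critical"},
                     "receiver": "team-X-pager"}]},
        {"match": {"service": "files"},
         "receiver": "team-Y-mails",
         "routes": [{"match": {"severity": "critical"},
                     "receiver": "team-Y-pager"}]},
        {"match": {"service": "database"},
         "receiver": "team-DB-pager",
         "routes": [{"match": {"owner": "team-X"}, "receiver": "team-X-pager"},
                    {"match": {"owner": "team-Y"}, "receiver": "team-Y-pager"}]},
    ],
}

print(pick_receiver(root, {"service": "foo1", "severity": "critical"}))
print(pick_receiver(root, {"service": "files", "severity": "warning"}))
print(pick_receiver(root, {"service": "database"}))
```

An alert that matches a service route but none of its sub-routes stays with that route's receiver, which is the fall-back behaviour the comments in the configuration describe.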

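The inhibit_rules entry in alertmanager.yml can be read as a small predicate: a warning alert is muted when a critical alert with the same alertname, cluster and service is firing. The Python sketch below expresses that reading under simplifying assumptions (exact-value matchers only, alerts as plain dictionaries of labels); the function name is_inhibited and the sample alerts are hypothetical.

```python
def is_inhibited(target, firing_alerts, rule):
    """Return True if 'target' is muted by some firing alert under 'rule'."""
    # The rule only applies to alerts that satisfy target_match.
    if any(target.get(k) != v for k, v in rule["target_match"].items()):
        return False
    for source in firing_alerts:
        if source is target:
            continue
        # The muting alert must satisfy source_match.
        if any(source.get(k) != v for k, v in rule["source_match"].items()):
            continue
        # Every label listed under 'equal' must have the same value in both.
        if all(source.get(name) == target.get(name) for name in rule["equal"]):
            return True
    return False


# The single rule from the configuration above.
rule = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "cluster", "service"],
}

# Hypothetical sample alerts, reduced to their labels.
critical = {"alertname": "LatencyHigh", "cluster": "A",
            "service": "foo1", "severity": "critical"}
warning = {"alertname": "LatencyHigh", "cluster": "A",
           "service": "foo1", "severity": "warning"}
other_cluster = {"alertname": "LatencyHigh", "cluster": "B",
                 "service": "foo1", "severity": "warning"}

print(is_inhibited(warning, [critical], rule))
print(is_inhibited(other_cluster, [critical], rule))
```

The warning in cluster A is muted because all three labels agree with the critical alert, while the warning in cluster B is still delivered.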