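Prometheus Server: scrapes monitoring data and exposes PromQL for querying and aggregating that data;
Exporters: expose the endpoints from which Prometheus Server collects data; Prometheus polls these Exporters, then collects and stores the data;
AlertManager and other components (not relevant to this article, so they are not covered here).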
scrape_configs:
  - job_name: prometheus
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - localhost:9090
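As long as the Exporter is running, you can set up your monitoring system anywhere (for example, on your local machine).
You can more easily check the health of each instance and locate faults.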
version: '2'
services:
  consul:
    image: consul
    ports:
      - 8400:8400
      - 8500:8500
      - 8600:53/udp
    command: agent -server -client=0.0.0.0 -dev -node=node0 -bootstrap-expect=1 -data-dir=/tmp/consul
    labels:
      SERVICE_IGNORE: 'true'
  registrator:
    image: gliderlabs/registrator
    depends_on:
      - consul
    volumes:
      - /var/run:/tmp:rw
    command: consul://consul:8500
  prometheus:
    image: quay.io/prometheus/prometheus
    ports:
      - 9090:9090
  node_exporter:
    image: quay.io/prometheus/node-exporter
    pid: "host"
    ports:
      - 9100:9100
  cadvisor:
    image: google/cadvisor:latest
    ports:
      - 8080:8080
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /var/lib/docker/:/var/lib/docker:ro
global:
  scrape_interval: 5s
  scrape_timeout: 5s
  evaluation_interval: 15s
scrape_configs:
  - job_name: consul_sd
    metrics_path: /metrics
    scheme: http
    consul_sd_configs:
      - server: consul:8500
        scheme: http
        services:
          - node_exporter
          - cadvisor
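server: the address at which Consul can be reached
services: the names of the services registered in Consul whose instances Prometheus should discover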
services:
  prometheus:
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
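Do we need to aggregate monitoring data by environment (dev, stage, prod)?
As a development team, do I perhaps only care about monitoring data from the dev environment?
Should each team get its own Prometheus Server? If so, how do we make each team's Prometheus Server scrape monitoring data from a different environment?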
node_cpu{cpu="cpu0",instance="172.21.0.3:9100",job="consul_sd",mode="guest"}
node_cpu{cpu="cpu0",instance="172.21.0.3:9100",dc="dc1",job="consul_sd",mode="guest"}
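__meta_consul_address: the address of the target as known to Consul
__meta_consul_dc: the Consul datacenter the service belongs to
__meta_consul_metadata_<key>: the value of each node metadata key
__meta_consul_node: the Consul node on which the service runs
__meta_consul_service_address: the service address
__meta_consul_service_id: the service ID
__meta_consul_service_port: the service port
__meta_consul_service: the service name
__meta_consul_tags: the tags attached to the service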
...
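To make the list above more concrete, here is a minimal Python sketch of how such labels could be derived from one service entry. It is an illustration, not Prometheus's actual implementation: the function name `consul_meta_labels` and the sample entry (including its ID and addresses) are hypothetical, and the entry is only assumed to be shaped like a Consul catalog response. To my understanding, Prometheus joins the tags with a separator (a comma by default) and also wraps the result in that separator, which is why a tag can later be matched as `,development,`.

```python
def consul_meta_labels(entry, tag_separator=","):
    """Build a subset of the __meta_consul_* labels from one service entry.

    `entry` is assumed to look like one element of a Consul catalog
    response; this helper is hypothetical and for illustration only.
    """
    tags = entry.get("ServiceTags") or []
    labels = {
        "__meta_consul_address": entry["Address"],
        "__meta_consul_dc": entry["Datacenter"],
        "__meta_consul_node": entry["Node"],
        "__meta_consul_service_address": entry["ServiceAddress"],
        "__meta_consul_service_id": entry["ServiceID"],
        "__meta_consul_service_port": str(entry["ServicePort"]),
        "__meta_consul_service": entry["ServiceName"],
        # Tags are joined and additionally wrapped in the separator: ",a,b,"
        "__meta_consul_tags": tag_separator + tag_separator.join(tags) + tag_separator,
    }
    # Each node metadata key becomes its own label.
    for key, value in (entry.get("NodeMeta") or {}).items():
        labels["__meta_consul_metadata_" + key] = value
    return labels


# Hypothetical sample entry; all values are made up for the example.
entry = {
    "Node": "node0",
    "Address": "172.21.0.2",
    "Datacenter": "dc1",
    "ServiceID": "hypothetical-id:node_exporter:9100",
    "ServiceName": "node_exporter",
    "ServiceAddress": "172.21.0.3",
    "ServicePort": 9100,
    "ServiceTags": ["development"],
}
labels = consul_meta_labels(entry)
print(labels["__meta_consul_dc"])
print(labels["__meta_consul_tags"])
```

Labels whose names start with `__` are only available during relabeling, so anything that should end up on the stored series has to be copied into a regular label.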
scrape_configs:
  - job_name: consul_sd
    relabel_configs:
      - source_labels: ["__meta_consul_dc"]
        regex: "(.*)"
        replacement: $1
        action: replace
        target_label: "dc"
...
target_label: "dc"
node_cpu{cpu="cpu0",dc="dc1",instance="172.21.0.6:9100",job="consul_sd",mode="guest"} 0
node_cpu{cpu="cpu0",dc="dc1",instance="172.21.0.6:9100",job="consul_sd",mode="guest_nice"} 0
node_cpu{cpu="cpu0",dc="dc1",instance="172.21.0.6:9100",job="consul_sd",mode="idle"} 91933.77
node_cpu{cpu="cpu0",dc="dc1",instance="172.21.0.6:9100",job="consul_sd",mode="iowait"} 56.8
node_cpu{cpu="cpu0",dc="dc1",instance="172.21.0.6:9100",job="consul_sd",mode="irq"} 0
node_cpu{cpu="cpu0",dc="dc1",instance="172.21.0.6:9100",job="consul_sd",mode="nice"} 0
node_cpu{cpu="cpu0",dc="dc1",instance="172.21.0.6:9100",job="consul_sd",mode="softirq"} 19.02
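The `dc="dc1"` label in the samples above comes from the `replace` rule. A rough Python sketch of that step follows, under a few assumptions: the helper name `relabel_replace` is hypothetical, only the `$N` form of capture-group references is handled, and the regex is treated as fully anchored (as Prometheus does for relabeling). It is a simplification, not the real implementation.

```python
import re


def relabel_replace(labels, source_labels, regex, replacement, target_label, separator=";"):
    """Simplified model of a relabel rule with `action: replace`."""
    # Join the values of all source labels with the separator.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # The regex must match the whole joined value.
    match = re.fullmatch(regex, value)
    if match is None:
        # No match: the label set is left unchanged.
        return dict(labels)
    # Translate "$1" style references into Python's "\g<1>" syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


before = {"__meta_consul_dc": "dc1", "job": "consul_sd", "instance": "172.21.0.6:9100"}
after = relabel_replace(before, ["__meta_consul_dc"], "(.*)", "$1", "dc")
print(after["dc"])
```

Here the regex `(.*)` captures the whole datacenter name, so the rule simply copies `__meta_consul_dc` into the persistent `dc` label.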
relabel_configs:
  - source_labels: ["__meta_consul_tags"]
    regex: ".*,development,.*"
    action: keep
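The effect of the `keep` rule above can be sketched in Python as a filter over discovered targets. This is a simplified model, not Prometheus's code: `relabel_keep` is a hypothetical helper, the two targets are made-up examples, and the tag values assume that tags are joined with commas and wrapped in a leading and trailing comma.

```python
import re


def relabel_keep(targets, source_labels, regex, separator=";"):
    """Simplified model of `action: keep`: drop targets whose joined
    source label values do not fully match the regex."""
    kept = []
    for labels in targets:
        value = separator.join(labels.get(name, "") for name in source_labels)
        if re.fullmatch(regex, value):
            kept.append(labels)
    return kept


# Made-up targets for the example.
targets = [
    {"__meta_consul_service": "node_exporter", "__meta_consul_tags": ",development,"},
    {"__meta_consul_service": "cadvisor", "__meta_consul_tags": ",production,scraped,"},
]
kept = relabel_keep(targets, ["__meta_consul_tags"], ".*,development,.*")
print([t["__meta_consul_service"] for t in kept])
```

Because the match is anchored to the whole value, the surrounding `.*` parts of the regex are needed; the commas on both sides of `development` keep the rule from matching a tag that merely contains that word.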
version: '2'
services:
  consul:
    image: consul
    ports:
      - 8400:8400
      - 8500:8500
      - 8600:53/udp
    command: agent -server -client=0.0.0.0 -dev -node=node0 -bootstrap-expect=1 -data-dir=/tmp/consul
    labels:
      SERVICE_IGNORE: 'true'
  registrator:
    image: gliderlabs/registrator
    depends_on:
      - consul
    volumes:
      - /var/run:/tmp:rw
    command: consul://consul:8500
  prometheus:
    image: quay.io/prometheus/prometheus
    ports:
      - 9090:9090
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
  node_exporter:
    image: quay.io/prometheus/node-exporter
    pid: "host"
    ports:
      - 9100:9100
    labels:
      SERVICE_TAGS: "development" # tags this service registers with Consul: development
  cadvisor:
    image: google/cadvisor:latest
    ports:
      - 8080:8080
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /var/lib/docker/:/var/lib/docker:ro
    labels:
      SERVICE_TAGS: "production,scraped" # tags this service registers with Consul: production, scraped
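On cloud and container platforms, Prometheus's service discovery (SD) can dynamically discover the target instances to monitor.
Relabeling can dynamically modify a metric's labels before the metric data is written.
Relabeling can also filter and select target instances.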