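sum() returns the sum of the values of the enclosed instant vector
count() returns the number of series (elements) in the enclosed instant vector
increase() returns how much a counter increased over the given time range
rate() returns the per-second average rate of increase of a counter over the given time range
irate() returns the instantaneous per-second rate of increase, computed from the last two samples in the given time range
count_scalar() returns the number of elements in a time series vector as a scalar (removed in Prometheus 2.0)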
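The difference between increase(), rate() and irate() is easier to see on concrete numbers. The following is a minimal Python sketch, not Prometheus's actual implementation: it ignores counter resets and the extrapolation Prometheus applies at the window boundaries, and the function names and sample data are made up for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)

def simple_increase(samples: List[Sample]) -> float:
    """Last value minus first value in the window (no reset handling)."""
    return samples[-1][1] - samples[0][1]

def simple_rate(samples: List[Sample]) -> float:
    """Average per-second growth across the whole window."""
    elapsed = samples[-1][0] - samples[0][0]
    return simple_increase(samples) / elapsed

def simple_irate(samples: List[Sample]) -> float:
    """Per-second growth between the last two samples only."""
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    return (v_last - v_prev) / (t_last - t_prev)

# Hypothetical counter scraped every 15 s: flat at first, then a burst.
window = [(0.0, 100.0), (15.0, 100.0), (30.0, 100.0), (45.0, 130.0), (60.0, 220.0)]

print(simple_increase(window))  # 120.0
print(simple_rate(window))      # 2.0
print(simple_irate(window))     # 6.0
```

Because irate() only looks at the last two samples, it reacts quickly to the burst, while rate() smooths it over the whole window.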
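In Prometheus terms, an endpoint that it scrapes (pulls) samples from is called an instance
When Prometheus scrapes a target, it automatically attaches two labels to the scraped time series:
job: the configured job name that the target belongs to
instance: the host:port part of the target's URL that was scraped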
up{instance="[instance-id]",job="[job-name]"}
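up is 1 when the scrape succeeded and the instance is healthy; it is 0 when the scrape failed, for example because the network is unreachable or the service is down
scrape_duration_seconds{instance="instance-id",job="job-name"}
Duration of the scrape of this target, in seconds
scrape_samples_post_metric_relabeling{instance="instance-id",job="job-name"}
Number of samples remaining after metric relabeling was applied
scrape_samples_scraped{instance="instance-id",job="job-name"}
Number of samples the target exposed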
s - seconds
m - minutes
h - hours
d - days
w - weeks
y - years
Select the time series with metric name http_requests_total and label job="prometheus" over the past 5 minutes:
http_requests_total{job="prometheus"}[5m]
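To make the unit suffixes concrete, here is a small Python sketch that converts a single-unit duration such as 5m into seconds. The helper name is hypothetical, and it deliberately handles only the single-unit form listed above. Prometheus treats a day as 24h, a week as 7d and a year as 365d.

```python
import re

# Seconds per unit: a day is 24h, a week is 7d, a year is 365d.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

def duration_to_seconds(text: str) -> int:
    """Convert a single-unit duration such as '5m' into seconds."""
    match = re.fullmatch(r"(\d+)([smhdwy])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]

print(duration_to_seconds("5m"))  # 300
print(duration_to_seconds("1w"))  # 604800
```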
offset
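The offset modifier shifts the evaluation time of an individual instant vector or range vector selector in a query
Return the sum of the idle-mode node_cpu_seconds_total series as they were 5 minutes before the current evaluation time: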
sum(node_cpu_seconds_total{mode="idle"} offset 5m)
Thresholds for hardware metrics
Warning: 90 Alert: 98
Warning: 90 Alert: 95
Warning: 90 Alert: 98
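groups:
- name: # name of the alerting rule group
  rules:
  - alert: # alert name, e.g. a job status check: if metrics cannot be scraped for 1 minute, the alert is sent to Alertmanager
    expr: # PromQL expression
    for: 1m # duration: the condition must hold for one minute before the alert fires
    labels:
      severity: page # custom label
    annotations:
      summary: # custom summary
      description: # custom detailed description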
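The effect of the for field can be sketched as a small state machine: an alert is pending while its condition holds but has not yet held for the full duration, and firing once it has. The Python below is only an illustration under simplified assumptions (fixed evaluation interval, one alert instance); the function name and the sample run are hypothetical.

```python
from typing import List, Optional

def alert_states(condition: List[bool], interval_s: int, for_s: int) -> List[str]:
    """Return the alert state after each rule evaluation.

    condition[i] says whether expr returned a result at evaluation i.
    """
    states: List[str] = []
    active_since: Optional[int] = None  # time the condition first became true
    for i, is_true in enumerate(condition):
        now = i * interval_s
        if not is_true:
            # Any evaluation where the condition is false resets the timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states

# Hypothetical run: evaluations every 15 s with for: 1m.
print(alert_states([True, True, False, True, True, True, True, True], 15, 60))
```

In this run the single false evaluation resets the timer, so the alert only fires at the last evaluation, 60 seconds after the condition became true again.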
label_values(node_exporter_build_info,instance)
Regular expression that strips the port:
/([^:]+):.*/
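To check what this pattern captures, the same regular expression can be tried in Python. This is a sketch only: the helper name and the sample instance values are made up, and returning the input unchanged when there is no colon is a choice made for this example.

```python
import re

# Same pattern as above, without the surrounding slashes used as delimiters.
STRIP_PORT = re.compile(r"([^:]+):.*")

def host_of(instance: str) -> str:
    """Return the text before the first colon, or the input if there is no colon."""
    match = STRIP_PORT.match(instance)
    return match.group(1) if match else instance

print(host_of("192.168.1.10:9100"))  # 192.168.1.10
print(host_of("node-a:9100"))        # node-a
```

With the port stripped from the variable, the queries append it again explicitly, as in instance=~"$node:9100".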
(1 - (node_memory_MemAvailable_bytes{instance=~"$node:9100"} / (node_memory_MemTotal_bytes{instance=~"$node:9100"})))* 100
count(count(node_cpu_seconds_total{instance=~"$node:9100", mode='system'}) by (cpu))
100 - (avg(irate(node_cpu_seconds_total{instance=~"$node:9100",mode="idle"}[5m])) * 100)
node_load1{instance=~"$node:9100"}/ count by(job, instance)(count by(job, instance, cpu)(node_cpu_seconds_total{instance=~"$node:9100"}))
node_load1 is the 1-minute system load average; likewise node_load5 is the 5-minute load average and node_load15 is the 15-minute load average
node_memory_MemTotal_bytes{instance=~"$node:9100"}
time() - node_boot_time_seconds{instance=~"$node:9100"}
Total memory
node_memory_MemTotal_bytes{instance=~"$node:9100"}
Used memory (total - free - cached - buffers - slab)
node_memory_MemTotal_bytes{instance=~"$node:9100"} - node_memory_MemFree_bytes{instance=~"$node:9100"} - node_memory_Cached_bytes{instance=~"$node:9100"} - node_memory_Buffers_bytes{instance=~"$node:9100"} - node_memory_Slab_bytes{instance=~"$node:9100"}
Available memory
node_memory_MemAvailable_bytes{instance=~"$node:9100"}
Buffers
node_memory_Buffers_bytes{instance=~"$node:9100"}
Cached (page cache plus slab)
node_memory_Cached_bytes{instance=~"$node:9100"} + node_memory_Slab_bytes{instance=~"$node:9100"}
Free memory
node_memory_MemFree_bytes{instance=~"$node:9100"}
Upload rate (bits per second):
irate(node_network_transmit_bytes_total{instance=~'$node:9100',device!~'tap.*|veth.*|br.*|docker.*|virbr.*|lo*'}[5m])*8
Download rate (bits per second):
irate(node_network_receive_bytes_total{instance=~'$node:9100',device!~'tap.*|veth.*|br.*|docker.*|virbr.*|lo*'}[5m])*8
Swap In:
node_vmstat_pswpin{instance=~"$node:9100"}
Swap out:
node_vmstat_pswpout{instance=~"$node:9100"}
node_filefd_allocated{instance=~"$node:9100"}
100 - ((node_filesystem_avail_bytes{instance=~"$node:9100",mountpoint="/",fstype=~"ext4|xfs"} * 100) / node_filesystem_size_bytes {instance=~"$node:9100",mountpoint="/",fstype=~"ext4|xfs"})
100 - ((node_filesystem_avail_bytes{instance=~"$node:9100",mountpoint="$maxmount",fstype=~"ext4|xfs"} * 100) / node_filesystem_size_bytes {instance=~"$node:9100",mountpoint="$maxmount",fstype=~"ext4|xfs"})
1 minute
node_load1{instance=~"$node:9100"}
5 minutes
node_load5{instance=~"$node:9100"}
15 minutes
node_load15{instance=~"$node:9100"}
node_filesystem_size_bytes {instance=~"$node:9100",fstype=~"ext4|xfs"}
node_filesystem_avail_bytes {instance=~'$node:9100',fstype=~"ext4|xfs"}
node_filesystem_size_bytes{instance=~'$node:9100',fstype=~"ext4|xfs"}
1-(node_filesystem_free_bytes{instance=~'$node:9100',fstype=~"ext4|xfs"} / node_filesystem_size_bytes{instance=~'$node:9100',fstype=~"ext4|xfs"})
avg(irate(node_cpu_seconds_total{instance=~"$node:9100",mode="system"}[5m])) by (instance)
avg(irate(node_cpu_seconds_total{instance=~"$node:9100",mode="user"}[5m])) by (instance)
avg(irate(node_cpu_seconds_total{instance=~"$node:9100",mode="idle"}[5m])) by (instance)
avg(irate(node_cpu_seconds_total{instance=~"$node:9100",mode="iowait"}[5m])) by (instance)
irate(node_disk_io_time_seconds_total{instance=~"$node:9100"}[5m])