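Prometheus exposes an HTTP API under /api/v1 that users can build custom tooling on. This note walks through the endpoints that our monitoring platform is likely to use.
Responses are returned as JSON.
A successful API request returns a 2xx status code.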
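A failed request returns one of the following status codes, depending on the cause:
- 400 Bad Request: parameters are missing or incorrect.
- 422 Unprocessable Entity: the expression cannot be executed.
- 503 Service Unavailable: the query timed out or was aborted.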
The JSON response envelope looks like this:
{
"status": "success" | "error",
"data": <data>,
// Only set if status is "error". The data field may still hold
// additional data.
"errorType": "<string>",
"error": "<string>",
// Only if there were warnings while executing the request.
// There will still be data in the data field.
"warnings": ["<string>"]
}
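To make the envelope concrete, here is a minimal Python sketch that unwraps a response body and turns an "error" status into an exception. The names unwrap and PrometheusAPIError are hypothetical helpers invented for this note, not part of any Prometheus client library.

```python
import json


class PrometheusAPIError(Exception):
    """Hypothetical exception for envelopes whose status is "error"."""


def unwrap(body: str):
    """Parse a response body and return (data, warnings).

    Raises PrometheusAPIError when the envelope status is not "success".
    """
    envelope = json.loads(body)
    if envelope.get("status") != "success":
        error_type = envelope.get("errorType", "unknown")
        message = envelope.get("error", "")
        raise PrometheusAPIError(f"{error_type}: {message}")
    # "warnings" is only present when the server produced some.
    return envelope.get("data"), envelope.get("warnings", [])


if __name__ == "__main__":
    sample = '{"status": "success", "data": {"resultType": "vector", "result": []}}'
    data, warnings = unwrap(sample)
    print(data["resultType"], warnings)
```

Keeping the envelope handling in one place means the endpoint-specific code only has to deal with the data section.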
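Timestamps in responses are Unix timestamps in seconds, so when a request carries a timestamp, a Unix timestamp in seconds is recommended as well. RFC 3339 timestamps are also accepted, as in the query_range example in this note.
Names of query parameters that may be repeated end with [].
The <duration> placeholder refers to a Prometheus duration string of the form [0-9]+[smhdwy]. For example, 5m is a duration of 5 minutes.
The <bool> placeholder refers to a boolean value (the strings true and false).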
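As an illustration of the [0-9]+[smhdwy] duration format, the following Python sketch converts such a string to seconds. The function name duration_to_seconds is hypothetical, and the unit table assumes a week of 7 days and a year of 365 days.

```python
import re

# Seconds per unit. Assumption: w = 7 days, y = 365 days.
_UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 24 * 60 * 60,
    "w": 7 * 24 * 60 * 60,
    "y": 365 * 24 * 60 * 60,
}

_DURATION_RE = re.compile(r"([0-9]+)([smhdwy])")


def duration_to_seconds(text: str) -> int:
    """Convert a single-unit duration string such as "5m" to seconds."""
    match = _DURATION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid duration string: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    print(duration_to_seconds("5m"))
```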
Users can evaluate PromQL expressions through the API, either at a single instant or over a time range.
URL for instant queries:
GET /api/v1/query
POST /api/v1/query
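URL query parameters:
- query=<string>: the Prometheus expression query string.
- time=<rfc3339 | unix_timestamp>: the evaluation timestamp. Optional.
- timeout=<duration>: the evaluation timeout. Optional.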
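Notes:
If the time parameter is omitted, the current server time is used.
The parameters can also be URL-encoded directly in the request body by using the POST method with the Content-Type: application/x-www-form-urlencoded header. This is useful for long queries that might exceed URL length limits.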
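The POST variant can be sketched in Python with the standard library alone. The helper name build_query_request and the base URL http://localhost:9090 are assumptions made for illustration; the sketch only builds the request object and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(base_url, query, time=None):
    """Build a form-encoded POST request for /api/v1/query (hypothetical helper)."""
    params = {"query": query}
    if time is not None:
        # Omitting "time" makes the server use its current time.
        params["time"] = str(time)
    return Request(
        url=f"{base_url}/api/v1/query",
        data=urlencode(params).encode("ascii"),  # URL-encoded request body
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_query_request(
        "http://localhost:9090", "sum(time() - node_boot_time_seconds)"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

To actually send the request, pass it to urllib.request.urlopen and decode the body as JSON.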
The data section of the response has the following format:
{
"resultType": "matrix" | "vector" | "scalar" | "string",
"result": <value>
}
Example 1: query a metric that Prometheus provides out of the box
$ curl 'http://localhost:9090/api/v1/query?query=up'
{
"status" : "success",
"data" : {
"resultType" : "vector",
"result" : [
{
"metric" : {
"__name__" : "up",
"job" : "prometheus",
"instance" : "localhost:9090"
},
"value": [ 1435781451.781, "1" ]
},
{
"metric" : {
"__name__" : "up",
"job" : "node",
"instance" : "localhost:9100"
},
"value" : [ 1435781451.781, "0" ]
}
]
}
}
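A vector result like the one above can be flattened into rows with a few lines of Python. This is a minimal sketch: vector_to_rows is a hypothetical helper, and it assumes the data section has already been extracted from the envelope. Sample values arrive as strings, so they are converted with float().

```python
def vector_to_rows(data):
    """Turn the data section of an instant query into (labels, timestamp, value) rows."""
    if data.get("resultType") != "vector":
        raise ValueError(f"expected a vector, got {data.get('resultType')!r}")
    rows = []
    for sample in data["result"]:
        timestamp, raw_value = sample["value"]
        # Values are JSON strings such as "1" or "0"; convert them to floats.
        rows.append((sample["metric"], float(timestamp), float(raw_value)))
    return rows


if __name__ == "__main__":
    data = {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "prometheus"},
             "value": [1435781451.781, "1"]},
            {"metric": {"__name__": "up", "job": "node"},
             "value": [1435781451.781, "0"]},
        ],
    }
    for labels, ts, value in vector_to_rows(data):
        print(labels["job"], ts, value)
```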
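Example 2: query with a PromQL expression (the expression contains spaces and parentheses, so curl is asked to URL-encode it)
$ curl -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=sum(time() - node_boot_time_seconds)'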
{
"status": "success",
"data": {
"resultType": "vector",
"result": [ {
"metric": {},
"value": [
1.581498526305E9,
"6767484.305000067"
]
}]
}
}
URL for range queries:
GET /api/v1/query_range
POST /api/v1/query_range
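URL query parameters:
- query=<string>: the Prometheus expression query string.
- start=<rfc3339 | unix_timestamp>: the start timestamp.
- end=<rfc3339 | unix_timestamp>: the end timestamp.
- step=<duration | float>: the query resolution step width, as a duration or a float number of seconds.
- timeout=<duration>: the evaluation timeout. Optional.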
The data section of the response has the following format:
{
"resultType": "matrix",
"result": <value>
}
Example 1:
$ curl 'http://localhost:9090/api/v1/query_range?query=up&start=2015-07-01T20:10:30.781Z&end=2015-07-01T20:11:00.781Z&step=15s'
{
"status" : "success",
"data" : {
"resultType" : "matrix",
"result" : [
{
"metric" : {
"__name__" : "up",
"job" : "prometheus",
"instance" : "localhost:9090"
},
"values" : [
[ 1435781430.781, "1" ],
[ 1435781445.781, "1" ],
[ 1435781460.781, "1" ]
]
},
{
"metric" : {
"__name__" : "up",
"job" : "node",
"instance" : "localhost:9091"
},
"values" : [
[ 1435781430.781, "0" ],
[ 1435781445.781, "0" ],
[ 1435781460.781, "1" ]
]
}
]
}
}
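A matrix result holds one list of [timestamp, value] pairs per series. The sketch below, with the hypothetical helper matrix_to_series, groups the points by label set and converts the Unix timestamps to timezone-aware UTC datetimes. It assumes the data section has already been extracted from the envelope.

```python
from datetime import datetime, timezone


def matrix_to_series(data):
    """Map each label set to a list of (datetime, float) points."""
    if data.get("resultType") != "matrix":
        raise ValueError(f"expected a matrix, got {data.get('resultType')!r}")
    series = {}
    for entry in data["result"]:
        # A sorted tuple of label pairs is hashable and order-independent.
        key = tuple(sorted(entry["metric"].items()))
        series[key] = [
            (datetime.fromtimestamp(ts, tz=timezone.utc), float(value))
            for ts, value in entry["values"]
        ]
    return series


if __name__ == "__main__":
    data = {
        "resultType": "matrix",
        "result": [
            {"metric": {"__name__": "up", "job": "prometheus"},
             "values": [[1435781430, "1"], [1435781445, "1"]]},
        ],
    }
    for labels, points in matrix_to_series(data).items():
        print(dict(labels), [(t.isoformat(), v) for t, v in points])
```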
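The following endpoints query the targets, rules, and similar objects that Prometheus manages; they are mainly used for global configuration.
The targets endpoint returns an overview of the current state of the targets monitored by Prometheus.
URL: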
GET /api/v1/targets
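By default both active targets and dropped targets are returned.
In the response, discoveredLabels holds the unmodified labels retrieved during service discovery, before relabeling, and labels holds the label set after relabeling.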
$ curl http://localhost:9090/api/v1/targets
{
"status": "success",
"data": {
"activeTargets": [
{
"discoveredLabels": {
"__address__": "127.0.0.1:9090",
"__metrics_path__": "/metrics",
"__scheme__": "http",
"job": "prometheus"
},
"labels": {
"instance": "127.0.0.1:9090",
"job": "prometheus"
},
"scrapePool": "prometheus",
"scrapeUrl": "http://127.0.0.1:9090/metrics",
"lastError": "",
"lastScrape": "2017-01-17T15:07:44.723715405+01:00",
"lastScrapeDuration": 0.050688943,
"health": "up"
}
],
"droppedTargets": [
{
"discoveredLabels": {
"__address__": "127.0.0.1:9100",
"__metrics_path__": "/metrics",
"__scheme__": "http",
"job": "node"
}
}
]
}
}
As shown, Prometheus returns the monitored targets as JSON, so we can process this JSON to count how many hosts are currently in each state.
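The counting described here could look like the following Python sketch. count_targets_by_health is a hypothetical helper; it assumes the data section of a /api/v1/targets response and reports dropped targets under a separate "dropped" key, which is a choice made for this example rather than something the API returns.

```python
from collections import Counter


def count_targets_by_health(data):
    """Count active targets per health value, plus the number of dropped targets."""
    counts = Counter(
        target.get("health", "unknown")
        for target in data.get("activeTargets", [])
    )
    # Dropped targets carry no health field, so they are counted separately.
    counts["dropped"] = len(data.get("droppedTargets", []))
    return dict(counts)


if __name__ == "__main__":
    data = {
        "activeTargets": [
            {"labels": {"instance": "127.0.0.1:9090"}, "health": "up"},
            {"labels": {"instance": "127.0.0.1:9100"}, "health": "down"},
        ],
        "droppedTargets": [{"discoveredLabels": {"job": "node"}}],
    }
    print(count_targets_by_health(data))
```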
Prometheus also provides a state query parameter for filtering targets.
The accepted values are state=active, state=dropped, and state=any.
For example:
$ curl 'http://localhost:9090/api/v1/targets?state=active'
{
"status": "success",
"data": {
"activeTargets": [
{
"discoveredLabels": {
"__address__": "127.0.0.1:9090",
"__metrics_path__": "/metrics",
"__scheme__": "http",
"job": "prometheus"
},
"labels": {
"instance": "127.0.0.1:9090",
"job": "prometheus"
},
"scrapePool": "prometheus",
"scrapeUrl": "http://127.0.0.1:9090/metrics",
"lastError": "",
"lastScrape": "2017-01-17T15:07:44.723715405+01:00",
"lastScrapeDuration": 0.050688943,
"health": "up"
}
],
"droppedTargets": []
}
}
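The rules endpoint returns the list of alerting and recording rules that are currently loaded. It also returns the currently active alert instances of each alerting rule.
URL: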
GET /api/v1/rules
URL query parameters:
- type=alert|record: return only the alerting rules (type=alert) or only the recording rules (type=record). If the parameter is absent or empty, no filtering is done.
$ curl http://localhost:9090/api/v1/rules
{
"data": {
"groups": [
{
"rules": [
{
"alerts": [
{
"activeAt": "2018-07-04T20:27:12.60602144+02:00",
"annotations": {
"summary": "High request latency"
},
"labels": {
"alertname": "HighRequestLatency",
"severity": "page"
},
"state": "firing",
"value": "1e+00"
}
],
"annotations": {
"summary": "High request latency"
},
"duration": 600,
"health": "ok",
"labels": {
"severity": "page"
},
"name": "HighRequestLatency",
"query": "job:request_latency_seconds:mean5m{job=\"myjob\"} > 0.5",
"type": "alerting"
},
{
"health": "ok",
"name": "job:http_inprogress_requests:sum",
"query": "sum(http_inprogress_requests) by (job)",
"type": "recording"
}
],
"file": "/rules.yaml",
"interval": 60,
"name": "example"
}
]
},
"status": "success"
}
As shown, this endpoint also returns alert information, so it can be used to fetch the current alerts as well.
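As a sketch of that idea, the Python function below walks the groups and rules of a /api/v1/rules response and collects the alerts in a given state. The helper name active_alerts and the shape of the returned dictionaries are assumptions made for illustration.

```python
def active_alerts(data, state="firing"):
    """Collect alerts in the given state from the data section of /api/v1/rules."""
    found = []
    for group in data.get("groups", []):
        for rule in group.get("rules", []):
            # Recording rules have no "alerts" field, so skip them.
            if rule.get("type") != "alerting":
                continue
            for alert in rule.get("alerts", []):
                if alert.get("state") == state:
                    found.append({
                        "group": group.get("name"),
                        "rule": rule.get("name"),
                        "labels": alert.get("labels", {}),
                        "activeAt": alert.get("activeAt"),
                    })
    return found


if __name__ == "__main__":
    data = {"groups": [{"name": "example", "rules": [
        {"type": "alerting", "name": "HighRequestLatency", "alerts": [
            {"state": "firing", "labels": {"severity": "page"},
             "activeAt": "2018-07-04T20:27:12.60602144+02:00"}]},
        {"type": "recording", "name": "job:http_inprogress_requests:sum"},
    ]}]}
    for alert in active_alerts(data):
        print(alert["group"], alert["rule"], alert["labels"])
```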
The /alerts endpoint returns a list of all active alerts.
URL:
GET /api/v1/alerts
$ curl http://localhost:9090/api/v1/alerts
{
"data": {
"alerts": [
{
"activeAt": "2018-07-04T20:27:12.60602144+02:00",
"annotations": {},
"labels": {
"alertname": "my-alert"
},
"state": "firing",
"value": "1e+00"
}
]
},
"status": "success"
}
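This endpoint returns alert information. Note that the labels field in the response reflects the label values configured in the rules; if alerts need to be told apart by the labels of the underlying metrics, that information has to be obtained by other means.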
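For the labels that are present on each alert, grouping is straightforward. The following minimal sketch groups the alerts from a /api/v1/alerts response by one label; group_alerts_by_label is a hypothetical helper, and alerts that lack the label are collected under an empty string.

```python
from collections import defaultdict


def group_alerts_by_label(data, label):
    """Group the alerts in the data section of /api/v1/alerts by one label value."""
    groups = defaultdict(list)
    for alert in data.get("alerts", []):
        # Alerts without the label are grouped under "".
        groups[alert.get("labels", {}).get(label, "")].append(alert)
    return dict(groups)


if __name__ == "__main__":
    data = {"alerts": [
        {"labels": {"alertname": "my-alert"}, "state": "firing", "value": "1e+00"},
    ]}
    for name, alerts in group_alerts_by_label(data, "alertname").items():
        print(name, len(alerts))
```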