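Monitoring
1. Business monitoring (a high-level view for management):
Requested by: executives, operations
Built by: the big-data team. Everything reads from the business database; the big-data store syncs from a replica database into wide tables.
QPS, DAU (daily active users), response status (HTTP code), business endpoints (login, registration, chat, upload, comments, search, complaints), product conversion rate, top-up amount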
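2. System monitoring
Requested by: ops
Built by: ops
Operating system: CPU usage, memory usage, disk usage, disk space (very common), TCP (tens of thousands of connections), traffic
Components: MySQL, Redis, Kafka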
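3. Log monitoring
Requested by: ops, developers
Built by: developers
Two kinds of logs: business logs (big-data logs, ordinary application logs) and system logs (operating-system logs, MySQL logs, Kafka logs)
Logs are the heavyweight part of monitoring. Teams usually build a dedicated log management system for them, such as the ELK stack or Loki.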
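4. Network monitoring:
Requested by: data-center management
Built by: server management
IDC: switches, routers, firewalls, load balancers, servers, racks, power supplies, UPS, air conditioning, network devices, machine-room environment monitoring
Network: internal network monitoring (physical intranet, virtual intranet (VPN))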
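5. Application monitoring:
Requested by: developers
Built by: developers
For example, the application returns a 500 with ErrUserNotFound.
This usually takes cooperation between ops and developers: developers expose a monitoring endpoint in the program, and ops pull the monitoring data through that endpoint.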
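Prometheus data format: metrics
"Metrics" is the general term for sampled data.
Documentation (Chinese translation): https://prometheus.fuckcloudnative.io/di-yi-zhang-jie-shao/overview
Prometheus collects data from target systems through exporters and stores it in the Prometheus server. Short-lived jobs or services can push their data to a Pushgateway, which the Prometheus server then scrapes. Alertmanager handles the alerts that Prometheus fires and sends them to the configured notification systems. The Prometheus web UI is used to view the collected data.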
Grafana with Prometheus
Grafana can be combined with Prometheus to visualize its data. It can build many kinds of charts from Prometheus data, such as line charts, bar charts, and pie charts.
Combining Prometheus and Grafana has the following advantages:
docker pull prom/node-exporter
docker pull prom/prometheus
docker pull grafana/grafana
docker run -d -p 9100:9100 -v "/proc:/host/proc:ro" -v "/sys:/host/sys:ro" -v "/:/rootfs:ro" prom/node-exporter
Then open:
http://127.0.0.1:9100/metrics
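The page at that URL is line-oriented text, so it is easy to inspect programmatically. The following is a rough sketch of reading it in Go, assuming the simple case of "series value" lines. It skips comment lines and does not handle the optional trailing timestamp, and the sample text is hand-written rather than captured from a real node-exporter.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSamples maps each series (metric name plus labels) to its value.
// Simplification: "# HELP"/"# TYPE" lines are skipped, and lines that carry
// the optional trailing timestamp are not supported.
func parseSamples(text string) map[string]float64 {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The value is whatever follows the last space on the line.
		i := strings.LastIndex(line, " ")
		if i < 0 {
			continue
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			continue
		}
		out[strings.TrimSpace(line[:i])] = v
	}
	return out
}

func main() {
	// Hand-written sample in the exposition format.
	text := `# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_filesystem_avail_bytes{mountpoint="/"} 1.5e+10
`
	for series, v := range parseSamples(text) {
		fmt.Println(series, v)
	}
}
```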
Create /opt/prometheus/prometheus.yml
with the following content:
global:
  scrape_interval: 60s
  evaluation_interval: 60s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: prometheus
  - job_name: linux
    static_configs:
      - targets: ['<your-ip>:9100']
        labels:
          instance: localhost
Start it:
docker run -d \
-p 9090:9090 \
-v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus
Then open:
127.0.0.1:9090/graph
You don't need to spend much effort learning PromQL up front; look things up as the need arises.
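Prometheus Query Language (PromQL) is the language for querying and analyzing the metrics that Prometheus collects. Its basic syntax and some common operators:
Time:
  time(): returns the evaluation timestamp in seconds since the Unix epoch.
  timestamp(<vector>): returns the timestamp of each sample in the vector.
  offset <duration>: shifts the time a selector looks at into the past.
Selecting metric data:
  <metric_name>: selects a metric by name.
  {<label>=<value>}: selects series by label.
  up{job="api"}: selects the series whose job label equals api.
Label matching operators:
  =: equal.
  !=: not equal.
  =~: matches a regular expression.
  !~: does not match a regular expression.
Aggregation:
  sum(): sums the values.
  avg(): averages the values.
  min(): takes the minimum value.
  max(): takes the maximum value.
  count(): counts the number of series.
  rate(<metric>[<duration>]): the per-second average rate of increase of a counter over the window, for example a request rate.
  increase(<metric>[<duration>]): the total increase of a counter over the window.
Time windows:
  [<duration>]: the time range a selector covers.
  offset <duration>: shifts that time range into the past.
Grouping and ranking:
  by (<labels>): groups the aggregation result by the listed labels.
  topk(<k>, <vector>): keeps the k largest results.
  quantile(<φ>, <vector>): computes the φ-quantile (0 ≤ φ ≤ 1) across series.
Logical/set operators:
  and: intersection.
  or: union.
  unless: complement (series on the left that have no match on the right).
Functions: PromQL has many functions for transforming metric data, such as abs(), floor(), ceil(), and round().
Parentheses: use parentheses to control operator precedence.
Some example PromQL queries:
  up{job="api"}: the up metric for series whose job label equals api.
  sum(rate(http_requests_total{job="web"}[5m])): the summed per-second rate of http_requests_total for job web over the last 5 minutes.
  sum by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) / sum by (instance) (rate(node_cpu_seconds_total[5m])): the fraction of CPU time each instance spent idle over the last 5 minutes.
PromQL has a rich feature set that supports complex queries and analysis. For more depth, see the official Prometheus documentation.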
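To build intuition for rate() and increase(), here is a simplified model in Go. It assumes that a drop in a counter means the process restarted and the counter began again from 0. Real Prometheus additionally extrapolates to the window boundaries, which this sketch leaves out, so treat the numbers as approximations. The type sample and both function names are hypothetical.

```go
package main

import "fmt"

// sample is one scraped counter value; t is the scrape time in seconds.
type sample struct {
	t float64
	v float64
}

// increaseOf sums the growth between consecutive samples.
// A drop is treated as a counter reset: the counter restarted from 0,
// so the new value itself is the growth since the reset.
func increaseOf(s []sample) float64 {
	var total float64
	for i := 1; i < len(s); i++ {
		if s[i].v >= s[i-1].v {
			total += s[i].v - s[i-1].v
		} else {
			total += s[i].v
		}
	}
	return total
}

// simpleRate is the increase divided by the time covered by the samples.
func simpleRate(s []sample) float64 {
	if len(s) < 2 {
		return 0
	}
	span := s[len(s)-1].t - s[0].t
	if span <= 0 {
		return 0
	}
	return increaseOf(s) / span
}

func main() {
	steady := []sample{{0, 100}, {60, 160}, {120, 220}}
	restarted := []sample{{0, 100}, {60, 160}, {120, 20}} // process restarted before the last scrape
	fmt.Println("steady rate:", simpleRate(steady))
	fmt.Println("rate across a restart:", simpleRate(restarted))
}
```

This is why rate() is applied to counters rather than gauges: the reset handling only makes sense for a value that is supposed to grow monotonically.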
mkdir /opt/grafana-storage
chmod 777 -R /opt/grafana-storage
docker run -d -p 3000:3000 --name=grafana -v /opt/grafana-storage/:/var/lib/grafana grafana/grafana
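Open:
127.0.0.1:3000
Default username/password: admin/admin
Then add a data source:
Enter your own IP (the Prometheus address) to finish:
Nothing is displayed yet; what to show is something you configure yourself.
The key concepts to understand are panel
and row;
that is enough.
A panel is a single chart on a dashboard.
A row is a group of panels.
Run a query and the data shows up.
After clicking Apply you can Save.
Once saved, you can go straight in and see the metrics you created, that is, a row.
Official templates: grafana.com/grafana/dashboards/?search=kafka
For example, find a Redis template and download it:
After downloading the JSON, import it into Grafana:
You can find and import other templates too, such as Jaeger or Redis.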
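Gauge: the simplest metric. It is just a single returned value, a snapshot of the current state, for example the number of items in a queue right now.
Examples: current memory usage, CPU usage, disk usage, free disk space, number of TCP connections, traffic, QPS.
The value changes over time and can go up as well as down.
Counter: a value that accumulates from 0 and, ideally, never decreases.
The exception is a restart: if the server restarts and the number is kept only in memory, it starts again from 0.
Gauges and counters are the main types and cover roughly 70% of cases.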
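The difference between the two types is mostly in which operations they allow. A minimal standard-library sketch, assuming a counter that exposes only Inc and a gauge that can be set, raised, and lowered. The type names mirror the concepts and are not the client library's types.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Counter only ever goes up. It lives in memory, so it restarts from 0
// whenever the process restarts.
type Counter struct{ n int64 }

func (c *Counter) Inc()         { atomic.AddInt64(&c.n, 1) }
func (c *Counter) Value() int64 { return atomic.LoadInt64(&c.n) }

// Gauge is a snapshot of the current state and may move in both directions.
type Gauge struct{ n int64 }

func (g *Gauge) Set(v int64)  { atomic.StoreInt64(&g.n, v) }
func (g *Gauge) Inc()         { atomic.AddInt64(&g.n, 1) }
func (g *Gauge) Dec()         { atomic.AddInt64(&g.n, -1) }
func (g *Gauge) Value() int64 { return atomic.LoadInt64(&g.n) }

func main() {
	var handled Counter // total jobs handled since start
	var queued Gauge    // jobs waiting right now

	for i := 0; i < 3; i++ {
		queued.Inc() // a job arrives
	}
	queued.Dec()  // one job is picked up
	handled.Inc() // and finished

	fmt.Println("handled:", handled.Value())
	fmt.Println("queued:", queued.Value())
}
```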
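Histogram: take http_res_time, the response time of HTTP requests,
for example as measured at nginx.
Suppose I want the average latency of all requests in one day.
Say the average comes out at 50ms, but around noon the system stalled for a while and 10,000 requests averaged 5s.
Because daily traffic is large, 10 million requests, those 5s requests get averaged away.
The earlier this is noticed the better; it could be a bug in the program or a problem in the system.
So count by range instead: how many requests took under 50ms, 50-200ms, 200-500ms, 500ms-1s, 1s-5s, and over 5s.
This is a distribution chart (histogram).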
Straight to the code:
package main

import (
	"time"

	"github.com/gin-gonic/gin"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Declare a counter.
var (
	opt = promauto.NewCounter(prometheus.CounterOpts{
		Name: "jzin_test",
		Help: "just for test",
	})
)

// recordMetrics increments the counter every 2 seconds.
func recordMetrics() {
	for {
		opt.Inc()
		time.Sleep(2 * time.Second)
	}
}

// Start an HTTP server that exposes the metrics for Prometheus to scrape.
func main() {
	go recordMetrics()
	r := gin.Default()
	// promauto.NewCounter registers the counter with the default registerer;
	// gin.WrapH(promhttp.Handler()) exposes everything registered there.
	r.GET("/metrics", gin.WrapH(promhttp.Handler()))
	_ = r.Run(":8050")
}
Once it is running, add it to the config under /opt/prometheus by adding your own ip:port,
for example:
global:
  scrape_interval: 60s
  evaluation_interval: 60s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: prometheus
  - job_name: linux
    static_configs:
      - targets: ['172.26.28.143:9100', '<your-ip>:<port>']
        labels:
          instance: localhost
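Then restart Prometheus.
It can also be added to Grafana.
Doing this by hand takes a fair amount of fairly involved code; message me if you want it.
Use an existing library instead: https://github.com/penglongli/gin-metrics.git
Just follow the third-party library's usage: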
package main

import (
	"github.com/gin-gonic/gin"
	"github.com/penglongli/gin-metrics/ginmetrics"
)

func main() {
	r := gin.Default()

	// get global Monitor object
	m := ginmetrics.GetMonitor()

	// +optional set metric path, default /debug/metrics
	m.SetMetricPath("/metrics")
	// +optional set slow time, default 5s
	m.SetSlowTime(10)
	// +optional set request duration, default {0.1, 0.3, 1.2, 5, 10}
	// used to p95, p99
	m.SetDuration([]float64{0.1, 0.3, 1.2, 5, 10})

	// set middleware for gin
	m.Use(r)

	r.GET("/product/:id", func(ctx *gin.Context) {
		ctx.JSON(200, map[string]string{
			"productId": ctx.Param("id"),
		})
	})

	_ = r.Run()
}
The library also ships a Grafana dashboard:
It looks quite good.
Import this JSON:
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": 1,
"links": [],
"panels": [
{
"datasource": null,
"description": "Application request rate every 5 minutes.",
"fieldConfig": {
"defaults": {
"custom": {},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 6,
"w": 8,
"x": 0,
"y": 0
},
"id": 4,
"options": {
"reduceOptions": {
"calcs": [
"mean"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "7.2.0",
"targets": [
{
"expr": "rate(gin_request_total[5m])",
"interval": "",
"legendFormat": "",
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "PV Rate",
"type": "gauge"
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "Prometheus",
"description": "",
"fieldConfig": {
"defaults": {
"custom": {}
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 6,
"w": 8,
"x": 8,
"y": 0
},
"hiddenSeries": false,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.2.0",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "gin_request_total",
"format": "time_series",
"instant": false,
"interval": "",
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "PV",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fieldConfig": {
"defaults": {
"custom": {}
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 6,
"w": 8,
"x": 16,
"y": 0
},
"hiddenSeries": false,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.2.0",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "gin_request_uv_total",
"interval": "",
"legendFormat": "{{instance}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "UV",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fieldConfig": {
"defaults": {
"custom": {},
"unit": "Bps"
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 15,
"x": 0,
"y": 6
},
"hiddenSeries": false,
"id": 12,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.2.0",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(gin_request_body_total[5m])",
"interval": "",
"legendFormat": "{{instance}}-in",
"refId": "A"
},
{
"expr": "rate(gin_response_body_total[5m])",
"interval": "",
"legendFormat": "{{instance}}-out",
"refId": "B"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Traffic In-Out",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"cacheTimeout": null,
"datasource": null,
"fieldConfig": {
"defaults": {
"custom": {
"align": null,
"filterable": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "blue",
"value": null
},
{
"color": "green",
"value": 80
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 9,
"x": 15,
"y": 6
},
"id": 10,
"interval": null,
"links": [],
"options": {
"displayMode": "basic",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"last"
],
"fields": "",
"values": false
},
"showUnfilled": true
},
"pluginVersion": "7.2.0",
"targets": [
{
"expr": "sum by(uri, instance) (gin_uri_request_total)",
"format": "time_series",
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "{{instance}}-{{uri}}",
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "URI Request",
"type": "bargauge"
},
{
"aliasColors": {},
"breakPoint": "50%",
"cacheTimeout": null,
"combine": {
"label": "Others",
"threshold": 0
},
"datasource": null,
"decimals": null,
"fieldConfig": {
"defaults": {
"custom": {
"align": null,
"filterable": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "blue",
"value": null
},
{
"color": "green",
"value": 80
}
]
},
"unit": "none"
},
"overrides": []
},
"fontSize": "80%",
"format": "none",
"gridPos": {
"h": 7,
"w": 7,
"x": 0,
"y": 14
},
"id": 13,
"interval": null,
"legend": {
"show": true,
"values": true
},
"legendType": "Right side",
"links": [],
"nullPointMode": "connected",
"pieType": "pie",
"pluginVersion": "7.2.0",
"strokeWidth": 1,
"targets": [
{
"expr": "sum by(method, instance) (gin_uri_request_total)",
"format": "time_series",
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "{{instance}}-{{method}}",
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Method",
"type": "grafana-piechart-panel",
"valueName": "current"
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fieldConfig": {
"defaults": {
"custom": {},
"unit": "s"
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 17,
"x": 7,
"y": 14
},
"hiddenSeries": false,
"id": 16,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.2.0",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.95, sum (rate(gin_request_duration_bucket[5m])) by (le, instance))",
"interval": "",
"legendFormat": "p95",
"refId": "A"
},
{
"expr": "histogram_quantile(0.99, sum (rate(gin_request_duration_bucket[5m])) by (le, instance))",
"interval": "",
"legendFormat": "p99",
"refId": "B"
},
{
"expr": "sum (gin_request_duration_sum) / sum(gin_request_duration_count)",
"interval": "",
"legendFormat": "avg",
"refId": "C"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Request Duration",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"breakPoint": "50%",
"cacheTimeout": null,
"combine": {
"label": "Others",
"threshold": 0
},
"datasource": null,
"decimals": null,
"description": "",
"fieldConfig": {
"defaults": {
"custom": {
"align": null,
"filterable": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "blue",
"value": null
},
{
"color": "green",
"value": 80
}
]
},
"unit": "none"
},
"overrides": []
},
"fontSize": "80%",
"format": "none",
"gridPos": {
"h": 5,
"w": 7,
"x": 0,
"y": 21
},
"id": 14,
"interval": null,
"legend": {
"show": true,
"values": true
},
"legendType": "Right side",
"links": [],
"nullPointMode": "connected",
"pieType": "pie",
"pluginVersion": "7.2.0",
"strokeWidth": 1,
"targets": [
{
"expr": "sum by(code, instance) (gin_uri_request_total)",
"format": "time_series",
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "{{instance}}-{{code}}",
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Code",
"type": "grafana-piechart-panel",
"valueName": "current"
},
{
"cacheTimeout": null,
"datasource": null,
"fieldConfig": {
"defaults": {
"custom": {
"align": null,
"filterable": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "blue",
"value": null
},
{
"color": "green",
"value": 80
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 5,
"w": 17,
"x": 7,
"y": 21
},
"id": 19,
"interval": null,
"links": [],
"options": {
"displayMode": "basic",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"last"
],
"fields": "",
"values": false
},
"showUnfilled": true
},
"pluginVersion": "7.2.0",
"targets": [
{
"expr": "sum by(uri, instance) (gin_slow_request_total)",
"format": "time_series",
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "{{instance}}-{{uri}}",
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Slow Request(default 5s)",
"type": "bargauge"
}
],
"refresh": "5s",
"schemaVersion": 26,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Gin Application Metrics",
"uid": "FDB061FMz",
"version": 11
}
If the dashboard reports the error "Panel plugin not found: grafana-piechart-panel",
install the plugin into Grafana.
Download it, put it in the plugin directory /var/lib/grafana/plugins, and restart Grafana.
wget https://grafana.com/api/plugins/grafana-piechart-panel/versions/latest/download -O grafana-piechart-panel.zip
unzip grafana-piechart-panel.zip
mkdir -p /opt/grafana-storage/plugins
mv grafana-piechart-panel /opt/grafana-storage/plugins/
chown -R 472:472 /opt/grafana-storage/plugins
docker restart grafana
Deploy the program to your own server and send it a few GET and POST requests to test:
Test results: