Deploying Loki, Prometheus, and Grafana on Kubernetes with Helm

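A table of contents that does not really need to exist

  • Deploying Loki + Prometheus + Grafana on Kubernetes with Helm
    • loki_k8s
      • Pre-pull the required images on every Loki node
      • Deploy
    • prometheus_k8s
      • Modify the apiserver
      • Get the configuration files needed to install Prometheus with Helm
      • Pre-pull the Prometheus images on every node
      • Install Prometheus with Helm
      • Verify
    • grafana_k8s
      • Get the configuration files needed to install Grafana with Helm
      • Edit the persistence section of the configuration file
      • Pre-pull the Grafana image on every node
      • Install Grafana with Helm
      • NOTES printed after a successful install
      • Verify
      • Import some useful dashboard templates
  • Final deployment result
    • kubectl get po -n monitoring
    • Import the Prometheus dashboard template
    • Create the Grafana Data Sources
    • View Loki in Explore
    • Regular expressions
  • Managing Prometheus
    • Verify alerting
    • Prometheus alert settings
  • Summary
  • A summary still has to be written!!!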

Deploying Loki + Prometheus + Grafana on Kubernetes with Helm

Loki: like Prometheus, but for logs.
Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream.

Compared to other log aggregation systems, Loki:

  • does not do full text indexing on logs. By storing compressed, unstructured logs and only indexing metadata, Loki is simpler to operate and cheaper to run.
  • indexes and groups log streams using the same labels you're already using with Prometheus, enabling you to seamlessly switch between metrics and logs.
  • is an especially good fit for storing Kubernetes Pod logs. Metadata such as Pod labels is automatically scraped and indexed.
  • has native support in Grafana (needs Grafana v6.0).

A Loki-based logging stack consists of 3 components:

  • promtail is the agent, responsible for gathering logs and sending them to Loki.
  • loki is the main server, responsible for storing logs and processing queries.
  • Grafana for querying and displaying the logs.

Loki is like Prometheus, but for logs: we prefer a multidimensional label-based approach to indexing, and want a single-binary, easy to operate system with no dependencies. Loki differs from Prometheus by focussing on logs instead of metrics, and delivering logs via push, instead of pull.

loki_k8s

Pre-pull the required images on every Loki node

docker pull grafana/loki:v0.1.0
docker pull grafana/promtail:v0.1.0
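
Since the same pulls have to be repeated on every node, a small loop can save retyping. The following is only a sketch: `pull_all` is a hypothetical helper name, and the `PULL_CMD` variable is an assumption added so the loop can be dry-run with `echo` before touching Docker.

```shell
# Hypothetical helper: pull every image passed as an argument.
# PULL_CMD is an assumed override (defaults to "docker pull") so the loop
# can be dry-run, e.g. PULL_CMD=echo.
pull_all() {
  for image in "$@"; do
    ${PULL_CMD:-docker pull} "$image" || return 1
  done
}

# Usage on a node (not executed here):
#   pull_all grafana/loki:v0.1.0 grafana/promtail:v0.1.0
```

The function stops at the first failed pull, so a node with a missing image is noticed before the chart is installed.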

Deploy

helm repo add loki https://grafana.github.io/loki/charts
helm repo update
helm upgrade --install loki loki/loki-stack --namespace monitoring

prometheus_k8s

Modify the apiserver

Make the apiserver accept NodePort services anywhere in the range 1-65535 by adding one line to its manifest.

vim /etc/kubernetes/manifests/kube-apiserver.yaml

- --service-node-port-range=1-65535

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=192.168.22.45
    - --allow-privileged=true
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --enable-admission-plugins=NodeRestriction
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --insecure-port=0
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-cluster-ip-range=10.96.0.0/12
    - --service-node-port-range=1-65535
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    image: k8s.gcr.io/kube-apiserver:v1.14.2
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 192.168.22.45
        path: /healthz
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: kube-apiserver
    resources:
      requests:
        cpu: 250m
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
    - mountPath: /etc/pki
      name: etc-pki
      readOnly: true
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/pki
      type: DirectoryOrCreate
    name: etc-pki
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
status: {}
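
Because kube-apiserver runs as a static Pod here, the kubelet recreates it once the manifest file changes. Before relying on the wider port range, it can help to confirm the flag really landed in the file. A minimal sketch, assuming the kubeadm manifest path used above; `has_nodeport_range` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the manifest sets the NodePort range flag.
# The default path is the kubeadm manifest location used in this article.
has_nodeport_range() {
  manifest="${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}"
  grep -q -- '--service-node-port-range=1-65535' "$manifest"
}

# Usage on the master node (not executed here):
#   has_nodeport_range && echo "flag present"
```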

Get the configuration files needed to install Prometheus with Helm

git clone https://github.com/gjmzj/kubeasz/
cd kubeasz/manifests/prometheus/
ls

Pre-pull the Prometheus images on every node

docker pull prom/node-exporter:v0.15.2
docker pull mirrorgooglecontainers/kube-state-metrics:v1.4.0
docker pull jimmidyson/configmap-reload:v0.2.2
docker pull prom/prometheus:v2.4.3

Install Prometheus with Helm

helm install \
        --name monitor \
        --namespace monitoring \
        -f prom-settings.yaml \
        -f prom-alertsmanager.yaml \
        -f prom-alertrules.yaml \
        prometheus

Verify

## Open the Prometheus web UI:
http://$NodeIP:39000

## Open the Alertmanager web UI:
http://$NodeIP:39001
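
Besides opening the pages in a browser, the two services can be probed from a shell. The sketch below only builds the URLs. Prometheus 2.x serves a `/-/healthy` endpoint; Alertmanager exposes the same path in the releases I know of, but treat that as an assumption for the version this chart installs. `health_url` is a hypothetical helper name.

```shell
# Hypothetical helper: build the health-check URL for a NodePort service.
# /-/healthy is assumed to be served by both Prometheus and Alertmanager.
health_url() {
  node_ip="$1"
  node_port="$2"
  printf 'http://%s:%s/-/healthy\n' "$node_ip" "$node_port"
}

# Usage (not executed here; replace the address with one of your node IPs):
#   curl -fsS "$(health_url 192.168.22.45 39000)"   # Prometheus
#   curl -fsS "$(health_url 192.168.22.45 39001)"   # Alertmanager
```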

grafana_k8s

Get the configuration files needed to install Grafana with Helm

git clone https://github.com/gjmzj/kubeasz/
cd kubeasz/manifests/prometheus/
ls

Edit the persistence section of the configuration file

[Configure this section only if you want dynamically provisioned storage volumes]

vim grafana/values.yaml

persistence:
  enabled: true
  storageClassName: "glusterfs-storage"
  accessModes:
    - ReadWriteOnce
  size: 10Gi
  annotations: {}
  subPath: ""
  existingClaim:

Pre-pull the Grafana image on every node

docker pull grafana/grafana:6.1.6

## The configuration cloned from git defaults to grafana/grafana:5.2.4, which is too old to display Loki, so pull 6.1.6 instead. Then change the tag to 6.1.6 in kubeasz/manifests/prometheus/prometheus/values.yaml and kubeasz/manifests/prometheus/prometheus/Chart.yaml.

Install Grafana with Helm

helm install \
  --name grafana \
  --namespace monitoring \
  -f grafana-settings.yaml \
  grafana

NOTES printed after a successful install

NOTES:
1. Get your 'admin' user password by running:

   kubectl get secret --namespace monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:

   grafana.monitoring.svc.cluster.local

   Get the Grafana URL to visit by running these commands in the same shell:
     export NODE_PORT=$(kubectl get --namespace monitoring -o jsonpath="{.spec.ports[0].nodePort}" services grafana)
     export NODE_IP=$(kubectl get nodes --namespace monitoring -o jsonpath="{.items[0].status.addresses[0].address}")
     echo http://$NODE_IP:$NODE_PORT


3. Login with the password from step 1 and the username: admin
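
Step 1 of the NOTES pipes the Secret through `base64 --decode` because Kubernetes stores Secret data base64-encoded. The decoding step can be tried in isolation with sample data; the value below is made up for illustration and is not the real password, and `decode_secret_value` is a hypothetical helper name.

```shell
# Hypothetical helper: decode one base64-encoded Secret value read from stdin
# and terminate the output with a newline, like the "; echo" in the NOTES.
decode_secret_value() {
  base64 --decode
  echo
}

# "YWRtaW4=" is the base64 encoding of the string "admin" (sample data only).
printf '%s' 'YWRtaW4=' | decode_secret_value
```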

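Verify

## Open the Grafana web UI:
http://$NodeIP:39002

Import some useful dashboard templates

Look for suitable Kubernetes monitoring dashboards on the Grafana website: https://grafana.com/dashboards
One that I found quite good: https://grafana.com/dashboards/6417
One of the template files I use myself: kubernetes-apps.json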

{
     
  "__inputs": [
    {
     
      "name": "DS_MYDS_PROMETHEUS",
      "label": "MYDS_Prometheus",
      "description": "",
      "type": "datasource",
      "pluginId": "prometheus",
      "pluginName": "Prometheus"
    }
  ],
  "__requires": [
    {
     
      "type": "grafana",
      "id": "grafana",
      "name": "Grafana",
      "version": "5.2.3"
    },
    {
     
      "type": "panel",
      "id": "graph",
      "name": "Graph",
      "version": "5.0.0"
    },
    {
     
      "type": "datasource",
      "id": "prometheus",
      "name": "Prometheus",
      "version": "5.0.0"
    }
  ],
  "annotations": {
     
    "list": [
      {
     
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "id": null,
  "iteration": 1536221609943,
  "links": [],
  "panels": [
    {
     
      "aliasColors": {
     },
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_MYDS_PROMETHEUS}",
      "fill": 1,
      "gridPos": {
     
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 0
      },
      "id": 4,
      "legend": {
     
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
     
          "expr": "avg(rate (container_cpu_usage_seconds_total{image!=\"\",container_name!=\"POD\",namespace=~'$namespace$',pod_name=~'$pod_name$',container_name=~'$container_name$'}[5m]))",
          "format": "time_series",
          "hide": false,
          "instant": false,
          "intervalFactor": 1,
          "legendFormat": "CPU Usage",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeShift": null,
      "title": "CPU Usage",
      "tooltip": {
     
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
     
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
     
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
     
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
     
        "align": false,
        "alignLevel": null
      }
    },
    {
     
      "aliasColors": {
     },
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_MYDS_PROMETHEUS}",
      "fill": 1,
      "gridPos": {
     
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 0
      },
      "id": 6,
      "legend": {
     
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
     
          "expr": "avg(container_memory_usage_bytes{image!=\"\",container_name!=\"POD\",namespace=~'$namespace$',pod_name=~'$pod_name$',container_name=~'$container_name$'})",
          "format": "time_series",
          "intervalFactor": 1,
          "legendFormat": "MEM Usage",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeShift": null,
      "title": "MEM Usage",
      "tooltip": {
     
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
     
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
     
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
     
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
     
        "align": false,
        "alignLevel": null
      }
    },
    {
     
      "aliasColors": {
     },
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_MYDS_PROMETHEUS}",
      "fill": 1,
      "gridPos": {
     
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 9
      },
      "id": 8,
      "legend": {
     
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
     
          "expr": "sum (rate (container_network_receive_bytes_total{image!=\"\",namespace=~'$namespace$',pod_name=~'$pod_name$'}[5m]))",
          "format": "time_series",
          "hide": false,
          "intervalFactor": 1,
          "legendFormat": "Network Input",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeShift": null,
      "title": "Network Input Bytes",
      "tooltip": {
     
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
     
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
     
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
     
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
     
        "align": false,
        "alignLevel": null
      }
    },
    {
     
      "aliasColors": {
     },
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_MYDS_PROMETHEUS}",
      "fill": 1,
      "gridPos": {
     
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 9
      },
      "id": 10,
      "legend": {
     
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
     
          "expr": "sum (rate (container_network_transmit_bytes_total{image!=\"\",namespace=~'$namespace$',pod_name=~'$pod_name$'}[5m]))",
          "format": "time_series",
          "hide": false,
          "intervalFactor": 1,
          "legendFormat": "Network Output",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeShift": null,
      "title": "Network Output Bytes",
      "tooltip": {
     
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
     
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
     
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
     
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
     
        "align": false,
        "alignLevel": null
      }
    },
    {
     
      "aliasColors": {
     },
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_MYDS_PROMETHEUS}",
      "fill": 1,
      "gridPos": {
     
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 18
      },
      "id": 12,
      "legend": {
     
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
     
          "expr": "sum (rate (container_network_receive_packets_total{image!=\"\",namespace=~'$namespace$',pod_name=~'$pod_name$'}[5m]))",
          "format": "time_series",
          "intervalFactor": 1,
          "legendFormat": "Input Packets",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeShift": null,
      "title": "Network Input Packets",
      "tooltip": {
     
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
     
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
     
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
     
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
     
        "align": false,
        "alignLevel": null
      }
    },
    {
     
      "aliasColors": {
     },
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_MYDS_PROMETHEUS}",
      "fill": 1,
      "gridPos": {
     
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 18
      },
      "id": 14,
      "legend": {
     
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
     
          "expr": "sum (rate (container_network_transmit_packets_total{image!=\"\",namespace=~'$namespace$',pod_name=~'$pod_name$'}[5m]))",
          "format": "time_series",
          "intervalFactor": 1,
          "legendFormat": "Output Packets",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeShift": null,
      "title": "Network Output Packets",
      "tooltip": {
     
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
     
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
     
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
     
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
     
        "align": false,
        "alignLevel": null
      }
    }
  ],
  "schemaVersion": 16,
  "style": "dark",
  "tags": [],
  "templating": {
     
    "list": [
      {
     
        "allValue": null,
        "current": {
     },
        "datasource": "${DS_MYDS_PROMETHEUS}",
        "hide": 0,
        "includeAll": true,
        "label": null,
        "multi": false,
        "name": "namespace",
        "options": [],
        "query": "label_values(container_memory_usage_bytes, namespace)",
        "refresh": 1,
        "regex": "",
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
     
        "allValue": null,
        "current": {
     },
        "datasource": "${DS_MYDS_PROMETHEUS}",
        "hide": 0,
        "includeAll": true,
        "label": null,
        "multi": false,
        "name": "app_name",
        "options": [],
        "query": "label_values(container_memory_usage_bytes{namespace =~\"$namespace.*\"}, pod_name)",
        "refresh": 1,
        "regex": "/(.*)-.*/",
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
     
        "allValue": null,
        "current": {
     },
        "datasource": "${DS_MYDS_PROMETHEUS}",
        "hide": 0,
        "includeAll": true,
        "label": null,
        "multi": false,
        "name": "pod_name",
        "options": [],
        "query": "label_values(container_memory_usage_bytes{pod_name =~\"$app_name.*\"}, pod_name)",
        "refresh": 1,
        "regex": "",
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
     
        "allValue": null,
        "current": {
     },
        "datasource": "${DS_MYDS_PROMETHEUS}",
        "hide": 0,
        "includeAll": true,
        "label": null,
        "multi": false,
        "name": "container_name",
        "options": [],
        "query": "label_values(container_memory_usage_bytes{pod_name =~\"$pod_name.*\",container_name!=\"POD\"}, container_name)",
        "refresh": 1,
        "regex": "",
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      }
    ]
  },
  "time": {
     
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {
     
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ],
    "time_options": [
      "5m",
      "15m",
      "1h",
      "6h",
      "12h",
      "24h",
      "2d",
      "7d",
      "30d"
    ]
  },
  "timezone": "",
  "title": "Kubernetes Apps",
  "uid": "0xvoGCkmz",
  "version": 1
}

Final deployment result

kubectl get po -n monitoring

[Figure 1: screenshot]

Import the Prometheus dashboard template

The cluster status is now visible.

[Figure 2: screenshot]

Create the Grafana Data Sources

[Figure 3: screenshot]

View Loki in Explore

Select the Loki data source in Explore.

[Figure 4: screenshot]

Use the log labels to pick the specific log stream you want to see.

[Figure 5: screenshot]

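Regular expressions

The rules that apply to Prometheus label selectors also apply to Loki log stream selectors.

The following label matching operators are currently supported:

  • = equal
  • != not equal
  • =~ matches the regular expression
  • !~ does not match the regular expression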
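
To get a feel for how the four operators behave, they can be mimicked in a few lines of shell. This is only an approximation: `label_matches` is a hypothetical helper, and `grep -E` uses POSIX extended regular expressions rather than the RE2 syntax Prometheus uses. What it does reproduce is that Prometheus anchors regex matchers to the whole label value.

```shell
# Hypothetical helper mimicking the four label matching operators.
# Usage: label_matches VALUE OPERATOR PATTERN
# grep -x makes the regex match the whole value, mirroring the fully
# anchored regex matching that Prometheus applies to =~ and !~.
label_matches() {
  value="$1"; op="$2"; pattern="$3"
  case "$op" in
    '=')  [ "$value" = "$pattern" ] ;;
    '!=') [ "$value" != "$pattern" ] ;;
    '=~') printf '%s\n' "$value" | grep -Eqx -- "$pattern" ;;
    '!~') ! printf '%s\n' "$value" | grep -Eqx -- "$pattern" ;;
    *)    return 2 ;;
  esac
}

# Usage (not executed here):
#   label_matches loki-0 '=~' 'loki-.*' && echo "selected"
```

Because of the anchoring, `loki-.*` selects `loki-0` but not `my-loki-0`; a leading `.*` is needed to match in the middle of a value.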

Managing Prometheus

  • Upgrade (change configuration): edit prom-settings.yaml, prom-alertsmanager.yaml, and the other values files, save them, then run:
# Update Prometheus
$ helm upgrade monitor -f prom-settings.yaml -f prom-alertsmanager.yaml -f prom-alertrules.yaml prometheus
# Update Grafana
$ helm upgrade grafana -f grafana-settings.yaml -f grafana-dashboards.yaml grafana
  • Roll back: see helm help rollback for details
$ helm rollback monitor [REVISION]
  • Delete
$ helm del monitor --purge
$ helm del grafana --purge

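Verify alerting

  1. Edit the email alert settings in prom-alertsmanager.yaml so they are valid, then apply them with helm upgrade
  2. Check prom-alertrules.yaml and confirm it defines the alert rule for memory usage above 90%
  3. Deploy a test application and load-test it until its memory usage exceeds 90%, then check whether the alert fires and the alert email is sent
# Create the deployment and service
$ kubectl run nginx1 --image=nginx --port=80 --expose --limits='cpu=500m,memory=4Mi'

# Generate load (stop with Ctrl + C)
$ kubectl run --rm -it load-generator --image=busybox /bin/sh
Hit enter for command prompt
$ while true; do wget -q -O- http://nginx1; done;

# Wait a few minutes, then check whether an alert has fired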

Prometheus alert settings

  • Open the Prometheus web UI:

    http://$NodeIP:39000

Click through to see the configured alert rules

[Image: https://lichi6174.github.io/assets/img/prometheus-alers-rules.jpg]

  • Open the Alertmanager web UI:

    http://$NodeIP:39001

Any alerts that have fired are listed here

[Image: https://lichi6174.github.io/assets/img/prometheus-alertmanager.jpg]

  • When writing Prometheus alert rules, you can check rule syntax and results at the location shown in this image:

    [Image: https://lichi6174.github.io/assets/img/prometheus-graph.jpg]

  • To make newly added alert rules take effect:

$ helm upgrade monitor -f prom-settings.yaml -f prom-alertsmanager.yaml -f prom-alertrules.yaml prometheus
  • Example Prometheus alert rules:

prom-alertrules.yaml

serverFiles:
  alerts:
    groups:
    - name: k8s_alert_rules
      rules:
      # ALERT when container memory usage exceed 90%
      - alert: container_mem_over_90
        expr: (sum(container_memory_working_set_bytes{image!="",name=~"^k8s_.*", pod_name!=""}) by (pod_name)) / (sum (container_spec_memory_limit_bytes{image!="",name=~"^k8s_.*", pod_name!=""}) by (pod_name)) > 0.9 and (sum(container_memory_working_set_bytes{image!="",name=~"^k8s_.*", pod_name!=""}) by (pod_name)) / (sum (container_spec_memory_limit_bytes{image!="",name=~"^k8s_.*", pod_name!=""}) by (pod_name)) < 2
        for: 2m
        annotations:
          summary: "'s memory usage alert"
          description: "Memory Usage of Pod  on  has exceeded 90% for more than 2 minutes."

      # ALERT when node is down
      - alert: node_down
        #expr: up == 0
        expr: up{instance!~"^.*:53$"} == 0
        for: 60s
        annotations:
          summary: "Node  is down"
          description: "Node  is down"
  
      # ALERT when a pod fails to start
      - alert: pod_start_false
        expr: kube_pod_status_phase{phase=~"Failed|Unknown"} == 1
        for: 60s
        annotations:
          summary: "Pod  start false"
          description: "Pod  start false"
      
      # ALERT when a cluster node is short of memory or disk resources
      - alert: Cluster node DiskOverpressure|MemoryOverpressure|DiskOverpressure
        expr: kube_node_status_condition{condition=~"OutOfDisk|MemoryPressure|DiskPressure",status!="false"} == 1
        for: 300s
        annotations:
          summary: "Node  start false"
          description: "Pod  start false"
      
      # ALERT when a cluster node is in an error state
      - alert: Cluster node status error
        expr: kube_node_status_condition{condition="Ready",status!="true"} == 1
        for: 300s
        annotations:
          summary: "Node  start false"
          description: "Pod  start false"
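
The expr of container_mem_over_90 boils down to a ratio check per pod: working-set bytes divided by the memory limit must be above 0.9 and below 2. The thresholds can be tried on plain numbers with a short sketch. `mem_alert_condition` is a hypothetical helper, and it only evaluates the instantaneous condition; Prometheus additionally requires the condition to hold for the rule's `for: 2m` before the alert fires.

```shell
# Hypothetical helper: apply the thresholds of container_mem_over_90 to one pod.
# Usage: mem_alert_condition WORKING_SET_BYTES LIMIT_BYTES
# Succeeds when 0.9 < working_set / limit < 2, the same bounds as the rule's expr.
mem_alert_condition() {
  awk -v used="$1" -v limit="$2" 'BEGIN {
    if (limit <= 0) exit 1          # no limit set: the ratio is undefined
    ratio = used / limit
    exit !(ratio > 0.9 && ratio < 2)
  }'
}

# Usage (not executed here; 4194304 bytes is the 4Mi limit of the nginx1 test pod):
#   mem_alert_condition 3900000 4194304 && echo "condition met"
```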
  • Other monitoring rules for reference:

A Job has failed:

kube_job_status_failed{job="kubernetes-service-endpoints",k8s_app="kube-state-metrics"} == 1

A PVC in the cluster has failed:

kube_persistentvolumeclaim_status_phase{phase="Failed"} == 1

A Pod container has restarted within the last 30 minutes:

changes(kube_pod_container_status_restarts_total[30m]) > 0

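Summary

A summary still has to be written!!!

To sum up: [a whole lot of blah blah blah goes here]. Plenty here is still rough; if you have questions, leave a comment or add me on WeChat: youaremysuperwomen45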
