2018-12-12 Pitfalls encountered when monitoring Kubernetes with Prometheus + Grafana

Enabling the Grafana plugin: grafana-kubernetes-app

Grafana has a plugin dedicated to monitoring Kubernetes clusters: grafana-kubernetes-app.
There are two ways to install it.

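  • Install the plugin directly when deploying Grafana:
apiVersion: apps/v1   # extensions/v1beta1 Deployments were removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: monitoring-grafana
  namespace: janny
spec:
  replicas: 1
  selector:            # required by apps/v1; must match the pod template labels
    matchLabels:
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: registry.cn-shanghai.aliyuncs.com/grafana_cluster/grafana:latest
        ports:
        - containerPort: 3000
          protocol: TCP
        env:
        - name: INFLUXDB_HOST
          value: monitoring-influxdb
        # Plugins named here are installed when the container starts
        - name: GF_INSTALL_PLUGINS
          value: grafana-kubernetes-app

---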
apiVersion: v1
kind: Service
metadata:
  name: monitoring-grafana
  # Same namespace as the Deployment: a Service only selects pods in its own namespace
  namespace: janny
spec:
  ports:
  - port: 80
    targetPort: 3000
  type: LoadBalancer
  selector:
    k8s-app: grafana
  • Or run the install command inside the Grafana pod:
kubectl get pods -n <namespace>
kubectl exec -it <grafana-pod-name> -n <namespace> -- /bin/bash
grafana-cli plugins install grafana-kubernetes-app
  • Once the plugin is installed, it must be enabled and configured in Grafana before it takes effect.
    In the Grafana UI, click Plugins.
    Click Kubernetes, then Enable.



    Configure the cluster access URL and the access certificates:


    [Figure 1]
  • For the access certificates, use the certificate data from the cluster's kubeconfig file.



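The certificate-authority-data, client-certificate-data, and client-key-data fields correspond to the CA certificate, the client certificate, and the client private key. The values in the config file are base64-encoded, so decode each one before pasting it into the form (any base64 decoder works, for example the base64 --decode command), then save.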
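The decoding step can also be done locally, which avoids pasting a private key into an online decoder. The following is a minimal Python sketch; the helper name decode_kubeconfig_field and the sample value are made up for illustration, and a real kubeconfig holds a full PEM block in each field.

```python
import base64


def decode_kubeconfig_field(value: str) -> str:
    """Decode one base64-encoded kubeconfig field into its PEM text."""
    # validate=True raises an error on characters outside the base64
    # alphabet instead of silently skipping them.
    return base64.b64decode(value.strip(), validate=True).decode("utf-8")


# Stand-in value for illustration only, not a real certificate.
sample_pem = "-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n"
sample_field = base64.b64encode(sample_pem.encode("utf-8")).decode("ascii")

decoded = decode_kubeconfig_field(sample_field)
print(decoded)
```

Run the helper once for each of the three fields and paste each decoded PEM block into the matching box of the plugin's configuration form.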

The dashboards shown in the figures below then appear automatically in Grafana.


[Figure 2]
[Figure 3]

As the figures show, none of the dashboards display any data.

To fix this, edit each panel and change the metrics it queries.


[Figure 4]

Configuring DingTalk alerts in Prometheus Alertmanager

Prometheus Alertmanager can send alerts to DingTalk, but this requires deploying prometheus-webhook-dingtalk as a bridge:

apiVersion: v1
kind: Service
metadata:
  name: prometheus-webhook-dingtalk
  namespace: kube-ops
  labels:
    app: prometheus-webhook-dingtalk
spec:
  type: ClusterIP
  selector:
    app: prometheus-webhook-dingtalk
  ports:
  - name: http
    port: 5358
    targetPort: 5358

---
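apiVersion: apps/v1   # extensions/v1beta1 Deployments were removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: prometheus-webhook-dingtalk
  namespace: kube-ops
spec:
  selector:            # required by apps/v1; must match the pod template labels
    matchLabels:
      app: prometheus-webhook-dingtalk
  template: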
    metadata:
      labels:
        app: prometheus-webhook-dingtalk
    spec:
      containers:
      - name: prometheus-webhook-dingtalk
        image: timonwong/prometheus-webhook-dingtalk:latest
        args:
          - '--web.listen-address=:5358'
          - '--ding.profile=webhook=https://oapi.dingtalk.com/robot/send?access_token=c83f3a98adf27544d6c1b01cbf30674cbb18c5de63784d62ccd3a42c2c06bb2c'
          - '--ding.timeout=5s'
          - '--log.level=info'
        ports:
        - containerPort: 5358
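
To make the bridge's role concrete, here is a simplified Python sketch of the kind of translation it performs: it receives an Alertmanager webhook payload and turns it into a DingTalk robot message. This is not the actual template used by prometheus-webhook-dingtalk. The function name alert_to_dingtalk, the alert name NodeDown, and the message layout are assumptions chosen for illustration.

```python
import json


def alert_to_dingtalk(payload: dict) -> dict:
    """Turn an Alertmanager webhook payload into a DingTalk markdown message."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        name = labels.get("alertname", "unknown")
        summary = annotations.get("summary", "")
        lines.append(f"- [{alert.get('status', 'firing')}] {name}: {summary}")
    title = f"{payload.get('status', 'firing')}: {len(lines)} alert(s)"
    return {
        "msgtype": "markdown",
        "markdown": {"title": title, "text": f"### {title}\n" + "\n".join(lines)},
    }


# Hypothetical payload with the fields Alertmanager sends to a webhook receiver.
sample = {
    "version": "4",
    "status": "firing",
    "receiver": "webhook",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "NodeDown", "team": "node"},
            "annotations": {"summary": "node is unreachable"},
        }
    ],
}

message = alert_to_dingtalk(sample)
print(json.dumps(message, indent=2))
```

In the real deployment this translation happens inside the prometheus-webhook-dingtalk container, which then posts the result to the robot URL given in --ding.profile.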

The Alertmanager configuration (prometheus-alert-configuration) is as follows:

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: kube-ops
data:
  config.yml: |-
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'localhost:25'
      smtp_from: '[email protected]'
      smtp_auth_username: 'alertmanager'
      smtp_auth_password: 'password'
    route:
      receiver: webhook
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      group_by: [alertname]
      routes:
      - receiver: webhook
        group_wait: 10s
        match:
          team: node
    receivers:
    - name: webhook
      webhook_configs:
      - url: 'http://prometheus-webhook-dingtalk:5358/dingtalk/webhook/send'      
        send_resolved: true
      pagerduty_configs:
      - service_key: 84c023a8c96f4339aa9716dcd213f421
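
The route section above can be read as follows: alerts labeled team: node match the child route and are grouped after 10s, while everything else falls back to the root route and waits 30s; both go to the webhook receiver. The Python sketch below mimics that lookup for this one-level route tree. It is a simplified model for illustration, not Alertmanager's implementation: it ignores match_re, continue, and deeper nesting, and the names ROOT_ROUTE and pick_route are made up here.

```python
# The route tree from the ConfigMap above, reduced to the fields used here.
ROOT_ROUTE = {
    "receiver": "webhook",
    "group_wait": "30s",
    "routes": [
        {"receiver": "webhook", "group_wait": "10s", "match": {"team": "node"}},
    ],
}


def pick_route(labels: dict, root: dict) -> dict:
    """Return the receiver and group_wait that apply to an alert's labels."""
    for child in root.get("routes", []):
        match = child.get("match", {})
        # A child route matches when every label in its match block is equal.
        if all(labels.get(key) == value for key, value in match.items()):
            # Settings the child does not define are inherited from the root.
            return {
                "receiver": child.get("receiver", root["receiver"]),
                "group_wait": child.get("group_wait", root["group_wait"]),
            }
    # No child matched: the root route handles the alert.
    return {"receiver": root["receiver"], "group_wait": root["group_wait"]}


node_alert = pick_route({"alertname": "NodeDown", "team": "node"}, ROOT_ROUTE)
other_alert = pick_route({"alertname": "PodRestart"}, ROOT_ROUTE)
print(node_alert)
print(other_alert)
```

The alert names NodeDown and PodRestart are hypothetical; only the team label matters for the routing decision in this configuration.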
