Monitoring a k8s cluster with Prometheus series: kube-state-metrics

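Preface

Earlier articles in this series covered monitoring k8s cluster containers and host resources with cadvisor and node-exporter. This one introduces kube-state-metrics. What exactly does it monitor? Let's start with the official description: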

kube-state-metrics is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects. 
(See examples in the Metrics section below.) It is not focused on the health of the individual Kubernetes components, but rather on 
the health of the various objects inside, such as deployments, nodes and pods.

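From the official description we can see that kube-state-metrics does not focus on the health of the individual k8s components. It focuses on the state of the objects inside the cluster, such as deployments, nodes and pods.

For example:

  • How many replicas were scheduled? How many are currently available?
  • How many Pods are running, stopped, or terminated?
  • How many times has this Pod restarted?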

Overview of metric types:

  • CertificateSigningRequest Metrics
  • ConfigMap Metrics
  • CronJob Metrics
  • DaemonSet Metrics
  • Deployment Metrics
  • Endpoint Metrics
  • Horizontal Pod Autoscaler Metrics
  • Ingress Metrics
  • Job Metrics
  • Lease Metrics
  • LimitRange Metrics
  • MutatingWebhookConfiguration Metrics
  • Namespace Metrics
  • NetworkPolicy Metrics
  • Node Metrics
  • PersistentVolume Metrics
  • PersistentVolumeClaim Metrics
  • Pod Disruption Budget Metrics
  • Pod Metrics
  • ReplicaSet Metrics
  • ReplicationController Metrics
  • ResourceQuota Metrics
  • Secret Metrics
  • Service Metrics
  • StatefulSet Metrics
  • StorageClass Metrics
  • ValidatingWebhookConfiguration Metrics
  • VerticalPodAutoscaler Metrics
  • VolumeAttachment Metrics

Deployment

  • Official deployment manifests
https://github.com/kubernetes/kube-state-metrics/tree/master/examples/standard
  • cluster-role-binding.yaml
  • cluster-role.yaml
  • deployment.yaml
  • service-account.yaml
  • service.yaml
# In service.yaml, annotate the service so that Prometheus scrapes it
metadata:
  annotations:
    prometheus.io/scrape: 'true'

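These manifests mainly set up the service account permissions and the service discovery.

# Apply the manifests in the current directory to deploy kube-state-metrics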
kubectl apply -f .

Scraping the data

We can use the endpoints service discovery role to discover the scrape address exposed by kube-state-metrics:

    - job_name: 'kubernetes-service-endpoints'
      scrape_timeout: 30s
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        action: replace
        target_label: container_port
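
The `__address__` rule in the configuration above is the least obvious one, so here is a minimal Python sketch of what it does. It assumes the documented relabeling behavior: source label values are joined with the default `;` separator, the regex is anchored at both ends, and a `replace` action leaves the target untouched when the regex does not match. The helper name `rewrite_address` and the sample addresses are hypothetical and used only for illustration.

```python
import re

# Regex copied from the relabel rule above. Prometheus anchors it at both
# ends, which re.fullmatch mimics here.
ADDRESS_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address: str, port_annotation: str) -> str:
    """Hypothetical helper: emulate the `replace` action on __address__."""
    # Source label values are joined with the default separator ";".
    joined = f"{address};{port_annotation}"
    match = ADDRESS_RULE.fullmatch(joined)
    if match is None:
        # No match (for example, no prometheus.io/port annotation):
        # the original address is kept.
        return address
    # Equivalent of the `$1:$2` replacement: host plus the annotated port.
    return f"{match.group(1)}:{match.group(2)}"


# Sample values below are made up for illustration.
print(rewrite_address("10.244.1.7:8081", "8080"))  # -> 10.244.1.7:8080
print(rewrite_address("10.244.1.7", "8080"))       # -> 10.244.1.7:8080
print(rewrite_address("10.244.1.7:8081", ""))      # -> 10.244.1.7:8081
```

In other words, a `prometheus.io/port` annotation on the service overrides whatever port was discovered, and a service without that annotation is scraped on its discovered port.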

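Add the configuration above to the prometheus-config configmap (see the article "Setting up Prometheus in a k8s environment" for details), then update the configmap and the pod.

## Update the configmap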
kubectl create configmap prometheus-config --from-file=prometheus.yaml -n monitoring -o yaml --dry-run=client | kubectl replace -f -

## Update the pod
kubectl apply -f prometheus-deploy.yaml

## Hot-reload the configuration (requires Prometheus to be started with --web.enable-lifecycle)
curl -XPOST http://localhost:30090/-/reload

Open the Prometheus web UI and the corresponding metrics will show up.

Common queries

kube_node_status_condition{condition="Ready",status="true"}

time() - kube_cronjob_next_schedule_time

kube_job_status_failed

kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled

kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled
 
kube_pod_container_status_restarts_total
 
kube_pod_container_status_terminated_reason
  
kube_pod_container_status_waiting_reason
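
Besides the web UI, these expressions can also be sent to the Prometheus HTTP API. The following is a minimal sketch, assuming Prometheus is reachable at the same `localhost:30090` address used for the reload command earlier; the helper names `instant_query_url` and `run_instant_query` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: same address as the reload command earlier in this article.
PROMETHEUS_URL = "http://localhost:30090"


def instant_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Hypothetical helper: build an instant-query URL for /api/v1/query."""
    # urlencode escapes braces, quotes and spaces in the PromQL expression.
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def run_instant_query(expr: str, base_url: str = PROMETHEUS_URL) -> list:
    """Hypothetical helper: run the query and return the list of samples."""
    with urlopen(instant_query_url(expr, base_url), timeout=10) as resp:
        payload = json.load(resp)
    # A successful response carries the samples under data.result.
    return payload["data"]["result"]
```

For example, `run_instant_query("kube_pod_container_status_restarts_total")` would return one entry per container, each with a `metric` label set and a `value` pair of timestamp and sample value, provided the Prometheus instance is reachable.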
  

