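Mastering Kubernetes in 5 Minutes a Day | Prometheus Operator

Source: cloudman, "Mastering Kubernetes in 5 Minutes a Day" (《每天5分钟玩转Kubernetes》)

I am organizing the author's course material and my own lab notes as I study, and sharing them here. If anything infringes copyright I will take it down immediately. Thanks for your support!

Index post: Mastering Kubernetes in 5 Minutes a Day | Index (COCOgsta's blog on CSDN)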


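Earlier we covered two Kubernetes monitoring solutions, Weave Scope and Heapster, which mainly monitor Nodes and Pods. Kubernetes operators need that data, but it is not enough. We usually also want to monitor the state of the cluster itself: for example, whether management components such as the Kubernetes API Server, Scheduler, and Controller Manager are working properly and whether they are overloaded.

This section introduces Prometheus Operator, a monitoring solution that can answer those questions.

Prometheus Operator is a Prometheus-based Kubernetes monitoring solution developed by CoreOS, and is probably the most full-featured open-source option available today. Let's start with some screenshots of what it can do.

Prometheus Operator presents monitoring data through Grafana and ships with a set of predefined Dashboards, as shown below.

[Figure 1]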

  • The overall health of the Kubernetes cluster:

[Figure 2]

  • Resource usage across the whole cluster:

[Figure 3]

[Figure 4]

  • The status of each Kubernetes management component:

[Figure 5]

[Figure 6]

  • Resource usage on each node:

[Figure 7]

[Figure 8]

  • The running state of Deployments:

[Figure 9]

  • The running state of Pods:

[Figure 10]

These Dashboards cover everything from the cluster down to individual Pods and help users operate Kubernetes more effectively. Prometheus Operator also iterates very quickly and will likely keep gaining features, so it is worth the time to learn and practice with it.

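14.3.1 Prometheus Architecture

Prometheus Operator is built on Prometheus, so we first need to understand Prometheus itself.

Prometheus is an excellent monitoring tool, or more precisely a monitoring solution: it provides a complete stack covering data collection, storage, processing, visualization, and alerting. Its architecture is shown below.

[Figure 11]

The original architecture diagram on the official site is somewhat more complex than this one. To keep the focus, only the most important components are kept here.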

  1. Prometheus Server

Prometheus Server pulls monitoring data from Exporters and stores it, and provides a flexible query language (PromQL) for users.
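
As a rough illustration of how PromQL is used from outside the server, the sketch below sends an instant query to Prometheus's HTTP API (/api/v1/query) and parses the result. The base URL and the query string are placeholders chosen for this example, not values from the setup described here.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against Prometheus's HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def parse_vector(payload: dict) -> list:
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError("query failed: " + str(payload.get("error")))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector, got " + data["resultType"])
    # Each sample looks like {"metric": {labels}, "value": [timestamp, "value-as-string"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def instant_query(base_url: str, promql: str) -> list:
    """Run a PromQL instant query; needs a reachable Prometheus Server."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(json.load(resp))
```

For example, instant_query("http://localhost:9090", "up") would return one (labels, value) pair per scraped target, where the address is an assumption to replace with your own server.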

  2. Exporter

An Exporter collects performance data from a target (a host, a container, and so on) and exposes it over an HTTP endpoint for Prometheus Server to scrape.
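
To make the Exporter idea concrete, here is a minimal sketch of one built only on the Python standard library. It serves gauges in the Prometheus text format at /metrics. The metric name, the collect() function, and the port are all hypothetical; a real exporter would read actual host or container data.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(samples: dict) -> str:
    """Render {metric_name: (help_text, value)} as Prometheus text-format gauges."""
    lines = []
    for name, (help_text, value) in sorted(samples.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect() -> dict:
    """Hypothetical collector returning a fixed reading for illustration."""
    return {"demo_temperature_celsius": ("Example reading.", 21.5)}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the /metrics path is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve /metrics until interrupted."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint; Prometheus Server would then be configured to scrape that address on its own schedule, which is the pull model described above.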

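  3. Visualization component

Visualizing monitoring data is essential to any monitoring solution. Prometheus used to develop its own tool for this but later abandoned it, because the open-source community produced a better product, Grafana. Grafana integrates seamlessly with Prometheus and provides excellent data presentation.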

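  4. Alertmanager

Users can define alerting rules based on monitoring data, and those rules fire alerts. When Alertmanager receives an alert, it sends a notification through a preconfigured channel; supported channels include Email, PagerDuty, and Webhook.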
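
As one way to picture the Webhook channel, the sketch below is a small receiver that accepts the JSON body Alertmanager posts and prints a line per alert. It assumes the standard webhook payload layout (an "alerts" list whose entries carry "status", "labels", and "annotations"); the alert name and summary in the usage note are made up.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alerts(payload: dict) -> list:
    """Return one human-readable line per alert in a webhook payload."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        status = alert.get("status", "unknown")
        name = labels.get("alertname", "unknown")
        severity = labels.get("severity", "none")
        summary = annotations.get("summary", "")
        lines.append(f"[{status}] {name} (severity={severity}) {summary}".rstrip())
    return lines


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the sender announced.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for line in summarize_alerts(payload):
            print(line)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 9000) -> None:
    """Listen for webhook notifications until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

A firing alert named DemoDown with severity critical and summary "demo target is down" would be printed as "[firing] DemoDown (severity=critical) demo target is down".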

14.3.2 Prometheus Operator Architecture

The goal of Prometheus Operator is to make deploying and maintaining Prometheus on Kubernetes as simple as possible. Its architecture is shown below.

[Figure 12]

Every object in the diagram is a resource running in Kubernetes.

  1. Operator

The Operator is Prometheus Operator itself, which runs in Kubernetes as a Deployment. It deploys and manages Prometheus Server and dynamically updates Prometheus Server's monitoring targets according to ServiceMonitors.

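  2. Prometheus Server

Prometheus Server is deployed into the cluster as a Kubernetes application. To manage Prometheus better in Kubernetes, the CoreOS developers defined a Kubernetes custom resource type named Prometheus. We can think of a Prometheus resource as a special kind of Deployment whose sole purpose is to deploy Prometheus Server.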
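
For a sense of what such a custom resource looks like, the sketch below builds a minimal Prometheus object as a Python dict and prints it as JSON, which kubectl accepts as well as YAML. The resource name, the replica count, and the app=prometheus selector label are assumptions for illustration; the fields a given Operator version supports may differ, so check its documentation.

```python
import json


def prometheus_resource(name: str, namespace: str, monitor_labels: dict, replicas: int = 1) -> dict:
    """Build a minimal manifest for the Prometheus custom resource."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "Prometheus",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicas": replicas,
            # Only ServiceMonitors carrying these labels are picked up.
            "serviceMonitorSelector": {"matchLabels": dict(monitor_labels)},
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize a manifest so it can be piped to kubectl."""
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    print(to_json(prometheus_resource("demo", "monitoring", {"app": "prometheus"})))
```

Saved as a script (the file name is up to you), its output could be piped into kubectl apply -f - once the Operator and its custom resource definitions are installed.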

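  3. Service

The Service here is the ordinary Service resource in the cluster, and it is also what Prometheus monitors; Prometheus calls it a Target. Every monitored object has a corresponding Service. To monitor the Kubernetes Scheduler, for example, there has to be a Service that corresponds to the Scheduler. A Kubernetes cluster has no such Service by default, so Prometheus Operator takes care of creating it.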

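  4. ServiceMonitor

The Operator can dynamically update Prometheus's Target list, and a ServiceMonitor is the abstraction of a Target. For example, to monitor the Kubernetes Scheduler, a user creates a ServiceMonitor object that maps to the Scheduler Service. The Operator discovers the new ServiceMonitor and adds the Scheduler Target to Prometheus's monitoring list.

ServiceMonitor is also a Kubernetes custom resource type developed specifically for Prometheus Operator.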
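
The mapping between a ServiceMonitor and a Service is done with label selectors. The sketch below builds a ServiceMonitor manifest for the Scheduler example and includes a small helper that mimics matchLabels semantics. The label k8s-app=kube-scheduler, the port name http-metrics, and the 30s interval are assumptions for illustration, not values taken from this cluster.

```python
import json


def labels_match(selector: dict, labels: dict) -> bool:
    """True when every key/value in a matchLabels selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def service_monitor(name, namespace, service_labels, service_namespace, port_name, monitor_labels):
    """Build a ServiceMonitor that selects Services by label."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        # These labels are what a Prometheus resource's selector matches on.
        "metadata": {"name": name, "namespace": namespace, "labels": dict(monitor_labels)},
        "spec": {
            "selector": {"matchLabels": dict(service_labels)},
            "namespaceSelector": {"matchNames": [service_namespace]},
            # "port" refers to a named port on the selected Service.
            "endpoints": [{"port": port_name, "interval": "30s"}],
        },
    }


if __name__ == "__main__":
    manifest = service_monitor(
        "kube-scheduler", "monitoring",
        {"k8s-app": "kube-scheduler"}, "kube-system",
        "http-metrics", {"app": "prometheus"},
    )
    print(json.dumps(manifest, indent=2))
```

With these values, a Service labeled k8s-app=kube-scheduler in kube-system would be selected, while one labeled k8s-app=kube-proxy would not.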

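  5. Alertmanager

Besides Prometheus and ServiceMonitor, Alertmanager is the third Kubernetes custom resource that the Operator defines. We can think of an Alertmanager resource as a special kind of Deployment whose sole purpose is to deploy the Alertmanager component.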

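14.3.3 Deploying Prometheus Operator

When I worked through this, I used v0.14.0, the latest version of Prometheus Operator at the time. The project iterates quickly and the deployment procedure may change, so consult the official documentation when necessary. The helm commands below use Helm 2 syntax (--name); Helm 3 instead takes the release name as the first positional argument.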

  1. Download the latest source code

git clone https://github.com/coreos/prometheus-operator.git
cd prometheus-operator

For easier management, create a dedicated Namespace called monitoring. All Prometheus Operator components will be deployed into this Namespace.

kubectl create namespace monitoring

  2. Install the Prometheus Operator Deployment

helm repo add aliyuncs https://apphub.aliyuncs.com
helm repo update
helm install --name prometheus-operator --set rbacEnable=true --namespace=monitoring helm/prometheus-operator
[root@k8s-master ~]# helm install --name prometheus-operator --set rbacEnable=true --namespace=monitoring aliyuncs/prometheus-operator
NAME:   prometheus-operator
LAST DEPLOYED: Wed Jun  8 10:16:22 2022
NAMESPACE: monitoring
STATUS: DEPLOYED

RESOURCES:
==> v1/Alertmanager
NAME                              AGE
prometheus-operator-alertmanager  34s

==> v1/ClusterRole
NAME                                              CREATED AT
prometheus-operator-grafana-clusterrole           2022-06-08T14:17:30Z
prometheus-operator-operator                      2022-06-08T14:17:30Z
prometheus-operator-operator-psp                  2022-06-08T14:17:30Z
prometheus-operator-prometheus                    2022-06-08T14:17:30Z
prometheus-operator-prometheus-psp                2022-06-08T14:17:30Z
psp-prometheus-operator-kube-state-metrics        2022-06-08T14:17:30Z
psp-prometheus-operator-prometheus-node-exporter  2022-06-08T14:17:30Z

==> v1/ClusterRoleBinding
NAME                                              ROLE                                                          AGE
prometheus-operator-grafana-clusterrolebinding    ClusterRole/prometheus-operator-grafana-clusterrole           34s
prometheus-operator-operator                      ClusterRole/prometheus-operator-operator                      34s
prometheus-operator-operator-psp                  ClusterRole/prometheus-operator-operator-psp                  34s
prometheus-operator-prometheus                    ClusterRole/prometheus-operator-prometheus                    34s
prometheus-operator-prometheus-psp                ClusterRole/prometheus-operator-prometheus-psp                34s
psp-prometheus-operator-kube-state-metrics        ClusterRole/psp-prometheus-operator-kube-state-metrics        34s
psp-prometheus-operator-prometheus-node-exporter  ClusterRole/psp-prometheus-operator-prometheus-node-exporter  34s

==> v1/ConfigMap
NAME                                                   DATA  AGE
prometheus-operator-apiserver                          1     34s
prometheus-operator-cluster-total                      1     34s
prometheus-operator-controller-manager                 1     34s
prometheus-operator-etcd                               1     34s
prometheus-operator-grafana                            1     34s
prometheus-operator-grafana-config-dashboards          1     34s
prometheus-operator-grafana-datasource                 1     34s
prometheus-operator-grafana-test                       1     34s
prometheus-operator-k8s-coredns                        1     34s
prometheus-operator-k8s-resources-cluster              1     34s
prometheus-operator-k8s-resources-namespace            1     34s
prometheus-operator-k8s-resources-node                 1     34s
prometheus-operator-k8s-resources-pod                  1     34s
prometheus-operator-k8s-resources-workload             1     34s
prometheus-operator-k8s-resources-workloads-namespace  1     34s
prometheus-operator-kubelet                            1     34s
prometheus-operator-namespace-by-pod                   1     34s
prometheus-operator-namespace-by-workload              1     34s
prometheus-operator-node-cluster-rsrc-use              1     34s
prometheus-operator-node-rsrc-use                      1     34s
prometheus-operator-nodes                              1     34s
prometheus-operator-persistentvolumesusage             1     34s
prometheus-operator-pod-total                          1     34s
prometheus-operator-pods                               1     34s
prometheus-operator-prometheus                         1     34s
prometheus-operator-proxy                              1     34s
prometheus-operator-scheduler                          1     34s
prometheus-operator-statefulset                        1     34s
prometheus-operator-workload-total                     1     34s

==> v1/DaemonSet
NAME                                          DESIRED  CURRENT  READY  UP-TO-DATE  AVAILABLE  NODE SELECTOR  AGE
prometheus-operator-prometheus-node-exporter  3        3        3      3           3                   34s

==> v1/Deployment
NAME                                    READY  UP-TO-DATE  AVAILABLE  AGE
prometheus-operator-grafana             0/1    1           0          34s
prometheus-operator-kube-state-metrics  0/1    1           0          34s
prometheus-operator-operator            0/1    1           0          34s

==> v1/Pod(related)
NAME                                                     READY  STATUS             RESTARTS  AGE
prometheus-operator-grafana-8cbcdf6cc-s9hf5              0/2    Init:0/1           0         35s
prometheus-operator-kube-state-metrics-5855d94d57-2twd9  0/1    ContainerCreating  0         35s
prometheus-operator-operator-f85ccdb89-rsvt4             0/2    ContainerCreating  0         35s
prometheus-operator-prometheus-node-exporter-4sd9z       1/1    Running            0         34s
prometheus-operator-prometheus-node-exporter-vtlzn       1/1    Running            0         34s
prometheus-operator-prometheus-node-exporter-xz5v8       1/1    Running            0         35s

==> v1/Prometheus
NAME                            AGE
prometheus-operator-prometheus  33s

==> v1/PrometheusRule
NAME                                                      AGE
prometheus-operator-alertmanager.rules                    32s
prometheus-operator-etcd                                  32s
prometheus-operator-general.rules                         32s
prometheus-operator-k8s.rules                             32s
prometheus-operator-kube-apiserver-error                  32s
prometheus-operator-kube-apiserver.rules                  32s
prometheus-operator-kube-prometheus-node-recording.rules  32s
prometheus-operator-kube-scheduler.rules                  32s
prometheus-operator-kubernetes-absent                     32s
prometheus-operator-kubernetes-apps                       32s
prometheus-operator-kubernetes-resources                  32s
prometheus-operator-kubernetes-storage                    32s
prometheus-operator-kubernetes-system                     32s
prometheus-operator-kubernetes-system-apiserver           32s
prometheus-operator-kubernetes-system-controller-manager  32s
prometheus-operator-kubernetes-system-kubelet             32s
prometheus-operator-kubernetes-system-scheduler           32s
prometheus-operator-node-exporter                         32s
prometheus-operator-node-exporter.rules                   32s
prometheus-operator-node-network                          32s
prometheus-operator-node-time                             32s
prometheus-operator-node.rules                            32s
prometheus-operator-prometheus                            32s
prometheus-operator-prometheus-operator                   32s

==> v1/Role
NAME                              CREATED AT
prometheus-operator-alertmanager  2022-06-08T14:17:30Z
prometheus-operator-grafana-test  2022-06-08T14:17:30Z

==> v1/RoleBinding
NAME                              ROLE                                   AGE
prometheus-operator-alertmanager  Role/prometheus-operator-alertmanager  34s
prometheus-operator-grafana-test  Role/prometheus-operator-grafana-test  34s

==> v1/Secret
NAME                                           TYPE    DATA  AGE
alertmanager-prometheus-operator-alertmanager  Opaque  1     34s
prometheus-operator-grafana                    Opaque  3     34s

==> v1/Service
NAME                                          TYPE       CLUSTER-IP      EXTERNAL-IP  PORT(S)           AGE
prometheus-operator-alertmanager              ClusterIP  10.100.203.250         9093/TCP          34s
prometheus-operator-coredns                   ClusterIP  None                   9153/TCP          34s
prometheus-operator-grafana                   ClusterIP  10.108.97.140          80/TCP            34s
prometheus-operator-kube-controller-manager   ClusterIP  None                   10252/TCP         34s
prometheus-operator-kube-etcd                 ClusterIP  None                   2379/TCP          34s
prometheus-operator-kube-proxy                ClusterIP  None                   10249/TCP         34s
prometheus-operator-kube-scheduler            ClusterIP  None                   10251/TCP         34s
prometheus-operator-kube-state-metrics        ClusterIP  10.110.236.133         8080/TCP          34s
prometheus-operator-operator                  ClusterIP  10.103.154.69          8080/TCP,443/TCP  34s
prometheus-operator-prometheus                ClusterIP  10.108.215.233         9090/TCP          34s
prometheus-operator-prometheus-node-exporter  ClusterIP  10.100.30.192          9100/TCP          34s

==> v1/ServiceAccount
NAME                                          SECRETS  AGE
prometheus-operator-alertmanager              1        34s
prometheus-operator-grafana                   1        34s
prometheus-operator-grafana-test              1        34s
prometheus-operator-kube-state-metrics        1        34s
prometheus-operator-operator                  1        34s
prometheus-operator-prometheus                1        34s
prometheus-operator-prometheus-node-exporter  1        34s

==> v1/ServiceMonitor
NAME                                         AGE
prometheus-operator-alertmanager             32s
prometheus-operator-apiserver                32s
prometheus-operator-coredns                  32s
prometheus-operator-grafana                  32s
prometheus-operator-kube-controller-manager  32s
prometheus-operator-kube-etcd                32s
prometheus-operator-kube-proxy               32s
prometheus-operator-kube-scheduler           32s
prometheus-operator-kube-state-metrics       32s
prometheus-operator-kubelet                  32s
prometheus-operator-node-exporter            32s
prometheus-operator-operator                 32s
prometheus-operator-prometheus               32s

==> v1beta1/ClusterRole
NAME                                    CREATED AT
prometheus-operator-kube-state-metrics  2022-06-08T14:17:30Z

==> v1beta1/ClusterRoleBinding
NAME                                    ROLE                                                AGE
prometheus-operator-kube-state-metrics  ClusterRole/prometheus-operator-kube-state-metrics  34s

==> v1beta1/MutatingWebhookConfiguration
NAME                           WEBHOOKS  AGE
prometheus-operator-admission  1         33s

==> v1beta1/PodSecurityPolicy
NAME                                          PRIV   CAPS      SELINUX           RUNASUSER  FSGROUP    SUPGROUP  READONLYROOTFS  VOLUMES
prometheus-operator-alertmanager              false  RunAsAny  RunAsAny          MustRunAs  MustRunAs  false     configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
prometheus-operator-grafana                   false  RunAsAny  RunAsAny          RunAsAny   RunAsAny   false     configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
prometheus-operator-grafana-test              false  RunAsAny  RunAsAny          RunAsAny   RunAsAny   false     configMap,downwardAPI,emptyDir,projected,secret
prometheus-operator-kube-state-metrics        false  RunAsAny  MustRunAsNonRoot  MustRunAs  MustRunAs  false     secret
prometheus-operator-operator                  false  RunAsAny  RunAsAny          MustRunAs  MustRunAs  false     configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
prometheus-operator-prometheus                false  RunAsAny  RunAsAny          MustRunAs  MustRunAs  false     configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
prometheus-operator-prometheus-node-exporter  false  RunAsAny  RunAsAny          MustRunAs  MustRunAs  false     configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim,hostPath

==> v1beta1/Role
NAME                         CREATED AT
prometheus-operator-grafana  2022-06-08T14:17:30Z

==> v1beta1/RoleBinding
NAME                         ROLE                              AGE
prometheus-operator-grafana  Role/prometheus-operator-grafana  34s

==> v1beta1/ValidatingWebhookConfiguration
NAME                           WEBHOOKS  AGE
prometheus-operator-admission  1         32s


NOTES:
The Prometheus Operator has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=prometheus-operator"

Visit https://github.com/coreos/prometheus-operator for instructions on how
to create & configure Alertmanager and Prometheus instances using the Operator.
[root@k8s-master ~]# 

All Prometheus Operator components are packaged as Helm Charts, which makes installation straightforward, as the output above shows. If you are not familiar with Helm, see the earlier chapters on it.

  3. Install Prometheus, Alertmanager, and Grafana

helm install --name prometheus --set serviceMonitorsSelector.app=prometheus --set ruleSelector.app=prometheus --namespace=monitoring aliyuncs/prometheus
helm install --name alertmanager --namespace=monitoring aliyuncs/alertmanager # could not be downloaded
helm install --name grafana --namespace=monitoring aliyuncs/grafana

You can list resources of type Prometheus with kubectl get prometheus, as shown below.

[Figure 13]

To make Prometheus Server easy to reach, the Service type has already been changed to NodePort with kubectl edit.
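
If you prefer a non-interactive alternative to kubectl edit, the same change can be made with kubectl patch. The sketch below only assembles and runs that command; the Service name in the usage note is an assumption, so substitute whatever kubectl get svc -n monitoring reports in your cluster.

```python
import json
import subprocess


def nodeport_patch_command(service: str, namespace: str) -> list:
    """Build a kubectl patch command that switches a Service to type NodePort."""
    patch = json.dumps({"spec": {"type": "NodePort"}})
    return [
        "kubectl", "patch", "service", service,
        "--namespace", namespace,
        "--type", "merge",
        "--patch", patch,
    ]


def apply_nodeport(service: str, namespace: str) -> None:
    """Run the patch; requires kubectl and access to the cluster."""
    subprocess.run(nodeport_patch_command(service, namespace), check=True)
```

For example, apply_nodeport("prometheus-operator-prometheus", "monitoring") would patch a Service of that name; Kubernetes then assigns a node port, which kubectl get svc shows in the PORT(S) column.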

The Alertmanager and Grafana resources can be inspected the same way, as shown below. (Because Alertmanager could not be installed earlier, only Grafana is shown here.)

[Figure 14]

Their Service types have also been changed to NodePort.

  4. Install kube-prometheus (could not be downloaded)

kube-prometheus is a Helm Chart that packages all the Exporters and ServiceMonitors needed to monitor Kubernetes.

helm install --name kube-prometheus --namespace=monitoring helm/kube-prometheus

Each Exporter has a corresponding Service that supplies Prometheus with monitoring data about the Kubernetes cluster, as shown below.

[Figure 15]

Each Service has a corresponding ServiceMonitor, and together they make up Prometheus's Target list, as shown below.

[Figure 16]

All Pods related to Prometheus Operator are shown below.

[Figure 17]

Notice that some Exporters have no running Pod. That is because internal Kubernetes components such as the API Server, Scheduler, and Kubelet natively support Prometheus, so defining a Service is enough to scrape monitoring data directly from their predefined ports.
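
"Natively supporting Prometheus" means these components already serve their metrics in the Prometheus text format. As a rough sketch of what a scraper sees, the parser below reads that format into (name, labels, value) tuples. It is deliberately simplified: it ignores timestamps and comment metadata and leaves escape sequences in label values as they are. The metric names in the test data are invented.

```python
import re

# metric_name, optional {labels}, then the sample value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# key="value" pairs, allowing backslash-escaped characters inside the quotes.
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Parse Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Feeding it the body returned by a component's metrics endpoint yields a list that is easy to filter, for instance by metric name or by a label such as the HTTP status code.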

Open the Prometheus Web UI in a browser (http://192.168.56.105:30413/targets), as shown below.

[Figure 18]

All Targets are in the UP state.
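
The same check can be scripted instead of read off the Web UI. The sketch below queries Prometheus's /api/v1/targets endpoint and lists any active target whose health is not "up". It assumes a Prometheus version that exposes this endpoint; the job name and scrape URLs in the test data are invented.

```python
import json
import urllib.request


def unhealthy_targets(payload: dict) -> list:
    """Return (job, scrape_url, last_error) for every active target that is not up."""
    targets = payload["data"]["activeTargets"]
    return [
        (t["labels"].get("job", ""), t["scrapeUrl"], t.get("lastError", ""))
        for t in targets
        if t["health"] != "up"
    ]


def fetch_targets(base_url: str) -> dict:
    """Fetch the target list; needs a reachable Prometheus Server."""
    url = base_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Against the address used above, unhealthy_targets(fetch_targets("http://192.168.56.105:30413")) should return an empty list when every Target is UP.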

  5. Install the alerting rules

Prometheus Operator provides default Alertmanager alerting rules, which are installed with the following commands.

sed -i -e 's/role: prometheus-rulefiles/app: prometheus/g' contrib/kube-prometheus/manifests/prometheus/prometheus-k8s-rules.yaml
sed -i -e 's/prometheus: k8s/prometheus: prometheus/g' contrib/kube-prometheus/manifests/prometheus/prometheus-k8s-rules.yaml
sed -i -e 's/job=\"kube-controller-manager/job=\"kube-prometheus-exporter-kube-controller-manager/g' contrib/kube-prometheus/manifests/prometheus/prometheus-k8s-rules.yaml
sed -i -e 's/job=\"apiserver/job=\"kube-prometheus-exporter-kube-api/g' contrib/kube-prometheus/manifests/prometheus/prometheus-k8s-rules.yaml
sed -i -e 's/job=\"kube-scheduler/job=\"kube-prometheus-exporter-kube-scheduler/g' contrib/kube-prometheus/manifests/prometheus/prometheus-k8s-rules.yaml
sed -i -e 's/job=\"node-exporter/job=\"kube-prometheus-exporter-node/g' contrib/kube-prometheus/manifests/prometheus/prometheus-k8s-rules.yaml
kubectl apply -n monitoring -f contrib/kube-prometheus/manifests/prometheus/prometheus-k8s-rules.yaml
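
The sed commands above appear to adjust the stock rules so that their selector labels and job names line up with the names this Helm-based install uses. To show the effect of the job-name substitutions in isolation, the sketch below applies the same four replacements to a string. The rule expression in the usage note is an invented example, not a line copied from the rules file.

```python
# The same (pattern, replacement) pairs as the job-related sed commands.
JOB_REWRITES = [
    ('job="kube-controller-manager', 'job="kube-prometheus-exporter-kube-controller-manager'),
    ('job="apiserver', 'job="kube-prometheus-exporter-kube-api'),
    ('job="kube-scheduler', 'job="kube-prometheus-exporter-kube-scheduler'),
    ('job="node-exporter', 'job="kube-prometheus-exporter-node'),
]


def rewrite_jobs(text: str) -> str:
    """Apply every job-name replacement to the given rule text."""
    for old, new in JOB_REWRITES:
        text = text.replace(old, new)
    return text
```

For instance, rewrite_jobs('absent(up{job="kube-scheduler"} == 1)') gives 'absent(up{job="kube-prometheus-exporter-kube-scheduler"} == 1)', so the rule refers to the job name that the Exporter Service is scraped under.
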
  6. Install the Grafana Dashboards

Prometheus Operator defines default Dashboards for displaying monitoring data, which are installed with the following commands.

sed -i -e 's/grafana-dashboards-0/grafana-grafana/g' contrib/kube-prometheus/manifests/grafana/grafana-dashboards.yaml
sed -i -e 's/prometheus-k8s.monitoring/prometheus-prometheus.monitoring/g' contrib/kube-prometheus/manifests/grafana/grafana-dashboards.yaml
kubectl apply -n monitoring -f contrib/kube-prometheus/manifests/grafana/grafana-dashboards.yaml

Open the Grafana Web UI (http://192.168.56.105:32342/), as shown below.

[Figure 19]

Grafana's DataSource and Dashboards are configured automatically. Click Home to use the Dashboards discussed at the start of this section, as shown below.

[Figure 20]
