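Book source: cloudman, 每天5分钟玩转Kubernetes (Kubernetes in 5 Minutes a Day)
These notes on the book's lessons and lab exercises were compiled while studying and are shared with everyone. Anything that infringes copyright will be removed on request. Thanks for your support!
Series index: Kubernetes in 5 Minutes a Day | Index, on COCOgsta's CSDN blog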
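Earlier we covered two Kubernetes monitoring solutions, Weave Scope and Heapster. Both mainly monitor Nodes and Pods. This data is essential for Kubernetes operators, but it is not enough. We usually also want to monitor the health of the cluster itself. For example, are management components such as the Kubernetes API Server, Scheduler and Controller Manager working properly, and are they overloaded?
In this section we study Prometheus Operator, a monitoring solution that can answer these questions.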
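Prometheus Operator is a Prometheus-based Kubernetes monitoring solution developed by CoreOS. It may be the most feature-complete open-source option available today. Let's first look at some screenshots to see what it can do.
Prometheus Operator displays monitoring data through Grafana and ships a set of predefined Dashboards, as shown in the figure.
These Dashboards show the state of everything from the whole cluster down to individual Pods, and they help users operate Kubernetes more effectively. Prometheus Operator is also iterating very quickly and will likely gain more and better features, so it is worth spending some time learning and practicing with it.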
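Because Prometheus Operator is built on Prometheus, we first need to understand Prometheus.
Prometheus is an excellent monitoring tool, or more precisely, a monitoring solution. It provides a complete toolchain for data collection, storage, processing, visualization and alerting. The Prometheus architecture is shown in the figure.
The original architecture diagram on the official website is more complex. To avoid distraction, only the most important components are kept here.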
Prometheus Server pulls monitoring data from Exporters and stores it. It also offers users a flexible query language, PromQL.
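PromQL is also available outside the web UI, through Prometheus Server's HTTP API at /api/v1/query. The following is a minimal sketch that assumes curl is installed. The function name prom_query and the PROM_URL variable are hypothetical; PROM_URL defaults to a local server on port 9090.

```shell
#!/usr/bin/env bash
# Run an instant PromQL query against the Prometheus HTTP API.
# PROM_URL is a hypothetical variable; it defaults to a local server on port 9090.
prom_query() {
  local query="$1"
  # -G turns each --data-urlencode value into a URL query parameter of a GET
  # request, so PromQL operators, spaces and braces are percent-encoded safely.
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=${query}"
}
```

For example, prom_query 'up' returns JSON with one sample per Target. A value of 1 means the last scrape succeeded, and 0 means it failed.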
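Exporters collect performance data from target objects such as hosts and containers. Each Exporter exposes this data over an HTTP endpoint, and Prometheus Server fetches it from there.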
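An Exporter serves plain text over HTTP in the Prometheus exposition format: a # HELP line, a # TYPE line, then one line per sample of the form name{labels} value. The sketch below only prints such text. The function emit_gauge and the metric name in the usage line are hypothetical, and actually serving the text over HTTP is left to a real Exporter. node-exporter's textfile collector can also pick up files written in this format.

```shell
#!/usr/bin/env bash
# Print one gauge metric in the Prometheus text exposition format.
# Metric name, help text, labels and value are supplied by the caller;
# the names in the usage example are hypothetical.
emit_gauge() {
  local name="$1" help="$2" labels="$3" value="$4"
  printf '# HELP %s %s\n' "$name" "$help"
  printf '# TYPE %s gauge\n' "$name"
  printf '%s{%s} %s\n' "$name" "$labels" "$value"
}
```

Running emit_gauge demo_queue_length 'Jobs waiting in the queue.' 'queue="default"' 42 prints three lines. The last one is demo_queue_length{queue="default"} 42.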
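Visualizing monitoring data is essential to any monitoring solution. Prometheus once developed its own visualization tool but later deprecated it, because a better open-source product, Grafana, had appeared. Grafana integrates seamlessly with Prometheus and provides excellent data visualization.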
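Users can define alerting rules based on monitoring data. Prometheus Server fires an alert when a rule's condition is met. Once Alertmanager receives the alert, it sends notifications through preconfigured channels such as Email, PagerDuty and Webhook.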
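An alerting rule is a PromQL expression plus a duration. Prometheus Server evaluates the rule periodically. Once the expression has been true for the whole "for" period, Prometheus sends the alert to Alertmanager. Below is a minimal sketch that prints such a rule in the Prometheus rules-file format. The group name example.rules, the alert name TargetDown, the function name and the default 5m duration are all hypothetical choices.

```shell
#!/usr/bin/env bash
# Print a minimal Prometheus alerting-rule file.
# Group name, alert name and the default duration are hypothetical examples.
target_down_rule() {
  local duration="${1:-5m}"
  cat <<EOF
groups:
- name: example.rules
  rules:
  - alert: TargetDown
    # up is 0 when Prometheus could not scrape a target
    expr: up == 0
    for: ${duration}
    labels:
      severity: critical
    annotations:
      summary: "A scrape target has been down for ${duration}"
EOF
}
```

target_down_rule 10m > target-down.rules.yml writes the file. You can validate it with promtool check rules target-down.rules.yml and load it through rule_files in prometheus.yml.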
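Prometheus Operator aims to make deploying and maintaining Prometheus on Kubernetes as simple as possible. Its architecture is shown in the figure.
Every object in the figure is a resource running in Kubernetes.
Operator refers to Prometheus Operator itself, which runs in Kubernetes as a Deployment. It deploys and manages Prometheus Server, and it updates Prometheus Server's monitoring targets dynamically based on ServiceMonitors.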
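Prometheus Server is deployed into the cluster as a Kubernetes application. To manage Prometheus better on Kubernetes, the CoreOS developers defined a Kubernetes custom resource type named Prometheus. Think of a Prometheus resource as a special kind of Deployment whose sole purpose is to deploy Prometheus Server.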
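To make the "special Deployment" idea concrete, here is a minimal sketch of a Prometheus resource. A shell function prints it so it can be piped into kubectl apply -f -. In current Operator versions, the workload the Operator creates for this resource is actually a StatefulSet. Several names are assumptions: the function prometheus_cr, the team: frontend label, and a ServiceAccount that shares the resource's name and already has RBAC permissions to read Services, Endpoints and Pods.

```shell
#!/usr/bin/env bash
# Print a minimal Prometheus custom resource (monitoring.coreos.com/v1).
# The Operator turns it into a StatefulSet running Prometheus Server.
# Hypothetical: the "team: frontend" label, and a ServiceAccount named like the
# resource that is assumed to exist with RBAC to read Services/Endpoints/Pods.
prometheus_cr() {
  local name="$1" namespace="$2"
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  replicas: 1
  serviceAccountName: ${name}
  # Only ServiceMonitors carrying this label become scrape targets.
  serviceMonitorSelector:
    matchLabels:
      team: frontend
EOF
}
```

prometheus_cr demo monitoring | kubectl apply -f - would ask the Operator to start a Prometheus Server that scrapes only ServiceMonitors labelled team: frontend.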
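The Service here is an ordinary Service resource in the cluster. It is also the object that Prometheus monitors, which Prometheus calls a Target. Every monitored object has a corresponding Service. For example, to monitor the Kubernetes Scheduler, there must be a Service that corresponds to the Scheduler. A Kubernetes cluster does not have this Service by default; Prometheus Operator takes care of creating it.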
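Here is a sketch of such a Service for the Scheduler. It assumes a kubeadm-style cluster where the Scheduler runs as a static Pod labelled component: kube-scheduler in kube-system. It also assumes the Scheduler serves plain-HTTP metrics on port 10251, the port the chart uses in the install output below. The Service name kube-scheduler-metrics and the label k8s-app: kube-scheduler are hypothetical. Newer Kubernetes versions may serve Scheduler metrics only over HTTPS on port 10259, so adjust the port if needed.

```shell
#!/usr/bin/env bash
# Print a headless Service that gives kube-scheduler a stable Prometheus target.
# Assumes kubeadm-style static Pods labelled component=kube-scheduler in kube-system
# and plain-HTTP metrics on port 10251. Name and k8s-app label are hypothetical.
scheduler_metrics_service() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler-metrics
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
spec:
  clusterIP: None          # headless: Endpoints point straight at the Pod IPs
  selector:
    component: kube-scheduler
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
EOF
}
```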
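Operator can update Prometheus's Target list dynamically, and a ServiceMonitor is the abstraction of a Target. For example, to monitor the Kubernetes Scheduler, a user can create a ServiceMonitor object that maps to the Scheduler Service. Operator then discovers the new ServiceMonitor and adds the Scheduler Target to Prometheus's monitoring list.
ServiceMonitor is another Kubernetes custom resource type developed specifically for Prometheus Operator.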
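Below is a minimal ServiceMonitor sketch for the Scheduler example described above. It assumes a Service in kube-system labelled k8s-app: kube-scheduler, with a port named http-metrics. Those names, the ServiceMonitor name and the function scheduler_servicemonitor are hypothetical. The label passed as the argument must match the serviceMonitorSelector of the Prometheus resource; otherwise the Operator will not pick up the ServiceMonitor.

```shell
#!/usr/bin/env bash
# Print a ServiceMonitor that maps a Service to a Prometheus Target.
# Hypothetical assumptions: the target Service lives in kube-system, carries the
# label k8s-app=kube-scheduler and names its metrics port http-metrics.
# $1 is a "key: value" label expected by the Prometheus serviceMonitorSelector.
scheduler_servicemonitor() {
  local selector_label="$1"
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-scheduler
  namespace: monitoring
  labels:
    ${selector_label}
spec:
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler
  endpoints:
  - port: http-metrics
    interval: 30s
EOF
}
```

For example, if the Prometheus resource selects ServiceMonitors by a release label, you could run scheduler_servicemonitor 'release: prometheus-operator' | kubectl apply -f -.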
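Besides Prometheus and ServiceMonitor, Alertmanager is the third Kubernetes custom resource developed for the Operator. Think of an Alertmanager resource as a special kind of Deployment whose sole purpose is to deploy the Alertmanager component.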
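Here is a minimal Alertmanager resource, again printed by a hypothetical shell function (alertmanager_cr) with a hypothetical resource name. As with Prometheus, the Operator runs this resource as a StatefulSet. By default, it reads the Alertmanager configuration from a Secret named alertmanager-<name>. Compare the Secret alertmanager-prometheus-operator-alertmanager in the install output below.

```shell
#!/usr/bin/env bash
# Print a minimal Alertmanager custom resource (monitoring.coreos.com/v1).
# The resource name is a hypothetical example; replicas defaults to 1.
alertmanager_cr() {
  local name="$1" replicas="${2:-1}"
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: ${name}
  namespace: monitoring
spec:
  replicas: ${replicas}
EOF
}
```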
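The author used v0.14.0, the latest Prometheus Operator release at the time. The project iterates quickly, so the deployment procedure may have changed since then; consult the official documentation when necessary.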
git clone https://github.com/coreos/prometheus-operator.git
cd prometheus-operator
For easier management, create a dedicated Namespace named monitoring. All Prometheus Operator components will be deployed into it.
kubectl create namespace monitoring
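kubectl create namespace fails if the Namespace already exists. If you may re-run these steps, a common idempotent variant renders the object with a client-side dry run and passes it to kubectl apply. This sketch assumes kubectl 1.18 or later, which is needed for --dry-run=client. The function name ensure_namespace is hypothetical.

```shell
#!/usr/bin/env bash
# Create a Namespace only if it is missing, so the step can be re-run safely.
# Renders the object with a client-side dry run (kubectl 1.18+) and applies it.
ensure_namespace() {
  local ns="$1"
  kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
}
```

ensure_namespace monitoring then succeeds whether or not the Namespace already exists.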
helm repo add aliyuncs https://apphub.aliyuncs.com
helm repo update
helm install --name prometheus-operator --set rbacEnable=true --namespace=monitoring helm/prometheus-operator
[root@k8s-master ~]# helm install --name prometheus-operator --set rbacEnable=true --namespace=monitoring aliyuncs/prometheus-operator
NAME: prometheus-operator
LAST DEPLOYED: Wed Jun 8 10:16:22 2022
NAMESPACE: monitoring
STATUS: DEPLOYED
RESOURCES:
==> v1/Alertmanager
NAME AGE
prometheus-operator-alertmanager 34s
==> v1/ClusterRole
NAME CREATED AT
prometheus-operator-grafana-clusterrole 2022-06-08T14:17:30Z
prometheus-operator-operator 2022-06-08T14:17:30Z
prometheus-operator-operator-psp 2022-06-08T14:17:30Z
prometheus-operator-prometheus 2022-06-08T14:17:30Z
prometheus-operator-prometheus-psp 2022-06-08T14:17:30Z
psp-prometheus-operator-kube-state-metrics 2022-06-08T14:17:30Z
psp-prometheus-operator-prometheus-node-exporter 2022-06-08T14:17:30Z
==> v1/ClusterRoleBinding
NAME ROLE AGE
prometheus-operator-grafana-clusterrolebinding ClusterRole/prometheus-operator-grafana-clusterrole 34s
prometheus-operator-operator ClusterRole/prometheus-operator-operator 34s
prometheus-operator-operator-psp ClusterRole/prometheus-operator-operator-psp 34s
prometheus-operator-prometheus ClusterRole/prometheus-operator-prometheus 34s
prometheus-operator-prometheus-psp ClusterRole/prometheus-operator-prometheus-psp 34s
psp-prometheus-operator-kube-state-metrics ClusterRole/psp-prometheus-operator-kube-state-metrics 34s
psp-prometheus-operator-prometheus-node-exporter ClusterRole/psp-prometheus-operator-prometheus-node-exporter 34s
==> v1/ConfigMap
NAME DATA AGE
prometheus-operator-apiserver 1 34s
prometheus-operator-cluster-total 1 34s
prometheus-operator-controller-manager 1 34s
prometheus-operator-etcd 1 34s
prometheus-operator-grafana 1 34s
prometheus-operator-grafana-config-dashboards 1 34s
prometheus-operator-grafana-datasource 1 34s
prometheus-operator-grafana-test 1 34s
prometheus-operator-k8s-coredns 1 34s
prometheus-operator-k8s-resources-cluster 1 34s
prometheus-operator-k8s-resources-namespace 1 34s
prometheus-operator-k8s-resources-node 1 34s
prometheus-operator-k8s-resources-pod 1 34s
prometheus-operator-k8s-resources-workload 1 34s
prometheus-operator-k8s-resources-workloads-namespace 1 34s
prometheus-operator-kubelet 1 34s
prometheus-operator-namespace-by-pod 1 34s
prometheus-operator-namespace-by-workload 1 34s
prometheus-operator-node-cluster-rsrc-use 1 34s
prometheus-operator-node-rsrc-use 1 34s
prometheus-operator-nodes 1 34s
prometheus-operator-persistentvolumesusage 1 34s
prometheus-operator-pod-total 1 34s
prometheus-operator-pods 1 34s
prometheus-operator-prometheus 1 34s
prometheus-operator-proxy 1 34s
prometheus-operator-scheduler 1 34s
prometheus-operator-statefulset 1 34s
prometheus-operator-workload-total 1 34s
==> v1/DaemonSet
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
prometheus-operator-prometheus-node-exporter 3 3 3 3 3 34s
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
prometheus-operator-grafana 0/1 1 0 34s
prometheus-operator-kube-state-metrics 0/1 1 0 34s
prometheus-operator-operator 0/1 1 0 34s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
prometheus-operator-grafana-8cbcdf6cc-s9hf5 0/2 Init:0/1 0 35s
prometheus-operator-kube-state-metrics-5855d94d57-2twd9 0/1 ContainerCreating 0 35s
prometheus-operator-operator-f85ccdb89-rsvt4 0/2 ContainerCreating 0 35s
prometheus-operator-prometheus-node-exporter-4sd9z 1/1 Running 0 34s
prometheus-operator-prometheus-node-exporter-vtlzn 1/1 Running 0 34s
prometheus-operator-prometheus-node-exporter-xz5v8 1/1 Running 0 35s
==> v1/Prometheus
NAME AGE
prometheus-operator-prometheus 33s
==> v1/PrometheusRule
NAME AGE
prometheus-operator-alertmanager.rules 32s
prometheus-operator-etcd 32s
prometheus-operator-general.rules 32s
prometheus-operator-k8s.rules 32s
prometheus-operator-kube-apiserver-error 32s
prometheus-operator-kube-apiserver.rules 32s
prometheus-operator-kube-prometheus-node-recording.rules 32s
prometheus-operator-kube-scheduler.rules 32s
prometheus-operator-kubernetes-absent 32s
prometheus-operator-kubernetes-apps 32s
prometheus-operator-kubernetes-resources 32s
prometheus-operator-kubernetes-storage 32s
prometheus-operator-kubernetes-system 32s
prometheus-operator-kubernetes-system-apiserver 32s
prometheus-operator-kubernetes-system-controller-manager 32s
prometheus-operator-kubernetes-system-kubelet 32s
prometheus-operator-kubernetes-system-scheduler 32s
prometheus-operator-node-exporter 32s
prometheus-operator-node-exporter.rules 32s
prometheus-operator-node-network 32s
prometheus-operator-node-time 32s
prometheus-operator-node.rules 32s
prometheus-operator-prometheus 32s
prometheus-operator-prometheus-operator 32s
==> v1/Role
NAME CREATED AT
prometheus-operator-alertmanager 2022-06-08T14:17:30Z
prometheus-operator-grafana-test 2022-06-08T14:17:30Z
==> v1/RoleBinding
NAME ROLE AGE
prometheus-operator-alertmanager Role/prometheus-operator-alertmanager 34s
prometheus-operator-grafana-test Role/prometheus-operator-grafana-test 34s
==> v1/Secret
NAME TYPE DATA AGE
alertmanager-prometheus-operator-alertmanager Opaque 1 34s
prometheus-operator-grafana Opaque 3 34s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus-operator-alertmanager ClusterIP 10.100.203.250 9093/TCP 34s
prometheus-operator-coredns ClusterIP None 9153/TCP 34s
prometheus-operator-grafana ClusterIP 10.108.97.140 80/TCP 34s
prometheus-operator-kube-controller-manager ClusterIP None 10252/TCP 34s
prometheus-operator-kube-etcd ClusterIP None 2379/TCP 34s
prometheus-operator-kube-proxy ClusterIP None 10249/TCP 34s
prometheus-operator-kube-scheduler ClusterIP None 10251/TCP 34s
prometheus-operator-kube-state-metrics ClusterIP 10.110.236.133 8080/TCP 34s
prometheus-operator-operator ClusterIP 10.103.154.69 8080/TCP,443/TCP 34s
prometheus-operator-prometheus ClusterIP 10.108.215.233 9090/TCP 34s
prometheus-operator-prometheus-node-exporter ClusterIP 10.100.30.192 9100/TCP 34s
==> v1/ServiceAccount
NAME SECRETS AGE
prometheus-operator-alertmanager 1 34s
prometheus-operator-grafana 1 34s
prometheus-operator-grafana-test 1 34s
prometheus-operator-kube-state-metrics 1 34s
prometheus-operator-operator 1 34s
prometheus-operator-prometheus 1 34s
prometheus-operator-prometheus-node-exporter 1 34s
==> v1/ServiceMonitor
NAME AGE
prometheus-operator-alertmanager 32s
prometheus-operator-apiserver 32s
prometheus-operator-coredns 32s
prometheus-operator-grafana 32s
prometheus-operator-kube-controller-manager 32s
prometheus-operator-kube-etcd 32s
prometheus-operator-kube-proxy 32s
prometheus-operator-kube-scheduler 32s
prometheus-operator-kube-state-metrics 32s
prometheus-operator-kubelet 32s
prometheus-operator-node-exporter 32s
prometheus-operator-operator 32s
prometheus-operator-prometheus 32s
==> v1beta1/ClusterRole
NAME CREATED AT
prometheus-operator-kube-state-metrics 2022-06-08T14:17:30Z
==> v1beta1/ClusterRoleBinding
NAME ROLE AGE
prometheus-operator-kube-state-metrics ClusterRole/prometheus-operator-kube-state-metrics 34s
==> v1beta1/MutatingWebhookConfiguration
NAME WEBHOOKS AGE
prometheus-operator-admission 1 33s
==> v1beta1/PodSecurityPolicy
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
prometheus-operator-alertmanager false RunAsAny RunAsAny MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
prometheus-operator-grafana false RunAsAny RunAsAny RunAsAny RunAsAny false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
prometheus-operator-grafana-test false RunAsAny RunAsAny RunAsAny RunAsAny false configMap,downwardAPI,emptyDir,projected,secret
prometheus-operator-kube-state-metrics false RunAsAny MustRunAsNonRoot MustRunAs MustRunAs false secret
prometheus-operator-operator false RunAsAny RunAsAny MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
prometheus-operator-prometheus false RunAsAny RunAsAny MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
prometheus-operator-prometheus-node-exporter false RunAsAny RunAsAny MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim,hostPath
==> v1beta1/Role
NAME CREATED AT
prometheus-operator-grafana 2022-06-08T14:17:30Z
==> v1beta1/RoleBinding
NAME ROLE AGE
prometheus-operator-grafana Role/prometheus-operator-grafana 34s
==> v1beta1/ValidatingWebhookConfiguration
NAME WEBHOOKS AGE
prometheus-operator-admission 1 32s
NOTES:
The Prometheus Operator has been installed. Check its status by running:
kubectl --namespace monitoring get pods -l "release=prometheus-operator"
Visit https://github.com/coreos/prometheus-operator for instructions on how
to create & configure Alertmanager and Prometheus instances using the Operator.
[root@k8s-master ~]#
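Several Pods in the output above are still being created. Instead of re-running the command from the NOTES by hand, you can let kubectl wait block until they are Ready. This sketch uses the same release=prometheus-operator selector. The function name and the 300s default timeout are assumptions. Note that kubectl wait returns an error if no Pod matches the selector.

```shell
#!/usr/bin/env bash
# Block until every Pod of the Helm release reports Ready, or fail after a timeout.
# Uses the same label selector as the chart NOTES; defaults are assumptions.
wait_for_release() {
  local release="${1:-prometheus-operator}" timeout="${2:-300s}"
  kubectl --namespace monitoring wait pod \
    -l "release=${release}" --for=condition=Ready --timeout="${timeout}"
}
```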
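All Prometheus Operator components are packaged as Helm Charts, which makes installation and deployment very convenient, as shown in the figure. If you are not familiar with Helm, refer to the earlier chapters on Helm.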
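The helm install commands in this section use Helm 2 syntax. Helm 3 removed the --name flag and takes the release name as the first positional argument. The hypothetical helper below picks the right form based on helm version --short, which prints a version string starting with v3 on Helm 3. With Helm 3, the target Namespace must also exist beforehand, or be created with --create-namespace.

```shell
#!/usr/bin/env bash
# Install a chart with either Helm 2 or Helm 3 syntax.
# Helm 3 dropped --name and takes the release name as a positional argument.
helm_install() {
  local release="$1" chart="$2"
  shift 2
  if helm version --short 2>/dev/null | grep -q '^v3'; then
    helm install "$release" "$chart" "$@"
  else
    helm install --name "$release" "$chart" "$@"
  fi
}
```

For example, helm_install grafana aliyuncs/grafana --namespace=monitoring expands to whichever form the local Helm client understands.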
helm install --name prometheus --set serviceMonitorsSelector.app=prometheus --set ruleSelector.app=prometheus --namespace=monitoring aliyuncs/prometheus
helm install --name alertmanager --namespace=monitoring aliyuncs/alertmanager # the chart failed to download
helm install --name grafana --namespace=monitoring aliyuncs/grafana
Use kubectl get prometheus -n monitoring to view resources of type Prometheus, as shown in the figure.
To make Prometheus Server easier to reach, its Service type has already been changed to NodePort with kubectl edit.
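kubectl edit opens an interactive editor. You can script the same change with kubectl patch and then read the assigned node port with a jsonpath query. In this sketch, the function name switch_to_nodeport is hypothetical, and it assumes the Service's first port is the one you want. NodePort exposes the UI on every node without authentication, which is acceptable in a lab like this one.

```shell
#!/usr/bin/env bash
# Switch a Service to NodePort without an interactive editor, then print the
# node port that Kubernetes assigned to the Service's first port.
switch_to_nodeport() {
  local svc="$1" ns="${2:-monitoring}"
  kubectl -n "$ns" patch service "$svc" -p '{"spec":{"type":"NodePort"}}' >/dev/null &&
    kubectl -n "$ns" get service "$svc" -o jsonpath='{.spec.ports[0].nodePort}'
}
```

switch_to_nodeport prometheus-operator-prometheus prints the node port. The UI is then reachable at http://<node-ip>:<port>.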
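You can view the related Alertmanager and Grafana resources the same way, as shown in the figure. Alertmanager failed to install earlier, so only Grafana is shown here.
Their Service types have also been changed to NodePort.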
kube-prometheus is a Helm Chart that packages all the Exporters and ServiceMonitors needed to monitor Kubernetes.
helm install --name kube-prometheus --namespace=monitoring helm/kube-prometheus
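Each Exporter has a corresponding Service that supplies Prometheus with one kind of monitoring data about the Kubernetes cluster, as shown in the figure.
Each Service has a corresponding ServiceMonitor, and together the ServiceMonitors make up Prometheus's Target list, as shown in the figure.
All Pods related to Prometheus Operator are shown in the figure.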
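Notice that some Exporters have no running Pod. Internal Kubernetes components such as the API Server, Scheduler and Kubelet expose Prometheus metrics natively. Defining a Service for them is enough for Prometheus to scrape monitoring data directly from their predefined ports.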
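You can confirm this by requesting a component's /metrics endpoint directly on a control-plane node. Below is a sketch with curl; the function name and its defaults are hypothetical. Port 10251 matches the Scheduler Service in the install output. Newer Kubernetes releases may serve these metrics only over HTTPS (port 10259 for the Scheduler), which also requires a bearer token.

```shell
#!/usr/bin/env bash
# Show the first few lines of a component's built-in /metrics endpoint.
# Defaults (localhost, the scheduler's plain-HTTP port 10251, 5 lines) are assumptions.
peek_metrics() {
  local host="${1:-127.0.0.1}" port="${2:-10251}" lines="${3:-5}"
  curl -s "http://${host}:${port}/metrics" | head -n "$lines"
}
```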
Open the Prometheus web UI in a browser (http://192.168.56.105:30413/targets), as shown in the figure.
All Targets are in the UP state.
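You can also script this check against the HTTP API: /api/v1/targets reports a health field (up, down or unknown) for every active Target. The sketch below is dependency-free and assumes curl and compact JSON output from Prometheus. PROM_URL and count_down_targets are hypothetical names. A real script would be better off using a JSON parser such as jq.

```shell
#!/usr/bin/env bash
# Count active Targets whose last scrape failed, via the Prometheus HTTP API.
# PROM_URL is hypothetical. grep stands in for a JSON parser, so this assumes
# compact JSON ("health":"down" with no spaces) and is only approximate.
count_down_targets() {
  curl -s "${PROM_URL:-http://localhost:9090}/api/v1/targets" |
    grep -o '"health":"down"' | wc -l | tr -d ' '
}
```

A result of 0 corresponds to the all-UP state in the screenshot.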
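Prometheus Operator provides default alerting rules for Alertmanager. Install them with the following commands. The sed edits first change the rule file's labels and job names to match the releases installed above.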
sed -i \
  -e 's/role: prometheus-rulefiles/app: prometheus/g' \
  -e 's/prometheus: k8s/prometheus: prometheus/g' \
  -e 's/job="kube-controller-manager/job="kube-prometheus-exporter-kube-controller-manager/g' \
  -e 's/job="apiserver/job="kube-prometheus-exporter-kube-api/g' \
  -e 's/job="kube-scheduler/job="kube-prometheus-exporter-kube-scheduler/g' \
  -e 's/job="node-exporter/job="kube-prometheus-exporter-node/g' \
  contrib/kube-prometheus/manifests/prometheus/prometheus-k8s-rules.yaml
kubectl apply -n monitoring -f contrib/kube-prometheus/manifests/prometheus/prometheus-k8s-rules.yaml
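Because sed -i edits the file in place, it can help to preview the job-name rewrites on a fresh copy of the file first. The hypothetical rename_jobs function applies the same four job substitutions as the commands above, but reads stdin and writes stdout.

```shell
#!/usr/bin/env bash
# Apply the job-name rewrites used above to stdin and write the result to stdout,
# so the change can be previewed (for example with diff) before editing in place.
rename_jobs() {
  sed \
    -e 's/job="kube-controller-manager/job="kube-prometheus-exporter-kube-controller-manager/g' \
    -e 's/job="apiserver/job="kube-prometheus-exporter-kube-api/g' \
    -e 's/job="kube-scheduler/job="kube-prometheus-exporter-kube-scheduler/g' \
    -e 's/job="node-exporter/job="kube-prometheus-exporter-node/g'
}
```

rename_jobs < contrib/kube-prometheus/manifests/prometheus/prometheus-k8s-rules.yaml | diff contrib/kube-prometheus/manifests/prometheus/prometheus-k8s-rules.yaml - lists the lines that would change.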
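Prometheus Operator defines default Dashboards for displaying monitoring data. Install them with the following commands. The sed edits rename references in the file to match the Grafana and Prometheus releases installed above.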
sed -i \
  -e 's/grafana-dashboards-0/grafana-grafana/g' \
  -e 's/prometheus-k8s.monitoring/prometheus-prometheus.monitoring/g' \
  contrib/kube-prometheus/manifests/grafana/grafana-dashboards.yaml
kubectl apply -n monitoring -f contrib/kube-prometheus/manifests/grafana/grafana-dashboards.yaml
Open the Grafana web UI (http://192.168.56.105:32342/), as shown in the figure.
Grafana's DataSource and Dashboards are already configured automatically. Click Home to use the Dashboards we discussed at the beginning, as shown in the figure.
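Grafana asks you to log in first. The chart stores the admin credentials in the Secret prometheus-operator-grafana, which appears in the install output. The sketch below reads the password. It assumes the key is named admin-password, as in the Grafana Helm chart, and that a GNU-style base64 is available. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Read the Grafana admin password that the chart stored in a Secret.
# Secret name from the install output; the key admin-password is an assumption.
grafana_admin_password() {
  kubectl -n monitoring get secret prometheus-operator-grafana \
    -o jsonpath='{.data.admin-password}' | base64 --decode
}
```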