Prometheus is an open-source systems monitoring and alerting toolkit that former Google engineers began developing at SoundCloud in 2012. Since then, many companies and organizations have adopted Prometheus as their monitoring and alerting tool. Prometheus has a very active developer and user community, and it is now a standalone open-source project maintained independently of any single company. To underline this, Prometheus joined the CNCF in May 2016 as the second hosted project after Kubernetes.
Component overview:
- Prometheus Server: collects and stores time series data.
- Client Library: client libraries that generate the metrics for the services to be monitored and expose them to the Prometheus server.
- PushGateway: a Prometheus component that receives monitoring data pushed to it. It can run standalone on any node and does not have to run on the monitored client; user-written scripts push the metrics to the pushgateway, and the Prometheus server then scrapes them from it.
- Exporters: an HTTP endpoint that exposes the state of a machine or the metrics of a monitored component is called an exporter.
- Alertmanager: after receiving alerts from the Prometheus server, it deduplicates and groups them, routes them to the corresponding receivers, and sends out notifications.
- Grafana: a dashboard system open-sourced by Grafana Labs that greatly reduces the complexity of monitoring. We only need to provide the data to be monitored and it generates all kinds of visualization dashboards; it also has alerting features and can send notifications when something goes wrong.
A Kubernetes Operator, a concept introduced by CoreOS, is an application-specific controller that extends the Kubernetes API to create, configure, and manage complex stateful applications such as databases, caches, and monitoring systems. An Operator builds on the Kubernetes resource and controller concepts, but also encodes application-specific operational knowledge: an Operator that creates databases, for example, must know how to operate the database it manages. The key to building an Operator is the design of its CRDs (custom resources).
A Kubernetes CRD (Custom Resource Definition) is an extension of the Kubernetes API. Every resource in Kubernetes is a collection of API objects; for example, the specs we define in YAML files are definitions of Kubernetes resource objects. All custom resources can be operated on with kubectl in exactly the same way as the built-in Kubernetes resources.
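For example, once the Prometheus Operator CRDs used later in this article are installed, its custom resources can be listed and managed with the usual kubectl verbs, just like built-in objects (these commands are only an illustration of that point):
$ kubectl get crd | grep monitoring.coreos.com
$ kubectl get prometheuses,servicemonitors,alertmanagers -A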
The Prometheus Operator provides Kubernetes-native deployment and management of Prometheus and related monitoring components. The goal of the project is to simplify and automate the configuration of a Prometheus-based monitoring stack for Kubernetes clusters.
Prometheus Operator features: Kubernetes custom resources for deploying and managing Prometheus, Alertmanager, and related components; simplified deployment configuration (versions, persistence, retention policies, replicas) through native Kubernetes resources; and automatic generation of monitoring target configuration based on familiar Kubernetes label queries.
In this article I will install a customized Prometheus Operator with the helm tool in a Kubernetes cluster deployed by kind.
curl -Lo ./kind https://github.com/kubernetes-sigs/kind/releases/download/v0.17.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
Or:
$ wget https://get.helm.sh/helm-v3.11.0-linux-amd64.tar.gz
$ tar -xzvf helm-v3.11.0-linux-amd64.tar.gz
$ cp linux-amd64/helm /usr/local/bin/
$ helm version
version.BuildInfo{Version:"v3.11.0", GitCommit:"d14138609b01886f544b2025f5000351c9eb092e", GitTreeState:"clean", GoVersion:"go1.17.5"}
Create the kind configuration file config.yaml:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
Create the K8s cluster:
$ kind create cluster --config config.yaml
enabling experimental podman provider
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.25.3)
✓ Preparing nodes
✓ Writing configuration
✓ Starting control-plane
✓ Installing CNI
✓ Installing StorageClass
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Have a nice day!
kubectl create -f https://ghproxy.com/https://raw.githubusercontent.com/Ghostwritten/resource/main/ingress-nginx/ingress-nginx.yaml
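Before continuing, it may help to wait until the ingress-nginx controller is actually ready; the selector below follows the upstream kind ingress guide:
kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=90s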
System resource metrics are collected by the Metrics Server, which gathers information such as memory, disk, CPU, and network usage for nodes and Pods. The Metrics Server is an indispensable component for automatic scaling of workloads.
kubectl apply -f https://ghproxy.com/https://raw.githubusercontent.com/Ghostwritten/resource/main/metrics/metrics.yaml
Wait for the Metrics Server workload to become ready:
kubectl wait deployment -n kube-system metrics-server --for condition=Available=True --timeout=90s
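Once the deployment reports Available, a quick sanity check that node metrics are actually being served:
kubectl top nodes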
With the Metrics Server in place, we could use the kubectl autoscale command to create an autoscaling policy for a Deployment, but that is not the focus of this article.
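Purely as an illustration, a minimal sketch of such a policy; my-app is a hypothetical Deployment and the thresholds are arbitrary:
kubectl autoscale deployment my-app --cpu-percent=80 --min=1 --max=5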
Default installation:
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm upgrade prometheus prometheus-community/kube-prometheus-stack \
--namespace prometheus --create-namespace --install \
--set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
Release "prometheus" does not exist. Installing it now.
......
STATUS: deployed
However, my local Linux environment has no unrestricted internet access, so the deployment hangs for a long time and finally fails with: Error: failed pre-install: timed out waiting for the condition. Inspecting the pods shows the following error:
$ k get pods -n prometheus
NAME READY STATUS RESTARTS AGE
prometheus-kube-prometheus-admission-create-7494p 0/1 ImagePullBackOff 0 6m20s
$ k get pods -n prometheus prometheus-kube-prometheus-admission-create-7494p -oyaml
.......
state:
waiting:
message: Back-off pulling image "registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.3.0"
reason: ImagePullBackOff
........
Without unrestricted access to the global internet we cannot pull from registry.k8s.io, so we need to build our own customized prometheus helm chart.
Because of the invisible wall, the following images cannot be downloaded:
registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.3.0
registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.0
Ways to obtain the registry.k8s.io images:
- Use a VPS to pull the images and push them to DockerHub.
Since this is only for testing, I pull images that someone else has already pushed directly from DockerHub. Of course, there is no guarantee that those images have not been tampered with, so this carries risk; use it with caution in production.
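For reference, the VPS approach itself is only a pull, re-tag, and push; <your-dockerhub-account> is a placeholder for your own DockerHub account:
docker pull registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.3.0
docker tag registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.3.0 docker.io/<your-dockerhub-account>/kube-webhook-certgen:v1.3.0
docker push docker.io/<your-dockerhub-account>/kube-webhook-certgen:v1.3.0
docker pull registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.0
docker tag registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.0 docker.io/<your-dockerhub-account>/kube-state-metrics:v2.8.0
docker push docker.io/<your-dockerhub-account>/kube-state-metrics:v2.8.0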
wget https://github.com/prometheus-community/helm-charts/releases/download/kube-prometheus-stack-45.2.0/kube-prometheus-stack-45.2.0.tgz
tar -zxvf kube-prometheus-stack-45.2.0.tgz
Change the image references:
registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.3.0 → docker.io/ghostwritten/kube-webhook-certgen:v1.3.0
registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.0 → docker.io/ghostwritten/kube-state-metrics:v2.8.0
Step 1: cd into the kube-prometheus-stack directory and edit values.yaml. Around line 1872, change:
registry: registry.k8s.io
repository: ingress-nginx/kube-webhook-certgen
tag: v1.3.0
to:
registry: docker.io
repository: ghostwritten/kube-webhook-certgen
tag: v1.3.0
Step 2: edit charts/kube-state-metrics/values.yaml. Near the top, change:
image:
  repository: registry.k8s.io/kube-state-metrics/kube-state-metrics
to:
image:
  repository: docker.io/ghostwritten/kube-state-metrics
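After these two edits, a quick grep helps confirm nothing in the chart still points at registry.k8s.io (run from the kube-prometheus-stack directory):
$ grep -rn "registry.k8s.io" values.yaml charts/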
Step 3: to distinguish this chart from the official prometheus helm chart, rename it. Edit Chart.yaml and change the name field:
name: kube-prometheus-stack
to:
name: ghostwritten-kube-prometheus-stack
$ cd ../
$ helm package ./kube-prometheus-stack
Successfully packaged chart and saved it to: /root/ghostwritten-kube-prometheus-stack-45.2.0.tgz
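A quick way to confirm that the rename made it into the packaged artifact (it should print the new chart name):
$ helm show chart ./ghostwritten-kube-prometheus-stack-45.2.0.tgz | grep '^name'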
Log in to GitHub and create a Token, checking the write:packages scope. Click the "Generate token" button to generate the Token and copy it.
$ helm registry login -u ghostwritten https://ghcr.io
Password: <paste the github_token here>
Login Succeeded
Push the chart:
$ helm push ghostwritten-kube-prometheus-stack-45.2.0.tgz oci://ghcr.io/ghostwritten/helm
Pushed: ghcr.io/ghostwritten/helm/ghostwritten-kube-prometheus-stack:45.2.0
Digest: sha256:eecc4fcfc2dd3a65dedb692bfbfcd49d1e420e99e2b13aded22b136dae25e146
You can check where the ghostwritten-kube-prometheus-stack-45.2.0.tgz package is stored on your GitHub Packages page.
Alternatively, the chart can be pushed to a private Harbor registry. A default Harbor installation has no Helm Charts tab; enable it by running:
docker-compose stop
./install.sh --with-chartmuseum
Copy the certificate:
scp -r /etc/docker/certs.d/harbor.ghostwritten.com [email protected]:/etc/containers/certs.d/
Configure the /etc/hosts file:
192.168.10.81 harbor.ghostwritten.com
Log in:
$ podman login -u admin -p Harbor12345 harbor.ghostwritten.com
Login Succeeded!
$ helm registry login --insecure harbor.ghostwritten.com
Username: admin
Password:
Login Succeeded
Or:
$ helm registry login --insecure harbor.ghostwritten.com -u admin -p Harbor12345
WARNING: Using --password via the CLI is insecure. Use --password-stdin.
Login Succeeded
As with configuring a Docker registry, configuring the Helm registry also requires the certificate to be trusted in advance. First, put the CA certificate into the system trust store:
yum install ca-certificates
cp /etc/containers/certs.d/harbor.ghostwritten.com/ca.crt /etc/pki/ca-trust/source/anchors/
Run the update command so the certificate takes effect:
update-ca-trust extract
In the Harbor UI, create a new project named helm, then add this project as a Helm repository:
$ helm repo add harbor --username admin --password Harbor12345 https://harbor.ghostwritten.com/chartrepo/helm
"harbor" has been added to your repositories
List the local helm repositories:
helm repo list
NAME URL
prometheus-community https://prometheus-community.github.io/helm-charts
harbor https://harbor.ghostwritten.com/chartrepo/helm
helm does not support pushing to chart repositories by default, so install the helm-push plugin:
$ helm plugin install https://github.com/chartmuseum/helm-push
Downloading and installing helm-push v0.10.3 ...
https://github.com/chartmuseum/helm-push/releases/download/v0.10.3/helm-push_0.10.3_linux_amd64.tar.gz
Installed plugin: cm-push
Push ghostwritten-kube-prometheus-stack-45.2.0.tgz to harbor:
$ helm cm-push ghostwritten-kube-prometheus-stack-45.2.0.tgz harbor
Pushing ghostwritten-kube-prometheus-stack-45.2.0.tgz to harbor...
Done.
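If you would rather install from the Harbor repo added above instead of ghcr.io, a sketch of the equivalent command (untested in this walkthrough; the flags mirror the ghcr.io install below):
helm repo update harbor
helm upgrade prometheus harbor/ghostwritten-kube-prometheus-stack \
  --version 45.2.0 \
  --namespace prometheus --create-namespace --install \
  --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false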
Here I choose to install from the ghostwritten-kube-prometheus-stack package that was just uploaded to the public GitHub Packages registry.
helm upgrade prometheus oci://ghcr.io/ghostwritten/helm/ghostwritten-kube-prometheus-stack \
--version 45.2.0 \
--namespace prometheus --create-namespace --install \
--set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
Output:
Release "prometheus" does not exist. Installing it now.
Pulled: ghcr.io/ghostwritten/helm/ghostwritten-kube-prometheus-stack:45.2.0
Digest: sha256:2d17d54b97cbd2ce45f34bf2d25b7d424cff43508054ea0232b7d2f485a5b7a2
NAME: prometheus
LAST DEPLOYED: Tue Feb 28 14:32:22 2023
NAMESPACE: prometheus
STATUS: deployed
REVISION: 1
NOTES:
ghostwritten-kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace prometheus get pods -l "release=prometheus"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
List the helm releases:
$ helm list -n prometheus
NAME        NAMESPACE    REVISION   UPDATED                                    STATUS     CHART                                       APP VERSION
prometheus  prometheus   1          2023-02-28 14:32:22.526037959 +0800 CST    deployed   ghostwritten-kube-prometheus-stack-45.2.0   v0.63.0
Check the pods and other resource objects:
$ kubectl --namespace prometheus get pods -l "release=prometheus"
NAME READY STATUS RESTARTS AGE
prometheus-ghostwritten-ku-operator-7c89d9649c-hl5sx 1/1 Running 0 49s
prometheus-kube-state-metrics-bf46f46cc-944cr 1/1 Running 0 49s
prometheus-prometheus-node-exporter-jbv7n 1/1 Running 0 49s
$ k get all -n prometheus
NAME READY STATUS RESTARTS AGE
pod/alertmanager-prometheus-ghostwritten-ku-alertmanager-0 2/2 Running 1 (88m ago) 88m
pod/prometheus-ghostwritten-ku-operator-7c89d9649c-fvd4x 1/1 Running 0 88m
pod/prometheus-grafana-6f77bc5bc9-mnsn5 3/3 Running 0 88m
pod/prometheus-kube-state-metrics-bf46f46cc-dqbrm 1/1 Running 0 88m
pod/prometheus-prometheus-ghostwritten-ku-prometheus-0 2/2 Running 0 88m
pod/prometheus-prometheus-node-exporter-gkd29 1/1 Running 0 88m
NAME                                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                     ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   88m
service/prometheus-ghostwritten-ku-alertmanager   ClusterIP   10.96.193.144   <none>        9093/TCP                     88m
service/prometheus-ghostwritten-ku-operator       ClusterIP   10.96.112.10    <none>        443/TCP                      88m
service/prometheus-ghostwritten-ku-prometheus     ClusterIP   10.96.110.97    <none>        9090/TCP                     88m
service/prometheus-grafana                        ClusterIP   10.96.86.245    <none>        80/TCP                       88m
service/prometheus-kube-state-metrics             ClusterIP   10.96.205.51    <none>        8080/TCP                     88m
service/prometheus-operated                       ClusterIP   None            <none>        9090/TCP                     88m
service/prometheus-prometheus-node-exporter       ClusterIP   10.96.154.79    <none>        9100/TCP                     88m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/prometheus-prometheus-node-exporter 1 1 1 1 1 <none> 88m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/prometheus-ghostwritten-ku-operator 1/1 1 1 88m
deployment.apps/prometheus-grafana 1/1 1 1 88m
deployment.apps/prometheus-kube-state-metrics 1/1 1 1 88m
NAME DESIRED CURRENT READY AGE
replicaset.apps/prometheus-ghostwritten-ku-operator-7c89d9649c 1 1 1 88m
replicaset.apps/prometheus-grafana-6f77bc5bc9 1 1 1 88m
replicaset.apps/prometheus-kube-state-metrics-bf46f46cc 1 1 1 88m
NAME READY AGE
statefulset.apps/alertmanager-prometheus-ghostwritten-ku-alertmanager 1/1 88m
statefulset.apps/prometheus-prometheus-ghostwritten-ku-prometheus 1/1 88m
Of course, if we want to access these services from outside the cluster, we can either create corresponding Ingress objects or use NodePort-type Services: edit the prometheus-grafana, prometheus-ghostwritten-ku-prometheus, and prometheus-ghostwritten-ku-alertmanager Services and change their type to NodePort. I prefer to access them via domain names, though.
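For reference, the NodePort route is a one-line patch per Service; shown here only for Grafana, as a sketch rather than what I actually do:
kubectl -n prometheus patch svc prometheus-grafana -p '{"spec":{"type":"NodePort"}}'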
Create prometheus-grafana-alertmanager-ingress.yaml:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: prometheus
  labels:
    app: ghostwritten-kube-prometheus-stack-prometheus
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: grafana.demo.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-grafana
            port:
              name: http-web
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: prometheus
  labels:
    app: ghostwritten-kube-prometheus-stack-prometheus
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: prometheus.demo.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-ghostwritten-ku-prometheus
            port:
              name: http-web
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: alertmanager-ingress
  namespace: prometheus
  labels:
    app: ghostwritten-kube-prometheus-stack-prometheus
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: alertmanager.demo.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-ghostwritten-ku-alertmanager
            port:
              name: http-web
Apply it:
k apply -f prometheus-grafana-alertmanager-ingress.yaml
On Windows, configure C:\Windows\System32\drivers\etc\hosts:
192.168.10.29 grafana.demo.com prometheus.demo.com alertmanager.demo.com
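If your browser runs on Linux or macOS instead, the equivalent entry goes into /etc/hosts:
192.168.10.29 grafana.demo.com prometheus.demo.com alertmanager.demo.com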
Visit http://prometheus.demo.com
Visit http://alertmanager.demo.com
Visit http://grafana.demo.com
My first login with admin/admin did not succeed, so I reset the admin password to admin with the grafana-cli command.
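grafana-cli has to run inside the Grafana container, so I first exec into the pod (the container name grafana matches the chart default; adjust if yours differs):
$ kubectl -n prometheus exec -it deploy/prometheus-grafana -c grafana -- /bin/sh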
$ grafana-cli admin reset-admin-password admin
INFO [02-28|10:24:48] Starting Grafana logger=settings version= commit= branch= compiled=1970-01-01T00:00:00Z
INFO [02-28|10:24:48] Config loaded from logger=settings file=/usr/share/grafana/conf/defaults.ini
INFO [02-28|10:24:48] Config overridden from Environment variable logger=settings var="GF_PATHS_DATA=/var/lib/grafana/"
INFO [02-28|10:24:48] Config overridden from Environment variable logger=settings var="GF_PATHS_LOGS=/var/log/grafana"
INFO [02-28|10:24:48] Config overridden from Environment variable logger=settings var="GF_PATHS_PLUGINS=/var/lib/grafana/plugins"
INFO [02-28|10:24:48] Config overridden from Environment variable logger=settings var="GF_PATHS_PROVISIONING=/etc/grafana/provisioning"
INFO [02-28|10:24:48] Config overridden from Environment variable logger=settings var="GF_SECURITY_ADMIN_USER=admin"
INFO [02-28|10:24:48] Config overridden from Environment variable logger=settings var="GF_SECURITY_ADMIN_PASSWORD=*********"
INFO [02-28|10:24:48] Path Home logger=settings path=/usr/share/grafana
INFO [02-28|10:24:48] Path Data logger=settings path=/var/lib/grafana/
INFO [02-28|10:24:48] Path Logs logger=settings path=/var/log/grafana
INFO [02-28|10:24:48] Path Plugins logger=settings path=/var/lib/grafana/plugins
INFO [02-28|10:24:48] Path Provisioning logger=settings path=/etc/grafana/provisioning
INFO [02-28|10:24:48] App mode production logger=settings
INFO [02-28|10:24:48] Connecting to DB logger=sqlstore dbtype=sqlite3
INFO [02-28|10:24:48] Starting DB migrations logger=migrator
INFO [02-28|10:24:48] migrations completed logger=migrator performed=0 skipped=464 duration=2.142374ms
INFO [02-28|10:24:48] Envelope encryption state logger=secrets enabled=true current provider=secretKey.v1
Admin password changed successfully ✔
Logging in again with admin/admin succeeded, and I was immediately redirected to change the password; I set the new password to 12345678. Finally everything was accessible.
Delete the prometheus application:
k delete -f prometheus-grafana-alertmanager-ingress.yaml
helm delete prometheus -n prometheus
k delete ns prometheus
To force-delete a namespace stuck in Terminating, create delete_ns.sh:
#!/bin/bash
# Force-delete a namespace stuck in Terminating by clearing its finalizers.
NAMESPACE=$1
kubectl proxy &
sleep 2   # give kubectl proxy a moment to start listening on 127.0.0.1:8001
kubectl get namespace $NAMESPACE -o json | jq '.spec = {"finalizers":[]}' > temp.json
curl -k -H "Content-Type: application/json" -X PUT --data-binary @temp.json 127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize
kill %1   # stop the background kubectl proxy
bash delete_ns.sh prometheus
References: