Prometheus is an open-source monitoring and alerting system built on a time-series database, and it is well suited to monitoring Kubernetes clusters. Its basic principle is to periodically scrape the state of monitored components over HTTP; any component that exposes a suitable HTTP endpoint can be monitored, with no SDK or other integration work required. This makes it a good fit for monitoring virtualized environments such as VMs, Docker, and Kubernetes. The HTTP endpoint that exposes a component's metrics is called an exporter. Most components commonly used at internet companies already have ready-made exporters, for example Varnish, HAProxy, Nginx, MySQL, and Linux system metrics (disk, memory, CPU, network, and so on). Prometheus consists of the following main components:
Prometheus Server: responsible for scraping and storing the metrics data and for serving the PromQL query language. It contains three components:
Retrieval: fetches the monitoring data
TSDB: the time-series database (Time Series Database), which can be understood simply as software optimized for handling time-series data, where the samples are indexed by time
HTTP Server: provides the query interface used for alerting and dashboards
Alertmanager: based on the alerting configuration, notifies operators of alerts that cross their thresholds via the web page, SMS, or email.
Web UI: queries metric data with PromQL statements and displays it. Although Prometheus ships with its own UI, Grafana is used for most dashboards; third-party systems can also fetch metrics through the HTTP API, as shown below.
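For example, a third-party system can run any PromQL query over the HTTP API. A minimal sketch, assuming Prometheus is reachable at localhost:9090:
# Run the PromQL query "up" through the HTTP API; the result is returned as JSON
curl 'http://localhost:9090/api/v1/query?query=up'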
Common deployment methods:
https://hub.docker.com/r/prom/prometheus/tags
docker run -p 9090:9090 prom/prometheus
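To run with a custom configuration, the host file can be mounted over the image's default config path. A sketch; /opt/prometheus.yml is a hypothetical host path:
docker run -d -p 9090:9090 \
  -v /opt/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus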
https://github.com/prometheus-operator/kube-prometheus
The Operator approach is based on pre-written YAML manifests and can deploy prometheus server, alertmanager, grafana, node-exporter and the other components into Kubernetes in one batch.
Choose the kube-prometheus release according to the Kubernetes version:
kube-prometheus stack | Kubernetes 1.20 | Kubernetes 1.21 | Kubernetes 1.22 | Kubernetes 1.23 | Kubernetes 1.24 |
---|---|---|---|---|---|
release-0.8 | ✔ | ✔ | ✗ | ✗ | ✗ |
release-0.9 | ✗ | ✔ | ✔ | ✗ | ✗ |
release-0.10 | ✗ | ✗ | ✔ | ✔ | ✗ |
release-0.11 | ✗ | ✗ | ✗ | ✔ | ✔ |
main | ✗ | ✗ | ✗ | ✗ | ✔ |
root@k8s-master-01:/opt/k8s-data/yaml# kubectl get node
NAME STATUS ROLES AGE VERSION
192.168.31.101 Ready,SchedulingDisabled master 123d v1.22.5
The cluster above runs v1.22.5, so download or clone release-0.10:
https://codeload.github.com/prometheus-operator/kube-prometheus/zip/refs/heads/release-0.10
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus-release-0.10/manifests/
kubectl create -f setup/
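A quick check that the setup step created the monitoring namespace and the Operator CRDs:
kubectl get ns monitoring
kubectl get crd | grep monitoring.coreos.com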
Download the images and push them to Harbor; if no proxy server is available, pull them indirectly through an Aliyun mirror (a sketch follows the list below). The manifests are then edited to reference the Harbor copies:
alertmanager-alertmanager.yaml: image: harbor.intra.com/prometheus/alertmanager:v0.23.0
blackboxExporter-deployment.yaml: image: harbor.intra.com/prometheus/blackbox-exporter:v0.19.0
blackboxExporter-deployment.yaml: image: harbor.intra.com/prometheus/configmap-reload:v0.5.0
blackboxExporter-deployment.yaml: image: harbor.intra.com/prometheus/kube-rbac-proxy:v0.11.0
grafana-deployment.yaml: image: harbor.intra.com/prometheus/grafana:8.3.3
kubeStateMetrics-deployment.yaml: image: harbor.intra.com/prometheus/kube-state-metrics:v2.3.0
kubeStateMetrics-deployment.yaml: image: harbor.intra.com/prometheus/kube-rbac-proxy:v0.11.0
kubeStateMetrics-deployment.yaml: image: harbor.intra.com/prometheus/kube-rbac-proxy:v0.11.0
nodeExporter-daemonset.yaml: image: harbor.intra.com/prometheus/node-exporter:v1.3.1
nodeExporter-daemonset.yaml: image: harbor.intra.com/prometheus/kube-rbac-proxy:v0.11.0
prometheusAdapter-deployment.yaml: image: harbor.intra.com/prometheus/prometheus-adapter:v0.9.1
prometheusOperator-deployment.yaml: image: harbor.intra.com/prometheus/prometheus-operator:v0.53.1
prometheusOperator-deployment.yaml: image: harbor.intra.com/prometheus/kube-rbac-proxy:v0.11.0
prometheus-prometheus.yaml: image: harbor.intra.com/prometheus/prometheus:v2.32.1
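A sketch of pulling one image through a mirror, retagging it, and pushing it to Harbor; the mirror registry path is an assumption, so substitute whichever mirror is actually reachable, and repeat (or script) this for each image listed above:
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-state-metrics:v2.3.0   # hypothetical mirror path
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-state-metrics:v2.3.0 harbor.intra.com/prometheus/kube-state-metrics:v2.3.0
docker push harbor.intra.com/prometheus/kube-state-metrics:v2.3.0
# Then point the manifests at Harbor; adjust the pattern for each original registry
sed -i 's#quay.io/prometheus/#harbor.intra.com/prometheus/#g' *.yaml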
Modify the Service so that Prometheus is exposed through a NodePort.
If you prefer not to modify it, the port can be forwarded temporarily with:
kubectl -n monitoring port-forward svc/prometheus-k8s 9090
prometheus-service.yaml
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
  - name: reloader-web
    port: 8080
    targetPort: reloader-web
Deploy:
root@k8s-master-01:/opt/k8s-data/yaml/kube-prometheus-release-0.10/manifests# kubectl apply -f .
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager-main created
clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
configmap/blackbox-exporter-configuration created
deployment.apps/blackbox-exporter created
service/blackbox-exporter created
serviceaccount/blackbox-exporter created
... (remaining output omitted)
root@k8s-master-01:/opt/k8s-data/yaml/kube-prometheus-release-0.10/manifests# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.200.236.235 <none> 9093/TCP,8080/TCP 47m
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 47m
blackbox-exporter ClusterIP 10.200.16.237 <none> 9115/TCP,19115/TCP 47m
grafana ClusterIP 10.200.168.96 <none> 3000/TCP 47m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 47m
node-exporter ClusterIP None <none> 9100/TCP 47m
prometheus-adapter ClusterIP 10.200.166.31 <none> 443/TCP 47m
prometheus-k8s NodePort 10.200.129.185 <none> 9090:60805/TCP,8080:52252/TCP 47m
prometheus-operated ClusterIP None <none> 9090/TCP 47m
prometheus-operator ClusterIP None <none> 8443/TCP 47m
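Prometheus is now reachable on any node at the NodePort shown above (60805 in this output). A quick health check, using the node IP from the earlier kubectl get node output:
curl http://192.168.31.101:60805/-/healthy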
Modify the Service so that Grafana is exposed through NodePort 33000 (this lies outside the default NodePort range of 30000-32767, so the apiserver's --service-node-port-range must allow it).
grafana-service.yaml
spec:
  type: NodePort
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 33000
Apply the updated Grafana Service:
kubectl apply -f grafana-service.yaml
root@k8s-master-01:/opt/k8s-data/yaml/kube-prometheus-release-0.10/manifests# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.200.236.235 <none> 9093/TCP,8080/TCP 51m
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 51m
blackbox-exporter ClusterIP 10.200.16.237 <none> 9115/TCP,19115/TCP 51m
grafana NodePort 10.200.168.96 <none> 3000:33000/TCP 51m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 51m
node-exporter ClusterIP None <none> 9100/TCP 51m
prometheus-adapter ClusterIP 10.200.166.31 <none> 443/TCP 51m
prometheus-k8s NodePort 10.200.129.185 <none> 9090:60805/TCP,8080:52252/TCP 51m
prometheus-operated ClusterIP None <none> 9090/TCP 51m
prometheus-operator ClusterIP None <none> 8443/TCP 51m
Grafana is now reachable, which completes the Operator-based deployment of Prometheus.
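A quick reachability check against Grafana's health endpoint, assuming the same node IP as above; the default login for the kube-prometheus Grafana is admin / admin unless it has been overridden:
curl http://192.168.31.101:33000/api/health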
Binary deployment of Prometheus:
mkdir /apps
cd /apps
wget https://github.com/prometheus/prometheus/releases/download/v2.38.0/prometheus-2.38.0.linux-amd64.tar.gz
tar xf prometheus-2.38.0.linux-amd64.tar.gz
ln -sf /apps/prometheus-2.38.0.linux-amd64 /apps/prometheus
root@prometheus-2:/apps# cd /apps/prometheus
root@prometheus-2:/apps/prometheus# ll
total 207312
drwxr-xr-x 4 3434 3434 169 Aug 29 15:21 ./
drwxr-xr-x 4 root root 118 Aug 29 15:22 ../
-rw-r--r-- 1 3434 3434 11357 Aug 16 21:42 LICENSE
-rw-r--r-- 1 3434 3434 3773 Aug 16 21:42 NOTICE
drwxr-xr-x 2 3434 3434 38 Aug 16 21:42 console_libraries/
drwxr-xr-x 2 3434 3434 173 Aug 16 21:42 consoles/
-rwxr-xr-x 1 3434 3434 110234973 Aug 16 21:26 prometheus* # Prometheus main binary
-rw-r--r-- 1 3434 3434 934 Aug 16 21:42 prometheus.yml # Main Prometheus configuration file
-rwxr-xr-x 1 3434 3434 102028302 Aug 16 21:28 promtool* # Utility for checking the Prometheus configuration file and the metrics data
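For example, promtool can validate the main configuration before the server is started:
cd /apps/prometheus
./promtool check config prometheus.yml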
/etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/apps/prometheus/
ExecStart=/apps/prometheus/prometheus --config.file=/apps/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target
Start Prometheus:
systemctl daemon-reload
systemctl enable --now prometheus.service
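A quick check that the service came up and is answering on port 9090:
systemctl is-active prometheus.service
curl -s http://localhost:9090/-/ready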
Binary deployment of node_exporter on each node to be monitored:
mkdir /apps
cd /apps
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xf node_exporter-1.3.1.linux-amd64.tar.gz
ln -sf /apps/node_exporter-1.3.1.linux-amd64 /apps/node_exporter
/etc/systemd/system/node-exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
ExecStart=/apps/node_exporter/node_exporter
[Install]
WantedBy=multi-user.target
Start node_exporter:
root@zookeeper-1:/apps/node_exporter# systemctl enable --now node-exporter.service
Created symlink /etc/systemd/system/multi-user.target.wants/node-exporter.service → /etc/systemd/system/node-exporter.service
ss -ntlup|grep 9100
tcp LISTEN 0 4096 *:9100 *:* users:(("node_exporter",pid=123757,fd=3))
The node metrics can now be fetched over HTTP, for example:
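A sketch of fetching the metrics directly from one of the nodes (IP as used in the scrape configuration below):
curl -s http://192.168.31.121:9100/metrics | grep -E '^node_load'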
root@prometheus-2:/apps/prometheus# vi prometheus.yml
## Append the following under scrape_configs
  - job_name: "zookeeper"
    static_configs:
    - targets: ["192.168.31.121:9100","192.168.31.122:9100","192.168.31.123:9100"]
Restart the service and confirm that it is listening:
root@prometheus-2:/apps/prometheus# systemctl restart prometheus.service
root@prometheus-2:/apps/prometheus# ss -ntlup|grep 9090
tcp LISTEN 0 4096 *:9090 *:* users:(("prometheus",pid=5198,fd=8))
Prometheus is now scraping data from all three nodes.
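The new targets can be confirmed under Status -> Targets in the web UI, or with PromQL queries such as:
up{job="zookeeper"}
node_load1{job="zookeeper"}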