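Prometheus is an open-source toolkit that combines system monitoring, alerting, and a time-series database. Prometheus Operator, open-sourced by CoreOS, is a controller for managing Prometheus on Kubernetes clusters. Its purpose is to simplify deploying, managing, and running Prometheus and Alertmanager clusters on Kubernetes.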
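Create/destroy: easily launch a Prometheus instance for a Kubernetes namespace, a specific application, or a team by using the Operator.
Simple configuration: configure the fundamentals of Prometheus, such as version, persistence, retention policy, and replicas, from a native Kubernetes resource.
Target services via labels: monitoring target configuration is generated automatically from familiar Kubernetes label queries, so there is no need to learn a Prometheus-specific configuration language.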
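The Prometheus Operator makes Prometheus configuration Kubernetes-native, and manages and operates Prometheus and Alertmanager clusters. It is one piece of the puzzle of full end-to-end monitoring.
kube-prometheus combines the Prometheus Operator with a collection of manifests to help you get started with monitoring Kubernetes itself and the applications running on top of it.
kube-prometheus itself is not versioned; it ships alongside the Prometheus Operator releases. The release notes only describe changes to the Operator, and the release archive only contains the matching Operator code changes. For changes to kube-prometheus, always refer to the master branch of the repository.
kube-prometheus is a separate project and will get its own repository in the future.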
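The Operator acts on the following custom resources:
Prometheus: defines a desired Prometheus deployment. The Operator always ensures that a deployment matching the resource definition is running.
ServiceMonitor: declaratively specifies how groups of services should be monitored. The Operator automatically generates the Prometheus scrape configuration from the definition.
PrometheusRule: defines a desired Prometheus rule file containing alerting and recording rules, which can be loaded by a Prometheus instance.
Alertmanager: defines a desired Alertmanager deployment. The Operator always ensures that a deployment matching the resource definition is running.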
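To make these resource types more concrete, here is a minimal sketch of a ServiceMonitor, a Prometheus, and a PrometheusRule that reference each other through labels. It is an illustration, not a manifest taken from kube-prometheus: the names example-app, example, and example-rules, the labels team: frontend and role: alert-rules, the port name web, the version, and the storage size are all assumptions, and the prometheus-k8s service account is assumed to exist with suitable RBAC permissions.

```yaml
# ServiceMonitor: scrape every Service labelled app=example-app (hypothetical label)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    team: frontend            # matched by serviceMonitorSelector below
spec:
  namespaceSelector:
    matchNames:
    - default                 # namespace where the example Service is assumed to live
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web                 # name of the Service port that exposes /metrics
    interval: 30s
---
# Prometheus: the Operator turns this into a StatefulSet
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
  namespace: monitoring
spec:
  replicas: 2
  version: v2.5.0             # illustrative version
  retention: 24h
  serviceAccountName: prometheus-k8s   # assumed to exist already
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  ruleSelector:
    matchLabels:
      role: alert-rules
  storage:
    volumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi     # illustrative size
---
# PrometheusRule: one recording rule and one alerting rule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  namespace: monitoring
  labels:
    role: alert-rules         # matched by ruleSelector above
spec:
  groups:
  - name: example.rules
    rules:
    - record: job:up:sum
      expr: sum(up) by (job)
    - alert: TargetDown
      expr: up == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        message: "Target {{ $labels.job }}/{{ $labels.instance }} has been down for 5 minutes."
```

The point of the sketch is the label wiring: the Prometheus resource selects ServiceMonitors and PrometheusRules by label, and the ServiceMonitor in turn selects Services by label, so no scrape configuration is written by hand.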
Prometheus Operator source repository:
https://github.com/coreos/prometheus-operator.git
kube-prometheus has since moved to coreos/kube-prometheus:
https://github.com/coreos/kube-prometheus.git
Path of all the Prometheus Operator YAML files:
https://github.com/coreos/prometheus-operator/contrib/kube-prometheus/manifests
which has moved to:
https://github.com/coreos/kube-prometheus/manifests
https://github.com/coreos/
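Edit bundle.yaml in the prometheus-operator-0.23.2 directory.
Change every namespace field to monitoring (namespace: monitoring appears in the ClusterRoleBinding subject, the Deployment, and the ServiceAccount):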
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-operator
subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-operator
rules:
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - '*'
- apiGroups:
  - monitoring.coreos.com
  resources:
  - alertmanagers
  - prometheuses
  - prometheuses/finalizers
  - alertmanagers/finalizers
  - servicemonitors
  - prometheusrules
  verbs:
  - '*'
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - list
  - delete
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  verbs:
  - get
  - create
  - update
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - list
  - watch
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  labels:
    k8s-app: prometheus-operator
  name: prometheus-operator
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: prometheus-operator
  template:
    metadata:
      labels:
        k8s-app: prometheus-operator
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --logtostderr=true
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
        - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.23.2
        image: quay.io/coreos/prometheus-operator:v0.23.2
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
      nodeSelector:
        beta.kubernetes.io/os: linux
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccountName: prometheus-operator
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-operator
  namespace: monitoring
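The bundle places the Operator's Deployment and ServiceAccount in the monitoring namespace, so that namespace has to exist before the bundle is created. If it does not exist yet on your cluster (an assumption; it may already have been created some other way), a minimal manifest like the one below could be saved under a name of your choice, for example namespace.yaml (a hypothetical file name), and created first with kubectl create -f.

```yaml
# Minimal Namespace manifest for the namespace used throughout this walkthrough
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
```

Keeping the namespace in a manifest rather than creating it ad hoc makes the setup easier to reproduce on another cluster.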
Create the resources:
kubectl create -f bundle.yaml
Deploy kube-prometheus:
kubectl create -f prometheus-operator/contrib/kube-prometheus/manifests
List everything in the monitoring namespace:
[root@saas98 usr]$ kubectl get all -n monitoring
NAME                                       READY   STATUS             RESTARTS   AGE
pod/alertmanager-main-0                    2/2     Running            0          3h53m
pod/alertmanager-main-1                    2/2     Running            0          3h52m
pod/alertmanager-main-2                    2/2     Running            0          3h52m
pod/grafana-5c54dbc48b-jvhcd               1/1     Running            0          3h54m
pod/kube-state-metrics-fd9b964d5-srwkp     4/4     Running            0          3h49m
pod/node-exporter-5ndbs                    2/2     Running            0          3h54m
pod/node-exporter-nts45                    2/2     Running            0          3h54m
pod/node-exporter-pxtw5                    2/2     Running            0          3h54m
pod/node-exporter-tvntn                    1/2     CrashLoopBackOff   47         3h54m
pod/node-exporter-wb7sx                    2/2     Running            0          3h54m
pod/prometheus-k8s-0                       3/3     Running            1          3h53m
pod/prometheus-k8s-1                       3/3     Running            1          3h49m
pod/prometheus-operator-76599f4b8c-zm5wl   1/1     Running            0          3h50m

NAME                            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/alertmanager-main       NodePort    10.99.5.67      <none>        9093:30662/TCP      3h54m
service/alertmanager-operated   ClusterIP   None            <none>        9093/TCP,6783/TCP   3h53m
service/grafana                 NodePort    10.106.129.28   <none>        3000:31844/TCP      3h54m
service/kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP   3h54m
service/node-exporter           ClusterIP   None            <none>        9100/TCP            3h54m
service/prometheus-k8s          NodePort    10.99.129.143   <none>        9090:31144/TCP      3h54m
service/prometheus-operated     ClusterIP   None            <none>        9090/TCP            3h53m
service/prometheus-operator     ClusterIP   None            <none>        8080/TCP            3h54m

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
daemonset.apps/node-exporter   5         5         4       5            4           beta.kubernetes.io/os=linux   3h54m

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana               1/1     1            1           3h54m
deployment.apps/kube-state-metrics    1/1     1            1           3h54m
deployment.apps/prometheus-operator   1/1     1            1           3h54m

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/grafana-5c54dbc48b               1         1         1       3h54m
replicaset.apps/kube-state-metrics-6f5c6d88d5    0         0         0       3h54m
replicaset.apps/kube-state-metrics-fd9b964d5     1         1         1       3h53m
replicaset.apps/prometheus-operator-76599f4b8c   1         1         1       3h54m
replicaset.apps/prometheus-operator-f9fcb78bd    0         0         0       3h50m

NAME                                 READY   AGE
statefulset.apps/alertmanager-main   3/3     3h53m
statefulset.apps/prometheus-k8s      2/2     3h53m
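Change the access method (access from outside the cluster)
Switch the services to the NodePort type.
Use kubectl edit svc [svcname] -n monitoring to make the change.
The services to change are alertmanager-main, grafana, and prometheus-k8s.
Example: kubectl edit svc alertmanager-main -n monitoring. The edited grafana Service looks like this: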
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  type: NodePort      # added
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 30100   # added
  selector:
    app: grafana
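The same change can also be kept in a manifest instead of being made interactively with kubectl edit. The sketch below does this for prometheus-k8s. The port name web, the selector labels, and the nodePort value 30200 are assumptions, not values taken from this walkthrough, so compare them with the output of kubectl get svc prometheus-k8s -n monitoring -o yaml before applying anything.

```yaml
# Sketch: expose the prometheus-k8s Service outside the cluster via NodePort
apiVersion: v1
kind: Service
metadata:
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web            # assumed port name; check the live Service
    port: 9090
    targetPort: web
    nodePort: 30200      # hypothetical; omit to let Kubernetes pick a free port
  selector:
    app: prometheus      # assumed selector labels; check the live Service
    prometheus: k8s
```

If nodePort is left out, Kubernetes assigns a free port from the cluster's NodePort range, which is why the port numbers you end up with can differ from the ones written in a manifest.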
[root@saas98 usr]$ kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
alertmanager-main       NodePort    10.99.5.67      <none>        9093:30662/TCP      3h55m
alertmanager-operated   ClusterIP   None            <none>        9093/TCP,6783/TCP   3h54m
grafana                 NodePort    10.106.129.28   <none>        3000:31844/TCP      3h55m
kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP   3h55m
node-exporter           ClusterIP   None            <none>        9100/TCP            3h55m
prometheus-k8s          NodePort    10.99.129.143   <none>        9090:31144/TCP      3h55m
prometheus-operated     ClusterIP   None            <none>        9090/TCP            3h54m
prometheus-operator     ClusterIP   None            <none>        8080/TCP            3h55m
Access Prometheus on port 31144, for example: http://118.31.17.205:31144/graph
Opening http://118.31.17.205:31144/targets shows that Prometheus has successfully connected to the Kubernetes apiserver.
Access alertmanager-main on port 30662, for example: http://118.31.17.205:30662
View service discovery: http://118.31.17.205:31144/service-discovery
Access Grafana on port 31844, for example: http://118.31.17.205:31844
Log in to continue (the initial username and password are both admin).
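Add a data source
Grafana already has the Prometheus data source added by default, so it can be used directly. Grafana supports several time-series data sources, and each one has its own query editor.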
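For reference, a Grafana data source provisioning file for this setup might look roughly like the following. This is a sketch, not the exact file shipped with kube-prometheus: the data source name is illustrative, and the URL assumes the prometheus-k8s service in the monitoring namespace on port 9090, as listed in the service output earlier.

```yaml
# Sketch of a Grafana data source provisioning file
apiVersion: 1
datasources:
- name: prometheus       # illustrative name
  type: prometheus
  access: proxy          # Grafana's backend queries Prometheus on behalf of the browser
  orgId: 1
  url: http://prometheus-k8s.monitoring.svc:9090   # assumed in-cluster service address
  editable: false
```

With access set to proxy, the browser only needs to reach Grafana, so the in-cluster service address works even though Prometheus is also exposed through a NodePort.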
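Import a dashboard: you can import one online by entering its template ID (for example 315), or download the corresponding JSON template file and import it locally.
Official dashboard template downloads: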
https://grafana.com/dashboards/315
https://grafana.com/dashboards/8919
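After the import, the corresponding monitoring data is displayed. Click Home to choose a dashboard to view; in fact, this Grafana already comes with a set of predefined dashboards:
View the cluster monitoring information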
https://github.com/coreos/prometheus-operator