kube-prometheus project:
https://github.com/coreos/kube-prometheus
prometheus-operator project:
https://github.com/coreos/prometheus-operator
Difference between the two: prometheus-operator provides the CustomResourceDefinitions and the controller that manages Prometheus and Alertmanager instances, while kube-prometheus bundles that operator together with a complete, ready-to-deploy monitoring stack (Prometheus, Alertmanager, Grafana, node-exporter, kube-state-metrics, prometheus-adapter) as Kubernetes manifests.
Component functions: prometheus-operator (reconciles the CRDs listed below), prometheus (metrics collection and storage), alertmanager (alert deduplication and routing), node-exporter (host-level metrics), kube-state-metrics (Kubernetes object state metrics), prometheus-adapter (resource/custom metrics API), grafana (dashboards).
Deployment environment:
Kubernetes version: v1.15.0
kube-prometheus version: v0.3.0
Note: the clocks on all nodes must be synchronized, otherwise data may not be displayed.
yum install -y chrony
systemctl enable --now chronyd
timedatectl set-timezone Asia/Shanghai
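To confirm that a node is actually in sync, chrony's status can be checked on each host (a quick sanity check, not from the original text):

```shell
# Show chrony's view of the system clock: "Leap status : Normal"
# and a small "System time" offset indicate the node is synchronized.
chronyc tracking
# Confirm the timezone and the "System clock synchronized" flag.
timedatectl status
```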
Reference: https://github.com/coreos/kube-prometheus#quickstart
# Install the current stable release
mkdir -p /data/prometheus
version=v0.3.0
curl -Lo /data/prometheus/kube-prometheus-$version.tar.gz https://github.com/coreos/kube-prometheus/archive/$version.tar.gz
cd /data/prometheus
tar -zxvf kube-prometheus-$version.tar.gz
# Create the namespace and CRDs
kubectl create -f manifests/setup
# Wait until the resources above are created and become available
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
kubectl create -f manifests/
# Clean up an installed prometheus stack
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup
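After applying the manifests it can take several minutes for all pods to start; a simple way to block until everything is Ready (a sketch; the 300s timeout is an arbitrary choice):

```shell
# Block until every pod in the monitoring namespace is Ready,
# or fail after five minutes.
kubectl -n monitoring wait --for=condition=Ready pod --all --timeout=300s
```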
Check the created namespace
After deployment completes, a namespace named monitoring is created, and all resource objects are deployed in that namespace.
[root@k8s-master data]# kubectl get ns | grep monitoring
monitoring Active 11m
In addition, the Operator automatically creates several CRD resource objects:
[root@k8s-master ~]# kubectl get crd | grep monitoring.coreos.com
alertmanagers.monitoring.coreos.com 2019-09-02T05:01:08Z
podmonitors.monitoring.coreos.com 2019-09-02T05:01:08Z
prometheuses.monitoring.coreos.com 2019-09-02T05:01:11Z
prometheusrules.monitoring.coreos.com 2019-09-02T05:01:13Z
servicemonitors.monitoring.coreos.com 2019-09-02T05:01:14Z
List all Pods in the monitoring namespace:
[root@k8s-master ~]# kubectl -n monitoring get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alertmanager-main-0 2/2 Running 0 15m 10.244.2.38 k8s-node1 <none> <none>
alertmanager-main-1 2/2 Running 0 15m 10.244.1.77 k8s-node2 <none> <none>
alertmanager-main-2 2/2 Running 0 15m 10.244.1.75 k8s-node2 <none> <none>
grafana-57bfdd47f8-rzcp4 1/1 Running 0 25m 10.244.1.71 k8s-node2 <none> <none>
kube-state-metrics-85df5f68ff-txj7b 4/4 Running 0 14m 10.244.2.40 k8s-node1 <none> <none>
node-exporter-gk9xs 2/2 Running 0 25m 192.168.92.56 k8s-master <none> <none>
node-exporter-mbzbt 2/2 Running 0 25m 192.168.92.57 k8s-node1 <none> <none>
node-exporter-tsdrz 2/2 Running 0 25m 192.168.92.58 k8s-node2 <none> <none>
prometheus-adapter-6b9989ccbd-fclpg 1/1 Running 0 25m 10.244.1.73 k8s-node2 <none> <none>
prometheus-k8s-0 3/3 Running 1 15m 10.244.2.39 k8s-node1 <none> <none>
prometheus-k8s-1 3/3 Running 1 15m 10.244.1.76 k8s-node2 <none> <none>
prometheus-operator-7894d75578-glb99 1/1 Running 0 25m 10.244.1.70 k8s-node2 <none> <none>
List all created resources
alertmanager and the prometheus servers are managed by StatefulSet controllers. There is also one particularly important Pod, prometheus-operator, which controls the other resource objects and watches them for changes.
[root@k8s-master ~]# kubectl -n monitoring get all
NAME READY STATUS RESTARTS AGE
pod/alertmanager-main-0 2/2 Running 0 5m51s
pod/alertmanager-main-1 2/2 Running 0 5m51s
pod/alertmanager-main-2 2/2 Running 0 5m51s
pod/grafana-57bfdd47f8-rzcp4 1/1 Running 0 16m
pod/kube-state-metrics-85df5f68ff-txj7b 4/4 Running 0 5m9s
pod/node-exporter-gk9xs 2/2 Running 0 16m
pod/node-exporter-mbzbt 2/2 Running 0 16m
pod/node-exporter-tsdrz 2/2 Running 0 16m
pod/prometheus-adapter-6b9989ccbd-fclpg 1/1 Running 0 16m
pod/prometheus-k8s-0 3/3 Running 1 5m49s
pod/prometheus-k8s-1 3/3 Running 1 5m49s
pod/prometheus-operator-7894d75578-glb99 1/1 Running 0 16m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-main ClusterIP 10.102.33.86 <none> 9093/TCP 16m
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 5m51s
service/grafana ClusterIP 10.99.225.171 <none> 3000/TCP 16m
service/kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 16m
service/node-exporter ClusterIP None <none> 9100/TCP 16m
service/prometheus-adapter ClusterIP 10.105.147.228 <none> 443/TCP 16m
service/prometheus-k8s ClusterIP 10.111.70.49 <none> 9090/TCP 16m
service/prometheus-operated ClusterIP None <none> 9090/TCP 5m49s
service/prometheus-operator ClusterIP None <none> 8080/TCP 16m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/node-exporter 3 3 3 3 3 kubernetes.io/os=linux 16m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/grafana 1/1 1 1 16m
deployment.apps/kube-state-metrics 1/1 1 1 16m
deployment.apps/prometheus-adapter 1/1 1 1 16m
deployment.apps/prometheus-operator 1/1 1 1 16m
NAME DESIRED CURRENT READY AGE
replicaset.apps/grafana-57bfdd47f8 1 1 1 16m
replicaset.apps/kube-state-metrics-85df5f68ff 1 1 1 5m9s
replicaset.apps/kube-state-metrics-96ff54c65 0 0 0 5m50s
replicaset.apps/kube-state-metrics-9779ff698 0 0 0 16m
replicaset.apps/prometheus-adapter-6b9989ccbd 1 1 1 16m
replicaset.apps/prometheus-operator-7894d75578 1 1 1 16m
NAME READY AGE
statefulset.apps/alertmanager-main 3/3 5m51s
statefulset.apps/prometheus-k8s 2/2 5m49s
Inspect the CRD custom resources, which control the replica count, persistent volumes, and so on:
[root@master01 ~]# kubectl -n monitoring get prometheus
NAME VERSION REPLICAS AGE
k8s v2.15.2 2 9h
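For example, the number of Prometheus server replicas is controlled by the spec of this custom resource rather than by the StatefulSet directly. A hedged sketch of scaling to three replicas with a merge patch (the operator then reconciles the prometheus-k8s StatefulSet; the value 3 is only an example):

```shell
# Scale the Prometheus servers by patching the custom resource;
# the operator updates the underlying StatefulSet accordingly.
kubectl -n monitoring patch prometheus k8s --type merge -p '{"spec":{"replicas":3}}'
```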
Taking NodePort as an example (the ports are customizable), update the service type with kubectl patch:
kubectl patch svc grafana -n monitoring -p '{"spec":{"type":"NodePort","ports":[{"name":"http","port":3000,"protocol":"TCP","targetPort":"http","nodePort":30030}]}}'
kubectl patch svc prometheus-k8s -n monitoring -p '{"spec":{"type":"NodePort","ports":[{"name":"web","port":9090,"protocol":"TCP","targetPort":"web","nodePort":30090}]}}'
kubectl patch svc alertmanager-main -n monitoring -p '{"spec":{"type":"NodePort","ports":[{"name":"web","port":9093,"protocol":"TCP","targetPort":"web","nodePort":30093}]}}'
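After patching, the assigned NodePorts can be confirmed by listing the services (with the patches above, the PORT(S) column should show 3000:30030, 9090:30090 and 9093:30093):

```shell
# Verify that the three services are now of type NodePort
# and carry the expected node ports.
kubectl -n monitoring get svc grafana prometheus-k8s alertmanager-main
```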
Once the patches succeed, the UIs can be opened directly in a browser at nodeIP:nodePort.
Grafana's default login is admin/admin.
With the patch above, Prometheus is exposed on NodePort 30090: http://192.168.92.56:30090
Opening http://192.168.92.56:30090/targets shows that Prometheus has successfully connected to the Kubernetes apiserver.
Check the service-discovery page.
Prometheus's own metrics.
Prometheus's web UI supports basic queries, for example the CPU usage of every pod in the Kubernetes cluster:
sum by (pod_name)( rate(container_cpu_usage_seconds_total{image!="", pod_name!=""}[1m] ) )
If this query returns data, the container metrics (collected from the kubelet's cAdvisor) are flowing into Prometheus correctly. Next we can deploy the Grafana component for a friendlier web UI over the same data.
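The same query can also be run against Prometheus's HTTP API, which is convenient for scripting (a sketch; the IP and port assume the NodePort setup shown above):

```shell
# Run the per-pod CPU query through the HTTP API; the JSON response
# contains a "result" array with one entry per pod.
curl -sG 'http://192.168.92.56:30090/api/v1/query' \
  --data-urlencode 'query=sum by (pod_name)(rate(container_cpu_usage_seconds_total{image!="", pod_name!=""}[1m]))'
```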
Check the port exposed by the grafana service:
[centos@k8s-master ~]$ kubectl get service -n monitoring | grep grafana
grafana NodePort 10.107.56.143 <none> 3000:30100/TCP 20h
[centos@k8s-master ~]$
As shown above, Grafana's NodePort in this deployment is 30100; open http://192.168.92.56:30100 in a browser.
The default username and password are admin/admin.
Change the password and log in.
Add a data source
Grafana ships with the Prometheus data source already added by default. Grafana supports a number of time-series data sources, and each data source has its own query editor.
Parameters for the Prometheus data source:
The officially supported data sources are:
Import dashboards:
A dashboard can be imported online by entering template ID 315 directly, or by downloading the corresponding JSON template file and importing it locally. Template downloads:
https://grafana.com/dashboards/315
https://grafana.com/dashboards/8919
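Dashboards can also be imported without the UI, via Grafana's HTTP API (a hedged sketch: the host/port and admin:admin credentials assume the defaults used above, the grafana.com download URL is an assumption, and downloaded templates may contain datasource template variables that need replacing before import):

```shell
# Download dashboard template 315 from grafana.com, then push it to
# Grafana's dashboard API; "overwrite" replaces an existing dashboard.
curl -s https://grafana.com/api/dashboards/315/revisions/latest/download -o /tmp/dash-315.json
curl -s -u admin:admin -H 'Content-Type: application/json' \
  -X POST http://192.168.92.56:30100/api/dashboards/db \
  -d "{\"dashboard\": $(cat /tmp/dash-315.json), \"overwrite\": true}"
```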
After importing a dashboard, the corresponding monitoring data is shown. Click HOME to browse; Grafana also comes with a series of predefined dashboards:
View cluster monitoring information
Another dashboard template
Resources that can be monitored: