1. Basic concepts
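This deployment uses CoreOS's prometheus-operator.
It also sets up monitoring for the etcd cluster.
It works for clusters installed from binaries as well as with kubeadm.
It targets Kubernetes v1.10 and later; other versions have not been tested, so verify them yourself.
Project: https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus
To install with Helm instead: https://github.com/helm/charts/tree/master/stable/prometheus-operator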
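Because the deployment targets Kubernetes v1.10 and later, a quick version gate can catch an unsupported cluster before anything is installed. This is a minimal sketch, not part of the repository: `k8s_version_ok` is a hypothetical helper name, it assumes a version string of the form `v1.13.4` (for example the VERSION column of `kubectl get nodes`), and it only compares the major and minor numbers.

```shell
# Hypothetical helper (not part of the repository): succeed if a Kubernetes
# version string such as "v1.13.4" is at least v1.10, the minimum this guide
# targets. Only the major and minor numbers are compared.
k8s_version_ok() {
  local v=${1#v}                  # strip a leading "v": "1.13.4"
  local major=${v%%.*}            # text before the first dot: "1"
  local rest=${v#*.}              # text after the first dot: "13.4"
  local minor=${rest%%.*}         # "13"
  minor=${minor%%[!0-9]*}         # drop non-numeric suffixes such as "+"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 10 ]; }
}

# Example (not run here): check the kubelet version reported for each node.
#   kubectl get nodes --no-headers | awk '{print $NF}' | while read -r v; do
#     k8s_version_ok "$v" && echo "$v ok" || echo "$v below v1.10"
#   done
```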
2. Installation
Download the installation files:
[root@k8s-master01 ~]# git clone https://github.com/dotbalo/k8s.git
Cloning into 'k8s'...
remote: Enumerating objects: 373, done.
remote: Counting objects: 100% (373/373), done.
remote: Compressing objects: 100% (264/264), done.
remote: Total 373 (delta 127), reused 349 (delta 103), pack-reused 0
Receiving objects: 100% (373/373), 4.92 MiB | 553.00 KiB/s, done.
Resolving deltas: 100% (127/127), done.
Then change into the prometheus-operator directory of the cloned repository:
[root@k8s-master01 prometheus-operator]# ls
alertmanager-config.yam.bak bundle.yaml mail-template.tmpl README.md
alertmanager.yaml deploy manifests teardown
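Adjust the configuration:
1) In the deploy script, change the etcd certificate file paths (not needed for kubeadm installs).
2) In manifests/prometheus/prometheus-etcd.yaml, change tlsConfig (not needed for kubeadm installs) and addresses (the etcd node addresses).
3) In alertmanager.yaml, set the email alerting configuration and the recipients.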
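On a binary-installed cluster, a wrong certificate path in the deploy script is an easy mistake to make, so a pre-flight check before running ./deploy may save a failed install. This is a sketch under assumptions: `check_etcd_certs` is a hypothetical helper, and the kubeadm paths in the commented example are the usual kubeadm defaults, shown for illustration rather than taken from this repository's deploy script.

```shell
# Hypothetical pre-flight helper (not part of the repository): verify that each
# etcd certificate file given as an argument exists and is readable before
# running ./deploy. Prints every missing file to stderr; returns 1 if any is missing.
check_etcd_certs() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (not run here): the usual kubeadm locations, shown only for
# illustration; a binary install uses whatever paths you set in ./deploy.
#   check_etcd_certs /etc/kubernetes/pki/etcd/ca.crt \
#     /etc/kubernetes/pki/etcd/healthcheck-client.crt \
#     /etc/kubernetes/pki/etcd/healthcheck-client.key && ./deploy
```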
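One-step install (note: on a cluster installed from binaries, the first install can wait a very long time for registration to complete; kubeadm-installed clusters are much faster):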
[root@k8s-master01 prometheus-operator]# ./deploy
namespace/monitoring created
secret/alertmanager-main created
secret/etcd-certs created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
serviceaccount/prometheus-operator created
service/prometheus-operator created
deployment.apps/prometheus-operator created
Waiting for Operator to register custom resource definitions...done!
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
clusterrole.rbac.authorization.k8s.io/node-exporter created
daemonset.extensions/node-exporter created
serviceaccount/node-exporter created
service/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
deployment.extensions/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics-resizer created
serviceaccount/kube-state-metrics created
service/kube-state-metrics created
secret/grafana-credentials created
secret/grafana-credentials unchanged
configmap/grafana-dashboard-definitions-0 created
configmap/grafana-dashboards created
configmap/grafana-datasources created
deployment.apps/grafana created
service/grafana created
service/etcd-k8s created
endpoints/etcd-k8s created
servicemonitor.monitoring.coreos.com/etcd-k8s created
configmap/prometheus-k8s-rules created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/alertmanager created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kubelet created
servicemonitor.monitoring.coreos.com/node-exporter created
servicemonitor.monitoring.coreos.com/prometheus-operator created
servicemonitor.monitoring.coreos.com/prometheus created
service/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
service/alertmanager-main created
alertmanager.monitoring.coreos.com/main created
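Since the first install on a binary-installed cluster can sit at "Waiting for Operator to register custom resource definitions..." for a long time, a small polling helper avoids re-running checks by hand. This is a sketch, not part of the repository: `wait_for` is a hypothetical helper, and the example assumes the operator registers the CRD `prometheuses.monitoring.coreos.com`.

```shell
# Hypothetical helper (not part of the repository): re-run a command every
# INTERVAL seconds until it succeeds, giving up after roughly TIMEOUT seconds
# of waiting (time spent inside the command itself is not counted).
# Usage: wait_for TIMEOUT INTERVAL command [args...]
wait_for() {
  local timeout=$1 interval=$2 elapsed=0
  shift 2
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${elapsed}s: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Example (not run here): wait up to 10 minutes for the Prometheus CRD
# to be registered, checking every 10 seconds.
#   wait_for 600 10 kubectl get crd prometheuses.monitoring.coreos.com
```

Counting only sleep time keeps the helper POSIX-portable (no reliance on bash's `SECONDS`), at the cost of a slightly longer real timeout when the polled command itself is slow.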
3. Verify the installation
Check the pods:
[root@k8s-master01 prometheus-operator]# kubectl get po -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 2m
alertmanager-main-1 2/2 Running 0 1m
alertmanager-main-2 2/2 Running 0 1m
grafana-59f56c4789-dzvgf 1/1 Running 0 2m
kube-state-metrics-575464c49c-m8w4w 4/4 Running 0 2m
node-exporter-5kvxf 2/2 Running 0 2m
node-exporter-66p7h 2/2 Running 0 2m
node-exporter-clxzk 2/2 Running 0 2m
node-exporter-hsgm8 2/2 Running 0 2m
node-exporter-m5l24 2/2 Running 0 2m
prometheus-k8s-0 2/2 Running 0 2m
prometheus-k8s-1 2/2 Running 0 2m
prometheus-operator-8597f9b976-2hvd5 1/1 Running 0 2m
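In a healthy install every pod shows a full READY count (2/2, 4/4, ...) and STATUS Running, as above. The sketch below automates that check by parsing the `kubectl get po` table; `pods_not_ready` is a hypothetical helper, and it assumes the default column order NAME READY STATUS. Note that pods legitimately in Completed state (for example from Jobs) would also be reported.

```shell
# Hypothetical helper (not part of the repository): read `kubectl get po`
# output on stdin and print the name of every pod whose READY column is not
# n/n or whose STATUS is not Running. Exits 0 only if no pod was printed.
pods_not_ready() {
  awk 'BEGIN { bad = 0 }
       NR > 1 {                         # skip the header line
         split($2, r, "/")              # READY, e.g. "2/2" -> r[1]=2, r[2]=2
         if (r[1] != r[2] || $3 != "Running") { print $1; bad = 1 }
       }
       END { exit bad }'
}

# Example (not run here):
#   kubectl get po -n monitoring | pods_not_ready && echo "all pods ready"
```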
Check the services:
[root@k8s-master01 prometheus-operator]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main NodePort 10.106.201.155 <none> 9093:30903/TCP 2m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 2m
etcd-k8s ClusterIP None <none> 2379/TCP 2m
grafana NodePort 10.99.143.133 <none> 3000:30902/TCP 2m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 2m
node-exporter ClusterIP None <none> 9100/TCP 2m
prometheus-k8s NodePort 10.101.175.59 <none> 9090:30900/TCP 2m
prometheus-operated ClusterIP None <none> 9090/TCP 2m
prometheus-operator ClusterIP 10.107.31.10 <none> 8080/TCP 2m
Three ports are now exposed as NodePorts:
- alertmanager UI:30903
- grafana:30902
- prometheus UI:30900
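Before opening the three UIs in a browser, their health endpoints can be probed from any machine that can reach a node. This is a sketch under assumptions: `health_urls` is a hypothetical helper, the node IP in the example is made up, and it assumes the bundled versions serve the standard endpoints `/-/healthy` (Prometheus, Alertmanager) and `/api/health` (Grafana).

```shell
# Hypothetical helper (not part of the repository): print the health-check URL
# of each UI exposed above, given the IP of any cluster node. Ports are the
# NodePorts listed above; paths are each component's standard health endpoint.
health_urls() {
  local node_ip=$1
  echo "http://${node_ip}:30903/-/healthy"    # Alertmanager
  echo "http://${node_ip}:30902/api/health"   # Grafana
  echo "http://${node_ip}:30900/-/healthy"    # Prometheus
}

# Example (not run here; 192.168.1.10 is a hypothetical node IP):
#   health_urls 192.168.1.10 | while read -r url; do
#     curl -fsS -o /dev/null "$url" && echo "OK   $url" || echo "FAIL $url"
#   done
```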
4. Access test
(Screenshots not reproduced: the Alertmanager, Prometheus and Grafana web UIs, and a received alert email.)
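To confirm that the email settings in alertmanager.yaml actually deliver mail, one option is to post a hand-made alert straight to Alertmanager instead of waiting for a real one. This is a sketch under assumptions: `test_alert_json` is a hypothetical helper, the node IP is made up, and the example assumes an Alertmanager release that still serves `/api/v1/alerts` (newer releases expose `/api/v2/alerts`, and recent ones have dropped v1). Whether an email goes out depends on your route matching this alert and on its group_wait setting.

```shell
# Hypothetical helper (not part of the repository): build the JSON body for a
# manually fired test alert. Alertmanager accepts a JSON array of alerts, each
# with "labels" (alertname is the conventional identifying label) and "annotations".
test_alert_json() {
  local name=${1:-ManualTestAlert}
  printf '[{"labels":{"alertname":"%s","severity":"warning"},"annotations":{"summary":"manual test of email alerting"}}]\n' "$name"
}

# Example (not run here; node IP hypothetical, v1 API path assumed):
#   curl -fsS -X POST -H 'Content-Type: application/json' \
#     -d "$(test_alert_json)" http://192.168.1.10:30903/api/v1/alerts
```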
5. Uninstall
[root@k8s-master01 prometheus-operator]# ./teardown
clusterrolebinding.rbac.authorization.k8s.io "node-exporter" deleted
clusterrole.rbac.authorization.k8s.io "node-exporter" deleted
daemonset.extensions "node-exporter" deleted
serviceaccount "node-exporter" deleted
service "node-exporter" deleted
clusterrolebinding.rbac.authorization.k8s.io "kube-state-metrics" deleted
clusterrole.rbac.authorization.k8s.io "kube-state-metrics" deleted
deployment.extensions "kube-state-metrics" deleted
rolebinding.rbac.authorization.k8s.io "kube-state-metrics" deleted
role.rbac.authorization.k8s.io "kube-state-metrics-resizer" deleted
serviceaccount "kube-state-metrics" deleted
service "kube-state-metrics" deleted
secret "grafana-credentials" deleted
configmap "grafana-dashboard-definitions-0" deleted
configmap "grafana-dashboards" deleted
configmap "grafana-datasources" deleted
deployment.apps "grafana" deleted
service "grafana" deleted
service "etcd-k8s" deleted
servicemonitor.monitoring.coreos.com "etcd-k8s" deleted
......
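The teardown output above is truncated, so it does not show whether the operator's custom resource definitions (group monitoring.coreos.com) are removed too. The sketch below lists any that remain; `leftover_monitoring_crds` is a hypothetical helper. Deleting a CRD also deletes every custom resource of that type in the whole cluster, so only do it if nothing else uses prometheus-operator.

```shell
# Hypothetical helper (not part of the repository): read `kubectl get crd`
# output on stdin and print any prometheus-operator CRDs (group
# monitoring.coreos.com) that are still present.
leftover_monitoring_crds() {
  awk 'NR > 1 && $1 ~ /\.monitoring\.coreos\.com$/ { print $1 }'
}

# Example (not run here):
#   kubectl get crd | leftover_monitoring_crds
#   kubectl get crd | leftover_monitoring_crds | xargs -r kubectl delete crd
```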