kube-prometheus

Many sources describe the Prometheus Operator as the definitive monitoring solution for Kubernetes clusters, but the Operator by itself no longer ships the complete feature set; the full solution is now kube-prometheus. Project address:
https://github.com/coreos/kube-prometheus

Installation

Download the software

#git clone https://github.com/coreos/kube-prometheus.git

Inspect the manifest files

#cd manifests
#ls
00namespace-namespace.yaml                                         node-exporter-clusterRole.yaml
0prometheus-operator-0alertmanagerCustomResourceDefinition.yaml    node-exporter-daemonset.yaml
0prometheus-operator-0prometheusCustomResourceDefinition.yaml      node-exporter-serviceAccount.yaml
0prometheus-operator-0prometheusruleCustomResourceDefinition.yaml  node-exporter-serviceMonitor.yaml
0prometheus-operator-0servicemonitorCustomResourceDefinition.yaml  node-exporter-service.yaml
0prometheus-operator-clusterRoleBinding.yaml                       prometheus-adapter-apiService.yaml
0prometheus-operator-clusterRole.yaml                              prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
0prometheus-operator-deployment.yaml                               prometheus-adapter-clusterRoleBindingDelegator.yaml
0prometheus-operator-serviceAccount.yaml                           prometheus-adapter-clusterRoleBinding.yaml
0prometheus-operator-serviceMonitor.yaml                           prometheus-adapter-clusterRoleServerResources.yaml
0prometheus-operator-service.yaml                                  prometheus-adapter-clusterRole.yaml
alertmanager-alertmanager.yaml                                     prometheus-adapter-configMap.yaml
alertmanager-secret.yaml                                           prometheus-adapter-deployment.yaml
alertmanager-serviceAccount.yaml                                   prometheus-adapter-roleBindingAuthReader.yaml
alertmanager-serviceMonitor.yaml                                   prometheus-adapter-serviceAccount.yaml
alertmanager-service.yaml                                          prometheus-adapter-service.yaml
grafana-dashboardDatasources.yaml                                  prometheus-clusterRoleBinding.yaml
grafana-dashboardDefinitions.yaml                                  prometheus-clusterRole.yaml
grafana-dashboardSources.yaml                                      prometheus-prometheus.yaml
grafana-deployment.yaml                                            prometheus-roleBindingConfig.yaml
grafana-serviceAccount.yaml                                        prometheus-roleBindingSpecificNamespaces.yaml
grafana-serviceMonitor.yaml                                        prometheus-roleConfig.yaml
grafana-service.yaml                                               prometheus-roleSpecificNamespaces.yaml
kube-state-metrics-clusterRoleBinding.yaml                         prometheus-rules.yaml
kube-state-metrics-clusterRole.yaml                                prometheus-serviceAccount.yaml
kube-state-metrics-deployment.yaml                                 prometheus-serviceMonitorApiserver.yaml
kube-state-metrics-roleBinding.yaml                                prometheus-serviceMonitorCoreDNS.yaml
kube-state-metrics-role.yaml                                       prometheus-serviceMonitorKubeControllerManager.yaml
kube-state-metrics-serviceAccount.yaml                             prometheus-serviceMonitorKubelet.yaml
kube-state-metrics-serviceMonitor.yaml                             prometheus-serviceMonitorKubeScheduler.yaml
kube-state-metrics-service.yaml                                    prometheus-serviceMonitor.yaml
node-exporter-clusterRoleBinding.yaml                              prometheus-service.yaml

In prometheus-serviceMonitorKubelet.yaml, change the endpoint port from https-metrics to http-metrics, and change the scheme to http:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
spec:
  endpoints:
  - port: http-metrics
    scheme: http  # many write-ups omit the scheme change
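The same edit can be scripted with sed. The snippet below demonstrates the substitution on a stand-in fragment file (a hypothetical path, so it runs anywhere); the same two expressions can be applied to the real manifests/prometheus-serviceMonitorKubelet.yaml, keeping in mind the manifest layout may differ between releases:

```shell
# Demo of the port/scheme substitution on a stand-in fragment; run the
# same sed expressions against prometheus-serviceMonitorKubelet.yaml.
cat > /tmp/sm-kubelet-fragment.yaml <<'EOF'
    port: https-metrics
    scheme: https
EOF
sed -i 's/port: https-metrics/port: http-metrics/; s/scheme: https/scheme: http/' \
  /tmp/sm-kubelet-fragment.yaml
cat /tmp/sm-kubelet-fragment.yaml
```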

In alertmanager-service.yaml, change the type to NodePort and add nodePort 30093:

apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 30093
  type: NodePort
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP

In grafana-service.yaml, change the type to NodePort and add nodePort 32000:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 32000
  type: NodePort
  selector:
    app: grafana

In prometheus-service.yaml, change the type to NodePort and add nodePort 30090:

apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 30090
  type: NodePort
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
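NodePort values must fall inside the cluster's service-node-port-range (30000-32767 by default, configurable on kube-apiserver); a port outside that range is rejected at apply time. A quick local sanity check for the three ports chosen above:

```shell
# Check the chosen nodePorts against the default 30000-32767 range.
for p in 30093 32000 30090; do
  if [ "$p" -ge 30000 ] && [ "$p" -le 32767 ]; then
    echo "$p: ok"
  else
    echo "$p: outside service-node-port-range"
  fi
done
```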

Create the resources. The first pass will report that some resource kinds do not exist (the CRDs are still registering), so run the command twice:

#kubectl apply -f .
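Why a second pass is needed: the batch contains both the CRDs and custom resources of those kinds (Prometheus, Alertmanager, ServiceMonitor, ...), and the latter fail with "no matches for kind" until the CRDs are registered. The retry pattern can be sketched as a loop; here a stub `apply` function stands in for `kubectl apply -f .` so the pattern is runnable without a cluster:

```shell
# Stub `apply` simulates `kubectl apply -f .`: the first call registers
# the CRDs but fails on the CRD-dependent resources; the retry succeeds.
rm -f /tmp/.crds-registered
apply() {
  if [ -f /tmp/.crds-registered ]; then
    echo "all resources created"
  else
    touch /tmp/.crds-registered
    echo 'error: unable to recognize: no matches for kind "Prometheus"' >&2
    return 1
  fi
}
until apply; do sleep 1; done
```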

Check the custom resource definitions (CRDs):

#kubectl get crd | grep coreos
alertmanagers.monitoring.coreos.com           2019-06-03T09:17:48Z
prometheuses.monitoring.coreos.com            2019-06-03T09:17:48Z
prometheusrules.monitoring.coreos.com         2019-06-03T09:17:48Z
servicemonitors.monitoring.coreos.com         2019-06-03T09:17:48Z

Check the newly created pods:

#kubectl -n monitoring  get pods  -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP               NODE            NOMINATED NODE   READINESS GATES
alertmanager-main-0                    2/2     Running   0          16h   10.244.196.134   node01                     
alertmanager-main-1                    2/2     Running   0          15h   10.244.241.204   ingressnode02              
alertmanager-main-2                    2/2     Running   0          15h   10.244.114.4     node05                     
grafana-69c7b8468d-l8p2b               1/1     Running   0          16h   10.244.17.198    prometheus01               
kube-state-metrics-65b5ccc84-kwfgh     4/4     Running   0          15h   10.244.17.199    prometheus01               
node-exporter-62mkc                    2/2     Running   0          16h   22.22.3.235      master02                   
node-exporter-6bsrb                    2/2     Running   0          16h   22.22.3.239      node04                     
node-exporter-8b5h8                    2/2     Running   0          16h   22.22.3.241      prometheus01               
node-exporter-chssb                    2/2     Running   0          16h   22.22.3.243      ingressnode02              
node-exporter-dwqkc                    2/2     Running   0          16h   22.22.3.240      node05                     
node-exporter-kf2cr                    2/2     Running   0          16h   22.22.3.242      ingressnode01              
node-exporter-krsm4                    2/2     Running   0          16h   22.22.3.238      node03                     
node-exporter-lv4gx                    2/2     Running   0          16h   22.22.3.236      node01                     
node-exporter-v5f9v                    2/2     Running   0          16h   22.22.3.234      master01                   
node-exporter-zgsr2                    2/2     Running   0          16h   22.22.3.237      node02                     
prometheus-adapter-6c75d8686d-gq8bn    1/1     Running   0          16h   10.244.17.197    prometheus01               
prometheus-k8s-0                       3/3     Running   1          16h   10.244.140.68    node02                     
prometheus-k8s-1                       3/3     Running   1          16h   10.244.248.198   node04                     
prometheus-operator-74d449f6b4-q6bjn   1/1     Running   0          16h   10.244.17.196    prometheus01               

Confirm that the Prometheus, Alertmanager, and Grafana web UIs all open normally.


Configuring Prometheus

Expand the Status menu and open Targets. Only two scrape jobs, kube-controller-manager and kube-scheduler, have no corresponding targets; this is caused by their ServiceMonitor objects.



Look at prometheus-serviceMonitorKubeScheduler.yaml: the selector matches Service labels, but no Service in the kube-system namespace carries the label k8s-app=kube-scheduler:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: http-metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler

Create a new file prometheus-kubeSchedulerService.yaml:

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler # must match the selector in the ServiceMonitor
spec:
  selector: 
    component: kube-scheduler # must match the labels on the scheduler pod
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP

Create the kube-scheduler Service:

#kubectl apply -f prometheus-kubeSchedulerService.yaml 

Similarly, create prometheus-kubeControllerManagerService.yaml:

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP

Create the kube-controller-manager Service:

#kubectl apply -f prometheus-kubeControllerManagerService.yaml

Confirm that all targets are now healthy.


Configuring Grafana

Log in with admin/admin and change the password.
The Prometheus data source is already configured.


Custom monitoring targets

Take etcd monitoring as an example.



Save the required etcd certificates in a Secret named etcd-certs:

# kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt  --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key  --from-file=/etc/kubernetes/pki/etcd/ca.crt 
secret/etcd-certs created
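What the Secret actually stores: each --from-file argument becomes a key named after the file, with the file content base64-encoded under the Secret's data field (here the keys are ca.crt, healthcheck-client.crt, and healthcheck-client.key). A local demonstration with a stand-in file:

```shell
# Each file passed to `--from-file` is stored base64-encoded under its
# filename; this is the encoding kubectl performs on the cert content.
printf 'dummy-cert\n' > /tmp/ca-demo.crt
base64 < /tmp/ca-demo.crt
```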

Modify the Prometheus resource k8s: add a secrets field in prometheus-prometheus.yaml:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    beta.kubernetes.io/os: linux
  replicas: 2
  secrets:
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.7.2

Apply prometheus-prometheus.yaml:

#kubectl apply -f prometheus-prometheus.yaml 

Check inside the pod that the certificates were mounted successfully:

# kubectl -n monitoring  exec -it prometheus-k8s-0 /bin/sh
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod.
# ls -l /etc/prometheus/secrets/etcd-certs/
total 0
lrwxrwxrwx    1 root     root            13 Jun  4 09:12 ca.crt -> ..data/ca.crt
lrwxrwxrwx    1 root     root            29 Jun  4 09:12 healthcheck-client.crt -> ..data/healthcheck-client.crt
lrwxrwxrwx    1 root     root            29 Jun  4 09:12 healthcheck-client.key -> ..data/healthcheck-client.key
/prometheus $ cat /etc/prometheus/secrets/etcd-certs/ca.crt 
-----BEGIN CERTIFICATE-----
MIIC9zCCAd+gAwIBAgIJAMiN3pOWJVGOMA0GCSqGSIb3DQEBCwUAMBIxEDAOBgNV
BAMMB2V0Y2QtY2EwHhcNMTkwNTI3MDgzNDExWhcNMzkwNTIyMDgzNDExWjASMRAw
DgYDVQQDDAdldGNkLWNhMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA
rG1xQcAwZ67XXG84PzqIIqoqnq/zM3Ru+02PELbzgiZ4MrNPte32vZuj6HK/JDDQ
nEirgnQQxQJ6OxvnDrFVwyxveNI8jrd+FRfuh2ae0NIiqkWk88O42OioACBW6cJA
hILpIcn066+E+t2vh/3TmqMduV8eY5p8VAwRT1B04fJAQVcr0sJh3JXExppbtdWL
Z0T25QTbbbZ/I6oxLMu/NkS171R5l397rSpD2ox0NV0GASoqiitffPznOHBPa1Zs
UwOlQnZlWaBM5XQHFhRQTG/Bxxhe45azmmPT3DGCpATk+/GnYDPnt4TSZiX9gZ6O
beRsGUzPDrX/LOEV/Uv+VQIDAQABo1AwTjAdBgNVHQ4EFgQUxQl8C8RdG+tU2U+T
gy901tOxUNUwHwYDVR0jBBgwFoAUxQl8C8RdG+tU2U+Tgy901tOxUNUwDAYDVR0T
BAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAQEAica5i0wN9ZuCICQOGwMcuVgadBqV
w4dOyP4EPyD2SKx3YpYREMGXOafYkrX2rWKqsCBqS9xUT34x2DQ4/KuoPY/Ee37h
pJ+/i47sq8pmiHxqQRUACyGA6SqWtcApfW62+O97qHnRtyUcCftKKLYEu3djzTJd
FOn6xPehbFzhL9H4tsiZ+kFaXqWDUbhSCAd/LeJ+dxzmOE+Rd0hsPHIyzdmWUKwe
CTkSaf9X4KPWjBUCqPzB/Td6Mz3HHg8zZo2FgkyI98a7c83rHl3aTfBJEi4LND8x
PTFwgOGNlZXa6OnUmkn/sHvoNc88EqDm/GjPI6xfLr7BSWE4jJCIwWROvg==
-----END CERTIFICATE-----
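Beyond cat, openssl can confirm that the mounted files are valid certificates and show their subject and expiry. The demo below generates a throwaway self-signed certificate so the inspection command is runnable anywhere; inside the pod you would point it at /etc/prometheus/secrets/etcd-certs/ca.crt instead:

```shell
# Generate a throwaway cert, then inspect it the way you would ca.crt.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=etcd-ca" \
  -keyout /tmp/demo.key -out /tmp/demo.crt 2>/dev/null
openssl x509 -in /tmp/demo.crt -noout -subject -enddate
```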

Create the ServiceMonitor etcd-k8s in prometheus-serviceMonitorEtcd.yaml:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: etcd-k8s
  name: etcd-k8s
  namespace: monitoring
spec:
  endpoints:
  - port: port
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: etcd

Apply prometheus-serviceMonitorEtcd.yaml:

#kubectl apply -f prometheus-serviceMonitorEtcd.yaml

Create the associated Service. Because etcd runs outside the cluster, the Endpoints must be created manually. prometheus-service-etcd.yaml:

apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: etcd
  name: etcd-k8s
  namespace: kube-system
spec:
  ports:
  - name: port
    port: 2379
    protocol: TCP
  type: ClusterIP
  clusterIP: None
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 22.22.3.231
    nodeName: etcd01
  - ip: 22.22.3.232
    nodeName: etcd02
  - ip: 22.22.3.233
    nodeName: etcd03
  ports:
  - name: port
    port: 2379
    protocol: TCP

Apply prometheus-service-etcd.yaml:

#kubectl apply -f prometheus-service-etcd.yaml

Find an etcd dashboard at https://grafana.com/dashboards, for example:
https://grafana.com/dashboards/3070


Download the JSON file and import it into Grafana, selecting prometheus as the data source.

View the dashboard.

  • A misconfigured Prometheus or ServiceMonitor can leave the pods prometheus-k8s-0 and prometheus-k8s-1 unhealthy, which makes the Prometheus UI unreachable; correcting the configuration restores them.
