A summary of Prometheus

Main components of Prometheus:
1. Prometheus server: actively pulls (scrapes) data from each target's metrics endpoint
2. Exporter: exposes metrics on behalf of applications that do not expose them natively
3. Pushgateway: accepts pushed data and exposes it as metrics for the server to scrape
4. PromQL: the query language used to read the data
5. TSDB: the local time-series database

The Prometheus server container listens on port 9090.
To install Prometheus manually, prepare the following:
1. A namespace
2. The Prometheus configuration file
3. A PVC for persistent storage
4. RBAC permissions
5. The manifest for the Prometheus Pod
6. A Service of type NodePort, or an Ingress, for the Prometheus server
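Item 3 above can be sketched as follows; the PVC name, namespace, and size are placeholders and depend on your cluster's storage setup:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data   # placeholder name
  namespace: kube-ops     # placeholder namespace
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi       # adjust to your retention needs
```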

The Prometheus configuration file has three parts:
global, rule_files, and scrape_configs.
The scrape_configs section controls which targets are monitored and may contain multiple job_name entries.
Each job_name represents one kind of monitored target; by default there is one job_name: 'prometheus', which monitors Prometheus itself.
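Putting the three parts together, a minimal prometheus.yml might look like this (the rule file path is a placeholder):

```yaml
global:
  scrape_interval: 15s      # how often to scrape targets
  evaluation_interval: 15s  # how often to evaluate rules
rule_files:
  - /etc/prometheus/rules/*.yaml   # placeholder path
scrape_configs:
  - job_name: 'prometheus'         # the default self-monitoring job
    static_configs:
      - targets: ['localhost:9090']
```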

Monitored targets in Kubernetes mainly fall into three groups:
1. Applications that expose a metrics endpoint natively
2. Applications that need an exporter to provide a metrics endpoint
Commonly used exporters include:
node exporter, haproxy exporter, mysql server exporter,
redis exporter, rabbitmq exporter
3. Applications that push their data to the Pushgateway
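For the third group, Prometheus scrapes the Pushgateway itself rather than the applications; a sketch of such a job (the target address is a placeholder), with honor_labels: true so the job/instance labels attached by the pushing clients are kept:

```yaml
- job_name: 'pushgateway'
  honor_labels: true   # keep job/instance labels set by the pushing clients
  static_configs:
  - targets: ['pushgateway.kube-ops.svc.cluster.local:9091']  # placeholder address
```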

Prometheus supports two ways of defining targets:
1. Static configuration
static_configs
Target format: svc-name.namespace-name.svc.cluster.local:port-number
2. Dynamic service discovery
kubernetes_sd_configs

Statically monitoring traefik:
traefik has a metrics endpoint, but it is disabled by default and must be enabled manually.
- job_name: 'traefik'
  static_configs:
    - targets: ['traefik-ingress-service.kube-system.svc.cluster.local:8080']
Statically monitoring redis:
redis-exporter runs as a sidecar in the same Pod as redis.
- job_name: 'redis'
  static_configs:
  - targets: ['redis.kube-ops.svc.cluster.local:9121']
  
In Kubernetes, Prometheus supports five service discovery roles:
node, service, pod, endpoints, and ingress
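The examples that follow use the node and endpoints roles; as a sketch of the pod role (the annotation name follows the common prometheus.io convention, the job name is arbitrary):

```yaml
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # keep only Pods annotated with prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
```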

Dynamically monitoring cluster nodes and the kubelet:
node-exporter runs as standalone Pods, deployed on every node by a DaemonSet.
The node-exporter container listens on port 9100.
Dynamically monitoring nodes (port 9100):
- job_name: 'kubernetes-nodes'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__address__]
    action: replace
    regex: '(.*):10250'
    replacement: '${1}:9100'
    target_label: __address__
Dynamically monitoring the kubelet (port 10255, the read-only port):
- job_name: 'kubernetes-kubelet'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__address__]
    action: replace
    regex: '(.*):10250'
    replacement: '${1}:10255'
    target_label: __address__

Dynamically monitoring containers with cAdvisor:
cAdvisor is built into the kubelet, so no separate installation is needed.
- job_name: 'kubernetes-cadvisor'
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

Dynamically monitoring the apiserver:
The Service corresponding to the apiserver is the built-in kubernetes Service in the default namespace.
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https

Dynamically monitoring ordinary Services:
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
The key point is the Service annotation prometheus.io/scrape: "true".
Dynamically monitoring redis:
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9121"
Dynamically monitoring traefik:
annotations:
  prometheus.io/scrape: "true"        
  prometheus.io/port: "8080" 

Using kube-state-metrics to monitor Kubernetes resource objects
such as Pods, DaemonSets, Deployments, Jobs, and CronJobs:
$ git clone https://github.com/kubernetes/kube-state-metrics.git
$ cd kube-state-metrics/kubernetes
$ kubectl create -f .
kube-state-metrics is then discovered automatically under the kubernetes-service-endpoints job and its metrics are scraped,
because its manifest kube-state-metrics-service.yaml defines the Service with the annotation prometheus.io/scrape: 'true'.
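A sketch of the relevant part of that Service definition (the port shown here is an assumption; check the manifests of the version you deploy):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  annotations:
    prometheus.io/scrape: 'true'   # picked up by the kubernetes-service-endpoints job
spec:
  ports:
  - name: http-metrics
    port: 8080                     # assumed port; may differ between versions
```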

The following covers the Prometheus Operator.
The key to building an Operator is the design of its CRDs (custom resources).
An Operator's core implementation rests on two Kubernetes concepts:
1. Resources: the desired-state definition of objects
2. Controllers: observe, analyze, and act to reconcile resources toward that state
The core of the Prometheus Operator is its controller, which creates five CRDs:
alertmanagers.monitoring.coreos.com
podmonitors.monitoring.coreos.com
prometheuses.monitoring.coreos.com
prometheusrules.monitoring.coreos.com
servicemonitors.monitoring.coreos.com
prometheuses corresponds to the Prometheus server itself.
servicemonitors can be thought of as the various exporters providing metrics endpoints;
each ServiceMonitor is backed by one or more Services.

$ git clone https://github.com/coreos/kube-prometheus.git
$ ls kube-prometheus/manifests/
00namespace-namespace.yaml
0prometheus-operator-0alertmanagerCustomResourceDefinition.yaml
0prometheus-operator-0podmonitorCustomResourceDefinition.yaml
0prometheus-operator-0prometheusCustomResourceDefinition.yaml
0prometheus-operator-0prometheusruleCustomResourceDefinition.yaml
0prometheus-operator-0servicemonitorCustomResourceDefinition.yaml
0prometheus-operator-clusterRoleBinding.yaml
0prometheus-operator-clusterRole.yaml
0prometheus-operator-deployment.yaml
0prometheus-operator-serviceAccount.yaml
0prometheus-operator-serviceMonitor.yaml
0prometheus-operator-service.yaml
alertmanager-alertmanager.yaml
alertmanager-secret.yaml
alertmanager-serviceAccount.yaml
alertmanager-serviceMonitor.yaml
alertmanager-service.yaml
grafana-dashboardDatasources.yaml
grafana-dashboardDefinitions.yaml
grafana-dashboardSources.yaml
grafana-deployment.yaml
grafana-serviceAccount.yaml
grafana-serviceMonitor.yaml
grafana-service.yaml
kube-state-metrics-clusterRoleBinding.yaml
kube-state-metrics-clusterRole.yaml
kube-state-metrics-deployment.yaml
kube-state-metrics-roleBinding.yaml
kube-state-metrics-role.yaml
kube-state-metrics-serviceAccount.yaml
kube-state-metrics-serviceMonitor.yaml
kube-state-metrics-service.yaml
node-exporter-clusterRoleBinding.yaml
node-exporter-clusterRole.yaml
node-exporter-daemonset.yaml
node-exporter-serviceAccount.yaml
node-exporter-serviceMonitor.yaml
node-exporter-service.yaml
prometheus-adapter-apiService.yaml
prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
prometheus-adapter-clusterRoleBindingDelegator.yaml
prometheus-adapter-clusterRoleBinding.yaml
prometheus-adapter-clusterRoleServerResources.yaml
prometheus-adapter-clusterRole.yaml
prometheus-adapter-configMap.yaml
prometheus-adapter-deployment.yaml
prometheus-adapter-roleBindingAuthReader.yaml
prometheus-adapter-serviceAccount.yaml
prometheus-adapter-service.yaml
prometheus-clusterRoleBinding.yaml
prometheus-clusterRole.yaml
prometheus-prometheus.yaml
prometheus-roleBindingConfig.yaml
prometheus-roleBindingSpecificNamespaces.yaml
prometheus-roleConfig.yaml
prometheus-roleSpecificNamespaces.yaml
prometheus-rules.yaml
prometheus-serviceAccount.yaml
prometheus-serviceMonitorApiserver.yaml
prometheus-serviceMonitorCoreDNS.yaml
prometheus-serviceMonitorEtcd.yaml
prometheus-serviceMonitorKubeControllerManager.yaml
prometheus-serviceMonitorKubelet.yaml
prometheus-serviceMonitorKubeScheduler.yaml
prometheus-serviceMonitor.yaml
prometheus-service.yaml

After installation, the following Pods exist:
$ kubectl get pods -n monitoring
NAME                                  READY     STATUS    RESTARTS   AGE
alertmanager-main-0                   2/2       Running   0          21h
alertmanager-main-1                   2/2       Running   0          21h
alertmanager-main-2                   2/2       Running   0          21h
grafana-df9bfd765-f4dvw               1/1       Running   0          22h
kube-state-metrics-77c9658489-ntj66   4/4       Running   0          20h
node-exporter-4sr7f                   2/2       Running   0          21h
node-exporter-9mh2r                   2/2       Running   0          21h
node-exporter-m2gkp                   2/2       Running   0          21h
prometheus-adapter-dc548cc6-r6lhb     1/1       Running   0          22h
prometheus-k8s-0                      3/3       Running   1          21h
prometheus-k8s-1                      3/3       Running   1          21h
prometheus-operator-bdf79ff67-9dc48   1/1       Running   0          21h
prometheus-k8s-* are the Prometheus server Pods, managed by a StatefulSet;
prometheus-operator-* is the Operator controller Pod.

The Services created:
kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager-main       ClusterIP   10.110.204.224   <none>        9093/TCP            23h
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,6783/TCP   23h
grafana                 ClusterIP   10.98.191.31     <none>        3000/TCP            23h
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP   23h
node-exporter           ClusterIP   None             <none>        9100/TCP            23h
prometheus-adapter      ClusterIP   10.107.201.172   <none>        443/TCP             23h
prometheus-k8s          ClusterIP   10.107.105.53    <none>        9090/TCP            23h
prometheus-operated     ClusterIP   None             <none>        9090/TCP            23h
prometheus-operator     ClusterIP   None             <none>        8080/TCP            23h
To access the UIs, change the grafana and prometheus-k8s Services to type NodePort, or create an Ingress for them.
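A sketch of an Ingress for grafana (the host name is a placeholder; the apiVersion depends on your cluster's Kubernetes version):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
spec:
  rules:
  - host: grafana.example.com   # placeholder host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
```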

After a successful Operator installation, the targets monitored correctly include:
alertmanager (9093)
kube-apiserver (6443)
kube-state-metrics (8443 and 9443)
kubelet (10255 or 10250)
node-exporter (9100)
prometheus-operator (8080)
prometheus (9090)
coredns (9153)
Targets not yet monitored correctly:
kube-controller-manager (10252)
kube-scheduler (10251)

Create a matching Service to fix kube-scheduler monitoring:
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
Here the label k8s-app: kube-scheduler matches the selector defined in its ServiceMonitor,
and the selector component: kube-scheduler matches the labels on its Pod.
Then edit /etc/kubernetes/manifests/kube-scheduler.yaml:
containers:
- command:
- kube-scheduler
- --leader-elect=true
- --kubeconfig=/etc/kubernetes/scheduler.conf
- --address=0.0.0.0    # change to 0.0.0.0 so the metrics port is reachable from outside the Pod

Fixing kube-controller-manager works the same way.
Create a matching Service for kube-controller-manager:
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP

The general steps for custom monitoring are:
1. Create a ServiceMonitor
2. Create a Service
Custom monitoring of etcd:
1. Create a Secret from etcd's three certificate files.
The certificate paths can be read from the etcd Pod definition:
$ kubectl get pod etcd-master -n kube-system -o yaml
--cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
$ kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt
2. Reference the Secret in the Prometheus resource named k8s:
$ vim prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    beta.kubernetes.io/os: linux
  replicas: 2
  secrets:
  - etcd-certs
3. Create an Endpoints object and a headless Service.
To monitor an etcd that runs outside the cluster, an Endpoints object must be created manually:
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 192.168.1.243
    nodeName: etc-master
  ports:
  - name: port
    port: 2379
    protocol: TCP
192.168.1.243 is the address of the host running etcd.
4. Create the ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd-k8s
spec:
  jobLabel: k8s-app
  endpoints:
  - port: port
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - kube-system
5. Edit etcd's manifest so that it listens on all interfaces:
$ vim /etc/kubernetes/manifests/etcd.yaml
- --listen-client-urls=https://0.0.0.0:2379

Implementing automatic service discovery with the Prometheus Operator:
1. Create a file containing the service discovery configuration:
$ vim prometheus-additional.yaml
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
2. Create a Secret from the file:
$ kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
3. Reference the Secret (via additionalScrapeConfigs) in the Prometheus resource named k8s:
$ vim prometheus-prometheus.yaml
or
$ kubectl edit prometheus k8s -n monitoring
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    beta.kubernetes.io/os: linux
  replicas: 2
  secrets:
  - etcd-certs
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml
  serviceAccountName: prometheus-k8s
4. Add the RBAC permissions needed for service discovery:
$ cat prometheus-clusterRole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch

Implementing data persistence for the Prometheus Operator:
1. Create a StorageClass object:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-data-db
provisioner: fuseim.pri/ifs
2. Use the StorageClass prometheus-data-db in the PVC template:
$ cat prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: prometheus-data-db
        resources:
          requests:
            storage: 10Gi
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    beta.kubernetes.io/os: linux
  replicas: 2
  secrets:
  - etcd-certs
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml
After the update, two PVCs and two PVs are created automatically, because the replica count is 2.

