Week 11 Notes

I. Kubernetes service discovery with a binary-deployed Prometheus

In this example, the Prometheus Server outside the Kubernetes cluster is at 172.23.1.12.

Create a service-discovery account named prometheus in the monitoring namespace and grant it the required permissions.

1. Create the user and grant permissions

This manifest creates a ServiceAccount named prometheus in the Kubernetes cluster, which the Prometheus monitoring tool will later use to access the Kubernetes API and collect monitoring data. It also creates the corresponding permission objects and binding:

  • A ClusterRole named prometheus, granting get, list, and watch on the nodes, services, endpoints, pods, and nodes/proxy resources in the core API group, and on ingresses in the extensions API group. It additionally grants get on configmaps and nodes/metrics, as well as on the non-resource URL /metrics.
  • A ClusterRoleBinding named prometheus, binding the ClusterRole above to the prometheus ServiceAccount created earlier. The ClusterRoleBinding ties the ServiceAccount and the ClusterRole together, so the ServiceAccount holds the permissions defined in the ClusterRole.
root@deploy:/yaml/promethrus-case# cat case4-prom-rbac.yaml 
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring

---
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: monitoring-token
  namespace: monitoring
  annotations:
    kubernetes.io/service-account.name: "prometheus"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
    - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
#apiVersion: rbac.authorization.k8s.io/v1beta1
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
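
Apply the manifest and spot-check the result; kubectl auth can-i verifies the granted permissions without having to run Prometheus first:

kubectl apply -f case4-prom-rbac.yaml
kubectl get sa prometheus -n monitoring
# should answer "yes" once the ClusterRoleBinding is in place
kubectl auth can-i list pods --as=system:serviceaccount:monitoring:prometheus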

2. Retrieve the token

root@deploy:/yaml/promethrus-case# kubectl get secrets -n monitoring 
NAME               TYPE                                  DATA   AGE
monitoring-token   kubernetes.io/service-account-token   3      14m
root@deploy:/yaml/promethrus-case# kubectl describe secrets monitoring-token -n monitoring 
Name:         monitoring-token
Namespace:    monitoring
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: prometheus
              kubernetes.io/service-account.uid: cff3f380-8d51-4d25-a71b-1d6d5cf1a39c

Type:  kubernetes.io/service-account-token

Data
====
namespace:  10 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6Ik4zUTREdWdUMUp5Wk9KSmczbnBFdUk3eXVHYW53THRQVFpsSzhsbVcyS2MifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJtb25pdG9yaW5nIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im1vbml0b3JpbmctdG9rZW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImNmZjNmMzgwLThkNTEtNGQyNS1hNzFiLTFkNmQ1Y2YxYTM5YyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDptb25pdG9yaW5nOnByb21ldGhldXMifQ.g8SAI74UbRs7wUN-2xKWsO_G_3grvBjXCSsGk2_7Te5W2No0jXkD4g57ofWFnLYI7QKQE9XfiE2cn3X0Rq8RJdqBQrZWXBc1jubDViv71ktDGeHtooJFeul4v9IXn5y2wowhl3VLGDEtMyXTb7bk8E6Q5akTupsJ_aw_DtAsuLiVEX51Ldl8FBrXXB453xyCyKWgcSv5dW5J7BJ4wrWZHAIaYXx7QNmF88wennsx5RXeTZ41o378zSfTc0yVKUbSggU-9_kkROdESKbqwGG7zhaWGvOA_OHaKI9ULfMr-Q-Uqw5BMJEs313m_fU4lozHNcSVU9AJexTqn1toW06j3w
ca.crt:     1302 bytes
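
The token can also be extracted non-interactively with a standard kubectl jsonpath query (the secret stores it base64-encoded):

kubectl get secret monitoring-token -n monitoring -o jsonpath='{.data.token}' | base64 -d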

Save the token to the k8s.token file on the Prometheus Server node; it will be used later for authentication.

root@prometheus-server:~# vim /apps/prometheus/k8s.token
eyJhbGciOiJSUzI1NiIsImtpZCI6Ik4zUTREdWdUMUp5Wk9KSmczbnBFdUk3eXVHYW53THRQVFpsSzhsbVcyS2MifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJtb25pdG9yaW5nIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im1vbml0b3JpbmctdG9rZW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImNmZjNmMzgwLThkNTEtNGQyNS1hNzFiLTFkNmQ1Y2YxYTM5YyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDptb25pdG9yaW5nOnByb21ldGhldXMifQ.g8SAI74UbRs7wUN-2xKWsO_G_3grvBjXCSsGk2_7Te5W2No0jXkD4g57ofWFnLYI7QKQE9XfiE2cn3X0Rq8RJdqBQrZWXBc1jubDViv71ktDGeHtooJFeul4v9IXn5y2wowhl3VLGDEtMyXTb7bk8E6Q5akTupsJ_aw_DtAsuLiVEX51Ldl8FBrXXB453xyCyKWgcSv5dW5J7BJ4wrWZHAIaYXx7QNmF88wennsx5RXeTZ41o378zSfTc0yVKUbSggU-9_kkROdESKbqwGG7zhaWGvOA_OHaKI9ULfMr-Q-Uqw5BMJEs313m_fU4lozHNcSVU9AJexTqn1toW06j3w
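
Before wiring the token into Prometheus, a quick sanity check against the API server (the address used in the scrape configs below) confirms the RBAC setup works:

curl -sk -H "Authorization: Bearer $(cat /apps/prometheus/k8s.token)" https://172.23.0.11:6443/api/v1/nodes | head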

3. Add the jobs

Add the job configuration below to the Prometheus configuration file /apps/prometheus/prometheus.yml.

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]
    
    # API server endpoint discovery
  - job_name: "kubernetes-apiserver"
    kubernetes_sd_configs:
    - role: endpoints
      api_server: https://172.23.0.11:6443
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /apps/prometheus/k8s.token
    scheme: https
   # tls_config: 
   #   insecure_skip_verify: true
   # bearer_token_file: /apps/prometheus/k8s.token
    relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https
      
    # Custom relabeling: rewrite the discovered port, scheme, etc.
    - source_labels: [__address__]
      regex: '(.*):6443'
      replacement: '${1}:9100'
      target_label: __address__
      action: replace
      
    - source_labels: [__scheme__]
      regex: https
      replacement: http
      target_label: __scheme__
      action: replace
  
  # Node discovery
  - job_name: 'kubernetes-node-monitor'
    scheme: http
    tls_config: 
      insecure_skip_verify: true
    bearer_token_file: /apps/prometheus/k8s.token      
    kubernetes_sd_configs:
    - role:  node
      api_server: https://172.23.0.11:6443
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /apps/prometheus/k8s.token 
    relabel_configs:
    - source_labels: [__address__]
      regex: '(.*):10250'
      replacement: '${1}:9100'
      target_label: __address__
      action: replace
    - source_labels: [__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_region] 
      regex: '(.*)'  
      replacement: '${1}'
      action: replace
      target_label: LOC        
  
    - source_labels: [__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_region] 
      regex: '(.*)'    
      replacement: 'NODE'
      action: replace
      target_label: Type   

    - source_labels: [__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_region] 
      regex: '(.*)'    
      replacement: 'k8s-test'
      action: replace
      target_label: Env
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
      

    # Pods in specific namespaces
  - job_name: 'k8s-发现指定namespace的所有Pod'
    kubernetes_sd_configs:
    - role: pod
      api_server: https://172.23.0.11:6443
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /apps/prometheus/k8s.token
      namespaces:
        names:
        - magedu
        - monitoring
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
      
      
  # Pod discovery with filter conditions
  - job_name: 'k8s-指定发现条件的Pod'
    kubernetes_sd_configs:
    - role: pod
      api_server: https://172.23.0.11:6443
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /apps/prometheus/k8s.token
    relabel_configs:
    
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]        
      action: keep
      regex: true
      # The source label __meta_kubernetes_pod_annotation_prometheus_io_scrape holds the value of the
      # prometheus.io/scrape annotation on the Pod, a boolean string saying whether the Pod should be
      # scraped. Pods whose annotation is "true" are kept; all other Pods are dropped at this step.
      # Because relabel rules run in order, the rules below only apply to targets that survive this keep.
      
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
    - source_labels: [__meta_kubernetes_pod_label_pod_template_hash] 
      regex: '(.*)'    
      replacement: 'k8s-test'
      action: replace
      target_label: Env
    - source_labels: [__meta_kubernetes_pod_ip]
      action: replace
      target_label: pod_ip        

Reload the configuration. This can be run from any node, as long as it can reach the Prometheus server over the network.

This requires that the --web.enable-lifecycle flag has been added to the Prometheus startup parameters in /etc/systemd/system/prometheus.service.

curl -X POST http://172.23.1.12:9090/-/reload
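
Before reloading, it is worth validating the edited file with promtool so a syntax error does not break scraping; assuming the standard tarball layout, promtool sits next to the prometheus binary:

/apps/prometheus/promtool check config /apps/prometheus/prometheus.yml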

4. Verify the data

[Screenshot 1]

[Screenshot 2]

5. static_configs

The experiments from this point on use the Prometheus Server outside the k8s cluster (172.23.1.12).

1. Prometheus configuration file

root@prometheus-server:/apps/prometheus# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "prometheus-k8s-node"
    static_configs:
      - targets: ["172.23.0.20:9100","172.23.0.21:9100","172.23.0.11:9100"]
  - job_name: "prometheus-worknode"
    static_configs:
      - targets: ["172.23.1.11:9100","172.23.1.13:9100"]

  - job_name: "prometheus-cadvisor"
    static_configs:
      - targets: ["172.23.0.10:8080","172.23.0.20:8080","172.23.0.21:8080"]


II. Prometheus service discovery via consul and file

1. consul_sd_configs

Official site: Consul by HashiCorp

Consul is a distributed key/value data store cluster, commonly used today for service registration and discovery.

1. Deploy the consul cluster

Binary executables: Consul Versions | HashiCorp Releases

Environment: node01-192.168.0.122, node02-192.168.0.123, node03-192.168.0.124, with node01 acting as the cluster Leader.

# Install consul on all nodes
root@consul-node01:/usr/local/src# unzip consul_1.15.1_linux_amd64.zip 
Archive:  consul_1.15.1_linux_amd64.zip
  inflating: consul                  
  
root@consul-node01:/usr/local/src# cp consul /usr/local/bin/
root@consul-node01:/usr/local/src# consul -v
Consul v1.15.1
Revision 7c04b6a0
Build Date 2023-03-07T20:35:33Z
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

# Create the data directory
root@consul-node01:/usr/local/src# mkdir -p /data/consul
# Parameters
consul agent -server  # run consul in server mode
-bootstrap  # bootstrap mode for the initial deployment
-bind  # listen address for cluster communication
-client  # listen address for client access
-data-dir  # path where data is stored
-ui  # enable the built-in static web UI server, i.e. lets you log in through a browser
-node  # name of this node, must be unique within the cluster
-datacenter=dc1  # datacenter name, defaults to dc1
-join  # join an existing consul environment

Start the service

# node01
root@consul-node01:~# nohup consul agent -server -bootstrap -bind=192.168.0.122 -client=192.168.0.122 -data-dir=/data/consul -ui -node=192.168.0.122 &
[1] 3114

[Screenshot 3]

# Join node02 to the cluster
root@consul-node02:~# nohup consul agent -bind=192.168.0.123 -client=192.168.0.123 -data-dir=/data/consul -node=192.168.0.123 -join=192.168.0.122 &
[1] 31855

# Join node03 to the cluster
root@consul-node03:~# nohup consul agent -bind=192.168.0.124 -client=192.168.0.124 -data-dir=/data/consul -node=192.168.0.124 -join=192.168.0.122 &
[1] 32195
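
nohup is fine for a quick lab, but for anything longer-lived you would normally run consul under systemd. A minimal sketch for node01 (the unit path and flags simply mirror the command above; adjust the addresses per node):

cat > /etc/systemd/system/consul.service <<'EOF'
[Unit]
Description=Consul server
After=network-online.target

[Service]
ExecStart=/usr/local/bin/consul agent -server -bootstrap -bind=192.168.0.122 -client=192.168.0.122 -data-dir=/data/consul -ui -node=192.168.0.122
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload && systemctl enable --now consul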

2. Verify the cluster

Check the logs. nohup.out is written to whichever directory you ran the command from:

root@consul-node01:~# tail -f nohup.out 
2023-03-09T15:18:48.578+0800 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: 192.168.0.123 192.168.0.123
2023-03-09T15:18:48.579+0800 [INFO]  agent.server: member joined, marking health alive: member=192.168.0.123 partition=default
2023-03-09T15:19:26.034+0800 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: 192.168.0.124 192.168.0.124
2023-03-09T15:19:26.035+0800 [INFO]  agent.server: member joined, marking health alive: member=192.168.0.124 partition=default
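
Cluster membership can also be confirmed from the CLI; -http-addr points at the client address configured above:

consul members -http-addr=http://192.168.0.122:8500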

[Screenshot 4]

Log in to the web UI to check: http://192.168.0.122:8500/

[Screenshot 5]

3. Test writing data

Write data through the consul API to register the services under Services. This assumes node-exporter is already running, so first deploy node-exporter on the three nodes.

# This can be run from any server, as long as it can reach this example's Leader.
# Register node01 and node02 first

curl -X PUT -d '{"id": "node-exporter122","name": "node-exporter122","address": "192.168.0.122","port": 9100,"tags": ["node-exporter"],"checks": [{"http": "http://192.168.0.122:9100/","interval": "5s"}]}' http://192.168.0.122:8500/v1/agent/service/register

curl -X PUT -d '{"id": "node-exporter123","name": "node-exporter123","address": "192.168.0.123","port": 9100,"tags": ["node-exporter"],"checks": [{"http": "http://192.168.0.123:9100/","interval": "5s"}]}' http://192.168.0.122:8500/v1/agent/service/register

How do you delete a registered service?

# This removes the service named "node-exporter122" from Consul. Note that the deletion is permanent and cannot be undone. If you are unsure, back up your data first.

curl -X PUT http://192.168.0.122:8500/v1/agent/service/deregister/node-exporter122

4. Verify the data in consul

[Screenshot 6]

5. Configure Prometheus to discover services via consul

1. Key configuration fields

static_configs:  # statically configured targets
consul_sd_configs:  # consul-based service discovery
relabel_configs:  # relabeling rules
services: []  # an empty list matches all services in consul
2. Prometheus configuration file inside the k8s cluster

Edit the Prometheus configmap file case3-1-prometheus-cfg.yaml and append the configuration below.

    - job_name: 'consul'
      honor_labels: true
      metrics_path: /metrics
      scheme: http
      consul_sd_configs:
        - server: 192.168.0.122:8500
          services: []  # names of the services to discover; empty means all services, or list specific names (e.g. node-exporter122 and node-exporter123 in this example)
        - server: 192.168.0.123:8500
          services: []
        - server: 192.168.0.124:8500
          services: []
      relabel_configs:
      - source_labels: [
