[Prometheus Deployment and Troubleshooting]

Installation and Deployment

I. Create a StorageClass

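Prometheus needs persistent storage for its monitoring data. In this setup the PersistentVolumeClaims are created by the Helm chart rather than prepared by hand, so a StorageClass that provisions volumes dynamically is created first.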

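  1. Set up permissions for the storage provisioner
    Create nfs-client-provisioner-authority.yaml. Every namespace field in it must be changed to the namespace where the provisioner is deployed (monitor in this guide, matching the Deployment in step 2).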

     apiVersion: v1
     kind: ServiceAccount
     metadata:
       name: nfs-client-provisioner
       # replace with namespace where provisioner is deployed
       namespace: default      # must be changed
     ---
     kind: ClusterRole
     apiVersion: rbac.authorization.k8s.io/v1
     metadata:
       name: nfs-client-provisioner-runner
     rules:
       - apiGroups: [""]
         resources: ["persistentvolumes"]
         verbs: ["get", "list", "watch", "create", "delete"]
       - apiGroups: [""]
         resources: ["persistentvolumeclaims"]
         verbs: ["get", "list", "watch", "update"]
       - apiGroups: ["storage.k8s.io"]
         resources: ["storageclasses"]
         verbs: ["get", "list", "watch"]
       - apiGroups: [""]
         resources: ["events"]
         verbs: ["create", "update", "patch"]
     ---
     kind: ClusterRoleBinding
     apiVersion: rbac.authorization.k8s.io/v1
     metadata:
       name: run-nfs-client-provisioner
     subjects:
       - kind: ServiceAccount
         name: nfs-client-provisioner
         # replace with namespace where provisioner is deployed
         namespace: default           # must be changed
     roleRef:
       kind: ClusterRole
       name: nfs-client-provisioner-runner
       apiGroup: rbac.authorization.k8s.io
     ---
     kind: Role
     apiVersion: rbac.authorization.k8s.io/v1
     metadata:
       name: leader-locking-nfs-client-provisioner
       # replace with namespace where provisioner is deployed
       namespace: default         # must be changed
     rules:
       - apiGroups: [""]
         resources: ["endpoints"]
         verbs: ["get", "list", "watch", "create", "update", "patch"]
     ---
     kind: RoleBinding
     apiVersion: rbac.authorization.k8s.io/v1
     metadata:
       name: leader-locking-nfs-client-provisioner
       # replace with namespace where provisioner is deployed
       namespace: default         # must be changed
     subjects:
       - kind: ServiceAccount
         name: nfs-client-provisioner
         # replace with namespace where provisioner is deployed
         namespace: default        # must be changed
     roleRef:
       kind: Role
       name: leader-locking-nfs-client-provisioner
       apiGroup: rbac.authorization.k8s.io
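
The namespace edits described above can also be made at apply time instead of by hand. The following is a minimal sketch, assuming the file still contains the `namespace: default` placeholders and that the provisioner will live in the monitor namespace; `render_rbac` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: print the RBAC manifest with every
# "namespace: default" replaced by the target namespace.
render_rbac() {
  ns="$1"      # target namespace, e.g. monitor
  file="$2"    # path to nfs-client-provisioner-authority.yaml
  sed "s/namespace: default/namespace: ${ns}/" "$file"
}

# Usage (needs cluster access; the namespace must exist before applying):
#   kubectl create namespace monitor
#   render_rbac monitor nfs-client-provisioner-authority.yaml | kubectl apply -f -
```

Piping the rendered output into kubectl leaves the file on disk unchanged, so the same manifest can be reused for another namespace.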
    
  2. Create the NFS storage provisioner

    Create nfs-client-provisioner.yaml. The NFS server and its exported directory must be set up beforehand.

     apiVersion: apps/v1
     kind: Deployment
     metadata:
       name: nfs-client-provisioner
       namespace: monitor
     spec:
       replicas: 1
       strategy:
         type: Recreate
       selector:
         matchLabels:
           app: nfs-client-provisioner
       template:
         metadata:
           labels:
             app: nfs-client-provisioner
         spec:
           serviceAccountName: nfs-client-provisioner
           containers:
             - name: nfs-client-provisioner
               image: quay.io/external_storage/nfs-client-provisioner:latest
               volumeMounts:
                 - name: nfs-client-root
                   mountPath: /persistentvolumes
               env:
                 # Name of the storage provisioner
                 - name: PROVISIONER_NAME
                   value: nfs-provisioner
                 # NFS server address; set this to your own IP
                 - name: NFS_SERVER
                   value: 172.31.1.64
                 # Path of the NFS shared directory
                 - name: NFS_PATH
                   value: /home/nfs
           volumes:
             - name: nfs-client-root
               nfs:
                 # Set this to your own IP
                 server: 172.31.1.64
                 # Shared directory exported by the NFS server
                 path: /home/nfs
    
  3. Create the StorageClass

     # nfs-storage-class.yaml
     apiVersion: storage.k8s.io/v1
     kind: StorageClass
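     metadata:
       # StorageClass is cluster-scoped, so no namespace is set
       name: nfs-data

     # Name of the provisioner; must match
     # env.PROVISIONER_NAME.value in nfs-client-provisioner.yaml
     provisioner: nfs-provisioner

     # Allow PVCs to be expanded after creation
     allowVolumeExpansion: true

     parameters:
       # Deletion policy: "true" archives the data directory on the NFS share
       # when the PVC is deleted; "false" deletes the data
       archiveOnDelete: "true"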
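
Before installing Prometheus, it can help to confirm that the StorageClass actually provisions volumes. The sketch below prints a throwaway PVC manifest; the helper name `test_pvc_manifest`, the claim name `nfs-data-test`, and the 1Mi request are illustrative assumptions, and the monitor namespace is assumed to exist already.

```shell
# Hypothetical helper: print a small test PVC that requests storage
# from the given StorageClass (defaults to nfs-data).
test_pvc_manifest() {
  sc="${1:-nfs-data}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-data-test
  namespace: monitor
spec:
  storageClassName: ${sc}
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Mi
EOF
}

# Usage (needs cluster access):
#   test_pvc_manifest | kubectl apply -f -
#   kubectl -n monitor get pvc nfs-data-test    # STATUS should become Bound
#   test_pvc_manifest | kubectl delete -f -
```

If the claim stays in Pending, the first troubleshooting item below is the likely cause.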
    

II. Install Prometheus

For convenience, the installation uses Helm.

  1. Add the repo

     helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
     helm repo update prometheus-community
    
  2. Install Prometheus

     helm install prometheus prometheus-community/prometheus \
       -n monitor --create-namespace \
       --set server.persistentVolume.storageClass=nfs-data \
       --set alertmanager.persistence.storageClass=nfs-data \
       --set server.service.type=NodePort \
       --set alertmanager.service.type=NodePort \
       --set server.service.nodePort=31203 \
       --set alertmanager.service.nodePort=31757
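
After the install command returns, the release can be checked with a few read-only commands. This is a small sketch that assumes the release name prometheus and the monitor namespace from the command above; `check_prometheus` is a hypothetical wrapper name.

```shell
# Hypothetical helper: show the Helm release status plus the pods, PVCs
# and services created in the namespace used by the install command.
check_prometheus() {
  ns="${1:-monitor}"
  helm status prometheus -n "$ns"
  kubectl -n "$ns" get pods,pvc,svc
}

# Usage (needs cluster access):
#   check_prometheus monitor
```

With the NodePort values set above, the Prometheus server should be reachable on port 31203 of any node and Alertmanager on port 31757 once the pods are Running and the PVCs are Bound.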
    

Troubleshooting

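  1. The prometheus-server PVC stays in Pending, or its events show: Waiting for a volume to be created either by the external provisioner 'fuseim.pri/ifs' or manually by the system administrator.

    Cause: starting with Kubernetes 1.20, SelfLink is no longer populated by default (a change made for performance and to unify how the apiserver is called), and it was removed entirely in later releases. The provisioner image used here depends on SelfLink, so PVCs cannot be provisioned automatically.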

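    Fix: switch the NFS provisioner to an image that does not need SelfLink. The quay.io/external_storage/nfs-client-provisioner:latest image used in the Deployment above is the legacy one that relies on SelfLink; its successor is nfs-subdir-external-provisioner, for example registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2. Alternatively, re-enable SelfLink (not recommended; this relies on the RemoveSelfLink=false feature gate of kube-apiserver, which is no longer available in newer releases).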
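
Changing the image does not require editing and reapplying the whole manifest. The sketch below assumes the Deployment and container are both named nfs-client-provisioner in the monitor namespace, as in the manifest above, and that registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2 is a suitable SelfLink-free image for your cluster; `switch_provisioner_image` is a hypothetical helper name.

```shell
# Hypothetical helper: point the provisioner Deployment at another image
# and wait for the rollout to finish.
switch_provisioner_image() {
  image="$1"            # replacement provisioner image
  ns="${2:-monitor}"    # namespace of the provisioner Deployment
  kubectl -n "$ns" set image deployment/nfs-client-provisioner \
    nfs-client-provisioner="$image"
  kubectl -n "$ns" rollout status deployment/nfs-client-provisioner
}

# Usage (needs cluster access):
#   kubectl -n monitor logs deployment/nfs-client-provisioner   # look for selfLink errors first
#   switch_provisioner_image registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
#   kubectl -n monitor get pvc                                   # pending claims should become Bound
```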

  2. On the Prometheus status/target_health page, every metrics-related endpoint returns 403, and no pod-level data can be collected

    Diagnosis:

    First, run the following against the cluster:

     kubectl top pod
    

    If it returns error: Metrics API not available, metrics-server is not installed.

    Fix: install metrics-server.
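
One way to install it is from the manifest published with the metrics-server releases. This is a minimal sketch, assuming the cluster can reach GitHub and that the manifest deploys a Deployment named metrics-server into kube-system (the upstream default); `install_metrics_server` is a hypothetical wrapper name.

```shell
# Hypothetical helper: apply the upstream metrics-server manifest
# and wait until its Deployment is rolled out.
install_metrics_server() {
  kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
  kubectl -n kube-system rollout status deployment/metrics-server
}

# Usage (needs cluster access):
#   install_metrics_server
#   kubectl top pod -n monitor    # should now print CPU and memory usage
```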
