Installing Prometheus with Helm: A Detailed Walkthrough and Troubleshooting

Helm install prometheus


kubectl create ns monitor

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo prometheus

# prometheus
helm show values prometheus-community/prometheus > prometheus.yaml
helm install prometheus prometheus-community/prometheus  -f prometheus.yaml -n monitor
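
The install above can also be written as one repeatable step. The following is a minimal sketch, assuming Helm 3; the function name deploy_prometheus is hypothetical, while the release name, chart, namespace, and values file are the ones used above. helm upgrade --install installs the release when it is absent and upgrades it otherwise, and --create-namespace covers the case where the monitor namespace does not exist yet.

```shell
# Hypothetical wrapper: install or upgrade the release in one idempotent step.
deploy_prometheus() {
  values_file="${1:-prometheus.yaml}"
  helm upgrade --install prometheus prometheus-community/prometheus \
    --namespace monitor --create-namespace \
    -f "$values_file"
}

# Usage (needs access to the cluster):
#   deploy_prometheus prometheus.yaml
```

Re-running the function after editing prometheus.yaml applies the change as an upgrade, so no uninstall is needed between attempts.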

Uninstall

helm uninstall prometheus -n monitor

Helm install ingress

1.  Add the ingress-nginx Helm repository
# helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
# helm search repo ingress-nginx
# Use a release whose APP VERSION is greater than 0.4.2
2.  Download the ingress-nginx chart locally
# helm pull ingress-nginx/ingress-nginx --version 3.6.0
3.  Edit the configuration
tar xvf ingress-nginx-3.6.0.tgz
cd ingress-nginx
vim values.yaml
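4.  Settings to change
a)  Image addresses for the controller and admissionWebhook: sync the public images to your company's internal registry. If your version differs from the one in this document and you have to sync the gcr images yourself, search online for how to mirror gcr images through Alibaba Cloud, or see https://blog.csdn.net/weixin_39961559/article/details/80739352
or https://blog.csdn.net/sinat_35543900/article/details/103290782.
Change repository to registry.cn-beijing.aliyuncs.com/dotbalo/controller and comment out the hash (digest) value;
    Alternative image address for the controller and admissionWebhook:
image:
    registry: registry.aliyuncs.com  # change the image registry address
    image: google_containers/nginx-ingress-controller  # change the repository and image name
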
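b)  Comment out the image hash (digest);
c)  Set hostNetwork to true;
d)  Set dnsPolicy to ClusterFirstWithHostNet;
e)  Add ingress: "true" under nodeSelector so the controller is deployed to designated nodes;
f)  The default kind is Deployment; change it to kind: DaemonSet;
g)  type defaults to LoadBalancer (use this in cloud environments); change it to ClusterIP;
h)  Adjust requests to match your production environment;
i)  Adjust admissionWebhooks to match your production environment.
    With an APP VERSION greater than 0.4.2, its enabled setting does not need to be changed;
j)  Change the admissionWebhooks image to registry.cn-beijing.aliyuncs.com/dotbalo/kube-webhook-certgen
    (for an alternative address, see the alternative listed under item a).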
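
Instead of editing values.yaml in place, the changes in items a) to j) can be kept in a small override file. This is only a sketch: the file name ingress-values.yaml is hypothetical, the key paths assume the layout of the 3.x chart's values.yaml (check each one against the file you unpacked), and the request sizes are placeholder numbers.

```shell
# Write the overrides to a separate file (hypothetical name).
cat > ingress-values.yaml <<'EOF'
controller:
  image:
    repository: registry.cn-beijing.aliyuncs.com/dotbalo/controller
    digest: ""        # items a, b: assumes the chart skips an empty digest
  hostNetwork: true                    # item c
  dnsPolicy: ClusterFirstWithHostNet   # item d
  nodeSelector:
    ingress: "true"                    # item e
  kind: DaemonSet                      # item f
  service:
    type: ClusterIP                    # item g
  resources:
    requests:                          # item h: placeholder sizes
      cpu: 100m
      memory: 90Mi
  admissionWebhooks:
    patch:
      image:
        repository: registry.cn-beijing.aliyuncs.com/dotbalo/kube-webhook-certgen  # item j
EOF

# Install from the unpacked chart directory with the overrides applied:
#   helm install ingress-nginx -n ingress-nginx -f ingress-values.yaml .
```

Keeping the overrides separate leaves the chart's own values.yaml untouched, which makes it easier to see what was changed when moving to another chart version.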
 
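5.  Deploy ingress-nginx
Label the nodes that should run the ingress controller.
  Create a namespace named ingress-nginx:
# kubectl create ns ingress-nginx
  List all namespaces and confirm that ingress-nginx was created:
# kubectl get ns
  List the nodes:
# kubectl get node
  Label the node that will run the controller, here k8s-master03:
# kubectl label node k8s-master03 ingress=true
node/k8s-master03 labeled
  Install the chart from the current directory (note the trailing dot):
# helm install ingress-nginx -n ingress-nginx .
  How fast the image pulls depends on the registry; the Alibaba Cloud mirrors are fast from within China. Re-run the command below until the pod shows READY 1/1 and STATUS Running:
[root@k8s-master01 ingress-nginx]# kubectl get pod -n ingress-nginx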
 
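6.  Move the ingress controller to a worker node. The ingress controller should not run on a master node, so follow the steps from the video to deploy it to a worker node instead. In production, run at least three ingress controllers, preferably on dedicated nodes.
kubectl label node k8s-node01 ingress=true
kubectl label node k8s-master03 ingress-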
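
Because the controller runs as a DaemonSet with a nodeSelector, its pods follow the ingress=true label. The two label commands can be wrapped as below; this is a sketch, and both function names are hypothetical. Labeling the new node before unlabeling the old one keeps at least one node eligible throughout.

```shell
# Hypothetical helper: move the ingress=true label from one node to another.
move_ingress_label() {
  from_node="$1"
  to_node="$2"
  kubectl label node "$to_node" ingress=true --overwrite &&
    kubectl label node "$from_node" ingress-
}

# Hypothetical helper: list the nodes currently selected for the controller.
ingress_nodes() {
  kubectl get nodes -l ingress=true -o name
}

# Usage (needs access to the cluster):
#   move_ingress_label k8s-master03 k8s-node01
#   ingress_nodes
```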

Configuring and using ingress-nginx


[root@vm2 ~]# cat ingress_alertmanager.yaml 
apiVersion: networking.k8s.io/v1
kind: Ingress  
metadata:
  name: alertmanager-ingress
  namespace: monitor
spec:
  ingressClassName: nginx
  rules:
  - host: "alertmanager.test.com"
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: prometheus-alertmanager
            port:
              number: 9093
[root@vm2 ~]# 
[root@vm2 ~]# cat ingress_prometheus.yaml 
apiVersion: networking.k8s.io/v1
kind: Ingress  
metadata:
  name: prometheus-ingress
  namespace: monitor
spec:
  ingressClassName: nginx
  rules:
  - host: "prometheus.test.com"
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: prometheus-server
            port:
              number: 80
Apply the manifests:
kubectl apply -f ingress_alertmanager.yaml
kubectl apply -f ingress_prometheus.yaml
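
To check an Ingress rule without touching DNS or /etc/hosts, the request can be sent straight to a node that runs the controller with the Host header set by hand. This is a sketch under assumptions: the function name check_ingress is hypothetical, the controller is assumed to listen on port 80 of the node (hostNetwork is enabled above), and the node IP is whatever your labeled node uses. Both Prometheus and Alertmanager expose a /-/healthy endpoint, which is used as the default path.

```shell
# Hypothetical helper: request a host through the ingress controller on a node
# and print the HTTP status code.
check_ingress() {
  host="$1"     # host name from the Ingress rule
  node_ip="$2"  # IP of a node labeled ingress=true (assumption: port 80)
  path="${3:-/-/healthy}"
  curl -s -o /dev/null -w '%{http_code}' -H "Host: $host" "http://$node_ip$path"
}

# Usage:
#   check_ingress prometheus.test.com <node-ip>
#   check_ingress alertmanager.test.com <node-ip>
```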


Prometheus configuration

[root@vm2 ~]# kubectl -n monitor get cm
NAME                      DATA   AGE
prometheus-alertmanager   1      18h
prometheus-server         6      18h


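Add a scrape job under scrape_configs in the ConfigMap's prometheus.yml, aligned with the existing jobs:
kubectl -n monitor edit cm prometheus-server
.
.
    - job_name: node-instance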
      honor_timestamps: true
      scrape_interval: 1m
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: http
      static_configs:
      - targets:
        - 192.168.1x.11:9100
        - 192.168.1x.16:9100
   

Verify
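
One way to check from the command line is to ask Prometheus for its targets through the HTTP API. The sketch below is hedged: the function name target_health is hypothetical, the base URL assumes the prometheus.test.com Ingress defined earlier resolves from where you run it, and the edited ConfigMap is assumed to have been reloaded already (the chart ships a config reloader sidecar, but it may take a minute).

```shell
# Hypothetical helper: count targets by health as reported by /api/v1/targets.
target_health() {
  base_url="$1"  # e.g. http://prometheus.test.com
  curl -s "$base_url/api/v1/targets" |
    grep -o '"health":"[a-z]*"' |
    sort | uniq -c
}

# Usage:
#   target_health http://prometheus.test.com
```

The new node-instance targets should be counted under "up" once the node exporters on port 9100 are reachable.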


Troubleshooting

pod ImagePullBackOff

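kubectl describe pod prometheus-kube-state-metrics-xxxx -n monitor
kubectl edit pod prometheus-kube-state-metrics-xxxx -n monitor
As before, find an equivalent image on Docker Hub, then use kubectl edit pod to swap it in:
replace k8s.gcr.io/kube-state-metrics/kube-state-metrics with docker.io/dyrnq/kube-state-metrics:v2.3.0
An edit made directly on the pod is lost when the pod is recreated; set the image on the Deployment or in prometheus.yaml to make the change stick.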
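
A variant that survives pod restarts is to set the image on the Deployment rather than on the pod. This is a sketch: the function name use_mirror_image is hypothetical, and the Deployment name prometheus-kube-state-metrics is an assumption derived from the pod name above. The "*" in kubectl set image applies the image to every container in the pod template.

```shell
# Hypothetical helper: point the Deployment at a mirrored image and wait for
# the rollout to finish.
use_mirror_image() {
  image="${1:-docker.io/dyrnq/kube-state-metrics:v2.3.0}"
  kubectl -n monitor set image deployment/prometheus-kube-state-metrics "*=$image" &&
    kubectl -n monitor rollout status deployment/prometheus-kube-state-metrics
}

# Usage (needs access to the cluster):
#   use_mirror_image
```

A later helm upgrade would put the chart's image back, so the same image should also be recorded in prometheus.yaml.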

Pod stuck in Pending

  • Troubleshooting approach:

    1. Use logs or describe to locate the problem
 # kubectl describe pod prometheus-server-6d4664d595-pch8q -n monitor
.
.
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  117s (x619 over 15h)  default-scheduler  0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
  2. The error "pod has unbound immediate PersistentVolumeClaims" means the pod's PVC has not been bound to a PV.
  3. Check the PVCs in the namespace:

     [root@vm2 ~]# kubectl get pvc -n monitor
    NAME                                STATUS    VOLUME                    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    prometheus-server                   Pending                                                                      15h
    storage-prometheus-alertmanager-0   Bound     pv-volume-alertmanager1   2Gi        RWO                           15h
    
    
    4. The prometheus-server PVC is Pending because no PV is bound to it, so a PV has to be created.
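
When a namespace holds many claims, the Pending ones can be filtered out of the kubectl get pvc listing shown above. A minimal sketch; the function name pending_pvcs is hypothetical, and it relies on STATUS being the second column of the default output.

```shell
# Hypothetical helper: print the names of PVCs whose STATUS is Pending.
pending_pvcs() {
  namespace="$1"
  kubectl get pvc -n "$namespace" --no-headers |
    awk '$2 == "Pending" { print $1 }'
}

# Usage (needs access to the cluster):
#   pending_pvcs monitor
```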

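  5. Look up the settings the PV has to match
Use: helm show values prometheus-community/prometheus

The values were already redirected to the local prometheus.yaml file earlier, so they can be read directly from that file. The relevant settings (access modes, size) are under server.persistentVolume.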

  6. Create a PV

    
    [root@vm2 ~]# cat prometheus_pv.yaml 
    # PersistentVolumes are cluster-scoped, so metadata has no namespace
    kind: PersistentVolume
    apiVersion: v1
    metadata:
      name: pv-volume-prometheus
      labels:
        type: local
    spec:
      capacity:
        storage: 8Gi
      accessModes:
        - ReadWriteOnce
      hostPath:
        path: "/home/pv/prometheus/prometheus-server"
Remember to create this directory on the node.
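
The two remaining actions, creating the directory and creating the PV, might look like the sketch below. Both function names are hypothetical; prepare_pv_dir is meant to run on the node that backs the hostPath, create_pv wherever kubectl is configured. The manifest name and PV name are the ones from the listing above.

```shell
# Hypothetical helper, run on the node that backs the hostPath volume.
prepare_pv_dir() {
  dir="${1:-/home/pv/prometheus/prometheus-server}"
  mkdir -p "$dir"
}

# Hypothetical helper: create the PV, then print its phase
# (Available until a claim binds it, then Bound).
create_pv() {
  manifest="${1:-prometheus_pv.yaml}"
  kubectl apply -f "$manifest" &&
    kubectl get pv pv-volume-prometheus -o jsonpath='{.status.phase}'
}

# Usage:
#   prepare_pv_dir          # on the node
#   create_pv               # on the machine with cluster access
```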
  7. Verify

CrashLoopBackOff

  1. Inspect the pod with describe

  2. The description is not specific enough, so the container logs have to be checked on the node (apparently vm3)

# On vm3
docker ps -a  # look for the exited container

# View its logs
docker logs xxxxID
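
The same logs can usually be read without logging in to the node, since kubectl logs --previous returns the output of the last terminated instance of a container. A sketch, with a hypothetical function name; the container name has to be supplied because the server pod runs more than one container, and the name used in the usage line is an assumption.

```shell
# Hypothetical helper: show the tail of the logs from a container's previous
# (crashed) instance.
crash_logs() {
  pod="$1"
  container="$2"
  namespace="${3:-monitor}"
  kubectl logs "$pod" -n "$namespace" -c "$container" --previous --tail=50
}

# Usage (container name is an assumption; list the real ones with describe):
#   crash_logs prometheus-server-6d4664d595-pch8q prometheus-server
```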

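  3. The cause appears to be a permissions problem

    msg="Error opening query log file" file=/data/queries.active err="open /data/queries.active: permission denied"

Permissions problem: the monitoring stack is built on kube-prometheus. In the Prometheus image, the file /prometheus/queries.active is owned by user 1000, while the current NFS path prometheus-k8s-db-prometheus-k8s-0 is owned by root (a permissions risk), so the write fails.

Change the permissions of the PV path to 777 so that the user with UID 1000 inside the pod can also operate on the files.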
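
Mode 777 opens the directory to every user on the node. A narrower alternative, offered here as a sketch rather than the method used in this article, is to hand the directory to the UID the container runs as. The function name fix_pv_owner is hypothetical, and the UID must be taken from your own pod rather than assumed.

```shell
# Hypothetical helper, run on the node (or NFS server) that holds the PV path.
# Changing ownership to another user normally requires root.
fix_pv_owner() {
  dir="$1"
  uid="$2"   # UID the Prometheus container runs as
  chown -R "$uid" "$dir" && chmod 750 "$dir"
}

# The UID can be read from the pod spec (prints nothing if none is set):
#   kubectl get pod <pod> -n monitor -o jsonpath='{.spec.securityContext.runAsUser}'
# Usage:
#   fix_pv_owner /home/pv/prometheus/prometheus-server <uid>
```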

  4. Verify

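Error: INSTALLATION FAILED: chart requires kubeVersion: >=1.20.0-0 which is incompatible with Kubernetes v1.19.8

Installing ingress-nginx failed because the cluster's Kubernetes version is too old for the latest chart. Pulling an older chart version avoids the problem:
helm pull ingress-nginx/ingress-nginx --version 3.6.0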
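
Before pulling a chart version, its Kubernetes requirement can be read from the chart metadata. A minimal sketch with a hypothetical function name; it prints nothing if the chart declares no kubeVersion.

```shell
# Hypothetical helper: print the kubeVersion constraint declared by one
# version of a chart.
chart_kube_version() {
  chart="$1"
  version="$2"
  helm show chart "$chart" --version "$version" | grep '^kubeVersion:'
}

# List every published version of the chart:
#   helm search repo ingress-nginx/ingress-nginx --versions
# Usage:
#   chart_kube_version ingress-nginx/ingress-nginx 3.6.0
```

Comparing the printed constraint with the server version from kubectl version shows whether a given chart version can be installed on the cluster.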
