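Kubernetes 1.13 cluster monitoring

VIII. Cluster monitoring

1. Install heapster

Heapster is deprecated as of Kubernetes 1.13; its functionality is replaced by Metrics-Server and Prometheus.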

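2. Install Metrics-Server

Since Kubernetes v1.11, collecting monitoring data through heapster is no longer supported; use the newer metrics-server component instead. It is much lighter than heapster and does not persist data, but it works well for querying real-time metrics.

The Kubernetes HPA (Horizontal Pod Autoscaler) feature also relies on the CPU and memory metrics that metrics-server collects.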
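
To make the HPA dependency concrete: the scaling rule documented for the Horizontal Pod Autoscaler is desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), where the current metric value comes from metrics-server. The sketch below only reproduces that arithmetic in shell. The function name desired_replicas is made up for illustration, and the real controller also applies a tolerance and min/max replica bounds that are ignored here.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_UTILIZATION TARGET_UTILIZATION
# Integer ceiling of current_replicas * current / target.
# Hypothetical helper; not part of kubectl or metrics-server.
desired_replicas() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# Example: 2 replicas at 90% average CPU with a 50% target.
desired_replicas 2 90 50   # prints 4
```

With no metrics-server running, the controller has no current value to feed into this formula, so autoscaling on CPU or memory cannot proceed.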

Official documentation:
https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/
https://github.com/kubernetes-incubator/metrics-server

Download metrics-server, then edit the image and command entries in metrics-server-deployment.yaml:

git clone https://github.com/kubernetes-incubator/metrics-server.git

vim metrics-server/deploy/1.8+/metrics-server-deployment.yaml

      containers:
      - name: metrics-server
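        image: cloudnil/metrics-server-amd64:v0.3.1   # changed
        imagePullPolicy: Always
        command:                                 # added
        - /metrics-server
        - --kubelet-insecure-tls                 # skip verification of the kubelet serving certificate
        - --kubelet-preferred-address-types=InternalIP   # reach each kubelet via the node's InternalIP
        - --v=4                                  # verbose logging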
        - --logtostderr
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp

Run the deployment command:

[root@master] ~$ kubectl apply -f metrics-server/deploy/1.8+/

Wait a moment, then check that kubectl top pods works:

[root@master] ~$ kubectl top pods
NAME                     CPU(cores)   MEMORY(bytes)   
curl-66959f6557-r4crd    0m           1Mi             
nginx-58db6fdb58-5wt7p   0m           2Mi             
nginx-58db6fdb58-bhmcv   0m           2Mi             

[root@master] ~$ kubectl top nodes
NAME               CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
master.hanli.com   126m         12%    1344Mi          70%       
slave1.hanli.com   44m          4%     487Mi           55%       
slave2.hanli.com   41m          4%     467Mi           53%       
slave3.hanli.com   46m          4%     692Mi           78%       

Send a request to the API from the command line:

[root@master] ~$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"

The same endpoint as a browser URL:

https://192.168.255.130:6443/apis/metrics.k8s.io/v1beta1/nodes
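
The Metrics API paths follow the usual Kubernetes layout: node metrics live under /nodes, and pod metrics under /namespaces/NAMESPACE/pods. A small sketch of a path builder follows; metrics_path is a hypothetical function name, and the default namespace is only an example.

```shell
# metrics_path nodes            -> node metrics for the whole cluster
# metrics_path pods NAMESPACE   -> pod metrics in one namespace
# Hypothetical helper that only builds the request path.
metrics_path() {
  base="/apis/metrics.k8s.io/v1beta1"
  case "$1" in
    nodes) echo "$base/nodes" ;;
    pods)  echo "$base/namespaces/${2:-default}/pods" ;;
    *)     echo "usage: metrics_path nodes|pods [namespace]" >&2; return 1 ;;
  esac
}

# Usage on a machine with cluster access:
#   kubectl get --raw "$(metrics_path pods default)"
```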

3. Install Prometheus

Official documentation: https://github.com/coreos/prometheus-operator
Reference: https://blog.csdn.net/networken/article/details/85620793

Clone the prometheus-operator repository locally:

[root@master] ~$ git clone https://github.com/coreos/prometheus-operator.git

All YAML files are in prometheus-operator/contrib/kube-prometheus/manifests; apply the whole directory to deploy everything in one step:

[root@master] ~$ kubectl apply -f prometheus-operator/contrib/kube-prometheus/manifests/

Check the pods:

[root@master] ~$ kubectl get pod -n monitoring  -o wide
NAME                                   READY   STATUS    RESTARTS   AGE    IP                NODE               NOMINATED NODE   READINESS GATES
alertmanager-main-0                    2/2     Running   0          144m   10.244.1.10       slave1.hanli.com   <none>           <none>
alertmanager-main-1                    2/2     Running   0          143m   10.244.1.11       slave1.hanli.com   <none>           <none>
alertmanager-main-2                    2/2     Running   0          142m   10.244.0.8        master.hanli.com   <none>           <none>
grafana-777cf74b98-v9czp               1/1     Running   0          153m   10.244.3.6        slave3.hanli.com   <none>           <none>
kube-state-metrics-66c5b5b6d4-cwgkq    0/4     Pending   0          16s    <none>            <none>             <none>           <none>
kube-state-metrics-6784748c86-j5mnz    4/4     Running   0          153m   10.244.3.7        slave3.hanli.com   <none>           <none>
node-exporter-klgfj                    2/2     Running   0          153m   192.168.255.130   master.hanli.com   <none>           <none>
node-exporter-tgh4f                    2/2     Running   0          153m   192.168.255.123   slave3.hanli.com   <none>           <none>
node-exporter-z24dz                    2/2     Running   0          153m   192.168.255.121   slave1.hanli.com   <none>           <none>
node-exporter-z9pb8                    2/2     Running   0          153m   192.168.255.122   slave2.hanli.com   <none>           <none>
prometheus-adapter-66fc7797fd-hhwms    1/1     Running   0          153m   10.244.2.11       slave2.hanli.com   <none>           <none>
prometheus-k8s-0                       3/3     Running   1          144m   10.244.2.12       slave2.hanli.com   <none>           <none>
prometheus-k8s-1                       3/3     Running   0          144m   10.244.0.7        master.hanli.com   <none>           <none>
prometheus-operator-7df4c46d5b-826gp   1/1     Running   0          153m   10.244.3.5        slave3.hanli.com   <none>           <none>

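When every pod is in the Running state, the deployment has succeeded. (In the listing above, one kube-state-metrics pod was created 16s earlier and is still Pending.)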
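
Scanning the STATUS column by eye gets tedious on a larger cluster. As a rough sketch, the check can be reduced to a filter over the output of kubectl get pod with the --no-headers flag; not_running_pods is a hypothetical helper name, and it only looks at the STATUS field, not at the READY counts.

```shell
# Reads `kubectl get pod --no-headers` output on stdin and prints the name
# and status of every pod whose STATUS column (field 3) is not Running.
# Hypothetical helper name.
not_running_pods() {
  awk '$3 != "Running" { print $1, $3 }'
}

# Usage on a machine with cluster access:
#   kubectl get pod -n monitoring --no-headers | not_running_pods
```

Empty output means every listed pod reports Running.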

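Change the access method to NodePort

The services have no type set in the shipped manifests, so they are only reachable inside the cluster. Edit the three service manifests below (in prometheus-operator/contrib/kube-prometheus/manifests), then re-apply each one with kubectl apply -f so the change takes effect.

Edit grafana-service.yaml:

vim grafana-service.yaml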

apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  type: NodePort      # added
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 30100   # added
  selector:
    app: grafana

Edit prometheus-service.yaml:

vim prometheus-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 30200
  selector:
    app: prometheus
    prometheus: k8s

Edit alertmanager-service.yaml:

vim alertmanager-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 30300
  selector:
    alertmanager: main
    app: alertmanager

Access Prometheus at http://192.168.255.130:30200
Prometheus's own metrics are at http://192.168.255.130:30200/metrics

The Prometheus web UI supports basic queries. For example, the following query shows the CPU usage of each pod in the cluster:
sum by (pod_name)( rate(container_cpu_usage_seconds_total{image!="", pod_name!=""}[1m] ) )

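If the query above returns data, Prometheus is collecting container metrics correctly. Note that container_cpu_usage_seconds_total is exposed by the kubelet's built-in cAdvisor rather than by node-exporter, and that from Kubernetes 1.16 on the label is named pod instead of pod_name.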
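
The same query can also be sent to Prometheus's HTTP API (the /api/v1/query endpoint) instead of the web UI, which is handy for scripting. The following is a minimal sketch: prom_query is a hypothetical helper name, and the address in the usage comment is the NodePort configured earlier in this article.

```shell
# prom_query BASE_URL PROMQL
# Sends an instant query to Prometheus's /api/v1/query endpoint.
# -G turns the data into a GET query string; --data-urlencode escapes the PromQL.
# Hypothetical helper name.
prom_query() {
  curl -sG "$1/api/v1/query" --data-urlencode "query=$2"
}

# Usage, assuming the NodePort address used in this article:
#   prom_query http://192.168.255.130:30200 \
#     'sum by (pod_name) (rate(container_cpu_usage_seconds_total{image!="", pod_name!=""}[1m]))'
```

A successful response is a JSON document whose status field is "success", with the samples under data.result.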

Access Grafana at http://192.168.255.130:30100
The default username and password are admin/admin.
