K8s Cluster Monitoring: cAdvisor + Heapster + InfluxDB + Grafana

I. How the monitoring stack fits together
cAdvisor: collects per-container resource metrics on each node.
Heapster: cluster-level aggregator; gathers the monitoring data from every node.
InfluxDB: time-series database that stores the collected metrics.
Grafana: dashboards for visualization.
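Heapster pulls the per-container stats that cAdvisor (embedded in the kubelet) exposes. As a quick sanity check you can look at the raw data Heapster consumes; this sketch assumes the kubelet read-only port 10255 is enabled on your nodes, and 172.18.0.25 is just a placeholder node IP:

```shell
# Inspect the kubelet/cAdvisor stats endpoint that Heapster scrapes.
# 172.18.0.25 is a placeholder node IP; port 10255 is the kubelet
# read-only port, which may be disabled on hardened clusters.
curl -s http://172.18.0.25:10255/stats/summary | head -n 20
```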
II. Deploying cAdvisor + Heapster + InfluxDB + Grafana
1. Download the release
Grab the heapster v1.5.2 release, which bundles the influxdb and grafana YAML manifests, from the heapster releases page:
wget https://github.com/kubernetes/heapster/archive/v1.5.2.zip
unzip v1.5.2.zip
After unpacking, change into the manifest directory:
cd heapster-1.5.2/deploy/kube-config/influxdb/
It contains three manifests; edit each one as follows and they are ready to apply:
vi influxdb.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: monitoring-influxdb
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: influxdb
    spec:
      containers:
      - name: influxdb
        image: registry.cn-hangzhou.aliyuncs.com/google-containers/heapster-influxdb-amd64:v1.1.1
        volumeMounts:
        - mountPath: /data
          name: influxdb-storage
      volumes:
      - name: influxdb-storage
        emptyDir: {}

---
apiVersion: v1
kind: Service
metadata:
  labels:
    task: monitoring
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-influxdb
  name: monitoring-influxdb
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - nodePort: 31001
    port: 8086
    targetPort: 8086
  selector:
    k8s-app: influxdb
    
Apply it: kubectl create -f influxdb.yaml
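Before moving on, it is worth checking that InfluxDB actually came up. A small sketch, assuming 172.18.0.25 is one of your node IPs; InfluxDB's /ping endpoint answers HTTP 204 when the server is healthy:

```shell
# Confirm the influxdb pod is running (label from the Deployment above)
kubectl get pods -n kube-system -l k8s-app=influxdb
# Hit the NodePort (31001 from the Service above); a healthy InfluxDB
# answers /ping with HTTP 204
curl -s -o /dev/null -w '%{http_code}\n' http://172.18.0.25:31001/ping
```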

vi heapster.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: heapster
  namespace: kube-system

---

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: heapster
subjects:
  - kind: ServiceAccount
    name: heapster
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io

---

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: heapster
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: heapster
    spec:
      serviceAccountName: heapster
      containers:
      - name: heapster
        image: registry.cn-hangzhou.aliyuncs.com/google-containers/heapster-amd64:v1.4.2
        imagePullPolicy: IfNotPresent
        command:
        - /heapster
        - --source=kubernetes:https://kubernetes.default?kubeletHttps=true&kubeletPort=10250&insecure=true 
        - --sink=influxdb:http://monitoring-influxdb.kube-system.svc.cluster.local:8086

---

apiVersion: v1
kind: Service
metadata:
  labels:
    task: monitoring
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: Heapster
  name: heapster
  namespace: kube-system
spec:
  ports:
  - port: 80
    targetPort: 8082
  selector:
    k8s-app: heapster
Apply it: kubectl create -f heapster.yaml
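Once heapster has been running for a minute or two, it should have created its database in InfluxDB (named k8s by default for the influxdb sink). A quick check through the NodePort, again assuming 172.18.0.25 is a node IP:

```shell
# List the databases in InfluxDB; expect "k8s" to appear once
# Heapster starts writing metrics
curl -sG 'http://172.18.0.25:31001/query' --data-urlencode 'q=SHOW DATABASES'
```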

vi grafana.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: monitoring-grafana
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: registry.cn-hangzhou.aliyuncs.com/google-containers/heapster-grafana-amd64:v4.4.1
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ca-certificates
          readOnly: true
        - mountPath: /var
          name: grafana-storage
        env:
        - name: INFLUXDB_HOST
          value: monitoring-influxdb
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
          # The following env variables are required to make Grafana accessible via
          # the kubernetes api-server proxy. On production clusters, we recommend
          # removing these env variables, setup auth for grafana, and expose the grafana
          # service using a LoadBalancer or a public IP.
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        - name: GF_SERVER_ROOT_URL
          # If you're only using the API Server proxy, set this value instead:
          # value: /api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
          value: /
      volumes:
      - name: ca-certificates
        hostPath:
          path: /etc/ssl/certs
      - name: grafana-storage
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: monitoring-grafana
  namespace: kube-system
spec:
  # In a production setup, we recommend accessing Grafana through an external Loadbalancer
  # or through a public IP.
  # type: LoadBalancer
  # You could also use NodePort to expose the service at a randomly-generated port
  # type: NodePort
  type: NodePort
  ports:
  - nodePort: 30108
    port: 80
    targetPort: 3000
  selector:
    k8s-app: grafana
    
Apply it: kubectl create -f grafana.yaml


Find the node the grafana pod landed on:
kubectl get pod -n kube-system -o wide | grep grafana
In this setup it was scheduled to node3, whose IP is 172.18.0.25.
Open grafana in a browser at http://172.18.0.25:30108
Login username: admin, password: admin
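Rather than reading the node IP and port off by hand, both can be pulled out with kubectl (service, label, and namespace names as defined in the manifests above):

```shell
# Look up the grafana Service's NodePort and the IP of the node running
# the grafana pod, then assemble the URL
NODE_PORT=$(kubectl get svc monitoring-grafana -n kube-system \
  -o jsonpath='{.spec.ports[0].nodePort}')
NODE_IP=$(kubectl get pod -n kube-system -l k8s-app=grafana \
  -o jsonpath='{.items[0].status.hostIP}')
echo "http://${NODE_IP}:${NODE_PORT}"
```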


Errors and fixes:
1. If you see Error from server (AlreadyExists): error when creating "influxdb.yaml": deployments.extensions "monitoring-influxdb" already exists
Fix:
kubectl delete -f influxdb.yaml   # removes the resources defined in the yaml; then create them again
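The delete-then-create dance can also be avoided entirely: kubectl apply creates resources when they are missing and updates them in place when they already exist, so re-running it never fails with AlreadyExists:

```shell
# Option 1: remove the existing resources, then recreate them
kubectl delete -f influxdb.yaml
kubectl create -f influxdb.yaml
# Option 2 (idempotent): apply updates in place, so it never
# fails with AlreadyExists
kubectl apply -f influxdb.yaml
```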


2. If the Cluster view shows no node data, check heapster's logs:
kubectl get pods -n kube-system          # find the heapster pod
kubectl logs heapster-77fb88dbfc-k4tr9   # read its logs
The errors looked like this:
http://172.17.0.26:10255/stats/container/": Post http://172.17.0.26:10255/stats/container/: dial tcp 172.17.0.26:10255: getsockopt: connection refused
E0710 06:04:05.031893       1 kubelet.go:231] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "http://172.18.0.25:10255/stats/container/": Post http://172.18.0.25:10255/stats/container/: dial tcp 172.18.0.25:10255: getsockopt: connection refused
E0710 06:04:05.037431       1 kubelet.go:231] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "http://172.19.0.22:10255/stats/container/": Post http://172.19.0.22:10255/stats/container/: dial tcp 172.19.0.22:10255: getsockopt: connection refused
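The connection refused on port 10255 means heapster is trying the kubelet's read-only port, which is disabled on these nodes. That can be confirmed from any machine that can reach the nodes (node IP assumed; curl prints 000 when the connection fails):

```shell
# Probe both kubelet ports. 10250 is the authenticated HTTPS port,
# 10255 the plain read-only port; "000" means the connection failed.
curl -sk -o /dev/null -w 'port 10250: %{http_code}\n' https://172.18.0.25:10250/stats/summary
curl -s  -o /dev/null -w 'port 10255: %{http_code}\n' http://172.18.0.25:10255/stats/summary
```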


kubectl top also returns nothing:
kubectl top pod
Output: error: Metrics not available for pod default/nginx-64f497f8fd-4c6h2, age: 1148h17m14.701943925s
kubectl top node
Output: error: metrics not available yet

Fix:
Edit heapster.yaml so that the source and sink flags look like this (by default heapster reads the kubelet read-only port 10255; pointing it at the authenticated port 10250 avoids the connection refused errors above):
- --source=kubernetes:https://kubernetes.default?kubeletHttps=true&kubeletPort=10250&insecure=true
- --sink=influxdb:http://monitoring-influxdb.kube-system.svc.cluster.local:8086
Then delete and recreate heapster:
kubectl delete -f heapster.yaml
kubectl apply -f heapster.yaml

Heapster scrapes every 30 seconds by default, so wait at least 30s for the first data to arrive.
Log back in to the dashboard and the memory and CPU metrics should now be visible.
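Once heapster is healthy again, kubectl top (which previously errored out) should return numbers:

```shell
# With the fixed source flag, the metrics pipeline should be live
kubectl top node
kubectl top pod -n kube-system
```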
