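Background: I have been working through Ma Ge's (马哥) Kubernetes tutorial. In the chapter "Container Resource Requirements, Resource Limits and Heapster", neither the kubectl top output nor the Grafana graphs ever showed up in the tutorial itself. Heapster is deprecated and will be removed in later releases, so there is no need to dwell on it; I was simply curious. Below I describe the problems I ran into and how I solved them. My Kubernetes version is v1.13.3.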


Check the version:

[ryuser@cdh-master metrics]$ kubectl get nodes
NAME                    STATUS   ROLES    AGE   VERSION
cdh-master.rongyi.com   Ready    master   41d   v1.13.3
cdh-slave.rongyi.com    Ready    <none>   41d   v1.13.3
cdh-slave2.rongyi.com   Ready    <none>   39d   v1.13.3



1. After creating Heapster, its log kept showing the following errors:

[ryuser@cdh-master metrics]$ kubectl logs heapster-f64999bc-25tvv -n kube-system
I0326 06:23:03.317063       1 heapster.go:78] /heapster --source=kubernetes:https://kubernetes.default --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086
I0326 06:23:03.317170       1 heapster.go:79] Heapster version v1.5.4
I0326 06:23:03.317421       1 configs.go:61] Using Kubernetes client with master "https://kubernetes.default" and version v1
I0326 06:23:03.317437       1 configs.go:62] Using kubelet port 10255
I0326 06:23:03.341940       1 influxdb.go:312] created influxdb sink with options: host:monitoring-influxdb.kube-system.svc:8086 user:root db:k8s
I0326 06:23:03.341976       1 heapster.go:202] Starting with InfluxDB Sink
I0326 06:23:03.341985       1 heapster.go:202] Starting with Metric Sink
I0326 06:23:03.364225       1 heapster.go:112] Starting heapster on port 8082
E0326 06:24:05.006245       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.73:10255: failed to get all container stats from Kubelet URL "http://192.168.10.73:10255/stats/container/": Post http://192.168.10.73:10255/stats/container/: dial tcp 192.168.10.73:10255: getsockopt: connection refused
E0326 06:24:05.006326       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.77:10255: failed to get all container stats from Kubelet URL "http://192.168.10.77:10255/stats/container/": Post http://192.168.10.77:10255/stats/container/: dial tcp 192.168.10.77:10255: getsockopt: connection refused
E0326 06:24:05.006827       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.74:10255: failed to get all container stats from Kubelet URL "http://192.168.10.74:10255/stats/container/": Post http://192.168.10.74:10255/stats/container/: dial tcp 192.168.10.74:10255: getsockopt: connection refused
W0326 06:24:25.002576       1 manager.go:152] Failed to get all responses in time (got 0/3)
I0326 06:24:25.033246       1 influxdb.go:274] Created database "k8s" on influxDB server at "monitoring-influxdb.kube-system.svc:8086"
E0326 06:25:05.009902       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.77:10255: failed to get all container stats from Kubelet URL "http://192.168.10.77:10255/stats/container/": Post http://192.168.10.77:10255/stats/container/: dial tcp 192.168.10.77:10255: getsockopt: connection refused
E0326 06:25:05.010317       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.73:10255: failed to get all container stats from Kubelet URL "http://192.168.10.73:10255/stats/container/": Post http://192.168.10.73:10255/stats/container/: dial tcp 192.168.10.73:10255: getsockopt: connection refused
E0326 06:25:05.024937       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.74:10255: failed to get all container stats from Kubelet URL "http://192.168.10.74:10255/stats/container/": Post http://192.168.10.74:10255/stats/container/: dial tcp 192.168.10.74:10255: getsockopt: connection refused
W0326 06:25:25.002198       1 manager.go:152] Failed to get all responses in time (got 0/3)
E0326 06:26:05.011184       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.77:10255: failed to get all container stats from Kubelet URL "http://192.168.10.77:10255/stats/container/": Post http://192.168.10.77:10255/stats/container/: dial tcp 192.168.10.77:10255: getsockopt: connection refused
E0326 06:26:05.014660       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.73:10255: failed to get all container stats from Kubelet URL "http://192.168.10.73:10255/stats/container/": Post http://192.168.10.73:10255/stats/container/: dial tcp 192.168.10.73:10255: getsockopt: connection refused
E0326 06:26:05.021066       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.74:10255: failed to get all container stats from Kubelet URL "http://192.168.10.74:10255/stats/container/": Post http://192.168.10.74:10255/stats/container/: dial tcp 192.168.10.74:10255: getsockopt: connection refused
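Port 10255 is the kubelet's unauthenticated read-only port. In the KubeletConfiguration file a readOnlyPort of 0 disables it, and kubeadm-built clusters commonly ship with it disabled, which would explain the "connection refused" on every node. A minimal sketch for checking this on a node, assuming a kubeadm-style layout with the kubelet config at /var/lib/kubelet/config.yaml (an assumption; other installers may pass --read-only-port on the command line instead). The helper name kubelet_readonly_port is hypothetical.

```shell
# Hypothetical helper: print the readOnlyPort value from a kubelet config file.
# Assumes a kubeadm-style KubeletConfiguration YAML; the default path is an assumption.
kubelet_readonly_port() {
  cfg=${1:-/var/lib/kubelet/config.yaml}
  # First top-level "readOnlyPort:" key wins; print "unset" if absent or unreadable.
  port=$(awk '/^readOnlyPort:/ { print $2; exit }' "$cfg" 2>/dev/null)
  echo "${port:-unset}"
}

# Usage on each node:
#   kubelet_readonly_port                             # 0 means port 10255 is disabled
#   ss -ltn | grep -E ':(10250|10255)[[:space:]]'     # which kubelet ports are listening
```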


2. kubectl top could not return any metrics either:

[ryuser@cdh-master metrics]$ kubectl top pod
W0326 15:13:19.303263 20846 top_pod.go:259] Metrics not available for pod default/client, age: 980h4m21.303224766s
error: Metrics not available for pod default/client, age: 980h4m21.303224766s
[ryuser@cdh-master metrics]$ kubectl top node
error: metrics not available yet
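Why does kubectl top fail too? In the v1.13 era, kubectl top first looks for the metrics.k8s.io API (served by metrics-server) and, when that API group is not registered, falls back to querying the heapster service in kube-system; with Heapster unable to scrape any kubelet, there is nothing to show. A rough sketch of that decision, fed with the output of kubectl api-versions. The function name top_backend is hypothetical and the logic is a simplification of kubectl's behavior, not its actual code.

```shell
# Hypothetical helper approximating how kubectl top (v1.13 era) picks its metrics source.
# Input: the text printed by `kubectl api-versions`.
top_backend() {
  if printf '%s\n' "$1" | grep -qxF 'metrics.k8s.io/v1beta1'; then
    echo "metrics-server"   # metrics.k8s.io/v1beta1 is registered
  else
    echo "heapster"         # fallback: the heapster service in kube-system
  fi
}

# Usage:
#   top_backend "$(kubectl api-versions)"
```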


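Solution: the log shows Heapster scraping the kubelet's plain-HTTP read-only port 10255 ("Using kubelet port 10255"), and every node refuses that connection, so point Heapster at the kubelet's HTTPS port 10250 instead.

# In the heapster.yaml manifest, change the container args as follows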
- --source=kubernetes:https://kubernetes.default?kubeletHttps=true&kubeletPort=10250&insecure=true 
- --sink=influxdb:http://monitoring-influxdb.kube-system.svc.cluster.local:8086
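The three query parameters tell Heapster's kubernetes source to connect to the kubelet over HTTPS (kubeletHttps=true), on the authenticated port 10250 (kubeletPort=10250), and to skip TLS certificate verification (insecure=true), since kubelet serving certificates are often self-signed. A small sketch that assembles the same flag; heapster_source_flag is a hypothetical helper. When such a value is handled in a shell, keep it quoted: an unquoted & ends the command and runs it in the background.

```shell
# Hypothetical helper: assemble Heapster's --source flag for the kubelet HTTPS port.
# $1: API server URL (default https://kubernetes.default), $2: kubelet port (default 10250)
heapster_source_flag() {
  api=${1:-https://kubernetes.default}
  port=${2:-10250}
  # kubeletHttps=true -> HTTPS to the kubelet; insecure=true -> no TLS certificate check
  printf '%s\n' "--source=kubernetes:${api}?kubeletHttps=true&kubeletPort=${port}&insecure=true"
}

# Usage:
#   heapster_source_flag    # prints the --source value used in the manifest above
```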


Then delete Heapster and recreate it:

kubectl delete -f heapster.yaml

kubectl apply -f heapster.yaml
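After recreating it, a quick way to see whether the scrape errors are gone is to count them in the new pod's log once the first scrape has run (the logs above show one attempt per minute). A minimal sketch; the helper name count_scrape_errors is hypothetical, and the label selector k8s-app=heapster is an assumption based on the heapster.yaml commonly used from the Heapster repository, so check the labels in your own manifest.

```shell
# Hypothetical helper: count Heapster scrape failures in log text read from stdin.
# Matches the "Error in scraping containers from kubelet" lines shown in the logs above.
count_scrape_errors() {
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps the status at 0.
  grep -c 'Error in scraping containers from kubelet' || true
}

# Usage (assumes the Deployment is named heapster and its pods carry k8s-app=heapster):
#   kubectl -n kube-system rollout status deployment/heapster
#   kubectl -n kube-system logs -l k8s-app=heapster --tail=200 | count_scrape_errors
```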


continue


3. Next I ran into a 403 error:


403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=create, resource=nodes, subresource=stats)"

[ryuser@cdh-master metrics]$ kubectl logs -f heapster-5fcf457b-zq99c  -n kube-system
I0326 07:36:23.229287       1 heapster.go:78] /heapster --source=kubernetes:https://kubernetes.default?kubeletHttps=true&kubeletPort=10250&insecure=true --sink=influxdb:http://monitoring-influxdb.kube-system.svc.cluster.local:8086
I0326 07:36:23.229348       1 heapster.go:79] Heapster version v1.5.4
I0326 07:36:23.229602       1 configs.go:61] Using Kubernetes client with master "https://kubernetes.default" and version v1
I0326 07:36:23.229618       1 configs.go:62] Using kubelet port 10250
I0326 07:36:23.334904       1 influxdb.go:312] created influxdb sink with options: host:monitoring-influxdb.kube-system.svc.cluster.local:8086 user:root db:k8s
I0326 07:36:23.334946       1 heapster.go:202] Starting with InfluxDB Sink
I0326 07:36:23.334955       1 heapster.go:202] Starting with Metric Sink
I0326 07:36:23.347573       1 heapster.go:112] Starting heapster on port 8082
E0326 07:37:05.028341       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.74:10250: failed to get all container stats from Kubelet URL "https://192.168.10.74:10250/stats/container/": request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=create, resource=nodes, subresource=stats)"
E0326 07:37:05.096629       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.73:10250: failed to get all container stats from Kubelet URL "https://192.168.10.73:10250/stats/container/": request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=create, resource=nodes, subresource=stats)"
E0326 07:37:05.157683       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.77:10250: failed to get all container stats from Kubelet URL "https://192.168.10.77:10250/stats/container/": request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=create, resource=nodes, subresource=stats)"
W0326 07:37:25.003226       1 manager.go:152] Failed to get all responses in time (got 0/3)
I0326 07:37:25.037245       1 influxdb.go:274] Created database "k8s" on influxDB server at "monitoring-influxdb.kube-system.svc.cluster.local:8086"
E0326 07:38:05.013221       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.77:10250: failed to get all container stats from Kubelet URL "https://192.168.10.77:10250/stats/container/": request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=create, resource=nodes, subresource=stats)"
E0326 07:38:05.019540       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.74:10250: failed to get all container stats from Kubelet URL "https://192.168.10.74:10250/stats/container/": request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=create, resource=nodes, subresource=stats)"
E0326 07:38:05.022849       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.73:10250: failed to get all container stats from Kubelet URL "https://192.168.10.73:10250/stats/container/": request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=create, resource=nodes, subresource=stats)"
W0326 07:38:25.003081       1 manager.go:152] Failed to get all responses in time (got 0/3)
E0326 07:39:05.010246       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.73:10250: failed to get all container stats from Kubelet URL "https://192.168.10.73:10250/stats/container/": request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=create, resource=nodes, subresource=stats)"
E0326 07:39:05.019238       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.74:10250: failed to get all container stats from Kubelet URL "https://192.168.10.74:10250/stats/container/": request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=create, resource=nodes, subresource=stats)"
E0326 07:39:05.024794       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.77:10250: failed to get all container stats from Kubelet URL "https://192.168.10.77:10250/stats/container/": request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=create, resource=nodes, subresource=stats)"
W0326 07:39:25.004236       1 manager.go:152] Failed to get all responses in time (got 0/3)
E0326 07:40:05.016757       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.77:10250: failed to get all container stats from Kubelet URL "https://192.168.10.77:10250/stats/container/": request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=create, resource=nodes, subresource=stats)"
E0326 07:40:05.020030       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.74:10250: failed to get all container stats from Kubelet URL "https://192.168.10.74:10250/stats/container/": request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=create, resource=nodes, subresource=stats)"
E0326 07:40:05.020763       1 manager.go:101] Error in scraping containers from kubelet:192.168.10.73:10250: failed to get all container stats from Kubelet URL "https://192.168.10.73:10250/stats/container/": request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=create, resource=nodes, subresource=stats)"
W0326 07:40:25.002318       1 manager.go:152] Failed to get all responses in time (got 0/3)
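Why does the kubelet ask for create? With webhook authorization, the kubelet turns every request into an authorization check against the API server: the resource is always nodes, the subresource is derived from the path (/stats/* becomes stats), and the verb from the HTTP method (POST becomes create, GET becomes get). Heapster v1.5.4 POSTs to /stats/container/, as the "Post http://..." in the first log shows, so it needs create on nodes/stats. The sketch below mirrors that mapping for the documented cases only; kubelet_authz_attrs is a hypothetical name, not kubelet code.

```shell
# Hypothetical, simplified sketch of the kubelet's request-to-RBAC mapping.
# Args: HTTP method, request path. Prints the attributes checked against RBAC.
kubelet_authz_attrs() {
  case "$1" in
    POST) verb=create ;;
    GET|HEAD) verb=get ;;
    PUT) verb=update ;;
    PATCH) verb=patch ;;
    DELETE) verb=delete ;;
    *) verb=unknown ;;          # methods outside this sketch
  esac
  case "$2" in
    /stats|/stats/*) sub=stats ;;
    /metrics|/metrics/*) sub=metrics ;;
    /logs|/logs/*) sub=log ;;
    /spec|/spec/*) sub=spec ;;
    *) sub=proxy ;;             # every other kubelet path
  esac
  echo "verb=$verb resource=nodes subresource=$sub"
}

# Check the permission itself against the API server (should answer "no" here):
#   kubectl auth can-i create nodes --subresource=stats \
#     --as=system:serviceaccount:kube-system:heapster
```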


Solution:

Check the permissions of the ClusterRole system:heapster: as the error says, it has no create permission on the resource nodes/stats.

[ryuser@cdh-master metrics]$ kubectl describe clusterrole system:heapster
Name:         system:heapster
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{"rbac.authorization.kubernetes.io/autoupdate"...
              rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
  Resources               Non-Resource URLs  Resource Names  Verbs
  ---------               -----------------  --------------  -----
  events                  []                 []              [get list watch]
  namespaces              []                 []              [get list watch]
  nodes/stats             []                 []              [get list watch]
  nodes                   []                 []              [get list watch]
  pods                    []                 []              [get list watch]
  deployments.extensions  []                 []              [get list watch]



Modify the permissions of the ClusterRole system:heapster.


Export it to a manifest file:

kubectl get clusterrole system:heapster -o yaml > heapster_modify.yaml


Edit the file: add the create verb and the nodes/stats resource.

vim  heapster_modify.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{"rbac.authorization.kubernetes.io/autoupdate":"true"},"creationTimestamp":"2019-02-12T10:41:33Z","labels":{"kubernetes.io/bootstrapping":"rbac-defaults"},"name":"system:heapster","resourceVersion":"70","selfLink":"/apis/rbac.authorization.k8s.io/v1/clusterroles/system%3Aheapster","uid":"c3bd303a-2eb2-11e9-9c98-005056be639a"},"rules":[{"apiGroups":[""],"resources":["events","namespaces","nodes","pods"],"verbs":["create","get","list","watch"]},{"apiGroups":["extensions"],"resources":["deployments"],"verbs":["get","list","watch"]}]}
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: "2019-02-12T10:41:33Z"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:heapster
  resourceVersion: "4109335"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/system%3Aheapster
  uid: c3bd303a-2eb2-11e9-9c98-005056be639a
rules:
- apiGroups:
  - ""
  resources:
  - events
  - namespaces
  - nodes
  - pods
  - nodes/stats  # added
  verbs:
  - create   # added
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - deployments
  verbs:
  - get
  - list
  - watch
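The edit above adds create to every resource in the first rule (events, namespaces, nodes, pods), which is broader than the 403 requires, and it changes a bootstrap role (note the rbac.authorization.kubernetes.io/autoupdate annotation). An alternative, not what was done here, is to leave system:heapster alone and grant only nodes/stats through a separate ClusterRole bound to the heapster service account. A sketch that prints such a manifest; the role name heapster-node-stats and the helper heapster_stats_rbac are hypothetical.

```shell
# Hypothetical helper: print a ClusterRole + ClusterRoleBinding that grant only nodes/stats.
# create covers Heapster's POST /stats/container/; get covers GET-based /stats/* endpoints.
# $1: service account namespace (default kube-system), $2: service account name (default heapster)
heapster_stats_rbac() {
  ns=${1:-kube-system}
  sa=${2:-heapster}
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: heapster-node-stats
rules:
- apiGroups: [""]
  resources: ["nodes/stats"]
  verbs: ["create", "get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: heapster-node-stats
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: heapster-node-stats
subjects:
- kind: ServiceAccount
  name: ${sa}
  namespace: ${ns}
EOF
}

# Usage:
#   heapster_stats_rbac | kubectl apply -f -
```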


Apply the modified ClusterRole with kubectl apply -f heapster_modify.yaml, then delete and redeploy Heapster:

kubectl delete -f heapster.yaml

kubectl apply -f heapster.yaml



Finally, no more errors:

[ryuser@cdh-master metrics]$ kubectl logs -f heapster-5fcf457b-vhrxf  -n kube-system
I0326 07:47:00.574138       1 heapster.go:78] /heapster --source=kubernetes:https://kubernetes.default?kubeletHttps=true&kubeletPort=10250&insecure=true --sink=influxdb:http://monitoring-influxdb.kube-system.svc.cluster.local:8086
I0326 07:47:00.574204       1 heapster.go:79] Heapster version v1.5.4
I0326 07:47:00.574470       1 configs.go:61] Using Kubernetes client with master "https://kubernetes.default" and version v1
I0326 07:47:00.574487       1 configs.go:62] Using kubelet port 10250
I0326 07:47:00.639292       1 influxdb.go:312] created influxdb sink with options: host:monitoring-influxdb.kube-system.svc.cluster.local:8086 user:root db:k8s
I0326 07:47:00.639338       1 heapster.go:202] Starting with InfluxDB Sink
I0326 07:47:00.639354       1 heapster.go:202] Starting with Metric Sink
I0326 07:47:00.670576       1 heapster.go:112] Starting heapster on port 8082
I0326 07:48:05.366442       1 influxdb.go:274] Created database "k8s" on influxDB server at "monitoring-influxdb.kube-system.svc.cluster.local:8086"



kubectl top now returns data as well:

[ryuser@cdh-master metrics]$ kubectl top nodes
NAME                    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
cdh-master.rongyi.com   158m         3%     2550Mi          69%       
cdh-slave.rongyi.com    79m          1%     2386Mi          64%       
cdh-slave2.rongyi.com   820m         41%    3136Mi          84%       
[ryuser@cdh-master metrics]$ kubectl top pods
NAME                         CPU(cores)   MEMORY(bytes)   
curl-66959f6557-bvn9r        0m           0Mi             
dep-httpd-5b774f45df-vtv59   0m           21Mi            
dep-httpd-5b774f45df-wd5kf   0m           15Mi            
myapp-0                      0m           1Mi             
myapp-1                      0m           3Mi             
myapp-2                      0m           1Mi             
myapp-3                      0m           1Mi             
myapp-4                      0m           1Mi             
pod-demo                     499m         138Mi


There was one more problem: the Grafana dashboards showed no data. After the fixes above, they display data too.



Appendix: dashboard download links:

"Kubernetes Node Statistics" dashboard: https://grafana.com/dashboards/3646

"Kubernetes Pod Statistics" dashboard: https://grafana.com/dashboards/3649