Kubernetes provides the top command for reporting resource usage. It has two subcommands, node and pod, which show resource consumption for cluster nodes and for Pod objects respectively.
kubectl top relies on the Metrics API. Kubernetes does not ship it by default, so it has to be deployed separately:
[root@k8s-master k8s-install]# kubectl top pod
error: Metrics API not available
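The same can be confirmed by querying the aggregation layer directly: on a cluster without metrics-server, the APIService object v1beta1.metrics.k8s.io (the one created later in this walkthrough) does not exist yet; once the deployment succeeds, it should be listed with AVAILABLE True:
kubectl get apiservice v1beta1.metrics.k8s.io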
Installation procedure
1. Download the deployment manifest
Download the metrics-server manifest and save it as metrics-server-components.yaml:
[root@k8s-master k8s-install]# wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml -O metrics-server-components.yaml
--2022-10-11 00:13:01-- https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Resolving github.com (github.com)... 20.205.243.166
Connecting to github.com (github.com)|20.205.243.166|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/kubernetes-sigs/metrics-server/releases/download/metrics-server-helm-chart-3.8.2/components.yaml [following]
--2022-10-11 00:13:01-- https://github.com/kubernetes-sigs/metrics-server/releases/download/metrics-server-helm-chart-3.8.2/components.yaml
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/92132038/d85e100a-2404-4c5e-b6a9-f3814ad4e6e5?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20221010%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221010T161303Z&X-Amz-Expires=300&X-Amz-Signature=efa1ff5dd16b6cd86b6186adb3b4c72afed8197bdf08e2bffcd71b9118137831&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=92132038&response-content-disposition=attachment%3B%20filename%3Dcomponents.yaml&response-content-type=application%2Foctet-stream [following]
--2022-10-11 00:13:02-- https://objects.githubusercontent.com/github-production-release-asset-2e65be/92132038/d85e100a-2404-4c5e-b6a9-f3814ad4e6e5?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20221010%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221010T161303Z&X-Amz-Expires=300&X-Amz-Signature=efa1ff5dd16b6cd86b6186adb3b4c72afed8197bdf08e2bffcd71b9118137831&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=92132038&response-content-disposition=attachment%3B%20filename%3Dcomponents.yaml&response-content-type=application%2Foctet-stream
Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.108.133, ...
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4181 (4.1K) [application/octet-stream]
Saving to: 'metrics-server-components.yaml'
100%[============================================================================================================================>] 4,181 --.-K/s in 0.01s
2022-10-11 00:13:10 (385 KB/s) - 'metrics-server-components.yaml' saved [4181/4181]
2. Change the image registry
Replace the image registry in the manifest with a mirror reachable from mainland China. The image line sits at roughly line 140 of the manifest.
Original:
image: k8s.gcr.io/metrics-server/metrics-server:v0.6.1
After the change:
image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.1
The edit can be scripted with sed; since the pattern itself contains slashes, use a different delimiter:
sed -i 's#k8s.gcr.io/metrics-server#registry.cn-hangzhou.aliyuncs.com/google_containers#g' metrics-server-components.yaml
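Before deploying, verify the substitution took effect (the exact line number varies between metrics-server releases):
grep -n 'image:' metrics-server-components.yaml
It should now print the registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.1 image line.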
3. Deploy metrics-server
[root@k8s-master k8s-install]# kubectl create -f metrics-server-components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
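Instead of polling kubectl get, you can also wait on the rollout; on this cluster it will keep waiting (and eventually report that the Deployment exceeded its progress deadline), because, as shown next, the pod never becomes Ready:
kubectl -n kube-system rollout status deployment/metrics-server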
Check the metrics-server pod: it is Running but stays 0/1 (not Ready):
[root@k8s-master k8s-install]# kubectl get pods --all-namespaces | grep metrics
kube-system metrics-server-6ffc8966f5-84hbb 0/1 Running 0 2m23s
Describing the pod points at a failing readiness probe: Readiness probe failed: HTTP probe failed with statuscode: 500
[root@k8s-master k8s-install]# kubectl describe pod metrics-server-6ffc8966f5-84hbb -n kube-system
Name: metrics-server-6ffc8966f5-84hbb
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: k8s-slave2/192.168.100.22
Start Time: Tue, 11 Oct 2022 00:27:33 +0800
Labels: k8s-app=metrics-server
pod-template-hash=6ffc8966f5
Annotations: <none>
Status: Running
IP: 10.244.2.9
IPs:
IP: 10.244.2.9
Controlled By: ReplicaSet/metrics-server-6ffc8966f5
Containers:
metrics-server:
Container ID: docker://e913a075e0381b98eabfb6e298f308ef69dfbd7c672bdcfb75bb2ff3e4b5a0a4
Image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.1
Image ID: docker-pullable://registry.cn-hangzhou.aliyuncs.com/google_containers/[email protected]:5ddc6458eb95f5c70bd13fdab90cbd7d6ad1066e5b528ad1dcb28b76c5fb2f00
Port: 4443/TCP
Host Port: 0/TCP
Args:
--cert-dir=/tmp
--secure-port=4443
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--kubelet-use-node-status-port
--metric-resolution=15s
State: Running
Started: Tue, 11 Oct 2022 00:27:45 +0800
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 200Mi
Liveness: http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:https/readyz delay=20s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-x2spb (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-x2spb:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m27s default-scheduler Successfully assigned kube-system/metrics-server-6ffc8966f5-84hbb to k8s-slave2
Normal Pulling 7m26s kubelet Pulling image "registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.1"
Normal Pulled 7m15s kubelet Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.1" in 10.976606194s
Normal Created 7m15s kubelet Created container metrics-server
Normal Started 7m15s kubelet Started container metrics-server
Warning Unhealthy 2m17s (x31 over 6m47s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
Next, inspect the pod's logs:
[root@k8s-master k8s-install]# kubectl logs metrics-server-6ffc8966f5-84hbb -n kube-system
I1010 16:27:46.228594 1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I1010 16:27:46.633494 1 secure_serving.go:266] Serving securely on [::]:4443
I1010 16:27:46.633585 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I1010 16:27:46.633616 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I1010 16:27:46.633653 1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I1010 16:27:46.634221 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W1010 16:27:46.634296 1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
I1010 16:27:46.634365 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I1010 16:27:46.634370 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1010 16:27:46.634409 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I1010 16:27:46.634415 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
E1010 16:27:46.641663 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.22:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.22 because it doesn't contain any IP SANs" node="k8s-slave2"
E1010 16:27:46.645389 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.20:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.20 because it doesn't contain any IP SANs" node="k8s-master"
E1010 16:27:46.652261 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.21:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.21 because it doesn't contain any IP SANs" node="k8s-slave1"
I1010 16:27:46.733747 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I1010 16:27:46.735167 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1010 16:27:46.735194 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
E1010 16:28:01.643646 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.22:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.22 because it doesn't contain any IP SANs" node="k8s-slave2"
E1010 16:28:01.643805 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.21:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.21 because it doesn't contain any IP SANs" node="k8s-slave1"
E1010 16:28:01.646721 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.100.20:10250/metrics/resource\": x509: cannot validate certificate for 192.168.100.20 because it doesn't contain any IP SANs" node="k8s-master"
I1010 16:28:13.397373 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
This pins down the failure: the readiness probe returns 500 because metrics-server has no metrics to serve ("Failed probe" probe="metric-storage-ready" err="no metrics to serve"), and it has none because every node scrape fails TLS verification, e.g. cannot validate certificate for 192.168.100.22 because it doesn't contain any IP SANs on node k8s-slave2.
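The diagnosis can be double-checked from the shell (assuming openssl is available) by inspecting the certificate the kubelet actually serves on port 10250; the Subject Alternative Name section should list the node's IP, but on this cluster it will show at most a DNS name and no IP entries, matching the x509 error:
echo | openssl s_client -connect 192.168.100.22:10250 2>/dev/null | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'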
The metrics-server documentation (https://github.com/kubernetes-sigs/metrics-server) states the requirement:
Kubelet certificate needs to be signed by cluster Certificate Authority (or disable certificate validation by passing
--kubelet-insecure-tls to Metrics Server)
In other words: either get the kubelet serving certificates signed by the cluster CA, or pass --kubelet-insecure-tls to metrics-server to skip certificate validation. Since this is a test environment, we take the shortcut and disable certificate validation. This is not recommended for production!
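For reference, the production-grade fix is the first option from the docs: have each kubelet obtain a serving certificate signed by the cluster CA. A minimal sketch, assuming a kubeadm cluster with the kubelet config at /var/lib/kubelet/config.yaml and serverTLSBootstrap not yet set:
# On every node: let the kubelet request a CA-signed serving certificate
echo 'serverTLSBootstrap: true' >> /var/lib/kubelet/config.yaml
systemctl restart kubelet
# On the control plane: list and approve the pending kubelet serving CSRs
kubectl get csr
kubectl certificate approve <csr-name>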
Append the flag to the container args, at roughly line 139 of the manifest:
spec:
  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=4443
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --kubelet-use-node-status-port
    - --metric-resolution=15s
    - --kubelet-insecure-tls
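If you prefer not to edit the file, the same flag can be appended to the live Deployment with a JSON patch (equivalent to the YAML edit above); the walkthrough below sticks with re-applying the manifest:
kubectl -n kube-system patch deployment metrics-server --type=json -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'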
Apply the updated manifest. The warnings below are expected: the resources were created with kubectl create (without --save-config), so the kubectl.kubernetes.io/last-applied-configuration annotation is missing and kubectl patches it in automatically:
[root@k8s-master k8s-install]# kubectl apply -f metrics-server-components.yaml
Warning: resource serviceaccounts/metrics-server is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
serviceaccount/metrics-server configured
Warning: resource clusterroles/system:aggregated-metrics-reader is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader configured
Warning: resource clusterroles/system:metrics-server is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
clusterrole.rbac.authorization.k8s.io/system:metrics-server configured
Warning: resource rolebindings/metrics-server-auth-reader is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader configured
Warning: resource clusterrolebindings/metrics-server:system:auth-delegator is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator configured
Warning: resource clusterrolebindings/system:metrics-server is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server configured
Warning: resource services/metrics-server is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
service/metrics-server configured
Warning: resource deployments/metrics-server is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
deployment.apps/metrics-server configured
Warning: resource apiservices/v1beta1.metrics.k8s.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io configured
The metrics-server pod is now running normally:
[root@k8s-master k8s-install]# kubectl get pod -A | grep metrics
kube-system metrics-server-fd9598766-8zphn 1/1 Running 0 89s
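As a final sanity check, the aggregated API can be queried directly; this should now return a NodeMetricsList JSON document instead of a service-unavailable error:
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes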
Running kubectl top again now succeeds:
[root@k8s-master k8s-install]# kubectl top pod
NAME CPU(cores) MEMORY(bytes)
front-end-59bc6df748-699vb 0m 3Mi
front-end-59bc6df748-r7pkr 0m 3Mi
kucc4 1m 2Mi
legacy-app 1m 1Mi
my-demo-nginx-998bbf8f5-9t9pw 0m 0Mi
my-demo-nginx-998bbf8f5-lfgvw 0m 0Mi
my-demo-nginx-998bbf8f5-nfn7r 1m 0Mi
nginx-kusc00401 0m 3Mi
[root@k8s-master k8s-install]#
[root@k8s-master k8s-install]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-master 232m 5% 1708Mi 46%
k8s-slave1 29m 1% 594Mi 34%
k8s-slave2 25m 1% 556Mi 32%
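Two kubectl top flags worth knowing now that the Metrics API works: --sort-by orders pods by cpu or memory, and --containers breaks a pod's usage down per container, e.g.:
kubectl top pod -A --sort-by=memory
kubectl top pod kucc4 --containers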