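Metrics-Server aggregates the cluster's core monitoring data and replaces the earlier Heapster.
Container metrics come mainly from cAdvisor, which is built into the kubelet. Once Metrics Server is installed, users can read this monitoring data through the standard Kubernetes API.
The Metrics API only returns current measurements; it does not store historical data.
The Metrics API is served under the URI /apis/metrics.k8s.io/ and is maintained in k8s.io/metrics.
metrics-server must be deployed before this API can be used. metrics-server collects its data by calling the Kubelet Summary API.
Metrics Server is not part of kube-apiserver. It is deployed independently and, through the Aggregator plug-in mechanism, is exposed together with kube-apiserver as one unified API.
kube-aggregator is essentially a proxy server that chooses the concrete API backend based on the request URL.
Metrics-server provides the core metrics: it serves the metrics.k8s.io API and reports only CPU and memory usage for Nodes and Pods. Other, custom metrics are handled by components such as Prometheus.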
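Because the Metrics API is reached through kube-apiserver, it can be queried with `kubectl get --raw`. The sketch below assumes the `v1beta1` version, which matches the `v1beta1.metrics.k8s.io` APIService that metrics-server v0.3.6 registers during installation; the helper names `metrics_api_path`, `show_node_metrics` and `show_pod_metrics` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for querying the Metrics API (metrics.k8s.io/v1beta1).

# Build the API path for a resource type ("nodes" or "pods").
# An optional second argument limits pod metrics to one namespace.
metrics_api_path() {
  local resource="$1" namespace="${2:-}"
  local base="/apis/metrics.k8s.io/v1beta1"
  if [ -n "$namespace" ]; then
    printf '%s/namespaces/%s/%s\n' "$base" "$namespace" "$resource"
  else
    printf '%s/%s\n' "$base" "$resource"
  fi
}

# Raw JSON for all nodes; kube-apiserver proxies the request to
# metrics-server through the aggregation layer.
show_node_metrics() {
  kubectl get --raw "$(metrics_api_path nodes)"
}

# Raw JSON for the pods of one namespace (kube-system if none is given).
show_pod_metrics() {
  kubectl get --raw "$(metrics_api_path pods "${1:-kube-system}")"
}
```

Once metrics-server is healthy, `kubectl top node` and `kubectl top pod -n kube-system` show the same data as tables.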
(1) Installing Metrics-Server
[root@server2 ~]# mkdir ms
[root@server2 ~]# cd ms
[root@server2 ms]# wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
--2020-07-10 08:45:51-- https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
Resolving github.com (github.com)... 13.229.188.59
Connecting to github.com (github.com)|13.229.188.59|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/92132038/cf781b00-7752-11ea-9faf-0b397e7b8445?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200710%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200710T004553Z&X-Amz-Expires=300&X-Amz-Signature=789a7d64440556513b5a85483ba5d3aac7d562fafd68c8d4e5f4dd2c7cbf81a9&X-Amz-SignedHeaders=host&actor_id=0&repo_id=92132038&response-content-disposition=attachment%3B%20filename%3Dcomponents.yaml&response-content-type=application%2Foctet-stream [following]
--2020-07-10 08:45:53-- https://github-production-release-asset-2e65be.s3.amazonaws.com/92132038/cf781b00-7752-11ea-9faf-0b397e7b8445?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200710%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200710T004553Z&X-Amz-Expires=300&X-Amz-Signature=789a7d64440556513b5a85483ba5d3aac7d562fafd68c8d4e5f4dd2c7cbf81a9&X-Amz-SignedHeaders=host&actor_id=0&repo_id=92132038&response-content-disposition=attachment%3B%20filename%3Dcomponents.yaml&response-content-type=application%2Foctet-stream
Resolving github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)... 52.216.230.155
Connecting to github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)|52.216.230.155|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3335 (3.3K) [application/octet-stream]
Saving to: ‘components.yaml’
100%[=============================>] 3,335 --.-K/s in 0s
2020-07-10 08:45:55 (7.17 MB/s) - ‘components.yaml’ saved [3335/3335]
[root@server2 ms]# ls
components.yaml
[root@server2 ms]# vim components.yaml    ## point the image at the copy in the private registry
[root@server2 ms]# kubectl apply -f components.yaml
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
[root@server2 ms]# kubectl get pod -n kube-system
NAME                              READY   STATUS    RESTARTS   AGE
coredns-bd97f9cd9-8lqqc           1/1     Running   1          47h
coredns-bd97f9cd9-cdk94           1/1     Running   1          46h
etcd-server2                      1/1     Running   5          7d17h
kube-apiserver-server2            1/1     Running   5          7d17h
kube-controller-manager-server2   1/1     Running   5          7d17h
kube-flannel-ds-amd64-g8hg7       1/1     Running   5          7d16h
kube-flannel-ds-amd64-tlmh2       1/1     Running   1          46h
kube-proxy-dkr49                  1/1     Running   4          7d16h
kube-proxy-dw7lv                  1/1     Running   5          7d17h
kube-proxy-z7r6x                  1/1     Running   1          46h
kube-scheduler-server2            1/1     Running   5          7d17h
metrics-server-7cdfcc6666-ptv48   1/1     Running   0          13s
[root@server2 ms]# kubectl describe pod -n kube-system metrics-server-7cdfcc6666-ptv48
Name:         metrics-server-7cdfcc6666-ptv48
Namespace:    kube-system
Priority:     0
Node:         server4/172.25.12.4
Start Time:   Fri, 10 Jul 2020 08:56:15 +0800
Labels:       k8s-app=metrics-server
              pod-template-hash=7cdfcc6666
Annotations:  <none>
Status:       Running
IP:           10.244.2.44
IPs:
  IP:           10.244.2.44
Controlled By:  ReplicaSet/metrics-server-7cdfcc6666
Containers:
  metrics-server:
    Container ID:  docker://1f24dc0545b0150912ecb6766c4ac08f2a9a0d338642ccd28dfffa0928b5f871
    Image:         metrics-server-amd64:v0.3.6
    Image ID:      docker-pullable://metrics-server-amd64@sha256:c9c4e95068b51d6b33a9dccc61875df07dc650abbf4ac1a19d58b4628f89288b
    Port:          4443/TCP
    Host Port:     0/TCP
    Args:
      --cert-dir=/tmp
      --secure-port=4443
    State:          Running
      Started:      Fri, 10 Jul 2020 08:56:18 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /tmp from tmp-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from metrics-server-token-8t6zs (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  metrics-server-token-8t6zs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-server-token-8t6zs
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/arch=amd64
                 kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  42s   default-scheduler  Successfully assigned kube-system/metrics-server-7cdfcc6666-ptv48 to server4
  Normal  Pulling    41s   kubelet, server4   Pulling image "metrics-server-amd64:v0.3.6"
  Normal  Pulled     39s   kubelet, server4   Successfully pulled image "metrics-server-amd64:v0.3.6"
  Normal  Created    39s   kubelet, server4   Created container metrics-server
  Normal  Started    39s   kubelet, server4   Started container metrics-server
[root@server2 ms]#
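Download the yaml file.
Download the metrics-server image required for the lab and push it to the private registry.
Apply the yaml file; metrics-server runs successfully.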
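The image step can be scripted. This is only a sketch: it assumes Docker is the container runtime, that the upstream manifest references an image such as `k8s.gcr.io/metrics-server-amd64:v0.3.6`, and that the private registry is reachable under a prefix like the hypothetical `reg.example.com/library`. The function names are also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: mirror an upstream image into a private registry (Docker assumed).

# Map an upstream image reference to the same name:tag under a registry prefix,
# e.g. k8s.gcr.io/metrics-server-amd64:v0.3.6 -> <prefix>/metrics-server-amd64:v0.3.6
private_image_name() {
  local prefix="$1" upstream="$2"
  printf '%s/%s\n' "${prefix%/}" "${upstream##*/}"
}

# Pull, retag and push one image. Needs access to the upstream registry
# and a prior `docker login` to the private one.
mirror_image() {
  local prefix="$1" upstream="$2" target
  target="$(private_image_name "$prefix" "$upstream")"
  docker pull "$upstream" &&
    docker tag "$upstream" "$target" &&
    docker push "$target"
}
```

For example, `mirror_image reg.example.com/library k8s.gcr.io/metrics-server-amd64:v0.3.6`, after which the `image:` field in components.yaml is changed to the mirrored name before running `kubectl apply`.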
(2) Fixing errors in the metrics-server Pod log
After deployment, check the metrics-server Pod log:
kubectl logs -n kube-system metrics-server-7cdfcc6666-ptv48
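Error 1: dial tcp: lookup server2 on 10.96.0.10:53: no such host
This happens because there is no internal DNS server that knows the node hostnames, so metrics-server cannot resolve them through the cluster DNS at 10.96.0.10. One fix is to edit the CoreDNS ConfigMap directly and add each node's hostname to a hosts block; every Pod can then resolve the node names through CoreDNS.
Fix: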
kubectl edit configmap coredns -n kube-system
Add a hosts block for internal name resolution:
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        hosts {
           172.25.12.2 server2
           172.25.12.3 server3
           172.25.12.4 server4
           fallthrough
        }
        # the rest of the default Corefile stays unchanged
    }
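Typing the hosts entries by hand becomes error-prone as nodes are added. A minimal sketch, assuming the nodes can be listed as `IP hostname` pairs on stdin (for example, filtered from the master's /etc/hosts); the function name `coredns_hosts_block` is hypothetical, and its output is meant to be pasted into the Corefile during `kubectl edit`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "IP hostname" lines into a CoreDNS hosts block.
# Blank lines and lines whose first field starts with '#' are skipped;
# only the first hostname on each line is used.
coredns_hosts_block() {
  echo "hosts {"
  awk 'NF >= 2 && $1 !~ /^#/ { printf "    %s %s\n", $1, $2 }'
  # fallthrough passes names not listed here on to the next plugin
  echo "    fallthrough"
  echo "}"
}
```

For example, `grep -E 'server[234]' /etc/hosts | coredns_hosts_block`. If the Corefile includes the `reload` plugin (the kubeadm default does), CoreDNS picks up the edited ConfigMap on its own after a short delay.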
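Error 2: x509: certificate signed by unknown authority
By default each kubelet serves a self-signed certificate that is not signed by the cluster CA, so metrics-server cannot verify it. Metrics Server supports a flag, --kubelet-insecure-tls, that skips this check; however, the project states explicitly that this is not recommended for production use.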
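For a throwaway lab cluster only, the flag can be appended without editing the manifest. This sketch assumes the Deployment is `kube-system/metrics-server` and that its first container already has an `args` list (the v0.3.6 Deployment runs with `--cert-dir=/tmp` and `--secure-port=4443`); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Lab-only sketch: append --kubelet-insecure-tls to the metrics-server args.
# JSON Patch path ".../args/-" appends to the end of the existing args array
# of the first container (index 0).
insecure_tls_patch() {
  printf '%s\n' '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
}

apply_insecure_tls() {
  kubectl -n kube-system patch deployment metrics-server --type=json \
    -p "$(insecure_tls_patch)"
}
```

The TLS bootstrap approach that follows keeps certificate verification enabled, which is why it is the preferred fix.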
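Fix:
Enable TLS bootstrap signing of the kubelet serving certificate (on the master and on every node), so that each kubelet requests a serving certificate from the cluster CA: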
[root@server2 ms]# vim /var/lib/kubelet/config.yaml    ## add the parameter below
serverTLSBootstrap: true
[root@server2 ms]# systemctl restart kubelet
[root@server2 ms]# kubectl get csr
NAME        AGE   SIGNERNAME                      REQUESTOR             CONDITION
csr-57blk   13s   kubernetes.io/kubelet-serving   system:node:server2   Pending
[root@server2 ms]# kubectl certificate approve csr-57blk
certificatesigningrequest.certificates.k8s.io/csr-57blk approved
[root@server2 ms]# kubectl get csr
NAME        AGE   SIGNERNAME                      REQUESTOR             CONDITION
csr-57blk   76s   kubernetes.io/kubelet-serving   system:node:server2   Approved,Issued
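Every kubelet with serverTLSBootstrap enabled files its own CSR after the restart, so a multi-node cluster has one request per node to approve. A sketch, assuming the `kubectl get csr` layout from the transcript (SIGNERNAME in the third column, CONDITION last); the function names are hypothetical. It approves every pending kubelet-serving request without inspecting it, which is only acceptable when every requesting node is trusted.

```shell
#!/usr/bin/env bash
# Print the names of pending kubelet-serving CSRs from `kubectl get csr` output.
# Assumes SIGNERNAME is column 3 and CONDITION is the last column.
pending_serving_csrs() {
  awk 'NR > 1 && $3 == "kubernetes.io/kubelet-serving" && $NF == "Pending" { print $1 }'
}

# Approve all of them; no per-request review is done here.
approve_pending_serving_csrs() {
  kubectl get csr | pending_serving_csrs | xargs -r kubectl certificate approve
}
```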
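Error 3: Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
If metrics-server itself starts normally and logs no errors, this is most likely a network problem: kube-apiserver cannot reach the metrics-server Pod. Switch the metrics-server Pod to the host network by setting hostNetwork: true in the Deployment's Pod template (spec.template.spec).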
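Before and after the change, the APIService status shows whether kube-apiserver can reach metrics-server. A sketch, assuming the `kubectl get apiservice` columns NAME, SERVICE, AVAILABLE, AGE; the function names are hypothetical. With hostNetwork, the container's port 4443 is bound directly on the node, so that port must be free there.

```shell
#!/usr/bin/env bash
# Print the first word of the AVAILABLE column ("True" or "False") from
# `kubectl get apiservice v1beta1.metrics.k8s.io` output.
apiservice_available() {
  awk 'NR == 2 { print $3 }'
}

check_metrics_apiservice() {
  kubectl get apiservice v1beta1.metrics.k8s.io | apiservice_available
}

# Put the metrics-server Pod template on the host network;
# the Deployment then rolls out a new Pod.
enable_metrics_host_network() {
  kubectl -n kube-system patch deployment metrics-server --type=json \
    -p '[{"op":"add","path":"/spec/template/spec/hostNetwork","value":true}]'
}
```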
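Dashboard provides a visual web interface for viewing information about the current cluster. With Kubernetes Dashboard, users can deploy containerized applications, monitor application status, troubleshoot problems, and manage all kinds of Kubernetes resources.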
(1) Download the yaml file:
Pull the images required for the lab and push them to the private registry.
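To know which images to mirror, the manifest can be scanned before applying it. A sketch, assuming a manifest such as the upstream Dashboard v2.0.0 file (for example https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml) with unquoted `image:` values; the function name `list_manifest_images` is hypothetical.

```shell
#!/usr/bin/env bash
# List every image referenced by one or more manifests (or stdin), so each
# can be mirrored to the private registry before `kubectl apply`.
# Handles both "image: x" and "- image: x" lines; output is de-duplicated.
list_manifest_images() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "image:") print $(i + 1) }' "$@" | sort -u
}
```

For example, `list_manifest_images recommended.yaml` prints the Dashboard images, one per line.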
(2) Apply the yaml file:
(3) Change the Service type to NodePort for external access:
kubectl edit svc kubernetes-dashboard -n kubernetes-dashboard
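The same change can be made without opening an editor. A sketch, assuming the Service is `kubernetes-dashboard` in the `kubernetes-dashboard` namespace; the helper names are hypothetical. Unless a port is set explicitly, the node port is picked from the NodePort range (30000-32767 by default), so the 31416 seen below will differ on other clusters.

```shell
#!/usr/bin/env bash
# Switch the dashboard Service to NodePort non-interactively.
expose_dashboard_nodeport() {
  kubectl -n kubernetes-dashboard patch svc kubernetes-dashboard \
    -p '{"spec":{"type":"NodePort"}}'
}

# Extract the node port from a PORT(S) value such as "443:31416/TCP".
nodeport_from_ports() {
  sed -E 's#^[0-9]+:([0-9]+)/.*#\1#'
}

# Print the URL to open, given the IP of any node.
dashboard_url() {
  local node_ip="$1" port
  port="$(kubectl -n kubernetes-dashboard get svc kubernetes-dashboard \
    -o jsonpath='{.spec.ports[0].nodePort}')"
  printf 'https://%s:%s\n' "$node_ip" "$port"
}
```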
(4) Get the token of the dashboard Pod's ServiceAccount:
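On clusters of this era (before Kubernetes 1.24), each ServiceAccount gets an auto-generated token Secret named `<serviceaccount>-token-<suffix>`, like `metrics-server-token-8t6zs` in the describe output. A sketch that finds and decodes the dashboard's token, assuming the ServiceAccount is named `kubernetes-dashboard`; the helper names are hypothetical. What the token can see depends on the RBAC bound to that ServiceAccount. From 1.24 on, such Secrets are no longer created automatically and `kubectl create token <serviceaccount>` issues a token instead.

```shell
#!/usr/bin/env bash
# From `kubectl get secret` output, print the first Secret whose name
# starts with "<serviceaccount>-token-".
find_token_secret() {
  local sa="$1"
  awk -v prefix="${sa}-token-" 'NR > 1 && index($1, prefix) == 1 { print $1; exit }'
}

# Decode the bearer token stored in the dashboard's token Secret.
dashboard_token() {
  local ns=kubernetes-dashboard secret
  secret="$(kubectl -n "$ns" get secret | find_token_secret kubernetes-dashboard)"
  kubectl -n "$ns" get secret "$secret" -o jsonpath='{.data.token}' | base64 -d
  echo
}
```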
Test access at https://172.25.12.2:31416
Paste the copied token value; the login succeeds.