Tool for collecting Kubernetes resource metrics: metrics-server
Tools for monitoring custom metrics: prometheus, k8s-prometheus-adapter
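prometheus: prometheus can collect resource metrics along many dimensions, such as CPU utilization, the number of network connections, packet send and receive rates, and even process creation and reclamation rates. Early versions of Kubernetes did not support these metrics, so whatever prometheus collects has to be integrated into Kubernetes before Kubernetes can use it to decide whether pods should be scaled.
prometheus serves both as a monitoring system and as the provider of certain special resource metrics. These are not standard built-in Kubernetes metrics, so they are called custom metrics. To expose the data it collects as metrics, prometheus needs a plugin called k8s-prometheus-adapter. The metrics are the basic criteria for deciding whether pods need to scale, for example scaling on CPU utilization or memory usage.
With prometheus and k8s-prometheus-adapter in place, the new-generation Kubernetes monitoring architecture takes shape.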
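Core metrics pipeline: made up of the kubelet, metrics-server, and the API exposed through the API server. It covers cumulative CPU usage, real-time memory usage, pod resource usage, and container disk usage.
Monitoring pipeline: collects all kinds of metrics from the system and serves them to end users, storage systems, and the HPA. It includes the core metrics plus many non-core metrics. Kubernetes cannot interpret non-core metrics by itself, so k8s-prometheus-adapter is needed to convert the data prometheus collects into a format Kubernetes understands.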
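As a rough illustration of how the HPA consumes the core metrics pipeline, the sketch below wraps kubectl autoscale, which creates an HPA that scales a deployment on CPU utilization. The helper name autoscale_on_cpu and the deployment name in the usage comment are made up for this example. It also assumes metrics-server is already running and that the target containers declare CPU requests, since utilization is computed against the request.

```shell
#!/bin/sh
# autoscale_on_cpu DEPLOYMENT MIN MAX CPU_PERCENT
# Hypothetical helper: creates an HPA for DEPLOYMENT that keeps average CPU
# utilization near CPU_PERCENT by running between MIN and MAX replicas.
autoscale_on_cpu() {
    if [ "$#" -ne 4 ]; then
        echo "usage: autoscale_on_cpu DEPLOYMENT MIN MAX CPU_PERCENT" >&2
        return 2
    fi
    kubectl autoscale deployment "$1" --min="$2" --max="$3" --cpu-percent="$4"
}

# Example against a live cluster (the deployment name is an assumption):
#   autoscale_on_cpu gateway-deploy 1 5 60
#   kubectl get hpa
```

Scaling on custom metrics follows the same idea, but the HPA then reads them through the API that k8s-prometheus-adapter serves instead of the one served by metrics-server.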
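Core metrics monitoring
heapster used to fill this role, but it was deprecated starting with Kubernetes 1.11 and retired by 1.13; its replacement is metrics-server. metrics-server is a user-deployed API server that serves only resource metrics, not objects such as pods or deployments. It is not a built-in part of Kubernetes; it runs as a pod hosted on the cluster. For users to consume the API that metrics-server provides as seamlessly as the native one, the new architecture combines the two as shown in the figure: an aggregator sits in front of the Kubernetes API server and metrics-server, and the metrics are then served under the API group path /apis/metrics.k8s.io/v1beta1.
Any other API servers that users add later can be integrated into the aggregator in the same way, and the aggregator serves them all, as shown in the figure.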
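Check the API versions of a default Kubernetes installation with kubectl api-versions; the metrics.k8s.io group is not listed.
Once metrics-server is deployed, run kubectl api-versions again and the metrics.k8s.io group appears.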
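This check can be scripted. The sketch below assumes the output of kubectl api-versions is piped in on stdin; the helper name has_metrics_api is made up for this illustration.

```shell
#!/bin/sh
# has_metrics_api: reads `kubectl api-versions` output on stdin and succeeds
# only if the line metrics.k8s.io/v1beta1 is present (exact, literal match).
has_metrics_api() {
    grep -qxF 'metrics.k8s.io/v1beta1'
}

# Typical use against a live cluster (requires kubectl and cluster access):
#   kubectl api-versions | has_metrics_api && echo "metrics API registered"
# Once the group is registered, the aggregated API can be queried directly:
#   kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
```

The -F and -x flags make grep treat the name as a literal string and match the whole line, so a similarly named group does not produce a false positive.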
Deploying metrics-server
In the kubernetes project, go to cluster/addons, find the metrics-server directory, and download its files.
[root@master bcia]# mkdir metrics-server -p
[root@master bcia]# cd metrics-server/
[root@master metrics-server]# for file in auth-delegator.yaml auth-reader.yaml metrics-apiservice.yaml metrics-server-deployment.yaml metrics-server-service.yaml resource-reader.yaml ; do wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/metrics-server/$file;done
//download all the files in one go
--2019-11-02 10:18:10-- https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/metrics-server/auth-delegator.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.228.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.228.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 398 [text/plain]
Saving to: ‘auth-delegator.yaml’
100%[==========================================================================>] 398 --.-K/s in 0s
... (output omitted) ...
[root@master metrics-server]# ls
auth-delegator.yaml metrics-apiservice.yaml metrics-server-service.yaml
auth-reader.yaml metrics-server-deployment.yaml resource-reader.yaml
//apply all the files in one go
[root@master metrics-server]# kubectl apply -f .
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
configmap/metrics-server-config created
deployment.apps/metrics-server-v0.3.6 created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
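After applying, errors show up. Delete everything in one go, then change the following places, as shown in the figure.
1. metrics-server-deployment.yaml
In the command of the metrics-server container, add - --kubelet-insecure-tls, which tells metrics-server not to verify the serving certificates presented by the kubelets. Comment out the port 10255 setting; with it commented out, metrics-server uses port 10250 and talks to the kubelet over HTTPS.
In the command of the addon-resizer container, write concrete values for cpu, memory, and extra-memory, and comment out minClusterSize={{ metrics_server_min_cluster_size }}.
2. resource-reader.yaml
Add nodes/stats, as shown in the figure.
The modified metrics-server-deployment.yaml and resource-reader.yaml are included in full at the end of this article.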
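Before re-applying, it can help to confirm that the edits actually landed. The following is only a sketch: the helper name verify_edits is hypothetical, and it assumes each flag or resource sits on its own YAML list line, as in the upstream addon files.

```shell
#!/bin/sh
# verify_edits DEPLOYMENT_YAML RESOURCE_READER_YAML
# Hypothetical pre-flight check for the manual edits described above.
verify_edits() {
    deploy="$1"
    rbac="$2"
    # --kubelet-insecure-tls must be an active (uncommented) list item.
    if ! grep -Eq '^[[:space:]]*-[[:space:]]+--kubelet-insecure-tls' "$deploy"; then
        echo "missing --kubelet-insecure-tls"
        return 1
    fi
    # The read-only port 10255 must no longer be active.
    if grep -Eq '^[[:space:]]*-[[:space:]]+--kubelet-port=10255' "$deploy"; then
        echo "--kubelet-port=10255 is still active"
        return 1
    fi
    # nodes/stats must be listed among the readable resources.
    if ! grep -Eq '^[[:space:]]*-[[:space:]]+nodes/stats[[:space:]]*$' "$rbac"; then
        echo "missing nodes/stats"
        return 1
    fi
    echo "ok"
}

# Example, run from the directory holding the manifests:
#   kubectl delete -f .
#   verify_edits metrics-server-deployment.yaml resource-reader.yaml && kubectl apply -f .
```

A line that starts with # does not match the patterns, so a flag that is merely commented out is treated as absent.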
//check that the pods are running normally
[root@master metrics-server]# kubectl get pods -n kube-system
NAME                                     READY   STATUS    RESTARTS   AGE
coredns-8686dcc4fd-bzgss                 1/1     Running   0          9d
coredns-8686dcc4fd-xgd49                 1/1     Running   0          9d
etcd-master                              1/1     Running   0          9d
kube-apiserver-master                    1/1     Running   0          9d
kube-controller-manager-master           1/1     Running   0          9d
kube-flannel-ds-amd64-52d6n              1/1     Running   0          9d
kube-flannel-ds-amd64-k8qxt              1/1     Running   0          8d
kube-flannel-ds-amd64-lnss4              1/1     Running   0          9d
kube-proxy-4s5mf                         1/1     Running   0          8d
kube-proxy-b6szk                         1/1     Running   0          9d
kube-proxy-wsnfz                         1/1     Running   0          9d
kube-scheduler-master                    1/1     Running   0          9d
kubernetes-dashboard-76f6bf8c57-rncvn    1/1     Running   0          8d
metrics-server-v0.3.6-677d79858c-75vk7   2/2     Running   0          18m
tiller-deploy-57c977bff7-tcnrf           1/1     Running   0          7d20h
Check api-versions again; metrics.k8s.io/v1beta1 now appears.
Viewing node and pod metrics
[root@master metrics-server]# kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master   145m         3%     1801Mi          11%
node2    697m         17%    12176Mi         77%
node3    838m         20%    12217Mi         77%
[root@master metrics-server]# kubectl top pods
NAME                                             CPU(cores)   MEMORY(bytes)
account-deploy-6d86f9df74-khv4v                  5m           444Mi
admin-deploy-55dcf4bc4d-srw8m                    2m           317Mi
backend-deploy-6f7bdd9bf4-w4sqc                  4m           497Mi
crm-deploy-7879694578-cngzp                      4m           421Mi
device-deploy-77768bf87c-ct5nc                   5m           434Mi
elassandra-0                                     168m         4879Mi
gateway-deploy-68c988676d-wnqsz                  4m           379Mi
jhipster-alerter-74fc8984c4-27bx8                1m           46Mi
jhipster-console-85556468d-kjfg6                 3m           119Mi
jhipster-curator-67b58477b9-5f8br                1m           11Mi
jhipster-logstash-74878f8b49-mpn62               59m          860Mi
jhipster-zipkin-5b5ff7bdbc-bsxhk                 1m           1571Mi
order-deploy-c4c846c54-2gxkp                     5m           440Mi
pos-registry-76bbd6c689-q5w2b                    442m         474Mi
recv-deploy-5dd686c947-v4qqh                     5m           424Mi
store-deploy-54c994c9b6-82b8z                    6m           493Mi
task-deploy-64c9984d88-fqxqq                     6m           461Mi
wiggly-cat-redis-ha-sentinel-655f7b5f9d-bbrz6    4m           4Mi
wiggly-cat-redis-ha-sentinel-655f7b5f9d-bj4bq    4m           5Mi
wiggly-cat-redis-ha-sentinel-655f7b5f9d-f9pdd    4m           5Mi
wiggly-cat-redis-ha-server-b58c8d788-6xlwk       3m           11Mi
wiggly-cat-redis-ha-server-b58c8d788-r949h       3m           8Mi
wiggly-cat-redis-ha-server-b58c8d788-w2gtb       3m           22Mi
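The kubectl top output is plain whitespace-separated text, so it is easy to post-process. As a small sketch, the helper below (the name nodes_over_mem is made up here) reads kubectl top nodes output on stdin and prints the nodes whose MEMORY% is above a given threshold. It assumes the five-column layout shown above.

```shell
#!/bin/sh
# nodes_over_mem THRESHOLD
# Reads `kubectl top nodes` output on stdin and prints the name of every node
# whose MEMORY% column (5th field) is strictly greater than THRESHOLD.
nodes_over_mem() {
    awk -v limit="$1" '
        NR > 1 {                     # skip the header row
            pct = $5
            sub(/%$/, "", pct)       # strip the trailing percent sign
            if (pct + 0 > limit + 0) print $1
        }'
}

# Example against a live cluster:
#   kubectl top nodes | nodes_over_mem 70
```

With the node figures shown above and a threshold of 70, this prints node2 and node3, both at 77% memory.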
That completes the metrics-server deployment. The next article covers Prometheus.
metrics-server-deployment.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-server-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server-v0.3.3
  namespace: kube-system
  labels:
    k8s-app: metrics-server
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.3.3
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
      version: v0.3.3
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
        version: v0.3.3
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      containers:
      - name: metrics-server
        image: gcr.azk8s.cn/google-containers/metrics-server-amd64:v0.3.3
        command:
        - /metrics-server
        - --metric-resolution=30s
        # - --kubeconfig=/key/kubeconfig
        # These are needed for GKE, which doesn't support secure communication yet.
        # Remove these lines for non-GKE clusters, and when GKE supports token-based auth.
        #- --kubelet-port=10255
        # - --deprecated-kubelet-completely-insecure=true
        #- --source=kubernetes.summary_api:https://kubernetes.default.svc?kubeletHttps=true&kubeletPort=10250&useServiceAccount=true&insecure=true
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
        ports:
        - containerPort: 443
          name: https
          protocol: TCP
      - name: metrics-server-nanny
        image: gcr.azk8s.cn/google-containers/addon-resizer:1.8.4
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 5m
            memory: 50Mi
        env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: metrics-server-config-volume
          mountPath: /etc/config
        command:
        - /pod_nanny
        - --config-dir=/etc/config
        - --cpu=80m
        - --extra-cpu=0.5m
        - --memory=80Mi
        - --extra-memory=8Mi
        - --threshold=5
        - --deployment=metrics-server-v0.3.3
        - --container=metrics-server
        - --poll-period=300000
        - --estimator=exponential
        # Specifies the smallest cluster (defined in number of nodes)
        # resources will be scaled to.
        # - --minClusterSize={{ metrics_server_min_cluster_size }}
      volumes:
      - name: metrics-server-config-volume
        configMap:
          name: metrics-server-config
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
resource-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - deployments
  verbs:
  - get
  - list
  - update
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system