metrics-server collects monitoring metrics such as CPU and memory usage and serves them through the api-server. Using those metrics, the HPA:
1. Collects the recent CPU usage (CPU utilization) of all Pods managed by the HPA.
2. Compares it with the target CPU utilization recorded in the scaling rule (CPUUtilization).
3. Adjusts the number of replicas (the result must stay within the configured max/min replica counts).
4. Repeats this scaling decision every 30s.
CPU utilization is calculated by dividing cpu usage (the average over roughly the last minute, obtainable directly from the metrics API) by cpu request (the amount of CPU requested when the container was created). The result can be read as: on average, what fraction of its requested CPU each Pod is consuming. A quick worked example follows below.
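A minimal illustration of that division (the numbers are hypothetical; in practice compare the CPU(cores) column of kubectl top pod with the container's spec.resources.requests.cpu):

# A Pod requesting 200m CPU whose average usage over the last minute is 100m:
#   CPU utilization = cpu usage / cpu request = 100m / 200m = 50%
awk 'BEGIN { usage = 100; request = 200; printf "utilization = %.0f%%\n", usage / request * 100 }'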
A Metrics Server in the cluster (Heapster in older versions, or a custom Metrics Server) continuously collects metrics from all Pod replicas.
The HPA controller fetches these data through the Metrics Server's API (Heapster's API or the aggregated API), computes against the user-defined scaling rules, and derives the target number of Pod replicas.
When the target replica count differs from the current one, the HPA controller issues a scale operation against the Pod's replica controller (Deployment, RC or ReplicaSet) to adjust the replica count and complete the scaling.
In short: the HPA determines whether the Pod replica count needs adjusting by monitoring and analyzing the load of all the Pods managed by a given controller.
The kube-controller-manager service on the Master continuously monitors a performance metric of the target Pods to work out whether the replica count needs adjusting. The metric types currently supported by Kubernetes are:
◎ Pod resource utilization: a Pod-level performance metric, usually a ratio, e.g. CPU utilization.
◎ Pod custom metrics: a Pod-level performance metric, usually a raw value, e.g. the number of requests received.
◎ Object custom metrics or external custom metrics: usually a raw value that the containerized application has to expose in some way, e.g. through an HTTP "/metrics" URL, or through a metrics-collection URL provided by an external service.
Starting with version 1.11, Kubernetes deprecated the Heapster-based mechanism for collecting Pod CPU usage and switched entirely to Metrics Server for data collection. Metrics Server exposes the collected Pod performance metrics to the HPA controller for querying through aggregated APIs (Aggregated API) such as metrics.k8s.io, custom.metrics.k8s.io and external.metrics.k8s.io. The concepts of the aggregated API and the API Aggregator are explained in detail later.
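Once metrics-server is installed (done later in this article), the resource metrics behind metrics.k8s.io can be queried directly through the aggregation layer. These are optional sanity checks; jq is assumed to be installed and is only used for pretty-printing:

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods" | jq .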
Whether to scale out or in is decided by a scaling ratio.
Based on the metric values it obtains, the HPA applies the corresponding algorithm to compute a scaling ratio: the ratio of the metric's current value to its desired value. A ratio greater than 1 means scale out; less than 1 means scale in.
Tolerance
--horizontal-pod-autoscaler-tolerance: the tolerance
It allows the usage to fluctuate within a certain range without triggering scaling; the current default is 0.1, a value chosen to keep the system stable.
For example, if the HPA policy is to scale out when CPU usage exceeds 50%, scaling activity is only triggered when usage rises above 55% or falls below 45%; the HPA tries to keep Pod usage within this band.
Algorithm
The exact formula for how many Pods to add or remove on each scaling decision is: desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
For example:
If the current metric value is 200m and the desired value is 100m, the number of Pod replicas is doubled, because the ratio is 200.0 / 100.0 = 2.0.
If the current value is 50m, the number of replicas is halved, because 50.0 / 100.0 = 0.5.
If the ratio is close to 1.0, for example 0.9 or 1.1 (i.e. within the tolerance of 0.1), no scaling takes place (this is governed by the global tolerance parameter --horizontal-pod-autoscaler-tolerance, default 0.1). A small sketch of this calculation follows below.
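A minimal sketch of this calculation in plain awk (not the controller's actual code; all numbers are made up):

# desiredReplicas = ceil[currentReplicas * (currentMetric / desiredMetric)]
# and no scaling happens while |currentMetric/desiredMetric - 1| <= tolerance (default 0.1)
awk 'BEGIN {
  currentReplicas = 4; currentMetric = 200; desiredMetric = 100; tolerance = 0.1
  ratio = currentMetric / desiredMetric
  if (ratio > 1 - tolerance && ratio < 1 + tolerance)
    print "within tolerance, keep " currentReplicas " replicas"
  else {
    desired = currentReplicas * ratio
    printf "desiredReplicas = %d\n", (desired == int(desired)) ? desired : int(desired) + 1
  }
}'
# prints: desiredReplicas = 8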
In addition, several abnormal Pod situations are handled as follows.
◎ A Pod that is being deleted (has a deletion timestamp set) is not counted toward the target replica count.
◎ A Pod whose current metric value cannot be obtained is excluded from the target replica count for this probe; later probes take it into account again.
◎ If the metric type is CPU utilization, Pods that are starting up but are not yet Ready are also temporarily excluded from the target replica count. The kube-controller-manager startup parameter --horizontal-pod-autoscaler-initial-readiness-delay sets the delay before a Pod's readiness is first probed (default 30s), and --horizontal-pod-autoscaler-cpu-initialization-period sets the delay before a Pod's CPU usage is first sampled; see the sketch after this list for how such flags are typically set.
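These flags belong to kube-controller-manager. On a kubeadm-installed cluster that usually means editing its static Pod manifest, much like the kube-apiserver edit shown later in this article; the values below are only a sketch and simply restate the documented defaults:

# /etc/kubernetes/manifests/kube-controller-manager.yaml  (kubeadm layout assumed)
# Under spec.containers[0].command add, for example:
#   - --horizontal-pod-autoscaler-tolerance=0.1
#   - --horizontal-pod-autoscaler-initial-readiness-delay=30s
#   - --horizontal-pod-autoscaler-cpu-initialization-period=5m0s
vim /etc/kubernetes/manifests/kube-controller-manager.yaml   #kubelet recreates the static Pod after the file is saved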
Note:
Cooldown and delay mechanism
When an HPA manages a set of replicas, dynamically changing metrics may cause the replica count to fluctuate frequently; this phenomenon is called "thrashing".
Consider a scenario:
when the Pods' CPU load is already too high, the system's CPU usage may keep climbing while a new Pod is still being created. For this reason, once a scaling decision has been made, no further decision is taken for a period of time: 3 minutes after a scale-out and 5 minutes after a scale-in. A sketch of the newer, per-HPA way to configure these windows follows below.
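From Kubernetes 1.18 these windows can also be tuned per HPA through the spec.behavior field of autoscaling/v2beta2. A minimal sketch, assuming a cluster of at least that version; the HPA name hpa-behavior-demo is made up, and the target Deployment nginx is the one created later in this article:

cat <<'EOF' > hpa-behavior-demo.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-behavior-demo              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx                        # assumes the nginx Deployment used later
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before scaling in
    scaleUp:
      stabilizationWindowSeconds: 0    # scale out immediately
EOF
kubectl apply -f hpa-behavior-demo.yaml   #apply only if the cluster serves autoscaling/v2beta2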
Pod readiness-delay probing
If the metric type is CPU utilization, Pods that are starting up but are not yet Ready are temporarily excluded from the target replica count.
Kubernetes implements Custom Metrics through the Aggregator APIServer extension mechanism. The Custom Metrics APIServer is an API service that answers metrics queries (for example the Prometheus adapter). Once this service is running, Kubernetes exposes an API called custom.metrics.k8s.io; when that URL is requested, the Custom Metrics APIServer queries Prometheus for the corresponding metric and returns the result in the expected format.
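When such an adapter is deployed (its installation is not covered in this article), the custom metrics it serves can be listed through the same aggregation mechanism. Optional checks, again piping through jq for readability:

kubectl get apiservices | grep custom.metrics    #the adapter registers custom.metrics.k8s.io
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .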
To support the newer custom (and external) metrics, the newer HPA API version autoscaling/v2beta1 is also required. It adds four metric types: Resource, Pods, Object and External, each covering a different scenario, as summarized below.
The latest version, autoscaling/v2beta2, further refines the metrics configuration and the HPA scaling policy. In particular, the target value of a metric can be expressed more flexibly as AverageUtilization, AverageValue or Value, but not every metric type supports all three target types; the mapping is shown in the following table.
Support matrix between HPA metric types and metric target types
Metrics Type \ Target Type | AverageUtilization | AverageValue | Value | Notes (query metrics) |
Resource (pod's cpu/memory etc.) | Yes | Yes | No | pods metrics list |
Pods (pod's other metrics) | No | Yes | No | pods metrics list |
Object (k8s object) | No | Yes | Yes | object metrics |
External (not a k8s object) | No | Yes | Yes | external metrics list |
Let's start with the simplest possible HPA definition:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
As the example shows, an HPA spec consists of three required parts: the scaling target (scaleTargetRef), the replica bounds (minReplicas/maxReplicas), and the metrics list (metrics).
For the complete HPA definition, refer to the official Kubernetes API documentation.
If no metrics are configured in the HPA spec, Kubernetes defaults to a cpu Resource metric with target type AverageUtilization and a value of 80%.
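For comparison with the Resource metric above, here is a sketch of a Pods-type metric in autoscaling/v2beta2. Everything in it is illustrative: the metric packets-per-second and the Deployment podinfo are hypothetical and only resolve if a custom metrics adapter actually serves that metric:

cat <<'EOF' > hpa-pods-metric-demo.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-pps                # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo                  # hypothetical Deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: packets-per-second   # must be exposed through custom.metrics.k8s.io
      target:
        type: AverageValue         # Pods metrics only support AverageValue (see table above)
        averageValue: 1k
EOF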
List all available HPA API versions
[root@master C]# kubectl api-versions |grep autoscaling
autoscaling/v1        #only supports CPU as the basis for changing the Pod replica count
autoscaling/v2beta1   #supports CPU, memory, connection counts, or custom rules as the basis
autoscaling/v2beta2   #similar to v2beta1, with a more flexible metrics/target definition
Check the currently used (default) version
[root@master C]# kubectl explain hpa
KIND: HorizontalPodAutoscaler
VERSION: autoscaling/v1 #the default version in use is v1
DESCRIPTION:
configuration of a horizontal pod autoscaler.
FIELDS:
apiVersion   <string>
APIVersion defines the versioned schema of this representation of an
object. Servers should convert recognized schemas to the latest internal
value, and may reject unrecognized values. More info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
kind   <string>
Kind is a string value representing the REST resource this object
represents. Servers may infer this from the endpoint the client submits
requests to. Cannot be updated. In CamelCase. More info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
metadata   <Object>
Specify the API version to use. This does not modify anything; it just means this particular command is executed against the specified version.
[root@master C]# kubectl explain hpa --api-version=autoscaling/v2beta1
KIND: HorizontalPodAutoscaler
VERSION: autoscaling/v2beta1
DESCRIPTION:
HorizontalPodAutoscaler is the configuration for a horizontal pod
autoscaler, which automatically manages the replica count of any resource
implementing the scale subresource based on the metrics specified.
FIELDS:
apiVersion   <string>
APIVersion defines the versioned schema of this representation of an
object. Servers should convert recognized schemas to the latest internal
value, and may reject unrecognized values. More info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
kind   <string>
Kind is a string value representing the REST resource this object
represents. Servers may infer this from the endpoint the client submits
requests to. Cannot be updated. In CamelCase. More info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
metadata   <Object>
metadata is the standard object metadata. More info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
spec   <Object>
spec is the specification for the behaviour of the autoscaler. More info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status.
status   <Object>
status is the current information about the autoscaler.
[root@master kube-system]# kubectl top nodes #check node status; this fails because metrics-server is not yet installed
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
[root@master kube-system]# vim components-v0.5.0.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  - configmaps
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls
        image: registry.cn-shenzhen.aliyuncs.com/zengfengjin/metrics-server:v0.5.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
[root@master kube-system]# kubectl apply -f components-v0.5.0.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
#check the Pod that was created
[root@master kube-system]# kubectl get pods -n kube-system| egrep 'NAME|metrics-server'
NAME READY STATUS RESTARTS AGE
metrics-server-5944675dfb-q6cdd 0/1 ContainerCreating 0 6s
#check its logs
[root@master kube-system]# kubectl logs metrics-server-5944675dfb-q6cdd -n kube-system
I0718 03:06:39.064633 1 serving.go:341] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0718 03:06:39.870097 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0718 03:06:39.870122 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0718 03:06:39.870159 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0718 03:06:39.870160 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0718 03:06:39.870105 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0718 03:06:39.871166 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0718 03:06:39.872804 1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I0718 03:06:39.875741 1 secure_serving.go:197] Serving securely on [::]:4443
I0718 03:06:39.876050 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0718 03:06:39.970469 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0718 03:06:39.970575 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0718 03:06:39.971610 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
#If errors occur, you can modify the kube-apiserver YAML file (this is a Kubernetes static Pod manifest)
[root@master kube-system]# vim /etc/kubernetes/manifests/kube-apiserver.yaml
40 - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
41 - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
42 - --enable-aggregator-routing=true #add this line
43 image: registry.aliyuncs.com/google_containers/kube-apiserver:v1.18.0
44 imagePullPolicy: IfNotPresent
#save and exit
[root@master kube-system]# systemctl restart kubelet #restart kubelet after making the change
#check the node information again
[root@master kube-system]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master 327m 4% 3909Mi 23%
node 148m 1% 1327Mi 8%
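A couple of optional checks that the installation is healthy before moving on (the APIService should report Available as True, and per-Pod metrics should appear after a minute or so):

kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top pods -n kube-system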
[root@master test]# cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      run: nginx
  replicas: 1
  template:
    metadata:
      labels:
        run: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15.2
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:          #requests must be declared for the HPA to take effect
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    run: nginx
spec:
  ports:
  - port: 80
  selector:
    run: nginx
[root@master test]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-9cb8d65b5-tq9v4 1/1 Running 0 14m 10.244.1.22 node <none> <none>
[root@master test]# kubectl get svc nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx ClusterIP 172.16.169.27 <none> 80/TCP 15m
[root@master test]# kubectl describe svc nginx
Name: nginx
Namespace: default
Labels: run=nginx
Annotations: <none>
Selector: run=nginx
Type: ClusterIP
IP: 172.16.169.27
Port: 80/TCP
TargetPort: 80/TCP
Endpoints: 10.244.1.22:80
Session Affinity: None
Events:
[root@node test]# curl 172.16.169.27 #the request succeeds
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and working. Further configuration is required.
For online documentation and support please refer to nginx.org.
Commercial support is available at nginx.com.
Thank you for using nginx.
#Create an HPA that scales when CPU utilization reaches 20%, with at most 10 and at least 1 Pod; no API version is specified here, so the default v1 is used, and v1 only supports CPU as the metric
[root@master test]# kubectl autoscale deployment nginx --cpu-percent=20 --min=1 --max=10
horizontalpodautoscaler.autoscaling/nginx autoscaled
#the TARGETS column shows current/target utilization
[root@master test]# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx Deployment/nginx 0%/20% 1 10 1 86s
#Create a test Pod to generate load; the address requested must be the Pod/Service address created above
[root@master ~]# kubectl run busybox -it --image=busybox -- /bin/sh -c 'while true; do wget -q -O- http://10.244.1.22; done'
#Check the HPA utilization after about a minute; REPLICAS is the current number of Pods
[root@master test]# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx Deployment/nginx 27%/20% 1 10 5 54m
[root@master test]# kubectl get pods #check the Pod count again; it has grown to 5
NAME READY STATUS RESTARTS AGE
busybox 1/1 Running 0 119s
nginx-9cb8d65b5-24dg2 1/1 Running 0 57s
nginx-9cb8d65b5-c6n98 1/1 Running 0 87s
nginx-9cb8d65b5-ksjzv 1/1 Running 0 57s
nginx-9cb8d65b5-n77fm 1/1 Running 0 87s
nginx-9cb8d65b5-tq9v4 1/1 Running 0 84m
[root@master test]# kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
nginx 5/5 5 5 84m
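Optionally, the individual scaling decisions and their reasons can be read from the HPA's events (output omitted here):

kubectl describe hpa nginx    #the Events section records SuccessfulRescale entries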
#Now stop the load test, and check the Pod count and utilization again after several minutes
[root@master test]# kubectl delete pod busybox #after stopping the load test, delete the Pod
[root@master test]# kubectl get hpa #utilization has dropped to 0%, but REPLICAS is still 5; it will scale in after a while
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx Deployment/nginx 0%/20% 1 10 5 58m
#A few minutes later, the Pod count is back to 1
[root@master test]# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx Deployment/nginx 0%/20% 1 10 1 64m
[root@master test]# kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-9cb8d65b5-tq9v4 1/1 Running 0 95m
#First delete the resources created above
[root@master test]# kubectl delete horizontalpodautoscalers.autoscaling nginx
horizontalpodautoscaler.autoscaling "nginx" deleted
[root@master test]# kubectl delete -f nginx.yaml
deployment.apps "nginx" deleted
service "nginx" deleted
[root@master test]# cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      run: nginx
  replicas: 1
  template:
    metadata:
      labels:
        run: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15.2
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
            memory: 60Mi
          requests:
            cpu: 200m
            memory: 25Mi
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    run: nginx
spec:
  ports:
  - port: 80
  selector:
    run: nginx
[root@master test]# kubectl apply -f nginx.yaml
deployment.apps/nginx created
service/nginx created
[root@master test]# vim hpa-nginx.yaml
apiVersion: autoscaling/v2beta1    #as noted in the HPA versions section above, a memory-based HPA needs this newer API version
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  maxReplicas: 10                  #Pod count bounded between 1 and 10
  minReplicas: 1
  scaleTargetRef:                  #the resource this HPA targets; apiVersion, kind and name must match the object created above
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 50 #target 50% memory utilization
[root@master test]# kubectl apply -f hpa-nginx.yaml
horizontalpodautoscaler.autoscaling/nginx-hpa created
[root@master test]# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx-hpa Deployment/nginx 7%/50% 1 10 1 59s
#Run a command inside the Pod to drive up memory usage
[root@master ~]# kubectl exec -it nginx-78f4944bb8-2rz7j -- /bin/sh -c 'dd if=/dev/zero of=/tmp/file1'
[root@master test]# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx-hpa Deployment/nginx 137%/50% 1 10 1 12m
[root@master test]# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx-hpa Deployment/nginx 14%/50% 1 10 3 12m
[root@master test]# kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-78f4944bb8-2rz7j 1/1 Running 0 21m
nginx-78f4944bb8-bxh78 1/1 Running 0 34s
nginx-78f4944bb8-g8w2h 1/1 Running 0 34s
#Just like with CPU, new Pods are created automatically when memory usage rises