kubectl create -f integration/custom-metrics-api/namespace.yaml
kubectl create -f integration/custom-metrics-api
Note: custom-metrics-api connects to the Prometheus instance in the cluster, so replace the Prometheus URL with the address of the Prometheus instance you actually use.
$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/capacity_used_rate",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [ ... ]
    },
    {
      "name": "datasets.data.fluid.io/capacity_used_rate",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [ ... ]
    },
    {
      "name": "namespaces/capacity_used_rate",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [ ... ]
    }
  ]
}
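The same check can be done programmatically instead of by eye. The following is a minimal Python sketch, not part of the original walkthrough: it parses an APIResourceList shaped like the output above and lists which resources expose capacity_used_rate. The function name metric_resources and the "get" verb in the sample are assumptions made for illustration.

```python
import json
from typing import List

# Sample shaped like the APIResourceList above.
# The "get" verb is an assumption; the original listing does not show the verbs.
SAMPLE = """
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {"name": "pods/capacity_used_rate", "namespaced": true,
     "kind": "MetricValueList", "verbs": ["get"]},
    {"name": "datasets.data.fluid.io/capacity_used_rate", "namespaced": true,
     "kind": "MetricValueList", "verbs": ["get"]},
    {"name": "namespaces/capacity_used_rate", "namespaced": false,
     "kind": "MetricValueList", "verbs": ["get"]}
  ]
}
"""


def metric_resources(api_resource_list: dict, metric: str) -> List[str]:
    """Return the resource types that expose the given custom metric."""
    found = []
    for res in api_resource_list.get("resources", []):
        # Entries are named "<resource>/<metric>"; split on the last slash.
        resource, _, name = res["name"].rpartition("/")
        if name == metric:
            found.append(resource)
    return found


found = metric_resources(json.loads(SAMPLE), "capacity_used_rate")
print(found)
```

If datasets.data.fluid.io is missing from the result, the HPA created later would have no Dataset metric to read.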
7. Submit the Dataset used for testing.

$ cat <<EOF > dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: spark
spec:
  mounts:
    # A mountPoint (the URL of the underlying data source) is also required here;
    # it is not shown in this listing.
    - name: spark
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: spark
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 1Gi
        high: "0.99"
        low: "0.7"
  properties:
    alluxio.user.streaming.data.timeout: 300sec
EOF
$ kubectl create -f dataset.yaml
dataset.data.fluid.io/spark created
alluxioruntime.data.fluid.io/spark created

8. Check whether the Dataset is available.

$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          0.00B    1.00GiB          0.0%                Bound   7m38s

9. Once the Dataset is available, check whether its metric can already be retrieved from custom-metrics-api.

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/*/capacity_used_rate" | jq
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/%2A/capacity_used_rate"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Dataset",
        "namespace": "default",
        "name": "spark",
        "apiVersion": "data.fluid.io/v1alpha1"
      },
      "metricName": "capacity_used_rate",
      "timestamp": "2021-04-04T07:24:52Z",
      "value": "0"
    }
  ]
}

10. Create the HPA.

$ cat <<EOF > hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: spark
spec:
  scaleTargetRef:
    apiVersion: data.fluid.io/v1alpha1
    kind: AlluxioRuntime
    name: spark
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Object
      object:
        metric:
          name: capacity_used_rate
        describedObject:
          apiVersion: data.fluid.io/v1alpha1
          kind: Dataset
          name: spark
        target:
          type: Value
          value: "90"
  behavior:
    scaleUp:
      policies:
        - type: Pods
          value: 2
          periodSeconds: 600
    scaleDown:
      selectPolicy: Disabled
EOF

First, let's walk through the sample configuration. It has two main parts: the scaling rule and the scaling sensitivity.

Rule: scale-up is triggered when the amount of data cached for the Dataset reaches 90% of its total cache capacity. The scaling target is the AlluxioRuntime, with a minimum of 1 replica and a maximum of 4. The Dataset and the AlluxioRuntime must be in the same namespace.

Policy: on Kubernetes 1.18 and later, the stabilization window and the per-step scaling size can be set separately for scale-up and scale-down. In this example, one scale-up period is 10 minutes (periodSeconds) and at most 2 replicas are added within each period, never exceeding maxReplicas. A cooldown after scale-up can also be configured through stabilizationWindowSeconds (for example, 20 minutes), although the configuration shown here does not set it. Scale-down can simply be disabled, as it is here.

11. Check the HPA. The current ratio of used cache space is 0, far below the scale-up threshold.

$ kubectl get hpa
NAME    REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spark   AlluxioRuntime/spark   0/90      1         4         1          33s
$ kubectl describe hpa
Name:                                                    spark
Namespace:                                               default
Labels:
Annotations:
CreationTimestamp:                                       Wed, 07 Apr 2021 17:36:39 +0800
Reference:                                               AlluxioRuntime/spark
Metrics:                                                 ( current / target )
  "capacity_used_rate" on Dataset/spark (target value):  0 / 90
Min replicas:                                            1
Max replicas:                                            4
Behavior:
  Scale Up:
    Stabilization Window:  0 seconds
    Select Policy:         Max
    Policies:
  Scale Down:
    Select Policy:  Disabled
    Policies:
AlluxioRuntime pods:  1 current / 1 desired
Conditions:
  Type            Status  Reason               Message
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:

12. Create the data warm-up (DataLoad) job.

$ cat <<EOF > dataload.yaml
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: spark
spec:
  dataset:
    name: spark
    namespace: default
EOF
$ kubectl create -f dataload.yaml
$ kubectl get dataload
NAME    DATASET   PHASE       AGE   DURATION
spark   spark     Executing   15s   Unfinished

13. At this point the amount of cached data is close to the cache capacity Fluid can provide (1GiB), which triggers the autoscaling condition.

$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED       CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          1020.92MiB   1.00GiB          36.8%               Bound   5m15s

The HPA shows that scale-up of the AlluxioRuntime has started, with a step size of 2 (from 1 replica toward 3).

$ kubectl get hpa
NAME    REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spark   AlluxioRuntime/spark   100/90    1         4         2          4m20s
$ kubectl describe hpa
Name:                                                    spark
Namespace:                                               default
Labels:
Annotations:
CreationTimestamp:                                       Wed, 07 Apr 2021 17:56:31 +0800
Reference:                                               AlluxioRuntime/spark
Metrics:                                                 ( current / target )
  "capacity_used_rate" on Dataset/spark (target value):  100 / 90
Min replicas:                                            1
Max replicas:                                            4
Behavior:
  Scale Up:
    Stabilization Window:  0 seconds
    Select Policy:         Max
    Policies:
  Scale Down:
    Select Policy:  Disabled
    Policies:
AlluxioRuntime pods:  2 current / 3 desired
Conditions:
  Type            Status  Reason              Message
  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 3
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age  From                       Message
  Normal  SuccessfulRescale  21s  horizontal-pod-autoscaler  New size: 2; reason: Dataset metric capacity_used_rate above target
  Normal  SuccessfulRescale  6s   horizontal-pod-autoscaler  New size: 3; reason: Dataset metric capacity_used_rate above target

14. After waiting a while, the dataset's cache capacity has grown from 1GiB to 3GiB, and data caching is nearly complete.

$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          2.59GiB   3.00GiB          95.6%               Bound   12m

Checking the HPA at the same time shows that the runtime backing the Dataset now has 3 replicas and that the used cache ratio, capacity_used_rate, is 85%, which no longer triggers a scale-up.
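The replica counts observed in steps 11 through 14 follow from how the HPA controller turns a metric into a replica count. The sketch below is a simplified Python model under stated assumptions, not the controller's actual code: it uses the documented formula ceil(current * metric / target), the default 10% tolerance, a Pods-type scale-up policy that caps growth relative to the replica count at the start of the period, and a disabled scale-down. The function and parameter names are hypothetical.

```python
import math


def desired_replicas(current, metric_value, target_value, *,
                     min_replicas, max_replicas,
                     period_start_replicas, pods_per_period,
                     tolerance=0.1, scale_down_disabled=True):
    """Simplified model of one HPA evaluation for a Value-type target."""
    ratio = metric_value / target_value
    # Within the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current
    desired = math.ceil(current * ratio)
    if desired < current and scale_down_disabled:
        # scaleDown.selectPolicy: Disabled
        return current
    if desired > current:
        # Pods policy: at most pods_per_period replicas added per period.
        desired = min(desired, period_start_replicas + pods_per_period)
    return max(min_replicas, min(max_replicas, desired))


# Settings taken from hpa.yaml in step 10; the period starts with 1 replica.
settings = dict(min_replicas=1, max_replicas=4,
                period_start_replicas=1, pods_per_period=2)

idle = desired_replicas(1, 0, 90, **settings)          # step 11: metric 0 / 90
first = desired_replicas(1, 100, 90, **settings)       # step 13: metric 100 / 90
second = desired_replicas(first, 100, 90, **settings)  # still 100 / 90
settled = desired_replicas(second, 85, 90, **settings) # step 14: metric 85 / 90
print(idle, first, second, settled)
```

In this model the count stays at 1 while the cache is empty, moves to 2 and then 3 while the metric reads 100, and stays at 3 once the metric drops to 85, because 85/90 falls inside the tolerance band.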
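Looking back at the output in step 8, the dataset's total size is 2.71GiB, while Fluid initially provides a single cache node with a maximum cache capacity of 1GiB. At that point the cache cannot hold the full dataset.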
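Two different ratios appear in this walkthrough and are easy to confuse: CACHED PERCENTAGE compares cached data with the dataset's total size, while the scaling rule compares cached data with cache capacity. The Python sketch below works through the numbers from step 13. It assumes each cache node contributes exactly its 1GiB quota and ignores watermarks; the variable names are illustrative.

```python
import math

MIB_PER_GIB = 1024


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


ufs_total_mib = 2.71 * MIB_PER_GIB       # UFS TOTAL SIZE
per_node_quota_mib = 1.0 * MIB_PER_GIB   # quota: 1Gi per cache node (assumption)
cached_mib = 1020.92                     # CACHED in step 13

# Share of the whole dataset that is cached (the CACHED PERCENTAGE column).
cached_percentage = percent(cached_mib, ufs_total_mib)
# Share of the available cache that is in use (what the scaling rule watches).
used_rate = percent(cached_mib, per_node_quota_mib)
# Cache nodes needed to hold the whole dataset under the assumption above.
nodes_for_full_cache = math.ceil(ufs_total_mib / per_node_quota_mib)

print(f"cached percentage: {cached_percentage:.1f}%")
print(f"cache used rate:   {used_rate:.1f}%")
print(f"nodes needed:      {nodes_for_full_cache}")
```

With one node, only about a third of the dataset is cached even though the cache itself is almost full, which is why the HPA scales on the capacity ratio, not on the cached percentage.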