vertical-pod-autoscaler Source Code Analysis: the recommender

First, clone the code:

git clone https://github.com/kubernetes/autoscaler.git
(image: VPA architecture diagram)

recommender

As shown in the figure above, let's start with the recommender. The recommender fetches resource usage from the metrics server (or Prometheus) and produces request recommendations.
Open the directory vertical-pod-autoscaler/pkg/recommender and look at main.go.

main() sets up a ticker that fires once per minute by default, calling recommender.RunOnce() and healthCheck.UpdateLastActivity() on each tick:

    ticker := time.Tick(*metricsFetcherInterval)
    for range ticker {
        recommender.RunOnce()
        healthCheck.UpdateLastActivity()
    }

RunOnce is the core of the recommender; each run performs the following steps:

  1. Load VPA objects
  2. Load pods
  3. Load real-time metrics
  4. Update VPAs
  5. Write checkpoints
  6. Garbage collect

func (r *recommender) RunOnce() {
    timer := metrics_recommender.NewExecutionTimer()
    defer timer.ObserveTotal()

    ctx := context.Background()
    ctx, cancelFunc := context.WithDeadline(ctx, time.Now().Add(*checkpointsWriteTimeout))
    defer cancelFunc()

    klog.V(3).Infof("Recommender Run")
    r.clusterStateFeeder.LoadVPAs()
    timer.ObserveStep("LoadVPAs")

    r.clusterStateFeeder.LoadPods()
    timer.ObserveStep("LoadPods")

    r.clusterStateFeeder.LoadRealTimeMetrics()
    timer.ObserveStep("LoadMetrics")
    klog.V(3).Infof("ClusterState is tracking %v PodStates and %v VPAs", len(r.clusterState.Pods), len(r.clusterState.Vpas))

    r.UpdateVPAs()
    timer.ObserveStep("UpdateVPAs")

    r.MaintainCheckpoints(ctx, *minCheckpointsPerRun)
    timer.ObserveStep("MaintainCheckpoints")

    r.GarbageCollect()
    timer.ObserveStep("GarbageCollect")
    klog.V(3).Infof("ClusterState is tracking %d aggregated container states", r.clusterState.StateMapSize())
}


Before diving into each step, let's look at the key data structures and how they relate:

(image: VPA recommender.png, struct relationship diagram)

The key struct is ClusterState:

type ClusterState struct {
    // All pods in the cluster, keyed by PodID.
    Pods map[PodID]*PodState
    // All VPA objects in the cluster, keyed by VpaID.
    Vpas map[VpaID]*Vpa
    // VPA objects in the cluster that have no recommendation mapped to the first
    // time we've noticed the recommendation missing or last time we logged
    // a warning about it.
    EmptyVPAs map[VpaID]time.Time
    // VpasWithMatchingPods contains information if there exist live pods that
    // this VPAs selector matches.
    VpasWithMatchingPods map[VpaID]bool
    // Observed VPAs. Used to check if there are updates needed.
    ObservedVpas []*vpa_types.VerticalPodAutoscaler

    // All container aggregations where the usage samples are stored.
    aggregateStateMap aggregateContainerStatesMap
    // Map with all label sets used by the aggregations. It serves as a cache
    // that allows to quickly access labels.Set corresponding to a labelSetKey.
    labelSetMap labelSetMap
}

1. Load VPAs: each VPA object is added to ClusterState.Vpas.

func (feeder *clusterStateFeeder) LoadVPAs() {
    // List all VPA objects (error handling and the cleanup of
    // no-longer-present VPAs are elided from this excerpt).
    vpaCRDs, err := feeder.vpaLister.List(labels.Everything())
    vpaKeys := make(map[model.VpaID]bool)
    for _, vpaCRD := range vpaCRDs {
        vpaID := model.VpaID{
            Namespace: vpaCRD.Namespace,
            VpaName:   vpaCRD.Name}
        // Build the label selector for this VPA.
        selector, conditions := feeder.getSelector(vpaCRD)
        // Add (or update) the VPA in the cluster model.
        if feeder.clusterState.AddOrUpdateVpa(vpaCRD, selector) == nil {
            // Successfully added VPA to the model.
            vpaKeys[vpaID] = true
            for _, condition := range conditions {
                if condition.delete {
                    delete(feeder.clusterState.Vpas[vpaID].Conditions, condition.conditionType)
                } else {
                    feeder.clusterState.Vpas[vpaID].Conditions.Set(condition.conditionType, true, "", condition.message)
                }
            }
        }
    }
}

2. Load pods in the same way, into ClusterState.Pods:

func (feeder *clusterStateFeeder) LoadPods() {
    // Fetch pod specs via the spec client (error handling and the
    // removal of stale pods are elided from this excerpt).
    podSpecs, err := feeder.specClient.GetPodSpecs()
    for _, pod := range podSpecs {
        feeder.clusterState.AddOrUpdatePod(pod.ID, pod.PodLabels, pod.Phase)
        for _, container := range pod.Containers {
            feeder.clusterState.AddOrUpdateContainer(container.ID, container.Request)
        }
    }
}

3. Load real-time metrics. Following the call chain below, the samples end up in two histograms hanging off clusterState.Pods.Containers.aggregator: AggregateCPUUsage util.Histogram and AggregateMemoryPeaks util.Histogram.

LoadRealTimeMetrics ->
clusterState.AddSample ->
containerState.AddSample ->
container.addCPUSample / container.addMemorySample ->
container.aggregator.AddSample ->
AggregateContainerState.addCPUSample / AggregateContainerState.AggregateMemoryPeaks.AddSample ->
histogram.AddSample

addCPUSample adds an aggregated CPU sample:
func (a *AggregateContainerState) addCPUSample(sample *ContainerUsageSample) {
    cpuUsageCores := CoresFromCPUAmount(sample.Usage)
    cpuRequestCores := CoresFromCPUAmount(sample.Request)
    // Samples are added with the weight equal to the current request. This means that
    // whenever the request is increased, the history accumulated so far effectively decays,
    // which helps react quickly to CPU starvation.
    a.AggregateCPUUsage.AddSample(
        cpuUsageCores, math.Max(cpuRequestCores, minSampleWeight), sample.MeasureStart)
    if sample.MeasureStart.After(a.LastSampleStart) {
        a.LastSampleStart = sample.MeasureStart
    }
    if a.FirstSampleStart.IsZero() || sample.MeasureStart.Before(a.FirstSampleStart) {
        a.FirstSampleStart = sample.MeasureStart
    }
    a.TotalSamplesCount++
}

Memory peaks are added with a fixed weight of 1.0:
a.AggregateMemoryPeaks.AddSample(BytesFromMemoryAmount(sample.Usage), 1.0, sample.MeasureStart)


At the bottom of the chain is histogram.AddSample. Memory samples carry weight 1.0; CPU samples are weighted by the container's request, with a floor of 0.1:
func (h *histogram) AddSample(value float64, weight float64, time time.Time) {
    if weight < 0.0 {
        panic("sample weight must be non-negative")
    }
    bucket := h.options.FindBucket(value)
    h.bucketWeight[bucket] += weight
    h.totalWeight += weight
    if bucket < h.minBucket && h.bucketWeight[bucket] >= h.options.Epsilon() {
        h.minBucket = bucket
    }
    if bucket > h.maxBucket && h.bucketWeight[bucket] >= h.options.Epsilon() {
        h.maxBucket = bucket
    }
}

4. Update VPAs

func (r *recommender) UpdateVPAs() {
    // Iterate over clusterState.ObservedVpas and look up the matching
    // VPA in the cluster model (excerpt; metrics recording elided).
    for _, observedVpa := range r.clusterState.ObservedVpas {
        key := model.VpaID{Namespace: observedVpa.Namespace, VpaName: observedVpa.Name}
        vpa, found := r.clusterState.Vpas[key]
        if !found {
            continue
        }
        // GetContainerNameToAggregateStateMap(vpa) collects the accumulated
        // history; GetRecommendedPodResources turns it into a recommendation.
        resources := r.podResourceRecommender.GetRecommendedPodResources(GetContainerNameToAggregateStateMap(vpa))
        vpa.UpdateRecommendation(getCappedRecommendation(vpa.ID, resources, observedVpa.Spec.ResourcePolicy))
    }
}

GetRecommendedPodResources computes the recommended pod resources:
func (r *podResourceRecommender) GetRecommendedPodResources(containerNameToAggregateStateMap model.ContainerNameToAggregateStateMap) RecommendedPodResources {
    var recommendation = make(RecommendedPodResources)
    if len(containerNameToAggregateStateMap) == 0 {
        return recommendation
    }

    // Split the pod-level minimums evenly across containers.
    fraction := 1.0 / float64(len(containerNameToAggregateStateMap))
    minResources := model.Resources{
        model.ResourceCPU:    model.ScaleResource(model.CPUAmountFromCores(*podMinCPUMillicores*0.001), fraction),
        model.ResourceMemory: model.ScaleResource(model.MemoryAmountFromBytes(*podMinMemoryMb*1024*1024), fraction),
    }
    // Wrap each estimator so it never recommends less than the minimums.
    recommender := &podResourceRecommender{
        WithMinResources(minResources, r.targetEstimator),
        WithMinResources(minResources, r.lowerBoundEstimator),
        WithMinResources(minResources, r.upperBoundEstimator),
    }

    for containerName, aggregatedContainerState := range containerNameToAggregateStateMap {
        // Turn each container's AggregateContainerState into a recommendation.
        recommendation[containerName] = recommender.estimateContainerResources(aggregatedContainerState)
    }
    return recommendation
}

estimateContainerResources takes an AggregateContainerState and returns the container recommendation:
func (r *podResourceRecommender) estimateContainerResources(s *model.AggregateContainerState) RecommendedContainerResources {
    return RecommendedContainerResources{
        r.targetEstimator.GetResourceEstimation(s),
        r.lowerBoundEstimator.GetResourceEstimation(s),
        r.upperBoundEstimator.GetResourceEstimation(s),
    }
}

The target estimator's GetResourceEstimation (marginEstimator) adds a safety margin on top of the base estimate:
func (e *marginEstimator) GetResourceEstimation(s *model.AggregateContainerState) model.Resources {
    originalResources := e.baseEstimator.GetResourceEstimation(s)
    newResources := make(model.Resources)
    for resource, resourceAmount := range originalResources {
        margin := model.ScaleResource(resourceAmount, e.marginFraction)
        newResources[resource] = originalResources[resource] + margin
    }
    return newResources
}

The upper- and lower-bound estimators' GetResourceEstimation (confidenceMultiplier) scales the base estimate by a confidence-dependent factor:
func (e *confidenceMultiplier) GetResourceEstimation(s *model.AggregateContainerState) model.Resources {
    confidence := getConfidence(s)
    originalResources := e.baseEstimator.GetResourceEstimation(s)
    scaledResources := make(model.Resources)
    for resource, resourceAmount := range originalResources {
        scaledResources[resource] = model.ScaleResource(
            resourceAmount, math.Pow(1.+e.multiplier/confidence, e.exponent))
    }
    return scaledResources
}

The result lands in this struct; its Target, LowerBound, and UpperBound fields are what ultimately drive the recommended requests:

// Recommendation of resources for a container.
type RecommendedContainerResources struct {
    // Recommended optimal amount of resources.
    Target model.Resources
    // Recommended minimum amount of resources.
    LowerBound model.Resources
    // Recommended maximum amount of resources.
    UpperBound model.Resources
}

These fields are also visible directly in the VPA object's status. uncappedTarget is the recommendation before the resource policy caps are applied; here it equals target, the final recommended value:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  generation: 2
  name: stress-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: stress
  updatePolicy:
    updateMode: Auto
status:
  conditions:
  - lastTransitionTime: "2020-01-02T08:54:40Z"
    status: "True"
    type: RecommendationProvided
  recommendation:
    containerRecommendations:
    - containerName: stress
      lowerBound:
        cpu: 25m
        memory: 262144k
      target:
        cpu: 49m
        memory: "628694953"
      uncappedTarget:
        cpu: 49m
        memory: "628694953"
      upperBound:
        cpu: 70609m
        memory: "905949427273"

5. Write checkpoints

func (r *recommender) MaintainCheckpoints(ctx context.Context, minCheckpointsPerRun int) {
    now := time.Now()
    if r.useCheckpoints {
        if err := r.checkpointWriter.StoreCheckpoints(ctx, now, minCheckpointsPerRun); err != nil {
            klog.Warningf("Failed to store checkpoints. Reason: %+v", err)
        }
        if time.Now().Sub(r.lastCheckpointGC) > r.checkpointsGCInterval {
            r.lastCheckpointGC = now
            r.clusterStateFeeder.GarbageCollectCheckpoints()
        }
    }
}

6. Garbage collection, which clears stale entries out of ClusterState.aggregateStateMap:

func (r *recommender) GarbageCollect() {
    gcTime := time.Now()
    // Run GC only if more than AggregateContainerStateGCInterval
    // has passed since the previous run.
    if gcTime.Sub(r.lastAggregateContainerStateGC) > AggregateContainerStateGCInterval {
        r.clusterState.GarbageCollectAggregateCollectionStates(gcTime)
        r.lastAggregateContainerStateGC = gcTime
    }
}

Running the recommender in debug mode

By default the recommender authenticates via the in-cluster serviceaccount token. To run it outside the cluster, replace the in-cluster config with a hand-built one pointing at the API server:

func createKubeConfig(kubeApiQps float32, kubeApiBurst int) *rest.Config {
    //config, err := rest.InClusterConfig()
    config := &rest.Config{
        TLSClientConfig: rest.TLSClientConfig{
            Insecure: true,
        },
        Host: "https://192.168.56.100:6443",
        BearerToken: "xxxxx",
    }
    //if err != nil {
    //  klog.Fatalf("Failed to create config: %v", err)
    //}
    config.QPS = kubeApiQps
    config.Burst = kubeApiBurst
    return config
}

Running it now fails with:

E0102 12:46:47.019970   10664 reflector.go:156] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:108: Failed to list *v1.VerticalPodAutoscaler: the server could not find the requested resource (get verticalpodautoscalers.autoscaling.k8s.io)

This is a CRD version mismatch; switching the CRD to v1 gets it running:

kubectl apply -f vpa-v1-crd.yaml
