Implementing Kubernetes HPA custom metrics with Prometheus (Part 4)

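In the previous part of this series, Implementing Kubernetes HPA custom metrics with Prometheus (Part 3), we introduced the library needed to write a minimal custom metrics API server. That library is the foundation of the Prometheus adapter. This part analyzes the Prometheus adapter itself.
The k8s-prometheus-adapter version I installed is v0.2.0, so the analysis below covers the design and source code of v0.2.0.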

Configuration

By default, the adapter connects to the Kubernetes apiserver using the in-cluster config. It accepts the following additional flags for communicating with Prometheus and the Kubernetes cluster:

--lister-kubeconfig=<path-to-kubeconfig>: This configures how the adapter talks to a Kubernetes API server in order to list objects when operating with label selectors. By default, it will use in-cluster config.

--metrics-relist-interval=<duration>: This is the interval at which to update the cache of available metrics from Prometheus.

--rate-interval=<duration>: This is the duration used when requesting rate metrics from Prometheus. It must be larger than your Prometheus collection interval.

--prometheus-url=<url>: This is the URL used to connect to Prometheus. It will eventually contain query parameters to configure the connection.

Metric format

At a fixed interval, the adapter collects the list of available metrics from Prometheus. It only considers metrics of the following forms:

  • “container” metrics (cAdvisor container metrics): series whose names start with container_ and that have non-empty namespace and pod_name labels.
  • “namespaced” metrics (metrics describing namespaced Kubernetes objects): series with a non-empty namespace label whose names do not start with container_.

Note: metrics on non-namespaced objects (other than namespaces themselves) are currently not supported.
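The two selection rules can be expressed as a single classification function. This is a hypothetical sketch: classify and seriesCategory are not adapter names, and labels are modeled as a plain map. It also skips the "POD" pause container, which the adapter ignores.

```go
package main

import (
	"fmt"
	"strings"
)

type seriesCategory string

const (
	ignored          seriesCategory = "ignored"
	containerSeries  seriesCategory = "container"
	namespacedSeries seriesCategory = "namespaced"
)

// classify applies the two rules above to a single series.
func classify(name string, labels map[string]string) seriesCategory {
	if strings.HasPrefix(name, "container_") {
		// cAdvisor series: require namespace and pod_name, and skip the "POD" pause container.
		if labels["namespace"] != "" && labels["pod_name"] != "" && labels["container_name"] != "POD" {
			return containerSeries
		}
		return ignored
	}
	if labels["namespace"] != "" {
		return namespacedSeries
	}
	// Series without a namespace (non-namespaced objects) are not supported.
	return ignored
}

func main() {
	fmt.Println(classify("container_cpu_usage_seconds_total",
		map[string]string{"namespace": "default", "pod_name": "web-1", "container_name": "app"})) // container
	fmt.Println(classify("http_requests_total", map[string]string{"namespace": "default", "pod": "web-1"})) // namespaced
	fmt.Println(classify("node_load1", map[string]string{"instance": "node-1"}))                            // ignored
}
```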

Metrics from Prometheus are converted for the custom metrics API as follows:

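  • The metric name and type are determined:
    • for container metrics, the container_ prefix is removed;
    • if the metric has a _total suffix, it is marked as a counter metric and the suffix is removed;
    • if the metric has a _seconds_total suffix, it is marked as a seconds counter metric and the suffix is removed;
    • if the metric has none of these suffixes, it is marked as a gauge metric and its name is kept as-is.
  • Resources are associated with the metric:
    • container metrics are associated with pods;
    • for non-container metrics, every label on the series is considered. If a label names an available resource (without a group), the metric can be associated with that resource. A metric may be associated with several resources.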
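The naming rules above fit in one small function. This is a hypothetical sketch (toCustomMetric and seriesKind are not adapter names). It strips the container_ prefix unconditionally. That is harmless here, because namespaced series never start with container_.

```go
package main

import (
	"fmt"
	"strings"
)

type seriesKind string

const (
	gauge          seriesKind = "gauge"
	counter        seriesKind = "counter"
	secondsCounter seriesKind = "seconds-counter"
)

// toCustomMetric converts a Prometheus series name into the metric name and
// kind exposed through the custom metrics API, following the rules above.
func toCustomMetric(seriesName string) (string, seriesKind) {
	name := strings.TrimPrefix(seriesName, "container_")
	if !strings.HasSuffix(name, "_total") {
		return name, gauge // no counter suffix: keep the name as-is
	}
	name = strings.TrimSuffix(name, "_total")
	if strings.HasSuffix(name, "_seconds") {
		return strings.TrimSuffix(name, "_seconds"), secondsCounter
	}
	return name, counter
}

func main() {
	for _, s := range []string{"container_cpu_usage_seconds_total", "http_requests_total", "container_memory_usage_bytes"} {
		name, kind := toCustomMetric(s)
		fmt.Printf("%s -> %s (%s)\n", s, name, kind)
	}
}
```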

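When retrieving counter and seconds-counter metrics, the adapter requests their rate over the configured interval (--rate-interval). For metrics associated with several resources, the adapter aggregates (sums) the metric over all the dimensions that were not requested.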
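Aggregating over the non-requested dimensions is what PromQL's sum(...) by (label) does. The minimal sketch below uses a simplified, hypothetical sample type in place of Prometheus's real vector types. When the request is for pods, a per-route request rate is summed across routes for each pod.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a hypothetical stand-in for one element of a Prometheus instant vector.
type sample struct {
	labels map[string]string
	value  float64
}

// sumBy mimics PromQL's `sum(...) by (label)`: samples are grouped by the value
// of a single label, and all other dimensions are summed away.
func sumBy(samples []sample, label string) map[string]float64 {
	out := make(map[string]float64)
	for _, s := range samples {
		out[s.labels[label]] += s.value
	}
	return out
}

func main() {
	// Per-second request rates, split by pod and route (illustrative numbers).
	rates := []sample{
		{map[string]string{"pod": "web-1", "route": "/login"}, 1},
		{map[string]string{"pod": "web-1", "route": "/home"}, 2},
		{map[string]string{"pod": "web-2", "route": "/login"}, 3},
	}
	byPod := sumBy(rates, "pod")
	pods := make([]string, 0, len(byPod))
	for p := range byPod {
		pods = append(pods, p)
	}
	sort.Strings(pods) // map iteration order is random
	for _, p := range pods {
		fmt.Printf("%s: %g\n", p, byPod[p])
	}
}
```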

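The adapter ignores the "POD" container. That container exists only to hold the network namespace shared by the other containers in the pod.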

Source code analysis

func (o PrometheusAdapterServerOptions) RunCustomMetricsAdapterServer(stopCh <-chan struct{}) error {
	config, err := o.Config()
	if err != nil {
		return err
	}

	config.GenericConfig.EnableMetrics = true

	var clientConfig *rest.Config
	if len(o.RemoteKubeConfigFile) > 0 {
		loadingRules := &clientcmd.ClientConfigLoadingRules{ExplicitPath: o.RemoteKubeConfigFile}
		loader := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(loadingRules, &clientcmd.ConfigOverrides{})

		clientConfig, err = loader.ClientConfig()
	} else {
		clientConfig, err = rest.InClusterConfig()
	}
	if err != nil {
		return fmt.Errorf("unable to construct lister client config to initialize provider: %v", err)
	}

	discoveryClient, err := discovery.NewDiscoveryClientForConfig(clientConfig)
	if err != nil {
		return fmt.Errorf("unable to construct discovery client for dynamic client: %v", err)
	}

	dynamicMapper, err := dynamicmapper.NewRESTMapper(discoveryClient, apimeta.InterfacesForUnstructured, o.DiscoveryInterval)
	if err != nil {
		return fmt.Errorf("unable to construct dynamic discovery mapper: %v", err)
	}

	clientPool := dynamic.NewClientPool(clientConfig, dynamicMapper, dynamic.LegacyAPIPathResolverFunc)
	if err != nil {
		return fmt.Errorf("unable to construct lister client to initialize provider: %v", err)
	}

	// TODO: actually configure this client (strip query vars, etc)
	baseURL, err := url.Parse(o.PrometheusURL)
	if err != nil {
		// report the raw flag value: baseURL is nil when parsing fails
		return fmt.Errorf("invalid Prometheus URL %q: %v", o.PrometheusURL, err)
	}
	genericPromClient := prom.NewGenericAPIClient(http.DefaultClient, baseURL)
	instrumentedGenericPromClient := mprom.InstrumentGenericAPIClient(genericPromClient, baseURL.String())
	promClient := prom.NewClientForAPI(instrumentedGenericPromClient)

	cmProvider := cmprov.NewPrometheusProvider(dynamicMapper, clientPool, promClient, o.MetricsRelistInterval, o.RateInterval, stopCh)

	server, err := config.Complete().New("prometheus-custom-metrics-adapter", cmProvider)
	if err != nil {
		return err
	}
	return server.GenericAPIServer.PrepareRun().Run(stopCh)
}

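Inside RunCustomMetricsAdapterServer, discoveryClient talks to the Kubernetes apiserver's discovery API. It is used to build dynamicMapper, which holds the mapping between Kubernetes resources and kinds. The function also initializes promClient, the Prometheus client that supplies metrics to the adapter. Finally, it constructs the provider, as follows: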

func NewPrometheusProvider(mapper apimeta.RESTMapper, kubeClient dynamic.ClientPool, promClient prom.Client, updateInterval time.Duration, rateInterval time.Duration, stopChan <-chan struct{}) provider.CustomMetricsProvider {
	lister := &cachingMetricsLister{
		updateInterval: updateInterval,
		promClient:     promClient,

		SeriesRegistry: &basicSeriesRegistry{
			namer: metricNamer{
				// TODO: populate the overrides list
				overrides: nil,
				mapper:    mapper,
			},
		},
	}

	lister.RunUntil(stopChan)

	return &prometheusProvider{
		mapper:     mapper,
		kubeClient: kubeClient,
		promClient: promClient,

		SeriesRegistry: lister,

		rateInterval: rateInterval,
	}
}

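Note the lister variable. Its RunUntil method starts a goroutine that calls updateMetrics once every --metrics-relist-interval (the adapter flag described earlier), caching the list of available metrics.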

func (l *cachingMetricsLister) RunUntil(stopChan <-chan struct{}) {
	go wait.Until(func() {
		if err := l.updateMetrics(); err != nil {
			utilruntime.HandleError(err)
		}
	}, l.updateInterval, stopChan)
}

func (l *cachingMetricsLister) updateMetrics() error {
	startTime := pmodel.Now().Add(-1 * l.updateInterval)

	// container-specific metrics from cAdvisor have their own form, and need special handling
	containerSel := prom.MatchSeries("", prom.NameMatches("^container_.*"), prom.LabelNeq("container_name", "POD"), prom.LabelNeq("namespace", ""), prom.LabelNeq("pod_name", ""))
	namespacedSel := prom.MatchSeries("", prom.LabelNeq("namespace", ""), prom.NameNotMatches("^container_.*"))
	// TODO: figure out how to determine which metrics on non-namespaced objects are kubernetes-related

	// TODO: use an actual context here
	series, err := l.promClient.Series(context.Background(), pmodel.Interval{startTime, 0}, containerSel, namespacedSel)
	if err != nil {
		return fmt.Errorf("unable to update list of all available metrics: %v", err)
	}

	glog.V(10).Infof("Set available metric list from Prometheus to: %v", series)

	l.SetSeries(series)

	return nil
}

Only container metrics and namespaced metrics are collected here, selected as follows:

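  1. Container metrics: series whose names start with container_. Series whose container_name label is "POD", or whose namespace or pod_name label is empty, are filtered out.
  2. Namespaced metrics: series whose names do not start with container_. Series whose namespace label is empty are filtered out.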

The series fetched from Prometheus are then cached by SetSeries:

func (r *basicSeriesRegistry) SetSeries(newSeries []prom.Series) error {
	newInfo := make(map[provider.MetricInfo]seriesInfo)
	for _, series := range newSeries {
		if strings.HasPrefix(series.Name, "container_") {
			r.namer.processContainerSeries(series, newInfo)
		} else if namespaceLabel, hasNamespaceLabel := series.Labels["namespace"]; hasNamespaceLabel && namespaceLabel != "" {
			// we also handle namespaced metrics here as part of the resource-association logic
			if err := r.namer.processNamespacedSeries(series, newInfo); err != nil {
				glog.Errorf("Unable to process namespaced series %q: %v", series.Name, err)
				continue
			}
		} else {
			if err := r.namer.processRootScopedSeries(series, newInfo); err != nil {
				glog.Errorf("Unable to process root-scoped series %q: %v", series.Name, err)
				continue
			}
		}
	}

	newMetrics := make([]provider.MetricInfo, 0, len(newInfo))
	for info := range newInfo {
		newMetrics = append(newMetrics, info)
	}

	r.mu.Lock()
	defer r.mu.Unlock()

	r.info = newInfo
	r.metrics = newMetrics

	return nil
}

Each series falls into one of three categories, handled by processContainerSeries, processNamespacedSeries and processRootScopedSeries respectively.

processContainerSeries

A series whose name starts with container_ is treated as a container series and associated with the pods resource:

// processContainerSeries performs special work to extract metric definitions
// from cAdvisor-sourced container metrics, which don't particularly follow any useful conventions consistently.
func (n *metricNamer) processContainerSeries(series prom.Series, infos map[provider.MetricInfo]seriesInfo) {

	originalName := series.Name

	var name string
	metricKind := GaugeSeries
	if override, hasOverride := n.overrides[series.Name]; hasOverride {
		name = override.metricName
		metricKind = override.kind
	} else {
		// chop off the "container_" prefix
		series.Name = series.Name[10:]
		name, metricKind = n.metricNameFromSeries(series)
	}

	info := provider.MetricInfo{
		GroupResource: schema.GroupResource{Resource: "pods"},
		Namespaced:    true,
		Metric:        name,
	}

	infos[info] = seriesInfo{
		kind:        metricKind,
		baseSeries:  prom.Series{Name: originalName},
		isContainer: true,
	}
}

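processContainerSeries handles series that come from cAdvisor, and it renames them along the way. If an override is configured for the series, the override's name and kind are used. (v0.2.0 provides no way to configure overrides; newer versions do.) Otherwise the container_ prefix is stripped, and metricNameFromSeries derives the metric name and type from what remains: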

// metricNameFromSeries extracts a metric name from a series name, and indicates
// whether or not that series was a counter.  It also has special logic to deal with time-based
// counters, which general get converted to milli-unit rate metrics.
func (n *metricNamer) metricNameFromSeries(series prom.Series) (name string, kind SeriesType) {
	kind = GaugeSeries
	name = series.Name
	if strings.HasSuffix(name, "_total") {
		kind = CounterSeries
		name = name[:len(name)-6]

		if strings.HasSuffix(name, "_seconds") {
			kind = SecondsCounterSeries
			name = name[:len(name)-8]
		}
	}

	return
}

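metricNameFromSeries handles the suffixes in three cases:
- If the series name ends with _total, the series is a CounterSeries and the _total suffix is removed.
- If the remaining name then ends with _seconds, the series becomes a SecondsCounterSeries and _seconds is removed as well.
- A name without a _total suffix stays a GaugeSeries.

The resulting string is the new metric name.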

processNamespacedSeries

This handles series that have a namespace label with a non-empty value:

// processNamespacedSeries adds the metric info for the given generic namespaced series to
// the map of metric info.
func (n *metricNamer) processNamespacedSeries(series prom.Series, infos map[provider.MetricInfo]seriesInfo) error {
	// NB: all errors must occur *before* we save the series info
	name, metricKind := n.metricNameFromSeries(series)
	resources, err := n.groupResourcesFromSeries(series)
	if err != nil {
		return fmt.Errorf("unable to process prometheus series %s: %v", series.Name, err)
	}

	// we add one metric for each resource that this could describe
	for _, resource := range resources {
		info := provider.MetricInfo{
			GroupResource: resource,
			Namespaced:    true,
			Metric:        name,
		}

		// metrics describing namespaces aren't considered to be namespaced
		if resource == (schema.GroupResource{Resource: "namespaces"}) {
			info.Namespaced = false
		}

		infos[info] = seriesInfo{
			kind:       metricKind,
			baseSeries: prom.Series{Name: series.Name},
		}
	}

	return nil
}

Two functions matter here: metricNameFromSeries and groupResourcesFromSeries. metricNameFromSeries was covered under processContainerSeries above, so let's look at groupResourcesFromSeries:

// groupResourceFromSeries collects the possible group-resources that this series could describe by
// going through each label, checking to see if it corresponds to a known resource.  For instance,
// a series `ingress_http_hits_total{pod="foo",service="bar",ingress="baz",namespace="ns"}`
// would return three GroupResources: "pods", "services", and "ingresses".
// Returned MetricInfo is equilavent to the "normalized" info produced by metricInfo.Normalized.
func (n *metricNamer) groupResourcesFromSeries(series prom.Series) ([]schema.GroupResource, error) {
	var res []schema.GroupResource
	for label := range series.Labels {
		// TODO: figure out a way to let people specify a fully-qualified name in label-form
		gvr, err := n.mapper.ResourceFor(schema.GroupVersionResource{Resource: string(label)})
		if err != nil {
			if apimeta.IsNoMatchError(err) {
				continue
			}
			return nil, err
		}
		res = append(res, gvr.GroupResource())
	}

	return res, nil
}

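groupResourcesFromSeries inspects the series' labels and returns the set of resources the series may describe. processNamespacedSeries then creates one info entry for each returned resource.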
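Below is a toy version of groupResourcesFromSeries. It assumes a hard-coded, hypothetical table (knownResources) in place of the RESTMapper that resolves label names to resources. Note that a namespace label also resolves to the namespaces resource. This is why processNamespacedSeries special-cases namespaces.

```go
package main

import (
	"fmt"
	"sort"
)

// knownResources is a hypothetical stand-in for the RESTMapper: it maps a label
// name (singular or plural) to a resource name.
var knownResources = map[string]string{
	"pod": "pods", "pods": "pods",
	"service": "services", "services": "services",
	"ingress": "ingresses", "ingresses": "ingresses",
	"namespace": "namespaces", "namespaces": "namespaces",
}

// groupResources returns every resource named by one of the series' labels.
// Labels that are not resources (e.g. "route") are skipped, like NoMatch errors.
func groupResources(labels map[string]string) []string {
	var res []string
	for label := range labels {
		if r, ok := knownResources[label]; ok {
			res = append(res, r)
		}
	}
	sort.Strings(res) // map iteration order is random; sort for stable output
	return res
}

func main() {
	labels := map[string]string{"pod": "foo", "service": "bar", "ingress": "baz", "namespace": "ns", "route": "/login"}
	fmt.Println(groupResources(labels)) // [ingresses namespaces pods services]
}
```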

processRootScopedSeries

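Series that fit neither processContainerSeries nor processNamespacedSeries are called root-scoped series. Both selectors in updateMetrics require a non-empty namespace label. So with v0.2.0's relist logic, this branch does not appear to be reached in practice.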

// processRootScopedSeries adds the metric info for the given generic root-scoped series to
// the map of metric info.
func (n *metricNamer) processRootScopedSeries(series prom.Series, infos map[provider.MetricInfo]seriesInfo) error {
	// NB: all errors must occur *before* we save the series info
	name, metricKind := n.metricNameFromSeries(series)
	resources, err := n.groupResourcesFromSeries(series)
	if err != nil {
		return fmt.Errorf("unable to process prometheus series %s: %v", series.Name, err)
	}

	// we add one metric for each resource that this could describe
	for _, resource := range resources {
		info := provider.MetricInfo{
			GroupResource: resource,
			Namespaced:    false,
			Metric:        name,
		}

		infos[info] = seriesInfo{
			kind:       metricKind,
			baseSeries: prom.Series{Name: series.Name},
		}
	}

	return nil
}

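At first glance, processRootScopedSeries and processNamespacedSeries are nearly identical. The difference is in how they set Namespaced:
- processNamespacedSeries sets Namespaced to true, but additionally checks whether the resource is namespaces and, if so, marks the metric as non-namespaced.
- processRootScopedSeries sets Namespaced to false for every resource, since its series have no namespace.

The extra check in processNamespacedSeries is: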

// metrics describing namespaces aren't considered to be namespaced
if resource == (schema.GroupResource{Resource: "namespaces"}) {
	info.Namespaced = false
}

With SetSeries covered, let's look at how metrics are queried. Here is the SeriesRegistry interface:

// SeriesRegistry provides conversions between Prometheus series and MetricInfo
type SeriesRegistry interface {
	// SetSeries replaces the known series in this registry
	SetSeries(series []prom.Series) error
	// ListAllMetrics lists all metrics known to this registry
	ListAllMetrics() []provider.MetricInfo
	// QueryForMetric looks up the minimum required series information to make a query for the given metric
	// against the given resource (namespace may be empty for non-namespaced resources)
	QueryForMetric(info provider.MetricInfo, namespace string, resourceNames ...string) (kind SeriesType, query prom.Selector, groupBy string, found bool)
	// MatchValuesToNames matches result values to resource names for the given metric and value set
	MatchValuesToNames(metricInfo provider.MetricInfo, values pmodel.Vector) (matchedValues map[string]pmodel.SampleValue, found bool)
}

Take QueryForMetric as an example. For the given metricInfo, it returns the series kind, the Prometheus selector to query, and the label to group by:

func (r *basicSeriesRegistry) QueryForMetric(metricInfo provider.MetricInfo, namespace string, resourceNames ...string) (kind SeriesType, query prom.Selector, groupBy string, found bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()

	if len(resourceNames) == 0 {
		glog.Errorf("no resource names requested while producing a query for metric %s", metricInfo.String())
		return 0, "", "", false
	}

	metricInfo, singularResource, err := metricInfo.Normalized(r.namer.mapper)
	if err != nil {
		glog.Errorf("unable to normalize group resource while producing a query: %v", err)
		return 0, "", "", false
	}

	// TODO: support container metrics
	if info, found := r.info[metricInfo]; found {
		targetValue := resourceNames[0]
		matcher := prom.LabelEq
		if len(resourceNames) > 1 {
			targetValue = strings.Join(resourceNames, "|")
			matcher = prom.LabelMatches
		}

		var expressions []string
		if info.isContainer {
			expressions = []string{matcher("pod_name", targetValue), prom.LabelNeq("container_name", "POD")}
			groupBy = "pod_name"
		} else {
			// TODO: copy base series labels?
			expressions = []string{matcher(singularResource, targetValue)}
			groupBy = singularResource
		}

		if metricInfo.Namespaced {

			expressions = append(expressions, prom.LabelEq("namespace", namespace))
		}

		return info.kind, prom.MatchSeries(info.baseSeries.Name, expressions...), groupBy, true
	}

	glog.V(10).Infof("metric %v not registered", metricInfo)
	return 0, "", "", false
}

The selector itself is built by MatchSeries:

// MatchSeries takes a series name, and optionally some label expressions, and returns a series selector.
// TODO: validate series name and expressions?
func MatchSeries(name string, labelExpressions ...string) Selector {
	if len(labelExpressions) == 0 {
		return Selector(name)
	}

	return Selector(fmt.Sprintf("%s{%s}", name, strings.Join(labelExpressions, ",")))
}
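QueryForMetric and MatchSeries together produce the selector for a request, which can be sketched as a single function. selectorFor is hypothetical, and the example values are made up. Its resourceLabel parameter stands in for the singular resource name returned by metricInfo.Normalized.

```go
package main

import (
	"fmt"
	"strings"
)

// selectorFor sketches how QueryForMetric combines label matchers into a
// selector and picks the group-by label.
func selectorFor(seriesName, resourceLabel, namespace string, isContainer, namespaced bool, names ...string) (selector, groupBy string, ok bool) {
	if len(names) == 0 {
		return "", "", false // QueryForMetric also refuses an empty name list
	}
	op, target := "=", names[0]
	if len(names) > 1 {
		// Several objects: regex alternation (names are not regex-escaped, as in the original).
		op, target = "=~", strings.Join(names, "|")
	}
	var exprs []string
	if isContainer {
		exprs = []string{fmt.Sprintf("pod_name%s%q", op, target), `container_name!="POD"`}
		groupBy = "pod_name"
	} else {
		exprs = []string{fmt.Sprintf("%s%s%q", resourceLabel, op, target)}
		groupBy = resourceLabel
	}
	if namespaced {
		exprs = append(exprs, fmt.Sprintf("namespace=%q", namespace))
	}
	return fmt.Sprintf("%s{%s}", seriesName, strings.Join(exprs, ",")), groupBy, true
}

func main() {
	sel, by, _ := selectorFor("http_requests_total", "pod", "default", false, true, "web-1", "web-2")
	fmt.Println(sel, "grouped by", by)
	sel, by, _ = selectorFor("container_cpu_usage_seconds_total", "pod", "default", true, true, "web-1")
	fmt.Println(sel, "grouped by", by)
}
```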

Where is QueryForMetric called? Searching the code shows that it is called from buildQuery. As the name suggests, buildQuery builds the Prometheus query and returns its result:

func (p *prometheusProvider) buildQuery(info provider.MetricInfo, namespace string, names ...string) (pmodel.Vector, error) {
	kind, baseQuery, groupBy, found := p.QueryForMetric(info, namespace, names...)
	if !found {
		return nil, provider.NewMetricNotFoundError(info.GroupResource, info.Metric)
	}

	fullQuery := baseQuery
	switch kind {
	case CounterSeries:
		fullQuery = prom.Selector(fmt.Sprintf("rate(%s[%s])", baseQuery, pmodel.Duration(p.rateInterval).String()))
	case SecondsCounterSeries:
		// TODO: futher modify for seconds?
		fullQuery = prom.Selector(fmt.Sprintf("rate(%s[%s])", baseQuery, pmodel.Duration(p.rateInterval).String()))
	}

	// NB: too small of a rate interval will return no results...

	// sum over all other dimensions of this query (e.g. if we select on route, sum across all pods,
	// but if we select on pods, sum across all routes), and split by the dimension of our resource
	// TODO: return/populate the by list in SeriesForMetric
	fullQuery = prom.Selector(fmt.Sprintf("sum(%s) by (%s)", fullQuery, groupBy))

	// TODO: use an actual context
	queryResults, err := p.promClient.Query(context.Background(), pmodel.Now(), fullQuery)
	if err != nil {
		glog.Errorf("unable to fetch metrics from prometheus: %v", err)
		// don't leak implementation details to the user
		return nil, apierr.NewInternalError(fmt.Errorf("unable to fetch metrics"))
	}

	if queryResults.Type != pmodel.ValVector {
		glog.Errorf("unexpected results from prometheus: expected %s, got %s on results %v", pmodel.ValVector, queryResults.Type, queryResults)
		return nil, apierr.NewInternalError(fmt.Errorf("unable to fetch metrics"))
	}

	return *queryResults.Vector, nil
}

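buildQuery's logic is clear from the code, so I won't walk through it line by line. In short:
- Counter and seconds-counter series are wrapped in rate(...) over the configured rate interval.
- The result is summed by the groupBy label.
- The query is evaluated at the current time.

Note that in v0.2.0, seconds counters get exactly the same treatment as plain counters (see the TODO).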
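The wrapping that buildQuery performs around the selector can be sketched as follows. fullQuery is a hypothetical helper. It takes the rate window as a PromQL duration string such as "1m"; the adapter itself formats p.rateInterval with pmodel.Duration.

```go
package main

import "fmt"

// fullQuery wraps a base selector the way buildQuery does: counters become
// rate(...) over the rate window, then everything is summed by groupBy.
func fullQuery(baseSelector, kind, rateWindow, groupBy string) string {
	q := baseSelector
	switch kind {
	case "counter", "seconds-counter":
		// Both counter kinds are treated identically in v0.2.0.
		q = fmt.Sprintf("rate(%s[%s])", q, rateWindow)
	}
	return fmt.Sprintf("sum(%s) by (%s)", q, groupBy)
}

func main() {
	fmt.Println(fullQuery(`http_requests_total{pod=~"web-1|web-2",namespace="default"}`, "counter", "1m", "pod"))
	fmt.Println(fullQuery(`container_memory_usage_bytes{pod_name="web-1"}`, "gauge", "1m", "pod_name"))
}
```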

The HPA controller's REST metrics client

When writing an HPA YAML file, the metric type can be Resource, Pods or Object. In the HPA controller, these are handled by GetResourceMetric, GetRawMetric and GetObjectMetric respectively:

// GetResourceMetric gets the given resource metric (and an associated oldest timestamp)
// for all pods matching the specified selector in the given namespace
func (c *resourceMetricsClient) GetResourceMetric(resource v1.ResourceName, namespace string, selector labels.Selector) (PodMetricsInfo, time.Time, error) {
	metrics, err := c.client.PodMetricses(namespace).List(metav1.ListOptions{LabelSelector: selector.String()})
	if err != nil {
		return nil, time.Time{}, fmt.Errorf("unable to fetch metrics from API: %v", err)
	}

	if len(metrics.Items) == 0 {
		return nil, time.Time{}, fmt.Errorf("no metrics returned from heapster")
	}

	res := make(PodMetricsInfo, len(metrics.Items))

	for _, m := range metrics.Items {
		podSum := int64(0)
		missing := len(m.Containers) == 0
		for _, c := range m.Containers {
			resValue, found := c.Usage[v1.ResourceName(resource)]
			if !found {
				missing = true
				glog.V(2).Infof("missing resource metric %v for container %s in pod %s/%s", resource, c.Name, namespace, m.Name)
				break // containers loop
			}
			podSum += resValue.MilliValue()
		}

		if !missing {
			res[m.Name] = int64(podSum)
		}
	}

	timestamp := metrics.Items[0].Timestamp.Time

	return res, timestamp, nil
}

// customMetricsClient implements the custom-metrics-related parts of MetricsClient,
// using data from the custom metrics API.
type customMetricsClient struct {
	client customclient.CustomMetricsClient
}

// GetRawMetric gets the given metric (and an associated oldest timestamp)
// for all pods matching the specified selector in the given namespace
func (c *customMetricsClient) GetRawMetric(metricName string, namespace string, selector labels.Selector) (PodMetricsInfo, time.Time, error) {
	metrics, err := c.client.NamespacedMetrics(namespace).GetForObjects(schema.GroupKind{Kind: "Pod"}, selector, metricName)
	if err != nil {
		return nil, time.Time{}, fmt.Errorf("unable to fetch metrics from API: %v", err)
	}

	if len(metrics.Items) == 0 {
		return nil, time.Time{}, fmt.Errorf("no metrics returned from custom metrics API")
	}

	res := make(PodMetricsInfo, len(metrics.Items))
	for _, m := range metrics.Items {
		res[m.DescribedObject.Name] = m.Value.MilliValue()
	}

	timestamp := metrics.Items[0].Timestamp.Time

	return res, timestamp, nil
}

// GetObjectMetric gets the given metric (and an associated timestamp) for the given
// object in the given namespace
func (c *customMetricsClient) GetObjectMetric(metricName string, namespace string, objectRef *autoscaling.CrossVersionObjectReference) (int64, time.Time, error) {
	gvk := schema.FromAPIVersionAndKind(objectRef.APIVersion, objectRef.Kind)
	var metricValue *customapi.MetricValue
	var err error
	if gvk.Kind == "Namespace" && gvk.Group == "" {
		// handle namespace separately
		// NB: we ignore namespace name here, since CrossVersionObjectReference isn't
		// supposed to allow you to escape your namespace
		metricValue, err = c.client.RootScopedMetrics().GetForObject(gvk.GroupKind(), namespace, metricName)
	} else {
		metricValue, err = c.client.NamespacedMetrics(namespace).GetForObject(gvk.GroupKind(), objectRef.Name, metricName)
	}

	if err != nil {
		return 0, time.Time{}, fmt.Errorf("unable to fetch metrics from API: %v", err)
	}

	return metricValue.Value.MilliValue(), metricValue.Timestamp.Time, nil
}

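For Resource and Pods HPAs, the metrics requested are in both cases metrics whose resource is pods. They arrive by different routes, however:
- Resource metrics are read through the resource metrics API (PodMetricses).
- Pods metrics go through the custom metrics API, that is, through the adapter.

For Object HPAs, a Namespace object is queried through the root-scoped metrics API, and any other object through the namespaced metrics API. Finally, the controller computes a value from the returned results and compares it with the target we configured to decide whether to scale up or down. Reading this code together with the adapter code above makes the whole flow clear.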
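The final comparison against the target follows a rule from the Kubernetes HPA documentation. The rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no scaling while the ratio stays within a tolerance (0.1 by default). The sketch below applies this rule to the per-pod milli-values returned by GetRawMetric. desiredReplicas is a hypothetical helper, and it is a simplification. The real controller also handles missing and unready pods, min/max replica bounds and stabilization, none of which are modeled here.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance mirrors the default of kube-controller-manager's
// --horizontal-pod-autoscaler-tolerance flag (0.1).
const tolerance = 0.1

// desiredReplicas applies ceil(currentReplicas * currentValue / targetValue),
// where currentValue is the average of the per-pod milli-values (as returned by
// GetRawMetric) and targetMilli is the target average in milli-units.
func desiredReplicas(currentReplicas int32, podValues map[string]int64, targetMilli int64) int32 {
	if len(podValues) == 0 || targetMilli <= 0 {
		return currentReplicas // nothing to compare against
	}
	var sum int64
	for _, v := range podValues {
		sum += v
	}
	ratio := float64(sum) / float64(len(podValues)) / float64(targetMilli)
	if math.Abs(ratio-1.0) <= tolerance {
		return currentReplicas // close enough to the target: do not scale
	}
	return int32(math.Ceil(ratio * float64(currentReplicas)))
}

func main() {
	// Two pods at 200m and 400m against a 100m target: ratio 3, so 2 -> 6 replicas.
	fmt.Println(desiredReplicas(2, map[string]int64{"web-1": 200, "web-2": 400}, 100))
}
```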

Summary

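This part analyzed the main code of k8s-prometheus-adapter. It covered how series are cached and processed, how the adapter talks to Prometheus when serving a metric request, and how the HPA controller calls the custom metrics API for each metric type. Together with the previous three parts, this lets us scale applications on their own custom metrics. To explain how the HPA works, the series also covered developing a custom metrics adapter and analyzed the key parts of the Prometheus adapter source code, mainly to help us write HPA YAML files. Of course, as the saying goes, what you learn from books is shallow, and real understanding comes from doing. Be sure to try it hands-on: practice is the best way to understand the whole process.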
