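Kubernetes monitoring architecture
Metric categories
- System metrics
Cover node and container resource usage, plus the resources used by workloads that run as DaemonSets
- Service metrics
Cover metrics produced by Kubernetes infrastructure components and metrics produced by application pods
kube-state-metrics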
- job_name: kube-state-metrics
  honor_timestamps: false
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - kube-state-metrics.kube-admin:8080
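To make the job definition concrete, here is a minimal sketch of what one scrape against this target amounts to: a single HTTP GET bounded by the scrape timeout. It is an illustration only, not how Prometheus is implemented; `targetURL` and `scrapeOnce` are hypothetical helper names, and the host name resolves only from inside the cluster.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"
)

// targetURL joins the scheme, target and metrics_path fields of the scrape
// job into the URL that gets requested. Hypothetical helper for illustration.
func targetURL(scheme, target, metricsPath string) string {
	return scheme + "://" + target + metricsPath
}

// scrapeOnce performs a single GET bounded by timeout (the role played by
// scrape_timeout) and returns the raw exposition text.
func scrapeOnce(ctx context.Context, url string, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	url := targetURL("http", "kube-state-metrics.kube-admin:8080", "/metrics")
	// A single scrape; Prometheus would repeat it every scrape_interval (30s).
	body, err := scrapeOnce(context.Background(), url, 10*time.Second)
	if err != nil {
		fmt.Println("scrape failed:", err)
		return
	}
	fmt.Printf("scraped %d bytes from %s\n", len(body), url)
}
```

Outside the cluster the request is expected to fail with a DNS or connection error, which the sketch simply reports.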
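What is the Kubernetes API server?
The Kubernetes API server exposes HTTP REST interfaces for creating, reading, updating, deleting, and watching every kind of resource object (Pod, ReplicationController, Service, and so on). It is the data bus and data hub of the whole system.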
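As a rough illustration of "CRUD plus watch over HTTP REST", the sketch below maps a few operations on Pods to the request the API server would receive. The path layout (`/api/v1/namespaces/<namespace>/pods`) and the `watch=true` query parameter belong to the Kubernetes API; the `podRequest` helper and the operation names it accepts are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"net/http"
)

// podRequest returns the HTTP method and path for an operation on Pods in
// the core v1 API group. Hypothetical helper; name is ignored for
// collection-level operations (list, watch, create).
func podRequest(op, namespace, name string) (method, path string, err error) {
	collection := "/api/v1/namespaces/" + namespace + "/pods"
	switch op {
	case "list":
		return http.MethodGet, collection, nil
	case "watch":
		return http.MethodGet, collection + "?watch=true", nil
	case "create":
		return http.MethodPost, collection, nil
	case "get":
		return http.MethodGet, collection + "/" + name, nil
	case "update":
		return http.MethodPut, collection + "/" + name, nil
	case "delete":
		return http.MethodDelete, collection + "/" + name, nil
	default:
		return "", "", fmt.Errorf("unsupported operation %q", op)
	}
}

func main() {
	for _, op := range []string{"list", "watch", "create", "get", "update", "delete"} {
		method, path, _ := podRequest(op, "kube-system", "coredns-0")
		fmt.Println(op, "=>", method, path)
	}
}
```

The pod name `coredns-0` is a placeholder. Authentication, request bodies, and API groups other than core v1 are left out.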
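How collection works
kube-state-metrics uses client-go to talk to the Kubernetes cluster. It keeps its view current by listing each resource type from the API server and then watching for changes (a long-lived list/watch rather than timed polling).
- Initialize the metric stores, one per resource type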
// E:\go_path\src\k8s.io\kube-state-metrics\internal\store\builder.go
var availableStores = map[string]func(f *Builder) cache.Store{
	"certificatesigningrequests": func(b *Builder) cache.Store { return b.buildCsrStore() },
	"configmaps": func(b *Builder) cache.Store { return b.buildConfigMapStore() },
	"cronjobs": func(b *Builder) cache.Store { return b.buildCronJobStore() },
	"daemonsets": func(b *Builder) cache.Store { return b.buildDaemonSetStore() },
	"deployments": func(b *Builder) cache.Store { return b.buildDeploymentStore() },
	"endpoints": func(b *Builder) cache.Store { return b.buildEndpointsStore() },
	"horizontalpodautoscalers": func(b *Builder) cache.Store { return b.buildHPAStore() },
	"ingresses": func(b *Builder) cache.Store { return b.buildIngressStore() },
	"jobs": func(b *Builder) cache.Store { return b.buildJobStore() },
	"leases": func(b *Builder) cache.Store { return b.buildLeases() },
	"limitranges": func(b *Builder) cache.Store { return b.buildLimitRangeStore() },
	"mutatingwebhookconfigurations": func(b *Builder) cache.Store { return b.buildMutatingWebhookConfigurationStore() },
	"namespaces": func(b *Builder) cache.Store { return b.buildNamespaceStore() },
	"networkpolicies": func(b *Builder) cache.Store { return b.buildNetworkPolicyStore() },
	"nodes": func(b *Builder) cache.Store { return b.buildNodeStore() },
	"persistentvolumeclaims": func(b *Builder) cache.Store { return b.buildPersistentVolumeClaimStore() },
	"persistentvolumes": func(b *Builder) cache.Store { return b.buildPersistentVolumeStore() },
	"poddisruptionbudgets": func(b *Builder) cache.Store { return b.buildPodDisruptionBudgetStore() },
	"pods": func(b *Builder) cache.Store { return b.buildPodStore() },
	"replicasets": func(b *Builder) cache.Store { return b.buildReplicaSetStore() },
	"replicationcontrollers": func(b *Builder) cache.Store { return b.buildReplicationControllerStore() },
	"resourcequotas": func(b *Builder) cache.Store { return b.buildResourceQuotaStore() },
	"secrets": func(b *Builder) cache.Store { return b.buildSecretStore() },
	"services": func(b *Builder) cache.Store { return b.buildServiceStore() },
	"statefulsets": func(b *Builder) cache.Store { return b.buildStatefulSetStore() },
	"storageclasses": func(b *Builder) cache.Store { return b.buildStorageClassStore() },
	"validatingwebhookconfigurations": func(b *Builder) cache.Store { return b.buildValidatingWebhookConfigurationStore() },
	"volumeattachments": func(b *Builder) cache.Store { return b.buildVolumeAttachmentStore() },
	"verticalpodautoscalers": func(b *Builder) cache.Store { return b.buildVPAStore() },
}
- Set up the list/watch function that receives the results
// E:\go_path\src\k8s.io\kube-state-metrics\internal\store\builder.go
// reflectorPerNamespace creates a Kubernetes client-go reflector with the given
// listWatchFunc for each given namespace and registers it with the given store.
func (b *Builder) reflectorPerNamespace(
	expectedType interface{},
	store cache.Store,
	listWatchFunc func(kubeClient clientset.Interface, ns string) cache.ListerWatcher,
) {
	lwf := func(ns string) cache.ListerWatcher { return listWatchFunc(b.kubeClient, ns) }
	lw := listwatch.MultiNamespaceListerWatcher(b.namespaces, nil, lwf)
	instrumentedListWatch := watch.NewInstrumentedListerWatcher(lw, b.metrics, reflect.TypeOf(expectedType).String())
	reflector := cache.NewReflector(sharding.NewShardedListWatch(b.shard, b.totalShards, instrumentedListWatch), expectedType, store, 0)
	go reflector.Run(b.ctx.Done())
}
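The two steps above (a map of store constructors keyed by resource name, and a reflector that fills each store) can be sketched with the standard library alone. This is a simplified model, not the kube-state-metrics or client-go implementation: `store`, `event`, `storeBuilders` and `runReflector` are hypothetical names, objects are plain strings, and the watch stream is faked with a channel.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// event mimics a watch notification; the field names are made up for this sketch.
type event struct {
	kind string // "ADDED", "MODIFIED" or "DELETED"
	key  string // namespace/name
	obj  string // stand-in for the Kubernetes object
}

// store is a tiny thread-safe stand-in for cache.Store.
type store struct {
	mu    sync.RWMutex
	items map[string]string
}

func newStore() *store { return &store{items: map[string]string{}} }

// replace loads the result of the initial LIST.
func (s *store) replace(objs map[string]string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items = map[string]string{}
	for k, v := range objs {
		s.items[k] = v
	}
}

// apply folds one WATCH event into the store.
func (s *store) apply(e event) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if e.kind == "DELETED" {
		delete(s.items, e.key)
		return
	}
	s.items[e.key] = e.obj
}

// keys returns the stored keys in sorted order.
func (s *store) keys() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]string, 0, len(s.items))
	for k := range s.items {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

// storeBuilders maps a resource name to its store constructor, in the
// spirit of the availableStores map (only two entries here).
var storeBuilders = map[string]func() *store{
	"pods":  newStore,
	"nodes": newStore,
}

// runReflector does LIST once, then consumes WATCH events until the channel
// is closed. A real reflector also re-lists after errors and tracks
// resourceVersion; this sketch omits both.
func runReflector(s *store, list map[string]string, watch <-chan event) {
	s.replace(list)
	for e := range watch {
		s.apply(e)
	}
}

func main() {
	pods := storeBuilders["pods"]()
	watch := make(chan event, 2)
	watch <- event{kind: "ADDED", key: "kube-system/coredns-b", obj: "pod"}
	watch <- event{kind: "DELETED", key: "kube-system/coredns-a"}
	close(watch)

	runReflector(pods, map[string]string{"kube-system/coredns-a": "pod"}, watch)
	fmt.Println(pods.keys()) // prints [kube-system/coredns-b]
}
```

The point of the design is that a scrape of `/metrics` reads from the in-memory stores, so it does not need a round trip to the API server for every request.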
Metric examples
- ConfigMap metrics (What is a ConfigMap?)
e.g. ConfigMap info
kube_configmap_info{configmap="xxx",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="xxx"}
- CronJob metrics (What is a CronJob?)
e.g. next scheduled run time of a CronJob, as a Unix timestamp
kube_cronjob_next_schedule_time{cronjob="abc",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="abc"} 1594306800
- DaemonSet metrics (What is a DaemonSet?)
e.g. number of nodes running a ready DaemonSet pod
kube_daemonset_status_number_ready{daemonset="npd",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="kube-admin"} 6
- Deployment metrics (What is a Deployment?)
e.g. unavailable replicas (unhealthy pods)
kube_deployment_status_replicas_unavailable{deployment="coredns",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="kube-system"}
- Endpoints metrics: an Endpoints object holds the IP addresses of the objects a Service sends traffic to
e.g. available endpoint addresses for nginx
kube_endpoint_address_available{endpoint="nginx",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="xxx"}
- Horizontal Pod Autoscaler (HPA) metrics (What is an HPA?)
e.g. the metric_name that an HPA driven by a third-party (custom) metric scales on
kube_horizontalpodautoscaler_spec_target_metric{metric_name="xxxx"}
- Ingress metrics (What is an Ingress?)
e.g. Ingress info
kube_ingress_info{ingress="xxxx",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="xxx"}
- Lease metrics (What is a Lease?)
- Namespace metrics (What is a Namespace?)
e.g. namespace phase
kube_namespace_status_phase{instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="kube-system",phase="Active"}
- Node metrics: node-problem-detector is used to detect node problems
Unhealthy node conditions include: MemoryPressure, DiskPressure, PIDPressure, KernelDeadlock, ReadonlyFilesystem
e.g. unhealthy node (Ready condition reported as unknown)
kube_node_status_condition{condition="Ready",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",node="xxxx.xxx.xxx.xx",status="unknown"}
- PersistentVolume and PersistentVolumeClaim metrics (What are PV and PVC?)
- PodDisruptionBudget metrics (What is a PDB?)
- Pod metrics
e.g. pod restarts
idelta(kube_pod_container_status_restarts_total{}[1m]) > 0
e.g. containers in the waiting state
kube_pod_container_status_waiting_reason == 1
Possible reasons:
ImagePullBackOff
CrashLoopBackOff
ErrImagePull
CreateContainerConfigError
CreateContainerError
InvalidImageName
e.g. CPU requested by a pod's containers
kube_pod_container_resource_requests_cpu_cores
e.g. memory requested by a pod's containers
kube_pod_container_resource_requests_memory_bytes
e.g. pods in the Pending or Unknown phase
kube_pod_status_phase{phase=~"Pending|Unknown"}
Possible phases:
Pending
Succeeded
Failed
Running
Unknown
- ReplicaSet metrics
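- ResourceQuota metrics (What is a ResourceQuota?)
Resources are split into cpu and memory, objects into pod and namespace, and types into used and hard
- Secret metrics (What is a Secret?)
- Service metrics (What is a Service?)
- StatefulSet metrics (What is a StatefulSet?)
e.g. ready StatefulSet pods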
kube_statefulset_status_replicas_ready{instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="kube-admin",statefulset="prometheus"}
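The sample lines listed above all share the shape `name{label="value",...} value`. As a minimal sketch of how such a line breaks down, the code below parses the CronJob example and converts its value from a Unix timestamp to a readable time. `parseSample` and `unixToRFC3339` are hypothetical helpers written for this note. The parser is deliberately naive and assumes no spaces, commas, or escaped quotes inside label values and no trailing timestamp; real tooling should use a proper exposition-format parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// sample is one parsed line of the exposition text.
type sample struct {
	name   string
	labels map[string]string
	value  float64
}

// parseSample handles the simple `name{k="v",...} value` shape only.
func parseSample(line string) (sample, error) {
	s := sample{labels: map[string]string{}}
	line = strings.TrimSpace(line)

	// The value is whatever follows the last space.
	sp := strings.LastIndex(line, " ")
	if sp < 0 {
		return s, fmt.Errorf("no value in %q", line)
	}
	v, err := strconv.ParseFloat(line[sp+1:], 64)
	if err != nil {
		return s, err
	}
	s.value = v

	head := strings.TrimSpace(line[:sp])
	open := strings.Index(head, "{")
	if open < 0 { // metric without labels
		s.name = head
		return s, nil
	}
	if !strings.HasSuffix(head, "}") {
		return s, fmt.Errorf("unterminated label set in %q", line)
	}
	s.name = head[:open]
	body := head[open+1 : len(head)-1]
	if body == "" {
		return s, nil
	}
	for _, pair := range strings.Split(body, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return s, fmt.Errorf("bad label %q", pair)
		}
		s.labels[strings.TrimSpace(kv[0])] = strings.Trim(kv[1], `"`)
	}
	return s, nil
}

// unixToRFC3339 renders a Unix-seconds metric value as a UTC timestamp.
func unixToRFC3339(v float64) string {
	return time.Unix(int64(v), 0).UTC().Format(time.RFC3339)
}

func main() {
	line := `kube_cronjob_next_schedule_time{cronjob="abc",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="abc"} 1594306800`
	s, err := parseSample(line)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	// prints: kube_cronjob_next_schedule_time abc 2020-07-09T15:00:00Z
	fmt.Println(s.name, s.labels["cronjob"], unixToRFC3339(s.value))
}
```

Lines that carry no value, such as the info-style examples shown without one, are rejected by this parser rather than guessed at.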