kubelet metrics monitoring: the metrics exposed by the /metrics endpoint, with explanations

Background

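I have recently been looking into Kubernetes performance testing and needed to watch the various runtime metrics of a running cluster. My English is poor, though, and I could not find anyone else's write-up through Baidu; perhaps I was using the wrong search terms.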

Process

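With no other option, I pushed through it, translating the kubelet metric descriptions with Google Translate and by reading the code. I have organized the result here in the hope that it helps anyone who needs it.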
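Collecting the `# HELP` lines to translate can be automated. The following is a minimal sketch, not the tooling used for this article: it assumes the /metrics output has already been saved as text (one way, assuming kubectl access to the cluster, is `kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics`). The function name `parse_help_lines` is hypothetical, and in the sample the `# TYPE` line and the value `7` are made up for illustration.

```python
import re

# "# HELP <name> [<STABILITY>] <description>"; the stability tag is optional.
HELP_RE = re.compile(
    r"^# HELP (?P<name>\S+)\s+(?:\[(?P<stability>[A-Z]+)\]\s+)?(?P<text>.*\S)\s*$"
)


def parse_help_lines(dump):
    """Map each metric name to a (stability, description) tuple."""
    metrics = {}
    for line in dump.splitlines():
        match = HELP_RE.match(line)
        if match:
            metrics[match.group("name")] = (
                match.group("stability"),
                match.group("text"),
            )
    return metrics


# Illustrative input: two HELP lines from this article plus made-up sample data.
SAMPLE_DUMP = """\
# HELP kubelet_running_pods [ALPHA] Number of pods currently running
# TYPE kubelet_running_pods gauge
kubelet_running_pods 7
# HELP kubelet_node_name [ALPHA] The node's name. The count is always 1.
"""

parsed = parse_help_lines(SAMPLE_DUMP)
for name, (stability, text) in parsed.items():
    print(f"{name} [{stability}]: {text}")
```

Only lines starting with `# HELP` are kept, so `# TYPE` lines and sample values are ignored.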

Results

// Number of audit events generated and sent to the audit backend
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.

// Number of apiserver requests rejected because of an error in the audit logging backend
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.

// Distribution of the remaining lifetime of the certificate used to authenticate a request
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.

// Percentage of cache slots currently occupied by cached DEKs (data encryption keys)
# HELP apiserver_envelope_encryption_dek_cache_fill_percent [ALPHA] Percent of the cache slots currently occupied by cached DEKs.

// Latency, in seconds, of data encryption key (DEK) generation operations
# HELP apiserver_storage_data_key_generation_duration_seconds [ALPHA] Latencies in seconds of data encryption key(DEK) generation operations.

// Total number of failed data encryption key (DEK) generation operations
# HELP apiserver_storage_data_key_generation_failures_total [ALPHA] Total number of failed data encryption key(DEK) generation operations.

// Total number of cache misses while accessing the key decryption key (KEK)
# HELP apiserver_storage_envelope_transformation_cache_misses_total [ALPHA] Total number of cache misses while accessing key decryption key(KEK).

// Total number of Token() requests made to the alternate token source
# HELP get_token_count [ALPHA] Counter of total Token() requests to the alternate token source 

// Number of failed Token() requests to the alternate token source
# HELP get_token_fail_count [ALPHA] Counter of failed Token() requests to the alternate token source 

// Number of containers per pod
# HELP kubelet_containers_per_pod_count [ALPHA] The number of containers per pod.

// Number of HTTP requests currently in flight, that is, received by the kubelet's HTTP server but not yet fully answered
# HELP kubelet_http_inflight_requests [ALPHA] Number of the inflight http requests 

// Time, in seconds, taken to serve HTTP requests
# HELP kubelet_http_requests_duration_seconds [ALPHA] Duration in seconds to serve http requests 

// Number of HTTP requests received since the server started
# HELP kubelet_http_requests_total [ALPHA] Number of the http requests received since the server started 

// Set to true (1) if the node is experiencing a configuration-related error, false (0) otherwise
# HELP kubelet_node_config_error [ALPHA] This metric is true (1) if the node is experiencing a configuration-related error, false (0) otherwise.

// The node's name; the count is always 1
# HELP kubelet_node_name [ALPHA] The node's name. The count is always 1. 

// Number of PLEG events discarded - usually because the event queue was full
# HELP kubelet_pleg_discard_events [ALPHA] The number of discard events in PLEG. 
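The "queue is full, so the event is dropped and counted" behaviour described above can be illustrated with a small sketch. This is not the kubelet's Go code: the class `EventChannel`, its capacity, and the event names are all hypothetical, and the `discarded` attribute merely plays the role of the `kubelet_pleg_discard_events` counter.

```python
import queue


class EventChannel:
    """Hypothetical stand-in for a bounded event channel with a discard counter."""

    def __init__(self, capacity):
        self._events = queue.Queue(maxsize=capacity)
        self.discarded = 0  # plays the role of kubelet_pleg_discard_events

    def publish(self, event):
        """Enqueue without blocking; count the event as discarded if the queue is full."""
        try:
            self._events.put_nowait(event)
            return True
        except queue.Full:
            self.discarded += 1
            return False

    def drain(self):
        """Remove and return everything currently queued, oldest first."""
        drained = []
        while not self._events.empty():
            drained.append(self._events.get_nowait())
        return drained


channel = EventChannel(capacity=2)
results = [channel.publish(f"event-{i}") for i in range(5)]
print(results, channel.discarded)
```

With a capacity of 2 and no consumer, the first two events are queued and the remaining three are discarded, so a steadily growing counter suggests that events are produced faster than they are consumed.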

// Timestamp, in seconds, when PLEG was last seen active (in effect a health check)
# HELP kubelet_pleg_last_seen_seconds [ALPHA] Timestamp in seconds when PLEG was last seen active. 

// Duration of a PLEG relist operation - relist() fetches the pod/container list from the runtime, compares it with the internal list, and generates the corresponding events
# HELP kubelet_pleg_relist_duration_seconds [ALPHA] Duration in seconds for relisting pods in PLEG. 
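The compare-and-generate-events step of a relist can be sketched as a diff between two snapshots. This is a deliberately simplified illustration under stated assumptions, not the kubelet implementation: the snapshot shape (`{container_id: state}`), the state strings, and the event names are chosen for this example only.

```python
def relist(previous, current):
    """Diff two {container_id: state} snapshots and return (id, event) pairs."""
    events = []
    for container_id in sorted(set(previous) | set(current)):
        old_state = previous.get(container_id)
        new_state = current.get(container_id)
        if old_state == new_state:
            continue  # nothing changed for this container
        if new_state is None:
            events.append((container_id, "ContainerRemoved"))
        elif new_state == "running":
            events.append((container_id, "ContainerStarted"))
        elif new_state == "exited":
            events.append((container_id, "ContainerDied"))
        else:
            events.append((container_id, "ContainerChanged"))
    return events


# Internal view from the last relist versus what the runtime reports now.
previous = {"a": "running", "b": "running"}
current = {"a": "running", "b": "exited", "c": "running"}

events = relist(previous, current)
print(events)
```

The work grows with the number of containers on the node, which is one reason the relist duration is worth watching on busy nodes.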

// Interval, in seconds, between two consecutive PLEG relist operations
# HELP kubelet_pleg_relist_interval_seconds [ALPHA] Interval in seconds between relisting in PLEG. 

// Time, in seconds, for a single pod to go from pending to running
# HELP kubelet_pod_start_duration_seconds [ALPHA] Duration in seconds for a single pod to go from pending to running. 

// Time, in seconds, from the kubelet first seeing a pod to starting a worker for it
# HELP kubelet_pod_worker_start_duration_seconds [ALPHA] Duration in seconds from seeing a pod to starting a worker. 

// Number of pods currently running
# HELP kubelet_running_pods [ALPHA] Number of pods currently running 

// Duration, in seconds, of runtime operations, broken down by operation type
# HELP kubelet_runtime_operations_duration_seconds [ALPHA] Duration in seconds of runtime operations. Broken down by operation type. 

// Cumulative number of runtime operations, broken down by operation type
# HELP kubelet_runtime_operations_total [ALPHA] Cumulative number of runtime operations by operation type. 

// Build information
# HELP kubernetes_build_info [ALPHA] A metric with a constant '1' value labeled by major, minor, git version, git commit, git tree state, build date, Go version, and compiler from which Kubernetes was built, and platform on which it is running.

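// Histogram of how many seconds the last client certificate issued by an auth exec plugin lived before it was rotated. Contains no data if no exec plugin is in use or the plugin does not manage certificates.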
# HELP rest_client_exec_plugin_certificate_rotation_age [ALPHA] Histogram of the number of seconds the last auth exec plugin client certificate lived before being rotated. If auth exec plugin client certificates are unused, histogram will contain no data.

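// Shortest remaining time to live, in seconds, of the client certificates managed by the auth exec plugin; negative if a certificate has already expired. The value is +INF if no exec plugin is in use or the plugin manages no certificates.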
# HELP rest_client_exec_plugin_ttl_seconds [ALPHA] Gauge of the shortest TTL (time-to-live) of the client certificate(s) managed by the auth exec plugin. The value is in seconds until certificate expiry (negative if already expired). If auth exec plugins are unused or manage no TLS certificates, the value will be +INF.
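The three cases of this gauge (positive, negative, +INF) follow from a simple rule, sketched below. The function `shortest_ttl_seconds` and the timestamps are hypothetical and only mirror the description in the HELP text; this is not the client-go code.

```python
import math


def shortest_ttl_seconds(expiry_timestamps, now):
    """Seconds until the earliest certificate expiry.

    Negative if that certificate has already expired; +inf if there are
    no certificates to track.
    """
    if not expiry_timestamps:
        return math.inf
    return min(expiry_timestamps) - now


now = 1_000_000.0  # arbitrary Unix timestamp chosen for the example

healthy = shortest_ttl_seconds([now + 3600, now + 60], now)
expired = shortest_ttl_seconds([now - 30, now + 60], now)
unused = shortest_ttl_seconds([], now)
print(healthy, expired, unused)
```

An alert on this gauge would typically fire when the value drops below some threshold, and the +INF case never triggers it.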

// Request latency in seconds, broken down by verb and URL
# HELP rest_client_request_duration_seconds [ALPHA] Request latency in seconds. Broken down by verb and URL. 

// Number of HTTP requests, partitioned by status code, method, and host
# HELP rest_client_requests_total [ALPHA] Number of HTTP requests, partitioned by status code, method, and host. 
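Because this counter is partitioned by labels, a useful number often has to be summed across series. The sketch below totals the samples per status code from exposition-format text. The label names `code`, `method`, and `host` are assumed from the HELP text, the sample values and host are made up, and the simple regular expressions do not handle escaped quotes or braces inside label values.

```python
import re
from collections import defaultdict

# Matches "name{label="value",...} value"; samples without labels are ignored.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)"
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def sum_by_label(text, metric, label):
    """Sum the samples of one metric, grouped by the value of one label."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP and TYPE lines
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if label in labels:
            totals[labels[label]] += float(match.group("value"))
    return dict(totals)


# Made-up sample data for illustration only.
SAMPLE = """\
# TYPE rest_client_requests_total counter
rest_client_requests_total{code="200",host="10.0.0.1:6443",method="GET"} 40
rest_client_requests_total{code="200",host="10.0.0.1:6443",method="PUT"} 10
rest_client_requests_total{code="404",host="10.0.0.1:6443",method="GET"} 2
kubelet_running_pods 7
"""

totals = sum_by_label(SAMPLE, "rest_client_requests_total", "code")
print(totals)
```

The same helper works for any labelled metric in this list, for example grouping by `method` instead of `code`.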
