prometheus-operator monitoring alert

Alert:

KubeControllerManagerDown

Message:

KubeControllerManager has disappeared from Prometheus target discovery.

Following the ServiceMonitor -> Service -> Endpoints (pod) service discovery chain, we find that KubeControllerManager has no corresponding Service, so we need to create one.

Looking at the kube-controller-manager ServiceMonitor shows the labels and the port name it expects:

# kubectl get servicemonitor kube-controller-manager -n monitoring -o yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: 2019-02-27T08:16:09Z
  generation: 1
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: monitoring
  resourceVersion: "15981895"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/servicemonitors/kube-controller-manager
  uid: effe150e-3a67-11e9-a3ab-00163f007b7a
spec:
  endpoints:
  - interval: 30s
    metricRelabelings:
    - action: drop
      regex: etcd_(debugging|disk|request|server).*
      sourceLabels:
      - __name__
    port: http-metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-controller-manager

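The Service therefore needs to be defined as follows, in the kube-system namespace, with the matching k8s-app label and a port named http-metrics: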

kind: Service
apiVersion: v1
metadata:
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - protocol: TCP
    port: 10252
    targetPort: 10252
    name: http-metrics
  selector:
    component: kube-controller-manager
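To make the matching rules concrete, here is a minimal Python sketch of why this Service satisfies the ServiceMonitor shown earlier. It is a simplified illustration and not the operator's actual code: the function name `yields_target` and the plain-dict representation are assumptions made for this example, and only `namespaceSelector.matchNames`, `selector.matchLabels`, and the endpoint port name are modelled.

```python
def yields_target(monitor_spec, service):
    """Return True if the Service would give this ServiceMonitor a target.

    Simplified model: the Service must live in a selected namespace,
    carry every label in matchLabels, and expose each named endpoint port.
    """
    namespaces = monitor_spec["namespaceSelector"]["matchNames"]
    if service["namespace"] not in namespaces:
        return False

    wanted = monitor_spec["selector"]["matchLabels"]
    labels = service.get("labels", {})
    if any(labels.get(key) != value for key, value in wanted.items()):
        return False

    port_names = {port.get("name") for port in service.get("ports", [])}
    return all(ep["port"] in port_names for ep in monitor_spec["endpoints"])


# Values copied from the ServiceMonitor and Service in this article.
monitor_spec = {
    "endpoints": [{"interval": "30s", "port": "http-metrics"}],
    "namespaceSelector": {"matchNames": ["kube-system"]},
    "selector": {"matchLabels": {"k8s-app": "kube-controller-manager"}},
}
service = {
    "namespace": "kube-system",
    "labels": {"k8s-app": "kube-controller-manager"},
    "ports": [{"name": "http-metrics", "port": 10252, "targetPort": 10252}],
}

print(yields_target(monitor_spec, service))  # True
```

Note that the Service's own selector (component: kube-controller-manager) is a separate matter: it selects the pods behind the Service and is not checked by this sketch.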

If the following error then appears:

Get http://172.20.3.140:10252/metrics/: dial tcp 172.20.3.140:10252: connect: connection refused

Checking the port on the host shows that the process is listening on 127.0.0.1 only, so we just need to change the address that kube-controller-manager binds to.
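One way to confirm this diagnosis is to probe the port on both addresses from the control-plane node. The helper below is a minimal sketch using only the Python standard library; the name `can_connect` is hypothetical, and the node IP and port in the usage note are the ones from the error message above.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False
```

Run on the node itself, `can_connect("127.0.0.1", 10252)` returning True while `can_connect("172.20.3.140", 10252)` returns False would be consistent with the process listening on the loopback address only.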

How to change it on a kubeadm cluster

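On the host, find the kube-controller-manager manifest under /etc/kubernetes/manifests and change address to 0.0.0.0. The kubelet picks up the change and recreates the static pod automatically, so no manual restart is needed.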
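The edit itself is a one-line substitution. The sketch below shows it on an in-memory sample rather than the real file. Both the exact flag spelling `--address=127.0.0.1` and the sample manifest fragment are assumptions based on the description above, and `bind_to_all_interfaces` is a hypothetical helper name; check which flags your own manifest actually contains before changing anything.

```python
import re


def bind_to_all_interfaces(manifest_text):
    """Rewrite an `--address=127.0.0.1` argument to `--address=0.0.0.0`.

    Other arguments are left untouched.
    """
    return re.sub(r"(--address=)127\.0\.0\.1\b", r"\g<1>0.0.0.0", manifest_text)


# Hypothetical fragment of a static pod manifest, for illustration only.
sample = """\
spec:
  containers:
  - command:
    - kube-controller-manager
    - --address=127.0.0.1
    - --leader-elect=true
"""

print(bind_to_all_interfaces(sample))
```

Binding to 0.0.0.0 exposes the metrics port on every interface of the node, so it is worth making sure the port is not reachable from outside the cluster.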

[Image 1: prometheus-operator monitoring alert]

Result after the fix

[Image 2: prometheus-operator monitoring alert]
