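This post follows the way the Kubespray project deploys prometheus-operator and then tries to deploy prometheus-operator by hand.
How Kubespray deploys prometheus-operator
Deployment flow:
- Deploy the prometheus-operator Deployment;
- Deploy the other Prometheus components, such as node-exporter and kube-state-metrics;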
# cat tasks/prometheus.yml
---
- name: Kubernetes Apps | Make sure {{ prometheus_config_dir }} exists
  file:
    path: "{{ prometheus_config_dir }}"
    state: directory
- name: Kubernetes Apps | Render templates for Prometheus-operator-deployment
  template:
    src: "{{ item }}.yaml.j2"
    dest: "{{ prometheus_config_dir }}/{{ item }}.yaml"
  with_items:
    - prometheus-operator-deployment
- name: copy prometheus operators to {{ kube_config_dir }}
  copy:
    src: "{{ item }}.yaml"
    dest: "{{ prometheus_config_dir }}/{{ item }}.yaml"
  with_items:
    - 0namespace-namespace
    - prometheus-operator-0alertmanagerCustomResourceDefinition
    - prometheus-operator-0podmonitorCustomResourceDefinition
    - prometheus-operator-0prometheusCustomResourceDefinition
    - prometheus-operator-0prometheusruleCustomResourceDefinition
    - prometheus-operator-0servicemonitorCustomResourceDefinition
    - prometheus-operator-0thanosrulerCustomResourceDefinition
    - prometheus-operator-clusterRoleBinding
    - prometheus-operator-clusterRole
    - prometheus-operator-serviceAccount
    - prometheus-operator-service
    - prometheus-rules
- name: Kubernetes Apps | apply prometheus-operator
  kube:
    kubectl: "{{ bin_dir }}/kubectl"
    filename: "{{ prometheus_config_dir }}/{{ item }}.yaml"
    state: "latest"
  register: result
  until: result is succeeded
  retries: 10
  delay: 6
  with_items: "{{ prometheus_operators }}"
- name: Kubernetes Apps | Render templates for Prometheus
  template:
    src: "{{ item }}.yaml.j2"
    dest: "{{ prometheus_config_dir }}/{{ item }}.yaml"
  register: prometheus_reg
  with_items:
    - alertmanager-alertmanager
    - alertmanager-secret
    - alertmanager-serviceAccount
    - alertmanager-serviceMonitor
    - alertmanager-service
    - kube-state-metrics-clusterRoleBinding
    - kube-state-metrics-clusterRole
    - kube-state-metrics-deployment
    - kube-state-metrics-serviceAccount
    - kube-state-metrics-serviceMonitor
    - kube-state-metrics-service
    - node-exporter-clusterRoleBinding
    - node-exporter-clusterRole
    - node-exporter-daemonset
    - node-exporter-serviceAccount
    - node-exporter-serviceMonitor
    - node-exporter-service
    - prometheus-adapter-apiService
    - prometheus-adapter-clusterRoleAggregatedMetricsReader
    - prometheus-adapter-clusterRoleBindingDelegator
    - prometheus-adapter-clusterRoleBinding
    - prometheus-adapter-clusterRoleServerResources
    - prometheus-adapter-clusterRole
    - prometheus-adapter-configMap
    - prometheus-adapter-deployment
    - prometheus-adapter-roleBindingAuthReader
    - prometheus-adapter-serviceAccount
    - prometheus-adapter-serviceMonitor
    - prometheus-adapter-service
    - prometheus-clusterRoleBinding
    - prometheus-clusterRole
    - prometheus-kubeControllerManagerPrometheusDiscoveryService
    - prometheus-kubeSchedulerPrometheusDiscoveryService
    - prometheus-operator-serviceMonitor
    - prometheus-prometheus
    - prometheus-roleBindingConfig
    - prometheus-roleBindingSpecificNamespaces
    - prometheus-roleConfig
    - prometheus-roleSpecificNamespaces
    - prometheus-serviceAccount
    - prometheus-serviceMonitorApiserver
    - prometheus-serviceMonitorCoreDNS
    - prometheus-serviceMonitorKubeControllerManager
    - prometheus-serviceMonitorKubelet
    - prometheus-serviceMonitorKubeScheduler
    - prometheus-serviceMonitor
    - prometheus-service
- name: Kubernetes Apps | Add policies, roles, bindings for Prometheus
  kube:
    kubectl: "{{ bin_dir }}/kubectl"
    filename: "{{ prometheus_config_dir }}/{{ item.item }}.yaml"
    state: "latest"
  register: result
  until: result is succeeded
  retries: 10
  delay: 6
  with_items: "{{ prometheus_reg.results }}"
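The retry behaviour of the two `kube` tasks above (`retries: 10`, `delay: 6`) can be approximated in plain shell when working by hand. The following is only a sketch: `retry` and `apply_dir` are hypothetical helper names, and `kubectl apply` is used as a stand-in for the `kube` module's `state: "latest"`, which is an assumption about what that module does.

```shell
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempt budget is used up.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  local attempts=$1 delay=$2 n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Apply every manifest in a directory, one file at a time, with retries.
# `kubectl apply` is an assumed equivalent of the playbook's state "latest".
apply_dir() {
  local dir=$1 f
  for f in "$dir"/*.yaml; do
    retry 10 6 kubectl apply -f "$f" || return 1
  done
}
```

With cluster access, `apply_dir ./operator` followed by `apply_dir ./prometheus` would roughly mirror the two apply tasks; the directory names are the ones used in the manual steps later in this post.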
Deploying prometheus-operator by hand
- Label the master node in advance
Prometheus is scheduled onto the master node, so the node needs this label:
kubectl label nodes k8s-master node-role.kubernetes.io/master=
- Deploy the prometheus-operator Deployment (run inside ./operator/)
kubectl create -f .
// file list
[root@k8s-master prometheus]# tree ./operator/
./operator/
├── 0namespace-namespace.yaml
├── prometheus-operator-0alertmanagerCustomResourceDefinition.yaml
├── prometheus-operator-0podmonitorCustomResourceDefinition.yaml
├── prometheus-operator-0prometheusCustomResourceDefinition.yaml
├── prometheus-operator-0prometheusruleCustomResourceDefinition.yaml
├── prometheus-operator-0servicemonitorCustomResourceDefinition.yaml
├── prometheus-operator-0thanosrulerCustomResourceDefinition.yaml
├── prometheus-operator-clusterRoleBinding.yaml
├── prometheus-operator-clusterRole.yaml
├── prometheus-operator-deployment.yaml
├── prometheus-operator-serviceAccount.yaml
├── prometheus-operator-service.yaml
└── prometheus-rules.yaml
0 directories, 13 files
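Because the second batch of manifests contains ServiceMonitor, Prometheus and Alertmanager objects, it can help to wait until the CRDs created here are registered before moving on. A minimal sketch, assuming the six `*CustomResourceDefinition.yaml` files define the usual `monitoring.coreos.com` CRDs; `crd_names` and `wait_for_crds` are hypothetical helper names, and the 60-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Print the CRD names assumed to match the six *CustomResourceDefinition.yaml files.
crd_names() {
  local kind
  for kind in alertmanagers podmonitors prometheuses prometheusrules servicemonitors thanosrulers; do
    echo "${kind}.monitoring.coreos.com"
  done
}

# Block until each CRD reports the Established condition (needs cluster access).
wait_for_crds() {
  local name
  for name in $(crd_names); do
    kubectl wait --for=condition=Established --timeout=60s "crd/${name}" || return 1
  done
}
```

Running `wait_for_crds` between the two `kubectl create -f .` steps is one way to avoid racing the API server; if it returns non-zero, the operator manifests are worth re-checking first.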
- Deploy the other Prometheus components (run inside ./prometheus/)
kubectl create -f .
// file list
[root@k8s-master prometheus]# tree ./prometheus/
./prometheus/
├── alertmanager-alertmanager.yaml
├── alertmanager-secret.yaml
├── alertmanager-serviceAccount.yaml
├── alertmanager-serviceMonitor.yaml
├── alertmanager-service.yaml
├── kube-state-metrics-clusterRoleBinding.yaml
├── kube-state-metrics-clusterRole.yaml
├── kube-state-metrics-deployment.yaml
├── kube-state-metrics-serviceAccount.yaml
├── kube-state-metrics-serviceMonitor.yaml
├── kube-state-metrics-service.yaml
├── node-exporter-clusterRoleBinding.yaml
├── node-exporter-clusterRole.yaml
├── node-exporter-daemonset.yaml
├── node-exporter-serviceAccount.yaml
├── node-exporter-serviceMonitor.yaml
├── node-exporter-service.yaml
├── prometheus-adapter-apiService.yaml
├── prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
├── prometheus-adapter-clusterRoleBindingDelegator.yaml
├── prometheus-adapter-clusterRoleBinding.yaml
├── prometheus-adapter-clusterRoleServerResources.yaml
├── prometheus-adapter-clusterRole.yaml
├── prometheus-adapter-configMap.yaml
├── prometheus-adapter-deployment.yaml
├── prometheus-adapter-roleBindingAuthReader.yaml
├── prometheus-adapter-serviceAccount.yaml
├── prometheus-adapter-serviceMonitor.yaml
├── prometheus-adapter-service.yaml
├── prometheus-clusterRoleBinding.yaml
├── prometheus-clusterRole.yaml
├── prometheus-kubeControllerManagerPrometheusDiscoveryService.yaml
├── prometheus-kubeSchedulerPrometheusDiscoveryService.yaml
├── prometheus-operator-serviceMonitor.yaml
├── prometheus-prometheus.yaml
├── prometheus-roleBindingConfig.yaml
├── prometheus-roleBindingSpecificNamespaces.yaml
├── prometheus-roleConfig.yaml
├── prometheus-roleSpecificNamespaces.yaml
├── prometheus-serviceAccount.yaml
├── prometheus-serviceMonitorApiserver.yaml
├── prometheus-serviceMonitorCoreDNS.yaml
├── prometheus-serviceMonitorKubeControllerManager.yaml
├── prometheus-serviceMonitorKubelet.yaml
├── prometheus-serviceMonitorKubeScheduler.yaml
├── prometheus-serviceMonitor.yaml
└── prometheus-service.yaml
0 directories, 47 files
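- Problem: the Alertmanager cluster members cannot reach each other
After the commands above finish, the Alertmanager cluster fails to start and reports that it cannot find its other peers: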
alertmanager-main-0.alertmanager-operated:9094
alertmanager-main-1.alertmanager-operated:9094
alertmanager-main-2.alertmanager-operated:9094
Start a busybox pod and resolve the name with nslookup:
kubectl run -i --tty --image busybox:1.28.3 dns-test --restart=Never --rm /bin/sh
# nslookup alertmanager-main-1.alertmanager-operated.monitoring
## the lookup fails with an error
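The names being resolved follow the pattern Kubernetes uses for StatefulSet pods behind a headless Service. The sketch below builds the fully qualified form of the three peer names so that each one can be tested in turn; `peer_fqdn` and `alertmanager_peers` are hypothetical helpers, and the `cluster.local` cluster domain is an assumption (it is the common default, but a cluster may be configured differently).

```shell
#!/usr/bin/env bash
# Build the DNS name of one StatefulSet pod behind a headless Service:
#   <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>
peer_fqdn() {
  local sts=$1 ordinal=$2 svc=$3 ns=$4 domain=${5:-cluster.local}
  echo "${sts}-${ordinal}.${svc}.${ns}.svc.${domain}"
}

# Print the three Alertmanager peers named in the error output.
alertmanager_peers() {
  local i
  for i in 0 1 2; do
    peer_fqdn alertmanager-main "$i" alertmanager-operated monitoring
  done
}
```

The output of `alertmanager_peers` can be pasted into `nslookup` inside the busybox pod, one name per call, to see whether all peers fail or only some of them.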
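Name resolution fails. In Kubernetes, CoreDNS handles name resolution and kube-proxy maintains the forwarding rules from Services to their endpoints. The CoreDNS logs show no problems, so look at the kube-proxy logs: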
# kubectl logs kube-proxy-krzkc -n kube-system
## many errors show up here
Failed to list IPVS destinations, error: parseIP Error ip [...]
Failed to list IPVS destinations, error: parseIP Error ip [...]
Failed to list IPVS destinations, error: parseIP Error ip [...]
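To see whether every node is affected, the same check can be repeated across all kube-proxy pods. This is a sketch under assumptions: the `k8s-app=kube-proxy` label is the one kubeadm-style clusters normally put on these pods and may differ elsewhere, and `count_ipvs_errors` and `scan_kube_proxy` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Count "Failed to list IPVS destinations" lines read from stdin.
# grep -c exits non-zero when the count is 0, so the status is normalised.
count_ipvs_errors() {
  grep -c 'Failed to list IPVS destinations' || true
}

# Print "<pod> <error count>" for every kube-proxy pod (needs cluster access).
# The label selector is an assumption about how the pods are labelled.
scan_kube_proxy() {
  local pod
  for pod in $(kubectl get pods -n kube-system -l k8s-app=kube-proxy -o name); do
    printf '%s %s\n' "$pod" "$(kubectl logs -n kube-system "$pod" | count_ipvs_errors)"
  done
}
```

A non-zero count on every pod would suggest a node-wide incompatibility rather than a problem with a single node.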
- Fix: downgrade kube-proxy so the Alertmanager cluster can form
There are two ways to fix this:
  - upgrade CentOS to 8.2;
  - downgrade kube-proxy.
Here kube-proxy is downgraded:
# kubectl edit ds kube-proxy -n kube-system
## change its image
## from 1.18.0 to 1.17.6
image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.17.6
imagePullPolicy: IfNotPresent
name: kube-proxy
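The same change can be made without an interactive editor. The sketch below is one possible approach, not the method used above: `run` and `downgrade_kube_proxy` are hypothetical helpers, and the container name `kube-proxy` is taken from the `name:` field shown in the snippet. By default the commands are only printed; setting `DRY_RUN=0` executes them.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Swap the kube-proxy DaemonSet image and wait for the rollout to finish.
# Usage: downgrade_kube_proxy <image>
downgrade_kube_proxy() {
  local image=$1
  run kubectl -n kube-system set image daemonset/kube-proxy "kube-proxy=${image}" &&
    run kubectl -n kube-system rollout status daemonset/kube-proxy
}
```

For example, `DRY_RUN=0 downgrade_kube_proxy registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.17.6` would apply the image used in this post.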