Alertmanager is a custom resource type (CRD) that is added by default when prometheus-operator is installed, so we can create resources of this kind directly in K8s.
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  generation: 1
  labels:
    app: prometheus-operator-alertmanager
    chart: prometheus-operator-8.2.4
    heritage: Tiller
    release: prometheus-operator
  name: prometheus-operator-alertmanager-test
  namespace: monitoring
spec:
  baseImage: quay.io/prometheus/alertmanager
  version: v0.19.0
  portName: web
  replicas: 1
  retention: 120h
  routePrefix: /
  serviceAccountName: prometheus-operator-alertmanager
  affinity:
    # Hard anti-affinity. Use either this rule or the soft rule below, not both:
    # a second podAntiAffinity key under affinity would be a duplicate YAML key.
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app: alertmanager
            alertmanager: prometheus-operator-alertmanager-test
    # Soft anti-affinity (alternative to the hard rule above):
    # podAntiAffinity:
    #   preferredDuringSchedulingIgnoredDuringExecution:
    #   - weight: 100
    #     podAffinityTerm:
    #       topologyKey: kubernetes.io/hostname
    #       labelSelector:
    #         matchLabels:
    #           app: alertmanager
    #           alertmanager: prometheus-operator-alertmanager-test
  storage:
    volumeClaimTemplate:
      selector: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        storageClassName: nfs-client
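A few notes on the configuration above:
- All configuration options can be taken from stable/prometheus-operator/templates/alertmanager/alertmanager.yaml. As described in the earlier article, the prometheus-operator environment was installed with helm, so you can run "helm fetch stable/prometheus-operator" to download the whole chart locally for reference. The helm installation also deploys one Alertmanager by default, and it is created from the same alertmanager.yaml template.
- kind must be Alertmanager.
- metadata.name is the name of this Alertmanager. Existing ones can be listed with:
kubectl get alertmanager -n monitoring
- spec.baseImage/version should be set explicitly. Otherwise the default image version may differ from the one used by the helm installation, the image has to be pulled again, and the deployment becomes very slow.
- spec.storage defines the storage for the newly deployed Alertmanager. Setting it is recommended.
- spec.affinity refers to the Pod labels. An Alertmanager object is essentially a StatefulSet, and the Service you create for it later selects its Pods through the same labels.
- spec.portName is the name of the port. It is needed later when associating the Alertmanager with Prometheus. Keeping the default is recommended.
- metadata.namespace is the namespace. It is needed later when associating the Alertmanager with Prometheus. Keeping the default is recommended.
- spec.routePrefix is the path prefix. It is needed later when associating the Alertmanager with Prometheus. Keeping the default is recommended.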
Save the manifest above as alert-test.yaml and create the resource:
kubectl create -f alert-test.yaml
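Note:
If you create the resource at this point, the Pod will fail to start with an error similar to 'MountVolume.SetUp failed for volume "config-volume" : secrets "alertmanager-XXXX-xX" not found'.
Reason (see https://yunlzheng.gitbook.io/prometheus-book/part-iii-prometheus-shi-zhan/operator/use-operator-manage-monitor):
Prometheus Operator creates the Alertmanager instance as a StatefulSet and, by default, looks up a Secret named according to the rule
alertmanager-{ALERTMANAGER_NAME}
and mounts the contents of that Secret into the Alertmanager instance as its configuration files. The configuration therefore has to be created for the Alertmanager in advance.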
We create two files: alertmanager.yaml and template_1.tmpl.
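The article does not show the contents of these two files, so the following is only a minimal sketch of what they might look like. The webhook URL, the receiver name "default", the template name "custom.title", and the timing values are placeholders chosen for illustration. The templates path assumes that the operator mounts the Secret under /etc/alertmanager/config, which should be verified against your operator version.

```shell
#!/bin/sh
# Write a minimal Alertmanager configuration (assumed content, adjust to your setup).
cat > alertmanager.yaml <<'EOF'
global:
  resolve_timeout: 5m
route:
  receiver: default            # hypothetical receiver name
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
receivers:
- name: default
  webhook_configs:
  # Placeholder URL, replace with a real notification endpoint.
  - url: http://alert-webhook.example.invalid:8080/alerts
templates:
# Assumed mount path of the Secret inside the Alertmanager Pod.
- /etc/alertmanager/config/*.tmpl
EOF

# Write a small notification template that defines one named template.
cat > template_1.tmpl <<'EOF'
{{ define "custom.title" }}[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}{{ end }}
EOF
```

Both files end up in the current directory, which is where the kubectl create secret command with --from-file expects them.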
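Then create the Secret from the command line. The Secret name must follow the format alertmanager-{ALERTMANAGER_NAME}. For example, the Alertmanager defined above is named prometheus-operator-alertmanager-test, so the Secret name here is alertmanager-prometheus-operator-alertmanager-test.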
kubectl create secret generic alertmanager-prometheus-operator-alertmanager-test -n monitoring --from-file=alertmanager.yaml --from-file=template_1.tmpl
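Because a mistyped Secret name leads straight back to the "not found" error, it can help to derive the name from the Alertmanager name instead of typing it by hand. The sketch below only builds and prints the command. The variable names are illustrative and not part of the original article.

```shell
#!/bin/sh
# Name and namespace of the Alertmanager resource defined earlier.
ALERTMANAGER_NAME="prometheus-operator-alertmanager-test"
NAMESPACE="monitoring"

# The operator looks for a Secret named alertmanager-{ALERTMANAGER_NAME}.
SECRET_NAME="alertmanager-${ALERTMANAGER_NAME}"

# Print the command for review instead of running it.
echo "kubectl create secret generic ${SECRET_NAME} -n ${NAMESPACE} --from-file=alertmanager.yaml --from-file=template_1.tmpl"
```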
Next, create a Service for the new Alertmanager:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus-operator-alertmanager
    chart: prometheus-operator-8.2.4
    heritage: Tiller
    release: prometheus-operator
  name: prometheus-operator-alertmanager-test
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9093
    protocol: TCP
    targetPort: 9093
  selector:
    alertmanager: prometheus-operator-alertmanager-test
    app: alertmanager
  sessionAffinity: None
  type: NodePort
Note: the selector here must match the labels used in the Alertmanager configuration above. After creating the Service, check it with:
kubectl get svc -n monitoring
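At this point the new Alertmanager should not show any notifications yet, because it has not been associated with Prometheus.
How do we associate it with Prometheus? First look at the configuration of the Prometheus instance created by prometheus-operator. The Prometheus server is also created through a custom resource type (CRD), named Prometheus, so it can be inspected directly with: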
kubectl get Prometheus prometheus-operator-prometheus -n monitoring -o yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  creationTimestamp: "2019-11-28T02:42:48Z"
  generation: 1
  labels:
    app: prometheus-operator-prometheus
    chart: prometheus-operator-8.2.4
    heritage: Tiller
    release: prometheus-operator
  name: prometheus-operator-prometheus
  namespace: monitoring
  resourceVersion: "6316434"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheuses/prometheus-operator-prometheus
  uid: de60d68f-6818-484d-ba30-4f381e7cb016
spec:
  alerting:
    alertmanagers:
    - name: prometheus-operator-alertmanager
      namespace: monitoring
      pathPrefix: /
      port: web
  baseImage: quay.io/prometheus/prometheus
  enableAdminAPI: false
  externalUrl: http://prom.deri.com/
  listenLocal: false
  logFormat: logfmt
  logLevel: info
  paused: false
  podMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      release: prometheus-operator
  portName: web
  replicas: 1
  retention: 10d
  routePrefix: /
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      app: prometheus-operator
      release: prometheus-operator
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-operator-prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prometheus-operator
  storage:
    volumeClaimTemplate:
      selector: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Gi
        storageClassName: nfs-client
  version: v2.13.1
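Notes on the Prometheus configuration:
- spec.alerting.alertmanagers lists the Alertmanagers that Prometheus sends alerts to.
- spec.ruleSelector.matchLabels selects, by label, the custom PrometheusRule objects created by the user.
- spec.serviceMonitorSelector.matchLabels selects, by label, the custom ServiceMonitor objects created by the user.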
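To illustrate the ruleSelector, here is a minimal sketch of a PrometheusRule that carries both labels from spec.ruleSelector.matchLabels shown above (app: prometheus-operator and release: prometheus-operator). The rule name, group name, and the always-firing expression vector(1) are made up for testing and do not come from the article.

```shell
#!/bin/sh
# Write a test PrometheusRule whose labels match the Prometheus ruleSelector.
cat > test-rule.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: alertmanager-test-rule       # hypothetical name
  namespace: monitoring
  labels:
    app: prometheus-operator         # must match spec.ruleSelector.matchLabels
    release: prometheus-operator
spec:
  groups:
  - name: test.rules                 # hypothetical group name
    rules:
    - alert: AlwaysFiring
      expr: vector(1)                # always true, for testing only
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: Test alert used to verify the Alertmanager association
EOF
```

The file can then be created with kubectl create -f test-rule.yaml. A rule without the matching labels would simply be ignored by this Prometheus instance.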
Edit the configuration of the existing Prometheus instance with:
kubectl edit Prometheus prometheus-operator-prometheus -n monitoring
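Add an entry for the new Alertmanager. Here name refers to the Service (Endpoints) created for it, and name, namespace, pathPrefix, and port must match the Alertmanager and Service configuration created above.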
spec:
  alerting:
    alertmanagers:
    - name: prometheus-operator-alertmanager
      namespace: monitoring
      pathPrefix: /
      port: web
    - name: prometheus-operator-alertmanager-test
      namespace: monitoring
      pathPrefix: /
      port: web
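As a non-interactive alternative to kubectl edit, the same change could be applied as a merge patch. This is a sketch, not the method used in the article, and the file name alertmanagers-patch.yaml is arbitrary. A merge patch replaces the whole alertmanagers list, so the existing entry has to be repeated alongside the new one.

```shell
#!/bin/sh
# Write the patch: the full desired alertmanagers list, old entry plus new entry.
cat > alertmanagers-patch.yaml <<'EOF'
spec:
  alerting:
    alertmanagers:
    - name: prometheus-operator-alertmanager
      namespace: monitoring
      pathPrefix: /
      port: web
    - name: prometheus-operator-alertmanager-test
      namespace: monitoring
      pathPrefix: /
      port: web
EOF

# Uncomment to apply against the cluster:
# kubectl patch prometheus prometheus-operator-prometheus -n monitoring \
#   --type merge --patch "$(cat alertmanagers-patch.yaml)"
```

After the patch is applied, the operator regenerates the Prometheus configuration, and the new Alertmanager should start receiving alerts.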
Done!