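This article explains metrics monitoring for Flink, in the hope of helping you monitor Flink more effectively.
The overall pipeline has three parts: pushing metrics -> scraping metrics -> displaying metrics. We use PushGateway to receive the pushed metrics, Prometheus to scrape and store them, and Grafana to build dashboards that visualize them.
We have already set up Prometheus and Grafana in the test environment (for the detailed steps, see the article on Apache Airflow metrics monitoring in practice), so only PushGateway still needs to be installed.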
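# Download the pushgateway package
wget -c https://github.com/prometheus/pushgateway/releases/download/v0.9.1/pushgateway-0.9.1.linux-amd64.tar.gz
# Extract the archive and enter the directory
tar -xzf pushgateway-0.9.1.linux-amd64.tar.gz
cd pushgateway-0.9.1.linux-amd64
# Start pushgateway in the background, listening on port 9091
nohup ./pushgateway --web.listen-address :9091 > /var/log/pushgateway.log 2>&1 &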
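Once the process is running, a quick smoke test can confirm that PushGateway accepts and exposes metrics. The following is a minimal sketch, assuming curl is installed and PushGateway is reachable at the address below. The metric name demo_metric, the job name demo_job, and the helper functions are hypothetical and serve only as an illustration.

```shell
# Address of the PushGateway (assumption: adjust host and port to your setup).
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://cdh-datanode01:9091}"

# Print one gauge sample in the Prometheus text exposition format.
# demo_metric is a hypothetical metric name used only for this test.
build_sample() {
  printf '# TYPE demo_metric gauge\ndemo_metric %s\n' "$1"
}

# Push the sample; PushGateway groups it under the job name in the URL path.
push_sample() {
  build_sample "$1" | curl --silent --show-error --fail \
    --data-binary @- "${PUSHGATEWAY_URL}/metrics/job/demo_job"
}

# Read the /metrics page back and succeed only if the sample is exposed.
sample_is_exposed() {
  curl --silent --fail "${PUSHGATEWAY_URL}/metrics" | grep -q '^demo_metric'
}
```

Calling push_sample 3.14 followed by sample_is_exposed should succeed when the gateway is healthy. The test group can be removed afterwards with curl -X DELETE against the same /metrics/job/demo_job path.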
Next, add the pushgateway configuration to prometheus.yml:
# Under scrape_configs, add the pushgateway that was just started
  - job_name: 'pushgateway'
    static_configs:
      - targets: ['cdh-datanode01:9091']
        labels:
          instance: pushgateway
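Before restarting, it can help to validate the edited file, since a YAML indentation mistake would prevent Prometheus from starting. A minimal sketch, assuming the promtool binary that ships in the Prometheus release archive is on the PATH; the function name is hypothetical.

```shell
# Validate a Prometheus configuration file before restarting the server.
# Returns 2 if the file is missing, 3 if promtool is not installed,
# otherwise the exit status of promtool itself.
validate_prometheus_config() {
  config_file="$1"
  if [ ! -f "$config_file" ]; then
    echo "config file not found: $config_file" >&2
    return 2
  fi
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found on PATH" >&2
    return 3
  fi
  promtool check config "$config_file"
}
```

For example, validate_prometheus_config /etc/prometheus/prometheus.yml, where the path is a placeholder for wherever your prometheus.yml lives.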
Restart Prometheus and check that the configuration works:
http://prometheus_host:9090/targets
If the Status column shows Up for the pushgateway target, the setup is OK.
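The same check can be scripted against the Prometheus HTTP API instead of the web page. A minimal sketch, assuming curl is available, prometheus_host is the placeholder host from the URL above, and the API returns compact JSON; the helper function names are hypothetical. It runs the instant query up{job="pushgateway"}, whose value is 1 when the last scrape succeeded.

```shell
# Base URL of Prometheus (assumption: prometheus_host is a placeholder).
PROMETHEUS_URL="${PROMETHEUS_URL:-http://prometheus_host:9090}"

# Fetch the current value of up{job="pushgateway"} as JSON.
query_pushgateway_up() {
  curl --silent --fail --get \
    --data-urlencode 'query=up{job="pushgateway"}' \
    "${PROMETHEUS_URL}/api/v1/query"
}

# Read a query response on stdin; succeed if some sample has the value "1".
response_reports_up() {
  grep -q '"value":\[[0-9.]*,"1"]'
}
```

Usage: query_pushgateway_up | response_reports_up && echo "pushgateway target is up"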
# Open socket port 9999
nc -l 9999
# Start the SocketWindowWordCount program that ships with Flink
/opt/flink-1.13.0/bin/flink run -m yarn-cluster --yarnqueue root.analysis -ynm socketWindowWordCount -p 1 -yjm 1024m -ytm 3072m /opt/flink-1.13.0/examples/streaming/SocketWindowWordCount.jar -d --host cdh-datanode01 --port 9999
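The job above only pushes metrics to PushGateway if a metrics reporter has been configured in flink-conf.yaml before submission, and the reporter class must be available on Flink's plugin or class path. A minimal sketch of that configuration, using the PrometheusPushGatewayReporter keys described in the Flink 1.13 metric reporter documentation; the jobName value flink-metrics and the helper function are hypothetical, and the host and port are assumed to match the PushGateway started earlier.

```shell
# Append a PushGateway reporter section to the given flink-conf.yaml.
# Host and port are assumed to point at the PushGateway started earlier;
# the jobName value is a hypothetical label for this example.
append_pushgateway_reporter() {
  cat >> "$1" <<'EOF'
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: cdh-datanode01
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: flink-metrics
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: false
metrics.reporter.promgateway.interval: 60 SECONDS
EOF
}
```

For example, append_pushgateway_reporter /opt/flink-1.13.0/conf/flink-conf.yaml, assuming the conf directory sits under the install path used in the command above. Setting randomJobNameSuffix to true keeps several Flink processes from overwriting each other's metric groups in PushGateway.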