Prometheus stores its data in a local TSDB. If Prometheus runs in a container without persistent storage, that data is lost when the container goes away. We can use Prometheus's remote write feature to ship the data to a remote time-series database; here we use InfluxDB 2.0.
With InfluxDB 1.x, Prometheus could write data directly into InfluxDB. InfluxDB 2.0 changes this considerably: metric collection now goes through Telegraf plugins, so here we use Telegraf to forward the Prometheus data into InfluxDB. InfluxDB 1.x is queried with InfluxQL, while InfluxDB 2.0 introduces its own scripting language, Flux. If you still want to query InfluxDB 2.0 with InfluxQL the way you did in 1.x, you have to configure a database and retention policy (DBRP) mapping in InfluxDB; see the official documentation for details.
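For example, such a DBRP mapping can be created with the influx CLI; this is only a sketch, and the database name, retention policy name, and bucket ID below are placeholders:
# Map an InfluxQL database + retention policy onto an existing bucket so that
# InfluxQL queries keep working against InfluxDB 2.0 (names and ID are placeholders).
influx v1 dbrp create --db prometheus --rp autogen --bucket-id <bucket-id> --default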
InfluxDB 2.0 uses the TICK combination (Telegraf, InfluxDB 2.0, Chronograf, Kapacitor) to handle metric collection, storage, visualization, and alerting.
InfluxDB concepts (a small write example follows this list):
- org: organization, the unit of multi-tenancy
- bucket: roughly the database concept from InfluxDB 1.x
- measurement: similar to a table
- field: a field key plus a field value; the actual data values
- field set: the collection of field key/value pairs on a point
- tag: a tag (label) key/value pair; indexed metadata, similar to an index
- tag set: the collection of tags on a point (multiple tag fields)
- telegraf: the data collector (similar in role to a Prometheus exporter)
- user: a user account
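To make these terms concrete, here is what writing a single point through the v2 HTTP API looks like; the org, bucket, token, and metric below are placeholders, and the request body is line protocol (measurement, tag set, field set):
# Write one point via the InfluxDB 2.0 write API (org/bucket/token are placeholders).
# Line protocol layout: <measurement>,<tag set> <field set> [timestamp]
curl -XPOST "http://localhost:8086/api/v2/write?org=rekca&bucket=prometheus&precision=s" \
  -H "Authorization: Token $INFLUX_TOKEN" \
  --data-raw "cpu,host=my-mac,region=local usage_user=12.5,usage_system=3.2"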
1. Install InfluxDB 2.0
docker pull influxdb
docker run -d -p 8086:8086 influxdb
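Since the whole point of this setup is to keep the data around, it is worth mounting a volume for InfluxDB as well; a minimal sketch, where the host path is an arbitrary choice and /var/lib/influxdb2 is the data directory used by the 2.x image:
# Persist the InfluxDB data directory so the TSDB survives container restarts
# (host path is an assumption; adjust it to taste).
docker run -d -p 8086:8086 -v $HOME/k8s/influxdb:/var/lib/influxdb2 influxdb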
Open http://localhost:8086 and you can reach the InfluxDB UI locally. On first login you are asked to set up an org, user, and password.
In the UI you can create a bucket and an API token, which Telegraf will use to send data to InfluxDB.
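If you prefer the CLI, the same bucket and token can also be created like this (a sketch using the influx 2.x CLI; the org and bucket names match the ones used later in this post):
# Create the target bucket and a token with read/write access to the org's buckets.
influx bucket create --name prometheus --org rekca
influx auth create --org rekca --read-buckets --write-buckets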
2. Install Telegraf
docker pull telegraf
# To simplify communication between Telegraf and InfluxDB, the Telegraf container shares the InfluxDB container's network namespace; influxdb-container-name is the name or ID of the InfluxDB container.
# INFLUX_TOKEN is the API token created in the InfluxDB UI.
docker run --net=container:influxdb-container-name -e INFLUX_TOKEN=iCVATJ7rNxZxnNsWf9-QRL-uiLCTKnZLb0mOuP5eAXeVKrBXuZMB7mKFVkTd7HM0oRessWZ3Q== -v /$HOME/k8s/telegraf/telegraf.conf:/etc/telegraf/telegraf.conf telegraf
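The configs below use container IPs on the default docker bridge network (172.17.0.3 for Prometheus, 172.17.0.2 for the InfluxDB container whose network Telegraf shares). If you need to look such an address up, docker inspect can print it; the container name is a placeholder:
# Print a container's IP address on the default bridge network.
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' prometheus-container-name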
The Telegraf configuration file used here has three parts.
Part 1: the InfluxDB output configuration, which can be copied from the InfluxDB UI.
Part 2: the Prometheus input configuration, taken from the Telegraf prometheus input plugin on GitHub:
https://github.com/influxdata/telegraf/tree/release-1.20/plugins/inputs/prometheus
Part 3: an HTTP listener that opens a port for Prometheus remote write.
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
urls = ["http://localhost:8086"]
## API token for authentication.
token = "$INFLUX_TOKEN"
## Organization is the name of the organization you wish to write to; must exist.
organization = "rekca"
## Destination bucket to write into.
bucket = "prometheus"
## The value of this tag will be used to determine the bucket. If this
## tag is not set the 'bucket' option is used as the default.
# bucket_tag = ""
## If true, the bucket tag will not be added to the metric.
# exclude_bucket_tag = false
## Timeout for HTTP messages.
# timeout = "5s"
## Additional HTTP headers
# http_headers = {"X-Special-Header" = "Special-Value"}
## HTTP Proxy override, if unset values the standard proxy environment
## variables are consulted to determine which proxy, if any, should be used.
# http_proxy = "http://corporate.proxy:3128"
## HTTP User-Agent
# user_agent = "telegraf"
## Content-Encoding for write request body, can be set to "gzip" to
## compress body or "identity" to apply no encoding.
# content_encoding = "gzip"
## Enable or disable uint support for writing uints influxdb 2.0.
# influx_uint_support = false
## Optional TLS Config for use on HTTP connections.
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
# Read metrics from one or many prometheus clients
[[inputs.prometheus]]
## An array of urls to scrape metrics from.
urls = ["http://172.17.0.3:9090/metrics"]
## Metric version controls the mapping from Prometheus metrics into
## Telegraf metrics. When using the prometheus_client output, use the same
## value in both plugins to ensure metrics are round-tripped without
## modification.
##
## example: metric_version = 1;
## metric_version = 2; recommended version
# metric_version = 1
## Url tag name (tag containing scraped url. optional, default is "url")
# url_tag = "url"
## An array of Kubernetes services to scrape metrics from.
# kubernetes_services = ["http://my-service-dns.my-namespace:9100/metrics"]
## Kubernetes config file to create client from.
# kube_config = "/path/to/kubernetes.config"
## Scrape Kubernetes pods for the following prometheus annotations:
## - prometheus.io/scrape: Enable scraping for this pod
## - prometheus.io/scheme: If the metrics endpoint is secured then you will need to
## set this to 'https' & most likely set the tls config.
## - prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.
## - prometheus.io/port: If port is not 9102 use this annotation
# monitor_kubernetes_pods = true
## Get the list of pods to scrape with either the scope of
## - cluster: the kubernetes watch api (default, no need to specify)
## - node: the local cadvisor api; for scalability. Note that the config node_ip or the environment variable NODE_IP must be set to the host IP.
# pod_scrape_scope = "cluster"
## Only for node scrape scope: node IP of the node that telegraf is running on.
## Either this config or the environment variable NODE_IP must be set.
# node_ip = "10.180.1.1"
## Only for node scrape scope: interval in seconds for how often to get updated pod list for scraping.
## Default is 60 seconds.
# pod_scrape_interval = 60
## Restricts Kubernetes monitoring to a single namespace
## ex: monitor_kubernetes_pods_namespace = "default"
# monitor_kubernetes_pods_namespace = ""
# label selector to target pods which have the label
# kubernetes_label_selector = "env=dev,app=nginx"
# field selector to target pods
# eg. To scrape pods on a specific node
# kubernetes_field_selector = "spec.nodeName=$HOSTNAME"
## Scrape Services available in Consul Catalog
# [inputs.prometheus.consul]
# enabled = true
# agent = "http://localhost:8500"
# query_interval = "5m"
# [[inputs.prometheus.consul.query]]
# name = "a service name"
# tag = "a service tag"
# url = 'http://{{if ne .ServiceAddress ""}}{{.ServiceAddress}}{{else}}{{.Address}}{{end}}:{{.ServicePort}}/{{with .ServiceMeta.metrics_path}}{{.}}{{else}}metrics{{end}}'
# [inputs.prometheus.consul.query.tags]
# host = "{{.Node}}"
## Use bearer token for authorization. ('bearer_token' takes priority)
# bearer_token = "/path/to/bearer/token"
## OR
# bearer_token_string = "abc_123"
## HTTP Basic Authentication username and password. ('bearer_token' and
## 'bearer_token_string' take priority)
# username = ""
# password = ""
## Specify timeout duration for slower prometheus clients (default is 3s)
# response_timeout = "3s"
## Optional TLS Config
# tls_ca = /path/to/cafile
# tls_cert = /path/to/certfile
# tls_key = /path/to/keyfile
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
[[inputs.http_listener_v2]]
## Address and port to host HTTP listener on
service_address = ":1234"
## Path to listen to.
path = "/receive"
## Data format to consume.
data_format = "prometheusremotewrite"
3. Install Prometheus
docker pull prom/prometheus
prometheus.yml configuration; remote_write points at the port opened by the Telegraf http_listener_v2 in part 3 above.
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "mac"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

remote_write:
  - url: "http://172.17.0.2:1234/receive"
4. Install Grafana
docker pull grafana/grafana
docker run -d -p 3000:3000 grafana/grafana
After that you can configure Prometheus dashboards in Grafana.
If you want to add InfluxDB directly as a Grafana data source, pay attention to the query language setting.
When importing an InfluxDB dashboard by ID, keep the difference between InfluxDB 1.x and 2.x in mind (InfluxQL vs. Flux as the query language).
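Before wiring up Grafana, you can sanity-check that the remote-write data has landed in the bucket with a Flux query from the influx CLI. This is only a sketch: the measurement name prometheus_remote_write is what Telegraf's prometheusremotewrite parser emits by default, and go_goroutines is just an example field.
# Query the last hour of one metric written through remote write
# (measurement and field names are assumptions based on Telegraf's parser).
influx query 'from(bucket: "prometheus")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "prometheus_remote_write")
  |> filter(fn: (r) => r._field == "go_goroutines")'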
Finally, I started a separate Telegraf container to collect system metrics.
In Grafana, import the dashboard with ID 14126 to visualize this data.
Summary
With InfluxDB 2.x, InfluxDB has grown to look more and more like the Prometheus ecosystem: Telegraf offers a rich set of plugins that play the role of Prometheus exporters, and the Flux scripting language covers much of what PromQL does. In my opinion, using Prometheus with InfluxDB as remote storage, Grafana for visualization, and Alertmanager from the Prometheus ecosystem for alerting is also a pretty solid setup.