Building a monitoring system with Prometheus + InfluxDB 2.0 + Grafana

Prometheus stores data in its local TSDB by default. If the Prometheus server runs in a container without persistent storage, that data will be lost. We can use Prometheus's remote write feature to write data to a remote time-series database; here we use InfluxDB 2.0.

With InfluxDB 1.x, Prometheus could write data directly into InfluxDB. InfluxDB 2.0 changes this considerably: the Telegraf plugin system is used to collect metric data into InfluxDB, so here we use Telegraf to forward Prometheus data into InfluxDB. InfluxDB 1.x is queried with InfluxQL, while InfluxDB 2.0 introduces its own scripting language, Flux, for queries. If you still want to query InfluxDB 2.0 with InfluxQL as in 1.x, you need to configure a database and retention policy mapping in InfluxDB; see the official docs for details.
InfluxDB 2.0 uses the TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor) to handle metric collection, storage, visualization, and alerting.

InfluxDB concepts

  1. org: organization, the multi-tenancy unit
  2. bucket: similar to a database in InfluxDB 1.x
  3. measurement: similar to a table
  4. field: a field key and field value, holding the actual data values
  5. field set: the collection of field key/value pairs on a point
  6. tag: a labeled metadata field, similar to an index
  7. tag set: the collection of tag key/value pairs on a point
  8. telegraf: the data collection agent (similar to a Prometheus exporter)
  9. user: a user account
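These concepts map directly onto InfluxDB's line protocol. A hypothetical point illustrating measurement, tag set, and field set (names are made up for illustration):

```
# line protocol: <measurement>,<tag set> <field set> <timestamp>
cpu,host=server01,region=us-west usage_user=23.5,usage_system=11.2 1630000000000000000
```

Here `cpu` is the measurement, `host=server01,region=us-west` is the tag set, and `usage_user=23.5,usage_system=11.2` is the field set.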

1. Installing InfluxDB 2.0

docker pull influxdb
docker run -d -p 8086:8086 influxdb
Open http://localhost:8086 to access the InfluxDB UI locally. On first login you need to set up an org, user, and password.
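To confirm the container is up before moving on, you can hit the health endpoint (assuming the default port mapping above):

```shell
# A healthy instance responds with JSON containing "status": "pass"
curl -s http://localhost:8086/health
```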


In the UI you can create a bucket and an API token, which Telegraf will use to send data to InfluxDB.
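As an alternative to the UI, the bucket and token can also be created with the `influx` CLI inside the container. This is a sketch: the org and bucket names match the ones used later in telegraf.conf, and `influxdb-container-name` is your container name or ID:

```shell
# Create the destination bucket in org "rekca"
docker exec influxdb-container-name influx bucket create --name prometheus --org rekca
# Create a token with read/write access to buckets in the org
docker exec influxdb-container-name influx auth create --org rekca --read-buckets --write-buckets
```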

2. Installing Telegraf

docker pull telegraf
// To simplify communication between Telegraf and InfluxDB, the Telegraf container shares the InfluxDB container's network; influxdb-container-name is the InfluxDB container's name or ID
INFLUX_TOKEN is the API token value configured in the UI


docker run --net=container:influxdb-container-name -e INFLUX_TOKEN=iCVATJ7rNxZxnNsWf9-QRL-uiLCTKnZLb0mOuP5eAXeVKrBXuZMB7mKFVkTd7HM0oRessWZ3Q== -v /$HOME/k8s/telegraf/telegraf.conf:/etc/telegraf/telegraf.conf telegraf
The Telegraf configuration file has three parts here.
Part 1: the InfluxDB output configuration, which can be copied from the InfluxDB UI


Part 2: the Prometheus input configuration, taken from the Telegraf prometheus input plugin docs on GitHub
https://github.com/influxdata/telegraf/tree/release-1.20/plugins/inputs/prometheus
Part 3: a port opened by Telegraf for Prometheus remote write

[[outputs.influxdb_v2]]
  ## The URLs of the InfluxDB cluster nodes.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  ##   ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
  urls = ["http://localhost:8086"]

  ## API token for authentication.
  token = "$INFLUX_TOKEN"

  ## Organization is the name of the organization you wish to write to; must exist.
  organization = "rekca"

  ## Destination bucket to write into.
  bucket = "prometheus"

  ## The value of this tag will be used to determine the bucket.  If this
  ## tag is not set the 'bucket' option is used as the default.
  # bucket_tag = ""

  ## If true, the bucket tag will not be added to the metric.
  # exclude_bucket_tag = false

  ## Timeout for HTTP messages.
  # timeout = "5s"

  ## Additional HTTP headers
  # http_headers = {"X-Special-Header" = "Special-Value"}

  ## HTTP Proxy override, if unset values the standard proxy environment
  ## variables are consulted to determine which proxy, if any, should be used.
  # http_proxy = "http://corporate.proxy:3128"

  ## HTTP User-Agent
  # user_agent = "telegraf"

  ## Content-Encoding for write request body, can be set to "gzip" to
  ## compress body or "identity" to apply no encoding.
  # content_encoding = "gzip"

  ## Enable or disable uint support for writing uints influxdb 2.0.
  # influx_uint_support = false

  ## Optional TLS Config for use on HTTP connections.
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false

# Read metrics from one or many prometheus clients
[[inputs.prometheus]]
  ## An array of urls to scrape metrics from.
  urls = ["http://172.17.0.3:9090/metrics"]

  ## Metric version controls the mapping from Prometheus metrics into
  ## Telegraf metrics.  When using the prometheus_client output, use the same
  ## value in both plugins to ensure metrics are round-tripped without
  ## modification.
  ##
  ##   example: metric_version = 1;
  ##            metric_version = 2; recommended version
  # metric_version = 1

  ## Url tag name (tag containing scraped url. optional, default is "url")
  # url_tag = "url"

  ## An array of Kubernetes services to scrape metrics from.
  # kubernetes_services = ["http://my-service-dns.my-namespace:9100/metrics"]

  ## Kubernetes config file to create client from.
  # kube_config = "/path/to/kubernetes.config"

  ## Scrape Kubernetes pods for the following prometheus annotations:
  ## - prometheus.io/scrape: Enable scraping for this pod
  ## - prometheus.io/scheme: If the metrics endpoint is secured then you will need to
  ##     set this to 'https' & most likely set the tls config.
  ## - prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.
  ## - prometheus.io/port: If port is not 9102 use this annotation
  # monitor_kubernetes_pods = true

  ## Get the list of pods to scrape with either the scope of
  ## - cluster: the kubernetes watch api (default, no need to specify)
  ## - node: the local cadvisor api; for scalability. Note that the config node_ip or the environment variable NODE_IP must be set to the host IP.
  # pod_scrape_scope = "cluster"

  ## Only for node scrape scope: node IP of the node that telegraf is running on.
  ## Either this config or the environment variable NODE_IP must be set.
  # node_ip = "10.180.1.1"

  ## Only for node scrape scope: interval in seconds for how often to get updated pod list for scraping.
  ## Default is 60 seconds.
  # pod_scrape_interval = 60

  ## Restricts Kubernetes monitoring to a single namespace
  ##   ex: monitor_kubernetes_pods_namespace = "default"
  # monitor_kubernetes_pods_namespace = ""
  # label selector to target pods which have the label
  # kubernetes_label_selector = "env=dev,app=nginx"
  # field selector to target pods
  # eg. To scrape pods on a specific node
  # kubernetes_field_selector = "spec.nodeName=$HOSTNAME"

  ## Scrape Services available in Consul Catalog
  # [inputs.prometheus.consul]
  #   enabled = true
  #   agent = "http://localhost:8500"
  #   query_interval = "5m"

  #   [[inputs.prometheus.consul.query]]
  #     name = "a service name"
  #     tag = "a service tag"
  #     url = 'http://{{if ne .ServiceAddress ""}}{{.ServiceAddress}}{{else}}{{.Address}}{{end}}:{{.ServicePort}}/{{with .ServiceMeta.metrics_path}}{{.}}{{else}}metrics{{end}}'
  #     [inputs.prometheus.consul.query.tags]
  #       host = "{{.Node}}"

  ## Use bearer token for authorization. ('bearer_token' takes priority)
  # bearer_token = "/path/to/bearer/token"
  ## OR
  # bearer_token_string = "abc_123"

  ## HTTP Basic Authentication username and password. ('bearer_token' and
  ## 'bearer_token_string' take priority)
  # username = ""
  # password = ""

  ## Specify timeout duration for slower prometheus clients (default is 3s)
  # response_timeout = "3s"

  ## Optional TLS Config
  # tls_ca = /path/to/cafile
  # tls_cert = /path/to/certfile
  # tls_key = /path/to/keyfile

  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false

[[inputs.http_listener_v2]]
 ## Address and port to host HTTP listener on
 service_address = ":1234"
 ## Path to listen to.
 path = "/receive"
 ## Data format to consume.
 data_format = "prometheusremotewrite"

3. Installing Prometheus

docker pull prom/prometheus
prometheus.yml configuration; remote_write points at the port opened in part 3 of the Telegraf config above

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "mac"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
remote_write:
 - url: "http://172.17.0.2:1234/receive"
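The article only shows `docker pull` for Prometheus. A run command consistent with the setup above might look like this (the host path for prometheus.yml is an assumption; adjust it to wherever your file lives):

```shell
# Sketch: run Prometheus with the config above mounted into the container
docker run -d -p 9090:9090 \
  -v $HOME/k8s/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus
```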

4. Installing Grafana

docker pull grafana/grafana
docker run -d -p 3000:3000 grafana/grafana
After that you can configure Prometheus dashboards in Grafana.
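Instead of clicking through the UI, the Prometheus data source can also be provisioned from a file. This is a sketch; the file path and container IP are assumptions (the IP follows the Docker bridge addresses used earlier):

```yaml
# Hypothetical /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://172.17.0.4:9090   # assumed Prometheus container IP
```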

If you want to use InfluxDB directly as a Grafana data source, pay attention to the query language setting.

When importing an InfluxDB dashboard by ID, note the difference between InfluxDB 1.x and 2.x (InfluxQL vs. Flux as the query language).
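For a Flux-language data source, queries look quite different from InfluxQL. A sketch of a query against the bucket used above (the measurement and field names assume the defaults of Telegraf's prometheusremotewrite parser, where the Prometheus metric name becomes the field key):

```
from(bucket: "prometheus")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "prometheus_remote_write")
  |> filter(fn: (r) => r._field == "go_goroutines")
```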
Finally, I started a separate Telegraf container to collect system metrics.


I then imported the dashboard with id 14126 in Grafana to visualize them.


Summary

With InfluxDB 2.x, the InfluxDB ecosystem increasingly resembles the Prometheus ecosystem: Telegraf offers a rich set of plugins similar to Prometheus exporters, and the Flux scripting language plays a role similar to PromQL. In my view, Prometheus with InfluxDB as remote storage, Grafana for visualization, and Alertmanager from the Prometheus ecosystem for alerting is a solid combination.
