Building a monitoring system with Grafana + InfluxDB + Prometheus

Monitoring system

  • Data visualization: Grafana

  • Data storage: InfluxDB/Prometheus

  • Data collection: Telegraf/NodeExporter

Grafana

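Grafana's official site offers many dashboards for displaying the runtime state of operating systems, databases, and applications.

I picked the following dashboards:

  • System dashboard: https://grafana.com/grafana/dashboards/928

  • Database dashboard: https://grafana.com/grafana/dashboards/1177

  • Java application dashboard: https://grafana.com/grafana/dashboards/4701

The system and database dashboards chosen here use InfluxDB as their data source, and InfluxDB is usually fed by Telegraf.

The Java application dashboard uses Prometheus as its data source. Prometheus usually scrapes host metrics from NodeExporter; for Java applications, Micrometer can be used to expose the metrics.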

References:

Installing Grafana:

https://grafana.com/docs/grafana/latest/installation/rpm/#install-manually-with-yum

Grafana basics, including creating data sources and dashboards:

https://grafana.com/tutorials/grafana-fundamentals/#1

InfluxDB

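InfluxDB concepts

Concept  | Database | Table       | Record | Retention (how long, how many copies) | Indexed field  | Regular field | Record timestamp
InfluxDB | database | measurement | point  | retention policy                      | tag            | field         | timestamp
MySQL    | database | table       | row    | -                                     | indexed column | column        | -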
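To make the mapping above concrete, here is a minimal Python sketch that assembles one point in InfluxDB's line protocol (measurement, tags, fields, timestamp). The helper names are made up for illustration, and the escaping covers only the common cases, so treat it as an assumption-laden sketch rather than a complete encoder. The sample values come from the NOAA data set used later in this post.

```python
def _escape_key(text):
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, or quoted strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp):
    """Build 'measurement,tag=value field=value timestamp' for a single point."""
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(
        f",{_escape_key(k)}={_escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape_key(k)}={_format_field(v)}" for k, v in sorted(fields.items())
    )
    return f"{name}{tag_part} {field_part} {timestamp}"


# One point of the h2o_feet measurement: location is a tag (indexed),
# the other two columns are fields, and the timestamp is in seconds.
line = to_line_protocol(
    "h2o_feet",
    {"location": "santa_monica"},
    {"water_level": 2.064, "level description": "below 3 feet"},
    1439856000,
)
print(line)
```

Tags end up in the indexed part right after the measurement name, while fields follow the first unescaped space, which mirrors the indexed column / column split in the MySQL row of the table.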

References:

https://docs.influxdata.com/influxdb/v1.8/concepts/key_concepts/

Sample Data

  • Create the database (in the influx shell)
    CREATE DATABASE NOAA_water_database
  • Download and import the data
    curl https://s3.amazonaws.com/noaa.water-database/NOAA_data.txt -o NOAA_data.txt
    influx -import -path=NOAA_data.txt -precision=s -database=NOAA_water_database
  • Run test queries
    > SHOW measurements
    name: measurements
    ------------------
    name
    average_temperature
    h2o_feet
    h2o_pH
    h2o_quality
    h2o_temperature

    > SELECT COUNT("water_level") FROM h2o_feet
    name: h2o_feet
    --------------
    time                    count
    1970-01-01T00:00:00Z    15258

    > SELECT * FROM h2o_feet LIMIT 2
    name: h2o_feet
    --------------
    time                    level description       location        water_level
    2015-08-18T00:00:00Z    below 3 feet            santa_monica    2.064
    2015-08-18T00:00:00Z    between 6 and 9 feet    coyote_creek    8.12

References:

https://docs.influxdata.com/influxdb/v1.8/query_language/sample-data/

Explore Schema

  • SHOW DATABASES

  • SHOW MEASUREMENTS

  • SHOW TAG KEYS

  • SHOW FIELD KEYS

References:

https://docs.influxdata.com/influxdb/v1.8/query_language/explore-schema/

Explore Data

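  • The SELECT statement
    SELECT <field_key>[,<field_key>,<tag_key>] FROM <measurement_name>[,<measurement_name>]
  • The WHERE clause
    SELECT_clause FROM_clause WHERE <conditional_expression> [(AND|OR) <conditional_expression> [...]]
  • The GROUP BY clause
    SELECT_clause FROM_clause [WHERE_clause] GROUP BY [* | <tag_key>[,<tag_key>]]
  • ORDER BY time DESC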

  • The LIMIT and SLIMIT clauses
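The clauses above can be combined into a single query and sent to the InfluxDB 1.x HTTP /query endpoint. The following Python sketch uses only the standard library; the base URL http://localhost:8086 and the helper names are assumptions for illustration, and the parser expects the usual results/series/columns/values layout of the JSON response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, database, influxql):
    """Build the GET URL for the InfluxDB 1.x /query endpoint."""
    return f"{base_url}/query?" + urlencode({"db": database, "q": influxql})


def rows_from_response(payload):
    """Flatten results -> series -> values into a list of {column: value} dicts."""
    rows = []
    for result in payload.get("results", []):
        for series in result.get("series", []):
            for values in series["values"]:
                rows.append(dict(zip(series["columns"], values)))
    return rows


def run_query(base_url, database, influxql):
    """Send the query and return the decoded rows (needs a reachable InfluxDB)."""
    with urlopen(build_query_url(base_url, database, influxql)) as response:
        return rows_from_response(json.load(response))


# SELECT + WHERE + ORDER BY + LIMIT against the sample database.
QUERY = (
    'SELECT "water_level" FROM "h2o_feet" '
    "WHERE \"location\" = 'santa_monica' "
    "ORDER BY time DESC LIMIT 3"
)
url = build_query_url("http://localhost:8086", "NOAA_water_database", QUERY)
print(url)
```

Calling run_query("http://localhost:8086", "NOAA_water_database", QUERY) would return the three most recent santa_monica rows, provided the sample data has been imported.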

References:

https://docs.influxdata.com/influxdb/v1.8/query_language/explore-data/

Functions

  • Aggregations

  • Selectors

  • Transformations
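The three categories differ in what they return: an aggregation collapses a series into one computed value, a selector picks one existing point, and a transformation produces a new series. The Python sketch below models MEAN(), MAX() and DERIVATIVE() on a tiny hand-made series. It only illustrates the semantics under simplified assumptions (no GROUP BY, timestamps in seconds) and is not how InfluxDB implements them.

```python
from statistics import mean

# (timestamp in seconds, field value); made-up sample points
points = [(0, 2.0), (10, 4.0), (20, 3.0)]


def aggregate_mean(series):
    """Aggregation: like MEAN(), returns one computed value for the whole series."""
    return mean(value for _, value in series)


def select_max(series):
    """Selector: like MAX(), returns an existing point, timestamp included."""
    return max(series, key=lambda point: point[1])


def transform_derivative(series, unit_seconds=1):
    """Transformation: like DERIVATIVE(), returns the rate of change between neighbours."""
    return [
        (t2, (v2 - v1) / (t2 - t1) * unit_seconds)
        for (t1, v1), (t2, v2) in zip(series, series[1:])
    ]


print(aggregate_mean(points))
print(select_max(points))
print(transform_derivative(points))
```

This also explains the 1970-01-01T00:00:00Z timestamp in the COUNT() output of the sample-data section: an aggregation has no single source point, so without a time range InfluxDB reports epoch 0.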

References:

https://docs.influxdata.com/influxdb/v1.8/query_language/functions/

Telegraf

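Telegraf collects metrics and writes them to InfluxDB.

Telegraf can collect both system and database metrics; all it takes is a little configuration in /etc/telegraf/telegraf.conf.

When writing data, Telegraf adds a host tag to every point so that you can tell which machine reported it. The value of host can be set in telegraf.conf (the hostname option in the [agent] section); otherwise the Linux hostname is used, so changing the hostname also works.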

### OUTPUT

# Configuration for influxdb server to send metrics to
[[outputs.influxdb]]
  ## InfluxDB's HTTP API listens on port 8086 by default; 8089 is kept from the
  ## original setup, so adjust it to match your instance.
  urls = ["http://localhost:8089"]
  database = "telegraf_metrics"

  ## Retention policy to write to. Empty string writes to the default rp.
  retention_policy = ""
  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
  write_consistency = "any"

  ## Write timeout (for the InfluxDB client), formatted as a string.
  ## If not provided, will default to 5s. 0s means no timeout (not recommended).
  timeout = "5s"

# Read metrics about cpu usage
[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## Comment this line if you want the raw CPU time metrics
  fielddrop = ["time_*"]

# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## By default, telegraf gather stats for all mountpoints.
  ## Setting mountpoints will restrict the stats to the specified mountpoints.
  # mount_points = ["/"]

  ## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
  ## present on /run, /var/run, /dev/shm or /dev).
  ignore_fs = ["tmpfs", "devtmpfs"]

# Read metrics about disk IO by device
[[inputs.diskio]]
  ## By default, telegraf will gather stats for all devices including
  ## disk partitions.
  ## Setting devices will restrict the stats to the specified devices.
  # devices = ["sda", "sdb"]
  ## Uncomment the following line if you need disk serial numbers.
  # skip_serial_number = false

# Get kernel statistics from /proc/stat
[[inputs.kernel]]
  # no configuration

# Read metrics about memory usage
[[inputs.mem]]
  # no configuration

# Get the number of processes and group them by status
[[inputs.processes]]
  # no configuration

# Read metrics about swap memory usage
[[inputs.swap]]
  # no configuration

# Read metrics about system load & uptime
[[inputs.system]]
  # no configuration

# Read metrics about network interface usage
[[inputs.net]]
  # collect data only about specific interfaces
  # interfaces = ["eth0"]

# Read TCP connection metrics
[[inputs.netstat]]
  # no configuration

# Read metrics from one or more MySQL servers
[[inputs.mysql]]
  ## The option is named "servers" (plural); each entry is a DSN.
  servers = ["root:root@tcp(127.0.0.1:3306)/"]

Prometheus

Architecture

prometheus_architecture.png

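Concepts

Concept    | Database | Table       | Record      | Retention (how long, how many copies) | Indexed field  | Regular field | Record timestamp
Prometheus | -        | metric      | time series | -                                     | label          | -             | timestamp
InfluxDB   | database | measurement | point       | retention policy                      | tag            | field         | timestamp
MySQL      | database | table       | row         | -                                     | indexed column | column        | -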

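Differences between Prometheus and InfluxDB:

  • A record of a Prometheus metric consists of several labels plus a single value, and metrics come in four types: Counter, Gauge, Histogram, and Summary. InfluxDB measurements make no such distinction.

  • Prometheus pulls (scrapes) data from its targets, whereas InfluxDB has data pushed to it.

  • A Prometheus record normally carries only one value. Taking CPU metrics as an example, an InfluxDB measurement would hold 3 fields [usage_idle, usage_system, usage_user] and 1 record [97, 2, 1], while the Prometheus metric would have 1 label [mode] and 3 records ['idle', 97], ['system', 2], ['user', 1].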
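The CPU example in the last bullet can be sketched in Python: one InfluxDB-style point with three fields is unfolded into three Prometheus-style samples that differ only in the mode label. The metric name cpu_usage, the host tag value server01 and the helper names are hypothetical and chosen purely for illustration.

```python
def influx_point_to_samples(metric_name, label_name, tags, fields, prefix="usage_"):
    """Turn one multi-field point into one (name, labels, value) sample per field."""
    samples = []
    for field_key, value in fields.items():
        labels = dict(tags)
        # usage_idle -> mode="idle", usage_system -> mode="system", ...
        if field_key.startswith(prefix):
            labels[label_name] = field_key[len(prefix):]
        else:
            labels[label_name] = field_key
        samples.append((metric_name, labels, float(value)))
    return samples


def to_exposition_line(sample):
    """Render a sample roughly the way the Prometheus text format shows it."""
    name, labels, value = sample
    label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    return f"{name}{{{label_text}}} {value}"


# One InfluxDB point: 3 fields, 1 record.
point_tags = {"host": "server01"}
point_fields = {"usage_idle": 97, "usage_system": 2, "usage_user": 1}

samples = influx_point_to_samples("cpu_usage", "mode", point_tags, point_fields)
for sample in samples:
    print(to_exposition_line(sample))
```

Running it prints three lines, one per mode, which is the "1 label, 3 records" shape described above.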

References:

https://prometheus.io/docs/concepts/metric_types/

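Querying data

Prometheus comes with a web page for querying data; its default address is http://your_host:9090.

The instances to scrape are added in ${Prometheus_home}/prometheus.yml, and the up metric shows the status of every instance (1 means the last scrape succeeded, 0 means it failed).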
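Besides the web page, the same up query can be issued against the Prometheus HTTP API at /api/v1/query. The Python sketch below is standard-library only; the base URL and the helper names are assumptions, and the parser expects the instant-vector response shape (status, data.result, each entry with metric and value).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instance_status(payload):
    """Map each scraped instance to True (up == 1) or False (up == 0)."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return {
        sample["metric"].get("instance", ""): sample["value"][1] == "1"
        for sample in payload["data"]["result"]
    }


def fetch_instance_status(base_url="http://localhost:9090"):
    """Run the instant query 'up' (needs a reachable Prometheus server)."""
    url = f"{base_url}/api/v1/query?" + urlencode({"query": "up"})
    with urlopen(url) as response:
        return instance_status(json.load(response))


# Hand-written response with the same structure, so the parser can be tried offline.
sample_payload = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "localhost:9090", "job": "prometheus"},
             "value": [1600000000.0, "1"]},
            {"metric": {"__name__": "up", "instance": "localhost:8080", "job": "app"},
             "value": [1600000000.0, "0"]},
        ],
    },
}
print(instance_status(sample_payload))
```

Sample values arrive as strings in the JSON response, which is why the comparison is against "1" rather than the number 1.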

References:

https://prometheus.io/docs/prometheus/latest/querying/examples/

Micrometer

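Micrometer collects metrics from Java applications and can feed most mainstream monitoring systems, such as Prometheus and InfluxDB. It is somewhat like SLF4J, which fronts many logging frameworks, except that Micrometer targets application metrics.

Exposing metrics to Prometheus from a Spring application: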

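import java.io.IOException;

import javax.annotation.PostConstruct;
import javax.servlet.http.HttpServletResponse;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

import io.micrometer.core.instrument.binder.jvm.ClassLoaderMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmGcMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmThreadMetrics;
import io.micrometer.core.instrument.binder.system.ProcessorMetrics;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import lombok.Getter;

// Serves JVM metrics in the Prometheus text format at /prometheus.
// Assumes the javax.* servlet namespace; newer Spring versions use jakarta.* instead.
@Controller
@RequestMapping(value = "/prometheus")
public class PrometheusController {

   @Getter
   private PrometheusMeterRegistry registry;

   @PostConstruct
   private void init() {
     // PrometheusConfig.DEFAULT returns null for every key, i.e. all defaults apply.
     this.registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
     this.registry.config().commonTags("application", "myAppName");
     new ClassLoaderMetrics().bindTo(this.registry);
     new JvmMemoryMetrics().bindTo(this.registry);
     new JvmGcMetrics().bindTo(this.registry);
     new ProcessorMetrics().bindTo(this.registry);
     new JvmThreadMetrics().bindTo(this.registry);
   }

   @RequestMapping(method = { RequestMethod.GET, RequestMethod.POST })
   public void index(HttpServletResponse resp) throws IOException {
     resp.setContentType("text/plain; version=0.0.4; charset=utf-8");
     resp.getWriter().write(registry.scrape());
     resp.getWriter().flush();
   }
}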

References:

https://micrometer.io/docs
