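Applications usually write logs for later analysis. In most cases the logs are consulted only after a problem has occurred, which makes this a static, after-the-fact analysis. Often, though, we need to know how the whole system is behaving right now or at a particular moment. For a backend service, for example, we may want real-time monitoring data such as:
1. How many requests are handled per second (TPS)?
2. What is the average processing time per request?
3. What is the longest processing time of a request?
4. What does the histogram of response times look like?
5. What fraction of requests receive a correct response?
6. How long is the queue of requests waiting to be processed?
7. What are the CPU usage, memory footprint, and JVM state of the whole system, and what is its runtime error rate?
The simplest way to collect this kind of real-time data is to place instrumentation points at the system's entry, exit, and key locations, then send the collected information to a real-time monitoring platform, or store it in a cache or database for further analysis and display.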
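To make the idea of an instrumentation point concrete, here is a minimal sketch that uses only the JDK. The class name RequestStats and its methods are hypothetical and are not part of the Metrics library; the sketch only shows the kind of bookkeeping (request count, mean and maximum duration, success ratio) that such a library takes off your hands.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper (not part of Metrics): hand-rolled statistics for one instrumentation point.
final class RequestStats {
    private final LongAdder total = new LongAdder();       // all requests seen
    private final LongAdder succeeded = new LongAdder();   // requests that finished without error
    private final LongAdder totalNanos = new LongAdder();  // summed processing time
    private final AtomicLong maxNanos = new AtomicLong();  // longest single request

    // Call this at the exit point of a request with its measured duration.
    void record(long durationNanos, boolean success) {
        total.increment();
        if (success) {
            succeeded.increment();
        }
        totalNanos.add(durationNanos);
        maxNanos.accumulateAndGet(durationNanos, Math::max);
    }

    long count() {
        return total.sum();
    }

    long maxNanos() {
        return maxNanos.get();
    }

    double meanNanos() {
        long n = total.sum();
        return n == 0 ? 0.0 : (double) totalNanos.sum() / n;
    }

    double successRatio() {
        long n = total.sum();
        return n == 0 ? Double.NaN : (double) succeeded.sum() / n;
    }

    public static void main(String[] args) {
        RequestStats stats = new RequestStats();
        stats.record(2_000_000L, true);   // a 2 ms request that succeeded
        stats.record(4_000_000L, false);  // a 4 ms request that failed
        System.out.println("count=" + stats.count()
                + " meanNanos=" + stats.meanNanos()
                + " maxNanos=" + stats.maxNanos()
                + " successRatio=" + stats.successRatio());
    }
}
```

Writing this by hand for every metric quickly becomes repetitive, which is the motivation for using a dedicated library.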
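Metrics is a library for measuring monitoring indicators. It provides many tools that help developers monitor all kinds of data.
See the official documentation for details: https://metrics.dropwizard.io/3.1.0/manual/core/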
I. Introduction to the Metrics library
Metrics provides five basic metric types: Meters, Gauges, Counters, Histograms, and Timers.
1. Adding the Maven dependencies
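<dependency>
    <groupId>io.dropwizard.metrics</groupId>
    <artifactId>metrics-core</artifactId>
    <version>3.2.6</version>
</dependency>
<dependency>
    <groupId>io.dropwizard.metrics</groupId>
    <artifactId>metrics-healthchecks</artifactId>
    <version>3.2.6</version>
</dependency>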
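2. Meters: introduction and usage
// A Meter is a counter that can only increase. It is typically used to measure the rate at which events occur.
// It reports the mean rate as well as exponentially weighted moving average rates over 1, 5, and 15 minutes.
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class MetricsExample {
    // Create the registry
    private static final MetricRegistry registry = new MetricRegistry();
    // Meter for request throughput (TPS)
    private static final Meter requestMeter = registry.meter("tps");
    // Meter for failed requests
    private static final Meter errorMeter = registry.meter("err_request");

    public static void main(String[] args) {
        // Build the reporter; rates are converted to events per minute
        ConsoleReporter report = ConsoleReporter.forRegistry(registry)
                .convertRatesTo(TimeUnit.MINUTES)
                .convertDurationsTo(TimeUnit.MINUTES)
                .build();
        report.start(10, TimeUnit.SECONDS); // print to the console every 10 seconds
        for (;;) {            // simulate a continuous stream of requests
            getAsk();         // send a request
            randomSleep();    // wait a random interval before the next request
        }
    }

    // Handle one request
    public static void getAsk() {
        try {
            requestMeter.mark();
            randomSleep();
            // nextInt(6) may return 0, so this line occasionally throws
            // ArithmeticException to simulate a failed request
            int x = 10 / ThreadLocalRandom.current().nextInt(6);
        } catch (Exception e) {
            System.out.println("Error");
            errorMeter.mark();
        }
    }

    // Simulate the time spent processing a request
    public static void randomSleep() {
        try {
            TimeUnit.SECONDS.sleep(ThreadLocalRandom.current().nextInt(10)); // sleep for a random 0-9 seconds
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}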
// Output:
19-6-4 16:38:47 ================================================================
-- Meters ----------------------------------------------------------------------
err_request
count = 1
mean rate = 1.50 events/minute
1-minute rate = 0.75 events/minute
5-minute rate = 0.19 events/minute
15-minute rate = 0.07 events/minute
tps
count = 4
mean rate = 5.99 events/minute
1-minute rate = 8.85 events/minute
5-minute rate = 11.24 events/minute
15-minute rate = 11.74 events/minute
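The 1-, 5-, and 15-minute rates above are exponentially weighted moving averages. The following JDK-only sketch shows how such an average can be maintained. The class EwmaSketch is hypothetical, the 5-second tick interval is an assumption of this sketch, and the class is neither thread-safe nor the library's actual code; it only illustrates the update rule rate += alpha * (instantRate - rate).

```java
// Hypothetical illustration of an exponentially weighted moving average of an event rate.
// Assumption of this sketch: tick() is called once every TICK_SECONDS by some scheduler.
final class EwmaSketch {
    private static final double TICK_SECONDS = 5.0;

    private final double alpha;   // smoothing factor derived from the window length
    private double rate;          // smoothed rate in events per second
    private boolean initialized;  // false until the first tick has run
    private long uncounted;       // events marked since the last tick

    EwmaSketch(double windowMinutes) {
        this.alpha = 1.0 - Math.exp(-TICK_SECONDS / 60.0 / windowMinutes);
    }

    // Record n events.
    void mark(long n) {
        uncounted += n;
    }

    // Fold the events seen since the last tick into the smoothed rate.
    void tick() {
        double instantRate = uncounted / TICK_SECONDS;
        uncounted = 0;
        if (initialized) {
            rate += alpha * (instantRate - rate);
        } else {
            rate = instantRate;   // the first tick seeds the average
            initialized = true;
        }
    }

    double ratePerSecond() {
        return rate;
    }

    public static void main(String[] args) {
        EwmaSketch oneMinute = new EwmaSketch(1.0);
        oneMinute.mark(10);
        oneMinute.tick();   // 10 events in the first interval
        System.out.println("after busy interval: " + oneMinute.ratePerSecond());
        oneMinute.tick();   // no events in the second interval, so the rate decays
        System.out.println("after idle interval: " + oneMinute.ratePerSecond());
    }
}
```

A longer window gives a smaller alpha, so the 15-minute rate reacts to bursts far more slowly than the 1-minute rate, which matches the pattern in the output above.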
3. Gauge: introduction and usage
3.1 Using Gauge
package com.zpb.gauge;

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * @des Using Gauge
 * @author zhao
 * @date 2019-06-14 00:08:02
 * A Gauge is the simplest metric. It reports an instantaneous value,
 * for example the size of a collection at a given moment.
 */
public class GaugeExample {
    // The metric registry
    private static final MetricRegistry registry = new MetricRegistry();
    // The queue whose size is measured
    private static final Queue<Integer> queue = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws InterruptedException {
        // Print the metrics to the console every 3 seconds
        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry).build();
        reporter.start(3, TimeUnit.SECONDS);

        Gauge<Integer> gauge = new Gauge<Integer>() {
            @Override
            public Integer getValue() {
                return queue.size();
            }
        };
        // Register the gauge with the registry
        registry.register(MetricRegistry.name(GaugeExample.class, "queue-size"), gauge);

        // Simulate data arriving in the queue
        for (int i = 0; i < 100; i++) {
            queue.add(i);
            TimeUnit.MILLISECONDS.sleep(100);
        }
        Thread.currentThread().join();
    }
}
// Output
19-6-14 0:39:17 ================================================================
-- Gauges ----------------------------------------------------------------------
com.zpb.gauge.GaugeExample.queue-size
value = 31
19-6-14 0:39:20 ================================================================
-- Gauges ----------------------------------------------------------------------
com.zpb.gauge.GaugeExample.queue-size
value = 60
19-6-14 0:39:23 ================================================================
-- Gauges ----------------------------------------------------------------------
com.zpb.gauge.GaugeExample.queue-size
value = 90
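The essential point about a gauge is that it stores nothing: the value is read from the live object whenever a report is produced. A minimal JDK-only sketch of that idea follows; GaugeSketch and sizeGauge are hypothetical names, and a plain Supplier stands in for the library's Gauge interface.

```java
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

// Hypothetical illustration: a gauge is just a callback that reads a live value on demand.
final class GaugeSketch {
    // Returns a supplier that reports the queue's size at the moment it is asked.
    static Supplier<Integer> sizeGauge(Queue<?> queue) {
        return queue::size;
    }

    public static void main(String[] args) {
        Queue<Integer> queue = new LinkedBlockingQueue<>();
        Supplier<Integer> gauge = sizeGauge(queue);

        queue.add(1);
        queue.add(2);
        System.out.println("size now: " + gauge.get());   // read after two inserts

        queue.poll();
        System.out.println("size now: " + gauge.get());   // read again after one removal
    }
}
```

Because the value is computed at read time, the callback should be cheap and should not block, since it runs on the reporter's schedule.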
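3.2 Using RatioGauge
Purpose: measuring the ratio of successful events to all events, for example a cache hit ratio or the success ratio of API calls.
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.RatioGauge;

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class RatioGaugeExample {
    private static final MetricRegistry registry = new MetricRegistry();
    private static final Meter totalMeter = registry.meter("totalCount");
    private static final Meter succMeter = registry.meter("succCount");

    public static void main(String[] args) {
        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry).build();
        reporter.start(5, TimeUnit.SECONDS); // print to the console every 5 seconds

        registry.gauge("succ-ratio", () -> new RatioGauge() {
            @Override
            protected Ratio getRatio() {
                // first argument: numerator, second argument: denominator
                return Ratio.of(succMeter.getCount(), totalMeter.getCount());
            }
        });

        // Keep handling requests
        for (;;) {
            processHandle();
        }
    }

    public static void processHandle() {
        // total count
        totalMeter.mark();
        try {
            // nextInt(10) may return 0, which throws ArithmeticException
            // and simulates a failed call
            int x = 10 / ThreadLocalRandom.current().nextInt(10);
            TimeUnit.MILLISECONDS.sleep(100);
            // success count
            succMeter.mark();
        } catch (Exception e) {
            System.out.println("================ err");
        }
    }
}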
// Output
19-6-17 9:28:13 ================================================================
-- Gauges ----------------------------------------------------------------------
succ-ratio
value = 0.9607843137254902
-- Meters ----------------------------------------------------------------------
succCount
count = 49
mean rate = 9.52 events/second
1-minute rate = 9.60 events/second
5-minute rate = 9.60 events/second
15-minute rate = 9.60 events/second
totalCount
count = 51
mean rate = 9.90 events/second
1-minute rate = 10.00 events/second
5-minute rate = 10.00 events/second
15-minute rate = 10.00 events/second
19-6-17 9:28:18 ================================================================
-- Gauges ----------------------------------------------------------------------
succ-ratio
value = 0.9423076923076923
-- Meters ----------------------------------------------------------------------
succCount
count = 98
mean rate = 9.71 events/second
1-minute rate = 9.63 events/second
5-minute rate = 9.61 events/second
15-minute rate = 9.60 events/second
totalCount
count = 104
mean rate = 10.31 events/second
1-minute rate = 10.06 events/second
5-minute rate = 10.01 events/second
15-minute rate = 10.00 events/second
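The succ-ratio values above are simply succCount divided by totalCount (49/51 in the first report, 98/104 in the second). The sketch below reproduces that arithmetic with the JDK only. RatioSketch is a hypothetical name, and returning NaN for an unusable denominator is a choice made in this sketch so that an empty measurement is not reported as 0 or as infinity.

```java
// Hypothetical illustration of a guarded ratio computation.
final class RatioSketch {
    // Returns numerator / denominator, or NaN when the denominator cannot be used.
    static double ratio(double numerator, double denominator) {
        if (Double.isNaN(denominator) || Double.isInfinite(denominator) || denominator == 0) {
            return Double.NaN;
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        System.out.println(ratio(49, 51));   // counts from the first report above
        System.out.println(ratio(98, 104));  // counts from the second report above
        System.out.println(ratio(0, 0));     // nothing measured yet
    }
}
```

Note that a ratio built from lifetime counts, as here, smooths over recent changes; a ratio of two recent rates would react faster.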
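4. Using Counter
Purpose: a Counter is a special case of a Gauge. It maintains a count that can be changed with inc() and dec(). It is used in much the same way as a Gauge, and MetricRegistry provides the counter() method to create and register one directly. A Counter is a good fit for measuring the relationship between producers and consumers.
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class CounterExample {
    private static final Logger LOG = LoggerFactory.getLogger(CounterExample.class);
    // The metric registry
    private static final MetricRegistry registry = new MetricRegistry();
    // The counter; the empty name part is skipped, so the metric is named after the class
    private static final Counter counter = registry.counter(MetricRegistry.name(CounterExample.class, ""));
    private static final ConsoleReporter report = ConsoleReporter.forRegistry(registry)
            .convertRatesTo(TimeUnit.MINUTES)
            .convertDurationsTo(TimeUnit.MINUTES)
            .build();
    // Thread-safe queue, because the producer and the consumer run on different threads
    private static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws Exception {
        report.start(5, TimeUnit.SECONDS); // print to the console every 5 seconds
        new Thread(() -> {
            try {
                production("abc");
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }).start();
        new Thread(() -> {
            try {
                consume();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }).start();
        Thread.currentThread().join();
    }

    public static void production(String s) throws InterruptedException {
        for (int i = 0; i < 100; i++) {
            counter.inc();
            queue.offer(s);
        }
    }

    public static void consume() throws InterruptedException {
        // take() blocks until an element is available, so the consumer
        // does not exit early if it starts before the producer
        for (int i = 0; i < 100; i++) {
            queue.take(); // remove the head of the queue
            counter.dec();
        }
    }
}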
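A counter that tracks pending work is only useful if increments and decrements from different threads are never lost. The JDK-only sketch below demonstrates that property with LongAdder as a stand-in; PendingCounterSketch and run are hypothetical names, and the sketch makes no claim about how the library's Counter is implemented.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration: producers increment and consumers decrement a shared counter.
final class PendingCounterSketch {
    // Starts the given number of producer and consumer threads, each performing
    // perThread updates, waits for all of them, and returns the final count.
    static long run(int producers, int consumers, int perThread) {
        LongAdder pending = new LongAdder();
        Thread[] threads = new Thread[producers + consumers];
        for (int i = 0; i < threads.length; i++) {
            final boolean producer = i < producers;
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    if (producer) {
                        pending.increment();
                    } else {
                        pending.decrement();
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return pending.sum();
    }

    public static void main(String[] args) {
        // 4 producers and 2 consumers, 1000 updates each
        System.out.println("pending items: " + run(4, 2, 1000));
    }
}
```

With a plain long field and no synchronization, the same experiment could lose updates, which is why a monitoring counter has to be built on atomic primitives.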
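5. Histograms
Purpose: a Histogram measures the distribution of a stream of values: minimum, maximum, mean, median, and percentiles (75%, 90%, 95%, 98%, 99%, and 99.9%).
For example, it can track the response times of requests to a page or to an API method.
package com.zpb.histograms;

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Histogram;
import com.codahale.metrics.MetricRegistry;

import java.util.Random;
import java.util.concurrent.TimeUnit;

public class HistogramsExample {
    private static final MetricRegistry registry = new MetricRegistry();
    private static final ConsoleReporter reporter = ConsoleReporter.forRegistry(registry).build();
    // Create a Histogram
    private static final Histogram histogram =
            registry.histogram(MetricRegistry.name(HistogramsExample.class, "histogram"));

    public static void main(String[] args) throws InterruptedException {
        reporter.start(5, TimeUnit.SECONDS);
        Random r = new Random();
        while (true) {
            processHandle(r.nextDouble());
            Thread.sleep(100);
        }
    }

    private static void processHandle(double d) {
        // In a real application, call update() at the point where the value to be measured is known
        histogram.update((int) (d * 100));
    }
}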
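To show what the median and percentile figures of a histogram mean, the following JDK-only sketch computes a quantile from a set of samples by sorting them and interpolating linearly between neighbours. PercentileSketch is a hypothetical name and this is one common interpolation scheme; the library may sample or weight its values differently, so treat the sketch as an explanation of the concept only.

```java
import java.util.Arrays;

// Hypothetical illustration: a quantile computed from raw samples by linear interpolation.
final class PercentileSketch {
    // q is in [0, 1], for example 0.5 for the median or 0.99 for the 99th percentile.
    static double quantile(long[] samples, double q) {
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException(q + " is not in [0, 1]");
        }
        if (samples.length == 0) {
            return 0.0;
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);

        double pos = q * (sorted.length + 1);  // 1-based fractional position in the sorted data
        int index = (int) pos;
        if (index < 1) {
            return sorted[0];                  // below the first sample
        }
        if (index >= sorted.length) {
            return sorted[sorted.length - 1];  // at or beyond the last sample
        }
        double lower = sorted[index - 1];
        double upper = sorted[index];
        return lower + (pos - Math.floor(pos)) * (upper - lower);
    }

    public static void main(String[] args) {
        long[] values = new long[100];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;                 // the samples 1..100
        }
        System.out.println("median = " + quantile(values, 0.5));
        System.out.println("p99    = " + quantile(values, 0.99));
    }
}
```

Keeping every sample, as this sketch does, is not practical for a long-running service, which is why a metrics library works from a bounded set of samples instead.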
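6. Using Timer
Purpose: a Timer measures both the rate of requests and the time spent processing them.
For example: the total number of requests to an API within a period of time, and their average processing time.
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

import java.util.concurrent.TimeUnit;

public class TimerExample {
    // Create the metric registry
    private static final MetricRegistry registry = new MetricRegistry();
    // Report to the console
    private static final ConsoleReporter report = ConsoleReporter.forRegistry(registry).build();
    // Create the timer
    private static final Timer timer = registry.timer("request");

    public static void main(String[] args) {
        report.start(5, TimeUnit.SECONDS);
        while (true) {
            handleRequest();
        }
    }

    private static void handleRequest() {
        Timer.Context time = timer.time();
        try {
            Thread.sleep(500); // simulate the time spent processing a request
        } catch (Exception e) {
            System.out.println("err");
        } finally {
            time.stop(); // always stop the context so that the duration is recorded
            System.out.println("==== timer stopped");
        }
    }
}
// Output (note: this sample is the report produced by the HistogramsExample in section 5)
19-6-17 11:25:27 ===============================================================
-- Histograms ------------------------------------------------------------------
com.zpb.histograms.HistogramsExample.histogram
count = 50 # number of recorded values
min = 0
max = 98
mean = 53.14 # mean
stddev = 27.04 # standard deviation
median = 50.00 # median
75% <= 78.00
95% <= 92.00
98% <= 94.00
99% <= 98.00
99.9% <= 98.00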
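The try/finally pattern above guarantees that a duration is recorded even when the handler throws. The same guarantee can be expressed with try-with-resources, as in this JDK-only sketch. TimerSketch and its nested Context are hypothetical names chosen for illustration, and the injectable clock is an assumption of the sketch that keeps it testable.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

// Hypothetical illustration: a timer whose context records the elapsed time when it is closed.
final class TimerSketch {
    private final LongSupplier clock;                       // returns the current time in nanoseconds
    private final LongAdder count = new LongAdder();        // number of completed timings
    private final LongAdder totalNanos = new LongAdder();   // summed durations

    TimerSketch(LongSupplier clock) {
        this.clock = clock;
    }

    // One in-flight measurement; closing it records the duration.
    final class Context implements AutoCloseable {
        private final long start = clock.getAsLong();

        @Override
        public void close() {
            totalNanos.add(clock.getAsLong() - start);
            count.increment();
        }
    }

    Context time() {
        return new Context();
    }

    long count() {
        return count.sum();
    }

    double meanNanos() {
        long n = count.sum();
        return n == 0 ? 0.0 : (double) totalNanos.sum() / n;
    }

    public static void main(String[] args) throws InterruptedException {
        TimerSketch timer = new TimerSketch(System::nanoTime);
        for (int i = 0; i < 3; i++) {
            try (Context ignored = timer.time()) {
                Thread.sleep(50);   // simulate the time spent processing a request
            }
        }
        System.out.println("count=" + timer.count() + " meanNanos=" + timer.meanNanos());
    }
}
```

The advantage of the resource form is that the measurement cannot be left open by accident when a new return path or exception is added to the handler.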
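7. HealthChecks
Purpose: health checks verify that an application, its submodules, and the modules it depends on are running normally.
Implementation:
Class A: extends HealthCheck and overrides check(), which calls the method under test in class B.
Class B: defines a method that returns a boolean. (Class B may also be a class in another system.)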
import com.codahale.metrics.health.HealthCheck;
import com.codahale.metrics.health.HealthCheckRegistry;

import java.util.Map.Entry;
import java.util.Random;
import java.util.Set;

public class HealthChecksExample extends HealthCheck {
    private final DataBase database;

    public HealthChecksExample(DataBase database) {
        this.database = database;
    }

    @Override
    protected Result check() throws Exception {
        if (database.ping()) {
            return Result.healthy();
        }
        return Result.unhealthy("Can't ping database.");
    }

    static class DataBase {
        // Simulated ping method
        public boolean ping() {
            Random r = new Random();
            return r.nextBoolean();
        }
    }

    public static void main(String[] args) {
        // Create the health check registry
        HealthCheckRegistry registry = new HealthCheckRegistry();
        // Register the checks with the registry
        registry.register("database1", new HealthChecksExample(new DataBase()));
        registry.register("database2", new HealthChecksExample(new DataBase()));
        // Run the registered health checks and collect the results.
        // Note: the checks run only once here, so the loop below prints the same
        // results on every pass; call runHealthChecks() inside the loop to re-check.
        Set<Entry<String, Result>> entrySet = registry.runHealthChecks().entrySet();
        while (true) {
            for (Entry<String, Result> entry : entrySet) {
                if (entry.getValue().isHealthy()) {
                    System.out.println(entry.getKey() + ": OK");
                } else {
                    System.err.println(entry.getKey() + "FAIL:error message: " + entry.getValue().getMessage());
                    final Throwable e = entry.getValue().getError();
                    if (e != null) {
                        e.printStackTrace();
                    }
                }
            }
            try {
                Thread.sleep(1000);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
// Output
database1FAIL:error message: Can't ping database.
database2: OK
database1FAIL:error message: Can't ping database.
database2: OK
database1FAIL:error message: Can't ping database.
database2: OK
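The output above repeats the same result on every pass because the checks are evaluated once and the stored results are printed in a loop. For periodic monitoring the checks need to be re-run on each pass, and a check that throws should be reported as a failure rather than crash the loop. The JDK-only sketch below shows that structure; HealthSketch, register, and runAll are hypothetical names and not the library's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.Callable;

// Hypothetical illustration: a tiny registry that re-evaluates named checks on every call.
final class HealthSketch {
    private final Map<String, Callable<Boolean>> checks = new LinkedHashMap<>();

    void register(String name, Callable<Boolean> check) {
        checks.put(name, check);
    }

    // Runs every registered check now. A check that throws is reported as a failure
    // carrying the exception message instead of aborting the whole run.
    Map<String, String> runAll() {
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, Callable<Boolean>> entry : checks.entrySet()) {
            String status;
            try {
                status = entry.getValue().call() ? "OK" : "FAIL";
            } catch (Exception e) {
                status = "FAIL: " + e.getMessage();
            }
            results.put(entry.getKey(), status);
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        Random random = new Random();
        HealthSketch registry = new HealthSketch();
        registry.register("database1", random::nextBoolean);  // simulated ping
        registry.register("database2", random::nextBoolean);  // simulated ping

        for (int pass = 0; pass < 3; pass++) {
            // The checks are evaluated again on every pass, so results can change
            registry.runAll().forEach((name, status) -> System.out.println(name + ": " + status));
            Thread.sleep(1000);
        }
    }
}
```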
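II. Reporters
As the examples above show, we can now collect many kinds of data, but in practice printing it to the console is not enough. The data has to be exported so that it can be turned into reports that can be displayed. The official site lists many reporter types; only the ones commonly used at work are covered here.
1. Writing data to the log
The metrics are written to the log through logback. For the configuration details, see the separate post on introducing and configuring logback.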
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Slf4jReporter;
import com.codahale.metrics.Timer;
import org.slf4j.LoggerFactory;

import java.util.concurrent.TimeUnit;

public class TimerExample {
    // Create the metric registry
    private static final MetricRegistry registry = new MetricRegistry();
    // Report to the log file
    private static final Slf4jReporter report = Slf4jReporter.forRegistry(registry)
            // The logger name can be chosen freely, but it must match
            // the name of a logger configured in logback.xml
            .outputTo(LoggerFactory.getLogger("com.metrics.timer"))
            .convertRatesTo(TimeUnit.SECONDS)
            .convertDurationsTo(TimeUnit.SECONDS)
            .build();
    // Create the timer
    private static final Timer timer = registry.timer("request");

    public static void main(String[] args) {
        report.start(5, TimeUnit.SECONDS);
        while (true) {
            handleRequest();
        }
    }

    private static void handleRequest() {
        Timer.Context time = timer.time();
        try {
            Thread.sleep(500); // simulate the time spent processing a request
        } catch (Exception e) {
            System.out.println("err =" + e);
        } finally {
            time.stop(); // always stop in finally so that every request is recorded
            System.out.println("==== timer stopped");
        }
    }
}
2. Writing Counter data to the log
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Slf4jReporter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class CounterExample {
    private static final Logger LOG = LoggerFactory.getLogger(CounterExample.class);
    // The metric registry
    private static final MetricRegistry registry = new MetricRegistry();
    // The counter; the empty name part is skipped, so the metric is named after the class
    private static final Counter counter = registry.counter(MetricRegistry.name(CounterExample.class, ""));
    // Write the metrics to the log file through logback
    private static final Slf4jReporter reporter = Slf4jReporter.forRegistry(registry)
            .outputTo(LoggerFactory.getLogger("com.metrics"))
            .convertRatesTo(TimeUnit.SECONDS)
            .convertDurationsTo(TimeUnit.SECONDS)
            .build();
    // Thread-safe queue, because the producer and the consumer run on different threads
    private static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws Exception {
        reporter.start(5, TimeUnit.SECONDS); // write to the log every 5 seconds
        new Thread(() -> {
            try {
                production("abc");
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }).start();
        new Thread(() -> {
            try {
                consume();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }).start();
        Thread.currentThread().join();
    }

    public static void production(String s) throws InterruptedException {
        for (int i = 0; i < 100; i++) {
            counter.inc();
            queue.offer(s);
            System.out.println("------- produced ----------->" + queue.size());
        }
    }

    public static void consume() throws InterruptedException {
        // take() blocks until an element is available, so the consumer
        // does not exit early if it starts before the producer
        for (int i = 0; i < 100; i++) {
            queue.take(); // remove the head of the queue
            counter.dec();
            System.err.println("<------- consumed ----------- " + queue.size());
        }
    }
}
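All the reporters used in this article follow the same pattern: on a fixed schedule, read the current value of every registered metric and hand a formatted line to some output. The JDK-only sketch below illustrates that pattern. ReporterSketch and its methods are hypothetical names, the sink could equally be a logger or a network client, and registrations are assumed to happen before start() is called.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical illustration: a scheduled reporter that pushes metric values to a sink.
final class ReporterSketch {
    private final Map<String, Supplier<?>> metrics = new TreeMap<>();  // sorted by metric name
    private final Consumer<String> sink;                                // where report lines go
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    ReporterSketch(Consumer<String> sink) {
        this.sink = sink;
    }

    // Register a metric before calling start(); this sketch does not guard concurrent registration.
    void register(String name, Supplier<?> value) {
        metrics.put(name, value);
    }

    // Produce one report: one "name=value" line per metric, in name order.
    void report() {
        metrics.forEach((name, value) -> sink.accept(name + "=" + value.get()));
    }

    void start(long period, TimeUnit unit) {
        executor.scheduleAtFixedRate(this::report, period, period, unit);
    }

    void stop() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        long startedAt = System.currentTimeMillis();
        ReporterSketch reporter = new ReporterSketch(System.out::println);
        reporter.register("uptimeMillis", () -> System.currentTimeMillis() - startedAt);
        reporter.register("freeMemoryBytes", () -> Runtime.getRuntime().freeMemory());

        reporter.start(1, TimeUnit.SECONDS);   // report once per second
        Thread.sleep(3500);
        reporter.stop();
    }
}
```

Separating the sink from the schedule is the design choice that lets the same metrics go to the console in development and to a log file or monitoring platform in production.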