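Beginner's Guide 03: Configuring Alarms

Introduction

After introducing SkyWalking, the UI shows trace information clearly, but as developers we mostly only look at traces once a problem has already occurred. SkyWalking's alarm feature brings problems to our attention promptly.

Alarms consist of two main parts:

  • Alarm rules
  • Webhooks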

Using alarms

Rules

  • Rule name: must be unique and must end with _rule.
  • Metrics name: the name of a metric from the official analysis script, located at skywalking/oap-server/generated-analysis/src/main/resources/official_analysis.oal (see https://github.com/apache/skywalking/blob/master/docs/en/guides/backend-oal-scripts.md).
  • Include names: the names of the services, endpoints, etc. that the rule applies to.

    Below is the relevant part of the official sample:

# [Optional] Default, match all services in this metrics
    include-names:
      - dubbox-provider
      - dubbox-consumer
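  • Threshold: the target value, for example 1000 (ms) for a response time or 90 for a success rate.
  • OP: the comparison operator: > (greater than), < (less than), = (equal to).
  • Period: the time window over which the rule is checked.
  • Count: the number of times the threshold condition must be met within the period before an alarm is triggered.
  • Silence period: if an alarm is triggered at time A, it is triggered only once within A + silence period. Most of us have been bombarded by an alarm we already knew about, so this setting is well worth having.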
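To make the interplay of these fields concrete, here is a minimal sketch of how one rule could be evaluated. It is a simplified model and not SkyWalking's actual implementation: the class name AlarmRuleSketch, the method onMinute, and the assumption of one metric value per minute are all made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified model of a single alarm rule (illustrative only, not SkyWalking code). */
final class AlarmRuleSketch {
    private final double threshold;   // target value, e.g. 1000 (ms)
    private final String op;          // ">", "<" or "="
    private final int period;         // window length, assumed to be in minutes
    private final int count;          // matches required inside the window
    private final int silencePeriod;  // minutes to stay quiet after an alarm

    private final Deque<Boolean> window = new ArrayDeque<>();
    private int silenceLeft = 0;

    AlarmRuleSketch(double threshold, String op, int period, int count, int silencePeriod) {
        this.threshold = threshold;
        this.op = op;
        this.period = period;
        this.count = count;
        this.silencePeriod = silencePeriod;
    }

    /** Feed one metric value per minute; returns true when an alarm should be sent. */
    boolean onMinute(double value) {
        // Keep only the last `period` match results.
        window.addLast(matches(value));
        if (window.size() > period) {
            window.removeFirst();
        }
        // While silenced, keep tracking values but never fire.
        if (silenceLeft > 0) {
            silenceLeft--;
            return false;
        }
        long matched = window.stream().filter(b -> b).count();
        if (matched >= count) {
            silenceLeft = silencePeriod;
            return true;
        }
        return false;
    }

    private boolean matches(double value) {
        switch (op) {
            case ">": return value > threshold;
            case "<": return value < threshold;
            case "=": return value == threshold;
            default: throw new IllegalArgumentException("unsupported op: " + op);
        }
    }

    public static void main(String[] args) {
        // Response time > 1000 ms, 3-minute window, 2 matches needed, 2 minutes of silence.
        AlarmRuleSketch rule = new AlarmRuleSketch(1000, ">", 3, 2, 2);
        double[] perMinute = {1200, 1500, 1800, 1700, 1600, 900};
        for (int i = 0; i < perMinute.length; i++) {
            System.out.println("minute " + i + ": alarm=" + rule.onMinute(perMinute[i]));
        }
    }
}
```

With the values in main, this model fires at minutes 1 and 4: the second slow value reaches the count, the following two minutes are silenced, and the rule fires again once the silence period has run out.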

The official distribution also ships a set of default alarm rules, which I won't cover in detail here.

We provided a default alarm-setting.yml in our distribution only for convenience, which including following rules

  1. Service average response time over 1s in last 3 minutes.
  2. Service success rate lower than 80% in last 2 minutes.
  3. Service 90% response time is over 1s in last 3 minutes
  4. Service Instance average response time over 1s in last 2 minutes.
  5. Endpoint average response time over 1s in last 2 minutes.

Webhook

There is an article above that covers Webhooks. In essence, a webhook is the callback mechanism we use in everyday alerting.

Webhook requires the peer is a web container. The alarm message will send through HTTP post by application/json content type. The JSON format is based on List<AlarmMessage> with following key information.

@Setter(AccessLevel.PUBLIC)
@Getter(AccessLevel.PUBLIC)
public class AlarmMessage {

    public static AlarmMessage NONE = new NoAlarm();

    private int scopeId;
    private String name;
    private int id0;
    private int id1;
    private String alarmMessage;
    private long startTime;

    private static class NoAlarm extends AlarmMessage {

    }
}

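The AlarmMessage class uses Lombok. Personally I think open-source components should not use Lombok: it only saves a few lines of getters and setters, and what-you-see-is-what-you-get code is more natural for people to read. Lombok is syntactic sugar that belongs in business development.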
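Since the webhook peer only has to accept an HTTP POST with a JSON body, the receiving side can be sketched with the HTTP server built into the JDK. This is a minimal illustration and not part of SkyWalking: the class name AlarmWebhookReceiver, the /alarm path and port 8080 are assumptions, and the body is stored as raw JSON rather than parsed. It needs Java 9 or later because of InputStream.readAllBytes.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical receiver for alarm webhooks; the names and path are illustrative. */
final class AlarmWebhookReceiver {
    /** Raw JSON bodies received so far (kept in memory for the demo). */
    final List<String> received = new CopyOnWriteArrayList<>();
    private HttpServer server;

    /** Starts listening (port 0 picks a free port) and returns the bound port. */
    int start(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/alarm", exchange -> {
            try {
                if (!"POST".equals(exchange.getRequestMethod())) {
                    exchange.sendResponseHeaders(405, -1); // only POST is accepted
                    return;
                }
                try (InputStream in = exchange.getRequestBody()) {
                    // The body is expected to be a JSON array of alarm messages.
                    received.add(new String(in.readAllBytes(), StandardCharsets.UTF_8));
                }
                // Reply 200 with no body; the sender logs any other status as a failure.
                exchange.sendResponseHeaders(200, -1);
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server.getAddress().getPort();
    }

    void stop() {
        if (server != null) {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        AlarmWebhookReceiver receiver = new AlarmWebhookReceiver();
        System.out.println("listening on port " + receiver.start(8080) + ", path /alarm");
    }
}
```

A real receiver would parse the JSON array into objects with the fields shown in AlarmMessage (scopeId, name, id0, id1, alarmMessage, startTime) and forward them to a chat tool or pager.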

Back to the topic: below is the code that sends the alarm.

public class WebhookCallback implements AlarmCallback {
  
    @Override
    public void doAlarm(List<AlarmMessage> alarmMessage) {
        if (remoteEndpoints.size() == 0) {
            return;
        }

        CloseableHttpClient httpClient = HttpClients.custom().build();
        try {
            remoteEndpoints.forEach(url -> {
                HttpPost post = new HttpPost(url);
                post.setConfig(requestConfig);
                post.setHeader("Accept", "application/json");
                post.setHeader("Content-type", "application/json");

                StringEntity entity = null;
                try {
                    entity = new StringEntity(gson.toJson(alarmMessage));
                    post.setEntity(entity);
                    CloseableHttpResponse httpResponse = httpClient.execute(post);
                    StatusLine statusLine = httpResponse.getStatusLine();
                    if (statusLine != null && statusLine.getStatusCode() != 200) {
                        logger.error("send alarm to " + url + " failure. Response code: " + statusLine.getStatusCode());
                    }
                } catch (UnsupportedEncodingException e) {
                    logger.error("Alarm to JSON error, " + e.getMessage(), e);
                } catch (ClientProtocolException e) {
                    logger.error("send alarm to " + url + " failure.", e);
                } catch (IOException e) {
                    logger.error("send alarm to " + url + " failure.", e);
                }
            });
        } finally {
            try {
                httpClient.close();
            } catch (IOException e) {
                logger.error(e.getMessage(), e);
            }
        }
    }

}

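doAlarm is in turn triggered from org.apache.skywalking.oap.server.core.alarm.provider.AlarmCore#start, which uses a single-thread scheduled executor that runs every 10 seconds after an initial 10-second delay: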

Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {}, 10, 10, TimeUnit.SECONDS);
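The scheduling pattern above can be tried out in isolation. The following is a minimal sketch, not SkyWalking code: the class PeriodicCheckSketch and the method runChecks are hypothetical, and milliseconds are used instead of 10 seconds so the demo finishes quickly. It also shows one detail of scheduleAtFixedRate that matters for alarm checks: if the task throws, all later runs are suppressed, so the task catches its own exceptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustrative fixed-rate check loop; names are made up for this sketch. */
final class PeriodicCheckSketch {

    /**
     * Runs `check` at a fixed rate and waits until it has run `times` times.
     * Returns false if the timeout elapsed first.
     */
    static boolean runChecks(Runnable check, int times, long periodMillis, long timeoutMillis)
            throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(times);
        try {
            scheduler.scheduleAtFixedRate(() -> {
                try {
                    check.run();
                } catch (RuntimeException e) {
                    // An exception escaping the task would cancel every later run.
                    System.err.println("check failed: " + e);
                } finally {
                    remaining.countDown();
                }
            }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
            return remaining.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        boolean done = runChecks(() -> System.out.println("checking alarm rules"), 3, 100, 5000);
        System.out.println("completed all runs: " + done);
    }
}
```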

UI result

I added a random sleep on the Dubbo provider side, after which alarm messages show up in the UI.

[Screenshot: alarm messages in the SkyWalking UI]

Official 6.x alarm documentation

https://github.com/apache/skywalking/blob/master/docs/en/setup/backend/backend-alarm.md
