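Introducing Hystrix into Spring Cloud lets us fail fast and degrade gracefully when one of the services we depend on internally misbehaves, so that the failure does not cascade and affect the other interfaces our own service exposes.
That said, Hystrix has quite a lot to configure, for example the core thread count and waiting queue size for thread-pool isolation, or the timeout for each service, even though Hystrix does ship with defaults for these, as listed on the official wiki: https://github.com/Netflix/Hystrix/wiki/Configuration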
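To make the parameters mentioned above concrete, here is a minimal sketch of an initial guess at the thread pool and timeout settings. The property keys are the ones listed on the Hystrix Configuration wiki; the class name, the helper methods and all the values are made up for illustration, and supplying them as JVM system properties is an assumption (in a Spring Cloud application they would more typically live in the application's configuration file).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative starting values only; the numbers are guesses to be tuned later. */
class HystrixDefaultsSketch {

    /** Builds a map of Hystrix property keys to hypothetical initial values. */
    static Map<String, String> initialGuess() {
        Map<String, String> props = new LinkedHashMap<>();
        // Core size of the thread pool used for thread isolation
        props.put("hystrix.threadpool.default.coreSize", "20");
        // Capacity of the waiting queue (-1 means no queue, direct hand-off)
        props.put("hystrix.threadpool.default.maxQueueSize", "10");
        // Reject once this many tasks are queued, even if the queue has room
        props.put("hystrix.threadpool.default.queueSizeRejectionThreshold", "5");
        // Per-command execution timeout in milliseconds
        props.put("hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds", "2000");
        return props;
    }

    /** Publishes the values as JVM system properties (assumed to run before Hystrix is first used). */
    static void apply(Map<String, String> props) {
        props.forEach(System::setProperty);
    }
}
```

Whatever the starting values are, they remain guesses until they can be checked against recorded load, which is what the rest of this post is about.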
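Hystrix itself exposes the /hystrix.stream URL, which continuously outputs monitoring data, and we can use the dashboard to visualize that data.
The dashboard chart shows real-time monitoring data: QPS, whether the circuit is open, successful calls, failed calls, thread pool status, and so on. It is very real-time, but the Hystrix parameters we set at the outset are always half guesswork, and the load on the application varies with the business and its peak hours. We cannot stare at this chart around the clock to see whether some parameter is over-provisioned or running tight.
We need a way to persist this monitoring data.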
1. The plugin approach
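Looking through the Hystrix documentation, the plugins section describes a MetricsPublisher.
We can extend HystrixMetricsPublisher and override three methods: getMetricsPublisherForCommand, getMetricsPublisherForThreadPool, and getMetricsPublisherForCollapser. As the names suggest, the objects they return are responsible for publishing the monitoring data for Hystrix commands (calls), thread pools, and request collapsers respectively. The Command, ThreadPool, and Collapser metrics objects are the core holders of monitoring state, and Hystrix keeps their state updated.
Each of these methods is called only once per key, when Hystrix first sets up the publisher for that command, thread pool, or collapser, and it receives references to the corresponding metrics objects.
The return types are structured similarly, so let's take HystrixMetricsPublisherThreadPool as the example.
The constructor receives the thread pool key, the thread pool metrics object, and the thread pool properties object. In initialize I start a thread that periodically reads the state of these objects and then runs custom handling logic (for example pushing to an MQ or writing straight to a DB, so the data can be analyzed statistically later). The code is as follows: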
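import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import com.netflix.hystrix.HystrixThreadPoolKey;
import com.netflix.hystrix.HystrixThreadPoolMetrics;
import com.netflix.hystrix.HystrixThreadPoolProperties;
import com.netflix.hystrix.strategy.metrics.HystrixMetricsPublisherThreadPool;

static class HystrixMetricsStoredPublisherThreadPool implements HystrixMetricsPublisherThreadPool {
    private final HystrixThreadPoolKey threadPoolKey;
    private final HystrixThreadPoolMetrics metrics;
    private final HystrixThreadPoolProperties properties;

    public HystrixMetricsStoredPublisherThreadPool(
            HystrixThreadPoolKey threadPoolKey,
            HystrixThreadPoolMetrics metrics,
            HystrixThreadPoolProperties properties) {
        // Keep the references passed in by getMetricsPublisherForThreadPool
        this.threadPoolKey = threadPoolKey;
        this.metrics = metrics;
        this.properties = properties;
    }

    /**
     * Runs only once for this thread pool, so we have to schedule our own task
     * to poll the objects that were passed to the constructor.
     */
    @Override
    public void initialize() {
        System.out.println("HystrixMetricsStoredPublisherThreadPool ----------- ");
        ScheduledExecutorService executorService = Executors.newSingleThreadScheduledExecutor();
        // Poll periodically; the 5-second interval is an arbitrary example value
        executorService.scheduleAtFixedRate(
                () -> {
                    try {
                        StringBuilder sb = new StringBuilder("hystrix-metrics ------------------ ");
                        sb.append("\r\n");
                        sb.append("ThreadPool: ").append(threadPoolKey.name()).append("\r\n");
                        sb.append("CurrentQueueSize: ").append(metrics.getCurrentQueueSize()).append("\r\n");
                        sb.append("CurrentCorePoolSize: ").append(metrics.getCurrentCorePoolSize()).append("\r\n");
                        sb.append("CumulativeCountThreadsRejected: ").append(metrics.getCumulativeCountThreadsRejected()).append("\r\n");
                        sb.append("CurrentActiveCount: ").append(metrics.getCurrentActiveCount()).append("\r\n");
                        // Placeholder: push the snapshot to an MQ or write it to a DB for later analysis
                        System.out.println(sb);
                    } catch (RuntimeException e) {
                        // An uncaught exception would cancel all further scheduled runs
                        e.printStackTrace();
                    }
                },
                0, 5, TimeUnit.SECONDS);
    }
}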
Then register it in the application's startup class:
static {
    HystrixPlugins.getInstance().registerMetricsPublisher(new HystrixMetricsStoredPublisher());
}
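One thing the plugin code leaves open is how the sampling thread hands its snapshots to the MQ or DB writer. A minimal sketch of one option is a small bounded buffer, so that a slow writer never stalls the sampler. The class and method names here are hypothetical and not part of Hystrix, and dropping snapshots when the buffer is full is an assumed policy, not something the plugin requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical hand-off buffer between the sampling thread and a slower writer. */
class SnapshotBuffer {
    private final BlockingQueue<String> queue;

    SnapshotBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Called by the sampler; never blocks, returns false if the snapshot was dropped. */
    boolean offer(String snapshot) {
        return queue.offer(snapshot);
    }

    /** Called by the writer; removes and returns up to maxBatch snapshots, oldest first. */
    List<String> drainBatch(int maxBatch) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        return batch;
    }
}
```

The scheduled task would call offer with each formatted snapshot, while a separate writer thread calls drainBatch and persists the batch in one go.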
2. Custom monitoring (recommended)
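Once Hystrix is integrated, the way to view monitoring data in real time is to enter {server-url}/hystrix.stream in the dashboard.
That URL continuously outputs data in the following format: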
data: {"type":"HystrixThreadPool","name":"XXClient","currentTime":1526545816036,"currentActiveCount":0,"currentCompletedTaskCount":1,"currentCorePoolSize":20,"currentLargestPoolSize":1,"currentMaximumPoolSize":20,"currentPoolSize":1,"currentQueueSize":0,"currentTaskCount":1,"rollingCountThreadsExecuted":0,"rollingMaxActiveThreads":0,"rollingCountCommandRejections":0,"propertyValue_queueSizeRejectionThreshold":5,"propertyValue_metricsRollingStatisticalWindowInMilliseconds":10000,"reportingHosts":1}
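Since this URL already outputs exactly the data I want, in a format the dashboard can parse, if I persist data in this same format (asynchronously through an MQ, or synchronously straight to a DB), couldn't I later read it back and replay it on the dashboard?
I tracked down the servlet class behind this URL: HystrixMetricsStreamServlet extends HystrixSampleSseServlet.
The core code is as follows: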
// Called from doGet in HystrixSampleSseServlet
private void handleRequest(HttpServletRequest request, final HttpServletResponse response) throws ServletException, IOException {
    final AtomicBoolean moreDataWillBeSent = new AtomicBoolean(true);
    Subscription sampleSubscription = null;
    /* ensure we aren't allowing more connections than we want */
    int numberConnections = incrementAndGetCurrentConcurrentConnections();
    try {
        int maxNumberConnectionsAllowed = getMaxNumberConcurrentConnectionsAllowed(); //may change at runtime, so look this up for each request
        if (numberConnections > maxNumberConnectionsAllowed) {
            response.sendError(503, "MaxConcurrentConnections reached: " + maxNumberConnectionsAllowed);
        } else {
            /* initialize the response and set some HTTP response headers */
            response.setHeader("Content-Type", "text/event-stream;charset=UTF-8");
            response.setHeader("Cache-Control", "no-cache, no-store, max-age=0, must-revalidate");
            response.setHeader("Pragma", "no-cache");
            final PrintWriter writer = response.getWriter();
            //since the sample stream is based on Observable.interval, events will get published on an RxComputation thread
            //since writing to the servlet response is blocking, use the Rx IO thread for the write that occurs in the onNext
            // subscribe the RxJava way
            sampleSubscription = sampleStream
                    .observeOn(Schedulers.io())
                    .subscribe(new Subscriber<String>() {
                        @Override
                        public void onCompleted() {
                            logger.error("HystrixSampleSseServlet: ({}) received unexpected OnCompleted from sample stream", getClass().getSimpleName());
                            moreDataWillBeSent.set(false);
                        }
                        @Override
                        public void onError(Throwable e) {
                            moreDataWillBeSent.set(false);
                        }
                        /** main output logic */
                        @Override
                        public void onNext(String sampleDataAsString) {
                            if (sampleDataAsString != null) {
                                try {
                                    // by default one sample is written every 500 ms
                                    writer.print("data: " + sampleDataAsString + "\n\n");
                                    // explicitly check for client disconnect - PrintWriter does not throw exceptions
                                    if (writer.checkError()) {
                                        moreDataWillBeSent.set(false);
                                    }
                                    writer.flush();
                                } catch (Exception ex) {
                                    moreDataWillBeSent.set(false);
                                }
                            }
                        }
                    });
            while (moreDataWillBeSent.get() && !isDestroyed) {
                try {
                    Thread.sleep(pausePollerThreadDelayInMs);
                    //in case stream has not started emitting yet, catch any clients which connect/disconnect before emits start
                    writer.print("ping: \n\n");
                    // explicitly check for client disconnect - PrintWriter does not throw exceptions
                    if (writer.checkError()) {
                        moreDataWillBeSent.set(false);
                    }
                    writer.flush();
                } catch (Exception ex) {
                    moreDataWillBeSent.set(false);
                }
            }
        }
    } finally {
        decrementCurrentConcurrentConnections();
        if (sampleSubscription != null && !sampleSubscription.isUnsubscribed()) {
            sampleSubscription.unsubscribe();
        }
    }
}
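What remains is to change this class's HTTP output logic into persistence logic, for example pushing each sample to an MQ or writing it straight to a text file.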
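As a rough sketch of the text-file variant, the body of onNext could delegate to a small sink instead of calling writer.print. The FileSampleSink class below is hypothetical and uses only the JDK; it keeps the same "data: ..." framing the servlet writes, on the assumption that a file in that shape is the easiest thing to replay later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

/** Hypothetical sink: appends each sample to a file using the servlet's "data: ..." framing. */
class FileSampleSink implements Consumer<String> {
    private final Path target;

    FileSampleSink(Path target) {
        this.target = target;
    }

    @Override
    public void accept(String sampleDataAsString) {
        if (sampleDataAsString == null) {
            return; // mirrors the null check in onNext
        }
        String frame = "data: " + sampleDataAsString + "\n\n";
        try {
            // Create the file on first use, then append one frame per sample
            Files.write(target, frame.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Opening the file on every sample is simple but not the most efficient choice; at one sample every 500 ms it is probably acceptable, and a buffered writer or an MQ producer could be swapped in behind the same Consumer interface.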
With that, the historical monitoring data is preserved. The raw material is ready; how to analyze it is up to you.
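As one example of such an analysis, here is a minimal sketch that scans stored HystrixThreadPool samples and reports the peak ratio of rollingMaxActiveThreads to currentCorePoolSize, which hints at whether the core size is over-provisioned or tight. The field names come from the sample output shown earlier; the class name, the regex-based extraction (instead of a real JSON parser) and the choice of this particular ratio are assumptions made for illustration.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical offline analysis over stored HystrixThreadPool samples. */
class PoolUsageReport {
    private static final Pattern MAX_ACTIVE = Pattern.compile("\"rollingMaxActiveThreads\":(\\d+)");
    private static final Pattern CORE_SIZE = Pattern.compile("\"currentCorePoolSize\":(\\d+)");

    /** Returns the highest observed rollingMaxActiveThreads / currentCorePoolSize, or 0 if nothing matched. */
    static double peakUtilization(List<String> storedLines) {
        double peak = 0.0;
        for (String line : storedLines) {
            Matcher active = MAX_ACTIVE.matcher(line);
            Matcher core = CORE_SIZE.matcher(line);
            // Lines without both fields (pings, command samples) are skipped
            if (active.find() && core.find()) {
                int coreSize = Integer.parseInt(core.group(1));
                if (coreSize > 0) {
                    double ratio = (double) Integer.parseInt(active.group(1)) / coreSize;
                    peak = Math.max(peak, ratio);
                }
            }
        }
        return peak;
    }
}
```

A peak that stays far below 1.0 across the busiest hours suggests the pool could be smaller, while a peak near 1.0 together with rejections suggests it is too tight.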