The HystrixRuntimeException error in Hystrix

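We recently introduced Hystrix into our project to handle service degradation, but a HystrixRuntimeException kept showing up intermittently, so I sat down to study the error. Reading the source with a concrete question in mind should go faster.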

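This problem has been discussed in the community for quite a while. The official API documentation, however, describes when the exception is thrown in a single, rather vague sentence: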

RuntimeException that is thrown when a HystrixCommand fails and does not have a fallback.

There is a lengthy discussion about it on GitHub: https://github.com/Netflix/Hystrix/issues/1357

One reply in that thread sums it up:

This means that too many fallbacks were executing concurrently. Since the goal of Hystrix is to protect application threads, and fallbacks may run on the caller thread, Hystrix needs to limit the amount of concurrent fallbacks. Whenever this situation occurs, the fallback is not run, and an exception is returned to the caller.

In general, the purpose of a fallback is to provide substitute work that is less costly than the original work. Here are 2 options for proceeding:

Use a non-network fallback. That will be fast and no fallback concurrency limits will be hit

Wrap the fallback in its own Hystrix command. In this case, you're still getting protection when the original run() fails and the fallback path gets entered. You will still need to defined a fallback for this new command, and it should do little work to avoid problems as well.

In other words:

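Too many fallbacks are executing concurrently. Hystrix's main goal is to protect application threads, and fallbacks may run on the caller's thread, so Hystrix has to cap the number of fallbacks running at the same time. Once that cap is reached, additional fallbacks are not run at all and a HystrixRuntimeException is returned to the caller directly.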
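To make the mechanism concrete, here is a minimal sketch of a semaphore-guarded fallback in plain Java. It is not Hystrix's actual implementation. The class `FallbackLimiter` and the exception `FallbackRejectedException` are hypothetical names standing in for Hystrix's internals and for HystrixRuntimeException. The point is only that the permit is acquired without blocking and that a caller who cannot get one is rejected immediately.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class FallbackLimiter {
    // Hypothetical stand-in for HystrixRuntimeException.
    public static class FallbackRejectedException extends RuntimeException {
        public FallbackRejectedException(String message) {
            super(message);
        }
    }

    private final Semaphore permits;

    public FallbackLimiter(int maxConcurrentFallbacks) {
        this.permits = new Semaphore(maxConcurrentFallbacks);
    }

    // Runs the fallback only if a permit is free; never blocks the caller.
    public <T> T runFallback(Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            throw new FallbackRejectedException("fallback execution rejected");
        }
        try {
            return fallback.get();
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        FallbackLimiter limiter = new FallbackLimiter(1);
        // The outer fallback holds the only permit, so the nested call is
        // rejected, which simulates a second concurrent fallback.
        String result = limiter.runFallback(() -> {
            try {
                String inner = limiter.runFallback(() -> "inner");
                return inner;
            } catch (FallbackRejectedException e) {
                return "rejected: " + e.getMessage();
            }
        });
        System.out.println(result);
    }
}
```

The nested call in `main` is just a single-threaded way to simulate two fallbacks overlapping. In a real service the overlap would come from many request threads entering their fallbacks at once.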

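Another commenter offered a brute-force fix: if the limit is the problem, just set it to an enormous value and see whether it still gets in the way: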

.withFallbackIsolationSemaphoreMaxConcurrentRequests(Integer.MAX_VALUE)
.withExecutionIsolationSemaphoreMaxConcurrentRequests(Integer.MAX_VALUE)
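For context, the two calls above are methods on `HystrixCommandProperties.Setter`. Below is a sketch of where they would sit when building a command's setter. The class name `UnboundedSemaphoreConfig` and the group key `"ExampleGroup"` are made up for illustration, and the fragment needs the Hystrix library on the classpath. As far as I know, the execution semaphore only matters when the command uses semaphore isolation, while the fallback semaphore applies to either isolation strategy.

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;

public class UnboundedSemaphoreConfig {
    // "ExampleGroup" is a hypothetical group key used only for illustration.
    public static HystrixCommand.Setter setter() {
        return HystrixCommand.Setter
                .withGroupKey(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"))
                .andCommandPropertiesDefaults(
                        HystrixCommandProperties.Setter()
                                // Effectively removes the cap on concurrent fallbacks.
                                .withFallbackIsolationSemaphoreMaxConcurrentRequests(Integer.MAX_VALUE)
                                // Only relevant when the command runs under semaphore isolation.
                                .withExecutionIsolationSemaphoreMaxConcurrentRequests(Integer.MAX_VALUE));
    }
}
```

The returned setter would then be passed to the `HystrixCommand` constructor via `super(setter())`.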

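At this point the author of the summary above pushed back: is brute force really a good idea? He offered a couple of suggestions instead: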

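1. Return a stubbed object directly from the fallback, one that does not depend on any other service. Put simply, create a new object, fill in the request parameters, and use default values for everything else.

2. If there is a primary/secondary relationship between data sources, wrap each one in its own HystrixCommand, and only fall back to something trivial as a last resort.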
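The two suggestions can be sketched together as a fallback chain. The example below is plain Java with no Hystrix dependency, so it only shows the shape of the idea. In a real setup each step would be its own HystrixCommand, with the next step invoked from `getFallback()`. The type `PoiInfo` and the methods `stub` and `query` are hypothetical names, not code from our project.

```java
import java.util.function.Function;

public class FallbackChain {
    // Hypothetical response type used only for illustration.
    public static final class PoiInfo {
        public final long poiId;
        public final String name;

        public PoiInfo(long poiId, String name) {
            this.poiId = poiId;
            this.name = name;
        }
    }

    // Suggestion 1: a stub built only from the request parameter plus defaults.
    public static PoiInfo stub(long poiId) {
        return new PoiInfo(poiId, "unknown");
    }

    // Suggestion 2: try the primary source, then the secondary, then the stub.
    public static PoiInfo query(long poiId,
                                Function<Long, PoiInfo> primary,
                                Function<Long, PoiInfo> secondary) {
        try {
            return primary.apply(poiId);
        } catch (RuntimeException primaryFailure) {
            try {
                return secondary.apply(poiId);
            } catch (RuntimeException secondaryFailure) {
                // Last resort: no network call, so nothing here can time out.
                return stub(poiId);
            }
        }
    }

    public static void main(String[] args) {
        Function<Long, PoiInfo> failing = id -> {
            throw new IllegalStateException("service down");
        };
        Function<Long, PoiInfo> secondary = id -> new PoiInfo(id, "from-secondary");

        System.out.println(query(42L, failing, secondary).name);
        System.out.println(query(42L, failing, failing).name);
    }
}
```

The final step deliberately does no I/O. That is what keeps the fallback cheap enough that a concurrency limit on it is unlikely to be hit.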

Our team also hit a HystrixRuntimeException in production. The stack trace is as follows:

RuntimeException	com.netflix.hystrix.exception.HystrixRuntimeException	ERROR	SimpleMessage[message=poiInfoUtil.findPoiByThrift Exception] com.netflix.hystrix.exception.HystrixRuntimeException: CmdBatchQueryPartnerListByPoiId timed-out and fallback failed.
at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:815)
at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:790)
at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:139)
at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:71)
at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:71)
at com.netflix.hystrix.AbstractCommand$DeprecatedOnFallbackHookApplication$1.onError(AbstractCommand.java:1451)
at com.netflix.hystrix.AbstractCommand$FallbackHookApplication$1.onError(AbstractCommand.java:1376)
at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:71)
at rx.observers.Subscribers$5.onError(Subscribers.java:224)
at rx.Observable$ThrowObservable$1.call(Observable.java:10200)
at rx.Observable$ThrowObservable$1.call(Observable.java:10190)
at rx.Observable.unsafeSubscribe(Observable.java:8314)
at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:51)
at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
at rx.Observable$2.call(Observable.java:162)
at rx.Observable$2.call(Observable.java:154)
at rx.Observable$2.call(Observable.java:162)
at rx.Observable$2.call(Observable.java:154)
at rx.Observable$2.call(Observable.java:162)
at rx.Observable$2.call(Observable.java:154)
at rx.Observable$2.call(Observable.java:162)
at rx.Observable$2.call(Observable.java:154)
at rx.Observable$2.call(Observable.java:162)
at rx.Observable$2.call(Observable.java:154)
at rx.Observable$2.call(Observable.java:162)
at rx.Observable$2.call(Observable.java:154)
at rx.Observable$2.call(Observable.java:162)
at rx.Observable$2.call(Observable.java:154)
at rx.Observable$2.call(Observable.java:162)
at rx.Observable$2.call(Observable.java:154)
at rx.Observable.unsafeSubscribe(Observable.java:8314)
at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:141)
at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:71)
at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:71)
at com.netflix.hystrix.AbstractCommand$HystrixObservableTimeoutOperator$1.run(AbstractCommand.java:1121)
at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable$1.call(HystrixContextRunnable.java:41)
at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable$1.call(HystrixContextRunnable.java:37)
at com.meituan.service.hotel.api.command.hystrixConfig.HystrixCommandConfig$CatWrapHystrixConcurrencyStrategy$2.call(HystrixCommandConfig.java:62)
at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable.run(HystrixContextRunnable.java:57)
at com.netflix.hystrix.AbstractCommand$HystrixObservableTimeoutOperator$2.tick(AbstractCommand.java:1138)
at com.netflix.hystrix.util.HystrixTimer$1.run(HystrixTimer.java:99)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
at com.netflix.hystrix.AbstractCommand.handleTimeoutViaFallback(AbstractCommand.java:980)
at com.netflix.hystrix.AbstractCommand.access$500(AbstractCommand.java:59)
at com.netflix.hystrix.AbstractCommand$12.call(AbstractCommand.java:595)
at com.netflix.hystrix.AbstractCommand$12.call(AbstractCommand.java:587)
at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:139)
... 16 more

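The trace shows a timeout, apparently because the Hystrix timeout was set too short. Even so, I would expect the fallback logic to take over rather than the failure simply being thrown back at us. The wording of the message, "timed-out and fallback failed", suggests that the fallback was attempted and itself failed, rather than being skipped, but I have not confirmed that yet. This still needs further investigation, and the developers on GitHub did not give a clear explanation either.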

