[问题分析]HystrixTimeout后请求没有立即中断

1. 问题

背景

SpringCloud框架,没有特殊的实现。即,请求到达Zuul网关后,由Ribbon负载均衡到目标组件节点,由Hystrix转发请求。

关键配置

    hystrix.command.default.execution.isolation.strategy=THREAD
    hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=10000

现象

一次调用中发现,请求过程超过10s后,后台已经打印HystrixTimeoutException且进入了自定义的FallbackProvider,前端仍然没有收到响应,直到请求链路处理完成,前端才返回FallbackProvider中返回的异常响应。

2. 分析

尝试一

看文档-Hystrix官方文档,THREAD隔离模式下是请求超时是会取消调用线程从而立即返回的,SEMAPHORE模式下会等待响应回来再判断是否超时。而上述配置的是所有的Route都默认是THREAD-线程隔离模式,遂认为配置没问题;

尝试二

跟踪源码,RxJava实现的,响应式编程不熟悉,初期调试时没头苍蝇一样到处打断点,看得不明所以(现在也是)。网上的资料大多是翻译上述文档,只知道是HystrixCommand.execute()发送请求,AbstractCommand.handleTimeoutViaFallback()触发FallbackProvider,中间超时如何处理没有说清;

尝试三

开启DEBUG级别日志,用日志跟踪一个请求的全部过程。发现打印的配置是executionIsolationStrategy=SEMAPHORE!!!查阅SpringCloud相关资料,发现用Hystrix+Ribbon的时候,发送请求用的是HystrixCommand的AbstractRibbonCommand实现,而后者的部分配置会覆盖掉HystrixCommandProperties中的配置,其中就有隔离模式这项配置,用的是ZuulProperties中的默认值SEMAPHORE:

AbstractRibbonCommand

    protected AbstractRibbonCommand(Setter setter, LBC client,
                                 RibbonCommandContext context,
                                 ZuulFallbackProvider fallbackProvider, IClientConfig config) {
        //将setter传到HystrixCommand的构造方法中                                 
        super(setter);
        this.client = client;
        this.context = context;
        this.zuulFallbackProvider = fallbackProvider;
        this.config = config;
    }

    //创建Setter
    protected static HystrixCommandProperties.Setter createSetter(IClientConfig config, String commandKey, ZuulProperties zuulProperties) {
        int hystrixTimeout = getHystrixTimeout(config, commandKey);
        return HystrixCommandProperties.Setter().withExecutionIsolationStrategy(
                //executionIsolationStrategy用的是ZuulProperties中的值
                zuulProperties.getRibbonIsolationStrategy()).withExecutionTimeoutInMilliseconds(hystrixTimeout);
    }

ZuulProperties

    //默认是SEMAPHORE
    private ExecutionIsolationStrategy ribbonIsolationStrategy = SEMAPHORE;

HystrixCommandProperties

    //最终setter作为参数builder传入
    protected HystrixCommandProperties(HystrixCommandKey key, HystrixCommandProperties.Setter builder, String propertyPrefix) {
        this.key = key;
        // 省略其它配置
        this.executionIsolationStrategy = getProperty(propertyPrefix, key, "execution.isolation.strategy", builder.getExecutionIsolationStrategy(), default_executionIsolationStrategy);

3. 解决方式

a. 指定commandKey的方式

hystrix.command.aService.execution.isolation.strategy=THREAD

b. 修改Zuul配置的方式,注意useSeparateThreadPools默认为false,此时所有组件共用一个commandKey=RibbinCommand的线程池

zuul.ribbonIsolationStrategy=THREAD指定ribbon的隔离模式
zuul.threadPool.useSeparateThreadPools=true每个commandKey一个线程池

4. Hystrix源码解析

此处是看了芋道源码-Hystrix源码解析之后自己整理的,可能有些难看懂(因为我也没全弄懂) - -

a. 发送请求,进入Zuul过滤器RibbonRoutingFilter,通过工厂类创建AbstractRibbonCommand,调用其execute方法

    protected ClientHttpResponse forward(RibbonCommandContext context) throws Exception {
        Map info = this.helper.debug(context.getMethod(),
                context.getUri(), context.getHeaders(), context.getParams(),
                context.getRequestEntity());
        // 创建AbstractRibbonCommand
        RibbonCommand command = this.ribbonCommandFactory.create(context);
        try {
            // 调用execute方法
            ClientHttpResponse response = command.execute();
            this.helper.appendDebug(info, response.getRawStatusCode(), response.getHeaders());
            return response;
        }
        catch (HystrixRuntimeException ex) {
            return handleException(info, ex);
        }

    }

b. 进入HystrixCommand.execute(),实际是调用Future.get()来立即获取异步方法HystrixCommand.queue()的结果

    public R execute() {
        try {
            //queue方法返回的是Future
            return queue().get();
        } catch (Exception e) {
            throw Exceptions.sneakyThrow(decomposeException(e));
        }
    }

c. 通过AbstractCommand.toObservable()创建一个待订阅的被观察对象(即Observable),创建过程:

  • 没有缓存时进入applyHystrixSemantics方法

                    Observable hystrixObservable =
                            Observable.defer(applyHystrixSemantics)
                                    .map(wrapWithAllOnNextHooks);
    
                    Observable afterCache;
    
                    // put in cache
                    if (requestCacheEnabled && cacheKey != null) {
                        // wrap it for caching
                        HystrixCachedObservable toCache = HystrixCachedObservable.from(hystrixObservable, _cmd);
                        HystrixCommandResponseFromCache fromCache = (HystrixCommandResponseFromCache) requestCache.putIfAbsent(cacheKey, toCache);
                        if (fromCache != null) {
                            // another thread beat us so we'll use the cached value instead
                            toCache.unsubscribe();
                            isResponseFromCache = true;
                            return handleRequestCacheHitAndEmitValues(fromCache, _cmd);
                        } else {
                            // we just created an ObservableCommand so we cast and return it
                            afterCache = toCache.toObservable();
                        }
                    } else {
                        //没有缓存使用applyHystrixSemantics
                        afterCache = hystrixObservable;
                    }
  • 获取到信号量后进入executeCommandAndObserve,在THREAD模式下executionSemaphore的实现是TryableSemaphoreNoOp,其tryAcquire方法始终返回true

                // 获取信号量,THREAD模式下始终为true
                if (executionSemaphore.tryAcquire()) {
                    try {
                        /* used to track userThreadExecutionTime */
                        executionResult = executionResult.setInvocationStartTime(System.currentTimeMillis());
                        return executeCommandAndObserve(_cmd)
                                .doOnError(markExceptionThrown)
                                .doOnTerminate(singleSemaphoreRelease)
                                .doOnUnsubscribe(singleSemaphoreRelease);
                    } catch (RuntimeException e) {
                        return Observable.error(e);
                    }
                } else {
                    return handleSemaphoreRejectionViaFallback();
                }
  • 对Observable进行一些装饰,触发事件、记录状态、异常处理,之后进入executeCommandWithSpecifiedIsolation

            final Func1> handleFallback = new Func1>() {
                @Override
                public Observable call(Throwable t) {
                    circuitBreaker.markNonSuccess();
                    Exception e = getExceptionFromThrowable(t);
                    executionResult = executionResult.setExecutionException(e);
                    if (e instanceof RejectedExecutionException) {
                        return handleThreadPoolRejectionViaFallback(e);
                    } else if (t instanceof HystrixTimeoutException) {
                        //超时异常在此处理
                        return handleTimeoutViaFallback();
                    } else if (t instanceof HystrixBadRequestException) {
                        return handleBadRequestByEmittingError(e);
                    } else {
                        // Treat HystrixBadRequestException from ExecutionHook like a plain HystrixBadRequestException.
                        if (e instanceof HystrixBadRequestException) {
                            eventNotifier.markEvent(HystrixEventType.BAD_REQUEST, commandKey);
                            return Observable.error(e);
                        }
    
                        return handleFailureViaFallback(e);
                    }
                }
            };
        
        // ...
        
        Observable execution;
        if (properties.executionTimeoutEnabled().get()) {
            execution = executeCommandWithSpecifiedIsolation(_cmd)
                    .lift(new HystrixObservableTimeoutOperator(_cmd));
        } else {
            execution = executeCommandWithSpecifiedIsolation(_cmd);
        }
  • executeCommandWithSpecifiedIsolation中对不同的隔离模式进行了不同的处理,主要区别是THREAD模式下对Observable调用了subscribeOn方法,切换到threadPool.getScheduler中的线程执行

            if (properties.executionIsolationStrategy().get() == ExecutionIsolationStrategy.THREAD) {
                // mark that we are executing in a thread (even if we end up being rejected we still were a THREAD execution and not SEMAPHORE)
                return Observable.defer(new Func0>() {
                ...
                }).doOnTerminate(new Action0() {
                ...
                }).doOnUnsubscribe(new Action0() {
                ...
                }).subscribeOn(threadPool.getScheduler(new Func0() {
                    @Override
                    public Boolean call() {
                        //HystrixCommand的状态是否为TIMED_OUT
                        return properties.executionIsolationThreadInterruptOnTimeout().get() && _cmd.isCommandTimedOut.get() == TimedOutStatus.TIMED_OUT;
                    }
                }));
            }
    • threadPool是AbstractCommand初始化时创建的HystrixThreadPool,默认实现是HystrixThreadPoolDefault,getScheduler方法动态更新配置并返回一个HystrixContextScheduler,参数shouldInterruptThread表示超时后是否打断请求执行线程

              @Override
              public Scheduler getScheduler(Func0 shouldInterruptThread) {
                  //动态更新配置
                  touchConfig();
                  return new HystrixContextScheduler(HystrixPlugins.getInstance().getConcurrencyStrategy(), this, shouldInterruptThread);
              }
    • (不知道哪里)调用HystrixContextScheduler.creataeWorker返回HystrixContextSchedulerWorker,继续调用HystrixContextSchedulerWorker.schedule

          public HystrixContextScheduler(HystrixConcurrencyStrategy concurrencyStrategy, HystrixThreadPool threadPool, Func0 shouldInterruptThread) {
              this.concurrencyStrategy = concurrencyStrategy;
              this.threadPool = threadPool;
              // actualScheduler是ThreadPoolScheduler
              this.actualScheduler = new ThreadPoolScheduler(threadPool, shouldInterruptThread);
          }
          
          @Override
          public Worker createWorker() {
              // 创建HystrixContextSchedulerWorker,参数是ThreadPoolWorker
              return new HystrixContextSchedulerWorker(actualScheduler.createWorker());
          }
    • 由实现类ThreadPoolWorker执行schedule方法,包装action,从AbstractCommand的threadPool中获取执行器执行之,将上述参数shouldInterruptThread作为参数构建一个FutureCompleterWithConfigurableInterrupt,作为订阅消息加入到执行任务中,超时后会将

              @Override
              public Subscription schedule(final Action0 action) {
                  if (subscription.isUnsubscribed()) {
                      // don't schedule, we are unsubscribed
                      return Subscriptions.unsubscribed();
                  }
      
                  // This is internal RxJava API but it is too useful.
                  // 包装action
                  ScheduledAction sa = new ScheduledAction(action);
      
                  // 这里不懂这个操作啥意思
                  subscription.add(sa);
                  sa.addParent(subscription);
      
                  // 获取执行器
                  ThreadPoolExecutor executor = (ThreadPoolExecutor) threadPool.getExecutor();
                  // 执行action
                  FutureTask f = (FutureTask) executor.submit(sa);
                  // 加入一个用于中断线程的subscription
                  sa.add(new FutureCompleterWithConfigurableInterrupt(f, shouldInterruptThread, executor));
      
                  return sa;
              }
    • FutureCompleterWithConfigurableInterrupt取消订阅时移除任务,中断请求

              @Override
              public void unsubscribe() {
                  // 移除上述action
                  executor.remove(f);
                  if (shouldInterruptThread.call()) {//结果为true取消future
                      f.cancel(true);
                  } else {
                      f.cancel(false);
                  }
              }
  • 继续上面的lift(new HystrixObservableTimeoutOperator(_cmd)),将_cmd(即HystrixCommand)作为构造参数传入HystrixObservableTimeoutOperator。其中lift方法在订阅时将当前的Observable经过方法参数Operator转换成新的Observable

            public HystrixObservableTimeoutOperator(final AbstractCommand originalCommand) {
                this.originalCommand = originalCommand;
            }
    • 构造一个订阅关系组CompositeSubscription,将观察者(还是不懂到底谁来订阅,不过应该是请求执行线程)放进去,方便后面超时后一起unsubscribe

              @Override
          public Subscriber call(final Subscriber child) {
              final CompositeSubscription s = new CompositeSubscription();
              // if the child unsubscribes we unsubscribe our parent as well
              child.add(s);
    • 构造一个TimerListener,交给HystrixTimer定时任务,在等待了配置的超时时间之后执行,任务内容是取消上述CompositeSubscription中所有的订阅关系,unsubscribe最终会到(还是理不清怎么调到那里的)上述FutureCompleterWithConfigurableInterrupt.unsubscribe方法,中断请求执行线程

                  // 构造定时任务,到点执行tick方法
                  TimerListener listener = new TimerListener() {
      
                  @Override
                  public void tick() {
                      // if we can go from NOT_EXECUTED to TIMED_OUT then we do the timeout codepath
                      // otherwise it means we lost a race and the run() execution completed or did not start
                      if (originalCommand.isCommandTimedOut.compareAndSet(TimedOutStatus.NOT_EXECUTED, TimedOutStatus.TIMED_OUT)) {
                          // report timeout failure
                          originalCommand.eventNotifier.markEvent(HystrixEventType.TIMEOUT, originalCommand.commandKey);
      
                          // shut down the original request
                          // 取消CompositeSubscription中的所有订阅,中断请求执行线程
                          s.unsubscribe();
      
                          final HystrixContextRunnable timeoutRunnable = new HystrixContextRunnable(originalCommand.concurrencyStrategy, hystrixRequestContext, new Runnable() {
      
                              @Override
                              public void run() {
                                  child.onError(new HystrixTimeoutException());
                              }
                          });
      
                          timeoutRunnable.run();
                          //if it did not start, then we need to mark a command start for concurrency metrics, and then issue the timeout
                      }
                  }
      
                  @Override
                  public int getIntervalTimeInMilliseconds() {
                      return originalCommand.properties.executionTimeoutInMilliseconds().get();
                  }
              };
      
              // 交给HystrixTimer执行
              final Reference tl = HystrixTimer.getInstance().addTimerListener(listener);
      
              // set externally so execute/queue can see this
              originalCommand.timeoutTimer.set(tl);
    • HystrixTimer.addTimerListener,intervalTimeInMilliseconds毫秒后执行上述tick方法

      startThreadIfNeeded();
      // add the listener
      
      Runnable r = new Runnable() {
      
          @Override
          public void run() {
              try {
                  listener.tick();
              } catch (Exception e) {
                  logger.error("Failed while ticking TimerListener", e);
              }
          }
      };
      
      ScheduledFuture f = executor.get().getThreadPool().scheduleAtFixedRate(r, listener.getIntervalTimeInMilliseconds(), listener.getIntervalTimeInMilliseconds(), TimeUnit.MILLISECONDS);
      return new TimerReference(listener, f);
    • child包装成parent并返回,parent只在消息状态为COMPELTED时往下传播,parent加入CompositeSubscription中

                  Subscriber parent = new Subscriber() {
      
                  @Override
                  public void onCompleted() {
                      if (isNotTimedOut()) {
                          // stop timer and pass notification through
                          // 
                          tl.clear();
                          child.onCompleted();
                      }
                  }
      
                  @Override
                  public void onError(Throwable e) {
                      if (isNotTimedOut()) {
                          // stop timer and pass notification through
                          tl.clear();
                          child.onError(e);
                      }
                  }
      
                  @Override
                  public void onNext(R v) {
                      if (isNotTimedOut()) {
                          child.onNext(v);
                      }
                  }
      
                  private boolean isNotTimedOut() {
                      // if already marked COMPLETED (by onNext) or succeeds in setting to COMPLETED
                      return originalCommand.isCommandTimedOut.get() == TimedOutStatus.COMPLETED ||
                              originalCommand.isCommandTimedOut.compareAndSet(TimedOutStatus.NOT_EXECUTED, TimedOutStatus.COMPLETED);
                  }
      
              };
              
          // if s is unsubscribed we want to unsubscribe the parent
          // parent加入CompositeSubscription
          s.add(parent);

你可能感兴趣的:(java,springcloud,hystrix)