Sentinel源码(四)(滑动窗口流量统计)

在Sentinel源码(三)slot解析中,我们讲到:
StatisticSlot是专用于实时统计的处理器插槽。
进入这个slot时,我们需要单独统计以下信息:

  1. ClusterNode:资源ID的集群节点的总统计。
  2. OriginNode:来自不同callersorigins的集群节点的统计信息。
  3. DefaultNode:特定上下文中特定资源名称的统计信息。
  4. 所有入口的总和统计。

我们知道了entry方法中下面的代码是线程数和流量统计

// Request passed, add thread count and pass count.
            node.increaseThreadNum();
            node.addPassRequest(count);

本文就介绍一下其算法:滑动窗口流量统计

increaseThreadNum()线程数统计

对于线程数的统计比较简单,通过内部维护一个LongAdder来进行当前线程数的统计,每进入一个请求加1,每释放一个请求减1,从而得到当前的线程数

private LongAdder curThreadNum = new LongAdder();

public void increaseThreadNum() {
        curThreadNum.increment();
    }

为什么使用LongAdder而不是AtomicLong呢?

LongAdder类与AtomicLong类的区别在于高并发时前者将对单一变量的CAS操作分散为对数组cells中多个元素的CAS操作,取值时进行求和;而在并发较低时仅对base变量进行CAS操作,与AtomicLong类原理相同。所以LongAdder的性能是高于AtomicLong的。

addPassRequest 流量统计

首先DefaultNode是StatisticNode的子类,DefaultNode是直接调用
StatisticNode的addPassRequest方法,同时DefaultNode的clusterNode也会进行流量统计

public void addPassRequest(int count) {
        super.addPassRequest(count);
        this.clusterNode.addPassRequest(count);
    }

我们先分析StatisticNode的addPassRequest方法

@Override
    public void addPassRequest(int count) {
        rollingCounterInSecond.addPass(count);
        rollingCounterInMinute.addPass(count);
    }

可以看到,StatisticNode中维护了两个窗口,具体如下:


   //秒级窗口,保存最近 INTERVAL 秒的统计信息。 INTERVAL 按给定的 sampleCount 划分为时间跨度。
    private transient volatile Metric rollingCounterInSecond = new ArrayMetric(SampleCountProperty.SAMPLE_COUNT,
        IntervalProperty.INTERVAL);

    //分钟级窗口,保存最近 60 秒的统计信息。 windowLengthInMs 特意设置为 1000 毫秒,意思是每桶每秒,这样我们就可以得到每一秒的准确统计。
    private transient Metric rollingCounterInMinute = new ArrayMetric(60, 60 * 1000, false);

这两个接口都会调用ArrayMetric的addPass方法进行流量统计

ArrayMetric

看一下ArrayMetric的结构

private final LeapArray<MetricBucket> data;

    public ArrayMetric(int sampleCount, int intervalInMs) {
        this.data = new OccupiableBucketLeapArray(sampleCount, intervalInMs);
    }

    public ArrayMetric(int sampleCount, int intervalInMs, boolean enableOccupy) {
        if (enableOccupy) {
            this.data = new OccupiableBucketLeapArray(sampleCount, intervalInMs);
        } else {
            this.data = new BucketLeapArray(sampleCount, intervalInMs);
        }
    }

可以看到ArrayMetric其实也是一个包装类,内部通过实例化LeapArray的对应实现类,来实现具体的统计逻辑,LeapArray是一个抽象类,OccupiableBucketLeapArray和BucketLeapArray都是其具体的实现类

OccupiableBucketLeapArray在1.5版本之后才被引入,主要是为了解决一些高优先级的请求在限流触发的时候也能通过(通过占用未来时间窗口的名额来实现) 也是默认使用的LeapArray实现类

如何流量统计?

public void addPass(int count) {
		//获取当前窗口
        WindowWrap<MetricBucket> wrap = data.currentWindow();
        //窗口流量加count
        wrap.value().addPass(count);
    }
  1. 定位到当前窗口
  2. 获取到当前窗口WindowWrap的MetricBucket并执行addPass逻辑

这里我们先看下MetricBucket类,看看它做了哪些事情

	//存放当前窗口各种类型的统计值(类型包括 PASS BLOCK EXCEPTION 等)
	private final LongAdder[] counters;

    private volatile long minRt;

    public MetricBucket() {
        MetricEvent[] events = MetricEvent.values();
        this.counters = new LongAdder[events.length];
        for (MetricEvent event : events) {
            counters[event.ordinal()] = new LongAdder();
        }
        initMinRt();
    }
	//重置计数
    public MetricBucket reset(MetricBucket bucket) {
        for (MetricEvent event : MetricEvent.values()) {
            counters[event.ordinal()].reset();
            counters[event.ordinal()].add(bucket.get(event));
        }
        initMinRt();
        return this;
    }

    private void initMinRt() {
        this.minRt = SentinelConfig.statisticMaxRt();
    }

    /**
     * Reset the adders.
     *
     * @return new metric bucket in initial state
     */
    public MetricBucket reset() {
        for (MetricEvent event : MetricEvent.values()) {
            counters[event.ordinal()].reset();
        }
        initMinRt();
        return this;
    }

    public long get(MetricEvent event) {
        return counters[event.ordinal()].sum();
    }
	
    public MetricBucket add(MetricEvent event, long n) {
        counters[event.ordinal()].add(n);
        return this;
    }
    // 统计pass数
    public void addSuccess(int n) {
        add(MetricEvent.SUCCESS, n);
    }

MetricBucket通过定义了一个LongAdder数组来存储不同类型的流量统计值,具体的类型则都定义在MetricEvent枚举中。

public enum MetricEvent {

    /**
     * Normal pass.
     */
    PASS,
    /**
     * Normal block.
     */
    BLOCK,
    EXCEPTION,
    SUCCESS,
    RT,

    /**
     * Passed in future quota (pre-occupied, since 1.5.0).
     * 优先级高的请求允许占用未来窗口的流量
     */
    OCCUPIED_PASS
}

执行addPass方法获得对应LongAdder数组索引下标,将值递增

如果获得当前窗口?

下面再来看下data.currentWindow()的内部逻辑

public WindowWrap<T> currentWindow(long timeMillis) {
        if (timeMillis < 0) {
            return null;
        }

        int idx = calculateTimeIdx(timeMillis);
        // Calculate current bucket start time.
        long windowStart = calculateWindowStart(timeMillis);

        /*
         * Get bucket item at given time from the array.
         *
         * (1) Bucket is absent, then just create a new bucket and CAS update to circular array.
         * (2) Bucket is up-to-date, then just return the bucket.
         * (3) Bucket is deprecated, then reset current bucket and clean all deprecated buckets.
         */
        while (true) {
            WindowWrap<T> old = array.get(idx);
            if (old == null) {
                /*
                 *     B0       B1      B2    NULL      B4
                 * ||_______|_______|_______|_______|_______||___
                 * 200     400     600     800     1000    1200  timestamp
                 *                             ^
                 *                          time=888
                 *            bucket is empty, so create new and update
                 *
                 * If the old bucket is absent, then we create a new bucket at {@code windowStart},
                 * then try to update circular array via a CAS operation. Only one thread can
                 * succeed to update, while other threads yield its time slice.
                 */
                WindowWrap<T> window = new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
                if (array.compareAndSet(idx, null, window)) {
                    // Successfully updated, return the created bucket.
                    return window;
                } else {
                    // Contention failed, the thread will yield its time slice to wait for bucket available.
                    Thread.yield();
                }
            } else if (windowStart == old.windowStart()) {
                /*
                 *     B0       B1      B2     B3      B4
                 * ||_______|_______|_______|_______|_______||___
                 * 200     400     600     800     1000    1200  timestamp
                 *                             ^
                 *                          time=888
                 *            startTime of Bucket 3: 800, so it's up-to-date
                 *
                 * If current {@code windowStart} is equal to the start timestamp of old bucket,
                 * that means the time is within the bucket, so directly return the bucket.
                 */
                return old;
            } else if (windowStart > old.windowStart()) {
                /*
                 *   (old)
                 *             B0       B1      B2    NULL      B4
                 * |_______||_______|_______|_______|_______|_______||___
                 * ...    1200     1400    1600    1800    2000    2200  timestamp
                 *                              ^
                 *                           time=1676
                 *          startTime of Bucket 2: 400, deprecated, should be reset
                 *
                 * If the start timestamp of old bucket is behind provided time, that means
                 * the bucket is deprecated. We have to reset the bucket to current {@code windowStart}.
                 * Note that the reset and clean-up operations are hard to be atomic,
                 * so we need a update lock to guarantee the correctness of bucket update.
                 *
                 * The update lock is conditional (tiny scope) and will take effect only when
                 * bucket is deprecated, so in most cases it won't lead to performance loss.
                 */
                if (updateLock.tryLock()) {
                    try {
                        // Successfully get the update lock, now we reset the bucket.
                        return resetWindowTo(old, windowStart);
                    } finally {
                        updateLock.unlock();
                    }
                } else {
                    // Contention failed, the thread will yield its time slice to wait for bucket available.
                    Thread.yield();
                }
            } else if (windowStart < old.windowStart()) {
                // Should not go through here, as the provided time is already behind.
                return new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
            }
        }
    }

这个方法的注释已经写的很好了:

1.计算当前窗口下标

private int calculateTimeIdx(/*@Valid*/ long timeMillis) {
        long timeId = timeMillis / windowLengthInMs;
        // Calculate current index so we can map the timestamp to the leap array.
        return (int)(timeId % array.length());
    }


protected final AtomicReferenceArray<WindowWrap<T>> array;

方法不难,当前时间戳/窗口长度可以知道过了几个窗口了即当前是第几个窗口,对array取模即可得到该窗口在array中的下标

2.获得当前窗口开始时间

protected long calculateWindowStart(/*@Valid*/ long timeMillis) {
        return timeMillis - timeMillis % windowLengthInMs;
    }

这个方法巧妙获得当前时间戳所在窗口的开始时间

3.调整窗口
先看看LeapArray的成员变量

public LeapArray(int sampleCount, int intervalInMs) {
        AssertUtil.isTrue(sampleCount > 0, "bucket count is invalid: " + sampleCount);
        AssertUtil.isTrue(intervalInMs > 0, "total time interval of the sliding window should be positive");
        AssertUtil.isTrue(intervalInMs % sampleCount == 0, "time span needs to be evenly divided");

        this.windowLengthInMs = intervalInMs / sampleCount;
        this.intervalInMs = intervalInMs;
        this.sampleCount = sampleCount;

        this.array = new AtomicReferenceArray<>(sampleCount);
    }

其中:

  1. windowLengthInMs = intervalInMs / sampleCount;代表窗口长度
  2. intervalInMs = intervalInMs 统计周期
  3. sampleCount窗口个数
  4. array也是窗口个数

接下来是窗口调整的三种情况

(1)Bucket(窗口)不存在还未创建,则只需创建一个新的 Bucket 并 CAS 更新为循环数组。

WindowWrap<T> old = array.get(idx);
            if (old == null) {
                /*
                 *     B0       B1      B2    NULL      B4
                 * ||_______|_______|_______|_______|_______||___
                 * 200     400     600     800     1000    1200  timestamp
                 *                             ^
                 *                          time=888
                 *            bucket is empty, so create new and update
                 *
                 * If the old bucket is absent, then we create a new bucket at {@code windowStart},
                 * then try to update circular array via a CAS operation. Only one thread can
                 * succeed to update, while other threads yield its time slice.
                 */
                WindowWrap<T> window = new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
                if (array.compareAndSet(idx, null, window)) {
                    // Successfully updated, return the created bucket.
                    return window;
                } else {
                    // Contention failed, the thread will yield its time slice to wait for bucket available.
                    Thread.yield();
                }

注释中的例子容易误导,把每个窗口的开始时间修改为下图:其中窗口长度位200,array的长度为5

Sentinel源码(四)(滑动窗口流量统计)_第1张图片
那么当时间戳为888时,对应的应该是idx为4的窗口,如果array.get(idx)为null,代表该窗口没有放入WindowWrap,则new一个WindowWrap并将array对应下标CAS更新为新的window

(2)如果当前windowStart旧桶的起始时间戳,则表示时间在桶内,直接返回桶。

else if (windowStart == old.windowStart()) {
                /*
                 *     B0       B1      B2     B3      B4
                 * ||_______|_______|_______|_______|_______||___
                 * 200     400     600     800     1000    1200  timestamp
                 *                             ^
                 *                          time=888
                 *            startTime of Bucket 3: 800, so it's up-to-date
                 *
                 * If current {@code windowStart} is equal to the start timestamp of old bucket,
                 * that means the time is within the bucket, so directly return the bucket.
                 */

仍然以上图为例,当时间戳为888时,发现窗口存在,且开始时间也是800,那就直接返回该窗口就可以了
(3)如果旧存储桶的开始时间戳落后于提供的时间,则意味着该存储桶已被弃用。我们必须将存储桶重置为当前的 windowStart。
注意重置和清理操作很难是原子的,所以我们需要一个更新锁来保证桶更新的正确性。更新锁是有条件的(范围很小),只有在桶被弃用时才会生效,所以在大多数情况下不会导致性能损失。

else if (windowStart > old.windowStart()) {
                /*
                 *   (old)
                 *             B0       B1      B2    NULL      B4
                 * |_______||_______|_______|_______|_______|_______||___
                 * ...    1200     1400    1600    1800    2000    2200  timestamp
                 *                              ^
                 *                           time=1676
                 *          startTime of Bucket 2: 400, deprecated, should be reset
                 *
                 * If the start timestamp of old bucket is behind provided time, that means
                 * the bucket is deprecated. We have to reset the bucket to current {@code windowStart}.
                 * Note that the reset and clean-up operations are hard to be atomic,
                 * so we need a update lock to guarantee the correctness of bucket update.
                 *
                 * The update lock is conditional (tiny scope) and will take effect only when
                 * bucket is deprecated, so in most cases it won't lead to performance loss.
                 */
                if (updateLock.tryLock()) {
                    try {
                        // Successfully get the update lock, now we reset the bucket.
                        return resetWindowTo(old, windowStart);
                    } finally {
                        updateLock.unlock();
                    }
                } else {
                    // Contention failed, the thread will yield its time slice to wait for bucket available.
                    Thread.yield();
                }

仍然以笔者提供的图为主
Sentinel源码(四)(滑动窗口流量统计)_第2张图片

当时间戳来到1001,测试下标计算为0,我们上图0-200的窗口,开始时间是0,而我们计算的开始时间是1000>0,代表idx为0的窗口已经过期了,那么就更新idx为0的窗口开始时间为1000

注意:从array角度来看:新旧窗口都在array[0]的位置,是所在内存的第一个窗口,但从时间线(或者统计周期)来看,idx0为于当前统计周期的最后一个窗口,为了方便理解,窗口的一个完整统计周期可以理解为当前窗口往前数sampleCount个窗口个数,这里一定好好理解,接下来的占用未来窗口的做法就需要我们从时间线角度来思考

(4)最后一种情况则一般不会出现的,计算出的开始时间除非是服务器本身时针回拨了,否则不会比已经记录过的开始时间还小

} else if (windowStart < old.windowStart()) {
                // Should not go through here, as the provided time is already behind.
                return new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
            }

至此,我们得到了当前时间所在的窗口,最后一步就是把当前窗口的
MetricEvent.PASS加一即可

public void addPass(int n) {
        add(MetricEvent.PASS, n);
    }

占用未来窗口的流量

在DefaultController中,如果存在优先级的资源申请,不能因为当前的限制,将有优先级请求直接否定,这边如何解决的?

虽然当前状况下资源已近消耗完,但是可以占用将来的一个令牌,并且node通过addOccupiedPass方法增加资源数量,然后sleep一段时间,时间到后,抛出PriortyWaite异常,该异常不会被记录异常数值中。在StatisticSlot中体现,

下面是canPass的部分代码,关于限流控制器我会单独再写一篇文章

if (prioritized && grade == RuleConstant.FLOW_GRADE_QPS) {
                long currentTime;
                long waitInMs;
                currentTime = TimeUtil.currentTimeMillis();
                waitInMs = node.tryOccupyNext(currentTime, acquireCount, count);
                if (waitInMs < OccupyTimeoutProperty.getOccupyTimeout()) {
                    node.addWaitingRequest(currentTime + waitInMs, acquireCount);
                    node.addOccupiedPass(acquireCount);
                    sleep(waitInMs);

                    // PriorityWaitException indicates that the request will pass after waiting for {@link @waitInMs}.
                    throw new PriorityWaitException(waitInMs);
                }
            }

我们看一下tryOccupyNext方法

public long tryOccupyNext(long currentTime, int acquireCount, double threshold) {
        double maxCount = threshold * IntervalProperty.INTERVAL / 1000;
        long currentBorrow = rollingCounterInSecond.waiting();
        if (currentBorrow >= maxCount) {
            return OccupyTimeoutProperty.getOccupyTimeout();
        }

        int windowLength = IntervalProperty.INTERVAL / SampleCountProperty.SAMPLE_COUNT;
        long earliestTime = currentTime - currentTime % windowLength + windowLength - IntervalProperty.INTERVAL;

        int idx = 0;
        /*
         * Note: here {@code currentPass} may be less than it really is NOW, because time difference
         * since call rollingCounterInSecond.pass(). So in high concurrency, the following code may
         * lead more tokens be borrowed.
         */
        long currentPass = rollingCounterInSecond.pass();
        while (earliestTime < currentTime) {
            long waitInMs = idx * windowLength + windowLength - currentTime % windowLength;
            if (waitInMs >= OccupyTimeoutProperty.getOccupyTimeout()) {
                break;
            }
            long windowPass = rollingCounterInSecond.getWindowPass(earliestTime);
            if (currentPass + currentBorrow + acquireCount - windowPass <= maxCount) {
                return waitInMs;
            }
            earliestTime += windowLength;
            currentPass -= windowPass;
            idx++;
        }

        return OccupyTimeoutProperty.getOccupyTimeout();
    }

currentBorrow是当前统计周期所有被借用的请求总数,会单独放在
borrowArray中,统计方式与上文array一致,位于默认实现类OccupiableBucketLeapArray中

public long currentWaiting() {
        borrowArray.currentWindow();
        long currentWaiting = 0;
        List<MetricBucket> list = borrowArray.values();

        for (MetricBucket window : list) {
            currentWaiting += window.pass();
        }
        return currentWaiting;
    }

(1)如果借用的总数已经超过限制,就返回occupyTimeout,默认500ms

OccupyTimeoutProperty.getOccupyTimeout()
private static volatile int occupyTimeout = 500;

(2)否则计算出当前统计周期最近的可用窗口开始时间
long earliestTime = currentTime - currentTime % windowLength + windowLength - IntervalProperty.INTERVAL;
可以看到earliestTime是当前时间所在窗口+一个窗口长度-统计周期
仍然以之前的图来看
Sentinel源码(四)(滑动窗口流量统计)_第3张图片
比如当前时间戳为1010,所在窗口开始时间为1000,加一个窗口时间为1200,减去周期1000为200,而 200至1200恰好为一个统计周期。上文讲到:为了方便理解,窗口的一个完整统计周期可以理解为当前窗口往前数sampleCount个窗口个数,
那么earliestTime代表当前统计周期中的第一个窗

(3)接下来获得当前统计周期已经通过的请求,先行记录下来,下文会与currentBorrow一起参与统计

public long pass() {
	        data.currentWindow();
	        long pass = 0;
	        List<MetricBucket> list = data.values();
	
	        for (MetricBucket window : list) {
	            pass += window.pass();
	        }
	        return pass;
	    }

(4)从earliestTime开始到当前窗口,判断整个周期内哪个窗口还有富裕的指标可以借用,但是计算的时间仍然不能超过OccupyTimeoutProperty.getOccupyTimeout()规定的最大占用时间

while (earliestTime < currentTime) {
            long waitInMs = idx * windowLength + windowLength - currentTime % windowLength;
            if (waitInMs >= OccupyTimeoutProperty.getOccupyTimeout()) {
                break;
            }
            long windowPass = rollingCounterInSecond.getWindowPass(earliestTime);
            if (currentPass + currentBorrow + acquireCount - windowPass <= maxCount) {
                return waitInMs;
            }
            earliestTime += windowLength;
            currentPass -= windowPass;
            idx++;
        }

waitInMs的意义就是当前时间到下一(第二,第三…)个窗口还需要多久,也就好要等着么长时间,当前请求才会被放行并记录再选中的窗口中

判断可以占用的条件是当前(周期通过请求+当前周期已经被占用的请求+本次请求-被选中的窗口已经通过的请求)

注意:周期通过请是包含被选中窗口的,所以需要减去被选中的窗口已经通过的请

至此获得了当前时间到借用窗口开始时间还需要等待的时间
只需要线程sleep(waitInMs);等待waitInMs后抛出PriorityWaitException

该异常会在StatisticSlot中的catch块被捕捉

catch (PriorityWaitException ex) {
            node.increaseThreadNum();
            if (context.getCurEntry().getOriginNode() != null) {
                // Add count for origin node.
                context.getCurEntry().getOriginNode().increaseThreadNum();
            }

            if (resourceWrapper.getEntryType() == EntryType.IN) {
                // Add count for global inbound entry node for global statistics.
                Constants.ENTRY_NODE.increaseThreadNum();
            }
            // Handle pass event with registered entry callback handlers.
            for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
                handler.onPass(context, resourceWrapper, node, count, args);
            }

该处内容已经在Sentinel源码(三)slot解析中的StatisticSlot部分讲过,没看过的同许可以回去看一下加深理解

你可能感兴趣的:(源码,springClound,微服务,java,sentinel,限流)