Exploring Zipkin's Sampling Source Code

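Zipkin is a distributed tracing service that helps us trace and analyze call latency across multiple services; see the official site at https://zipkin.io/ for more. This article walks through the source code to examine how Zipkin samples traces.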

In the Zipkin client (Brave), the sampling rate is controlled entirely by the Sampler class, shown below.

package com.github.kristofa.brave;

public abstract class Sampler {

  public static final Sampler ALWAYS_SAMPLE = new Sampler() {
    @Override public boolean isSampled(long traceId) {
      return true;
    }

    @Override public String toString() {
      return "AlwaysSample";
    }
  };

  public static final Sampler NEVER_SAMPLE = new Sampler() {
    @Override public boolean isSampled(long traceId) {
      return false;
    }

    @Override public String toString() {
      return "NeverSample";
    }
  };


  public abstract boolean isSampled(long traceId);

  public static Sampler create(float rate) {
    return CountingSampler.create(rate);
  }
}

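Sampler also has two subclasses, BoundarySampler and CountingSampler. According to Zipkin's documentation, BoundarySampler is meant for high-traffic instrumentation and CountingSampler for low-traffic instrumentation. The rest of this article looks at how the two differ.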

When creating a Brave instance, we specify the sampling rate and the sampler implementation, as follows.

@Bean
public Brave brave(SpanCollector spanCollector) {
    Brave.Builder builder = new Brave.Builder(srvId); // set the serviceName
    builder.spanCollector(spanCollector);
    builder.traceSampler(Sampler.create(1)); // sampling rate
    return builder.build();
}

The sampling rate is set through builder.traceSampler. It can of course also be set to

builder.traceSampler(CountingSampler.create(1)); // sampling rate

or

builder.traceSampler(BoundarySampler.create(1)); // sampling rate

CountingSampler

CountingSampler extends Sampler, provides its own static create factory method, and implements isSampled. CountingSampler.create() is implemented as follows.

public static Sampler create(final float rate) {
    if (rate == 0) return NEVER_SAMPLE;
    if (rate == 1.0) return ALWAYS_SAMPLE;
    checkArgument(rate >= 0.01f && rate < 1, "rate should be between 0.01 and 1: was %s", rate);
    return new CountingSampler(rate);
  }

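This is straightforward: it first handles the boundary values 0 and 1, then validates the range, and finally constructs a CountingSampler for the given rate. The main logic of the CountingSampler constructor is a call to the randomBitSet function, shown below.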

static BitSet randomBitSet(int size, int cardinality, Random rnd) {
    BitSet result = new BitSet(size);
    int[] chosen = new int[cardinality];
    int i;
    for (i = 0; i < cardinality; ++i) {
      chosen[i] = i;
      result.set(i);
    }
    for (; i < size; ++i) {
      int j = rnd.nextInt(i + 1);
      if (j < cardinality) {
        result.clear(chosen[j]);
        result.set(i);
        chosen[j] = i;
      }
    }
    return result;
  }

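For more on BitSet, see the java.util.BitSet documentation. The BitSet returned here records the indices whose value is true; conceptually, the data looks like this: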

{
    "3": true,
    "23": true,
    "56": true,
    "78": true,
    "89": true,
    "90": true
}
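
The selection loop in randomBitSet follows the classic reservoir sampling pattern: the first `cardinality` positions are chosen, and each later position replaces a random earlier choice with decreasing probability. The sketch below copies the function from the listing above into a small runnable class to show that the number of set bits always equals `cardinality`. The class name RandomBitSetDemo, the seed, and the 25-of-100 figures are assumptions made for illustration, not part of Brave.

```java
import java.util.BitSet;
import java.util.Random;

final class RandomBitSetDemo {

  // Same logic as the listing above: marks exactly `cardinality` of `size` positions.
  static BitSet randomBitSet(int size, int cardinality, Random rnd) {
    BitSet result = new BitSet(size);
    int[] chosen = new int[cardinality];
    int i;
    // Start by choosing the first `cardinality` positions.
    for (i = 0; i < cardinality; ++i) {
      chosen[i] = i;
      result.set(i);
    }
    // Each later position may replace one earlier choice, so the count never changes.
    for (; i < size; ++i) {
      int j = rnd.nextInt(i + 1);
      if (j < cardinality) {
        result.clear(chosen[j]);
        result.set(i);
        chosen[j] = i;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Arbitrary fixed seed so that repeated runs pick the same positions.
    BitSet decisions = randomBitSet(100, 25, new Random(42L));
    System.out.println("set bits: " + decisions.cardinality()); // prints "set bits: 25"
    System.out.println("indices:  " + decisions);
  }
}
```

Because every replacement clears one chosen index and sets one new index, the count of set bits stays fixed no matter which random values are drawn; only the positions vary.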

So how does the sampler actually use this BitSet? The answer is in its isSampled implementation, shown below.

@Override
  public synchronized boolean isSampled(long traceIdIgnored) {
    boolean result = sampleDecisions.get(i++);
    if (i == 100) i = 0;
    return result;
  }

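Here sampleDecisions is the BitSet, declared as a field of CountingSampler. The isSampled method is declared synchronized, which shows that it is meant to be thread-safe. The method is essentially a counter: the index i runs from 0 to 99, advances by one on every call, and wraps back to 0 once it reaches 100. Each call returns whether the bit at the current index is set. The call site is in ClientTracer, as follows.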

SpanId newSpanId = getNewSpanId();
if (sample == null) {
    // No sample indication is present.
    if (!traceSampler().isSampled(newSpanId.traceId)) {
        spanAndEndpoint().state().setCurrentClientSpan(null);
        return null;
    }
}
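
To make the counting behaviour described above concrete, here is a minimal sketch of a sampler that cycles through 100 precomputed decisions. It is not Brave's CountingSampler: the class name CountingSketch, the integer `percent` constructor argument, the simplified slot selection, and the countSampled helper are all assumptions made for illustration. Only the isSampled body mirrors the listing shown earlier.

```java
import java.util.BitSet;
import java.util.Random;

final class CountingSketch {
  private final BitSet sampleDecisions = new BitSet(100);
  private int i; // next slot to read, always in 0..99

  // Hypothetical constructor: marks `percent` distinct random slots out of 100.
  CountingSketch(int percent, Random rnd) {
    if (percent < 0 || percent > 100) {
      throw new IllegalArgumentException("percent must be in 0..100: " + percent);
    }
    // Simplified selection (not randomBitSet): draw until enough distinct slots are set.
    while (sampleDecisions.cardinality() < percent) {
      sampleDecisions.set(rnd.nextInt(100));
    }
  }

  // Same shape as the isSampled listing: the trace ID is ignored, only the call count matters.
  synchronized boolean isSampled(long traceIdIgnored) {
    boolean result = sampleDecisions.get(i++);
    if (i == 100) i = 0;
    return result;
  }

  // Hypothetical helper: counts how many of `calls` consecutive calls were sampled.
  static int countSampled(CountingSketch sampler, int calls) {
    int sampled = 0;
    for (int n = 0; n < calls; n++) {
      if (sampler.isSampled(n)) sampled++;
    }
    return sampled;
  }

  public static void main(String[] args) {
    CountingSketch sampler = new CountingSketch(10, new Random(1L));
    System.out.println(countSampled(sampler, 100));  // one full cycle, prints 10
    System.out.println(countSampled(sampler, 1000)); // ten full cycles, prints 100
  }
}
```

Every full cycle of 100 calls reads each slot exactly once, so the number of sampled calls per cycle is exact regardless of the trace IDs passed in. The cost is the shared counter, which is why the method has to be synchronized.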

BoundarySampler

BoundarySampler extends Sampler, provides its own static create factory method, and implements isSampled. BoundarySampler.create() is implemented as follows.

  public static Sampler create(float rate) {
    if (rate == 0) return Sampler.NEVER_SAMPLE;
    if (rate == 1.0) return ALWAYS_SAMPLE;
    checkArgument(rate > 0.0001 && rate < 1, "rate should be between 0.0001 and 1: was %s", rate);
    final long boundary = (long) (rate * 10000); // safe cast as less <= 1
    return new BoundarySampler(boundary);
  }

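This is even simpler than CountingSampler. It does not store decisions in a BitSet; instead, isSampled takes the salted trace ID modulo 10000 and compares the result against the boundary, as follows.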

  @Override
  public boolean isSampled(long traceId) {
    long t = Math.abs(traceId ^ SALT);
    return t % 10000 <= boundary;
  }

isSampled is called in the same way as for CountingSampler.
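
The boundary comparison can be tried out in isolation with the sketch below, which feeds random trace IDs through the same arithmetic and measures the fraction that is sampled. The class name BoundarySketch, the observedRate helper, and the seeds are assumptions made for illustration. The article does not show how SALT is defined, so the salt is passed in as a constructor argument here.

```java
import java.util.Random;

final class BoundarySketch {
  private final long salt;     // stands in for SALT, whose definition is not shown above
  private final long boundary; // rate scaled to the 0..10000 range

  BoundarySketch(float rate, long salt) {
    this.boundary = (long) (rate * 10000);
    this.salt = salt;
  }

  // Same arithmetic as the isSampled listing: the decision depends only on the trace ID.
  boolean isSampled(long traceId) {
    long t = Math.abs(traceId ^ salt);
    return t % 10000 <= boundary;
  }

  // Hypothetical helper: fraction of `traces` random trace IDs that were sampled.
  static double observedRate(BoundarySketch sampler, int traces, Random ids) {
    int sampled = 0;
    for (int n = 0; n < traces; n++) {
      if (sampler.isSampled(ids.nextLong())) sampled++;
    }
    return (double) sampled / traces;
  }

  public static void main(String[] args) {
    BoundarySketch sampler = new BoundarySketch(0.1f, 12345L);
    // Close to 0.1 but usually not exactly 0.1, unlike a counting approach.
    System.out.println(observedRate(sampler, 200_000, new Random(3L)));
  }
}
```

Two details follow from the arithmetic. The same trace ID always gets the same decision, and no shared counter or lock is needed. With `<=`, a boundary of 1000 accepts the 1001 remainders 0 through 1000, so the expected rate sits slightly above the configured one, on top of the random fluctuation.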

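Personal summary: comparing the two implementations, BoundarySampler can handle heavy client traffic, but its effective sampling rate is not exact. It fluctuates because each decision depends on the value of the trace ID rather than on a count of calls; under heavy traffic this small deviation is negligible. CountingSampler is meant for lower traffic, but it is very precise, since it cycles through a fixed set of 100 precomputed decisions. Personally, I would still recommend BoundarySampler, in case traffic suddenly surges one day.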

The end.
