JD hotkey Source Code Analysis

Preface

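JD hotkey is a hot-key defense middleware that has been validated in JD's major sales promotions. Roughly, clients report key access counts to a cluster of statistics servers (workers); when the cluster detects a hot key it notifies the clients so they can cache that key in local memory, achieving millisecond-level scale-out. The approach resembles how Meituan's CAT collects data in real time for statistics, except that CAT has no reverse-notification logic. It is very close to day-to-day engineering practice and well worth reading.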


hotkey architecture diagram

Client side

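Start with the cache entry point, the get method of Cache. JdHotKeyStore.getValue fetches the hot key's value and also counts and reports the access. If it returns a non-null value, that value is returned directly. Otherwise the value is read from Redis, JdHotKeyStore.smartSet is called to check whether the key is hot and, if so, store the value locally, and finally the value is returned.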

    public String get(String key) {
        Object object = JdHotKeyStore.getValue(key);
        //Already cached locally
        if (object != null) {
            return object.toString();
        } else {
            String value = getFromRedis(key);
            JdHotKeyStore.smartSet(key, value);
            return value;
        }
    }

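JdHotKeyStore.getValue first calls inRule to check whether a rule exists for the key; if not, it returns null without further processing. It then calls getValueSimple to fetch the hot key's storage object, ValueModel, from local memory. If nothing is found, it calls HotKeyPusher.push to start counting the key. If a ValueModel is found, it calls isNearExpire to check whether the entry is about to expire and, if so, also pushes the key for counting; it then reads the value from the ValueModel and returns it only if a real value has been set (the placeholder Constant.MAGIC_NUMBER is treated as null). Finally it calls KeyHandlerFactory.getCounter().collect to update the counters for the matching rule. The rest of this section walks through the flow step by step.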

    public static Object getValue(String key) {
        return getValue(key, null);
    }

    /**
     * Gets the value; if the value does not exist, the key is reported over netty
     */
    public static Object getValue(String key, KeyType keyType) {
        try {
            //If no rule is configured for this key, there is no need to report it
            if (!inRule(key)) {
                return null;
            }
            Object userValue = null;

            ValueModel value = getValueSimple(key);

            if (value == null) {
                HotKeyPusher.push(key, keyType);
            } else {
                //Close to expiry, so report it as well
                if (isNearExpire(value)) {
                    HotKeyPusher.push(key, keyType);
                }
                Object object = value.getValue();
                //If it is the default placeholder value, return null as well
                if (object instanceof Integer && Constant.MAGIC_NUMBER == (int) object) {
                    userValue = null;
                } else {
                    userValue = object;
                }
            }

            //Update the hit statistics
            KeyHandlerFactory.getCounter().collect(new KeyHotModel(key, value != null));

            return userValue;
        } catch (Exception e) {
            return null;
        }

    }
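
To make the read path above easier to follow, here is a minimal self-contained sketch of the same idea. It is not the library's code: `HotKeyFlowSketch`, `PLACEHOLDER`, `markHot`, and the `remote` map (standing in for Redis) are hypothetical names, and plain maps replace the real local cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a local "hot" map in front of a slower remote store.
class HotKeyFlowSketch {
    // Sentinel meaning "this key is hot, but no value has been cached yet".
    static final Object PLACEHOLDER = new Object();

    final Map<String, Object> localHot = new ConcurrentHashMap<>();
    final Map<String, String> remote = new ConcurrentHashMap<>(); // stands in for Redis
    final AtomicInteger remoteReads = new AtomicInteger();

    // Called when a worker announces that the key is hot.
    void markHot(String key) {
        localHot.put(key, PLACEHOLDER);
    }

    // Returns the cached value, or null if the key is not hot or holds only the placeholder.
    Object getValue(String key) {
        Object v = localHot.get(key);
        return v == PLACEHOLDER ? null : v;
    }

    // Stores the value locally only if the key is currently marked hot.
    void smartSet(String key, String value) {
        localHot.computeIfPresent(key, (k, old) -> value);
    }

    String get(String key) {
        Object cached = getValue(key);
        if (cached != null) {
            return cached.toString();
        }
        remoteReads.incrementAndGet();
        String value = remote.get(key);
        if (value != null) {
            smartSet(key, value);
        }
        return value;
    }
}
```

In this sketch a key that is not marked hot goes to the remote store on every read; once `markHot` has been called, the next read fills the local entry and later reads stay local.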

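inRule looks up the matching rule in the KeyRule cache. After a few layers of calls it reaches KeyRuleHolder.findByKey, which calls findRule to select the matching KeyRule. If there is no KeyRule it returns null; otherwise it reads the rule's duration (how long a hot key stays cached) and returns the local cache associated with that duration. In effect, for the sake of reuse, a get is used here in place of a contains check.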

    /**
     * Checks whether this key falls within the rules being detected
     */
    private static boolean inRule(String key) {
        return CacheFactory.getCache(key) != null;
    }

    public static LocalCache getCache(String key) {
        return KeyRuleHolder.findByKey(key);
    }

    public static LocalCache findByKey(String key) {
        if (StrUtil.isEmpty(key)) {
            return null;
        }
        KeyRule keyRule = findRule(key);
        if (keyRule == null) {
            return null;
        }
        return RULE_CACHE_MAP.get(keyRule.getDuration());
    }

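findRule's logic is a little unusual, and the author left a comment explaining it: exact match first, then prefix match, then the * wildcard, so that the most specific rule is chosen. For example, suppose a prefix rule sku_ is configured, but traffic for the Moutai SKU suddenly spikes and its local cache needs to live longer so the system can get through the peak smoothly. You can then add an exact-match rule sku_moutai_sku_id without affecting the cache rules of other SKUs.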

    //Iterate over all of the app's rules and find the one matching the key. Priority: exact match -> prefix match -> * wildcard
    private static KeyRule findRule(String key) {
        KeyRule prefix = null;
        KeyRule common = null;
        for (KeyRule keyRule : KEY_RULES) {
            if (key.equals(keyRule.getKey())) {
                return keyRule;
            }
            if ((keyRule.isPrefix() && key.startsWith(keyRule.getKey()))) {
                prefix = keyRule;
            }
            if ("*".equals(keyRule.getKey())) {
                common = keyRule;
            }
        }
        if (prefix != null) {
            return prefix;
        }
        return common;
    }
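
The priority order can be tried out in isolation with a small sketch. `RuleMatchSketch` and its `Rule` class are simplified, hypothetical stand-ins for KeyRuleHolder and KeyRule, and the durations are made-up values for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for KeyRule and findRule.
class RuleMatchSketch {
    static final class Rule {
        final String key;
        final boolean prefix;
        final int durationSeconds;

        Rule(String key, boolean prefix, int durationSeconds) {
            this.key = key;
            this.prefix = prefix;
            this.durationSeconds = durationSeconds;
        }
    }

    // An exact match wins immediately; otherwise prefer a prefix rule, then the "*" rule.
    static Rule findRule(List<Rule> rules, String key) {
        Rule prefix = null;
        Rule common = null;
        for (Rule rule : rules) {
            if (key.equals(rule.key)) {
                return rule;
            }
            if (rule.prefix && key.startsWith(rule.key)) {
                prefix = rule;
            }
            if ("*".equals(rule.key)) {
                common = rule;
            }
        }
        return prefix != null ? prefix : common;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
                new Rule("*", false, 10),
                new Rule("sku_", true, 60),
                new Rule("sku_moutai_sku_id", false, 300));
        System.out.println(findRule(rules, "sku_moutai_sku_id").durationSeconds); // 300
        System.out.println(findRule(rules, "sku_123").durationSeconds);           // 60
        System.out.println(findRule(rules, "user_1").durationSeconds);            // 10
    }
}
```

One detail worth noting from the loop itself: if several prefix rules match the same key, the last one in iteration order wins, not the longest prefix.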

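So where do the rules in KEY_RULES come from? This is where etcd comes in. You can think of etcd as similar to ZooKeeper: it supports CRUD on configuration and notifies clients of changes. The client combines a scheduled fetch with a watcher, much like Ctrip's Apollo: don't put all your eggs in one basket, a fallback really matters. In the code below, fetchRule tries to pull the rules from etcd every 5 seconds until a fetch succeeds and then shuts the scheduler down, while startWatchRule starts a watcher that re-fetches all rules from etcd whenever a change event arrives. fetchRuleFromEtcd reads the rules from etcd's rulePath, converts them into ruleList, and calls notifyRuleChange for local processing.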

    private void fetchRule() {
        ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
        //Fetch the rule info from etcd; if the fetch fails, keep retrying on a schedule
        scheduledExecutorService.scheduleAtFixedRate(() -> {
            JdLogger.info(getClass(), "trying to connect to etcd and fetch rule info");
            boolean success = fetchRuleFromEtcd();
            if (success) {
                //Fetch the hot keys that already exist
                fetchExistHotKey();

                scheduledExecutorService.shutdown();
            }

        }, 0, 5, TimeUnit.SECONDS);
    }

    private boolean fetchRuleFromEtcd() {
        IConfigCenter configCenter = EtcdConfigFactory.configCenter();
        try {
            List<KeyRule> ruleList = new ArrayList<>();
            //Fetch this app's rules from etcd
            String rules = configCenter.get(ConfigConstant.rulePath + Context.APP_NAME);
            if (StringUtil.isNullOrEmpty(rules)) {
                JdLogger.warn(getClass(), "rule is empty");
                //This clears the local cache queue
                notifyRuleChange(ruleList);
                return true;
            }
            ruleList = FastJsonUtils.toList(rules, KeyRule.class);

            notifyRuleChange(ruleList);
            return true;
        } catch (StatusRuntimeException ex) {
            //Cannot connect to etcd
            JdLogger.error(getClass(), "etcd connected fail. Check the etcd address!!!");
            return false;
        } catch (Exception e) {
            JdLogger.error(getClass(), "fetch rule failure, please check the rule info in etcd");
            return true;
        }

    }

    /**
     * Asynchronously watches for rule changes
     */
    private void startWatchRule() {
        ExecutorService executorService = Executors.newSingleThreadExecutor();
        executorService.submit(() -> {
            JdLogger.info(getClass(), "--- begin watch rule change ----");
            try {
                IConfigCenter configCenter = EtcdConfigFactory.configCenter();
                KvClient.WatchIterator watchIterator = configCenter.watch(ConfigConstant.rulePath + Context.APP_NAME);
                //On a new event, i.e. a rule change, re-fetch everything
                while (watchIterator.hasNext()) {
                    //This line is required: next() blocks until a rule actually changes
                    WatchUpdate watchUpdate = watchIterator.next();
                    List eventList = watchUpdate.getEvents();
                    JdLogger.info(getClass(), "rules info changed. begin to fetch new infos. rule change is " + eventList);

                    //Fetch the full rule set
                    fetchRuleFromEtcd();
                }
            } catch (Exception e) {
                JdLogger.error(getClass(), "watch err");
            }


        });
    }
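
The scheduling pattern in fetchRule, retry at a fixed rate and stop after the first success, can be sketched on its own. `RetryUntilSuccessSketch` and `runUntilSuccess` are hypothetical names, and the `BooleanSupplier` stands in for fetchRuleFromEtcd; the latch and the attempt counter exist only so the sketch can report what happened.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of "retry every N ms until the first success, then stop".
class RetryUntilSuccessSketch {
    // Runs `attempt` at a fixed rate until it returns true; returns the number of attempts made.
    static int runUntilSuccess(BooleanSupplier attempt, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger attempts = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);
        scheduler.scheduleAtFixedRate(() -> {
            attempts.incrementAndGet();
            boolean success;
            try {
                success = attempt.getAsBoolean();
            } catch (RuntimeException e) {
                // Treat an exception as a failed attempt so the schedule keeps running.
                success = false;
            }
            if (success) {
                scheduler.shutdown(); // cancels all future periodic runs
                done.countDown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            scheduler.shutdownNow();
        }
        return attempts.get();
    }
}
```

After the scheduler shuts down, later changes are only picked up through the watcher, so in this arrangement the scheduled task acts as a startup retry rather than a continuous poll.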

notifyRuleChange posts a KeyRuleInfoChangeEvent to the EventBus, which leads to KeyRuleHolder's putRules method. That is where KEY_RULES and RULE_CACHE_MAP are maintained.

    private void notifyRuleChange(List<KeyRule> rules) {
        EventBusCenter.getInstance().post(new KeyRuleInfoChangeEvent(rules));
    }

    @Subscribe
    public void ruleChange(KeyRuleInfoChangeEvent event) {
        JdLogger.info(getClass(), "new rules info is :" + event.getKeyRules());
        List<KeyRule> ruleList = event.getKeyRules();
        if (ruleList == null) {
            return;
        }

        putRules(ruleList);
    }

    /**
     * All rules; if a rule's timeout changes, the caffeine cache is rebuilt
     */
    public static void putRules(List<KeyRule> keyRules) {
        synchronized (KEY_RULES) {
            //If the rules are empty, clear the rule table
            if (CollectionUtil.isEmpty(keyRules)) {
                KEY_RULES.clear();
                RULE_CACHE_MAP.clear();
                return;
            }

            KEY_RULES.clear();
            KEY_RULES.addAll(keyRules);

            Set<Integer> durationSet = keyRules.stream().map(KeyRule::getDuration).collect(Collectors.toSet());
            for (Integer duration : RULE_CACHE_MAP.keySet()) {
                //First remove entries that are in RULE_CACHE_MAP but no longer in any rule
                if (!durationSet.contains(duration)) {
                    RULE_CACHE_MAP.remove(duration);
                }
            }

            //Iterate over all rules
            for (KeyRule keyRule : keyRules) {
                int duration = keyRule.getDuration();
                if (RULE_CACHE_MAP.get(duration) == null) {
                    LocalCache cache = CacheFactory.build(duration);
                    RULE_CACHE_MAP.put(duration, cache);
                }
            }
        }
    }
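
The effect of putRules on RULE_CACHE_MAP, one cache per distinct duration, shared by all rules with that duration, can be illustrated with a short sketch. `DurationCacheSketch` and `applyDurations` are hypothetical names, and plain maps stand in for the real LocalCache instances.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: keep exactly one cache per duration that appears in the rules.
class DurationCacheSketch {
    // duration in seconds -> cache for keys whose rule uses that duration
    static final Map<Integer, Map<String, Object>> CACHES = new ConcurrentHashMap<>();

    static void applyDurations(List<Integer> ruleDurations) {
        Set<Integer> wanted = new HashSet<>(ruleDurations);
        // Drop caches whose duration no longer appears in any rule.
        CACHES.keySet().retainAll(wanted);
        // Create caches for new durations; existing caches and their entries are kept.
        for (Integer duration : wanted) {
            CACHES.computeIfAbsent(duration, d -> new ConcurrentHashMap<>());
        }
    }
}
```

A consequence visible in the sketch: changing a rule's duration moves its keys to a different cache, so values held under the old duration are discarded once no rule refers to it.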

Back to the main flow. getValueSimple has a fairly long call chain: it resolves the key's rule to get the duration and then reads the ValueModel from the local cache for that duration.

    /**
     * Only gets the value; if it is absent, the hot key is not reported
     */
    static ValueModel getValueSimple(String key) {
        Object object = getCache(key).get(key);
        if (object == null) {
            return null;
        }
        return (ValueModel) object;
    }

    private static LocalCache getCache(String key) {
        return CacheFactory.getNonNullCache(key);
    }

    public static LocalCache getNonNullCache(String key) {
        LocalCache localCache = getCache(key);
        if (localCache == null) {
            return DEFAULT_CACHE;
        }
        return localCache;
    }

    public static LocalCache getCache(String key) {
        return KeyRuleHolder.findByKey(key);
    }

    public static LocalCache findByKey(String key) {
        if (StrUtil.isEmpty(key)) {
            return null;
        }
        KeyRule keyRule = findRule(key);
        if (keyRule == null) {
            return null;
        }
        return RULE_CACHE_MAP.get(keyRule.getDuration());
    }

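Next is HotKeyPusher.push. For a removal, it creates a node in etcd and then deletes it so that the delete event propagates to the whole cluster. For detection, if the key falls within a rule, it calls KeyHandlerFactory.getCollector().collect to accumulate the count.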

    public static void push(String key, KeyType keyType) {
        push(key, keyType, 1, false);
    }

    public static void push(String key, KeyType keyType, int count, boolean remove) {
        if (count <= 0) {
            count = 1;
        }
        if (keyType == null) {
            keyType = KeyType.REDIS_KEY;
        }
        if (key == null) {
            return;
        }
        HotKeyModel hotKeyModel = new HotKeyModel();
        hotKeyModel.setAppName(Context.APP_NAME);
        hotKeyModel.setKeyType(keyType);
        hotKeyModel.setCount(count);
        hotKeyModel.setRemove(remove);
        hotKeyModel.setKey(key);


        if (remove) {
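            //A key deletion is sent straight to etcd with no aggregation. There is currently a catch: this delete only removes manually added keys, not keys detected by workers,
            //because every client watches the path for manually added keys but not the path for automatically detected ones. So if the key is not under the manual path, it cannot be deleted,
            //and the cluster never sees a delete event. The workaround is to add the key as a new hot key and then delete it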
            EtcdConfigFactory.configCenter().putAndGrant(HotKeyPathTool.keyPath(hotKeyModel), Constant.DEFAULT_DELETE_VALUE, 1);
            EtcdConfigFactory.configCenter().delete(HotKeyPathTool.keyPath(hotKeyModel));
            //Also delete it from the worker-detected path
            EtcdConfigFactory.configCenter().delete(HotKeyPathTool.keyRecordPath(hotKeyModel));
        } else {
            //If the key falls within a rule and should be detected, accumulate it for sending
            if (KeyRuleHolder.isKeyInRule(key)) {
                //Accumulate; sent once every half second
                KeyHandlerFactory.getCollector().collect(hotKeyModel);
            }
        }
    }

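KeyHandlerFactory.getCollector().collect alternates between two maps when accumulating counts. While one map is being read and cleared, new writes go to the other, so collection never has to pause; alternating buffers is an effective way to avoid stalls.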

    @Override
    public void collect(HotKeyModel hotKeyModel) {
        String key = hotKeyModel.getKey();
        if (StrUtil.isEmpty(key)) {
            return;
        }
        if (atomicLong.get() % 2 == 0) {
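            //putIfAbsent returns null and stores the key-value when the key is absent; when the key already exists it returns the existing value without overwriting it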
            HotKeyModel model = map0.putIfAbsent(key, hotKeyModel);
            if (model != null) {
                model.setCount(model.getCount() + hotKeyModel.getCount());
            }
        } else {
            HotKeyModel model = map1.putIfAbsent(key, hotKeyModel);
            if (model != null) {
                model.setCount(model.getCount() + hotKeyModel.getCount());
            }
        }
    }
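
Here is a compact sketch of the alternating-map idea, including the flip that the sender side would perform. It is an illustration under assumptions, not the library's collector: `TurnCollectorSketch` and `flipAndDrain` are hypothetical names, and it swaps in `ConcurrentHashMap.merge` for per-key accumulation instead of the putIfAbsent plus setCount pair shown above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a two-buffer counter: writers never wait for the reader.
class TurnCollectorSketch {
    private final ConcurrentHashMap<String, Integer> map0 = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Integer> map1 = new ConcurrentHashMap<>();
    private final AtomicLong turn = new AtomicLong();

    // Writers add to whichever map is active for the current turn.
    void collect(String key, int count) {
        ConcurrentHashMap<String, Integer> active = (turn.get() % 2 == 0) ? map0 : map1;
        active.merge(key, count, Integer::sum); // atomic per-key accumulation
    }

    // Flips the turn so new writes go to the other map, then drains the retired one.
    Map<String, Integer> flipAndDrain() {
        long newTurn = turn.incrementAndGet();
        ConcurrentHashMap<String, Integer> retired = (newTurn % 2 == 0) ? map1 : map0;
        Map<String, Integer> out = new HashMap<>();
        for (String key : retired.keySet()) {
            Integer value = retired.remove(key);
            if (value != null) {
                out.put(key, value);
            }
        }
        return out;
    }
}
```

A writer that read the turn just before the flip may still write to the retired map while it is being drained. In this sketch such a count is either included in the current drain or left in the map until that map is retired again, so it is delayed rather than lost.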

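Back to the earlier flow: KeyHandlerFactory.getCounter().collect gathers per-rule access counts. It likewise resolves the matching rule and then increments that rule's total hit count and hot hit count.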

 @Override
    public void collect(KeyHotModel keyHotModel) {
        if (atomicLong.get() % 2 == 0) {
            put(keyHotModel.getKey(), keyHotModel.isHot(), HIT_MAP_0);
        } else {
            put(keyHotModel.getKey(), keyHotModel.isHot(), HIT_MAP_1);
        }
    }

    public void put(String key, boolean isHot, ConcurrentHashMap<String, HitCount> map) {
        //e.g. if the key matches the pin_ prefix rule, pin_ is stored
        String rule = KeyRuleHolder.rule(key);
        //Keys outside every rule are ignored
        if (StrUtil.isEmpty(rule)) {
            return;
        }
        String nowTime = nowTime();

        //rule + delimiter + 2020-10-23 21:11:22
        String mapKey = rule + Constant.COUNT_DELIMITER + nowTime;
        //This method is thread-safe
        HitCount hitCount = map.computeIfAbsent(mapKey, v -> new HitCount());
        if (isHot) {
            hitCount.hotHitCount.incrementAndGet();
        }
        hitCount.totalHitCount.incrementAndGet();
    }
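
The per-rule, per-second bucketing can be shown in a few lines. This is only a sketch: `RuleHitCounterSketch` and the `DELIMITER` value are hypothetical, and the timestamp is passed in as a string so the behavior does not depend on the clock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one counter pair per (rule, second) bucket.
class RuleHitCounterSketch {
    static final class HitCount {
        final AtomicInteger hotHitCount = new AtomicInteger();   // reads served from the local hot cache
        final AtomicInteger totalHitCount = new AtomicInteger(); // all reads under this rule
    }

    static final String DELIMITER = "#"; // made-up value; the real constant is not shown in this article

    final ConcurrentHashMap<String, HitCount> hits = new ConcurrentHashMap<>();

    void record(String rule, String second, boolean isHot) {
        // computeIfAbsent creates the bucket atomically the first time it is seen.
        HitCount count = hits.computeIfAbsent(rule + DELIMITER + second, k -> new HitCount());
        if (isHot) {
            count.hotHitCount.incrementAndGet();
        }
        count.totalHitCount.incrementAndGet();
    }
}
```

Dividing hotHitCount by totalHitCount for a bucket gives the share of traffic under that rule that was absorbed locally in that second.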

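Collection of both metrics is now covered, so how are they sent to the worker? In PushSchedulerStarter, a scheduled single-thread executor is started for each metric; they periodically call NettyKeyPusher's send and sendCount respectively.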

    /**
     * Pushes the keys to be detected every 0.5 seconds
     */
    public static void startPusher(Long period) {
        if (period == null || period <= 0) {
            period = 500L;
        }
        ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
        scheduledExecutorService.scheduleAtFixedRate(() -> {
            IKeyCollector collectHK = KeyHandlerFactory.getCollector();
            KeyHandlerFactory.getPusher().send(Context.APP_NAME, collectHK.lockAndGetResult());
            collectHK.finishOnce();
        },0, period, TimeUnit.MILLISECONDS);
    }

    /**
     * Pushes the count statistics every 10 seconds
     */
    public static void startCountPusher(Integer period) {
        if (period == null || period <= 0) {
            period = 10;
        }
        ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
        scheduledExecutorService.scheduleAtFixedRate(() -> {
            IKeyCollector collectHK = KeyHandlerFactory.getCounter();
            KeyHandlerFactory.getPusher().sendCount(Context.APP_NAME, collectHK.lockAndGetResult());
            collectHK.finishOnce();
        },0, period, TimeUnit.SECONDS);
    }

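NettyKeyPusher's send and sendCount both choose a worker for each statistics record and then send the request. chooseChannel hashes the key onto one of the workers, records bound for the same worker are batched together, and each batch is written to that worker's channel.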

  @Override
    public void send(String appName, List<HotKeyModel> list) {
        //Keys accumulated over half a second, distributed to workers by hash
        long now = System.currentTimeMillis();

        Map<Channel, List<HotKeyModel>> map = new HashMap<>();
        for(HotKeyModel model : list) {
            model.setCreateTime(now);
            Channel channel = WorkerInfoHolder.chooseChannel(model.getKey());
            if (channel == null) {
                continue;
            }

            List<HotKeyModel> newList = map.computeIfAbsent(channel, k -> new ArrayList<>());
            newList.add(model);
        }

        for (Channel channel : map.keySet()) {
            try {
                List<HotKeyModel> batch = map.get(channel);
                channel.writeAndFlush(MsgBuilder.buildByteBuf(new HotKeyMsg(MessageType.REQUEST_NEW_KEY, FastJsonUtils.convertObjectToJSON(batch)))).sync();
            } catch (Exception e) {
                try {
                    InetSocketAddress insocket = (InetSocketAddress) channel.remoteAddress();
                    JdLogger.error(getClass(),"flush error " + insocket.getAddress().getHostAddress());
                } catch (Exception ex) {
                    JdLogger.error(getClass(),"flush error");
                }
            }
        }
    }

    @Override
    public void sendCount(String appName, List<KeyCountModel> list) {
        //Counts accumulated over 10 seconds, distributed to workers by hash
        long now = System.currentTimeMillis();
        Map<Channel, List<KeyCountModel>> map = new HashMap<>();
        for(KeyCountModel model : list) {
            model.setCreateTime(now);
            Channel channel = WorkerInfoHolder.chooseChannel(model.getRuleKey());
            if (channel == null) {
                continue;
            }

            List<KeyCountModel> newList = map.computeIfAbsent(channel, k -> new ArrayList<>());
            newList.add(model);
        }

        for (Channel channel : map.keySet()) {
            try {
                List<KeyCountModel> batch = map.get(channel);
                channel.writeAndFlush(MsgBuilder.buildByteBuf(new HotKeyMsg(Context.APP_NAME,
                        MessageType.REQUEST_HIT_COUNT, FastJsonUtils.convertObjectToJSON(batch)))).sync();
            } catch (Exception e) {
                try {
                    InetSocketAddress insocket = (InetSocketAddress) channel.remoteAddress();
                    JdLogger.error(getClass(),"flush error " + insocket.getAddress().getHostAddress());
                } catch (Exception ex) {
                    JdLogger.error(getClass(),"flush error");
                }
            }
        }
    }

    public static Channel chooseChannel(String key) {
        if (StrUtil.isEmpty(key) || WORKER_HOLDER.size() == 0) {
            return null;
        }
        int index = Math.abs(key.hashCode() % WORKER_HOLDER.size());

        return WORKER_HOLDER.get(index).channel;
    }
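
The hash-and-batch step can be reproduced without netty. In the sketch below, `WorkerShardSketch`, `indexFor`, and `groupByWorker` are hypothetical names, and a worker is represented by its list index instead of a Channel.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pick a worker index per key, then batch keys per worker.
class WorkerShardSketch {
    // Same arithmetic as chooseChannel: remainder first, then abs, so the result is in [0, workers).
    static int indexFor(String key, int workers) {
        return Math.abs(key.hashCode() % workers);
    }

    // Groups keys by worker index so each worker receives a single batch.
    static Map<Integer, List<String>> groupByWorker(List<String> keys, int workers) {
        Map<Integer, List<String>> batches = new HashMap<>();
        for (String key : keys) {
            batches.computeIfAbsent(indexFor(key, workers), k -> new ArrayList<>()).add(key);
        }
        return batches;
    }
}
```

Because the index depends only on the key and the worker count, every client sends a given key to the same worker, which is what lets one worker see the full access count for that key. The order of operations matters: taking the remainder before `Math.abs` keeps the index non-negative even for a hash of Integer.MIN_VALUE, whereas `Math.abs(hash) % n` would not; `Math.floorMod(hash, n)` is another safe option.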

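Finally, when a worker determines that a key is hot, the client must receive and store the pushed hot key. NettyClientHandler posts a ReceiveNewKeyEvent to the EventBus; on receiving it, ReceiveNewKeyListener calls receiveNewKeyListener.newKey, which puts the hot key into the local cache. That completes the client-side flow.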

  @Override
    protected void channelRead0(ChannelHandlerContext channelHandlerContext, String message) {
        HotKeyMsg msg = FastJsonUtils.toBean(message, HotKeyMsg.class);
        if (MessageType.PONG == msg.getMessageType()) {
            JdLogger.info(getClass(), "heart beat");
            return;
        }
        if (MessageType.RESPONSE_NEW_KEY == msg.getMessageType()) {
            JdLogger.info(getClass(), "receive new key : " + msg);
            HotKeyModel model = FastJsonUtils.toBean(msg.getBody(), HotKeyModel.class);
            EventBusCenter.getInstance().post(new ReceiveNewKeyEvent(model));
        }
    }

    @Subscribe
    public void newKeyComing(ReceiveNewKeyEvent event) {
        HotKeyModel hotKeyModel = event.getModel();
        if (hotKeyModel == null) {
            return;
        }
        //A new key push was received
        if (receiveNewKeyListener != null) {
            receiveNewKeyListener.newKey(hotKeyModel);
        }
    }

  @Override
    public void newKey(HotKeyModel hotKeyModel) {
        long now = System.currentTimeMillis();
        //Log it if more than 1 second has passed by the time the key arrives. Manually deleted keys have no CreateTime
        if (hotKeyModel.getCreateTime() != 0 && Math.abs(now - hotKeyModel.getCreateTime()) > 1000) {
            JdLogger.warn(getClass(), "the key comes too late : " + hotKeyModel.getKey() + " now " +
                    +now + " keyCreateAt " + hotKeyModel.getCreateTime());
        }
        if (hotKeyModel.isRemove()) {
            //A delete event: delete immediately
            deleteKey(hotKeyModel.getKey());
            return;
        }
        //Already a hot key and the same hot key was pushed again: log it and refresh
        if (JdHotKeyStore.isHot(hotKeyModel.getKey())) {
            JdLogger.warn(getClass(), "receive repeat hot key :" + hotKeyModel.getKey() + " at " + now);
        }
        addKey(hotKeyModel.getKey());
    }

    private void addKey(String key) {
        ValueModel valueModel = ValueModel.defaultValue(key);
        if (valueModel == null) {
            //Matches no rule
            deleteKey(key);
            return;
        }
        //If the key already exists, its value and expiry time are both reset. If it does not exist, it is added as a new hot key
        JdHotKeyStore.setValueDirectly(key, valueModel);
    }


    private void deleteKey(String key) {
        CacheFactory.getNonNullCache(key).delete(key);
    }

  static void setValueDirectly(String key, Object value) {
        getCache(key).set(key, value);
    }

Worker side

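As shown above, the client and worker interact in only two ways: the client pushes statistics to the worker, and the worker processes them and finally pushes hot keys back to the client. So on the worker side only two parts need analysis: aggregating the statistics and pushing hot keys.
Hot key handling starts in HotKeyFilter. It first increments totalReceiveKeyCount and then calls publishMsg. Whitelisted keys are skipped; if a record is more than 1 second old this is only logged, and keyProducer.push is then called for it.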

 @Override
    public boolean chain(HotKeyMsg message, ChannelHandlerContext ctx) {
        if (MessageType.REQUEST_NEW_KEY == message.getMessageType()) {
            totalReceiveKeyCount.incrementAndGet();
            publishMsg(message.getBody(), ctx);
            return false;
        }

        return true;
    }

    private void publishMsg(String message, ChannelHandlerContext ctx) {
        //Older versions sent a single HotKeyModel; newer versions send an array
        List<HotKeyModel> models = FastJsonUtils.toList(message, HotKeyModel.class);
        long now = SystemClock.now();
        for (HotKeyModel model : models) {
            //Whitelisted keys are not processed
            if (WhiteListHolder.contains(model.getKey())) {
                continue;
            }
            long timeOut = now - model.getCreateTime();
            if (timeOut > 1000) {
                logger.info("key timeout " + timeOut + ", from ip : " + NettyIpUtil.clientIp(ctx));
            }
            keyProducer.push(model, now);
        }
    }

keyProducer.push drops records older than InitConstant.timeOut (5 seconds according to the comment) and puts the rest into the queue.

    public void push(HotKeyModel model, long now) {
        if (model == null || model.getKey() == null) {
            return;
        }
        //Stale messages from more than 5 seconds ago are not processed
        if (now - model.getCreateTime() > InitConstant.timeOut) {
            expireTotalCount.increment();
            return;
        }

        try {
            QUEUE.put(model);
            totalOfferCount.increment();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
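
The producer side boils down to a staleness check in front of a blocking queue. The sketch below shows that shape under assumptions: `StaleFilterQueueSketch` and `Report` are hypothetical names, the 5000 ms limit is taken from the comment above rather than from the real InitConstant.timeOut, and the queue is unbounded for simplicity.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: drop stale reports, enqueue the rest for consumer threads.
class StaleFilterQueueSketch {
    static final long TIMEOUT_MILLIS = 5000; // assumed from the "5 seconds" comment

    static final class Report {
        final String key;
        final long createTime;

        Report(String key, long createTime) {
            this.key = key;
            this.createTime = createTime;
        }
    }

    final BlockingQueue<Report> queue = new LinkedBlockingQueue<>();
    final LongAdder expired = new LongAdder();
    final LongAdder offered = new LongAdder();

    // Returns true if the report was enqueued, false if it was rejected.
    boolean push(Report report, long now) {
        if (report == null || report.key == null) {
            return false;
        }
        if (now - report.createTime > TIMEOUT_MILLIS) {
            expired.increment();
            return false;
        }
        try {
            queue.put(report);
            offered.increment();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

Dropping stale reports before they reach the queue keeps old traffic from inflating a key's count in the current window.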

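The worker starts a configured number of KeyConsumer instances that continuously consume statistics from the queue. Depending on whether the record is a removal, each calls removeKey or newKey on the KeyListener.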

    @Bean
    public Consumer consumer() {
        int nowCount = CpuNum.workerCount();
        //Assign the actual value to the static variable
        if (threadCount != 0) {
            nowCount = threadCount;
        } else {
            if (nowCount >= 8) {
                nowCount = nowCount / 2;
            }
        }

        List<KeyConsumer> consumerList = new ArrayList<>();
        for (int i = 0; i < nowCount; i++) {
            KeyConsumer keyConsumer = new KeyConsumer();
            keyConsumer.setKeyListener(iKeyListener);
            consumerList.add(keyConsumer);

            threadPoolExecutor.submit(keyConsumer::beginConsume);
        }
        return new Consumer(consumerList);
    }
}

public void beginConsume() {
        while (true) {
            try {
                HotKeyModel model = QUEUE.take();
                if (model.isRemove()) {
                    iKeyListener.removeKey(model, KeyEventOriginal.CLIENT);
                } else {
                    iKeyListener.newKey(model, KeyEventOriginal.CLIENT);
                }

                //Done processing; increment the count
                totalDealCount.increment();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

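KeyListener's removeKey and newKey delete or accumulate into the key's SlidingWindow held in the cache. On a removal, or once the access count reaches the threshold, the event is handed to every IPusher, which pushes it to all clients of the app (selected by appName) and to etcd.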

 @Override
    public void newKey(HotKeyModel hotKeyModel, KeyEventOriginal original) {
        //The key used in the cache
        String key = buildKey(hotKeyModel);
        //Check whether it became hot only recently
        Object o = hotCache.getIfPresent(key);
        if (o != null) {
            return;
        }

        //********** watch here ************//
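        //This method is called concurrently by InitConstant.threadCount threads, so there is a multithreading issue
        //The addCount call below is locked, so accumulating a key's count is atomic: nothing is over- or under-counted, and once the configured threshold is reached the key is guaranteed to become hot
        //For example, with a threshold of 2 and several threads accumulating, the state before the key becomes hot is always correct: thread1 adds 1, thread2 adds 1, then thread2 gets hot == true and starts the push
        //In an extreme case, though, say the threshold is 10 and the current count is 9: thread1 arrives here, adds 1 and gets true; thread2 also arrives, adds 1, the count is now 11 and it also gets true. Here is the problem:
        //the key goes through the else branch below twice, that is, it is pushed twice.
        //The cause is that under concurrency hotCache.getIfPresent(key) did not return early, so two +1 updates for the same key reached addCount
        //The test code is in the TestBlockQueue class; run it directly to see both threads report hot at the same time

        //Does this need fixing? No. 1. The conditions required are extremely strict and hard to hit; even with JD's very high concurrency I have never seen the same key pushed twice in a row in production
        //2. Even if it happens, the result is acceptable: just two pushes, with no impact and invisible to clients. If you insisted on fixing it, you would have to lock the slidingWindow instance, which inevitably costs something

        //So it is enough to guarantee that a key is never over-counted; under-counting is fine. A hot key is necessarily high-frequency, so missing a few counts does not matter. But over-counting a non-hot key and turning it into a hot key would be wrong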
        SlidingWindow slidingWindow = checkWindow(hotKeyModel, key);
        //Check whether it is hot now
        boolean hot = slidingWindow.addCount(hotKeyModel.getCount());

        if (!hot) {
            //Not hot yet: put it again so the cache refreshes its expiry time
            CaffeineCacheHolder.getCache(hotKeyModel.getAppName()).put(key, slidingWindow);
        } else {
            hotCache.put(key, 1);

            //Remove this key
            CaffeineCacheHolder.getCache(hotKeyModel.getAppName()).invalidate(key);

            //Start pushing
            hotKeyModel.setCreateTime(SystemClock.now());
            logger.info(NEW_KEY_EVENT + hotKeyModel.getKey());
            //Push to each client and to etcd
            for (IPusher pusher : iPushers) {
                pusher.push(hotKeyModel);
            }

        }

    }
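
The source comment argues that the rare double push is harmless and leaves it as is. Purely as an illustration of the race it describes, the sketch below shows one lightweight way to let only a single thread win the "became hot" transition, using an atomic putIfAbsent instead of a separate check followed by a put. This is not what the source does: `PushOnceSketch` and `markHotOnce` are hypothetical, and a plain ConcurrentHashMap stands in for hotCache, so entries never expire here.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: only the first caller to mark a key hot gets `true`.
class PushOnceSketch {
    private final ConcurrentHashMap<String, Boolean> hotCache = new ConcurrentHashMap<>();

    // putIfAbsent is atomic: exactly one concurrent caller sees null for a given key.
    boolean markHotOnce(String key) {
        return hotCache.putIfAbsent(key, Boolean.TRUE) == null;
    }
}
```

A caller would push to clients only when `markHotOnce` returns true. Whether the extra atomic operation is worth it is a judgment call; the author's position in the comment above is that it is not.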

@Override
    public void removeKey(HotKeyModel hotKeyModel, KeyEventOriginal original) {
        //The key used in the cache
        String key = buildKey(hotKeyModel);

        hotCache.invalidate(key);
        CaffeineCacheHolder.getCache(hotKeyModel.getAppName()).invalidate(key);

        //Push the delete to all clients
        hotKeyModel.setCreateTime(SystemClock.now());
        logger.info(DELETE_KEY_EVENT + hotKeyModel.getKey());

        for (IPusher pusher : iPushers) {
            pusher.remove(hotKeyModel);
        }

    }


    /**
     * Creates or returns the sliding window for this key
     */
    private SlidingWindow checkWindow(HotKeyModel hotKeyModel, String key) {
        //Get the sliding window for this key
        return (SlidingWindow) CaffeineCacheHolder.getCache(hotKeyModel.getAppName()).get(key, (Function) s -> {
            //A new key: look up its rule
            KeyRule keyRule = KeyRuleHolder.getRuleByAppAndKey(hotKeyModel);
            return new SlidingWindow(keyRule.getInterval(), keyRule.getThreshold());
        });
    }
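
The SlidingWindow implementation itself is not shown in this article. To give a feel for what addCount decides, here is a deliberately simplified sliding-window counter. Everything in it is an assumption for illustration: the class name `SlidingWindowSketch`, the slot layout, and passing the current time in as a parameter so the behavior is deterministic. Timestamps are assumed to be non-negative and non-decreasing.

```java
import java.util.Arrays;

// Hypothetical, simplified sliding window: `slots` buckets of `slotMillis` each.
class SlidingWindowSketch {
    private final long slotMillis;
    private final long threshold;
    private final long[] slotId;    // which time slot each bucket currently holds, -1 if unused
    private final long[] slotCount; // count accumulated in that time slot

    SlidingWindowSketch(int slots, long slotMillis, long threshold) {
        this.slotMillis = slotMillis;
        this.threshold = threshold;
        this.slotId = new long[slots];
        this.slotCount = new long[slots];
        Arrays.fill(slotId, -1L);
    }

    // Adds `count` at time `nowMillis`; returns true if the window total has reached the threshold.
    synchronized boolean addCount(long nowMillis, long count) {
        long current = nowMillis / slotMillis;
        int bucket = (int) (current % slotId.length);
        if (slotId[bucket] != current) {
            // The bucket still holds an older slot: reuse it for the current one.
            slotId[bucket] = current;
            slotCount[bucket] = 0;
        }
        slotCount[bucket] += count;
        long sum = 0;
        for (int i = 0; i < slotId.length; i++) {
            // Only buckets that belong to the most recent `slots` time slots are counted.
            if (slotId[i] >= 0 && current - slotId[i] < slotId.length) {
                sum += slotCount[i];
            }
        }
        return sum >= threshold;
    }
}
```

With 5 slots of 200 ms and a threshold of 10, counts of 4, 5, and 1 arriving within the same second trip the threshold on the third call, while a count of 9 followed more than a second later by a count of 1 does not.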

Summary

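JD hotkey decides dynamically, by counting, whether a key is hot, and then caches it in local memory to achieve millisecond-level scale-out. Are there other solutions? Here are my thoughts:
1. For scenarios with very few cache keys, such as campaign page data (there will not be more than 1,000 campaign pages running at the same time), you can simply keep the cache in local memory and pull the latest value from Redis when the refresh time arrives, with no need to detect hot keys dynamically. This is the familiar multi-level cache.
2. Still detect hot keys dynamically, but migrate them to a dedicated hot-key Redis cluster with more nodes and higher performance, in which every node holds a copy of the same hot key. Requests are then spread out instead of all hitting one Redis node: once a key is judged hot it is read from the hot-key cluster and does not need to be stored in local memory, which is simpler to maintain.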
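
Option 1 above is simple enough to sketch directly. The class below is a minimal illustration under assumptions, not a production cache: `RefreshingLocalCacheSketch` is a hypothetical name, the `loader` function stands in for a Redis read, the current time is passed in explicitly, and there is no size limit or protection against several threads reloading the same key at once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a local cache that reloads an entry once it is older than ttlMillis.
class RefreshingLocalCacheSketch<V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;

        Entry(V value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    RefreshingLocalCacheSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Returns the local value, calling `loader` only when the entry is missing or too old.
    V get(String key, long nowMillis, Function<String, V> loader) {
        Entry<V> entry = entries.get(key);
        if (entry == null || nowMillis - entry.loadedAt >= ttlMillis) {
            entry = new Entry<>(loader.apply(key), nowMillis);
            entries.put(key, entry);
        }
        return entry.value;
    }
}
```

The trade-off against the hotkey approach is visible here: every key is cached on every instance for the full refresh interval whether or not it is hot, which is fine for a small, bounded key set and wasteful for a large one.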
