写在前面
京东hotkey是一个经过京东大促验证的hotkey防御中间件,大概原理是通过上报key访问数到统计服务器集群,统计服务器集群将hotkey通知到客户端,让hotkey能缓存到本地内存中,做到毫秒级的Scale-Out。处理方式有点像美团cat实时收集数据进行统计,只不过美团cat没有反向通知逻辑而已。非常贴近工作实践,值得一看。
client端
首先看一下缓存入口Cache的get方法,JdHotKeyStore.getValue是获取hotkey的方法,并且会进行访问次数的统计上报,如果获取到hotkey不为空,则直接返回,否则从redis获取并调用JdHotKeyStore.smartSet判断是否有hotkey,有则设置值,最后返回。
public String get(String key) {
Object object = JdHotKeyStore.getValue(key);
//如果已经缓存过了
if (object != null) {
return object.toString();
} else {
String value = getFromRedis(key);
JdHotKeyStore.smartSet(key, value);
return value;
}
}
JdHotKeyStore.getValue会先调用inRule校验此key是否有对应规则,如果没有对应规则则不处理,然后调用getValueSimple从本地内存中获取hotkey的存储对象ValueModel,如果没有获取到,则调用HotKeyPusher.push开始计数;如果获取到,会调用isNearExpire判断是否快过期了,如果是也计数,然后取出ValueModel里的value是否有设置对应值,有才返回。最后调用KeyHandlerFactory.getCounter().collect进行对应规则的计数。下面来一步步分析此流程。
public static Object getValue(String key) {
return getValue(key, null);
}
/**
* 获取value,如果value不存在则发往netty
*/
public static Object getValue(String key, KeyType keyType) {
try {
//如果没有为该key配置规则,就不用上报key
if (!inRule(key)) {
return null;
}
Object userValue = null;
ValueModel value = getValueSimple(key);
if (value == null) {
HotKeyPusher.push(key, keyType);
} else {
//临近过期了,也发
if (isNearExpire(value)) {
HotKeyPusher.push(key, keyType);
}
Object object = value.getValue();
//如果是默认值,也返回null
if (object instanceof Integer && Constant.MAGIC_NUMBER == (int) object) {
userValue = null;
} else {
userValue = object;
}
}
//统计计数
KeyHandlerFactory.getCounter().collect(new KeyHotModel(key, value != null));
return userValue;
} catch (Exception e) {
return null;
}
}
inRule会去KeyRule缓存中获取对应的规则,经过层层调用会到KeyRuleHolder的findByKey方法,然后继续调用其findRule方法选择对应的KeyRule,如果没有KeyRule就直接返回了,否则会拿到它的duration(hotkey缓存时间),拿到对应duration的本地缓存。实际上这里为了方法的通用性,用了get来代替contain的判断。
/**
* 判断这个key是否在被探测的规则范围内
*/
private static boolean inRule(String key) {
return CacheFactory.getCache(key) != null;
}
public static LocalCache getCache(String key) {
return KeyRuleHolder.findByKey(key);
}
public static LocalCache findByKey(String key) {
if (StrUtil.isEmpty(key)) {
return null;
}
KeyRule keyRule = findRule(key);
if (keyRule == null) {
return null;
}
return RULE_CACHE_MAP.get(keyRule.getDuration());
}
findRule的逻辑比较特别,作者已经留下了注释,优先全匹配->prefix匹配-> * 通配,这样做是为了更精确选择对应的规则。比如配置了sku_的前缀规则,但是茅台sku的流量突升,需要针对茅台sku的本地缓存再长一点时间让系统平稳渡过高峰期,那就配置一个sku_moutai_sku_id的全匹配规则,这样不会干扰到其他sku的缓存规则。
//遍历该app的所有rule,找到与key匹配的rule。优先全匹配->prefix匹配-> * 通配
private static KeyRule findRule(String key) {
KeyRule prefix = null;
KeyRule common = null;
for (KeyRule keyRule : KEY_RULES) {
if (key.equals(keyRule.getKey())) {
return keyRule;
}
if ((keyRule.isPrefix() && key.startsWith(keyRule.getKey()))) {
prefix = keyRule;
}
if ("*".equals(keyRule.getKey())) {
common = keyRule;
}
}
if (prefix != null) {
return prefix;
}
return common;
}
那么KEY_RULES的规则是怎么来的呢?这就要说到etcd了,其实可以把etcd当做zookeeper,也有对配置crud,然后通知客户端的功能。这里是做了定时拉取+监听变化的双重保证,这里跟携程apollo的处理非常像:不要把鸡蛋放在一个篮子,兜底功能真的很重要。每5秒定时从etcd拉取规则,开启监听器有变化就去etcd拉取规则。fetchRuleFromEtcd从ectd的rule_path获取rules,然后转化成ruleList继续调用notifyRuleChange进行本地处理。
private void fetchRule() {
ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
//开启拉取etcd的worker信息,如果拉取失败,则定时继续拉取
scheduledExecutorService.scheduleAtFixedRate(() -> {
JdLogger.info(getClass(), "trying to connect to etcd and fetch rule info");
boolean success = fetchRuleFromEtcd();
if (success) {
//拉取已存在的热key
fetchExistHotKey();
scheduledExecutorService.shutdown();
}
}, 0, 5, TimeUnit.SECONDS);
}
private boolean fetchRuleFromEtcd() {
IConfigCenter configCenter = EtcdConfigFactory.configCenter();
try {
List ruleList = new ArrayList<>();
//从etcd获取自己的rule
String rules = configCenter.get(ConfigConstant.rulePath + Context.APP_NAME);
if (StringUtil.isNullOrEmpty(rules)) {
JdLogger.warn(getClass(), "rule is empty");
//会清空本地缓存队列
notifyRuleChange(ruleList);
return true;
}
ruleList = FastJsonUtils.toList(rules, KeyRule.class);
notifyRuleChange(ruleList);
return true;
} catch (StatusRuntimeException ex) {
//etcd连不上
JdLogger.error(getClass(), "etcd connected fail. Check the etcd address!!!");
return false;
} catch (Exception e) {
JdLogger.error(getClass(), "fetch rule failure, please check the rule info in etcd");
return true;
}
}
/**
* 异步监听rule规则变化
*/
private void startWatchRule() {
ExecutorService executorService = Executors.newSingleThreadExecutor();
executorService.submit(() -> {
JdLogger.info(getClass(), "--- begin watch rule change ----");
try {
IConfigCenter configCenter = EtcdConfigFactory.configCenter();
KvClient.WatchIterator watchIterator = configCenter.watch(ConfigConstant.rulePath + Context.APP_NAME);
//如果有新事件,即rule的变更,就重新拉取所有的信息
while (watchIterator.hasNext()) {
//这句必须写,next会让他卡住,除非真的有新rule变更
WatchUpdate watchUpdate = watchIterator.next();
List eventList = watchUpdate.getEvents();
JdLogger.info(getClass(), "rules info changed. begin to fetch new infos. rule change is " + eventList);
//全量拉取rule信息
fetchRuleFromEtcd();
}
} catch (Exception e) {
JdLogger.error(getClass(), "watch err");
}
});
}
notifyRuleChange会往EventBus发送KeyRuleInfoChangeEvent的通知,进而进入KeyRuleHolder的putRules方法,这里可以看到维护了KEY_RULES和RULE_CACHE_MAP。
private void notifyRuleChange(List rules) {
EventBusCenter.getInstance().post(new KeyRuleInfoChangeEvent(rules));
}
@Subscribe
public void ruleChange(KeyRuleInfoChangeEvent event) {
JdLogger.info(getClass(), "new rules info is :" + event.getKeyRules());
List ruleList = event.getKeyRules();
if (ruleList == null) {
return;
}
putRules(ruleList);
}
/**
* 所有的规则,如果规则的超时时间变化了,会重建caffeine
*/
public static void putRules(List keyRules) {
synchronized (KEY_RULES) {
//如果规则为空,清空规则表
if (CollectionUtil.isEmpty(keyRules)) {
KEY_RULES.clear();
RULE_CACHE_MAP.clear();
return;
}
KEY_RULES.clear();
KEY_RULES.addAll(keyRules);
Set durationSet = keyRules.stream().map(KeyRule::getDuration).collect(Collectors.toSet());
for (Integer duration : RULE_CACHE_MAP.keySet()) {
//先清除掉那些在RULE_CACHE_MAP里存的,但是rule里已没有的
if (!durationSet.contains(duration)) {
RULE_CACHE_MAP.remove(duration);
}
}
//遍历所有的规则
for (KeyRule keyRule : keyRules) {
int duration = keyRule.getDuration();
if (RULE_CACHE_MAP.get(duration) == null) {
LocalCache cache = CacheFactory.build(duration);
RULE_CACHE_MAP.put(duration, cache);
}
}
}
}
回到原有流程,getValueSimple方法的链路比较长,主要是通过key的规则,获取到对应的duration,然后从对应duration的本地缓存中获取ValueModel。
/**
* 仅获取value,如果不存在也不上报热key
*/
static ValueModel getValueSimple(String key) {
Object object = getCache(key).get(key);
if (object == null) {
return null;
}
return (ValueModel) object;
}
private static LocalCache getCache(String key) {
return CacheFactory.getNonNullCache(key);
}
public static LocalCache getNonNullCache(String key) {
LocalCache localCache = getCache(key);
if (localCache == null) {
return DEFAULT_CACHE;
}
return localCache;
}
public static LocalCache getCache(String key) {
return KeyRuleHolder.findByKey(key);
}
public static LocalCache findByKey(String key) {
if (StrUtil.isEmpty(key)) {
return null;
}
KeyRule keyRule = findRule(key);
if (keyRule == null) {
return null;
}
return RULE_CACHE_MAP.get(keyRule.getDuration());
}
接下来是HotKeyPusher.push,如果是remove则在etcd创建一个节点然后再删除,达到集群删除的效果。如果是探测并且key在规则内,则调用KeyHandlerFactory.getCollector().collect进行统计。
public static void push(String key, KeyType keyType) {
push(key, keyType, 1, false);
}
public static void push(String key, KeyType keyType, int count, boolean remove) {
if (count <= 0) {
count = 1;
}
if (keyType == null) {
keyType = KeyType.REDIS_KEY;
}
if (key == null) {
return;
}
HotKeyModel hotKeyModel = new HotKeyModel();
hotKeyModel.setAppName(Context.APP_NAME);
hotKeyModel.setKeyType(keyType);
hotKeyModel.setCount(count);
hotKeyModel.setRemove(remove);
hotKeyModel.setKey(key);
if (remove) {
//如果是删除key,就直接发到etcd去,不用做聚合。但是有点问题现在,这个删除只能删手工添加的key,不能删worker探测出来的
//因为各个client都在监听手工添加的那个path,没监听自动探测的path。所以如果手工的那个path下,没有该key,那么是删除不了的。
//删不了,就达不到集群监听删除事件的效果,怎么办呢?可以通过新增的方式,新增一个热key,然后删除它
EtcdConfigFactory.configCenter().putAndGrant(HotKeyPathTool.keyPath(hotKeyModel), Constant.DEFAULT_DELETE_VALUE, 1);
EtcdConfigFactory.configCenter().delete(HotKeyPathTool.keyPath(hotKeyModel));
//也删worker探测的目录
EtcdConfigFactory.configCenter().delete(HotKeyPathTool.keyRecordPath(hotKeyModel));
} else {
//如果key是规则内的要被探测的key,就积累等待传送
if (KeyRuleHolder.isKeyInRule(key)) {
//积攒起来,等待每半秒发送一次
KeyHandlerFactory.getCollector().collect(hotKeyModel);
}
}
}
KeyHandlerFactory.getCollector().collect方法交替使用两个map,对count进行累加,这样清理map的时候就不需要停顿了,交替使用是避免停顿的有效方式。
@Override
public void collect(HotKeyModel hotKeyModel) {
String key = hotKeyModel.getKey();
if (StrUtil.isEmpty(key)) {
return;
}
if (atomicLong.get() % 2 == 0) {
//不存在时返回null并将key-value放入,已有相同key时,返回该key对应的value,并且不覆盖
HotKeyModel model = map0.putIfAbsent(key, hotKeyModel);
if (model != null) {
model.setCount(model.getCount() + hotKeyModel.getCount());
}
} else {
HotKeyModel model = map1.putIfAbsent(key, hotKeyModel);
if (model != null) {
model.setCount(model.getCount() + hotKeyModel.getCount());
}
}
}
接回上文,还有一个 KeyHandlerFactory.getCounter().collect收集的是规则的访问次数,也是取到对应的规则,然后对规则的访问总数、热次数进行累加。
@Override
public void collect(KeyHotModel keyHotModel) {
if (atomicLong.get() % 2 == 0) {
put(keyHotModel.getKey(), keyHotModel.isHot(), HIT_MAP_0);
} else {
put(keyHotModel.getKey(), keyHotModel.isHot(), HIT_MAP_1);
}
}
public void put(String key, boolean isHot, ConcurrentHashMap map) {
//如key是pin_的前缀,则存储pin_
String rule = KeyRuleHolder.rule(key);
//不在规则内的不处理
if (StrUtil.isEmpty(rule)) {
return;
}
String nowTime = nowTime();
//rule + 分隔符 + 2020-10-23 21:11:22
String mapKey = rule + Constant.COUNT_DELIMITER + nowTime;
//该方法线程安全
HitCount hitCount = map.computeIfAbsent(mapKey, v -> new HitCount());
if (isHot) {
hitCount.hotHitCount.incrementAndGet();
}
hitCount.totalHitCount.incrementAndGet();
}
两个指标的收集已经分析完毕,那怎么发送到worker呢?来到PushSchedulerStarter,这里会启动对两个指标的定时线程池,分别会定时调用NettyKeyPusher的send和sendCount方法。
/**
* 每0.5秒推送一次待测key
*/
public static void startPusher(Long period) {
if (period == null || period <= 0) {
period = 500L;
}
ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
scheduledExecutorService.scheduleAtFixedRate(() -> {
IKeyCollector collectHK = KeyHandlerFactory.getCollector();
KeyHandlerFactory.getPusher().send(Context.APP_NAME, collectHK.lockAndGetResult());
collectHK.finishOnce();
},0, period, TimeUnit.MILLISECONDS);
}
/**
* 每10秒推送一次数量统计
*/
public static void startCountPusher(Integer period) {
if (period == null || period <= 0) {
period = 10;
}
ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
scheduledExecutorService.scheduleAtFixedRate(() -> {
IKeyCollector collectHK = KeyHandlerFactory.getCounter();
KeyHandlerFactory.getPusher().sendCount(Context.APP_NAME, collectHK.lockAndGetResult());
collectHK.finishOnce();
},0, period, TimeUnit.SECONDS);
}
NettyKeyPusher的send和sendCount方法都是为统计数据选择对应的worker然后进行请求,chooseChannel就是根据key哈希到其中一个worker上,然后发送请求即可。
@Override
public void send(String appName, List list) {
//积攒了半秒的key集合,按照hash分发到不同的worker
long now = System.currentTimeMillis();
Map> map = new HashMap<>();
for(HotKeyModel model : list) {
model.setCreateTime(now);
Channel channel = WorkerInfoHolder.chooseChannel(model.getKey());
if (channel == null) {
continue;
}
List newList = map.computeIfAbsent(channel, k -> new ArrayList<>());
newList.add(model);
}
for (Channel channel : map.keySet()) {
try {
List batch = map.get(channel);
channel.writeAndFlush(MsgBuilder.buildByteBuf(new HotKeyMsg(MessageType.REQUEST_NEW_KEY, FastJsonUtils.convertObjectToJSON(batch)))).sync();
} catch (Exception e) {
try {
InetSocketAddress insocket = (InetSocketAddress) channel.remoteAddress();
JdLogger.error(getClass(),"flush error " + insocket.getAddress().getHostAddress());
} catch (Exception ex) {
JdLogger.error(getClass(),"flush error");
}
}
}
}
@Override
public void sendCount(String appName, List list) {
//积攒了10秒的数量,按照hash分发到不同的worker
long now = System.currentTimeMillis();
Map> map = new HashMap<>();
for(KeyCountModel model : list) {
model.setCreateTime(now);
Channel channel = WorkerInfoHolder.chooseChannel(model.getRuleKey());
if (channel == null) {
continue;
}
List newList = map.computeIfAbsent(channel, k -> new ArrayList<>());
newList.add(model);
}
for (Channel channel : map.keySet()) {
try {
List batch = map.get(channel);
channel.writeAndFlush(MsgBuilder.buildByteBuf(new HotKeyMsg(Context.APP_NAME,
MessageType.REQUEST_HIT_COUNT, FastJsonUtils.convertObjectToJSON(batch)))).sync();
} catch (Exception e) {
try {
InetSocketAddress insocket = (InetSocketAddress) channel.remoteAddress();
JdLogger.error(getClass(),"flush error " + insocket.getAddress().getHostAddress());
} catch (Exception ex) {
JdLogger.error(getClass(),"flush error");
}
}
}
}
public static Channel chooseChannel(String key) {
if (StrUtil.isEmpty(key) || WORKER_HOLDER.size() == 0) {
return null;
}
int index = Math.abs(key.hashCode() % WORKER_HOLDER.size());
return WORKER_HOLDER.get(index).channel;
}
最后当worker统计到hotkey时,client需要接收worker推送过来的hotkey进行存储,可以看到NettyClientHandler会向EventBus发送ReceiveNewKeyEvent事件,ReceiveNewKeyListener收到此事件后将调用receiveNewKeyListener.newKey,将hotkey放到本地缓存,client端的处理流程就结束了。
@Override
protected void channelRead0(ChannelHandlerContext channelHandlerContext, String message) {
HotKeyMsg msg = FastJsonUtils.toBean(message, HotKeyMsg.class);
if (MessageType.PONG == msg.getMessageType()) {
JdLogger.info(getClass(), "heart beat");
return;
}
if (MessageType.RESPONSE_NEW_KEY == msg.getMessageType()) {
JdLogger.info(getClass(), "receive new key : " + msg);
HotKeyModel model = FastJsonUtils.toBean(msg.getBody(), HotKeyModel.class);
EventBusCenter.getInstance().post(new ReceiveNewKeyEvent(model));
}
}
@Subscribe
public void newKeyComing(ReceiveNewKeyEvent event) {
HotKeyModel hotKeyModel = event.getModel();
if (hotKeyModel == null) {
return;
}
//收到新key推送
if (receiveNewKeyListener != null) {
receiveNewKeyListener.newKey(hotKeyModel);
}
}
@Override
public void newKey(HotKeyModel hotKeyModel) {
long now = System.currentTimeMillis();
//如果key到达时已经过去1秒了,记录一下。手工删除key时,没有CreateTime
if (hotKeyModel.getCreateTime() != 0 && Math.abs(now - hotKeyModel.getCreateTime()) > 1000) {
JdLogger.warn(getClass(), "the key comes too late : " + hotKeyModel.getKey() + " now " +
+now + " keyCreateAt " + hotKeyModel.getCreateTime());
}
if (hotKeyModel.isRemove()) {
//如果是删除事件,就直接删除
deleteKey(hotKeyModel.getKey());
return;
}
//已经是热key了,又推过来同样的热key,做个日志记录,并刷新一下
if (JdHotKeyStore.isHot(hotKeyModel.getKey())) {
JdLogger.warn(getClass(), "receive repeat hot key :" + hotKeyModel.getKey() + " at " + now);
}
addKey(hotKeyModel.getKey());
}
private void addKey(String key) {
ValueModel valueModel = ValueModel.defaultValue(key);
if (valueModel == null) {
//不符合任何规则
deleteKey(key);
return;
}
//如果原来该key已经存在了,那么value就被重置,过期时间也会被重置。如果原来不存在,就新增的热key
JdHotKeyStore.setValueDirectly(key, valueModel);
}
private void deleteKey(String key) {
CacheFactory.getNonNullCache(key).delete(key);
}
static void setValueDirectly(String key, Object value) {
getCache(key).set(key, value);
}
worker端
由上文可知,client与worker的交互只有推送统计数据到worker,worker接收处理,最后推送hotkey到client。因此worker端只需要分析两个部分:统计数据汇总、推送hotkey。
首先看到HotKey的处理逻辑是在HotKeyFilter中,首先会对totalReceiveKeyCount进行累加,然后调用publishMsg,如果统计信息超时1秒或者在白名单中就不处理,否则继续调用keyProducer.push。
@Override
public boolean chain(HotKeyMsg message, ChannelHandlerContext ctx) {
if (MessageType.REQUEST_NEW_KEY == message.getMessageType()) {
totalReceiveKeyCount.incrementAndGet();
publishMsg(message.getBody(), ctx);
return false;
}
return true;
}
private void publishMsg(String message, ChannelHandlerContext ctx) {
//老版的用的单个HotKeyModel,新版用的数组
List models = FastJsonUtils.toList(message, HotKeyModel.class);
long now = SystemClock.now();
for (HotKeyModel model : models) {
//白名单key不处理
if (WhiteListHolder.contains(model.getKey())) {
continue;
}
long timeOut = now - model.getCreateTime();
if (timeOut > 1000) {
logger.info("key timeout " + timeOut + ", from ip : " + NettyIpUtil.clientIp(ctx));
}
keyProducer.push(model, now);
}
}
keyProducer.push将未过时的统计信息丢进queue中。
public void push(HotKeyModel model, long now) {
if (model == null || model.getKey() == null) {
return;
}
//5秒前的过时消息就不处理了
if (now - model.getCreateTime() > InitConstant.timeOut) {
expireTotalCount.increment();
return;
}
try {
QUEUE.put(model);
totalOfferCount.increment();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
worker端会开启指定数量的KeyConsumer,不断消费queue中的统计数据。根据统计数据的类型调用KeyListener的removeKey和newKey。
@Bean
public Consumer consumer() {
int nowCount = CpuNum.workerCount();
//将实际值赋给static变量
if (threadCount != 0) {
nowCount = threadCount;
} else {
if (nowCount >= 8) {
nowCount = nowCount / 2;
}
}
List consumerList = new ArrayList<>();
for (int i = 0; i < nowCount; i++) {
KeyConsumer keyConsumer = new KeyConsumer();
keyConsumer.setKeyListener(iKeyListener);
consumerList.add(keyConsumer);
threadPoolExecutor.submit(keyConsumer::beginConsume);
}
return new Consumer(consumerList);
}
}
public void beginConsume() {
while (true) {
try {
HotKeyModel model = QUEUE.take();
if (model.isRemove()) {
iKeyListener.removeKey(model, KeyEventOriginal.CLIENT);
} else {
iKeyListener.newKey(model, KeyEventOriginal.CLIENT);
}
//处理完毕,将数量加1
totalDealCount.increment();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
KeyListener的removeKey和newKey方法对Cache中的滑动窗口SlidingWindow进行删除或者累加,删除或者达到一定访问数就会推送到根据appname选出所有client进行推送。
@Override
public void newKey(HotKeyModel hotKeyModel, KeyEventOriginal original) {
//cache里的key
String key = buildKey(hotKeyModel);
//判断是不是刚热不久
Object o = hotCache.getIfPresent(key);
if (o != null) {
return;
}
//********** watch here ************//
//该方法会被InitConstant.threadCount个线程同时调用,存在多线程问题
//下面的那句addCount是加了锁的,代表给Key累加数量时是原子性的,不会发生多加、少加的情况,到了设定的阈值一定会hot
//譬如阈值是2,如果多个线程累加,在没hot前,hot的状态肯定是对的,譬如thread1 加1,thread2加1,那么thread2会hot返回true,开启推送
//但是极端情况下,譬如阈值是10,当前是9,thread1走到这里时,加1,返回true,thread2也走到这里,加1,此时是11,返回true,问题来了
//该key会走下面的else两次,也就是2次推送。
//所以出现问题的原因是hotCache.getIfPresent(key)这一句在并发情况下,没return掉,放了两个key+1到addCount这一步时,会有问题
//测试代码在TestBlockQueue类,直接运行可以看到会同时hot
//那么该问题用解决吗,NO,不需要解决,1 首先要发生的条件极其苛刻,很难触发,以京东这样高的并发量,线上我也没见过触发连续2次推送同一个key的
//2 即便触发了,后果也是可以接受的,2次推送而已,毫无影响,客户端无感知。但是如果非要解决,就要对slidingWindow实例加锁了,必然有一些开销
//所以只要保证key数量不多计算就可以,少计算了没事。因为热key必然频率高,漏计几次没事。但非热key,多计算了,被干成了热key就不对了
SlidingWindow slidingWindow = checkWindow(hotKeyModel, key);
//看看hot没
boolean hot = slidingWindow.addCount(hotKeyModel.getCount());
if (!hot) {
//如果没hot,重新put,cache会自动刷新过期时间
CaffeineCacheHolder.getCache(hotKeyModel.getAppName()).put(key, slidingWindow);
} else {
hotCache.put(key, 1);
//删掉该key
CaffeineCacheHolder.getCache(hotKeyModel.getAppName()).invalidate(key);
//开启推送
hotKeyModel.setCreateTime(SystemClock.now());
logger.info(NEW_KEY_EVENT + hotKeyModel.getKey());
//分别推送到各client和etcd
for (IPusher pusher : iPushers) {
pusher.push(hotKeyModel);
}
}
}
@Override
public void removeKey(HotKeyModel hotKeyModel, KeyEventOriginal original) {
//cache里的key
String key = buildKey(hotKeyModel);
hotCache.invalidate(key);
CaffeineCacheHolder.getCache(hotKeyModel.getAppName()).invalidate(key);
//推送所有client删除
hotKeyModel.setCreateTime(SystemClock.now());
logger.info(DELETE_KEY_EVENT + hotKeyModel.getKey());
for (IPusher pusher : iPushers) {
pusher.remove(hotKeyModel);
}
}
/**
* 生成或返回该key的滑窗
*/
private SlidingWindow checkWindow(HotKeyModel hotKeyModel, String key) {
//取该key的滑窗
return (SlidingWindow) CaffeineCacheHolder.getCache(hotKeyModel.getAppName()).get(key, (Function) s -> {
//是个新key,获取它的规则
KeyRule keyRule = KeyRuleHolder.getRuleByAppAndKey(hotKeyModel);
return new SlidingWindow(keyRule.getInterval(), keyRule.getThreshold());
});
}
最后总结
京东的hotkey处理是通过计数来动态判断是否为hotkey,然后缓存再本地内存中,做到毫秒级的scale out。那还有没有其他解决方案?下面是我的观点:
1.如果面对一些缓存key很少的场景,比如活动页信息(同时进行的活动页不可能超过1000),完全就可以直接将缓存放在本地内存中,到了刷新时间就从redis拉取最新缓存即可,不需要动态计算hotkey。也就是常见的多级缓存。
2.同样是动态判断hotkey,但会将hotkey迁移到专门的、更多节点、更高性能的hotkey redis集群中,集群中每个节点都有同一个hotkey缓存,这样就可以做到请求的分散,避免流量都流向同一个redis节点,判断是hotkey就去hotkey集群中取,不需要存在本地内存中了,维护起来会比较简单。