- 1 Overview
- 2 Configuration
- 3 How It Works
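1 Overview
Anyone who has looked into Java NIO has probably heard that a bug in Selector.select can make select return early even though no event is ready. Because select is normally called inside a loop, the early returns make the loop spin and burn CPU.
Netty accounts for this in NioEventLoop. When select returns abnormally (the Netty source comments call this "prematurely") too many times in a row, Netty creates a new Selector to work around the bug.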
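To make the busy loop concrete, here is a minimal sketch of the kind of select loop described above. It uses only JDK NIO. The class name SelectLoopSketch and the method runLoop are made up for illustration and are not part of Netty. Every iteration that comes back with zero ready keys does no useful work, so if select keeps returning immediately instead of blocking, the loop spins.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

// Hypothetical illustration, not Netty code.
final class SelectLoopSketch {

    // Runs a bounded select loop and returns how many iterations
    // came back with no ready keys.
    static int runLoop(Selector selector, long timeoutMillis, int maxIterations) throws IOException {
        int emptyReturns = 0;
        for (int i = 0; i < maxIterations; i++) {
            int ready = selector.select(timeoutMillis);
            if (ready == 0) {
                // Nothing to do. With a healthy selector this happens only after
                // the timeout or a wakeup; with the bug it happens immediately.
                emptyReturns++;
                continue;
            }
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                it.next();   // a real loop would dispatch on the key's ready ops here
                it.remove();
            }
        }
        return emptyReturns;
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open()) {
            System.out.println("empty returns: " + runLoop(selector, 1L, 3));
        }
    }
}
```

The loop is bounded by maxIterations only so that the sketch terminates; a real event loop runs until shutdown.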
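2 Configuration
Netty exposes the system property io.netty.selectorAutoRebuildThreshold, which lets users set how many consecutive premature select returns are tolerated. Once the count reaches this threshold, the Selector is rebuilt automatically. The default is 512.
Note that if io.netty.selectorAutoRebuildThreshold is set to a value less than 3, Netty treats the feature as disabled.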
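The rule for values below 3 can be sketched as follows. This is an approximation under stated assumptions: the class RebuildThresholdSketch and its methods are hypothetical, and the property is read with the JDK's Integer.getInteger rather than whatever utility Netty uses internally. Only the property name, the default of 512, and the minimum of 3 come from the text above.

```java
// Hypothetical illustration, not Netty code.
final class RebuildThresholdSketch {
    static final String PROPERTY = "io.netty.selectorAutoRebuildThreshold";
    static final int DEFAULT_THRESHOLD = 512;
    static final int MIN_PREMATURE_SELECTOR_RETURNS = 3;

    // Values below the minimum collapse to 0, which means "never rebuild".
    static int effectiveThreshold(int configured) {
        return configured < MIN_PREMATURE_SELECTOR_RETURNS ? 0 : configured;
    }

    // Reads the system property with a JDK helper (an assumption of this sketch).
    static int readEffectiveThreshold() {
        return effectiveThreshold(Integer.getInteger(PROPERTY, DEFAULT_THRESHOLD));
    }

    public static void main(String[] args) {
        System.out.println(readEffectiveThreshold());
    }
}
```

As with any JVM system property, the value would typically be supplied on the command line, for example -Dio.netty.selectorAutoRebuildThreshold=1024, so that it is in place before the Netty classes that read it are loaded.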
3 How It Works
Netty's logic for detecting and handling premature returns from Selector.select lives mainly in the NioEventLoop.select method:
// NioEventLoop
private void select(boolean oldWakenUp) throws IOException {
    Selector selector = this.selector;
    try {
        // Reset the counter to 0.
        int selectCnt = 0;
        long currentTimeNanos = System.nanoTime();
        // Derive the deadline for this select call from the registered scheduled tasks.
        long selectDeadLineNanos = currentTimeNanos + delayNanos(currentTimeNanos);
        for (;;) {
            // Recompute the remaining blocking time on every iteration
            // (rounded to the nearest millisecond).
            long timeoutMillis = (selectDeadLineNanos - currentTimeNanos + 500000L) / 1000000L;
            // A non-positive timeout means a scheduled task is about to become due.
            // If this is the first iteration (selectCnt == 0), call selector.selectNow()
            // once, then leave the loop and return. The selectNow call is there to
            // pick up any network events that are already ready so they can be processed.
            if (timeoutMillis <= 0) {
                if (selectCnt == 0) {
                    selector.selectNow();
                    selectCnt = 1;
                }
                break;
            }
            // If a task was submitted when wakenUp value was true, the task didn't get a chance to call
            // Selector#wakeup. So we need to check task queue again before executing select operation.
            // If we don't, the task might be pended until select operation was timed out.
            // It might be pended until idle timeout if IdleStateHandler existed in pipeline.
            // No scheduled task is due yet, but tasks are already pending in the task queue
            // (not necessarily scheduled tasks). If wakenUp is successfully set to true,
            // call selectNow and return.
            if (hasTasks() && wakenUp.compareAndSet(false, true)) {
                selector.selectNow();
                selectCnt = 1;
                break;
            }
            // Call select, blocking for at most the time computed above, that is,
            // until the nearest scheduled task becomes due.
            int selectedKeys = selector.select(timeoutMillis);
            // Increment the counter.
            selectCnt ++;
            // selectedKeys != 0: some keys are ready, so this was a normal return
            //   caused by real events.
            // oldWakenUp: the selector had already been woken up elsewhere when
            //   this method was entered.
            // wakenUp.get(): the selector was woken up during this call.
            // hasTasks() || hasScheduledTasks(): there are tasks or scheduled tasks to run.
            // If any of these holds, return right away.
            if (selectedKeys != 0 || oldWakenUp || wakenUp.get() || hasTasks() || hasScheduledTasks()) {
                // - Selected something,
                // - waken up by user, or
                // - the task queue has a pending task.
                // - a scheduled task is ready for processing
                break;
            }
            // If the thread was interrupted, reset the counter to 1 and return.
            if (Thread.interrupted()) {
                // Thread was interrupted so reset selected keys and break so we not run into a busy loop.
                // As this is most likely a bug in the handler of the user or it's client library we will
                // also log it.
                //
                // See https://github.com/netty/netty/issues/2426
                if (logger.isDebugEnabled()) {
                    logger.debug("Selector.select() returned prematurely because " +
                            "Thread.currentThread().interrupt() was called. Use " +
                            "NioEventLoop.shutdownGracefully() to shutdown the NioEventLoop.");
                }
                selectCnt = 1;
                break;
            }
            // Check whether select returned because the computed timeout has elapsed.
            // That also counts as a normal return: reset the counter to 1
            // and go into the next iteration.
            long time = System.nanoTime();
            if (time - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= currentTimeNanos) {
                // timeoutMillis elapsed without anything selected.
                selectCnt = 1;
            } else if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0 &&
                    selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) {
                // Reaching this branch means the select bug workaround is enabled
                // (io.netty.selectorAutoRebuildThreshold is at least 3) and the number
                // of consecutive premature returns has reached the configured
                // threshold, so the selector is rebuilt.
                // The selector returned prematurely many times in a row.
                // Rebuild the selector to work around the problem.
                logger.warn(
                        "Selector.select() returned prematurely {} times in a row; rebuilding Selector {}.",
                        selectCnt, selector);
                // Rebuild the selector.
                rebuildSelector();
                selector = this.selector;
                // After the rebuild, call the non-blocking select once and return.
                // Select again to populate selectedKeys.
                selector.selectNow();
                selectCnt = 1;
                break;
            }
            currentTimeNanos = time;
        }
        // Reached once the loop has exited. If select returned prematurely more than
        // MIN_PREMATURE_SELECTOR_RETURNS times without a rebuild being triggered
        // (for example because the workaround is disabled), just log it at debug
        // level to help with troubleshooting.
        if (selectCnt > MIN_PREMATURE_SELECTOR_RETURNS) {
            if (logger.isDebugEnabled()) {
                logger.debug("Selector.select() returned prematurely {} times in a row for Selector {}.",
                        selectCnt - 1, selector);
            }
        }
    } catch (CancelledKeyException e) {
        if (logger.isDebugEnabled()) {
            logger.debug(CancelledKeyException.class.getSimpleName() + " raised by a Selector {} - JDK bug?",
                    selector, e);
        }
        // Harmless exception - log anyway
    }
}
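The counting rule in the listing above can be isolated into a small model, which may make it easier to follow. This is a sketch, not Netty's implementation: the class PrematureReturnSketch and its methods are hypothetical names, and it only models the case where select selected nothing and was not woken up. It also includes the millisecond rounding used for timeoutMillis.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not Netty code.
final class PrematureReturnSketch {
    private final int rebuildThreshold; // 0 disables rebuilding
    private int selectCnt;

    PrematureReturnSketch(int rebuildThreshold) {
        this.rebuildThreshold = rebuildThreshold;
    }

    // Same arithmetic as the listing: adding 0.5 ms before the integer
    // division rounds the remaining time to the nearest millisecond.
    static long timeoutMillis(long deadlineNanos, long nowNanos) {
        return (deadlineNanos - nowNanos + 500000L) / 1000000L;
    }

    // Records one select() return that selected nothing and was not caused
    // by a wakeup or a pending task. Returns true when the caller should
    // rebuild the selector.
    boolean onEmptyReturn(long selectStartNanos, long nowNanos, long timeoutMillis) {
        selectCnt++;
        if (nowNanos - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= selectStartNanos) {
            // The full timeout elapsed, so this was a normal return.
            selectCnt = 1;
            return false;
        }
        if (rebuildThreshold > 0 && selectCnt >= rebuildThreshold) {
            // Too many premature returns in a row.
            selectCnt = 1;
            return true;
        }
        return false;
    }

    int count() {
        return selectCnt;
    }
}
```

With a threshold of 3, three premature returns in a row make onEmptyReturn report that a rebuild is due, while a return that happens only after the timeout has elapsed resets the count. With a threshold of 0 the count simply keeps growing, which is the situation the debug log at the end of the listing reports.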
The source of the rebuildSelector method called above is as follows:
// NioEventLoop
/**
 * Replaces the current {@link Selector} of this event loop with newly created {@link Selector}s to work
 * around the infamous epoll 100% CPU bug.
 */
public void rebuildSelector() {
    // If the caller is not on the event loop thread, submit the rebuild to the task queue.
    if (!inEventLoop()) {
        execute(new Runnable() {
            @Override
            public void run() {
                rebuildSelector0();
            }
        });
        return;
    }
    // Otherwise we are already on the event loop thread, so call the actual rebuild method directly.
    rebuildSelector0();
}
private void rebuildSelector0() {
    final Selector oldSelector = selector;
    final SelectorTuple newSelectorTuple;
    // If there is no old selector, return immediately.
    if (oldSelector == null) {
        return;
    }
    try {
        // Create a new selector.
        newSelectorTuple = openSelector();
    } catch (Exception e) {
        logger.warn("Failed to create a new Selector.", e);
        return;
    }
    // Re-register every key that was registered on the old selector
    // with the newly created selector, one by one.
    // Register all channels to the new Selector.
    int nChannels = 0;
    for (SelectionKey key: oldSelector.keys()) {
        Object a = key.attachment();
        try {
            if (!key.isValid() || key.channel().keyFor(newSelectorTuple.unwrappedSelector) != null) {
                continue;
            }
            int interestOps = key.interestOps();
            key.cancel();
            SelectionKey newKey = key.channel().register(newSelectorTuple.unwrappedSelector, interestOps, a);
            if (a instanceof AbstractNioChannel) {
                // Update SelectionKey
                ((AbstractNioChannel) a).selectionKey = newKey;
            }
            nChannels ++;
        } catch (Exception e) {
            logger.warn("Failed to re-register a Channel to the new Selector.", e);
            if (a instanceof AbstractNioChannel) {
                AbstractNioChannel ch = (AbstractNioChannel) a;
                ch.unsafe().close(ch.unsafe().voidPromise());
            } else {
                @SuppressWarnings("unchecked")
                NioTask<SelectableChannel> task = (NioTask<SelectableChannel>) a;
                invokeChannelUnregistered(task, key, e);
            }
        }
    }
    // Point this NioEventLoop at the newly created selector.
    selector = newSelectorTuple.selector;
    unwrappedSelector = newSelectorTuple.unwrappedSelector;
    try {
        // Close the old selector.
        // time to close the old selector as everything else is registered to the new one
        oldSelector.close();
    } catch (Throwable t) {
        if (logger.isWarnEnabled()) {
            logger.warn("Failed to close the old Selector.", t);
        }
    }
    logger.info("Migrated " + nChannels + " channel(s) to the new Selector.");
}
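The core of the rebuild, moving every registration from the old selector to a new one, can be tried out with plain JDK NIO. The following is a minimal sketch under stated assumptions: the class SelectorMigrationSketch and the method migrate are hypothetical, a Pipe stands in for real network channels, and the per-key error handling and SelectorTuple wrapping from the Netty listing are left out.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical illustration, not Netty code.
final class SelectorMigrationSketch {

    // Re-registers every valid key of oldSelector on a fresh selector,
    // keeping interest ops and attachment, then closes the old selector.
    static Selector migrate(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();
        for (SelectionKey key : oldSelector.keys()) {
            if (!key.isValid() || key.channel().keyFor(newSelector) != null) {
                continue; // already cancelled, or already moved
            }
            int interestOps = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();
            key.channel().register(newSelector, interestOps, attachment);
        }
        oldSelector.close();
        return newSelector;
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        Selector oldSelector = Selector.open();
        pipe.source().register(oldSelector, SelectionKey.OP_READ, "demo-attachment");

        Selector newSelector = migrate(oldSelector);
        System.out.println("keys on new selector: " + newSelector.keys().size());

        newSelector.close();
        pipe.source().close();
        pipe.sink().close();
    }
}
```

Cancelling a key while iterating over keys() is safe here because cancel only marks the key for removal; the key set itself is updated during a later selection operation or when the selector is closed.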