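The previous section covered what NioEventLoop does before it starts and ended at a method called run. If you don't remember it, revisit the previous section, because this run method is the entry point for this article and the related ones.
Here again are the three core calls inside run: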
io.netty.channel.nio.NioEventLoop#run
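```java
// ...
// check for I/O events (the "select" step)
select(wakenUp.getAndSet(false));
// ...
// process the I/O events that select found above
processSelectedKeys();
// ...
// run queued tasks, within a time budget derived from ioTime and ioRatio
runAllTasks(ioTime * (100 - ioRatio) / ioRatio);
// ...
```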
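The time budget handed to runAllTasks is plain arithmetic, so it is easy to check by hand. A minimal sketch, assuming ioTime is the number of nanoseconds just spent in processSelectedKeys() and ioRatio is the percentage of loop time intended for I/O (50 by default in NioEventLoop). The class and method names are my own; the sketch only reproduces the expression above.

```java
public class IoRatioBudget {
    // Reproduces the argument passed to runAllTasks(...) in the excerpt:
    // ioTime * (100 - ioRatio) / ioRatio
    static long taskTimeBudgetNanos(long ioTimeNanos, int ioRatio) {
        if (ioRatio <= 0 || ioRatio > 100) {
            throw new IllegalArgumentException("ioRatio must be in (0, 100]: " + ioRatio);
        }
        return ioTimeNanos * (100 - ioRatio) / ioRatio;
    }

    public static void main(String[] args) {
        long ioTime = 1_000_000L; // pretend processSelectedKeys() took 1 ms
        System.out.println(taskTimeBudgetNanos(ioTime, 50)); // 1000000 (same time as I/O)
        System.out.println(taskTimeBudgetNanos(ioTime, 80)); // 250000 (a quarter of the I/O time)
    }
}
```

With ioRatio = 50, tasks get as much time as I/O just took; raising ioRatio shrinks the task budget. ioRatio = 100 would give 0 here, but as far as I know Netty handles 100 in a separate branch that runs all tasks without a time limit (not shown in this excerpt).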
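This article traces the select method. Besides its normal logic, we will see how Netty works around the JDK empty-polling bug, in which Selector.select() keeps returning immediately with nothing selected and the event loop spins.
A note on my informal terminology: "select operation" and "checking for I/O events" mean the same thing here.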
Netty Version
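Besides ordinary tasks, Netty also has scheduled tasks. Before they run, scheduled tasks are stored in a scheduled task queue whose elements are ordered by deadline (this section verifies that).
Note also that the task queues mentioned in earlier articles were ordinary task queues, not this scheduled task queue; this section takes a brief look at the latter.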
First, switch back to io.netty.channel.nio.NioEventLoop#run, which is the starting point. If you have forgotten how we got here, see the recap; let's get started.
First, find this line:

```java
select(wakenUp.getAndSet(false));
```
Step into the select method. The code is long, so it is pasted below in pieces; call this spot [Checkpoint 1]:
io.netty.channel.nio.NioEventLoop#select
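```java
Selector selector = this.selector;
try {
    int selectCnt = 0;
    long currentTimeNanos = System.nanoTime();
    // deadline = now + time until the earliest scheduled task (or a default interval if none)
    long selectDeadLineNanos = currentTimeNanos + delayNanos(currentTimeNanos);
    for (;;) {
        // remaining time, rounded to the nearest millisecond
        long timeoutMillis = (selectDeadLineNanos - currentTimeNanos + 500000L) / 1000000L;
        if (timeoutMillis <= 0) {
            // deadline reached (or less than 0.5 ms away): do not block
            if (selectCnt == 0) {
                selector.selectNow();
                selectCnt = 1;
            }
            break;
        }
        // ... (omitted)
```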
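The expression for timeoutMillis adds 500000 ns (0.5 ms) before dividing, so the remaining time is rounded to the nearest millisecond rather than truncated. A small sketch that evaluates the same expression for a few remaining times (class and method names are my own) shows why anything under 0.5 ms ends up in the selectNow() branch:

```java
public class SelectTimeoutSketch {
    // Same arithmetic as the excerpt: add 0.5 ms, then truncate to whole milliseconds.
    static long timeoutMillis(long selectDeadLineNanos, long currentTimeNanos) {
        return (selectDeadLineNanos - currentTimeNanos + 500000L) / 1000000L;
    }

    public static void main(String[] args) {
        long now = 0L;
        long[] remaining = {400_000L, 1_499_999L, 1_500_000L, 1_000_000_000L};
        for (long r : remaining) {
            long t = timeoutMillis(now + r, now);
            // mirrors the branch in the excerpt: <= 0 means do not block
            String action = t <= 0 ? "selectNow()" : "select(" + t + ")";
            System.out.println(r + " ns left -> " + action);
        }
        // 400000 -> selectNow(), 1499999 -> select(1), 1500000 -> select(2), 1000000000 -> select(1000)
    }
}
```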
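Regarding selectNow() being called non-blocking: even after reading the source I still don't quite understand it, because when I trace into the JDK source I still end up at synchronized blocks. Perhaps it means the synchronized there never escalates to a heavyweight lock.
Here is the original Javadoc wording; my English may be poor, but not so poor that I would mistranslate "non-blocking": This method performs a non-blocking selection operation.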
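For what it's worth, my reading of the Selector Javadoc is that "non-blocking" refers to waiting for I/O readiness, not to locks: select() and select(timeout) wait until a channel is ready, the timeout expires, or wakeup() is called, whereas selectNow() does not wait at all and returns immediately, possibly with 0. The JDK implementation may still synchronize internally. The small program below (class and method names are my own) calls selectNow() on a Selector with nothing registered and shows that it returns 0 right away.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

public class SelectNowSketch {
    // Opens a Selector with no channels registered and calls selectNow() on it.
    // selectNow() does not wait for readiness: it returns at once with the number of ready keys.
    static int selectNowOnEmptySelector() {
        try (Selector selector = Selector.open()) {
            long start = System.nanoTime();
            int ready = selector.selectNow();
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            System.out.println("ready=" + ready + ", elapsed=" + elapsedMicros + " us");
            return ready;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        selectNowOnEmptySelector();
        // By contrast, selector.select() with no registered channels would block
        // until another thread calls wakeup() or this thread is interrupted.
    }
}
```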
Before reading further, let's step into the delayNanos method:
io.netty.util.concurrent.SingleThreadEventExecutor#delayNanos
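```java
protected long delayNanos(long currentTimeNanos) {
    // look at (without removing) the scheduled task with the earliest deadline
    ScheduledFutureTask<?> scheduledTask = peekScheduledTask();
    if (scheduledTask == null) {
        // no scheduled task: fall back to the default purge interval
        return SCHEDULE_PURGE_INTERVAL;
    }
    // time left until that task's deadline
    return scheduledTask.delayNanos(currentTimeNanos);
}
```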
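A minimal sketch of what delayNanos computes, under two assumptions: a task's delay is its deadline minus the current time, clamped at zero, and SCHEDULE_PURGE_INTERVAL is one second (its value in the Netty 4.1 sources I have read). A PriorityQueue of plain long deadlines stands in for the scheduled task queue; the class name is my own.

```java
import java.util.PriorityQueue;
import java.util.concurrent.TimeUnit;

public class DelayNanosSketch {
    // Assumed to match Netty's SCHEDULE_PURGE_INTERVAL (one second).
    static final long SCHEDULE_PURGE_INTERVAL = TimeUnit.SECONDS.toNanos(1);

    // Simplification: the queue holds only deadlines (ns) instead of ScheduledFutureTask objects.
    static long delayNanos(PriorityQueue<Long> deadlines, long currentTimeNanos) {
        Long earliest = deadlines.peek(); // like peekScheduledTask(): inspect the head, don't remove it
        if (earliest == null) {
            return SCHEDULE_PURGE_INTERVAL; // no scheduled task: default interval
        }
        return Math.max(0L, earliest - currentTimeNanos); // already overdue -> 0
    }

    public static void main(String[] args) {
        PriorityQueue<Long> deadlines = new PriorityQueue<>();
        System.out.println(delayNanos(deadlines, 0L));         // 1000000000
        deadlines.add(5_000_000L);
        deadlines.add(2_000_000L);
        System.out.println(delayNanos(deadlines, 1_000_000L)); // 1000000
        System.out.println(delayNanos(deadlines, 3_000_000L)); // 0
    }
}
```

Under these assumptions, when there is no scheduled task select computes timeoutMillis = 1000, so it blocks for at most about one second before looping again.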
Now look at the ScheduledFutureTask class and find its compareTo method:
io.netty.util.concurrent.ScheduledFutureTask#compareTo
```java
public int compareTo(Delayed o) {
    if (this == o) {
        return 0;
    }
    ScheduledFutureTask<?> that = (ScheduledFutureTask<?>) o;
    long d = deadlineNanos() - that.deadlineNanos();
    if (d < 0) {
        return -1;          // earlier deadline sorts first
    } else if (d > 0) {
        return 1;
    } else if (id < that.id) {
        return -1;          // same deadline: smaller id sorts first
    } else if (id == that.id) {
        throw new Error();  // two distinct tasks should never share an id
    } else {
        return 1;
    }
}
```
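Because the queue is ordered by this compareTo, its head is always the task with the earliest deadline, which is why delayNanos only needs peekScheduledTask(). The sketch below copies the comparison logic into a stripped-down, hypothetical Task class (only an id, a deadline and a name; ids are assumed unique and increasing in scheduling order) and drains a java.util.PriorityQueue to show the resulting order.

```java
import java.util.PriorityQueue;
import java.util.StringJoiner;

public class DeadlineOrderingSketch {
    // Hypothetical, stripped-down task: the two fields compareTo looks at, plus a name.
    static final class Task implements Comparable<Task> {
        final long id;
        final long deadlineNanos;
        final String name;

        Task(long id, long deadlineNanos, String name) {
            this.id = id;
            this.deadlineNanos = deadlineNanos;
            this.name = name;
        }

        @Override
        public int compareTo(Task that) {
            if (this == that) {
                return 0;
            }
            long d = deadlineNanos - that.deadlineNanos;
            if (d < 0) {
                return -1;          // earlier deadline first
            } else if (d > 0) {
                return 1;
            } else if (id < that.id) {
                return -1;          // same deadline: lower id first
            } else if (id == that.id) {
                throw new Error();  // distinct tasks must not share an id
            } else {
                return 1;
            }
        }
    }

    // Adds the tasks to a PriorityQueue and returns their names in poll() order.
    static String drainOrder(Task... tasks) {
        PriorityQueue<Task> queue = new PriorityQueue<>();
        for (Task t : tasks) {
            queue.add(t);
        }
        StringJoiner order = new StringJoiner(",");
        while (!queue.isEmpty()) {
            order.add(queue.poll().name);
        }
        return order.toString();
    }

    public static void main(String[] args) {
        System.out.println(drainOrder(
                new Task(1, 300, "A"),
                new Task(2, 100, "B"),
                new Task(3, 100, "C"))); // B,C,A
    }
}
```

B and C share a deadline, so the id tie-break keeps them in scheduling order; A has the latest deadline and comes out last even though it was scheduled first.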
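Having interpreted the time parameters and verified that scheduled tasks are ordered by deadline, let's turn back to io.netty.channel.nio.NioEventLoop#select and continue with the part of [Checkpoint 1] not yet shown: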
io.netty.channel.nio.NioEventLoop#select
```java
        // ... (omitted)
        // If a task was submitted when wakenUp value was true, the task didn't get a chance to call
        // Selector#wakeup. So we need to check task queue again before executing select operation.
        // If we don't, the task might be pended until select operation was timed out.
        // It might be pended until idle timeout if IdleStateHandler existed in pipeline.
        if (hasTasks() && wakenUp.compareAndSet(false, true)) {
            selector.selectNow();
            selectCnt = 1;
            break;
        }
        // block until a channel is ready, wakeup() is called, or the timeout expires
        int selectedKeys = selector.select(timeoutMillis);
        selectCnt ++;
        // ... (omitted)
```
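The pre-select check hinges on AtomicBoolean.compareAndSet(false, true): only the party that actually flips wakenUp from false to true succeeds. The sketch below isolates that condition in a hypothetical helper, shouldSelectNow (my own name), which returns true when the loop would do a non-blocking selectNow() and leave, and false when it would go on to the blocking select(timeoutMillis). As I read the surrounding Netty code (not shown in this excerpt), a thread outside the event loop that wins this CAS also calls selector.wakeup(), and per the Selector Javadoc a wakeup issued while no select is in progress makes the next select return immediately; so when the flag is already true, the blocking select should not hang.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class WakenUpCheckSketch {
    // Hypothetical helper mirroring the condition in the excerpt:
    // true  -> selectNow() and break out of the loop
    // false -> fall through to the blocking selector.select(timeoutMillis)
    static boolean shouldSelectNow(boolean hasTasks, AtomicBoolean wakenUp) {
        // && short-circuits: the flag is only touched when tasks are pending
        return hasTasks && wakenUp.compareAndSet(false, true);
    }

    public static void main(String[] args) {
        AtomicBoolean wakenUp = new AtomicBoolean(false);
        System.out.println(shouldSelectNow(true, wakenUp));  // true, and wakenUp is now true
        System.out.println(shouldSelectNow(true, wakenUp));  // false: the flag was already true
        AtomicBoolean idle = new AtomicBoolean(false);
        System.out.println(shouldSelectNow(false, idle) + " " + idle.get()); // false false
    }
}
```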
Continuing with the code:
io.netty.channel.nio.NioEventLoop#select
```java
        // ... (omitted)
        if (selectedKeys != 0 || oldWakenUp || wakenUp.get() || hasTasks() || hasScheduledTasks()) {
            // - Selected something,
            // - waken up by user, or
            // - the task queue has a pending task.
            // - a scheduled task is ready for processing
            break;
        }
        // ... (omitted)
```
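I'm skipping a thread-interruption check in between, since there is little to explain there. Next comes a key part: this is the code that works around the JDK empty-polling bug: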
io.netty.channel.nio.NioEventLoop#select
```java
        // ... (omitted)
        long time = System.nanoTime();
        // did this iteration last at least timeoutMillis?
        if (time - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= currentTimeNanos) {
            // timeoutMillis elapsed without anything selected.
            selectCnt = 1;
        } else if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0 &&
                selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) {
            // The selector returned prematurely many times in a row.
            // Rebuild the selector to work around the problem.
            logger.warn(
                    "Selector.select() returned prematurely {} times in a row; rebuilding Selector {}.",
                    selectCnt, selector);
            rebuildSelector();
            selector = this.selector;
            // Select again to populate selectedKeys.
            selector.selectNow();
            selectCnt = 1;
            break;
        }
        currentTimeNanos = time;
    } // end of for (;;)
    // ... (omitted)
```
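**If execution reaches the if/else above but enters neither the if nor the else-if branch, an empty poll has occurred:** select() returned before its timeout with nothing selected. As long as the number of consecutive empty polls stays below SELECTOR_AUTO_REBUILD_THRESHOLD (512 by default), the loop simply retries.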
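So how does Netty detect an empty poll? Each iteration measures how long it took, from currentTimeNanos to time. If at least timeoutMillis elapsed, select() simply timed out with nothing to do, which is normal, so selectCnt is reset to 1. Otherwise select() returned early with no selected keys, no wakeup and no pending tasks (those cases already broke out of the loop), so selectCnt keeps growing; once it reaches SELECTOR_AUTO_REBUILD_THRESHOLD, Netty concludes that the selector is spinning.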
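The counting logic can be replayed without a real Selector. This is a simplified simulation under stated assumptions: every select() in the input returned 0 keys with no wakeup or pending task, the timeout stays constant across iterations, and the threshold is the default 512. The class and the method rebuildAfterCall are my own names; the check elapsed >= timeout is the excerpt's time - toNanos(timeoutMillis) >= currentTimeNanos rearranged.

```java
import java.util.concurrent.TimeUnit;

public class SpinDetectionSketch {
    // Assumption: the default value of SELECTOR_AUTO_REBUILD_THRESHOLD.
    static final int SELECTOR_AUTO_REBUILD_THRESHOLD = 512;

    // elapsedNanos[i] = how long iteration i took; every iteration selected nothing.
    // Returns the 1-based number of the call after which a rebuild happens, or -1 if none.
    static int rebuildAfterCall(long[] elapsedNanos, long timeoutMillis) {
        long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        int selectCnt = 0;
        for (int i = 0; i < elapsedNanos.length; i++) {
            selectCnt++; // counted right after select(timeoutMillis), as in the excerpt
            if (elapsedNanos[i] >= timeoutNanos) {
                // waited the full timeout: an ordinary idle select, reset the counter
                selectCnt = 1;
            } else if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0
                    && selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) {
                // returned early too many times in a row: Netty would call rebuildSelector() here
                return i + 1;
            }
            // otherwise: a premature return below the threshold, so just loop again
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(rebuildAfterCall(new long[512], 1000)); // 512: every call returned at once
        System.out.println(rebuildAfterCall(new long[511], 1000)); // -1: still below the threshold
        long[] mixed = new long[812];
        mixed[300] = TimeUnit.SECONDS.toNanos(1); // call 301 waited the full 1000 ms
        System.out.println(rebuildAfterCall(mixed, 1000)); // 812: the counter restarted at call 301
    }
}
```

The point of requiring consecutive premature returns is that an occasional early return (for example a spurious wakeup) resets nothing and costs one extra loop, while a genuinely spinning selector is caught after a bounded number of iterations.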
Netty then works around the bug with rebuildSelector(). Let's step into it (the code is long, but the logic is simple):
io.netty.channel.nio.NioEventLoop#rebuildSelector
```java
public void rebuildSelector() {
    if (!inEventLoop()) {
        // thread safety: hand the rebuild to the event loop thread and return
        execute(new Runnable() {
            @Override
            public void run() {
                rebuildSelector();
            }
        });
        return;
    }
    final Selector oldSelector = selector;
    final Selector newSelector;
    if (oldSelector == null) {
        return;
    }
    try {
        newSelector = openSelector();
    } catch (Exception e) {
        logger.warn("Failed to create a new Selector.", e);
        return;
    }
    // Register all channels to the new Selector.
    int nChannels = 0;
    for (;;) {
        try {
            for (SelectionKey key: oldSelector.keys()) {
                // in practice the Channel (an AbstractNioChannel), or a NioTask
                Object a = key.attachment();
                try {
                    if (!key.isValid() || key.channel().keyFor(newSelector) != null) {
                        continue;
                    }
                    int interestOps = key.interestOps();
                    key.cancel();
                    SelectionKey newKey = key.channel().register(newSelector, interestOps, a);
                    if (a instanceof AbstractNioChannel) {
                        // Update SelectionKey
                        ((AbstractNioChannel) a).selectionKey = newKey;
                    }
                    nChannels ++;
                } catch (Exception e) {
                    logger.warn("Failed to re-register a Channel to the new Selector.", e);
                    if (a instanceof AbstractNioChannel) {
                        AbstractNioChannel ch = (AbstractNioChannel) a;
                        ch.unsafe().close(ch.unsafe().voidPromise());
                    } else {
                        @SuppressWarnings("unchecked")
                        NioTask<SelectableChannel> task = (NioTask<SelectableChannel>) a;
                        invokeChannelUnregistered(task, key, e);
                    }
                }
            }
        } catch (ConcurrentModificationException e) {
            // Probably due to concurrent modification of the key set.
            continue;
        }
        break;
    }
    selector = newSelector;
    try {
        // time to close the old selector as everything else is registered to the new one
        oldSelector.close();
    } catch (Throwable t) {
        if (logger.isWarnEnabled()) {
            logger.warn("Failed to close the old Selector.", t);
        }
    }
    logger.info("Migrated " + nChannels + " channel(s) to the new Selector.");
}
```
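The migration step can be tried with plain JDK NIO. The sketch below is a simplified version of the loop above: no retry on ConcurrentModificationException, no per-channel error handling, and a Pipe's source channel standing in for a Netty channel (class and method names are my own). It moves one registration from an "old" Selector to a new one, keeping interest ops and attachment, then closes the old Selector. Iterating oldSelector.keys() while cancelling is safe here because a cancelled key is only removed from the key set during the selector's next selection operation (or when it is closed).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class RebuildSelectorSketch {
    // Re-registers every valid key of oldSelector on newSelector, then closes oldSelector.
    static int migrate(Selector oldSelector, Selector newSelector) throws IOException {
        int nChannels = 0;
        for (SelectionKey key : oldSelector.keys()) {
            SelectableChannel ch = key.channel();
            if (!key.isValid() || ch.keyFor(newSelector) != null) {
                continue; // already cancelled, or already moved
            }
            int interestOps = key.interestOps();
            Object attachment = key.attachment();
            key.cancel(); // deregistered from oldSelector on its next select or on close
            ch.register(newSelector, interestOps, attachment);
            nChannels++;
        }
        oldSelector.close(); // everything now lives on newSelector
        return nChannels;
    }

    // Registers one pipe source on an "old" selector, migrates it, and checks the result.
    static boolean demoMigration() {
        try {
            Pipe pipe = Pipe.open();
            Selector oldSelector = Selector.open();
            Selector newSelector = Selector.open();
            try {
                pipe.source().configureBlocking(false); // required before register()
                pipe.source().register(oldSelector, SelectionKey.OP_READ, "attachment-1");
                int moved = migrate(oldSelector, newSelector);
                SelectionKey newKey = pipe.source().keyFor(newSelector);
                return moved == 1
                        && newKey != null
                        && newKey.interestOps() == SelectionKey.OP_READ
                        && "attachment-1".equals(newKey.attachment())
                        && !oldSelector.isOpen();
            } finally {
                newSelector.close();
                oldSelector.close(); // no effect if migrate already closed it
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("migration ok: " + demoMigration());
    }
}
```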
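Netty's fix for the empty-polling bug looks rather "brute-force": it opens a new Selector, re-registers every channel from the old Selector on it (every key in oldSelector.keys() with its interest ops and attachment, not just the selected keys), swaps the new Selector in for the old one, and closes the old one. A subsequent select on the new Selector is then very likely not to hit the empty-polling bug again.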
SelectionKey was covered in an earlier article.
If you don't remember why the Object returned by attachment() is the Channel, see the earlier article on that.