Contents
Creating a NioEventLoop
Starting a NioEventLoop
Running a NioEventLoop
Detecting IO events and working around the empty-poll bug
Processing IO events
Running tasks
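NioEventLoop literally means "NIO event loop". It accepts new connections and reads and writes their data streams, so it can be thought of as the engine that drives Netty. Every NioEventLoop is created by a NioEventLoopGroup.
A NioEventLoop can be viewed as a thread executor: the Executor interface sits at the very top of its inheritance hierarchy. A NioEventLoopGroup is a group of such executors, and its default size is 2 * the number of CPU cores.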
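While it initializes, NioEventLoopGroup first creates a ThreadPerTaskExecutor, whose job is to start a new thread for every task handed to it. Each NioEventLoop uses this executor to start its own thread, and those threads are named after the pattern nioEventLoopGroup-1-xx.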
The NioEventLoop instances themselves are created in the newChild() method.
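The last step is binding a new connection, which requires picking one NioEventLoop from the NioEventLoopGroup. Every incoming connection takes the next value of an auto-incrementing index, and the simplest approach is to take that index modulo the group size to get the position of the NioEventLoop. Netty optimizes this: when the group size is a power of two, the position is computed with a bitwise AND (much like HashMap does), which is cheaper. Otherwise it falls back to plain modulo.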
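The selection rule just described can be sketched in a few lines. This is a minimal illustration and not Netty's actual chooser class; the name RoundRobinChooser and its methods are made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin chooser that mirrors the rule described above. */
final class RoundRobinChooser {
    private final AtomicInteger idx = new AtomicInteger();
    private final int size;
    private final boolean powerOfTwo;

    RoundRobinChooser(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.size = size;
        // A positive int is a power of two when exactly one bit is set.
        this.powerOfTwo = (size & -size) == size;
    }

    /** Returns the position of the event loop that the next connection should use. */
    int next() {
        int i = idx.getAndIncrement();
        if (powerOfTwo) {
            // Masking keeps the low bits, so the result stays in range even after i overflows.
            return i & (size - 1);
        }
        // i % size can be negative once i overflows, hence the abs().
        return Math.abs(i % size);
    }

    public static void main(String[] args) {
        RoundRobinChooser chooser = new RoundRobinChooser(4);
        for (int n = 0; n < 6; n++) {
            System.out.println(chooser.next());
        }
    }
}
```

With a size of 4 the positions cycle through 0, 1, 2, 3 and start over; a size of 3 takes the modulo branch and cycles through 0, 1, 2.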
A NioEventLoop is started from the doBind0 method:
private static void doBind0(
final ChannelFuture regFuture, final Channel channel,
final SocketAddress localAddress, final ChannelPromise promise) {
// This method is invoked before channelRegistered() is triggered. Give user handlers a chance to set up
// the pipeline in its channelRegistered() implementation.
channel.eventLoop().execute(new Runnable() {
@Override
public void run() {
if (regFuture.isSuccess()) {
channel.bind(localAddress, promise).addListener(ChannelFutureListener.CLOSE_ON_FAILURE);
} else {
promise.setFailure(regFuture.cause());
}
}
});
}
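The logic inside run() here is the actual port binding. We can treat it as a task; the execute() method creates the thread (if it has not been started yet) and gets this task executed.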
@Override
public void execute(Runnable task) {
if (task == null) {
throw new NullPointerException("task");
}
// Check whether the calling thread is the EventLoop thread
boolean inEventLoop = inEventLoop();
if (inEventLoop) {
addTask(task);
} else {
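// Not the EventLoop thread: if the loop has not been started yet, have ThreadPerTaskExecutor create a thread that executes its run method
startThread();
// Add the task to the queue
addTask(task);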
if (isShutdown() && removeTask(task)) {
reject();
}
}
if (!addTaskWakesUp && wakesUpForTask(task)) {
wakeup(inEventLoop);
}
}
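To make the pattern in execute() easier to see in isolation, here is a minimal sketch of a single-threaded loop that starts its thread lazily on the first execute() call from an outside thread. It is an illustration only: MiniEventLoop and everything in it are hypothetical names, and the sketch leaves out shutdown handling, wakeups and the selector entirely.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical, heavily simplified event loop: one thread, one task queue. */
final class MiniEventLoop {
    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();
    private final AtomicBoolean started = new AtomicBoolean();
    private volatile Thread thread;

    /** True when the caller is the loop's own thread. */
    boolean inEventLoop() {
        return Thread.currentThread() == thread;
    }

    void execute(Runnable task) {
        if (task == null) {
            throw new NullPointerException("task");
        }
        // Only the first outside caller wins the CAS and starts the thread.
        if (!inEventLoop() && started.compareAndSet(false, true)) {
            Thread t = new Thread(this::run, "mini-event-loop");
            t.setDaemon(true);
            thread = t; // published before start(), so the loop thread sees it
            t.start();
        }
        tasks.add(task);
    }

    private void run() {
        try {
            for (;;) {
                Runnable task = tasks.take();
                try {
                    task.run();
                } catch (Throwable t) {
                    // A failing task must not kill the loop.
                    System.err.println("task failed: " + t);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because every task ends up on the same queue and is drained by the same thread, tasks submitted from one caller run in submission order and never run concurrently with each other.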
As the code above shows, once a NioEventLoop has been started it begins executing its run method. Let's walk through the logic of run.
@Override
protected void run() {
for (;;) {
try {
switch (selectStrategy.calculateStrategy(selectNowSupplier, hasTasks())) {
case SelectStrategy.CONTINUE:
continue;
case SelectStrategy.SELECT:
// wakenUp.getAndSet(false): mark the selector as "not woken up" before selecting
// The select logic itself is analyzed in the select method further down
select(wakenUp.getAndSet(false));
// 'wakenUp.compareAndSet(false, true)' is always evaluated
// before calling 'selector.wakeup()' to reduce the wake-up
// overhead. (Selector.wakeup() is an expensive operation.)
//
// However, there is a race condition in this approach.
// The race condition is triggered when 'wakenUp' is set to
// true too early.
//
// 'wakenUp' is set to true too early if:
// 1) Selector is waken up between 'wakenUp.set(false)' and
// 'selector.select(...)'. (BAD)
// 2) Selector is waken up between 'selector.select(...)' and
// 'if (wakenUp.get()) { ... }'. (OK)
//
// In the first case, 'wakenUp' is set to true and the
// following 'selector.select(...)' will wake up immediately.
// Until 'wakenUp' is set to false again in the next round,
// 'wakenUp.compareAndSet(false, true)' will fail, and therefore
// any attempt to wake up the Selector will fail, too, causing
// the following 'selector.select(...)' call to block
// unnecessarily.
//
// To fix this problem, we wake up the selector again if wakenUp
// is true immediately after selector.select(...).
// It is inefficient in that it wakes up the selector for both
// the first case (BAD - wake-up required) and the second case
// (OK - no wake-up required).
// If wakenUp was set while we were selecting, wake the selector up again (see the race described above)
if (wakenUp.get()) {
selector.wakeup();
}
default:
// fallthrough
}
cancelledKeys = 0;
needsToSelectAgain = false;
final int ioRatio = this.ioRatio;
if (ioRatio == 100) {
try {
// Process IO events
processSelectedKeys();
} finally {
// Ensure we always run tasks.
runAllTasks();
}
} else {
final long ioStartTime = System.nanoTime();
try {
processSelectedKeys();
} finally {
// Ensure we always run tasks.
final long ioTime = System.nanoTime() - ioStartTime;
runAllTasks(ioTime * (100 - ioRatio) / ioRatio);
}
}
} catch (Throwable t) {
handleLoopException(t);
}
// Always handle shutdown even if the loop processing threw an exception.
try {
if (isShuttingDown()) {
closeAll();
if (confirmShutdown()) {
return;
}
}
} catch (Throwable t) {
handleLoopException(t);
}
}
}
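The else branch of run() splits each iteration between IO and tasks according to ioRatio: the time spent in processSelectedKeys() is measured, and runAllTasks() receives a budget of ioTime * (100 - ioRatio) / ioRatio. A small worked example of that formula follows; the class and method names are made up for illustration.

```java
/** Hypothetical helper that reproduces the task-time budget formula used in run(). */
final class IoRatioBudget {
    /**
     * @param ioTimeNanos time just spent processing IO events
     * @param ioRatio     percentage of loop time meant for IO, 1..99 here
     *                    (run() handles 100 in a separate branch)
     * @return nanoseconds the loop may spend running tasks
     */
    static long taskBudgetNanos(long ioTimeNanos, int ioRatio) {
        if (ioRatio <= 0 || ioRatio >= 100) {
            throw new IllegalArgumentException("ioRatio must be in 1..99: " + ioRatio);
        }
        return ioTimeNanos * (100 - ioRatio) / ioRatio;
    }

    public static void main(String[] args) {
        // Equal split: 10 ms of IO allows 10 ms of tasks.
        System.out.println(taskBudgetNanos(10_000_000L, 50));
        // IO-heavy split: 8 ms of IO allows only 2 ms of tasks.
        System.out.println(taskBudgetNanos(8_000_000L, 80));
    }
}
```

An ioRatio of 50 gives tasks as much time as IO took, while 80 gives them a quarter of it. With ioRatio at 100 the loop skips the measurement and runs all pending tasks without a time limit.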
IO events are detected mainly in the select method. Here is its code:
private void select(boolean oldWakenUp) throws IOException {
Selector selector = this.selector;
try {
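// Number of select calls made so far
int selectCnt = 0;
long currentTimeNanos = System.nanoTime();
// Compute the deadline for this round of select
long selectDeadLineNanos = currentTimeNanos + delayNanos(currentTimeNanos);
// Loop until one of the break conditions below is met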
for (;;) {
long timeoutMillis = (selectDeadLineNanos - currentTimeNanos + 500000L) / 1000000L;
// Check whether the deadline has passed
if (timeoutMillis <= 0) {
// Timed out: if nothing has been selected yet, do one non-blocking selectNow(), then leave the loop
if (selectCnt == 0) {
selector.selectNow();
selectCnt = 1;
}
break;
}
// If a task was submitted when wakenUp value was true, the task didn't get a chance to call
// Selector#wakeup. So we need to check task queue again before executing select operation.
// If we don't, the task might be pended until select operation was timed out.
// It might be pended until idle timeout if IdleStateHandler existed in pipeline.
// Not timed out, but the task queue has work to do: call selectNow() and stop polling
if (hasTasks() && wakenUp.compareAndSet(false, true)) {
selector.selectNow();
selectCnt = 1;
break;
}
// Block in the underlying NIO select() for at most timeoutMillis, i.e. until the deadline
int selectedKeys = selector.select(timeoutMillis);
selectCnt ++;
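// Polling stops as soon as any one of these holds:
// 1. an event was selected
// 2. the selector was woken up by the user
// 3. the task queue has a pending task
// 4. the scheduled task queue has a pending task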
if (selectedKeys != 0 || oldWakenUp || wakenUp.get() || hasTasks() || hasScheduledTasks()) {
// - Selected something,
// - waken up by user, or
// - the task queue has a pending task.
// - a scheduled task is ready for processing
break;
}
if (Thread.interrupted()) {
// Thread was interrupted so reset selected keys and break so we not run into a busy loop.
// As this is most likely a bug in the handler of the user or it's client library we will
// also log it.
//
// See https://github.com/netty/netty/issues/2426
if (logger.isDebugEnabled()) {
logger.debug("Selector.select() returned prematurely because " +
"Thread.currentThread().interrupt() was called. Use " +
"NioEventLoop.shutdownGracefully() to shutdown the NioEventLoop.");
}
selectCnt = 1;
break;
}
long time = System.nanoTime();
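// If (current time - timeout) >= the time at which this select started, the full timeout elapsed; otherwise select returned early and the empty-poll bug may have been triggered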
if (time - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= currentTimeNanos) {
// timeoutMillis elapsed without anything selected.
selectCnt = 1;
} else if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0 &&
selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) {
// The selector returned prematurely many times in a row.
// Rebuild the selector to work around the problem.
logger.warn(
"Selector.select() returned prematurely {} times in a row; rebuilding Selector {}.",
selectCnt, selector);
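// Once the number of empty polls reaches the threshold (512), rebuild the Selector
// The main idea: re-register the Channels of the old Selector with a new Selector, then close the old Selector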
rebuildSelector();
selector = this.selector;
// Select again to populate selectedKeys.
selector.selectNow();
selectCnt = 1;
break;
}
currentTimeNanos = time;
}
if (selectCnt > MIN_PREMATURE_SELECTOR_RETURNS) {
if (logger.isDebugEnabled()) {
logger.debug("Selector.select() returned prematurely {} times in a row for Selector {}.",
selectCnt - 1, selector);
}
}
} catch (CancelledKeyException e) {
if (logger.isDebugEnabled()) {
logger.debug(CancelledKeyException.class.getSimpleName() + " raised by a Selector {} - JDK bug?",
selector, e);
}
// Harmless exception - log anyway
}
}
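The rebuildSelector() call in the listing is only described by a comment, so here is a minimal sketch of the idea using plain java.nio. It assumes everything runs on the thread that owns the selector and is not Netty's implementation; SelectorRebuilder and rebuild are hypothetical names.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/** Hypothetical helper: moves every registration from an old Selector to a fresh one. */
final class SelectorRebuilder {
    static Selector rebuild(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();
        // Snapshot the keys so the iteration does not depend on the live key set.
        SelectionKey[] keys = oldSelector.keys().toArray(new SelectionKey[0]);
        for (SelectionKey key : keys) {
            if (!key.isValid()) {
                continue; // already cancelled, nothing to migrate
            }
            SelectableChannel channel = key.channel();
            int interestOps = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();
            // Same channel, same interest set, same attachment, new selector.
            channel.register(newSelector, interestOps, attachment);
        }
        oldSelector.close();
        return newSelector;
    }

    public static void main(String[] args) throws IOException {
        Selector old = Selector.open();
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        pipe.source().register(old, SelectionKey.OP_READ, "demo attachment");
        Selector fresh = rebuild(old);
        System.out.println("old open: " + old.isOpen() + ", migrated keys: " + fresh.keys().size());
        fresh.close();
        pipe.source().close();
        pipe.sink().close();
    }
}
```

The point of the migration is that the channels themselves stay open; only the selector that kept returning early is thrown away.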
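To summarize the logic of the select method:
1. First compute the deadline for this round of select.
2. Compare the current time with the deadline; if the deadline has passed, end this round of polling.
3. If it has not passed but a task has shown up in the task queue, end the polling as well.
4. If it has not passed and the task queue is empty, block in the underlying NIO select() until the deadline at the latest, and count the call. The blocking call can be cut short by a wakeup from an external task.
5. When select() returns, check whether (current time - timeout) >= (the time at which this select started). If that is false, select() returned early; if it also selected nothing and no task is pending, the empty-poll bug may have been triggered. Netty counts these empty polls, and once the count reaches a threshold (512) it rebuilds the Selector, re-registering the Channels of the old Selector with the new one.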
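The timing checks in steps 2 and 5 are easy to get wrong, so here they are as a small self-contained sketch. The counting is simplified compared with the listing (it tracks consecutive early returns only), and PrematureReturnDetector with its methods is a hypothetical name, not something Netty exposes.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that isolates the timing logic of select(). */
final class PrematureReturnDetector {
    private final int rebuildThreshold;
    private int prematureReturns;

    PrematureReturnDetector(int rebuildThreshold) {
        this.rebuildThreshold = rebuildThreshold;
    }

    /** Remaining time until the deadline, rounded to the nearest millisecond. */
    static long timeoutMillis(long deadlineNanos, long nowNanos) {
        return (deadlineNanos - nowNanos + 500_000L) / 1_000_000L;
    }

    /**
     * Call after a select() that selected nothing and had no other reason to return.
     *
     * @return true when the selector should be rebuilt
     */
    boolean onEmptySelect(long startNanos, long endNanos, long timeoutMillis) {
        if (endNanos - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= startNanos) {
            // The whole timeout elapsed: an ordinary idle select, not an empty poll.
            prematureReturns = 0;
            return false;
        }
        prematureReturns++;
        if (rebuildThreshold > 0 && prematureReturns >= rebuildThreshold) {
            prematureReturns = 0; // start counting afresh on the new selector
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PrematureReturnDetector detector = new PrematureReturnDetector(3);
        for (int i = 1; i <= 3; i++) {
            // select() asked for 1000 ms but came back after 10 ns
            System.out.println("rebuild after call " + i + "? " + detector.onEmptySelect(0L, 10L, 1000L));
        }
    }
}
```

Note that the comparison is against the moment the select started, not against the overall deadline: a select that was asked to wait 1000 ms and returns after 1000 ms is normal, whatever the deadline was.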
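IO events are processed in the processSelectedKeys method called from run. By default it iterates over an array of SelectionKeys (Netty optimizes the JDK implementation here by replacing the HashSet with an array), fetches each SelectionKey's attachment (the NIO channel), and, if the SelectionKey is valid, starts processing the IO event.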
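The HashSet-to-array replacement can be pictured as a Set whose add() simply appends to a growing array, so that adding a key costs no hashing and iterating is a linear walk. The class below is a generic sketch of that idea with made-up names; like any append-only structure it does not deduplicate, which is acceptable only because the caller clears it after every round.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical append-only set backed by a plain array. */
final class ArrayKeySet<E> extends AbstractSet<E> {
    private Object[] keys = new Object[8];
    private int size;

    @Override
    public boolean add(E key) {
        if (key == null) {
            return false;
        }
        if (size == keys.length) {
            keys = Arrays.copyOf(keys, size << 1); // double the capacity
        }
        keys[size++] = key;
        return true;
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            private int idx;

            @Override
            public boolean hasNext() {
                return idx < size;
            }

            @Override
            @SuppressWarnings("unchecked")
            public E next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return (E) keys[idx++];
            }
        };
    }

    /** Clears the set after a round of processing and drops the references for GC. */
    void reset() {
        Arrays.fill(keys, 0, size, null);
        size = 0;
    }
}
```

A processing loop would walk the set once per round and then call reset(), which is why insertion order and cheap appends matter more here than fast lookups.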
Tasks are executed in the runAllTasks method called from run:
protected boolean runAllTasks(long timeoutNanos) {
// Move scheduled tasks that are due into the ordinary task queue
fetchFromScheduledTaskQueue();
Runnable task = pollTask();
if (task == null) {
afterRunningAllTasks();
return false;
}
// Compute the deadline for running tasks
final long deadline = ScheduledFutureTask.nanoTime() + timeoutNanos;
long runTasks = 0;
long lastExecutionTime;
// Run tasks in a loop
for (;;) {
// Invoke the task's run method
safeExecute(task);
runTasks ++;
// Check timeout every 64 tasks because nanoTime() is relatively expensive.
// XXX: Hard-coded value - will make it configurable if it is really a problem.
// After every 64 tasks, check whether the deadline has been reached and leave the loop if it has
if ((runTasks & 0x3F) == 0) {
lastExecutionTime = ScheduledFutureTask.nanoTime();
if (lastExecutionTime >= deadline) {
break;
}
}
task = pollTask();
if (task == null) {
lastExecutionTime = ScheduledFutureTask.nanoTime();
break;
}
}
afterRunningAllTasks();
this.lastExecutionTime = lastExecutionTime;
return true;
}
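Why are user operations wrapped as tasks and put on a queue?
When the I/O thread and user threads operate on network resources at the same time, high concurrency would lead to lock contention. To avoid that, the operations of user threads are wrapped as tasks and placed on the task queue, and the I/O thread alone executes them. This makes the handling of each connection lock-free.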
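A NioEventLoop handles two kinds of tasks: ordinary tasks from the task queue and scheduled tasks from the scheduled task queue.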
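While tasks are being executed they are consumed from the ordinary task queue. After every 64 tasks the current time is compared with the deadline; once the deadline has passed, task execution stops and a new round of select begins.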
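The "check the clock every 64 tasks" rule has a visible consequence: the deadline is only honored at multiples of 64, so a batch can overrun its budget by up to 63 tasks. The sketch below shows that behavior on a plain queue; BoundedTaskRunner and runTasks are hypothetical names, and System.nanoTime() stands in for the clock used in the listing.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical runner that drains a queue under a time budget, checking the clock every 64 tasks. */
final class BoundedTaskRunner {
    /** @return the number of tasks that were run */
    static int runTasks(Queue<Runnable> queue, long timeoutNanos) {
        final long deadline = System.nanoTime() + timeoutNanos;
        int ran = 0;
        Runnable task;
        while ((task = queue.poll()) != null) {
            try {
                task.run();
            } catch (Throwable t) {
                // A failing task must not stop the remaining ones.
                System.err.println("task failed: " + t);
            }
            ran++;
            // Reading the clock is relatively expensive, so do it only on every 64th task.
            if ((ran & 0x3F) == 0 && System.nanoTime() - deadline >= 0) {
                break;
            }
        }
        return ran;
    }

    public static void main(String[] args) {
        Queue<Runnable> queue = new ArrayDeque<>();
        for (int i = 0; i < 100; i++) {
            queue.add(() -> { });
        }
        // With a zero budget the first clock check happens after task 64.
        System.out.println("ran " + runTasks(queue, 0L) + ", left " + queue.size());
    }
}
```

Even with a budget of zero, 64 of the 100 queued tasks run before the first clock check stops the loop, and the remaining 36 wait for the next iteration.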