Netty Source Code: How the Selector.select Bug Workaround Is Implemented

  • 1 Overview
  • 2 Configuration
  • 3 How It Works

1 Overview

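Readers familiar with Java NIO will likely know of the bug in Selector.select that can make select return early even though no event is ready. Because select is normally called inside a loop, these early returns turn the loop into a busy spin.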

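Netty accounts for this problem in NioEventLoop: when select returns abnormally (the comments in the Netty source call it "prematurely") more than a certain number of times in a row, Netty creates a new Selector to work around the bug.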

2 Configuration

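Netty provides the configuration parameter io.netty.selectorAutoRebuildThreshold so that users can set the threshold for the number of consecutive premature select returns. Once the count reaches this threshold, the Selector is rebuilt automatically. The default is 512.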

If the configured io.netty.selectorAutoRebuildThreshold is less than 3, however, Netty treats the feature as disabled.
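The rule above can be sketched as a small normalization step. This is a minimal illustration rather than Netty's own code: the class name RebuildThresholdSketch and the helpers normalize and readThreshold are hypothetical, and the property is read here with the JDK's Integer.getInteger instead of Netty's internal utilities. The constant name MIN_PREMATURE_SELECTOR_RETURNS is taken from the source shown in section 3, and its value of 3 follows from the "less than 3" rule.

```java
// Illustrative sketch only; the class and helper names are not part of Netty.
final class RebuildThresholdSketch {
    // Fewer than 3 premature returns in a row is not treated as a sign of the bug.
    static final int MIN_PREMATURE_SELECTOR_RETURNS = 3;
    static final int DEFAULT_THRESHOLD = 512;

    /** Maps values below the minimum to 0, which switches auto-rebuild off. */
    static int normalize(int configured) {
        return configured < MIN_PREMATURE_SELECTOR_RETURNS ? 0 : configured;
    }

    /** Reads the system property, falling back to the default when it is unset. */
    static int readThreshold() {
        return normalize(Integer.getInteger(
                "io.netty.selectorAutoRebuildThreshold", DEFAULT_THRESHOLD));
    }

    public static void main(String[] args) {
        System.out.println("effective threshold = " + readThreshold());
    }
}
```

Running the JVM with -Dio.netty.selectorAutoRebuildThreshold=2 would therefore yield an effective threshold of 0, which is why the select loop only needs to test whether the threshold is greater than 0.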

3 How It Works

Netty's logic for detecting and handling premature returns from Selector.select lives mainly in the NioEventLoop.select method:

//NioEventLoop
private void select(boolean oldWakenUp) throws IOException {
    Selector selector = this.selector;
    try {
        // Reset the counter to 0
        int selectCnt = 0;
        long currentTimeNanos = System.nanoTime();
        // Derive the deadline for this select call from the registered scheduled tasks
        long selectDeadLineNanos = currentTimeNanos + delayNanos(currentTimeNanos);
        for (;;) {
            // Recompute on every iteration how long select may block
            long timeoutMillis = (selectDeadLineNanos - currentTimeNanos + 500000L) / 1000000L;
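            // If the remaining block time is 0 or less, a scheduled task is about to become due.
            // In that case, if this is the first iteration (selectCnt == 0), call
            // selector.selectNow() once, then leave the loop and return.
            // selectNow is called so that network events that are already ready
            // are still detected and processed.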
            if (timeoutMillis <= 0) {
                if (selectCnt == 0) {
                    selector.selectNow();
                    selectCnt = 1;
                }
                break;
            }

            // If a task was submitted when wakenUp value was true, the task didn't get a chance to call
            // Selector#wakeup. So we need to check task queue again before executing select operation.
            // If we don't, the task might be pended until select operation was timed out.
            // It might be pended until idle timeout if IdleStateHandler existed in pipeline.
            // No scheduled task is due yet, but tasks are already pending in the task queue
            // and wakenUp was successfully set to true:
            // call selectNow and return
            if (hasTasks() && wakenUp.compareAndSet(false, true)) {
                selector.selectNow();
                selectCnt = 1;
                break;
            }
            // Call select, blocking for at most the time computed above,
            // i.e. until the nearest scheduled task becomes due
            int selectedKeys = selector.select(timeoutMillis);
            // Increment the counter
            selectCnt ++;

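            // selectedKeys != 0: a non-zero number of ready keys means select returned
            // normally because events really are ready
            // oldWakenUp: the selector had already been woken up by someone else
            // when this method was entered
            // wakenUp.get(): the selector was woken up as well
            // hasTasks() || hasScheduledTasks(): there are tasks or
            // scheduled tasks to run
            // If any of these holds, leave the loop and return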
            if (selectedKeys != 0 || oldWakenUp || wakenUp.get() || hasTasks() || hasScheduledTasks()) {
                // - Selected something,
                // - waken up by user, or
                // - the task queue has a pending task.
                // - a scheduled task is ready for processing
                break;
            }
            // If the thread was interrupted, reset the counter to 1 and return
            if (Thread.interrupted()) {
                // Thread was interrupted so reset selected keys and break so we not run into a busy loop.
                // As this is most likely a bug in the handler of the user or it's client library we will
                // also log it.
                //
                // See https://github.com/netty/netty/issues/2426
                if (logger.isDebugEnabled()) {
                    logger.debug("Selector.select() returned prematurely because " +
                            "Thread.currentThread().interrupt() was called. Use " +
                            "NioEventLoop.shutdownGracefully() to shutdown the NioEventLoop.");
                }
                selectCnt = 1;
                break;
            }
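            // Check whether select returned because the computed timeout
            // has elapsed. That also counts as a normal return, so reset the
            // counter to 1 and go on to the next iteration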
            long time = System.nanoTime();
            if (time - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= currentTimeNanos) {
                // timeoutMillis elapsed without anything selected.
                selectCnt = 1;
            } else if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0 &&
                    selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) {
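                // Reaching this branch means the select bug workaround is enabled,
                // i.e. the configured io.netty.selectorAutoRebuildThreshold
                // is at least 3, and the number of premature select returns above
                // has reached the threshold, so a selector rebuild is triggered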
                // The selector returned prematurely many times in a row.
                // Rebuild the selector to work around the problem.
                logger.warn(
                        "Selector.select() returned prematurely {} times in a row; rebuilding Selector {}.",
                        selectCnt, selector);
                // Rebuild the selector
                rebuildSelector();
                selector = this.selector;
                // After the rebuild, call the non-blocking selectNow once
                // and return
                // Select again to populate selectedKeys.
                selector.selectNow();
                selectCnt = 1;
                break;
            }

            currentTimeNanos = time;
        }

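        // Reached when the loop exits after several premature returns without a rebuild
        // (for example, when the workaround is disabled): just log it to aid troubleshooting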
        if (selectCnt > MIN_PREMATURE_SELECTOR_RETURNS) {
            if (logger.isDebugEnabled()) {
                logger.debug("Selector.select() returned prematurely {} times in a row for Selector {}.",
                        selectCnt - 1, selector);
            }
        }
    } catch (CancelledKeyException e) {
        if (logger.isDebugEnabled()) {
            logger.debug(CancelledKeyException.class.getSimpleName() + " raised by a Selector {} - JDK bug?",
                    selector, e);
        }
        // Harmless exception - log anyway
    }
}
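The two checks that decide whether a return counts as premature, and whether a rebuild is due, can be pulled out of the loop above into pure functions to make the arithmetic easier to follow. This is a sketch for illustration only: the class PrematureReturnSketch and its method names are hypothetical and do not exist in Netty, while the expressions themselves mirror the ones in NioEventLoop.select.

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; the class and method names are not part of Netty.
final class PrematureReturnSketch {

    /** Rounds the time left until the deadline to the nearest millisecond. */
    static long timeoutMillis(long deadlineNanos, long nowNanos) {
        return (deadlineNanos - nowNanos + 500000L) / 1000000L;
    }

    /** True when at least timeoutMillis passed between the two timestamps (a normal return). */
    static boolean timedOut(long beforeNanos, long afterNanos, long timeoutMillis) {
        return afterNanos - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= beforeNanos;
    }

    /** True when the workaround is enabled and the counter has reached the threshold. */
    static boolean shouldRebuild(int selectCnt, int threshold) {
        return threshold > 0 && selectCnt >= threshold;
    }

    public static void main(String[] args) {
        // 0.4 ms left rounds down to 0, so the loop would fall back to selectNow.
        System.out.println(timeoutMillis(400_000L, 0L));
        // select(1) that came back after only 0.1 ms did not time out.
        System.out.println(timedOut(0L, 100_000L, 1L));
        // With the default threshold, the 512th premature return triggers a rebuild.
        System.out.println(shouldRebuild(512, 512));
    }
}
```

A return that is neither a timeout nor explained by ready keys, a wakeup, or pending tasks leaves the counter untouched, so only consecutive unexplained returns push it toward the threshold.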

The rebuildSelector method called above is implemented as follows:

//NioEventLoop
/**
* Replaces the current {@link Selector} of this event loop with newly created {@link Selector}s to work
* around the infamous epoll 100% CPU bug.
*/
public void rebuildSelector() {
    // If called from outside the event loop thread, submit the rebuild to the task queue
    if (!inEventLoop()) {
        execute(new Runnable() {
            @Override
            public void run() {
                rebuildSelector0();
            }
        });
        return;
    }
    // Otherwise we are already on the event loop thread, so call the actual rebuild method directly
    rebuildSelector0();
}

private void rebuildSelector0() {
    final Selector oldSelector = selector;
    final SelectorTuple newSelectorTuple;

    // If there is no old selector, return immediately
    if (oldSelector == null) {
        return;
    }

    try {
        // Open a new selector
        newSelectorTuple = openSelector();
    } catch (Exception e) {
        logger.warn("Failed to create a new Selector.", e);
        return;
    }

    // Re-register every key that was registered on the old selector
    // on the newly created selector, one by one
    // Register all channels to the new Selector.
    int nChannels = 0;
    for (SelectionKey key: oldSelector.keys()) {
        Object a = key.attachment();
        try {
            if (!key.isValid() || key.channel().keyFor(newSelectorTuple.unwrappedSelector) != null) {
                continue;
            }

            int interestOps = key.interestOps();
            key.cancel();
            SelectionKey newKey = key.channel().register(newSelectorTuple.unwrappedSelector, interestOps, a);
            if (a instanceof AbstractNioChannel) {
                // Update SelectionKey
                ((AbstractNioChannel) a).selectionKey = newKey;
            }
            nChannels ++;
        } catch (Exception e) {
            logger.warn("Failed to re-register a Channel to the new Selector.", e);
            if (a instanceof AbstractNioChannel) {
                AbstractNioChannel ch = (AbstractNioChannel) a;
                ch.unsafe().close(ch.unsafe().voidPromise());
            } else {
                @SuppressWarnings("unchecked")
                NioTask task = (NioTask) a;
                invokeChannelUnregistered(task, key, e);
            }
        }
    }

    // Point this NioEventLoop at the newly created selector
    selector = newSelectorTuple.selector;
    unwrappedSelector = newSelectorTuple.unwrappedSelector;

    try {
        // Close the old selector
        // time to close the old selector as everything else is registered to the new one
        oldSelector.close();
    } catch (Throwable t) {
        if (logger.isWarnEnabled()) {
            logger.warn("Failed to close the old Selector.", t);
        }
    }

    logger.info("Migrated " + nChannels + " channel(s) to the new Selector.");
}
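The core of the migration can be reproduced with nothing but the JDK, which makes it easy to try outside of Netty. The sketch below is a simplified illustration under stated assumptions: the class SelectorRebuildSketch and its methods rebuild and demo are hypothetical names, a Pipe stands in for a real network channel, and the error handling, SelectorTuple wrapper, and channel bookkeeping of the Netty code above are deliberately left out.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Illustrative sketch only; the class and method names are not part of Netty.
final class SelectorRebuildSketch {

    /** Re-registers every valid key of oldSelector on a fresh Selector, then closes the old one. */
    static Selector rebuild(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();
        for (SelectionKey key : oldSelector.keys()) {
            SelectableChannel channel = key.channel();
            // Skip keys that are no longer valid or already registered on the new selector.
            if (!key.isValid() || channel.keyFor(newSelector) != null) {
                continue;
            }
            int interestOps = key.interestOps();
            Object attachment = key.attachment();
            key.cancel(); // the old registration is dropped when the old selector is closed
            channel.register(newSelector, interestOps, attachment);
        }
        oldSelector.close();
        return newSelector;
    }

    /** Registers one pipe end, rebuilds, and reports whether the registration moved intact. */
    static boolean demo() throws IOException {
        Pipe pipe = Pipe.open();
        Selector newSelector = null;
        try {
            pipe.source().configureBlocking(false);
            Selector oldSelector = Selector.open();
            pipe.source().register(oldSelector, SelectionKey.OP_READ, "attachment");
            newSelector = rebuild(oldSelector);
            SelectionKey moved = pipe.source().keyFor(newSelector);
            return !oldSelector.isOpen()
                    && moved != null
                    && moved.interestOps() == SelectionKey.OP_READ
                    && "attachment".equals(moved.attachment());
        } finally {
            if (newSelector != null) {
                newSelector.close();
            }
            pipe.source().close();
            pipe.sink().close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("migrated intact: " + demo());
    }
}
```

As in rebuildSelector0, the interest set and the attachment are read before the old key is cancelled, because a cancelled key no longer allows interestOps() to be queried.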
