Why NIO Consumes So Much CPU on Windows: A Detailed Analysis

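1. Symptom overview

During performance testing of our project we found that the same code, connected to the same number of devices (100,000, each exchanging a large volume of data with the code over NIO), used only 20%~30% CPU on Linux but stayed above 80% on Windows.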

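2. Initial investigation

We monitored the program on Linux and on Windows with jconsole. The [Threads] tab showed that on Windows a large number of unnamed threads had been started, all with stack traces similar to the one in the figure below:
[Figure 1: screenshot of the stack trace of one of the unnamed threads (image not preserved)]

The thread stacks of the Java process can be dumped to a file with the jstack tool that ships with the JDK, using the following command: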

jstack -l 31372 > c:/31372.stack  

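Note: 31372 is the PID of the process.

Searching the dump for threads with an identical stack trace turned up 97 of them. The Windows monitoring tool Process Explorer showed that each of these threads used about 0.7%~0.9% of the CPU, so together the 97 threads accounted for roughly 69.7% of the CPU. Linux never started these threads, which explains at a high level why CPU utilization on Windows was more than 60 percentage points higher than on Linux.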
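The "search for threads with the same stack trace" step can be automated. The following is a minimal sketch, not the procedure actually used above: it assumes a jstack-style text dump in which each thread starts with a header line beginning with a double quote and each frame line begins with "at ". The class name StackGrouper and its methods are made up for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StackGrouper {
    /** Counts threads in a jstack-style dump, keyed by their list of "at ..." frames. */
    static Map<String, Integer> countByFrames(List<String> dumpLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        StringBuilder frames = null;              // frames of the thread currently being read
        for (String line : dumpLines) {
            String t = line.trim();
            if (t.startsWith("\"")) {             // header line: a new thread begins
                flush(frames, counts);
                frames = new StringBuilder();
            } else if (frames != null && t.startsWith("at ")) {
                frames.append(t).append('\n');    // keep only the frames, ignore lock info
            }
        }
        flush(frames, counts);
        return counts;
    }

    /** Records the finished thread; threads without any Java frames are skipped. */
    private static void flush(StringBuilder frames, Map<String, Integer> counts) {
        if (frames != null && frames.length() > 0) {
            counts.merge(frames.toString(), 1, Integer::sum);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java StackGrouper <dump-file>");
            return;
        }
        // Assumes the dump file can be decoded as UTF-8.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countByFrames(lines).forEach((stack, n) ->
                System.out.println(n + " thread(s) with stack:\n" + stack));
    }
}
```

Run against the dump file produced by the jstack command above, a group with a count near 97 would stand out immediately.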

3. A closer look at NIO

The OpenJDK source of the WindowsSelectorImpl class shows:

final class WindowsSelectorImpl extends SelectorImpl {
    ...
}

This class extends SelectorImpl, whose source is:

abstract class SelectorImpl extends AbstractSelector {
    ...
}

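As shown, SelectorImpl in turn extends the abstract class AbstractSelector.

The inheritance hierarchy of this class can be viewed in Eclipse:
[Figure 2: screenshot of the class hierarchy in Eclipse (image not preserved)]

In other words, all of these classes ultimately derive from Selector. The place in our code where Selector is used is: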

int n = selector.select(25);

The select(long timeout) method of Selector is implemented in SelectorImpl as follows:

public int select(long timeout) throws IOException {
    if (timeout < 0)
        throw new IllegalArgumentException("Negative timeout");
    return lockAndDoSelect((timeout == 0) ? -1 : timeout);
}

select() calls lockAndDoSelect(), whose source is:

private int lockAndDoSelect(long timeout) throws IOException {
    synchronized (this) {
        if (!isOpen())
            throw new ClosedSelectorException();
        synchronized (publicKeys) {
            synchronized (publicSelectedKeys) {
                return doSelect(timeout);
            }
        }
    }
}

lockAndDoSelect() calls doSelect(), which is declared abstract in SelectorImpl:

protected abstract int doSelect(long timeout) throws IOException;

Its concrete implementation is OS-specific: on Windows the method is implemented in WindowsSelectorImpl, and on Linux it is implemented in EPollSelectorImpl.
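To see which implementation a given machine actually uses, one can open the default selector and print its concrete class. This is a minimal sketch; the class name SelectorImplProbe is made up for illustration, and the name it prints depends on the platform and the JDK version, so no particular output is assumed here.

```java
import java.io.IOException;
import java.nio.channels.Selector;

class SelectorImplProbe {
    /** Opens the platform default Selector and returns its concrete class name. */
    static String defaultSelectorClass() throws IOException {
        try (Selector selector = Selector.open()) {
            // Same call as in the application code. With no channels registered
            // it normally just times out after about 25 ms.
            selector.select(25);
            return selector.getClass().getName();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Selector implementation: " + defaultSelectorClass());
    }
}
```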

3.1 The NIO implementation on Windows

The doSelect() method in the WindowsSelectorImpl source is:

protected int doSelect(long timeout) throws IOException {
    if (channelArray == null)
        throw new ClosedSelectorException();
    this.timeout = timeout; // set selector timeout
    processDeregisterQueue();
    if (interruptTriggered) {
        resetWakeupSocket();
        return 0;
    }
    // Calculate number of helper threads needed for poll. If necessary
    // threads are created here and start waiting on startLock
    adjustThreadsCount();
    finishLock.reset(); // reset finishLock
    // Wakeup helper threads, waiting on startLock, so they start polling.
    // Redundant threads will exit here after wakeup.
    startLock.startThreads();
    // do polling in the main thread. Main thread is responsible for
    // first MAX_SELECTABLE_FDS entries in pollArray.
    try {
        begin();
        try {
            subSelector.poll();
        } catch (IOException e) {
            finishLock.setException(e); // Save this exception
        }
        // Main thread is out of poll(). Wakeup others and wait for them
        if (threads.size() > 0)
            finishLock.waitForHelperThreads();
    } finally {
        end();
    }
    // Done with poll(). Set wakeupSocket to nonsignaled  for the next run.
    finishLock.checkForException();
    processDeregisterQueue();
    int updated = updateSelectedKeys();
    // Done with poll(). Set wakeupSocket to nonsignaled  for the next run.
    resetWakeupSocket();
    return updated;
}

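In this method, startLock is used to start the helper threads when they are needed; after one run the helper threads block again until the next time they are needed. finishLock makes the main thread wait until all helper threads have finished, so that they all return together.
The three most important steps in this method are the calls to adjustThreadsCount(), startLock.startThreads() and subSelector.poll() (lines 12, 16 and 22 of the listing above). Each is analyzed in detail below.

3.1.1 Adjusting the helper threads

The implementation of adjustThreadsCount() in the WindowsSelectorImpl source is: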

// After some channels registered/deregistered, the number of required
// helper threads may have changed. Adjust this number.
private void adjustThreadsCount() {
    if (threadsCount > threads.size()) {
        // More threads needed. Start more threads.
        for (int i = threads.size(); i < threadsCount; i++) {
            SelectThread newThread = new SelectThread(i);
            threads.add(newThread);
            newThread.setDaemon(true);
            newThread.start();
        }
    } else if (threadsCount < threads.size()) {
        // Some threads become redundant. Remove them from the threads List.
        for (int i = threads.size() - 1 ; i >= threadsCount; i--)
            threads.remove(i).makeZombie();
    }
}

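As the comment says, this method adjusts the number of helper threads after channels have been registered or deregistered. threads.size() is the current number of helper threads and threadsCount is the number required. If there are fewer threads than required, new helper threads are created to make up the difference. If there are more than required, the surplus threads are removed from the list and marked as zombies, so that they exit the next time they are woken up.
The logic is simple, but it is worth looking at how the required count, threadsCount, is computed. It is incremented in growIfNeeded(), which implRegister() calls when a channel is registered, and decremented in implDereg() when a channel is deregistered. The constant MAX_SELECTABLE_FDS is 1024, so one helper thread is added for roughly every 1024 additional channels and removed for roughly every 1024 fewer (each block of 1024 poll slots also holds one wakeup socket, added by addWakeupSocket()).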

protected void implRegister(SelectionKeyImpl ski) {
    synchronized (closeLock) {
        if (pollWrapper == null)
            throw new ClosedSelectorException();
        growIfNeeded();
        channelArray[totalChannels] = ski;
        ski.setIndex(totalChannels);
        fdMap.put(ski);
        keys.add(ski);
        pollWrapper.addEntry(totalChannels, ski);
        totalChannels++;
    }
}

private void growIfNeeded() {
    if (channelArray.length == totalChannels) {
        int newSize = totalChannels * 2; // Make a larger array
        SelectionKeyImpl temp[] = new SelectionKeyImpl[newSize];
        System.arraycopy(channelArray, 1, temp, 1, totalChannels - 1);
        channelArray = temp;
        pollWrapper.grow(newSize);
    }
    if (totalChannels % MAX_SELECTABLE_FDS == 0) { // more threads needed
        pollWrapper.addWakeupSocket(wakeupSourceFd, totalChannels);
        totalChannels++;
        threadsCount++;
    }
}

protected void implDereg(SelectionKeyImpl ski) throws IOException{
    int i = ski.getIndex();
    assert (i >= 0);
    if (i != totalChannels - 1) {
        // Copy end one over it
        SelectionKeyImpl endChannel = channelArray[totalChannels-1];
        channelArray[i] = endChannel;
        endChannel.setIndex(i);
        pollWrapper.replaceEntry(pollWrapper, totalChannels - 1,
                                                            pollWrapper, i);
    }
    channelArray[totalChannels - 1] = null;
    totalChannels--;
    ski.setIndex(-1);
    if ( totalChannels != 1 && totalChannels % MAX_SELECTABLE_FDS == 1) {
        totalChannels--;
        threadsCount--; // The last thread has become redundant.
    }
    fdMap.remove(ski); // Remove the key from fdMap, keys and selectedKeys
    keys.remove(ski);
    selectedKeys.remove(ski);
    deregister(ski);
    SelectableChannel selch = ski.channel();
    if (!selch.isOpen() && !selch.isRegistered())
        ((SelChImpl)selch).kill();
}
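The register path above can be replayed in a few lines to see how many helper threads a given number of channels produces. This is only a sketch under stated assumptions: it assumes totalChannels starts at 1 (slot 0 being reserved for the selector's own wakeup socket, which the arraycopy starting at index 1 suggests) and threadsCount starts at 0, neither of which is shown in the excerpts. The class and method names are made up for illustration.

```java
class HelperThreadCount {
    static final int MAX_SELECTABLE_FDS = 1024;

    /** Replays implRegister()/growIfNeeded() for the given number of channels. */
    static int helperThreadsFor(int channels) {
        int totalChannels = 1;   // assumption: slot 0 holds the selector's wakeup socket
        int threadsCount = 0;    // assumption: no helper threads initially
        for (int i = 0; i < channels; i++) {
            if (totalChannels % MAX_SELECTABLE_FDS == 0) { // same test as growIfNeeded()
                totalChannels++; // extra wakeup-socket slot for the new helper thread
                threadsCount++;
            }
            totalChannels++;     // slot for the channel being registered
        }
        return threadsCount;
    }

    public static void main(String[] args) {
        System.out.println(helperThreadsFor(1023));    // prints 0
        System.out.println(helperThreadsFor(1024));    // prints 1
        System.out.println(helperThreadsFor(100_000)); // prints 97
    }
}
```

If all 100,000 device connections are registered with a single selector, this gives 97 helper threads, which agrees with the 97 identical threads found in the thread dump.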

So registering or deregistering channels adds or removes helper threads. What work does a helper thread, SelectThread, actually do?
SelectThread is an inner class of WindowsSelectorImpl:

// Represents a helper thread used for select.
private final class SelectThread extends Thread {
    private final int index; // index of this thread
    final SubSelector subSelector;
    private long lastRun = 0; // last run number
    private volatile boolean zombie;
    // Creates a new thread
    private SelectThread(int i) {
        this.index = i;
        this.subSelector = new SubSelector(i);
        //make sure we wait for next round of poll
        this.lastRun = startLock.runsCounter;
    }
    void makeZombie() {
        zombie = true;
    }
    boolean isZombie() {
        return zombie;
    }
    public void run() {
        while (true) { // poll loop
            // wait for the start of poll. If this thread has become
            // redundant, then exit.
            if (startLock.waitForStart(this))
                return;
            // call poll()
            try {
                subSelector.poll(index);
            } catch (IOException e) {
                // Save this exception and let other threads finish.
                finishLock.setException(e);
            }
            // notify main thread, that this thread has finished, and
            // wakeup others, if this thread is the first to finish.
            finishLock.threadFinished();
        }
    }
}

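As the code shows, the main logic of the thread's run() method is: once startLock.waitForStart() releases the thread, call subSelector.poll(index). This is essentially the same operation as the subSelector.poll() call on line 22 of doSelect(), and the two are explained together below.

Summary: that covers the helper-thread adjustment step. Its main logic is to adjust the number of helper threads according to the number of channels, and the main job of each helper thread is subSelector.poll(index).

3.1.2 Starting the helper threads

Line 16 of doSelect():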

startLock.startThreads();

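This call wakes the helper threads (see the StartLock class for the details, which are not covered here). As shown above, a SelectThread runs only after startLock releases it. Once started, each helper thread mainly performs subSelector.poll(index), which is essentially the same operation as line 22 of doSelect() and is explained together with it in the next section.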
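The start/finish hand-off can be illustrated with a simplified stand-in. The sketch below is not the JDK's StartLock or FinishLock: RoundGate is a hypothetical class that releases waiting helpers once per round using a round counter, and a java.util.concurrent.CountDownLatch plays the role of the finish lock. It also leaves out the part where the first thread to finish wakes the others.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class RoundGateDemo {
    /** Hypothetical gate: helpers block here until the main thread starts a new round. */
    static final class RoundGate {
        private long round;      // rounds started so far
        private boolean closed;  // set once, tells helpers to exit

        synchronized void startRound() { round++; notifyAll(); }
        synchronized void close()      { closed = true; notifyAll(); }

        /** Waits for a round newer than lastSeen; returns its number, or -1 once closed. */
        synchronized long awaitRound(long lastSeen) throws InterruptedException {
            while (round == lastSeen && !closed) {
                wait();
            }
            return closed ? -1 : round;
        }
    }

    /** Runs the given number of rounds and returns how many units of helper work ran. */
    static int run(int helpers, int rounds) throws InterruptedException {
        RoundGate gate = new RoundGate();
        AtomicInteger work = new AtomicInteger();
        CountDownLatch[] finished = new CountDownLatch[1]; // latch of the current round
        Thread[] threads = new Thread[helpers];
        for (int i = 0; i < helpers; i++) {
            threads[i] = new Thread(() -> {
                long lastSeen = 0;
                try {
                    while ((lastSeen = gate.awaitRound(lastSeen)) >= 0) {
                        work.incrementAndGet();   // stand-in for subSelector.poll(index)
                        finished[0].countDown();  // report completion to the main thread
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].setDaemon(true);
            threads[i].start();
        }
        for (int r = 0; r < rounds; r++) {
            finished[0] = new CountDownLatch(helpers);
            gate.startRound();      // plays the role of startLock.startThreads()
            finished[0].await();    // plays the role of finishLock.waitForHelperThreads()
        }
        gate.close();
        for (Thread t : threads) {
            t.join();
        }
        return work.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 3)); // 4 helpers x 3 rounds, prints 12
    }
}
```

The round counter is what lets a helper that has already finished the current round go back to waiting without running the same round twice.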

3.1.3 The poll operation

Line 22 of doSelect():

subSelector.poll();

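This call does essentially the same work as SelectThread. The difference is that it takes no argument, whereas the poll() method called in SelectThread takes an index parameter.
The SubSelector class looks like this: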

private final class SubSelector {
        private final int pollArrayIndex; // starting index in pollArray to poll
        // These arrays will hold result of native select().
        // The first element of each array is the number of selected sockets.
        // Other elements are file descriptors of selected sockets.
        private final int[] readFds = new int [MAX_SELECTABLE_FDS + 1];
        private final int[] writeFds = new int [MAX_SELECTABLE_FDS + 1];
        private final int[] exceptFds = new int [MAX_SELECTABLE_FDS + 1];

        private SubSelector() {
            this.pollArrayIndex = 0; // main thread
        }

        private SubSelector(int threadIndex) { // helper threads
            this.pollArrayIndex = (threadIndex + 1) * MAX_SELECTABLE_FDS;
        }

        private int poll() throws IOException{ // poll for the main thread
            return poll0(pollWrapper.pollArrayAddress,
                         Math.min(totalChannels, MAX_SELECTABLE_FDS),
                         readFds, writeFds, exceptFds, timeout);
        }

        private int poll(int index) throws IOException {
            // poll for helper threads
            return  poll0(pollWrapper.pollArrayAddress +
                     (pollArrayIndex * PollArrayWrapper.SIZE_POLLFD),
                     Math.min(MAX_SELECTABLE_FDS,
                             totalChannels - (index + 1) * MAX_SELECTABLE_FDS),
                     readFds, writeFds, exceptFds, timeout);
        }

        private native int poll0(long pollAddress, int numfds,
             int[] readFds, int[] writeFds, int[] exceptFds, long timeout);
             
        ......
        ......
}

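As the SubSelector source shows, with or without a parameter, poll() ends up calling the native method poll0() to check whether any registered event has occurred on the channels.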
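The index arithmetic in the two poll() overloads determines which slice of the poll array each thread checks. The sketch below reproduces just that arithmetic; the class PollSlices and the convention of passing -1 for the main thread are made up for illustration.

```java
class PollSlices {
    static final int MAX_SELECTABLE_FDS = 1024;

    /**
     * Returns {startIndex, count} of the poll-array slice handled by one thread.
     * threadIndex -1 stands for the main thread (hypothetical convention);
     * 0, 1, 2, ... are helper thread indexes as passed to SelectThread.
     */
    static int[] slice(int threadIndex, int totalChannels) {
        if (threadIndex < 0) {
            // Main thread: mirrors poll() without arguments.
            return new int[] {0, Math.min(totalChannels, MAX_SELECTABLE_FDS)};
        }
        // Helper thread: mirrors poll(int index).
        int start = (threadIndex + 1) * MAX_SELECTABLE_FDS;
        return new int[] {start, Math.min(MAX_SELECTABLE_FDS, totalChannels - start)};
    }

    public static void main(String[] args) {
        int total = 2500; // example value of totalChannels
        for (int t = -1; t <= 1; t++) {
            int[] s = slice(t, total);
            System.out.println("thread " + t + ": start=" + s[0] + ", count=" + s[1]);
        }
        // prints:
        // thread -1: start=0, count=1024
        // thread 0: start=1024, count=1024
        // thread 1: start=2048, count=452
    }
}
```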

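3.1.4 Summary

From the analysis above, select() in NIO on Windows works as follows: the main thread checks only the first 1024 channels, and the remaining channels are handed to SelectThread helper threads (each SelectThread checks 1024 channels). With a large number of connections (channels), a large number of SelectThreads are therefore started to help detect channel events, which drives CPU utilization up.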

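3.2 The NIO implementation on Linux

That explains why NIO uses so much CPU on Windows. To understand why Linux does not have the same problem, we need to look at the Linux implementation.

As mentioned above, on Linux doSelect() is implemented by the EPollSelectorImpl class: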

protected int doSelect(long timeout) throws IOException {
    if (closed)
        throw new ClosedSelectorException();
    processDeregisterQueue();
    try {
        begin();
        pollWrapper.poll(timeout);
    } finally {
        end();
    }
    processDeregisterQueue();
    int numKeysUpdated = updateSelectedKeys();
    if (pollWrapper.interrupted()) {
        // Clear the wakeup pipe
        pollWrapper.putEventOps(pollWrapper.interruptedIndex(), 0);
        synchronized (interruptLock) {
            pollWrapper.clearInterrupted();
            IOUtil.drain(fd0);
            interruptTriggered = false;
        }
    }
    return numKeysUpdated;
}

As the source shows, the main operation in this method is pollWrapper.poll(timeout) on line 7 of the listing. The poll() implementation in the EPollArrayWrapper class is:

int poll(long timeout) throws IOException {
    updateRegistrations();
    updated = epollWait(pollArrayAddress, NUM_EPOLLEVENTS, timeout, epfd);
    for (int i=0; i<updated; i++) {
        // (loop body truncated in the original post)
    }
    return updated;
}

As shown, poll() mainly calls the native method epollWait() to find out whether any channel events have occurred.

private native int epollWait(long pollAddress, int numfds, long timeout, int epfd) throws IOException;

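Summary: on Linux, select() does not start multiple threads to check the channels, so it does not consume a large amount of CPU.

4. Conclusion

The large difference between the Windows and Linux implementations of select() in NIO comes down to the different network I/O models of the two systems: on Linux it is built on epoll, on Windows on select. This analysis is not exhaustive. One could dig further into the native methods mentioned above, down to the network I/O model the operating system uses, which is the real root of the problem. Corrections are welcome.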
