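During performance testing of the project, we found that the same code, connected to the same number of devices (100,000, with heavy NIO-based data exchange between the devices and the code), used only 20%~30% of the CPU on Linux but stayed above 80% on Windows.
Monitoring the program on Linux and on Windows with jconsole, the Threads tab showed that on Windows a large number of unnamed threads had been started, all with similar stack traces.
Next, use the jstack tool that ships with the JDK to dump the thread stacks of the Java process: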
jstack -l 31372 > c:/31372.stack
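Note: 31372 is the PID of the process.
Searching the dump for threads with identical stack traces showed that 97 such threads had been started. Process Explorer, a Windows monitoring tool, showed that each of these threads used roughly 0.7%~0.9% of the CPU, so the 97 threads together accounted for about 69.7% of the CPU. Linux started none of these threads, which explains at a high level why CPU utilization on Windows was more than 60 percentage points higher than on Linux.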
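Searching a large dump by hand is tedious, so the grouping step can be automated. The following is a minimal sketch, not the tool used in the original investigation: the class name StackGrouper and the method countByStack are hypothetical, and it assumes the usual jstack layout in which each thread entry starts with a quoted thread name, frames start with "at ", and entries are separated by blank lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class StackGrouper {
    /** Groups the thread entries of a jstack dump by their stack frames and counts each group. */
    static Map<String, Integer> countByStack(String dump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Assumption: entries are separated by at least one blank line.
        for (String block : dump.split("\\R\\s*\\R")) {
            if (!block.startsWith("\"")) {
                continue; // not a thread entry (dump header, lock sections, ...)
            }
            StringBuilder frames = new StringBuilder();
            for (String line : block.split("\\R")) {
                String trimmed = line.trim();
                if (trimmed.startsWith("at ")) {
                    frames.append(trimmed).append('\n');
                }
            }
            if (frames.length() > 0) {
                counts.merge(frames.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StackGrouper c:/31372.stack
        String dump = new String(Files.readAllBytes(Paths.get(args[0])));
        countByStack(dump).forEach((stack, n) -> {
            if (n > 1) {
                System.out.println(n + " threads share this stack:\n" + stack);
            }
        });
    }
}
```

A group with an unusually high count, such as the 97 threads found here, is the natural place to start reading source code.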
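Downloading the source of the WindowsSelectorImpl class from OpenJDK shows:
final class WindowsSelectorImpl extends SelectorImpl {
    ...
}
This class extends SelectorImpl, so the next step is the SelectorImpl source:
abstract class SelectorImpl extends AbstractSelector {
    ...
}
SelectorImpl in turn extends the abstract class AbstractSelector.
AbstractSelector itself extends Selector, so all of these classes are ultimately implementations of Selector. The place in our code that uses Selector is: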
int n = selector.select(25);
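For context, here is a minimal sketch of how such a select(25) call is typically used. It is not the project's code: the class SelectLoopSketch and the method selectOnce are hypothetical, and a java.nio.channels.Pipe stands in for the device connections so that the example runs without a network.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class SelectLoopSketch {
    /** Runs one select(25) round on a pipe and returns the number of ready keys. */
    static int selectOnce(boolean writeFirst) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                // Channels must be non-blocking before they can be registered.
                pipe.source().configureBlocking(false);
                pipe.source().register(selector, SelectionKey.OP_READ);
                if (writeFirst) {
                    pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
                }
                // Blocks for at most 25 ms, then reports how many keys became ready.
                int n = selector.select(25);
                selector.selectedKeys().clear();
                return n;
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ready keys with data pending: " + selectOnce(true));
        System.out.println("ready keys with nothing pending: " + selectOnce(false));
    }
}
```

A real server would call select(25) in a loop and iterate over selectedKeys() after each round; every iteration of that loop goes through the code path analyzed in the rest of this article.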
The select(long) method of Selector is implemented in SelectorImpl as follows:
public int select(long timeout) throws IOException {
    if (timeout < 0)
        throw new IllegalArgumentException("Negative timeout");
    return lockAndDoSelect((timeout == 0) ? -1 : timeout);
}
select() calls lockAndDoSelect(), whose source is:
private int lockAndDoSelect(long timeout) throws IOException {
    synchronized (this) {
        if (!isOpen())
            throw new ClosedSelectorException();
        synchronized (publicKeys) {
            synchronized (publicSelectedKeys) {
                return doSelect(timeout);
            }
        }
    }
}
lockAndDoSelect() calls doSelect(), which is an abstract method in SelectorImpl:
protected abstract int doSelect(long timeout) throws IOException;
Its concrete implementation is OS-specific: on Windows the method is implemented in WindowsSelectorImpl, and on Linux in EPollSelectorImpl.
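To check which implementation a given JVM actually picks, one can simply print the runtime class of the selector returned by Selector.open(). This is a small sketch with a hypothetical class name SelectorImplName; the class name it prints depends on the platform and on the JDK version, so no particular output is assumed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

public class SelectorImplName {
    /** Returns the class name of the Selector implementation chosen by the default provider. */
    static String selectorImplName() {
        try (Selector selector = Selector.open()) {
            return selector.getClass().getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(System.getProperty("os.name") + " -> " + selectorImplName());
    }
}
```

Running this on both test machines is a quick way to confirm that the two platforms really take different code paths before reading the sources.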
In the WindowsSelectorImpl source, doSelect() is implemented as follows:
protected int doSelect(long timeout) throws IOException {
    if (channelArray == null)
        throw new ClosedSelectorException();
    this.timeout = timeout; // set selector timeout
    processDeregisterQueue();
    if (interruptTriggered) {
        resetWakeupSocket();
        return 0;
    }
    // Calculate number of helper threads needed for poll. If necessary
    // threads are created here and start waiting on startLock
    adjustThreadsCount();
    finishLock.reset(); // reset finishLock
    // Wakeup helper threads, waiting on startLock, so they start polling.
    // Redundant threads will exit here after wakeup.
    startLock.startThreads();
    // do polling in the main thread. Main thread is responsible for
    // first MAX_SELECTABLE_FDS entries in pollArray.
    try {
        begin();
        try {
            subSelector.poll();
        } catch (IOException e) {
            finishLock.setException(e); // Save this exception
        }
        // Main thread is out of poll(). Wakeup others and wait for them
        if (threads.size() > 0)
            finishLock.waitForHelperThreads();
    } finally {
        end();
    }
    // Done with poll(). Set wakeupSocket to nonsignaled for the next run.
    finishLock.checkForException();
    processDeregisterQueue();
    int updated = updateSelectedKeys();
    // Done with poll(). Set wakeupSocket to nonsignaled for the next run.
    resetWakeupSocket();
    return updated;
}
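In this method, startLock releases the helper threads when they are needed; after one round they block again until the next round. finishLock makes the main thread wait until all helper threads have finished, so that they all return together.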
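The start/finish handshake can be illustrated with standard concurrency utilities. The sketch below is only an analogy for a single round and is not how the JDK's StartLock and FinishLock are written: it uses one CountDownLatch as the start gate and another as the finish gate, and the class StartFinishSketch and method pollRound are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class StartFinishSketch {
    /** Runs one poll round with the given number of helpers; returns how many threads polled. */
    static int pollRound(int helpers) {
        CountDownLatch start = new CountDownLatch(1);        // plays the role of startLock
        CountDownLatch finish = new CountDownLatch(helpers); // plays the role of finishLock
        AtomicInteger polled = new AtomicInteger();
        for (int i = 0; i < helpers; i++) {
            Thread helper = new Thread(() -> {
                try {
                    start.await();            // wait until the main thread starts the round
                    polled.incrementAndGet(); // stand-in for subSelector.poll(index)
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    finish.countDown();       // tell the main thread this helper is done
                }
            });
            helper.setDaemon(true);
            helper.start();
        }
        start.countDown();        // release all helpers at once
        polled.incrementAndGet(); // the main thread polls its own slice
        try {
            finish.await();       // wait for every helper before returning
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return polled.get();
    }

    public static void main(String[] args) {
        System.out.println(pollRound(3)); // prints 4: three helpers plus the main thread
    }
}
```

The real selector differs in that its helper threads are long-lived and are reused for every round instead of being created per round.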
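The three most important steps in this method are the calls to adjustThreadsCount(), startLock.startThreads(), and subSelector.poll(). Each is analyzed in detail below.
The implementation of adjustThreadsCount() in WindowsSelectorImpl is: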
// After some channels registered/deregistered, the number of required
// helper threads may have changed. Adjust this number.
private void adjustThreadsCount() {
    if (threadsCount > threads.size()) {
        // More threads needed. Start more threads.
        for (int i = threads.size(); i < threadsCount; i++) {
            SelectThread newThread = new SelectThread(i);
            threads.add(newThread);
            newThread.setDaemon(true);
            newThread.start();
        }
    } else if (threadsCount < threads.size()) {
        // Some threads become redundant. Remove them from the threads List.
        for (int i = threads.size() - 1 ; i >= threadsCount; i--)
            threads.remove(i).makeZombie();
    }
}
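As the comment says, this method adjusts the number of helper threads after channels have been registered or deregistered. threads.size() is the current number of helper threads and threadsCount is the number required. If there are fewer threads than required, new helper threads are created to make up the difference. If there are more, the surplus threads are removed from the list and marked as zombies so that they exit the next time they are woken up.
The logic is simple, but it is worth looking at how the required number of helper threads, threadsCount, is computed. The variable is incremented when a channel is registered (implRegister() calls growIfNeeded()) and decremented in implDereg() when a channel is deregistered. The constant MAX_SELECTABLE_FDS is 1024, so roughly every 1024 channels added create one more helper thread and every 1024 channels removed retire one. More precisely, each block of 1024 poll-array slots also holds one wakeup socket, which is why growIfNeeded() increments totalChannels together with threadsCount.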
protected void implRegister(SelectionKeyImpl ski) {
    synchronized (closeLock) {
        if (pollWrapper == null)
            throw new ClosedSelectorException();
        growIfNeeded();
        channelArray[totalChannels] = ski;
        ski.setIndex(totalChannels);
        fdMap.put(ski);
        keys.add(ski);
        pollWrapper.addEntry(totalChannels, ski);
        totalChannels++;
    }
}
private void growIfNeeded() {
    if (channelArray.length == totalChannels) {
        int newSize = totalChannels * 2; // Make a larger array
        SelectionKeyImpl temp[] = new SelectionKeyImpl[newSize];
        System.arraycopy(channelArray, 1, temp, 1, totalChannels - 1);
        channelArray = temp;
        pollWrapper.grow(newSize);
    }
    if (totalChannels % MAX_SELECTABLE_FDS == 0) { // more threads needed
        pollWrapper.addWakeupSocket(wakeupSourceFd, totalChannels);
        totalChannels++;
        threadsCount++;
    }
}
protected void implDereg(SelectionKeyImpl ski) throws IOException{
    int i = ski.getIndex();
    assert (i >= 0);
    if (i != totalChannels - 1) {
        // Copy end one over it
        SelectionKeyImpl endChannel = channelArray[totalChannels-1];
        channelArray[i] = endChannel;
        endChannel.setIndex(i);
        pollWrapper.replaceEntry(pollWrapper, totalChannels - 1,
                                 pollWrapper, i);
    }
    channelArray[totalChannels - 1] = null;
    totalChannels--;
    ski.setIndex(-1);
    if ( totalChannels != 1 && totalChannels % MAX_SELECTABLE_FDS == 1) {
        totalChannels--;
        threadsCount--; // The last thread has become redundant.
    }
    fdMap.remove(ski); // Remove the key from fdMap, keys and selectedKeys
    keys.remove(ski);
    selectedKeys.remove(ski);
    deregister(ski);
    SelectableChannel selch = ski.channel();
    if (!selch.isOpen() && !selch.isRegistered())
        ((SelChImpl)selch).kill();
}
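The counting rule in growIfNeeded() can be replayed in isolation to estimate how many helper threads a given number of connections produces. The sketch below is a simulation, not JDK code: the class HelperThreadEstimate and method helperThreadsFor are hypothetical, and it assumes that totalChannels starts at 1 because slot 0 of the poll array is reserved for the selector's own wakeup socket (the arraycopy starting at index 1 above is consistent with that, but the initial value is not shown in the excerpts).

```java
public class HelperThreadEstimate {
    static final int MAX_SELECTABLE_FDS = 1024;

    /** Replays the threadsCount bookkeeping of growIfNeeded() for a number of registrations. */
    static int helperThreadsFor(int channels) {
        int totalChannels = 1; // assumption: slot 0 is taken by the wakeup socket
        int threadsCount = 0;
        for (int i = 0; i < channels; i++) {
            // Same condition as in growIfNeeded(): a new block of slots begins.
            if (totalChannels % MAX_SELECTABLE_FDS == 0) {
                totalChannels++; // extra wakeup socket for the new helper thread
                threadsCount++;
            }
            totalChannels++;     // the channel being registered
        }
        return threadsCount;
    }

    public static void main(String[] args) {
        System.out.println(helperThreadsFor(100_000)); // prints 97
    }
}
```

Under these assumptions, 100,000 registered channels yield 97 helper threads, which matches the 97 identical threads found in the thread dump.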
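The analysis above shows that helper threads are added or removed as channels are registered or deregistered. What work does the helper thread, SelectThread, actually do?
SelectThread is an inner class of WindowsSelectorImpl: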
// Represents a helper thread used for select.
private final class SelectThread extends Thread {
    private final int index; // index of this thread
    final SubSelector subSelector;
    private long lastRun = 0; // last run number
    private volatile boolean zombie;
    // Creates a new thread
    private SelectThread(int i) {
        this.index = i;
        this.subSelector = new SubSelector(i);
        //make sure we wait for next round of poll
        this.lastRun = startLock.runsCounter;
    }
    void makeZombie() {
        zombie = true;
    }
    boolean isZombie() {
        return zombie;
    }
    public void run() {
        while (true) { // poll loop
            // wait for the start of poll. If this thread has become
            // redundant, then exit.
            if (startLock.waitForStart(this))
                return;
            // call poll()
            try {
                subSelector.poll(index);
            } catch (IOException e) {
                // Save this exception and let other threads finish.
                finishLock.setException(e);
            }
            // notify main thread, that this thread has finished, and
            // wakeup others, if this thread is the first to finish.
            finishLock.threadFinished();
        }
    }
}
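As the code shows, the main logic of run() is to wait on startLock and, once released, call subSelector.poll(index). This is essentially the same operation as the subSelector.poll() call in doSelect(), so the two are explained together below.
Summary: this covers the step that adjusts the helper threads. The number of helper threads is adjusted to match the number of channels, and the main job of each helper thread is subSelector.poll(index).
The startLock.startThreads() call in doSelect():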
startLock.startThreads();
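Its main purpose is to start the helper threads (see the StartLock class for details, which are not covered here). As shown above, a SelectThread runs only after startLock releases it. Once started, each helper thread mainly calls subSelector.poll(index), which is essentially the same as the subSelector.poll() call in doSelect() and is explained together with it below.
The subSelector.poll() call in doSelect():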
subSelector.poll();
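does essentially the same work as SelectThread. The difference is that it takes no argument, whereas the poll() method called by SelectThread takes an index argument.
First, the SubSelector class: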
private final class SubSelector {
    private final int pollArrayIndex; // starting index in pollArray to poll
    // These arrays will hold result of native select().
    // The first element of each array is the number of selected sockets.
    // Other elements are file descriptors of selected sockets.
    private final int[] readFds = new int [MAX_SELECTABLE_FDS + 1];
    private final int[] writeFds = new int [MAX_SELECTABLE_FDS + 1];
    private final int[] exceptFds = new int [MAX_SELECTABLE_FDS + 1];
    private SubSelector() {
        this.pollArrayIndex = 0; // main thread
    }
    private SubSelector(int threadIndex) { // helper threads
        this.pollArrayIndex = (threadIndex + 1) * MAX_SELECTABLE_FDS;
    }
    private int poll() throws IOException{ // poll for the main thread
        return poll0(pollWrapper.pollArrayAddress,
                     Math.min(totalChannels, MAX_SELECTABLE_FDS),
                     readFds, writeFds, exceptFds, timeout);
    }
    private int poll(int index) throws IOException {
        // poll for helper threads
        return poll0(pollWrapper.pollArrayAddress +
                     (pollArrayIndex * PollArrayWrapper.SIZE_POLLFD),
                     Math.min(MAX_SELECTABLE_FDS,
                              totalChannels - (index + 1) * MAX_SELECTABLE_FDS),
                     readFds, writeFds, exceptFds, timeout);
    }
    private native int poll0(long pollAddress, int numfds,
                             int[] readFds, int[] writeFds, int[] exceptFds, long timeout);
    ......
    ......
}
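The SubSelector source shows that poll(), with or without an argument, ends up calling the native method poll0() to check whether any registered event has occurred on a channel.
In summary, select() in Windows NIO works as follows: the main thread only checks the first 1024 channels, and the remaining channels are handed to SelectThread helper threads, each of which checks 1024 channels. With a large number of connections (channels), a large number of SelectThread instances are started to help detect channel events, which drives CPU utilization up.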
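The division of work can be made concrete by computing, for each thread, the starting index and the number of poll-array entries it passes to poll0(). The sketch below mirrors the arithmetic in SubSelector.poll() and SubSelector.poll(int index). The class PollSlices, the method slices, and the sample numbers are hypothetical, and totalChannels here counts poll-array slots, including the wakeup sockets.

```java
import java.util.ArrayList;
import java.util.List;

public class PollSlices {
    static final int MAX_SELECTABLE_FDS = 1024;

    /** Returns {startIndex, count} for each thread; element 0 is the main thread. */
    static List<int[]> slices(int totalChannels, int helperThreads) {
        List<int[]> result = new ArrayList<>();
        // Main thread: same arithmetic as SubSelector.poll().
        result.add(new int[] {0, Math.min(totalChannels, MAX_SELECTABLE_FDS)});
        // Helper threads: same arithmetic as SubSelector.poll(int index).
        for (int index = 0; index < helperThreads; index++) {
            int start = (index + 1) * MAX_SELECTABLE_FDS;
            int count = Math.min(MAX_SELECTABLE_FDS, totalChannels - start);
            result.add(new int[] {start, count});
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> all = slices(2500, 2); // illustrative numbers only
        for (int i = 0; i < all.size(); i++) {
            String who = (i == 0) ? "main thread" : "helper " + (i - 1);
            System.out.println(who + ": start=" + all.get(i)[0] + ", count=" + all.get(i)[1]);
        }
    }
}
```

With 2500 slots and two helpers, the main thread and the first helper each poll a full block of 1024 entries, and the last helper polls the remaining 452.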
That explains why NIO uses so much CPU on Windows. To see why Linux does not have the same problem, we need to look at the Linux implementation of NIO.
As mentioned above, on Linux doSelect() is implemented by EPollSelectorImpl:
protected int doSelect(long timeout) throws IOException {
    if (closed)
        throw new ClosedSelectorException();
    processDeregisterQueue();
    try {
        begin();
        pollWrapper.poll(timeout);
    } finally {
        end();
    }
    processDeregisterQueue();
    int numKeysUpdated = updateSelectedKeys();
    if (pollWrapper.interrupted()) {
        // Clear the wakeup pipe
        pollWrapper.putEventOps(pollWrapper.interruptedIndex(), 0);
        synchronized (interruptLock) {
            pollWrapper.clearInterrupted();
            IOUtil.drain(fd0);
            interruptTriggered = false;
        }
    }
    return numKeysUpdated;
}
The main operation in this method is the call to pollWrapper.poll(timeout). The implementation of poll() in EPollArrayWrapper is:
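int poll(long timeout) throws IOException {
    updateRegistrations();
    updated = epollWait(pollArrayAddress, NUM_EPOLLEVENTS, timeout, epfd);
    for (int i=0; i<updated; i++) {
        if (getDescriptor(i) == incomingInterruptFD) {
            interruptedIndex = i;
            interrupted = true;
            break;
        }
    }
    return updated;
}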
As you can see, poll() mainly calls the native method epollWait() to find out whether any channel events have occurred.
private native int epollWait(long pollAddress, int numfds, long timeout, int epfd) throws IOException;
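Conclusion: the Linux implementation of select() does not start multiple threads to check channels, so it does not consume large amounts of CPU.
The large difference between the Windows and Linux implementations of select() in NIO ultimately comes from the different network I/O models of the two systems: Linux uses epoll, while Windows uses select. This analysis is not exhaustive. One could go further into the native methods mentioned above and into the network I/O model of each operating system, which is the real root of the problem. Corrections are welcome.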