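1. Background
I have recently been studying Netty's source code, and this post on Netty's threading framework, the Reactor thread model, presents what I found. If you are not yet familiar with the Reactor pattern, please look it up first; there is plenty of material about it online.
Before digging into the source, I read 《Netty权威指南》 (The Definitive Guide to Netty). It is a good book that covers both the principles and the source code, so why write this post? Its explanation of the thread model is not detailed enough and never connects the whole flow end to end. In my view this part is the foundation: once it is clear, the Channel and ChannelPipeline processing that comes later is easy to follow. This analysis is based on Netty 5.0 (an alpha release).
I'll use the TimeServer example from 《Netty权威指南》 for the analysis. Its core is the following: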
public void bind(int port) throws Exception {
    // bossGroup accepts connections; workerGroup handles I/O on accepted connections
    EventLoopGroup bossGroup = new NioEventLoopGroup();
    EventLoopGroup workerGroup = new NioEventLoopGroup();
    try {
        ServerBootstrap b = new ServerBootstrap();
        b.group(bossGroup, workerGroup)
         .channel(NioServerSocketChannel.class)
         .option(ChannelOption.SO_BACKLOG, 1024)
         .childHandler(new ChildChannelHandler());
        // Bind and wait until the server socket is listening
        ChannelFuture f = b.bind(port).sync();
        // Block until the server channel is closed
        f.channel().closeFuture().sync();
    } finally {
        bossGroup.shutdownGracefully();
        workerGroup.shutdownGracefully();
    }
}

private class ChildChannelHandler extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) throws Exception {
        // Add the handler to the channel's pipeline (a linked list of handlers). Several handlers
        // can be added; if no name is given, one is generated automatically.
        ch.pipeline().addLast("GetTime", new TimeServerHandler());
    }
}
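From the code above, the two most important classes are NioEventLoopGroup and ServerBootstrap (Bootstrap on the client side). Below are the UML class diagrams of these two classes:
2. The NioEventLoopGroup thread group
NioEventLoopGroup's main job is to create a thread pool. The code above creates two event loop groups: bossGroup, which listens for (accepts) connections, and workerGroup, which handles client/server communication on accepted connections. These two pools are the basis of the Reactor thread model. The rest of this section follows the UML class diagram from the bottom up.
NioEventLoopGroup has little code, and its most important method is the one below. It implements the abstract newChild method declared by the parent class MultithreadEventLoopGroup; each call creates one child event loop, an NioEventLoop, which is essentially one thread. We will see where it is called later.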
@Override
protected EventLoop newChild(Executor executor, Object... args) throws Exception {
    return new NioEventLoop(this, executor, (SelectorProvider) args[0]);
}
Calling the no-argument NioEventLoopGroup constructor eventually ends up in:
public NioEventLoopGroup(
        int nThreads, ThreadFactory threadFactory, final SelectorProvider selectorProvider) {
    super(nThreads, threadFactory, selectorProvider);
}
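About the arguments here: nThreads is 0, threadFactory is null, and selectorProvider is the result of SelectorProvider.provider(). The third argument is used to create the Selector (on Linux, the JDK's default provider is based on epoll rather than select). The call then goes up to the parent class MultithreadEventLoopGroup, which replaces nThreads = 0 with its default thread count (otherwise the nThreads <= 0 check below would throw), and finally reaches the MultithreadEventExecutorGroup constructor: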
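The no-argument constructor passes nThreads = 0, so that value has to be replaced before the `nThreads <= 0` check in MultithreadEventExecutorGroup. As far as I know, Netty's MultithreadEventLoopGroup substitutes DEFAULT_EVENT_LOOP_THREADS. That value is twice the number of available processors unless the io.netty.eventLoopThreads system property overrides it. The class below is a hypothetical standalone sketch of that rule, not Netty's code; the name DefaultThreads is made up:

```java
// Hypothetical helper mirroring (not copying) the default thread-count rule.
final class DefaultThreads {
    // Assumed default: max(1, 2 * available processors), overridable by a system property.
    static final int DEFAULT_EVENT_LOOP_THREADS = Math.max(1,
            Integer.getInteger("io.netty.eventLoopThreads",
                    Runtime.getRuntime().availableProcessors() * 2));

    // 0 means "use the default"; any positive value is kept as-is.
    static int resolve(int nThreads) {
        return nThreads == 0 ? DEFAULT_EVENT_LOOP_THREADS : nThreads;
    }

    public static void main(String[] args) {
        System.out.println("new NioEventLoopGroup()  -> " + resolve(0) + " event loops");
        System.out.println("new NioEventLoopGroup(1) -> " + resolve(1) + " event loop");
    }
}
```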
protected MultithreadEventExecutorGroup(int nThreads, Executor executor, Object... args) {
    if (nThreads <= 0) {
        throw new IllegalArgumentException(String.format("nThreads: %d (expected: > 0)", nThreads));
    }
    if (executor == null) {
        executor = new ThreadPerTaskExecutor(newDefaultThreadFactory());
    }
    children = new EventExecutor[nThreads];
    for (int i = 0; i < nThreads; i ++) {
        boolean success = false;
        try {
            children[i] = newChild(executor, args);
            success = true;
        } catch (Exception e) {
            // TODO: Think about if this is a good exception type
            throw new IllegalStateException("failed to create a child event loop", e);
        } finally {
            if (!success) {
                for (int j = 0; j < i; j ++) {
                    children[j].shutdownGracefully();
                }
                for (int j = 0; j < i; j ++) {
                    EventExecutor e = children[j];
                    try {
                        while (!e.isTerminated()) {
                            e.awaitTermination(Integer.MAX_VALUE, TimeUnit.SECONDS);
                        }
                    } catch (InterruptedException interrupted) {
                        Thread.currentThread().interrupt();
                        break;
                    }
                }
            }
        }
    }
    // ... the rest of the constructor (it continues with "final FutureListener ...") is omitted here
}
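Two points about this method:
1) Here executor is always null, because no ThreadFactory was supplied, so a default executor is created. ThreadPerTaskExecutor has just one concrete method, its implementation of execute, which starts a new thread for the given task. It is called later.
2) The first for loop creates the children. The newChild() call here actually invokes NioEventLoopGroup's newChild method shown above. Note that this only creates the NioEventLoop objects; their threads are started later, on first use.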
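The two points above can be sketched in plain Java. This is a simplified illustration, not Netty's implementation. ThreadPerTaskExecutorSketch mimics the idea behind ThreadPerTaskExecutor: every execute call gets one new thread. GroupSketch and its Child interface are hypothetical stand-ins for the loop that fills the children array. That includes the rollback that closes already-created children when one newChild call fails. Unlike the real constructor, it does not wait for the closed children to terminate.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadFactory;
import java.util.function.IntFunction;

// Sketch of the ThreadPerTaskExecutor idea: every execute() starts a brand-new thread.
final class ThreadPerTaskExecutorSketch implements Executor {
    private final ThreadFactory threadFactory;

    ThreadPerTaskExecutorSketch(ThreadFactory threadFactory) {
        this.threadFactory = threadFactory;
    }

    @Override
    public void execute(Runnable command) {
        threadFactory.newThread(command).start();
    }
}

// Hypothetical group: fills children[] through a newChild factory and, if one
// child fails to build, closes the children that were already created.
final class GroupSketch {
    // Stand-in for an event loop; close() plays the role of shutdownGracefully().
    interface Child extends AutoCloseable {
        @Override
        void close();
    }

    final Child[] children;

    GroupSketch(int nThreads, IntFunction<Child> newChild) {
        if (nThreads <= 0) {
            throw new IllegalArgumentException("nThreads: " + nThreads + " (expected: > 0)");
        }
        children = new Child[nThreads];
        for (int i = 0; i < nThreads; i++) {
            boolean success = false;
            try {
                children[i] = newChild.apply(i);
                success = true;
            } catch (RuntimeException e) {
                throw new IllegalStateException("failed to create a child event loop", e);
            } finally {
                if (!success) {
                    // Roll back: shut down everything created before the failure.
                    for (int j = 0; j < i; j++) {
                        children[j].close();
                    }
                }
            }
        }
    }
}
```

As in the original, the exception from a failing child still propagates after the rollback in the finally block has run.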
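3. The NioEventLoop thread
Below is the UML class diagram of NioEventLoop.
The NioEventLoop constructor does two main things:
1. It passes the executor to the parent class, which also creates the task queue.
2. It creates the Selector and initializes the selectedKeys set.
The most important method in NioEventLoop is run. It is an infinite loop that polls for events: accept, read and write. It exits only on shutdown; exceptions are caught and logged, and the loop keeps going. run is not called when the NioEventLoopGroup is created; it is triggered during bind. We'll look at run in detail later.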
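4. ServerBootstrap startup
As the code above shows, ServerBootstrap needs the thread groups, the Channel type and the pipeline handlers. Once they are set, calling bind starts the listening process, which eventually reaches doBind: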
private ChannelFuture doBind(final SocketAddress localAddress) {
    final ChannelFuture regFuture = initAndRegister();
    final Channel channel = regFuture.channel();
    if (regFuture.cause() != null) {
        return regFuture;
    }
    final ChannelPromise promise;
    if (regFuture.isDone()) {
        promise = channel.newPromise();
        doBind0(regFuture, channel, localAddress, promise);
    } else {
        // Registration future is almost always fulfilled already, but just in case it's not.
        promise = new DefaultChannelPromise(channel, GlobalEventExecutor.INSTANCE);
        regFuture.addListener(new ChannelFutureListener() {
            @Override
            public void operationComplete(ChannelFuture future) throws Exception {
                doBind0(regFuture, channel, localAddress, promise);
            }
        });
    }
    return promise;
}
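doBind's two branches follow a general future pattern: bind immediately if registration has already completed, otherwise bind from a listener. The sketch below assumes CompletableFuture as a stand-in for Netty's ChannelFuture/ChannelPromise. BindSketch and the address strings are hypothetical, and completing the promise merely stands in for the real bind:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of doBind's "bind now, or bind when registration completes" logic.
final class BindSketch {
    static CompletableFuture<String> doBind(CompletableFuture<Void> regFuture, String address) {
        if (regFuture.isCompletedExceptionally()) {
            // Registration already failed: hand back a failed future right away.
            CompletableFuture<String> failed = new CompletableFuture<>();
            failed.completeExceptionally(new IllegalStateException("registration failed"));
            return failed;
        }
        CompletableFuture<String> promise = new CompletableFuture<>();
        if (regFuture.isDone()) {
            // Common case: registration is finished, so bind immediately.
            doBind0(regFuture, address, promise);
        } else {
            // Rare case: bind later, on whichever thread completes registration.
            regFuture.whenComplete((ignored, cause) -> doBind0(regFuture, address, promise));
        }
        return promise;
    }

    private static void doBind0(CompletableFuture<Void> regFuture, String address,
                                CompletableFuture<String> promise) {
        if (regFuture.isCompletedExceptionally()) {
            promise.completeExceptionally(new IllegalStateException("registration failed"));
        } else {
            promise.complete(address);  // stand-in for channel.bind(localAddress, promise)
        }
    }
}
```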
initAndRegister initializes and registers the channel; it calls createChannel and init(channel):
final ChannelFuture initAndRegister() {
    Channel channel;
    try {
        channel = createChannel();
    } catch (Throwable t) {
        return VoidChannel.INSTANCE.newFailedFuture(t);
    }
    try {
        init(channel);
    } catch (Throwable t) {
        channel.unsafe().closeForcibly();
        return channel.newFailedFuture(t);
    }
    ChannelPromise regFuture = channel.newPromise();
    channel.unsafe().register(regFuture);
    if (regFuture.cause() != null) {
        if (channel.isRegistered()) {
            channel.close();
        } else {
            channel.unsafe().closeForcibly();
        }
    }
    return regFuture;
}
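createChannel is implemented in ServerBootstrap. group() returns bossGroup, and next() takes one event loop (thread) from it; this thread serves the listening socket. The second argument to newChannel, childGroup, is workerGroup, whose threads serve client/server communication once a client has connected. This split between the two groups is exactly the Reactor thread model.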
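How next() picks a child is not shown in this post. A common approach is round-robin over the children array, using a bit mask instead of % when the array length is a power of two. To my understanding, this is also the idea behind Netty's executor choosers. A hypothetical standalone sketch of that idea (the class name RoundRobinChooser is made up):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin chooser illustrating how next() can spread work over children.
final class RoundRobinChooser<T> {
    private final T[] children;
    private final AtomicInteger idx = new AtomicInteger();
    private final boolean powerOfTwo;

    RoundRobinChooser(T[] children) {
        this.children = children;
        // x & -x isolates the lowest set bit; it equals x only for powers of two.
        this.powerOfTwo = (children.length & -children.length) == children.length;
    }

    T next() {
        int i = idx.getAndIncrement();
        // Power-of-two sizes can use a cheap mask; otherwise use modulo
        // (Math.abs guards against the counter overflowing to negative values).
        return powerOfTwo ? children[i & (children.length - 1)]
                          : children[Math.abs(i % children.length)];
    }
}
```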
@Override
Channel createChannel() {
    EventLoop eventLoop = group().next();
    return channelFactory().newChannel(eventLoop, childGroup);
}
newChannel uses reflection to instantiate the configured channel class, that is, it creates the NioServerSocketChannel.
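Creating a channel "by reflection" boils down to looking up a constructor by its parameter types and invoking it. A minimal sketch under that assumption: ReflectiveFactory and FakeServerChannel are hypothetical, and the two String arguments merely stand in for the event loop and childGroup passed to newChannel:

```java
import java.lang.reflect.Constructor;

// Hypothetical reflective factory: resolve a public constructor once, call it per instance.
final class ReflectiveFactory<T> {
    private final Constructor<? extends T> ctor;

    ReflectiveFactory(Class<? extends T> clazz, Class<?>... paramTypes) {
        try {
            this.ctor = clazz.getConstructor(paramTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(clazz + " has no matching public constructor", e);
        }
    }

    T newInstance(Object... args) {
        try {
            return ctor.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unable to create " + ctor.getDeclaringClass(), e);
        }
    }
}

// Made-up stand-in for NioServerSocketChannel.
final class FakeServerChannel {
    final String eventLoop;
    final String childGroup;

    public FakeServerChannel(String eventLoop, String childGroup) {
        this.eventLoop = eventLoop;
        this.childGroup = childGroup;
    }
}
```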
init(channel) is fairly simple: it mainly sets the options and the pipeline.
Here is the register method:
public final void register(final ChannelPromise promise) {
    if (eventLoop.inEventLoop()) {
        register0(promise);
    } else {
        try {
            eventLoop.execute(new Runnable() {
                @Override
                public void run() {
                    register0(promise);
                }
            });
        } catch (Throwable t) {
            logger.warn(
                    "Force-closing a channel whose registration task was not accepted by an event loop: {}",
                    AbstractChannel.this, t);
            closeForcibly();
            closeFuture.setClosed();
            promise.setFailure(t);
        }
    }
}
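The first step checks whether the thread calling register is the eventLoop's own thread (eventLoop comes from bossGroup and was set in createChannel). On the first call they are certainly different: the current thread is main, and the event loop's thread has not even been started yet. So the else branch is taken. eventLoop.execute is implemented in SingleThreadEventExecutor: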
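The check at the top of register is a general pattern: run inline when already on the event loop's thread, otherwise submit the work to that thread. A small sketch of the pattern, assuming a single-thread ExecutorService as a stand-in for the EventLoop (LoopDispatch is hypothetical). The demo calls register from the caller thread and then again from inside the loop. The second call runs inline, so the trace should read [loop]nested;outer-done;.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the register()/register0() dispatch pattern.
final class LoopDispatch {
    // Set by the thread factory before the loop thread starts.
    private volatile Thread loopThread;

    private final ExecutorService loop = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "sketch-event-loop");
        loopThread = t;
        return t;
    });

    boolean inEventLoop() {
        return Thread.currentThread() == loopThread;
    }

    void register(Runnable register0) {
        if (inEventLoop()) {
            register0.run();            // already on the loop thread: run inline
        } else {
            loop.execute(register0);    // e.g. called from main: hand off as a task
        }
    }

    static String demo() {
        LoopDispatch d = new LoopDispatch();
        StringBuffer trace = new StringBuffer();   // synchronized, so safe to share
        d.register(() -> {
            trace.append(d.inEventLoop() ? "[loop]" : "[caller]");
            d.register(() -> trace.append("nested;"));  // inEventLoop() is true here
            trace.append("outer-done;");
        });
        d.loop.shutdown();
        try {
            d.loop.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return trace.toString();
    }
}
```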
public void execute(Runnable task) {
    if (task == null) {
        throw new NullPointerException("task");
    }
    boolean inEventLoop = inEventLoop();
    if (inEventLoop) {
        addTask(task);
    } else {
        startThread();
        addTask(task);
        if (isShutdown() && removeTask(task)) {
            reject();
        }
    }
    if (!addTaskWakesUp) {
        wakeup(inEventLoop);
    }
}
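As analyzed above, execution enters the else branch: it starts the thread and adds the task to the task queue (a blocking queue). The started thread then takes tasks out of the queue and runs them.
startThread calls doStartThread, which calls executor.execute. Here the executor is the ThreadPerTaskExecutor created earlier, and its execute method creates a new thread and calls start on it. The most important line in the Runnable that this thread runs is SingleThreadEventExecutor.this.run(). On this first call, it dispatches to the run method implemented in NioEventLoop.java.
After the main thread starts child thread A, it adds the task to the queue and moves on to doBind0. Once thread A is running, it takes this task out of the queue and executes it. Although doBind0 is called by the main thread, main only puts the actual bind operation into the queue, and thread A performs the bind. At that point main's work is done: it returns to the bind method and blocks in sync().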
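The lazy start described above can be sketched as follows: the first execute from an outside thread starts the thread, and later calls just enqueue. This is a heavily simplified, hypothetical MiniEventExecutor, not SingleThreadEventExecutor. Its run() only drains a BlockingQueue, whereas NioEventLoop.run() also selects on a Selector, and it has no graceful shutdown. The demo submits three tasks from the caller and reports two counts: threads spawned and distinct threads that ran the tasks. Both should be 1.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical single-thread executor whose thread is started by the first outside execute().
final class MiniEventExecutor {
    private final BlockingQueue<Runnable> taskQueue = new LinkedBlockingQueue<>();
    private final AtomicBoolean started = new AtomicBoolean();
    private final Executor executor;      // e.g. a thread-per-task executor
    private volatile Thread thread;

    MiniEventExecutor(Executor executor) {
        this.executor = executor;
    }

    boolean inEventLoop() {
        return Thread.currentThread() == thread;
    }

    void execute(Runnable task) {
        if (task == null) {
            throw new NullPointerException("task");
        }
        if (!inEventLoop()) {
            startThread();                // only the first call actually starts a thread
        }
        taskQueue.add(task);
    }

    private void startThread() {
        if (started.compareAndSet(false, true)) {
            executor.execute(() -> {
                thread = Thread.currentThread();
                run();                    // in Netty this would be NioEventLoop.run()
            });
        }
    }

    // Stand-in for the event loop body: take queued tasks and run them, forever.
    private void run() {
        while (true) {
            try {
                taskQueue.take().run();
            } catch (InterruptedException e) {
                return;                   // crude shutdown: leave the loop when interrupted
            }
        }
    }

    // Returns {threads spawned, distinct threads that ran the tasks}.
    static int[] demo() {
        AtomicInteger spawned = new AtomicInteger();
        MiniEventExecutor loop = new MiniEventExecutor(command -> {
            spawned.incrementAndGet();
            Thread t = new Thread(command, "mini-loop");
            t.setDaemon(true);            // run() never returns, so don't keep the JVM alive
            t.start();
        });
        Set<String> ranOn = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(3);
        for (int i = 0; i < 3; i++) {
            loop.execute(() -> {
                ranOn.add(Thread.currentThread().getName());
                done.countDown();
            });
        }
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {spawned.get(), ranOn.size()};
    }
}
```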
5. Child thread A executes the task
The task run by the child thread is defined in doStartThread. Its key line is SingleThreadEventExecutor.this.run(); run is abstract at this level, so where is it implemented?
private void doStartThread() {
    assert thread == null;
    executor.execute(new Runnable() {
        @Override
        public void run() {
            thread = Thread.currentThread();
            if (interrupted) {
                thread.interrupt();
            }
            boolean success = false;
            updateLastExecutionTime();
            try {
                SingleThreadEventExecutor.this.run();
                success = true;
            } catch (Throwable t) {
                logger.warn("Unexpected exception from an event executor: ", t);
            } finally {
                if (state < ST_SHUTTING_DOWN) {
                    state = ST_SHUTTING_DOWN;
                }
                // Check if confirmShutdown() was called at the end of the loop.
                if (success && gracefulShutdownStartTime == 0) {
                    logger.error("Buggy " + EventExecutor.class.getSimpleName() + " implementation; " +
                            SingleThreadEventExecutor.class.getSimpleName() + ".confirmShutdown() must be called " +
                            "before run() implementation terminates.");
                }
                try {
                    // Run all remaining tasks and shutdown hooks.
                    for (;;) {
                        if (confirmShutdown()) {
                            break;
                        }
                    }
                } finally {
                    try {
                        cleanup();
                    } finally {
                        synchronized (stateLock) {
                            state = ST_TERMINATED;
                        }
                        threadLock.release();
                        if (!taskQueue.isEmpty()) {
                            logger.warn(
                                    "An event executor terminated with " +
                                    "non-empty task queue (" + taskQueue.size() + ')');
                        }
                        terminationFuture.setSuccess(null);
                    }
                }
            }
        }
    });
}
run is implemented by NioEventLoop.run(), which ties this back to what we saw earlier:
protected void run() {
    for (;;) {
        oldWakenUp = wakenUp.getAndSet(false);
        try {
            if (hasTasks()) {
                selectNow();
            } else {
                select();
                if (wakenUp.get()) {
                    selector.wakeup();
                }
            }
            cancelledKeys = 0;
            final long ioStartTime = System.nanoTime();
            needsToSelectAgain = false;
            if (selectedKeys != null) {
                processSelectedKeysOptimized(selectedKeys.flip());
            } else {
                processSelectedKeysPlain(selector.selectedKeys());
            }
            final long ioTime = System.nanoTime() - ioStartTime;
            final int ioRatio = this.ioRatio;
            runAllTasks(ioTime * (100 - ioRatio) / ioRatio);
            if (isShuttingDown()) {
                closeAll();
                if (confirmShutdown()) {
                    break;
                }
            }
        } catch (Throwable t) {
            logger.warn("Unexpected exception in the selector loop.", t);
            // Prevent possible consecutive immediate failures that lead to
            // excessive CPU consumption.
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                // Ignore.
            }
        }
    }
}
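As the code shows, this is an infinite loop that polls for events. If tasks are pending, it calls the non-blocking selectNow() right away; otherwise select() blocks for a while waiting for events, much like the Linux select model. It then processes the selected keys. By default processSelectedKeysOptimized is used: it loops over the keys and, in the usual case, takes the if branch that calls processSelectedKey. The core of processSelectedKey is three if checks: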
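The last statement in each iteration, runAllTasks(ioTime * (100 - ioRatio) / ioRatio), gives queued tasks a time budget proportional to the time just spent on I/O. Below is a worked example of that formula on its own. IoRatioBudget is a hypothetical name. To my knowledge Netty's default ioRatio is 50, which gives tasks as much time as the I/O took.

```java
// Hypothetical helper computing the task budget used in NioEventLoop.run() above.
final class IoRatioBudget {
    static long taskBudgetNanos(long ioTimeNanos, int ioRatio) {
        if (ioRatio <= 0 || ioRatio > 100) {
            throw new IllegalArgumentException("ioRatio: " + ioRatio + " (expected: 1-100)");
        }
        return ioTimeNanos * (100 - ioRatio) / ioRatio;
    }

    public static void main(String[] args) {
        long io = 1_000_000L; // assume 1 ms was spent processing selected keys
        for (int ratio : new int[] {50, 80, 20}) {
            System.out.println("ioRatio=" + ratio + " -> task budget "
                    + taskBudgetNanos(io, ratio) + " ns");
        }
    }
}
```

With 1 ms of I/O, an ioRatio of 50 yields a 1 ms task budget, 80 yields 0.25 ms, and 20 yields 4 ms: the higher the ratio, the more the loop favors I/O over queued tasks.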
if ((readyOps & (SelectionKey.OP_READ | SelectionKey.OP_ACCEPT)) != 0 || readyOps == 0) {
    unsafe.read();
    if (!ch.isOpen()) {
        // Connection already closed - no need to handle write.
        return;
    }
}
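OP_READ and OP_ACCEPT: a new client connection (on the server channel) or data sent by a client (on a connected channel). readyOps == 0 is also routed here as a workaround for a JDK bug that could otherwise cause a spin loop.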
if ((readyOps & SelectionKey.OP_WRITE) != 0) {
    // Call forceFlush which will also take care of clear the OP_WRITE once there is nothing left to write
    ch.unsafe().forceFlush();
}
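OP_WRITE: used when sending data to the peer. Netty registers interest in OP_WRITE only when a flush could not write everything because the socket send buffer was full; once the socket becomes writable again, forceFlush writes the rest.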
if ((readyOps & SelectionKey.OP_CONNECT) != 0) {
    // remove OP_CONNECT as otherwise Selector.select(..) will always return without blocking
    // See https://github.com/netty/netty/issues/924
    int ops = k.interestOps();
    ops &= ~SelectionKey.OP_CONNECT;
    k.interestOps(ops);
    unsafe.finishConnect();
}
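OP_CONNECT is only seen by client programs and means the TCP connection has been established. The OP_CONNECT bit must be cleared from the interest set here; otherwise Selector.select() would keep returning immediately.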
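The three checks only test and clear bits of an int. The sketch below (ReadyOpsDispatch is hypothetical) runs them on plain values with the real java.nio SelectionKey constants, in the order shown above. It also shows how the OP_CONNECT bit is cleared from an interest set:

```java
import java.nio.channels.SelectionKey;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of processSelectedKey's bit tests, without a real socket.
final class ReadyOpsDispatch {
    // Returns the actions that would be taken for the given ready set, in order.
    static List<String> dispatch(int readyOps) {
        List<String> actions = new ArrayList<>();
        if ((readyOps & (SelectionKey.OP_READ | SelectionKey.OP_ACCEPT)) != 0 || readyOps == 0) {
            actions.add("unsafe.read()");
        }
        if ((readyOps & SelectionKey.OP_WRITE) != 0) {
            actions.add("forceFlush()");
        }
        if ((readyOps & SelectionKey.OP_CONNECT) != 0) {
            actions.add("finishConnect()");
        }
        return actions;
    }

    // Clears OP_CONNECT from an interest set, leaving the other bits untouched.
    static int clearConnect(int interestOps) {
        return interestOps & ~SelectionKey.OP_CONNECT;
    }
}
```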
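Let's focus on the read event. In the Reactor model, each new connection is handed to a thread that serves it from then on. That thread is an event loop taken from workerGroup, not a brand-new thread per connection. So, following unsafe.read(), where is that thread assigned? unsafe.read is an interface method with two implementations:
1) For the listening channel, NioServerSocketChannel, which handles client accept requests
The implementation is read() in AbstractNioMessageChannel.java: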
@Override
public void read() {
    assert eventLoop().inEventLoop();
    if (!config().isAutoRead()) {
        removeReadOp();
    }
    final ChannelConfig config = config();
    final int maxMessagesPerRead = config.getMaxMessagesPerRead();
    final boolean autoRead = config.isAutoRead();
    final ChannelPipeline pipeline = pipeline();
    boolean closed = false;
    Throwable exception = null;
    try {
        for (;;) {
            int localRead = doReadMessages(readBuf);
            if (localRead == 0) {
                break;
            }
            if (localRead < 0) {
                closed = true;
                break;
            }
            if (readBuf.size() >= maxMessagesPerRead | !autoRead) {
                break;
            }
        }
    } catch (Throwable t) {
        exception = t;
    }
    int size = readBuf.size();
    for (int i = 0; i < size; i ++) {
        pipeline.fireChannelRead(readBuf.get(i));
    }
    readBuf.clear();
    pipeline.fireChannelReadComplete();
    if (exception != null) {
        if (exception instanceof IOException) {
            // ServerChannel should not be closed even on IOException because it can often continue
            // accepting incoming connections. (e.g. too many open files)
            closed = !(AbstractNioMessageChannel.this instanceof ServerChannel);
        }
        pipeline.fireExceptionCaught(exception);
    }
    if (closed) {
        if (isOpen()) {
            close(voidPromise());
        }
    }
}
The most important call in this method is doReadMessages(), implemented in NioServerSocketChannel:
protected int doReadMessages(List<Object> buf) throws Exception {
    // Non-blocking accept: returns null if no connection is pending
    SocketChannel ch = javaChannel().accept();
    try {
        if (ch != null) {
            // childEventLoopGroup().next() picks the workerGroup event loop that will serve this connection
            buf.add(new NioSocketChannel(this, childEventLoopGroup().next(), ch));
            return 1;
        }
    } catch (Throwable t) {
        logger.warn("Failed to create a new channel from an accepted socket.", t);
        try {
            ch.close();
        } catch (Throwable t2) {
            logger.warn("Failed to close a socket.", t2);
        }
    }
    return 0;
}
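Note the add call above: childEventLoopGroup().next() picks one event loop (thread) from workerGroup, and that thread serves all traffic between this client and the server. This is the heart of the Reactor thread model.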
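The boss side of this design can be reproduced with plain java.nio. This hypothetical BossAcceptSketch registers a loopback ServerSocketChannel with a Selector for OP_ACCEPT and connects a few clients. It then drains accepted sockets in a loop that stops when the non-blocking accept() returns null, as doReadMessages does. The worker assignment is only a round-robin index standing in for childEventLoopGroup().next(); no real worker threads are started.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-NIO sketch of a boss selector handing connections to workers.
final class BossAcceptSketch {
    static final int WORKERS = 2;   // assumed size of the worker group

    // Like doReadMessages: a non-blocking accept() returns null when nothing is pending.
    static int doReadMessages(ServerSocketChannel server, List<String> buf) throws IOException {
        SocketChannel ch = server.accept();
        if (ch == null) {
            return 0;
        }
        int id = buf.size();
        // Round-robin index standing in for childEventLoopGroup().next().
        buf.add("conn" + id + "->worker" + (id % WORKERS));
        ch.close();                 // this demo does no I/O on the accepted channel
        return 1;
    }

    static List<String> demo(int clients) {
        List<String> accepted = new ArrayList<>();
        List<SocketChannel> connected = new ArrayList<>();
        try (Selector boss = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            server.configureBlocking(false);
            server.register(boss, SelectionKey.OP_ACCEPT);
            for (int i = 0; i < clients; i++) {
                // Blocking connect; the kernel typically completes the handshake before accept().
                connected.add(SocketChannel.open(server.getLocalAddress()));
            }
            while (accepted.size() < clients && boss.select(5_000) > 0) {
                boss.selectedKeys().clear();
                // Like AbstractNioMessageChannel.read(): accept until accept() returns null.
                while (doReadMessages(server, accepted) > 0) {
                    // keep draining the accept backlog
                }
            }
            for (SocketChannel c : connected) {
                c.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return accepted;
    }
}
```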
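2) For a connected channel, NioSocketChannel (the thread that talks to the client), which handles messages sent by the peer
Other events (for example, a client sending ordinary data) go into the following method: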
@Override
public void read() {
    final ChannelConfig config = config();
    final ChannelPipeline pipeline = pipeline();
    final ByteBufAllocator allocator = config.getAllocator();
    final int maxMessagesPerRead = config.getMaxMessagesPerRead();
    RecvByteBufAllocator.Handle allocHandle = this.allocHandle;
    if (allocHandle == null) {
        this.allocHandle = allocHandle = config.getRecvByteBufAllocator().newHandle();
    }
    if (!config.isAutoRead()) {
        removeReadOp();
    }
    ByteBuf byteBuf = null;
    int messages = 0;
    boolean close = false;
    try {
        int byteBufCapacity = allocHandle.guess();
        int totalReadAmount = 0;
        do {
            byteBuf = allocator.ioBuffer(byteBufCapacity);
            int writable = byteBuf.writableBytes();
            int localReadAmount = doReadBytes(byteBuf);
            if (localReadAmount <= 0) {
                // nothing was read: release the buffer
                byteBuf.release();
                close = localReadAmount < 0;
                break;
            }
            pipeline.fireChannelRead(byteBuf);
            byteBuf = null;
            if (totalReadAmount >= Integer.MAX_VALUE - localReadAmount) {
                // Avoid overflow.
                totalReadAmount = Integer.MAX_VALUE;
                break;
            }
            totalReadAmount += localReadAmount;
            if (localReadAmount < writable) {
                // Read less than what the buffer can hold,
                // which might mean we drained the recv buffer completely.
                break;
            }
        } while (++ messages < maxMessagesPerRead);
        pipeline.fireChannelReadComplete();
        allocHandle.record(totalReadAmount);
        if (close) {
            closeOnRead(pipeline);
            close = false;
        }
    } catch (Throwable t) {
        handleReadException(pipeline, byteBuf, t, close);
    }
}
doReadBytes reads data from the socket, and fireChannelRead passes it to the handlers for processing.
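The two cases share the same pattern: data is first read from the underlying socket, fireChannelRead is triggered for what was read, and once the read loop finishes, fireChannelReadComplete is triggered.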
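The shared shape of the read path can be sketched without Netty. Each pass reads into a buffer and fires a read event per chunk. The loop stops on EOF, on a short read, or at maxMessagesPerRead, and then one read-complete event fires. ReadLoopSketch is hypothetical: events are recorded as strings instead of being fired through a pipeline. A ReadableByteChannel over an in-memory stream replaces the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the read loop's structure, recording events instead of firing them.
final class ReadLoopSketch {
    static List<String> read(ReadableByteChannel ch, int bufCapacity, int maxMessagesPerRead) {
        List<String> events = new ArrayList<>();
        boolean close = false;
        int messages = 0;
        try {
            do {
                ByteBuffer buf = ByteBuffer.allocate(bufCapacity);  // fresh buffer per read
                int writable = buf.remaining();
                int n = ch.read(buf);
                if (n <= 0) {
                    close = n < 0;          // -1 means the peer closed the stream
                    break;
                }
                events.add("channelRead(" + n + ")");
                if (n < writable) {
                    break;                  // short read: probably drained what was available
                }
            } while (++messages < maxMessagesPerRead);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        events.add("channelReadComplete");  // fired once, after the loop
        if (close) {
            events.add("close");            // cf. closeOnRead(pipeline)
        }
        return events;
    }

    public static void main(String[] args) {
        ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(new byte[10]));
        System.out.println(read(ch, 4, 16));
    }
}
```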
That concludes the source analysis of Netty's server startup and Reactor thread model. Later posts will cover Channel and the ChannelPipeline.
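When discussing the Selector above, I mentioned the empty-polling problem. What is empty polling? select() keeps returning immediately even though no event has occurred. The event loop then spins without ever blocking, and CPU usage climbs to 100%. This is a bug in the JDK's epoll-based Selector. Netty's workaround is to rebuild the Selector: it opens a new Selector, moves every registration from the old one to the new one, and continues polling on the new Selector. Netty uses a threshold: if select returns prematurely N times in a row (N is the static constant SELECTOR_AUTO_REBUILD_THRESHOLD, 512 by default), the Selector is rebuilt. Netty works around the JDK epoll bug indirectly in this way, but I still hope the JDK will fix it soon (it is still not fixed in Java 7).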
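Here is a sketch of the two parts of that workaround: counting premature select() returns against a threshold, and moving registrations to a fresh Selector. SelectorRebuildSketch is hypothetical and simplified. In particular, it treats every zero-key return as premature, whereas Netty also compares the elapsed time with the select timeout before counting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Hypothetical sketch of spin detection plus Selector rebuilding.
final class SelectorRebuildSketch {
    private final int threshold;
    private int prematureReturns;

    SelectorRebuildSketch(int threshold) {
        this.threshold = threshold;
    }

    // Call after every select(); returns true when the Selector should be rebuilt.
    // Simplification: any return with zero selected keys counts as premature.
    boolean onSelectReturned(int selectedKeys) {
        if (selectedKeys > 0) {
            prematureReturns = 0;
            return false;
        }
        if (++prematureReturns >= threshold) {
            prematureReturns = 0;
            return true;
        }
        return false;
    }

    // Opens a new Selector, re-registers every valid key on it, and closes the old one.
    static Selector rebuild(Selector old) {
        try {
            Selector fresh = Selector.open();
            for (SelectionKey key : old.keys()) {
                if (!key.isValid()) {
                    continue;
                }
                int ops = key.interestOps();   // read before cancel(), which invalidates the key
                key.channel().register(fresh, ops, key.attachment());
                key.cancel();
            }
            old.close();
            return fresh;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo: move a server channel's OP_ACCEPT registration to a fresh Selector.
    static boolean demoRebuild() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.configureBlocking(false);
            Selector old = Selector.open();
            server.register(old, SelectionKey.OP_ACCEPT, "boss-attachment");
            Selector fresh = rebuild(old);
            SelectionKey moved = server.keyFor(fresh);
            boolean ok = !old.isOpen()
                    && moved != null
                    && moved.interestOps() == SelectionKey.OP_ACCEPT
                    && "boss-attachment".equals(moved.attachment());
            fresh.close();
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```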