在上一篇文章Hadoop源码分析之IPC机制中分析了Hadoop IPC中客户端与服务器端交互示例,现在从服务器端的角度来分析服务器端的初始化及启动过程
在org.apache.hadoop.ipc包中有如下几个Java文件:
这篇文章主要分析与服务器相关的代码,主要的代码在Server.java和RPC.java这两个文件中。
在示例的服务器端代码中有这么一行代码
Server server = RPC.getServer(queryService, "0.0.0.0", IPC_PORT, 1, true, conf);
这行代码即执行了Server的初始化,在服务器端创建了一个Server对象。Server.getServer()共有三个重载方法
getServer()方法有三个重载方法,分别为:
/** Construct a server for a protocol implementation instance listening on a
* port and address.
* @param instance 所要调用接口的实例,即IPC方法调用的目标
* @param bindAddress 服务器端绑定的IP地址
* @param port 监听端口
* @param conf 配置Server对象的配置参数
* */
public static Server getServer(final Object instance, final String bindAddress, final int port, Configuration conf)
throws IOException {
return getServer(instance, bindAddress, port, 1, false, conf);
}
/** Construct a server for a protocol implementation instance listening on a
* port and address.
* @see #getServer(Object, String, int, Configuration)
* @param numHandlers 处理器线程的个数,即Server端的Handler实例(线程)的个数
* @param verbose 是否打开调用方法日志
* */
public static Server getServer(final Object instance, final String bindAddress, final int port,
final int numHandlers,
final boolean verbose, Configuration conf)
throws IOException {
return getServer(instance, bindAddress, port, numHandlers, verbose, conf, null);
}
/** Construct a server for a protocol implementation instance listening on a
* port and address, with a secret manager.
* @see #getServer(Object, String, int, int, boolean, Configuration)
* @param secretManager
*/
public static Server getServer(final Object instance, final String bindAddress, final int port,
final int numHandlers,
final boolean verbose, Configuration conf,
SecretManager extends TokenIdentifier> secretManager)
throws IOException {
return new Server(instance, conf, bindAddress, port, numHandlers, verbose, secretManager);
}
上面两个方法都是调用的重载方法,在最下面第三个方法里创建Server对象。在RPC.java和Server.java中都有Server类,其中org.apache.hadoop.ipc.Server是一个抽象类,RPC.Server的内部类org.apache.hadoop.RPC.Server继承自org.apache.hadoop.ipc.Server类,
为何要如此设计?
RPC.getServer()方法直接调用了RPC.Server类的构造方法创建Server对象,该类的部分代码如下:
public static class Server extends org.apache.hadoop.ipc.Server {
private Object instance;
private boolean verbose;
/** Construct an RPC server.
* @param instance the instance whose methods will be called
* @param conf the configuration to use
* @param bindAddress the address to bind on to listen for connection
* @param port the port to listen for connections on
*/
public Server(Object instance, Configuration conf, String bindAddress, int port)
throws IOException {
this(instance, conf, bindAddress, port, 1, false, null);
}
private static String classNameBase(String className) {
String[] names = className.split("\\.", -1);
if (names == null || names.length == 0) {
return className;
}
return names[names.length-1];
}
/** Construct an RPC server.创建一个RPC服务器
* @param instance the instance whose methods will be called
* @param conf the configuration to use
* @param bindAddress the address to bind on to listen for connection
* @param port the port to listen for connections on
* @param numHandlers the number of method handler threads to run
* @param verbose whether each call should be logged,是否打开调用方法日志
*/
public Server(Object instance, Configuration conf, String bindAddress, int port,
int numHandlers, boolean verbose,
SecretManager extends TokenIdentifier> secretManager)
throws IOException {
super(bindAddress, port, Invocation.class, numHandlers, conf,
classNameBase(instance.getClass().getName()), secretManager);
this.instance = instance;
this.verbose = verbose;
}
}
这个类定义在org.apache.hadoop.ipc.RPC.java文件中,它继承自类org.apache.hadoop.ipc.Server这个抽象类。RPC.Server类有两个构造方法,第一个参数少的构造方法调用第二个参数多的构造方法,在第二个构造方法中调用了父类(org.apache.hadoop.ipc.Server)的构造函数,然后为成员变量instance和verbose赋值,其中instance变量表示所要调用接口的实例,即IPC方法调用的目标(记住这个比较重要),verbose表示是否打开调用方法日志,即是否将方法调用记录到日志中。再看org.apache.hadoop.ipc.Server类,这个类有五个内部类,分别为Call,Connection,Listener,Responser和Handler,其中Listener类的作用是监听客户端的连接请求,并与客户端进行连接;Responser的作用是返回调用结果给客户端;Handler用于处理客户端的调用信息;Connection为服务器端与客户端的连接类,保存了连接中与服务器端连接相关的信息和方法,由于服务器端和客户端对连接的抽象不太一样,所以客户端也有一个Client.Connection类用于保存与连接中与客户端相关的类;Call表示远程调用,保存了与调用相关的信息,与Connection类似,服务器端和客户端各有一个Call类。服务器端这些类相互配合完成,调用相关的操作。那么服务器端的调用过程到底是一个怎样的过程呢?简单的说是这样的:服务器端启动后,创建了一个Listener线程,多个Handler线程和一个Responser线程,Listener线程监听到客户端发过来连接请求,然后交给Listener中的Reader线程处理,Reader线程读取客户端发过来的调用数据,构造成一个Server.Call对象放入调用队列中。Handler线程在运行过程中从调用队列中取出调用对象,然后进行处理,处理完的结果通过Responser线程返回。
这个类中也有3个构造函数,也是参数少的函数调用参数多的构造函数,下面只看最终调用的这个构造函数,函数代码如下:
/** Constructs a server listening on the named port and address. Parameters passed must
* be of the named class. The handlerCount determines
* the number of handler threads that will be used to process calls.
* @param bindAddress 服务器绑定所监听的IP地址,0.0.0.0表示监听所有地址
* @param port 监听端口
* @param paramClass 调用句柄,包含方法名和方法参数,默认时RPC.Invocation
* @param handlerCount Handler处理器的数量
* @param conf 配置Server对象的配置参数
* @param serverName 创建Server对象时给Server命名
* @param secretManager
*/
@SuppressWarnings("unchecked")
protected Server(String bindAddress, int port,
Class extends Writable> paramClass, int handlerCount,
Configuration conf, String serverName, SecretManager extends TokenIdentifier> secretManager)
throws IOException {
this.bindAddress = bindAddress;
this.conf = conf;
this.port = port;
this.paramClass = paramClass;
this.handlerCount = handlerCount;
this.socketSendBufferSize = 0;//?
this.maxQueueSize = handlerCount * conf.getInt(
IPC_SERVER_HANDLER_QUEUE_SIZE_KEY,
IPC_SERVER_HANDLER_QUEUE_SIZE_DEFAULT);
this.maxRespSize = conf.getInt(IPC_SERVER_RPC_MAX_RESPONSE_SIZE_KEY,
IPC_SERVER_RPC_MAX_RESPONSE_SIZE_DEFAULT);
this.readThreads = conf.getInt(
IPC_SERVER_RPC_READ_THREADS_KEY,
IPC_SERVER_RPC_READ_THREADS_DEFAULT);
//调用队列
this.callQueue = new LinkedBlockingQueue(maxQueueSize);
this.maxIdleTime = 2*conf.getInt("ipc.client.connection.maxidletime", 1000);
this.maxConnectionsToNuke = conf.getInt("ipc.client.kill.max", 10);
this.thresholdIdleConnections = conf.getInt("ipc.client.idlethreshold", 4000);
this.secretManager = (SecretManager) secretManager;
this.authorize =
conf.getBoolean(HADOOP_SECURITY_AUTHORIZATION, false);
this.isSecurityEnabled = UserGroupInformation.isSecurityEnabled();
// Start the listener here and let it bind to the port
listener = new Listener();
this.port = listener.getAddress().getPort();
this.rpcMetrics = RpcInstrumentation.create(serverName, this.port);
this.tcpNoDelay = conf.getBoolean("ipc.server.tcpnodelay", false);
// Create the responder here
responder = new Responder();
if (isSecurityEnabled) {
SaslRpcServer.init(conf);
}
}
首先是给各个成员变量赋值,然后是创建Listener对象listener及Responser对象responser。
首先看Listener类的初始化代码:
public Listener() throws IOException {
address = new InetSocketAddress(bindAddress, port);
// Create a new server socket and set to non blocking mode
acceptChannel = ServerSocketChannel.open();
acceptChannel.configureBlocking(false);//设置为非阻塞
// Bind the server socket to the local host and port
bind(acceptChannel.socket(), address, backlogLength);
port = acceptChannel.socket().getLocalPort(); //Could be an ephemeral port,因为在ServerSocketChannel的bind方法的address为空时,
//使用 an ephemeral port and a valid local address to bind the socket,所以port可能会改变
// create a selector;
selector= Selector.open();
readers = new Reader[readThreads];//?
readPool = Executors.newFixedThreadPool(readThreads);
for (int i = 0; i < readThreads; i++) {
Selector readSelector = Selector.open();
Reader reader = new Reader(readSelector);
readers[i] = reader;
readPool.execute(reader);
}
// Register accepts on the server socket with the selector.
acceptChannel.register(selector, SelectionKey.OP_ACCEPT);
this.setName("IPC Server listener on " + port);
this.setDaemon(true);
}
Listener使用NIO中的ServerSocketChannel来与客户端进行网络通信,关于NIO的知识请参考:http://blog.csdn.net/workformywork/article/details/17024743。在Listener的构造函数中,语句bind(acceptChannel.socket(), address, backlogLength);的作用是将服务器端的接收客户端连接请求的socket与IP地址和端口绑定,绑定成功后会再将绑定的端口赋值给port成员变量,因为在ServerSocketChannel的bind方法的address为空时,会使用一个临时的端口和本地地址来与socket进行绑定,所以最终与socket绑定的端口并不一定是传入参数所指定的端口值。绑定成功之后再创建一个Reader数组,Reader是Listener类的内部类,用于读取客户端传过来的数据(方法调用也是传递的序列化的数据),每一个Reader对象对应一个线程,但是每个Reader中都对应一个Selector,Selector本来就可以监听多个通道(Channel)为什么使用多个线程呢?这点还没看明白,希望知道其中缘由的朋友给个提示,谢谢。最后设置Listener线程的线程名,并且将该线程设置为守护线程。
Listener对象创建完之后,再回到Server的构造函数,进行Responder对象的创建,Responder的构造函数更简单:
Responder() throws IOException {
this.setName("IPC Server Responder");
this.setDaemon(true);
writeSelector = Selector.open(); // create a selector
pending = 0;
}
设置线程名,设置线程为守护线程,打开一个Selector,然后设置等待Connection数量为0。执行Responder对象的创建之后,就完成了Server端的初始化,及RPC.getServer()过程结束
Hadoop IPC中的启动过程由Server.java文件中的Server.start()开始,在start()方法中启动了一个Responder线程,一个Listener线程,handlerCount个Handler线程,其中handlerCount参数在构造Server对象的时候构造参数中指定,Server.start()方法代码如下:
public synchronized void start() {
responder.start();
listener.start();
handlers = new Handler[handlerCount];
for (int i = 0; i < handlerCount; i++) {
handlers[i] = new Handler(i);
handlers[i].start();
}
}
Handler类的构造方法比较简单,设置Handler线程为守护线程并设置线程名。
EOF
《Hadoop技术内幕:深入理解Hadoop Common和HDFS架构设计与实现原理》,