在上一篇文章Hadoop源码分析之IPC机制中分析了Hadoop IPC中客户端与服务器端交互示例,现在从服务器端的角度来分析服务器端的初始化及启动过程
在org.apache.hadoop.ipc包中有如下几个Java文件:
这篇文章主要分析与服务器相关的代码,主要的代码在Server.java和RPC.java这两个文件中。
在示例的服务器端代码中有这么一行代码
Server server = RPC.getServer(queryService, "0.0.0.0", IPC_PORT, 1, true, conf);
这行代码即执行了Server的初始化,在服务器端创建了一个Server对象。Server.getServer()共有三个重载方法
getServer()方法有三个重载方法,分别为:
/** Construct a server for a protocol implementation instance listening on a * port and address.<br/> * @param instance 所要调用接口的实例,即IPC方法调用的目标 * @param bindAddress 服务器端绑定的IP地址 * @param port 监听端口 * @param conf 配置Server对象的配置参数 * */ public static Server getServer(final Object instance, final String bindAddress, final int port, Configuration conf) throws IOException { return getServer(instance, bindAddress, port, 1, false, conf); } /** Construct a server for a protocol implementation instance listening on a * port and address. * @see #getServer(Object, String, int, Configuration) * @param numHandlers 处理器线程的个数,即Server端的Handler实例(线程)的个数 * @param verbose 是否打开调用方法日志 * */ public static Server getServer(final Object instance, final String bindAddress, final int port, final int numHandlers, final boolean verbose, Configuration conf) throws IOException { return getServer(instance, bindAddress, port, numHandlers, verbose, conf, null); } /** Construct a server for a protocol implementation instance listening on a * port and address, with a secret manager. * @see #getServer(Object, String, int, int, boolean, Configuration) * @param secretManager */ public static Server getServer(final Object instance, final String bindAddress, final int port, final int numHandlers, final boolean verbose, Configuration conf, SecretManager<? extends TokenIdentifier> secretManager) throws IOException { return new Server(instance, conf, bindAddress, port, numHandlers, verbose, secretManager); }上面两个方法都是调用的重载方法,在最下面第三个方法里创建Server对象。在RPC.java和Server.java中都有Server类,其中org.apache.hadoop.ipc.Server是一个抽象类,RPC.Server的内部类org.apache.hadoop.RPC.Server继承自org.apache.hadoop.ipc.Server类, 为何要如此设计?
RPC.getServer()方法直接调用了RPC.Server类的构造方法创建Server对象,该类的部分代码如下:
public static class Server extends org.apache.hadoop.ipc.Server { private Object instance; private boolean verbose; /** Construct an RPC server. * @param instance the instance whose methods will be called * @param conf the configuration to use * @param bindAddress the address to bind on to listen for connection * @param port the port to listen for connections on */ public Server(Object instance, Configuration conf, String bindAddress, int port) throws IOException { this(instance, conf, bindAddress, port, 1, false, null); } private static String classNameBase(String className) { String[] names = className.split("\\.", -1); if (names == null || names.length == 0) { return className; } return names[names.length-1]; } /** Construct an RPC server.创建一个RPC服务器 * @param instance the instance whose methods will be called * @param conf the configuration to use * @param bindAddress the address to bind on to listen for connection * @param port the port to listen for connections on * @param numHandlers the number of method handler threads to run * @param verbose whether each call should be logged,是否打开调用方法日志 */ public Server(Object instance, Configuration conf, String bindAddress, int port, int numHandlers, boolean verbose, SecretManager<? extends TokenIdentifier> secretManager) throws IOException { super(bindAddress, port, Invocation.class, numHandlers, conf, classNameBase(instance.getClass().getName()), secretManager); this.instance = instance; this.verbose = verbose; } }
这个类定义在org.apache.hadoop.ipc.RPC.java文件中,它继承自类org.apache.hadoop.ipc.Server这个抽象类。RPC.Server类有两个构造方法,第一个参数少的构造方法调用第二个参数多的构造方法,在第二个构造方法中调用了父类(org.apache.hadoop.ipc.Server)的构造函数,然后为成员变量instance和verbose赋值,其中instance变量表示所要调用接口的实例,即IPC方法调用的目标(记住这个比较重要),verbose表示是否打开调用方法日志,即是否将方法调用记录到日志中。再看org.apache.hadoop.ipc.Server类,这个类有五个内部类,分别为Call,Connection,Listener,Responser和Handler,其中Listener类的作用是监听客户端的连接请求,并与客户端进行连接;Responser的作用是返回调用结果给客户端;Handler用于处理客户端的调用信息;Connection为服务器端与客户端的连接类,保存了连接中与服务器端连接相关的信息和方法,由于服务器端和客户端对连接的抽象不太一样,所以客户端也有一个Client.Connection类用于保存与连接中与客户端相关的类;Call表示远程调用,保存了与调用相关的信息,与Connection类似,服务器端和客户端各有一个Call类。服务器端这些类相互配合完成,调用相关的操作。那么服务器端的调用过程到底是一个怎样的过程呢?简单的说是这样的:服务器端启动后,创建了一个Listener线程,多个Handler线程和一个Responser线程,Listener线程监听到客户端发过来连接请求,然后交给Listener中的Reader线程处理,Reader线程读取客户端发过来的调用数据,构造成一个Server.Call对象放入调用队列中。Handler线程在运行过程中从调用队列中取出调用对象,然后进行处理,处理完的结果通过Responser线程返回。
这个类中也有3个构造函数,也是参数少的函数调用参数多的构造函数,下面只看最终调用的这个构造函数,函数代码如下:
/** Constructs a server listening on the named port and address. Parameters passed must * be of the named class. The <code>handlerCount</handlerCount> determines * the number of handler threads that will be used to process calls.<br/> * @param bindAddress 服务器绑定所监听的IP地址,0.0.0.0表示监听所有地址 * @param port 监听端口 * @param paramClass 调用句柄,包含方法名和方法参数,默认时<code>RPC.Invocation</coce> * @param handlerCount Handler处理器的数量 * @param conf 配置Server对象的配置参数 * @param serverName 创建Server对象时给Server命名 * @param secretManager */ @SuppressWarnings("unchecked") protected Server(String bindAddress, int port, Class<? extends Writable> paramClass, int handlerCount, Configuration conf, String serverName, SecretManager<? extends TokenIdentifier> secretManager) throws IOException { this.bindAddress = bindAddress; this.conf = conf; this.port = port; this.paramClass = paramClass; this.handlerCount = handlerCount; this.socketSendBufferSize = 0;//? this.maxQueueSize = handlerCount * conf.getInt( IPC_SERVER_HANDLER_QUEUE_SIZE_KEY, IPC_SERVER_HANDLER_QUEUE_SIZE_DEFAULT); this.maxRespSize = conf.getInt(IPC_SERVER_RPC_MAX_RESPONSE_SIZE_KEY, IPC_SERVER_RPC_MAX_RESPONSE_SIZE_DEFAULT); this.readThreads = conf.getInt( IPC_SERVER_RPC_READ_THREADS_KEY, IPC_SERVER_RPC_READ_THREADS_DEFAULT); //调用队列 this.callQueue = new LinkedBlockingQueue<Call>(maxQueueSize); this.maxIdleTime = 2*conf.getInt("ipc.client.connection.maxidletime", 1000); this.maxConnectionsToNuke = conf.getInt("ipc.client.kill.max", 10); this.thresholdIdleConnections = conf.getInt("ipc.client.idlethreshold", 4000); this.secretManager = (SecretManager<TokenIdentifier>) secretManager; this.authorize = conf.getBoolean(HADOOP_SECURITY_AUTHORIZATION, false); this.isSecurityEnabled = UserGroupInformation.isSecurityEnabled(); // Start the listener here and let it bind to the port listener = new Listener(); this.port = listener.getAddress().getPort(); this.rpcMetrics = RpcInstrumentation.create(serverName, this.port); this.tcpNoDelay = conf.getBoolean("ipc.server.tcpnodelay", false); // Create the responder here responder = new Responder(); if (isSecurityEnabled) { SaslRpcServer.init(conf); } }首先是给各个成员变量赋值,然后是创建Listener对象listener及Responser对象responser。
首先看Listener类的初始化代码:
public Listener() throws IOException { address = new InetSocketAddress(bindAddress, port); // Create a new server socket and set to non blocking mode acceptChannel = ServerSocketChannel.open(); acceptChannel.configureBlocking(false);//设置为非阻塞 // Bind the server socket to the local host and port bind(acceptChannel.socket(), address, backlogLength); port = acceptChannel.socket().getLocalPort(); //Could be an ephemeral port,因为在ServerSocketChannel的bind方法的address为空时, //使用 an ephemeral port and a valid local address to bind the socket,所以port可能会改变 // create a selector; selector= Selector.open(); readers = new Reader[readThreads];//? readPool = Executors.newFixedThreadPool(readThreads); for (int i = 0; i < readThreads; i++) { Selector readSelector = Selector.open(); Reader reader = new Reader(readSelector); readers[i] = reader; readPool.execute(reader); } // Register accepts on the server socket with the selector. acceptChannel.register(selector, SelectionKey.OP_ACCEPT); this.setName("IPC Server listener on " + port); this.setDaemon(true); }
Listener使用NIO中的ServerSocketChannel来与客户端进行网络通信,关于NIO的知识请参考:http://blog.csdn.net/workformywork/article/details/17024743。在Listener的构造函数中,语句bind(acceptChannel.socket(), address, backlogLength);的作用是将服务器端的接收客户端连接请求的socket与IP地址和端口绑定,绑定成功后会再将绑定的端口赋值给port成员变量,因为在ServerSocketChannel的bind方法的address为空时,会使用一个临时的端口和本地地址来与socket进行绑定,所以最终与socket绑定的端口并不一定是传入参数所指定的端口值。绑定成功之后再创建一个Reader数组,Reader是Listener类的内部类,用于读取客户端传过来的数据(方法调用也是传递的序列化的数据),每一个Reader对象对应一个线程,但是每个Reader中都对应一个Selector,Selector本来就可以监听多个通道(Channel)为什么使用多个线程呢?这点还没看明白,希望知道其中缘由的朋友给个提示,谢谢。最后设置Listener线程的线程名,并且将该线程设置为守护线程。
Listener对象创建完之后,再回到Server的构造函数,进行Responder对象的创建,Responder的构造函数更简单:
Responder() throws IOException { this.setName("IPC Server Responder"); this.setDaemon(true); writeSelector = Selector.open(); // create a selector pending = 0; }设置线程名,设置线程为守护线程,打开一个Selector,然后设置等待Connection数量为0。执行Responder对象的创建之后,就完成了Server端的初始化,及RPC.getServer()过程结束
Hadoop IPC中的启动过程由Server.java文件中的Server.start()开始,在start()方法中启动了一个Responder线程,一个Listener线程,handlerCount个Handler线程,其中handlerCount参数在构造Server对象的时候构造参数中指定,Server.start()方法代码如下:
public synchronized void start() { responder.start(); listener.start(); handlers = new Handler[handlerCount]; for (int i = 0; i < handlerCount; i++) { handlers[i] = new Handler(i); handlers[i].start(); } }Handler类的构造方法比较简单,设置Handler线程为守护线程并设置线程名。
EOF
《Hadoop技术内幕:深入理解Hadoop Common和HDFS架构设计与实现原理》,