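I have recently been studying the Hadoop RPC framework. Hadoop implements its internal communication on top of Protobuf, so let's first look at what a standard RPC exchange looks like.
The client uses the adapter pattern: a Proxy supplies the MethodDescriptor that Protobuf communication requires.
Proxy details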
ClientNamenodeProtocolPB proxy = RPC.getProtocolProxy(
ClientNamenodeProtocolPB.class, version, address, ugi, conf,
NetUtils.getDefaultSocketFactory(conf),
org.apache.hadoop.ipc.Client.getTimeout(conf), defaultPolicy,
fallbackToSimpleAuth, alignmentContext).getProxy();
How the client sends a request
public Message invoke(Object proxy, final Method method, Object[] args)
throws ServiceException {
... ...
val = (RpcWritable.Buffer) client.call(RPC.RpcKind.RPC_PROTOCOL_BUFFER,
constructRpcRequest(method, theRequest), remoteId,
fallbackToSimpleAuth, alignmentContext);
... ...
}
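To make the proxy idea concrete, here is a minimal sketch using a plain JDK dynamic proxy. It is not Hadoop's actual engine: `GreeterProtocol`, `RecordingInvoker`, and `ProxyDemo` are hypothetical names invented for illustration, and the handler only records the call instead of serializing it and sending it over a socket.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class ProxyDemo {
    // Wraps the handler in a dynamic proxy that implements GreeterProtocol.
    static GreeterProtocol newProxy(InvocationHandler handler) {
        return (GreeterProtocol) Proxy.newProxyInstance(
                GreeterProtocol.class.getClassLoader(),
                new Class<?>[] {GreeterProtocol.class},
                handler);
    }

    public static void main(String[] args) {
        RecordingInvoker invoker = new RecordingInvoker();
        GreeterProtocol proxy = newProxy(invoker);
        System.out.println(proxy.greet("hello"));
        System.out.println(invoker.sentCalls);
    }
}

// Hypothetical protocol interface, standing in for ClientNamenodeProtocolPB.
interface GreeterProtocol {
    String greet(String msg);
}

// Hypothetical handler: records what would be sent instead of using a socket.
final class RecordingInvoker implements InvocationHandler {
    final List<String> sentCalls = new ArrayList<>();

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) {
        // A real invoker would build an RPC request from the method and its arguments here.
        String arg = (args == null || args.length == 0) ? "" : String.valueOf(args[0]);
        String call = method.getName() + "(" + arg + ")";
        sentCalls.add(call);
        return "reply to " + call;
    }
}
```

The caller only sees the `GreeterProtocol` interface; everything about how the call travels is hidden inside the handler, which is the same division of labor as in the `invoke` method above.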
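The server is a typical reactor design: it multiplexes I/O to respond to requests. I won't go into that here; the focus is on how the server registers services, and on the part of the RPC implementation in the Handler that processes requests.
Registering a Protocol (that is, the service that responds)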
this.serviceRpcServer = new RPC.Builder(conf)
.setProtocol(
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB.class)
.setInstance(clientNNPbService)
.setBindAddress(bindHost)
.setPort(serviceRpcAddr.getPort()).setNumHandlers(serviceHandlerCount)
.setVerbose(false)
.setSecretManager(namesystem.getDelegationTokenSecretManager())
.build();
... ...
DFSUtil.addPBProtocol(conf, TraceAdminProtocolPB.class,
traceAdminService, serviceRpcServer);
Reactor pattern: how the Handler accepts a request in call.run()
ProtoClassProtoImpl protocolImpl = getProtocolImpl(server, protoName,
clientVersion);
BlockingService service = (BlockingService) protocolImpl.protocolImpl;
MethodDescriptor methodDescriptor = service.getDescriptorForType()
.findMethodByName(methodName);
if (methodDescriptor == null) {
String msg = "Unknown method " + methodName + " called on " + protocol
+ " protocol.";
LOG.warn(msg);
throw new RpcNoSuchMethodException(msg);
}
Message prototype = service.getRequestPrototype(methodDescriptor);
Message param = request.getValue(prototype);
Message result;
Call currentCall = Server.getCurCall().get();
try {
... ...
result = service.callBlockingMethod(methodDescriptor, null, param);
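As this shows, Hadoop adds a lot of its own wrapping on top of Protobuf. Besides compact message serialization, Protobuf also provides client-side and server-side interfaces. The overall flow can be roughly divided into three parts: the client sends the request, the server registers its services, and the server receives the client's message and forwards it to the matching service.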
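Protobuf is Google's framework for serializing data. Compared with XML or JSON, its serialized form takes fewer bytes and is faster and simpler, which makes it convenient for transmission. Protobuf supports C++, Java, and other languages, so it works across language environments. See the official site (link).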
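One reason the encoded form is small is that Protobuf writes integers as base-128 varints, so small values need only one or two bytes. The sketch below is a hand-written illustration of that encoding for non-negative ints, not the library's own code; `VarintDemo` and `encodeVarint` are names made up for this example.

```java
import java.io.ByteArrayOutputStream;

final class VarintDemo {
    // Encodes a non-negative int as a base-128 varint: 7 payload bits per byte,
    // with the high bit set on every byte except the last.
    static byte[] encodeVarint(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodeVarint(1).length);   // 1 byte
        System.out.println(encodeVarint(300).length); // 2 bytes: 0xAC 0x02
    }
}
```

For comparison, the same value written as the text `300` in JSON already takes three bytes before any field name or punctuation is counted.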
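Note: "Local" here means a purely local implementation.
Let's start with the tutorial example from the official site: how to define a custom message structure, write it to a byte-stream file locally, and read the structure back from that byte stream, so that a message is passed and retrieved.
The .proto file defines the data structures you want to transmit and determines which interface classes are generated later. The official site documents how fields are defined and which field types are supported.
Field definitions supported in .proto files: link
Field types supported in .proto files: link
Example file: addressbook.proto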
// See README.txt for information and build instructions.
package tutorial;
option java_package = "com.laozhaer.tutorial.protogen";
option java_outer_classname = "AddressBookProtos";
option java_multiple_files = true;
message Person {
required string name = 1;
required int32 id = 2; // Unique ID number for this person.
optional string email = 3;
enum PhoneType {
MOBILE = 0;
HOME = 1;
WORK = 2;
}
message PhoneNumber {
required string number = 1;
optional PhoneType type = 2 [default = HOME];
}
repeated PhoneNumber phone = 4;
}
// Our address book file is just one of these.
message AddressBook {
repeated Person person = 1;
}
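option: Java-specific options that the protoc code generator needs later.
outer_classname: the name of the main generated file (the outer class).
multiple_files: whether to generate multiple classes; the file above, for example, generates separate Person and AddressBook classes.
A Person message has a name, an id, an email, and phone numbers (themselves a message type).
An AddressBook message holds the corresponding Person entries.
The .proto file only defines the data structures; we still need generated classes to create and manipulate the data.
The official site provides a code generator for .proto files that quickly produces these classes. Because the Hadoop source I was reading uses protobuf 2.5.6, I chose a fairly old version; you can pick a download from the official site yourself.
Run in PowerShell: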
./protoc.exe --java_out=${output_directory} addressbook.proto
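Set output_directory to the path for the generated code, and protoc emits the corresponding classes; the exact class names are shown in the figure above.
Each message can be treated as a prototype, and its builder is what constructs it.
Because Person embeds PhoneNumber, building a Person message also requires building its PhoneNumber messages.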
Person.Builder person = Person.newBuilder();
stdout.print("Enter person ID: ");
person.setId(Integer.valueOf(stdin.readLine()));
stdout.print("Enter name: ");
person.setName(stdin.readLine());
Person.PhoneNumber.Builder phoneNumber =
Person.PhoneNumber.newBuilder().setNumber(number);
stdout.print("Is this a mobile, home, or work phone? ");
String type = stdin.readLine();
if (type.equals("mobile")) {
phoneNumber.setType(Person.PhoneType.MOBILE);
} else if (type.equals("home")) {
phoneNumber.setType(Person.PhoneType.HOME);
} else if (type.equals("work")) {
phoneNumber.setType(Person.PhoneType.WORK);
} else {
stdout.println("Unknown phone type. Using default.");
}
person.addPhone(phoneNumber);
Finally, builder.build() returns the Person message.
AddressBook.Builder addressBook = AddressBook.newBuilder();
... ...
addressBook.addPerson(
PromptForAddress(new BufferedReader(new InputStreamReader(System.in)),
System.out));
... ...
FileOutputStream output = new FileOutputStream("hadoop-rpc/target/ADDRESS_BOOK_FILE");
addressBook.build().writeTo(output);
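The resulting ADDRESS_BOOK_FILE is the output byte-stream file.
Message is the parent type of every message class
method | Description |
---|---|
AddressBook#newBuilder() | Creates a builder for an AddressBook message |
AddressBook#parseFrom() | Parses a message from a stream or byte array |
AbstractMessage.Builder#mergeFrom() | Merges message data from an input stream (for bytes that do not map to exactly one complete message) |
AbstractMessageLite#writeTo | Writes the message to an output stream |
Note: any byte stream that cannot be parsed directly into a message is parsed piecewise or merged inside the Builder.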
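Note: Stub is the client-side stub and Server is the server side; Protobuf provides support for both.
A brief overview of the whole flow:
The client creates a Stub and, through a channel, sends requests to the Server.
The channel is where the request data is wrapped. The request is typed as Message; the channel does not need to know the concrete Message class, it only needs to pass it on to the server for execution.
The server must determine the Message type and the method to run, find the matching pre-registered service, and delegate to the Handler. The Handler processes the data and returns the result to the client, and the client deserializes the server's response through the Channel.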
package rpctest;
option java_package = "com.laozhaer.tutorialweb.protogen.msg";
option java_generic_services = true;
option java_generate_equals_and_hash = true;
option java_multiple_files = true;
message HelloRequest {
required string msg = 1;
}
message HelloResponse {
required string msg = 2;
}
service HelloService {
rpc greet(HelloRequest) returns (HelloResponse);
}
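To have the .proto file generate Service classes, add option java_generic_services = true;
Running protoc.exe then generates the Service classes. The client sends a request by calling greet; the server delegates to the handler, which executes greet and produces the reply.
Client implementation: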
HelloService.Stub clientStub = HelloService.newStub(channel);
clientStub.greet(controller,request,done);
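The client only needs to implement the channel and call greet to interact with the server. The channel is just a tunnel; how the data is serialized and deserialized is up to the user to implement.
clientStub.greet essentially delegates to the channel.
Inside clientStub.greet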
public void greet(
com.google.protobuf.RpcController controller,
com.laozhaer.tutorialweb.protogen.msg.HelloRequest request,
com.google.protobuf.RpcCallback<com.laozhaer.tutorialweb.protogen.msg.HelloResponse> done) {
channel.callMethod(
getDescriptor().getMethods().get(0),
controller,
request,
com.laozhaer.tutorialweb.protogen.msg.HelloResponse.getDefaultInstance(),
com.google.protobuf.RpcUtil.generalizeCallback(
done,
com.laozhaer.tutorialweb.protogen.msg.HelloResponse.class,
com.laozhaer.tutorialweb.protogen.msg.HelloResponse.getDefaultInstance()));
}
}
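Channel implementation
The Channel here is implemented by opening a new socket and doing the serialization and deserialization over it. One thing to note: the server does not know the full length of the incoming byte stream, nor which Handler should process it, so the Channel must build a Header and a Body for the data it sends. Typically the first 4 bytes (the size of one int) give the length of the next segment (the header), and the header in turn gives the length of the segment after it (the body). The figure below shows the packet layout:
Note: the header is itself a message. Its main job is to tell the server which service handler should respond and which method of that handler to run.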
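The packet layout can be sketched without Protobuf at all. In the example below the header is a plain-text stand-in (`service#method#bodySize`) for the HelloHeader message, which is an assumption made only to keep the sketch self-contained; `FrameDemo`, `pack`, and `unpack` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class FrameDemo {
    // Layout: [4-byte header length][header][body]; the header carries the body length.
    static byte[] pack(String service, String method, byte[] body) {
        // Plain-text stand-in for the HelloHeader message (an assumption of this sketch).
        byte[] header = (service + "#" + method + "#" + body.length)
                .getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + header.length + body.length)
                .putInt(header.length) // big-endian by default
                .put(header)
                .put(body)
                .array();
    }

    // Returns {service, method, body decoded as UTF-8 text} recovered from a frame.
    static String[] unpack(byte[] frame) {
        ByteBuffer buf = ByteBuffer.wrap(frame);
        byte[] header = new byte[buf.getInt()];
        buf.get(header);
        String[] fields = new String(header, StandardCharsets.UTF_8).split("#");
        byte[] body = new byte[Integer.parseInt(fields[2])];
        buf.get(body);
        return new String[] {fields[0], fields[1], new String(body, StandardCharsets.UTF_8)};
    }

    public static void main(String[] args) {
        byte[] frame = pack("HelloService", "greet", "hello".getBytes(StandardCharsets.UTF_8));
        String[] parts = unpack(frame);
        System.out.println(parts[0] + "." + parts[1] + " <- " + parts[2]);
    }
}
```

The receiver never has to guess where a segment ends: each length is known before the bytes it describes are read.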
header.proto
package rpctest;
option java_package = "com.laozhaer.tutorialweb.protogen.header";
option java_generic_services = true;
option java_generate_equals_and_hash = true;
option java_multiple_files = true;
message HelloHeader {
required string serviceName = 1;
required string methodName = 2;
required int32 msgSize = 3;
}
Channel implementation code
public class RpcChannelImpl implements RpcChannel {
/**
* here needn't to know Message Instance Type
* @param helloRequest
* @return
*/
public HelloHeader createHeader(Message helloRequest){
// fetch msg byte size
int msgSize = helloRequest.getSerializedSize();
HelloHeader.Builder builder = HelloHeader.newBuilder();
return builder.setMsgSize(msgSize).setMethodName("greet").setServiceName("HelloService").build();
}
/**
* here already know type
* @param method
* @param controller
* @param request
* @param responsePrototype
* @param done
*/
@Override
public void callMethod(Descriptors.MethodDescriptor method, RpcController controller, Message request, Message responsePrototype, RpcCallback<Message> done) {
HelloHeader requestHeader = createHeader(request);
int headerSize = requestHeader.getSerializedSize();
// length prefix: the header size stored as a 4-byte int
byte[] tokenBytes = ByteUtils.int2Byte(headerSize);
// header
byte[] headerBytes = requestHeader.toByteArray();
System.out.println("headerSize:"+headerSize);
// request
byte[] sendRequestBytes = request.toByteArray();
int sendRequestSize = request.getSerializedSize();
System.out.println("requestSize:"+sendRequestSize);
System.out.println("byte length"+sendRequestBytes.length);
ByteBuffer buffer=ByteBuffer.allocate(4+headerSize+sendRequestSize);
buffer.put(tokenBytes).put(headerBytes).put(sendRequestBytes);
try {
Socket client = new Socket("localhost", 8080);
DataInputStream inputStream = new DataInputStream(client.getInputStream());
OutputStream outputStream =new DataOutputStream(client.getOutputStream());
outputStream.write(buffer.array());
outputStream.flush();
byte[] message = new byte[1024];
int len = inputStream.read(message);
byte[] readInform = Arrays.copyOf(message,len);
Message response = responsePrototype.getParserForType().parseFrom(readInform);
done.run(response);
} catch (IOException e) {
e.printStackTrace();
}
System.out.println(method.getFullName());
}
}
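The channel relies on ByteUtils.int2Byte and ByteUtils.byte2Int, which are not shown in this post. One possible implementation, assuming big-endian byte order (the class name `ByteUtilsSketch` is made up so it is not mistaken for the original helper):

```java
import java.nio.ByteBuffer;

final class ByteUtilsSketch {
    // Converts an int to 4 bytes, most significant byte first.
    static byte[] int2Byte(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Reads the first 4 bytes back into an int, using the same byte order.
    static int byte2Int(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        byte[] encoded = int2Byte(258);
        System.out.println(encoded.length);
        System.out.println(byte2Int(encoded));
    }
}
```

Whatever byte order is chosen, the client and the server must agree on it, since both sides decode the same 4-byte prefix.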
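Note that in the Channel both the sent type and the received type are fixed. Only when sending does the data need the 4-byte prefix plus header for identification.
The response can be parsed directly from the byte stream, because only the server has to decide and delegate; the client side is a one-to-one exchange.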
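One caveat: a single InputStream.read call may return fewer bytes than the server sent, so parsing whatever one read returns is only safe for small responses. A possible hardening, assuming the server were changed to prefix its response with a 4-byte length (the server in this post does not do that), is sketched below; `LengthPrefixedIo`, `writeFrame`, and `readFrame` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class LengthPrefixedIo {
    // Writes the payload length as a 4-byte int, then the payload itself.
    static void writeFrame(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Reads the length, then blocks until exactly that many bytes have arrived.
    static byte[] readFrame(DataInputStream in) throws IOException {
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload); // throws EOFException if the stream ends early
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for the socket streams.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeFrame(new DataOutputStream(sink), "you success!".getBytes(StandardCharsets.UTF_8));
        byte[] back = readFrame(new DataInputStream(new ByteArrayInputStream(sink.toByteArray())));
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```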
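Server-side Handler adaptation
Before responding, the server pre-registers all of its service handles. Here there is only the service.greet operation, so a single registration is enough.
For the Message passed in by the client Stub, the server has to find the matching handler to execute it. This is where the service and method information carried in the header helps the server delegate.
Delegation code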
public byte[] handleRequestData(byte[] data,Socket socket) throws InvalidProtocolBufferException {
byte[] tokenBytesAfterSend = new byte[4];
ByteBuffer wrap = ByteBuffer.wrap(data);
//fetch token bytes array
wrap.get(tokenBytesAfterSend, 0, 4);
int headerSize = ByteUtils.byte2Int(tokenBytesAfterSend);
//fetch serviceName & methodName from header
byte[] headerBytes = new byte[headerSize];
wrap.get(headerBytes, 0, headerSize);
HelloHeader helloHeaderProto = HelloHeader.parseFrom(headerBytes);
String serviceName = helloHeaderProto.getServiceName();
String methodName = helloHeaderProto.getMethodName();
System.out.println("serviceName is: "+serviceName);
System.out.println("methodName is: "+methodName);
//fetch request message
int requestSize = helloHeaderProto.getMsgSize();
byte[] requestBytes = new byte[requestSize];
wrap.get(requestBytes,0,requestSize);
//Here recognize service
Service service = serviceManager.get(serviceName);
if (service!=null){
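Descriptors.MethodDescriptor methodDescriptor = service.getDescriptorForType().findMethodByName(methodName);
// findMethodByName returns null for an unknown method name
if (methodDescriptor == null){
return null;
}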
//fetch protoType
Message prototype = service.getRequestPrototype(methodDescriptor);
//fetch request
Message requestMsg = prototype.newBuilderForType().mergeFrom(requestBytes).build();
//build responseBuilder
final Message.Builder responseBuilder =
service.getResponsePrototype(methodDescriptor).newBuilderForType();
service.callMethod(methodDescriptor,null,requestMsg,new RpcCallback<Message>() {
@Override
public void run(Message message) {
if (message != null) {
responseBuilder.mergeFrom(message);
}
}
});
return responseBuilder.build().toByteArray();
}
return null;
};
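Roughly, the server receives the client's packet, parses the first four bytes to get the length of the next segment and reads the header Message, then uses the header to locate and parse the body Message. It also uses the information carried in the header to pick the methodDescriptor (so the service knows to answer the client's greet with the handler's greet), and finally calls service.callMethod to pass the body Message to the matching Handler method. Here the handler simply sends text back to the client in the callback, omitting any real server-side work.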
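The serviceManager lookup used in the delegation code is not shown in this post. As a rough idea of what pre-registration and lookup by name could look like, here is a sketch that uses plain strings in place of Protobuf messages; `ServiceRegistrySketch`, `register`, and `dispatch` are hypothetical names, not part of any library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class ServiceRegistrySketch {
    // service name -> (method name -> handler)
    private final Map<String, Map<String, Function<String, String>>> services = new HashMap<>();

    // Registers a handler ahead of time, before any request arrives.
    void register(String service, String method, Function<String, String> handler) {
        services.computeIfAbsent(service, k -> new HashMap<>()).put(method, handler);
    }

    // Returns the handler's reply, or null when the service or method is unknown.
    String dispatch(String service, String method, String request) {
        Map<String, Function<String, String>> methods = services.get(service);
        if (methods == null) {
            return null;
        }
        Function<String, String> handler = methods.get(method);
        return handler == null ? null : handler.apply(request);
    }

    public static void main(String[] args) {
        ServiceRegistrySketch registry = new ServiceRegistrySketch();
        registry.register("HelloService", "greet", request -> "you success!");
        System.out.println(registry.dispatch("HelloService", "greet", "hi"));
        System.out.println(registry.dispatch("HelloService", "missing", "hi"));
    }
}
```

The dispatcher only ever handles names and opaque requests, which mirrors how the server above works with Message and descriptors rather than concrete types.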
Handler code
public class RpcHandlerImpl implements HelloService.Interface {
@Override
public void greet(RpcController controller, HelloRequest request, RpcCallback<HelloResponse> done) {
done.run(HelloResponse.newBuilder().setMsg("you success!").build());
System.out.println("doSomething");
}
}
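Note: the Server never knows the concrete message type and should never downcast; it only handles Message. Work on the concrete message belongs in the handler. The response handling here follows HBase's RPC: the result is delivered through the callback, which avoids force-casting the response Message.
HBase source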
final Message.Builder responseBuilder =
service.getResponsePrototype(methodDesc).newBuilderForType();
service.callMethod(methodDesc, controller, request, new RpcCallback<Message>() {
@Override
public void run(Message message) {
if (message != null) {
responseBuilder.mergeFrom(message);
}
}
});
if (coprocessorHost != null) {
coprocessorHost.postEndpointInvocation(service, methodName, request, responseBuilder);
}
IOException exception =
org.apache.hadoop.hbase.ipc.CoprocessorRpcUtils.getControllerException(controller);
if (exception != null) {
throw exception;
}
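method | Description |
---|---|
HelloService#newReflectiveService | Adds a service Handler (wraps the implementation as a Service) |
Service#getDescriptorForType | Gets the descriptor of the service Handler |
ServiceDescriptor#findMethodByName | Gets the descriptor of a method on the service Handler |
Besides invoking the method, callMethod also accepts a Controller and a Callback (on both client and server).
The Controller reports the status of the call, and the Callback performs the follow-up work for that status (the client closes the connection; the server responds to the client).
Abstract definition of Controller (not implemented here)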
public interface RpcController {
// -----------------------------------------------------------------
// These calls may be made from the client side only. Their results
// are undefined on the server side (may throw RuntimeExceptions).
/**
* Resets the RpcController to its initial state so that it may be reused in
* a new call. This can be called from the client side only. It must not
* be called while an RPC is in progress.
*/
void reset();
/**
* After a call has finished, returns true if the call failed. The possible
* reasons for failure depend on the RPC implementation. {@code failed()}
* most only be called on the client side, and must not be called before a
* call has finished.
*/
boolean failed();
/**
* If {@code failed()} is {@code true}, returns a human-readable description
* of the error.
*/
String errorText();
/**
* Advises the RPC system that the caller desires that the RPC call be
* canceled. The RPC system may cancel it immediately, may wait awhile and
* then cancel it, or may not even cancel the call at all. If the call is
* canceled, the "done" callback will still be called and the RpcController
* will indicate that the call failed at that time.
*/
void startCancel();
// -----------------------------------------------------------------
// These calls may be made from the server side only. Their results
// are undefined on the client side (may throw RuntimeExceptions).
/**
* Causes {@code failed()} to return true on the client side. {@code reason}
* will be incorporated into the message returned by {@code errorText()}.
* If you find you need to return machine-readable information about
* failures, you should incorporate it into your response protocol buffer
* and should NOT call {@code setFailed()}.
*/
void setFailed(String reason);
/**
* If {@code true}, indicates that the client canceled the RPC, so the server
* may as well give up on replying to it. This method must be called on the
* server side only. The server should still call the final "done" callback.
*/
boolean isCanceled();
/**
* Asks that the given callback be called when the RPC is canceled. The
* parameter passed to the callback will always be {@code null}. The
* callback will always be called exactly once. If the RPC completes without
* being canceled, the callback will be called after completion. If the RPC
* has already been canceled when NotifyOnCancel() is called, the callback
* will be called immediately.
*
* {@code notifyOnCancel()} must be called no more than once per request.
* It must be called on the server side only.
*/
void notifyOnCancel(RpcCallback<Object> callback);
}
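callMethod invokes the Handler, and the Handler does the actual work. A sketch of what that could look like:
@Override
public void greet(RpcController controller, HelloRequest request, RpcCallback<HelloResponse> done) {
// ... ...
// do the actual work here; wentWrong is a placeholder for its outcome
boolean wentWrong = false;
// ... ...
if (wentWrong && controller != null) {
// setFailed takes a human-readable reason
controller.setFailed("greet failed");
}
// always run the callback so the caller receives a result
done.run(HelloResponse.newBuilder().setMsg("callback logic").build());
System.out.println("doSomething");
}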
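The division of labor between Controller and Callback can also be tried out without Protobuf. The sketch below uses a reduced, hypothetical `SimpleController` and a plain `Consumer<String>` as stand-ins for RpcController and RpcCallback; it only illustrates the pattern and is not an implementation of the real interfaces.

```java
import java.util.function.Consumer;

final class ControllerCallbackSketch {
    // Hypothetical stand-in for RpcController, reduced to failure reporting.
    static final class SimpleController {
        private String error;

        void setFailed(String reason) { error = reason; }
        boolean failed() { return error != null; }
        String errorText() { return error; }
    }

    // Handler: reports failure through the controller and always runs the callback.
    static void greet(SimpleController controller, String request, Consumer<String> done) {
        if (request.isEmpty()) {
            controller.setFailed("empty request");
            done.accept(null); // the callback still runs on failure
            return;
        }
        done.accept("hello, " + request);
    }

    public static void main(String[] args) {
        SimpleController controller = new SimpleController();
        greet(controller, "world", reply -> System.out.println("reply: " + reply));
        greet(controller, "", reply -> System.out.println("failed: " + controller.errorText()));
    }
}
```

The caller checks the controller for status after the callback has fired, so the response type never has to carry error details of its own.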
Resource links