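In Flink, the components inside a cluster communicate with one another through Akka, using Akka's actor model.
When a job is ready to be handed to the JobManager for processing and resource allocation, the run() method of ClusterClient has JobClient submit the corresponding JobGraph to the JobManager through submitJobAndWait().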
public JobExecutionResult run(JobGraph jobGraph, ClassLoader classLoader) throws ProgramInvocationException {
    waitForClusterToBeReady();
    final ActorSystem actorSystem;
    try {
        actorSystem = actorSystemLoader.get();
    } catch (FlinkException fe) {
        throw new ProgramInvocationException("Could not start the ActorSystem needed to talk to the " +
            "JobManager.", jobGraph.getJobID(), fe);
    }
    try {
        logAndSysout("Submitting job with JobID: " + jobGraph.getJobID() + ". Waiting for job completion.");
        this.lastJobExecutionResult = JobClient.submitJobAndWait(
            actorSystem,
            flinkConfig,
            highAvailabilityServices,
            jobGraph,
            timeout,
            printStatusDuringExecution,
            classLoader);
        return lastJobExecutionResult;
    } catch (JobExecutionException e) {
        throw new ProgramInvocationException("The program execution failed: " + e.getMessage(), jobGraph.getJobID(), e);
    }
}
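Here, before JobClient's submitJobAndWait() is called to submit the job, the get() method of actorSystemLoader returns the ActorSystem that Akka needs in order to create concrete actors. The default implementation is LazyActorSystemLoader, whose get() method creates the ActorSystem on first use if none has been created yet.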
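The create-on-first-use behaviour described above can be illustrated with a minimal sketch. This is not Flink's LazyActorSystemLoader: the class LazyLoaderSketch, its nested LazyLoader, and the string standing in for an ActorSystem are all hypothetical names chosen for illustration.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a lazy loader; not Flink's LazyActorSystemLoader. */
public class LazyLoaderSketch {

    /** Creates the wrapped resource on the first get() and reuses it afterwards. */
    static final class LazyLoader<T> {
        private final Supplier<T> factory;
        private T instance; // stays null until the first get()

        LazyLoader(Supplier<T> factory) {
            this.factory = factory;
        }

        synchronized T get() {
            if (instance == null) {
                // First call: build the resource now.
                instance = factory.get();
            }
            return instance;
        }

        synchronized boolean isCreated() {
            return instance != null;
        }
    }

    public static void main(String[] args) {
        int[] creations = {0};
        // The string is a placeholder for an expensive resource such as an ActorSystem.
        LazyLoader<String> loader = new LazyLoader<>(() -> {
            creations[0]++;
            return "actor-system";
        });

        System.out.println(loader.isCreated());           // false: nothing created yet
        String first = loader.get();
        String second = loader.get();
        System.out.println(first == second);               // true: same instance reused
        System.out.println(creations[0]);                  // 1: the factory ran only once
    }
}
```

The point of the pattern is that a client which never submits a job never pays the cost of starting the resource.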
Once the ActorSystem is available, JobClient's submitJob() method goes on to set up the Akka actor.
Props jobClientActorProps = JobSubmissionClientActor.createActorProps(
    highAvailabilityServices.getJobManagerLeaderRetriever(HighAvailabilityServices.DEFAULT_JOB_ID),
    timeout,
    sysoutLogUpdates,
    config);
ActorRef jobClientActor = actorSystem.actorOf(jobClientActorProps);
Future
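First, the Props needed to create an ActorRef are built. Here getJobManagerLeaderRetriever() reads the JobManager's hostname (or IP address) and port from the configuration file; this is the address to which the job-submitting actor delivers its messages.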
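As a rough illustration of turning a configured host and port into an actor address, the sketch below builds an Akka URL from a plain map. It does not show the leader retriever's actual code. The class and method names are hypothetical, and the config keys jobmanager.rpc.address and jobmanager.rpc.port, the default port 6123, and the akka.tcp://flink@host:port/user/jobmanager layout are assumptions about a standalone, non-HA setup.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; the leader retriever itself is not shown here. */
public class JobManagerAddressSketch {

    /** Builds an Akka URL for the JobManager actor from host/port settings. */
    static String jobManagerActorUrl(Map<String, String> config) {
        String host = config.get("jobmanager.rpc.address");
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("jobmanager.rpc.address is not set");
        }
        // Assumed default port when the key is absent.
        int port = Integer.parseInt(config.getOrDefault("jobmanager.rpc.port", "6123"));
        return "akka.tcp://flink@" + host + ":" + port + "/user/jobmanager";
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("jobmanager.rpc.address", "10.0.0.5"); // made-up address
        System.out.println(jobManagerActorUrl(config));
    }
}
```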
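The resulting jobClientActor is the actor that delivers the job message. A JobClientMessages message is sent to this jobClientActor, and once the actor receives it, it submits the job to the JobManager's address.
The message-handling logic lives in the handleCustomMessage() method of JobSubmissionClientActor; the part that deals with job submission is shown below.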
if (message instanceof SubmitJobAndWait) {
    // sanity check that no job was submitted through this actor before -
    // it is a one-shot actor after all
    if (this.client == null) {
        jobGraph = ((SubmitJobAndWait) message).jobGraph();
        if (jobGraph == null) {
            LOG.error("Received null JobGraph");
            sender().tell(
                decorateMessage(new Status.Failure(new Exception("JobGraph is null"))),
                getSelf());
        } else {
            LOG.info("Received job {} ({}).", jobGraph.getName(), jobGraph.getJobID());
            this.client = getSender();
            // is only successful if we already know the job manager leader
            if (jobManager != null) {
                tryToSubmitJob();
            }
        }
    } else {
        // repeated submission - tell failure to sender and kill self
        String msg = "Received repeated 'SubmitJobAndWait'";
        LOG.error(msg);
        getSender().tell(
            decorateMessage(new Status.Failure(new Exception(msg))), ActorRef.noSender());
        terminate();
    }
}
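Here the actor chooses its logic by message type. The message sent earlier is the SubmitJobAndWait message from JobClientMessages, and from it the actor extracts the JobGraph that is about to be submitted to the JobManager.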
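The one-shot behaviour of this handler (accept the first valid submission, reject a null JobGraph, reject and terminate on a repeat) can be modelled without Akka. The sketch below is a simplified, hypothetical model: OneShotSubmitSketch, OneShotHandler, and Reply are illustrative names, and strings stand in for the JobGraph and the sender's ActorRef.

```java
/** Simplified, Akka-free model of a one-shot submission handler (hypothetical names). */
public class OneShotSubmitSketch {

    enum Reply { ACCEPTED, FAILURE_NULL_JOB_GRAPH, FAILURE_REPEATED }

    /** Stand-in for the SubmitJobAndWait message; a String plays the role of the JobGraph. */
    static final class SubmitJobAndWait {
        final String jobGraph;

        SubmitJobAndWait(String jobGraph) {
            this.jobGraph = jobGraph;
        }
    }

    static final class OneShotHandler {
        private String client;      // sender of the accepted submission, null until then
        private String jobGraph;
        private boolean terminated;

        Reply handle(SubmitJobAndWait message, String sender) {
            if (client == null) {
                if (message.jobGraph == null) {
                    // Nothing is recorded, so a later valid submission is still possible.
                    return Reply.FAILURE_NULL_JOB_GRAPH;
                }
                jobGraph = message.jobGraph;
                client = sender;
                return Reply.ACCEPTED;
            }
            // A submission was already accepted: report failure and shut down.
            terminated = true;
            return Reply.FAILURE_REPEATED;
        }

        boolean isTerminated() {
            return terminated;
        }
    }

    public static void main(String[] args) {
        OneShotHandler handler = new OneShotHandler();
        System.out.println(handler.handle(new SubmitJobAndWait(null), "client-1"));    // FAILURE_NULL_JOB_GRAPH
        System.out.println(handler.handle(new SubmitJobAndWait("job-a"), "client-1")); // ACCEPTED
        System.out.println(handler.handle(new SubmitJobAndWait("job-b"), "client-2")); // FAILURE_REPEATED
        System.out.println(handler.isTerminated());                                    // true
    }
}
```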
The actual communication with the JobManager happens in the tryToSubmitJob() method.
final CompletableFuture<Void> jarUploadFuture = blobServerAddressFuture.thenAcceptAsync(
    (InetSocketAddress blobServerAddress) -> {
        try {
            ClientUtils.extractAndUploadJobGraphFiles(jobGraph, () -> new BlobClient(blobServerAddress, clientConfig));
        } catch (FlinkException e) {
            throw new CompletionException(e);
        }
    },
    getContext().dispatcher());
jarUploadFuture
    .thenAccept(
        (Void ignored) -> {
            LOG.info("Submit job to the job manager {}.", jobManager.path());
            jobManager.tell(
                decorateMessage(
                    new JobManagerMessages.SubmitJob(
                        jobGraph,
                        ListeningBehaviour.EXECUTION_RESULT_AND_STATE_CHANGES)),
                getSelf());
            // issue a SubmissionTimeout message to check that we submit the job within
            // the given timeout
            getContext().system().scheduler().scheduleOnce(
                timeout,
                getSelf(),
                decorateMessage(JobClientMessages.getSubmissionTimeout()),
                getContext().dispatcher(),
                ActorRef.noSender());
        })
    .whenComplete(
        (Void ignored, Throwable throwable) -> {
            if (throwable != null) {
                getSelf().tell(
                    decorateMessage(new JobManagerMessages.JobResultFailure(
                        new SerializedThrowable(ExceptionUtils.stripCompletionException(throwable)))),
                    ActorRef.noSender());
            }
        });
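Here, once the job's jar files have been uploaded, the JobManagerMessages SubmitJob message is sent to the JobManager; this is the point at which the submission message is actually sent. If the upload or the send fails, the actor sends itself a JobResultFailure message instead.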
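The "upload first, then submit, and report any failure" chaining can be tried out with CompletableFuture alone. The following is a minimal sketch under simplifying assumptions: everything runs synchronously on the calling thread, an event list replaces the actor messages, and the class name, method name, and blob server address are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

/** Akka-free sketch of the upload-then-submit chain (hypothetical names). */
public class UploadThenSubmitSketch {

    /** Runs the chain and returns a log of what happened. */
    static List<String> run(boolean uploadFails) {
        List<String> events = new ArrayList<>();
        // Already completed, so every stage below runs on the calling thread.
        CompletableFuture<String> blobServerAddress =
            CompletableFuture.completedFuture("blob-host:1234"); // made-up address

        // Stage 1: upload the job files once the blob server address is known.
        CompletableFuture<Void> jarUpload = blobServerAddress.thenAccept(address -> {
            if (uploadFails) {
                throw new CompletionException(new IllegalStateException("upload failed"));
            }
            events.add("uploaded to " + address);
        });

        jarUpload
            // Stage 2: runs only if the upload succeeded.
            .thenAccept(ignored -> events.add("SubmitJob sent"))
            // Stage 3: runs in both cases and reports a failure from either earlier stage.
            .whenComplete((ignored, throwable) -> {
                if (throwable != null) {
                    events.add("failure reported: " + throwable.getCause().getMessage());
                }
            });
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [uploaded to blob-host:1234, SubmitJob sent]
        System.out.println(run(true));  // [failure reported: upload failed]
    }
}
```

Because the submit stage hangs off the upload future with thenAccept, a failed upload skips the submit step entirely and only the whenComplete callback sees the exception.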
The logic for receiving the JobGraph is implemented in the handleMessage() method of JobManager. JobManager extends FlinkActor and is the concrete actor implementation on the JobManager side.
case SubmitJob(jobGraph, listeningBehaviour) =>
  val client = sender()
  val jobInfo = new JobInfo(client, listeningBehaviour, System.currentTimeMillis(),
    jobGraph.getSessionTimeout)
  submitJob(jobGraph, jobInfo)
After the JobManager has successfully accepted the submitted job, it sends a success message back to the JobClient.
jobInfo.notifyClients(
  decorateMessage(JobSubmitSuccess(jobGraph.getJobID)))
The JobClient handles the JobSubmitSuccess message in the same way: it logs the event and updates its state accordingly.
That covers how Akka is used in job submission.