How Flink uses Akka: job submission through JobClient as an example

Inside a Flink cluster, the components communicate with one another through Akka, using Akka's actor model.

When a job has to be handed to the JobManager for processing and resource allocation, the run() method of ClusterClient passes the JobGraph to JobClient, which submits it to the JobManager through submitJobAndWait().

public JobExecutionResult run(JobGraph jobGraph, ClassLoader classLoader) throws ProgramInvocationException {

   waitForClusterToBeReady();

   final ActorSystem actorSystem;

   try {
      actorSystem = actorSystemLoader.get();
   } catch (FlinkException fe) {
      throw new ProgramInvocationException("Could not start the ActorSystem needed to talk to the " +
         "JobManager.", jobGraph.getJobID(), fe);
   }

   try {
      logAndSysout("Submitting job with JobID: " + jobGraph.getJobID() + ". Waiting for job completion.");
      this.lastJobExecutionResult = JobClient.submitJobAndWait(
         actorSystem,
         flinkConfig,
         highAvailabilityServices,
         jobGraph,
         timeout,
         printStatusDuringExecution,
         classLoader);

      return lastJobExecutionResult;
   } catch (JobExecutionException e) {
      throw new ProgramInvocationException("The program execution failed: " + e.getMessage(), jobGraph.getJobID(), e);
   }
}

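Here, before JobClient's submitJobAndWait() is called, actorSystemLoader.get() returns the ActorSystem that Akka needs in order to create concrete actors. The default implementation is LazyActorSystemLoader: if no ActorSystem has been created yet, its get() method creates one on this first call.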
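The lazy creation described above can be illustrated without Akka. This is only a minimal sketch, not Flink's LazyActorSystemLoader: the class name `LazyLoader` and the string standing in for an ActorSystem are hypothetical and chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for a lazy loader; not Flink's LazyActorSystemLoader.
final class LazyLoader<T> {
    private final Supplier<T> factory; // creates the expensive resource
    private T instance;                // stays null until the first get()

    LazyLoader(Supplier<T> factory) {
        this.factory = factory;
    }

    // Creates the resource on the first call and reuses it afterwards.
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    public static void main(String[] args) {
        AtomicInteger creations = new AtomicInteger();
        LazyLoader<String> loader = new LazyLoader<>(() -> {
            creations.incrementAndGet();
            return "actor-system"; // placeholder for a real ActorSystem
        });
        String first = loader.get();
        String second = loader.get();
        System.out.println(first == second); // true: the same instance is reused
        System.out.println(creations.get()); // 1: the factory ran only once
    }
}
```

The point of the design is that a client that never submits a job never pays the cost of starting the resource.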

 

Once the ActorSystem is available, the Akka setup continues in JobClient's submitJob() method.

Props jobClientActorProps = JobSubmissionClientActor.createActorProps(
   highAvailabilityServices.getJobManagerLeaderRetriever(HighAvailabilityServices.DEFAULT_JOB_ID),
   timeout,
   sysoutLogUpdates,
   config);

ActorRef jobClientActor = actorSystem.actorOf(jobClientActorProps);

Future<Object> submissionFuture = Patterns.ask(
      jobClientActor,
      new JobClientMessages.SubmitJobAndWait(jobGraph),
      new Timeout(AkkaUtils.INF_TIMEOUT())); 
  

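First, the Props needed to create the ActorRef are built. In a setup without high availability, getJobManagerLeaderRetriever() reads the JobManager's hostname (or IP address) and port from the configuration file; this is the address to which the job-submitting actor will deliver its messages.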

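The resulting jobClientActor is the actor that delivers the job messages. A JobClientMessages.SubmitJobAndWait message is sent to it, and once the actor receives that message it submits the job to the JobManager at the retrieved address.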
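The "ask" interaction described above (send a message to an actor, receive a future for the reply) can be sketched in plain Java. This is an illustration under stated assumptions, not Akka's Patterns.ask: `AskSketch`, `MiniActor` and `submitAndWait` are hypothetical names, and a single-threaded executor stands in for the actor's mailbox.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical miniature of the "ask" pattern; it does not use Akka.
final class AskSketch {
    // A tiny "actor": messages are processed one at a time on a single thread.
    static final class MiniActor implements AutoCloseable {
        private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
        private final Function<Object, Object> behavior;

        MiniActor(Function<Object, Object> behavior) {
            this.behavior = behavior;
        }

        // Enqueues the message and returns a future that completes with the reply.
        CompletableFuture<Object> ask(Object message) {
            return CompletableFuture.supplyAsync(() -> behavior.apply(message), mailbox);
        }

        @Override
        public void close() {
            mailbox.shutdown();
        }
    }

    // Sends one message and blocks until the reply arrives.
    static Object submitAndWait(String jobName) {
        try (MiniActor actor = new MiniActor(msg -> "accepted: " + msg)) {
            return actor.ask(jobName).join();
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAndWait("word-count")); // accepted: word-count
    }
}
```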

The message-handling logic lives in the handleCustomMessage() method of JobSubmissionClientActor. The part that deals with job submission is shown below.

if (message instanceof SubmitJobAndWait) {
   // sanity check that no job was submitted through this actor before -
   // it is a one-shot actor after all
   if (this.client == null) {
      jobGraph = ((SubmitJobAndWait) message).jobGraph();
      if (jobGraph == null) {
         LOG.error("Received null JobGraph");
         sender().tell(
            decorateMessage(new Status.Failure(new Exception("JobGraph is null"))),
            getSelf());
      } else {
         LOG.info("Received job {} ({}).", jobGraph.getName(), jobGraph.getJobID());

         this.client = getSender();

         // is only successful if we already know the job manager leader
         if (jobManager != null) {
            tryToSubmitJob();
         }
      }
   } else {
      // repeated submission - tell failure to sender and kill self
      String msg = "Received repeated 'SubmitJobAndWait'";
      LOG.error(msg);
      getSender().tell(
         decorateMessage(new Status.Failure(new Exception(msg))), ActorRef.noSender());

      terminate();
   }
} 

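Here the actor picks the logic that matches the message type. The message sent earlier was SubmitJobAndWait from JobClientMessages, so the actor extracts the JobGraph carried by that message and prepares to submit it to the JobManager. Submission starts immediately only if the JobManager leader is already known, and a second SubmitJobAndWait sent to the same actor is answered with a failure.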
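The one-shot behavior of the handler above can be reduced to a small state machine. The following sketch is not Flink code: `OneShotSubmitter` and its string return values are hypothetical, and it returns a status instead of sending actor messages.

```java
// Hypothetical one-shot handler mirroring the branches above; not Flink code.
final class OneShotSubmitter {
    private String client;   // who submitted; null until the first valid submission
    private String jobGraph; // placeholder for a real JobGraph

    // Returns a status string instead of replying through an ActorRef.
    String onSubmit(String sender, String graph) {
        if (client != null) {
            // A job was already accepted through this handler.
            return "FAILURE: repeated submission";
        }
        if (graph == null) {
            // Nothing is recorded, so a later valid submission is still possible.
            return "FAILURE: JobGraph is null";
        }
        client = sender;
        jobGraph = graph;
        return "ACCEPTED: " + jobGraph;
    }

    public static void main(String[] args) {
        OneShotSubmitter handler = new OneShotSubmitter();
        System.out.println(handler.onSubmit("client-1", null));    // FAILURE: JobGraph is null
        System.out.println(handler.onSubmit("client-1", "job-a")); // ACCEPTED: job-a
        System.out.println(handler.onSubmit("client-2", "job-b")); // FAILURE: repeated submission
    }
}
```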

 

The actual communication with the JobManager happens in the tryToSubmitJob() method.

final CompletableFuture<Void> jarUploadFuture = blobServerAddressFuture.thenAcceptAsync(
   (InetSocketAddress blobServerAddress) -> {
      try {
         ClientUtils.extractAndUploadJobGraphFiles(jobGraph, () -> new BlobClient(blobServerAddress, clientConfig));
      } catch (FlinkException e) {
         throw new CompletionException(e);
      }
   },
   getContext().dispatcher());

jarUploadFuture
   .thenAccept(
      (Void ignored) -> {
         LOG.info("Submit job to the job manager {}.", jobManager.path());

         jobManager.tell(
            decorateMessage(
               new JobManagerMessages.SubmitJob(
                  jobGraph,
                  ListeningBehaviour.EXECUTION_RESULT_AND_STATE_CHANGES)),
            getSelf());

         // issue a SubmissionTimeout message to check that we submit the job within
         // the given timeout
         getContext().system().scheduler().scheduleOnce(
            timeout,
            getSelf(),
            decorateMessage(JobClientMessages.getSubmissionTimeout()),
            getContext().dispatcher(),
            ActorRef.noSender());
      })
   .whenComplete(
      (Void ignored, Throwable throwable) -> {
         if (throwable != null) {
            getSelf().tell(
               decorateMessage(new JobManagerMessages.JobResultFailure(
                  new SerializedThrowable(ExceptionUtils.stripCompletionException(throwable)))),
               ActorRef.noSender());
         }
      });

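Here, once the jar upload has completed, the actor sends the JobManagerMessages.SubmitJob message to the JobManager; this is where the job submission message is really sent. It also schedules a SubmissionTimeout message to itself to check that the job is submitted within the given timeout, and if the upload or the send fails, it sends itself a JobResultFailure.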
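The ordering of the chain above (upload first, submit only on success, report any failure) can be reproduced with CompletableFuture alone. This is a simplified sketch, not the Flink implementation: `UploadThenSubmit`, the `uploadFails` flag and the event strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical sketch of "upload, then submit, report failures"; not Flink code.
final class UploadThenSubmit {
    // Runs the chain and returns the events in the order they happened.
    static List<String> run(boolean uploadFails) {
        List<String> events = Collections.synchronizedList(new ArrayList<>());

        // Stage 1: the upload, which may fail.
        CompletableFuture<Void> upload = CompletableFuture.runAsync(() -> {
            if (uploadFails) {
                throw new CompletionException(new IllegalStateException("upload failed"));
            }
            events.add("uploaded");
        });

        upload
            .thenAccept(ignored -> events.add("submitted")) // skipped if the upload failed
            .whenComplete((ignored, throwable) -> {
                if (throwable != null) {
                    events.add("failure reported");
                }
            })
            .exceptionally(t -> null) // swallow the error so join() does not throw
            .join();

        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [uploaded, submitted]
        System.out.println(run(true));  // [failure reported]
    }
}
```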

 

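The logic for receiving the JobGraph is implemented in the handleMessage() method of JobManager. The JobManager class extends FlinkActor and is the concrete actor implementation on the JobManager side.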

case SubmitJob(jobGraph, listeningBehaviour) =>
  val client = sender()

  val jobInfo = new JobInfo(client, listeningBehaviour, System.currentTimeMillis(),
    jobGraph.getSessionTimeout)

  submitJob(jobGraph, jobInfo)

Once the JobManager has successfully accepted the submitted job, it sends a success message back to the JobClient.

jobInfo.notifyClients(
  decorateMessage(JobSubmitSuccess(jobGraph.getJobID)))

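The JobClient actor handles the JobSubmitSuccess message in the same way as the messages above: it writes a log entry and updates its state accordingly.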

 

That is how Akka is used when a job is submitted.
