The difference between Flink's yarn.application-attempts and yarn.resourcemanager.am.max-attempts

yarn.resourcemanager.am.max-attempts

This is a global setting on the YARN cluster; it applies to every Flink job running on that cluster.

It is configured in yarn-site.xml.

 

yarn.application-attempts

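This applies only to the current Flink job. It must not be larger than yarn.resourcemanager.am.max-attempts; if it is, the value of yarn.resourcemanager.am.max-attempts overrides it.

It is configured in flink-conf.yaml (each job has one application master, and each application master has its own flink-conf.yaml).

The initial cluster startup consumes one attempt (for example, a value of 5 allows 4 restarts after failure).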
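The arithmetic above can be sketched in a few lines of Java. This is only an illustration: AttemptBudget and its methods are hypothetical names, not part of Flink or YARN, and the sketch assumes both settings are positive integers.

```java
// Hypothetical helper, not part of Flink or YARN: models the rule described above.
final class AttemptBudget {

    // The per-job value cannot exceed the cluster-wide cap; a larger value
    // is replaced by the cap (assumes both inputs are positive).
    static int effectiveAttempts(int flinkAttempts, int yarnMaxAttempts) {
        return Math.min(flinkAttempts, yarnMaxAttempts);
    }

    // The initial launch of the application master uses one attempt,
    // so what remains is the number of restarts allowed after a failure.
    static int allowedRestarts(int flinkAttempts, int yarnMaxAttempts) {
        return effectiveAttempts(flinkAttempts, yarnMaxAttempts) - 1;
    }

    public static void main(String[] args) {
        // yarn.application-attempts = 5, cluster cap = 10
        System.out.println(allowedRestarts(5, 10)); // prints 4
        // yarn.application-attempts = 5, cluster cap left at the default of 2
        System.out.println(allowedRestarts(5, 2));  // prints 1
    }
}
```

The second call shows why raising only the Flink-side value may have no effect: with the cluster cap left at 2, a job configured for 5 attempts still gets a single restart.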

 

A Hadoop job in YARN is called an application. For example, a MapReduce job currently maps one-to-one to a YARN application. If an application fails for some reason, YARN retries it, i.e., it launches a new application master to manage the lifecycle of the application. The maximum number of attempts is set by the configuration "yarn.resourcemanager.am.max-attempts", which has a default value of 2 and is a global setting for all application masters.
 

  public static final String RM_AM_MAX_ATTEMPTS =
    RM_PREFIX + "am.max-attempts";
  public static final int DEFAULT_RM_AM_MAX_ATTEMPTS = 2;

However, each application can set its own maximum as long as that number does not exceed the global upper bound. Otherwise, the ResourceManager overrides it with the global value. The ApplicationSubmissionContext class provides a method to set this number.
 

public abstract class ApplicationSubmissionContext {

  /**
   * Set the number of max attempts of the application to be submitted. WARNING:
   * it should be no larger than the global number of max attempts in the Yarn
   * configuration.
   * @param maxAppAttempts the number of max attempts of the application
   * to be submitted.
   */
  @Public
  @Stable
  public abstract void setMaxAppAttempts(int maxAppAttempts);

}

The retry logic lives inside the class RMAppImpl, which represents a YARN application on the ResourceManager side.

public class RMAppImpl implements RMApp, Recoverable {
  private final int maxAppAttempts;
  private boolean isNumAttemptsBeyondThreshold = false;

  public RMAppImpl(ApplicationId applicationId, RMContext rmContext,
      Configuration config, String name, String user, String queue,
      ApplicationSubmissionContext submissionContext, YarnScheduler scheduler,
      ApplicationMasterService masterService, long submitTime,
      String applicationType, Set<String> applicationTags,
      ResourceRequest amReq) {

    // Cluster-wide cap, read from yarn.resourcemanager.am.max-attempts.
    int globalMaxAppAttempts = config.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
        YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS);
    int individualMaxAppAttempts = submissionContext.getMaxAppAttempts();
    if (individualMaxAppAttempts <= 0 ||
        individualMaxAppAttempts > globalMaxAppAttempts) {
      this.maxAppAttempts = globalMaxAppAttempts;
      LOG.warn("The specific max attempts: " + individualMaxAppAttempts
          + " for application: " + applicationId.getId()
          + " is invalid, because it is out of the range [1, "
          + globalMaxAppAttempts + "]. Use the global max attempts instead.");
    } else {
      this.maxAppAttempts = individualMaxAppAttempts;
    }
    ...
  }
}

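As we can see, globalMaxAppAttempts is read from the YARN configuration, while individualMaxAppAttempts comes from the ApplicationSubmissionContext. If the individual value falls outside the range [1, globalMaxAppAttempts], maxAppAttempts falls back to the global value; otherwise the individual value is used.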
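To try the range check without a Hadoop dependency, it can be modelled as a standalone method. This is a minimal sketch that mirrors the condition in the constructor above; MaxAttemptsResolver and resolve are hypothetical names, and the real RMAppImpl also logs a warning when it falls back.

```java
// Hypothetical standalone model of the range check in the RMAppImpl constructor.
final class MaxAttemptsResolver {

    // Returns the per-application value when it lies in [1, global];
    // any other value (zero, negative, or above the cap) yields the global one.
    static int resolve(int individualMaxAppAttempts, int globalMaxAppAttempts) {
        if (individualMaxAppAttempts <= 0
                || individualMaxAppAttempts > globalMaxAppAttempts) {
            return globalMaxAppAttempts;
        }
        return individualMaxAppAttempts;
    }

    public static void main(String[] args) {
        int global = 2; // the default of yarn.resourcemanager.am.max-attempts
        int[] requested = {-1, 0, 1, 2, 3};
        for (int r : requested) {
            System.out.println("requested=" + r + " -> effective=" + resolve(r, global));
        }
    }
}
```

Note that a request equal to the global cap is accepted as-is, and that an unset or non-positive request is treated the same way as one that is too large.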

 

https://johnjianfang.blogspot.com/2015/04/the-number-of-maximum-attempts-of-yarn.html  The Number of Maximum Attempts of an Yarn Application in Hadoop Two

https://my.oschina.net/flyxiang/blog/3050575/print  Analysis of the Startup Flow in Flink on Yarn Mode

https://www.codenong.com/cs105691815/  The Flink Job Submission Flow, Written for Busy People
