Tracing the MapReduce Job submit Source Code

This article was first published on my personal blog, HTMADAO's Blog. Please credit the source when reposting.

Job Submission

In a MapReduce driver class, we build a Configuration object and obtain the corresponding Job instance from it. After setting the Mapper class, Reducer class, and the other job properties on that Job object, we submit the job and wait for it to finish with waitForCompletion(true). The boolean argument controls whether the job's progress is monitored and printed while it runs.


import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CheckJob {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Expected exactly two arguments: <inputPath> <outputPath>");
            System.exit(1);
        } else {
            // Input directory on HDFS for this job
            String inputPath = args[0];
            // Output directory on HDFS; note: this directory must not already exist
            String outputPath = args[1];
            if (inputPath == null || inputPath.isEmpty() || outputPath == null || outputPath.isEmpty()) {
                System.out.println("Invalid arguments; please check the input and output paths");
                System.exit(1);
            } else {
                // Build the Job object (HadoopUtil is the author's helper that returns a Configuration for the target cluster)
                Job checkRandomJob = Job.getInstance(HadoopUtil.getRemoteHadoopConf());
                // Set the job name
                checkRandomJob.setJobName("check Random app");
                // Set the main class of the job jar
                checkRandomJob.setJarByClass(CheckJob.class);
                // Set the Mapper and its output key/value types
                checkRandomJob.setMapperClass(CheckMapper.class);
                checkRandomJob.setMapOutputKeyClass(Text.class);
                checkRandomJob.setMapOutputValueClass(LongWritable.class);
                // Set the Reducer and the job's (i.e. the Reducer's) output key/value types
                checkRandomJob.setReducerClass(CheckReducer.class);
                checkRandomJob.setOutputKeyClass(Text.class);
                checkRandomJob.setOutputValueClass(LongWritable.class);
                // Path of the HDFS input to process
                FileInputFormat.addInputPath(checkRandomJob, new Path(inputPath));
                // Path of the final output
                FileOutputFormat.setOutputPath(checkRandomJob, new Path(outputPath));
                // Wait for the job to finish, then exit with its status
                System.exit(checkRandomJob.waitForCompletion(true) ? 0 : 1);
            }
        }
    }
}

The job.waitForCompletion() method

Stepping into waitForCompletion() in the Job class, the method takes a single boolean argument. It first checks the job state: if the job is still in the DEFINE state, it is submitted via submit(). Then, depending on the argument, it either monitors the job and prints its progress or simply polls until completion.


  public boolean waitForCompletion(boolean verbose
                                   ) throws IOException, InterruptedException,
                                            ClassNotFoundException {
    // First check the job state; if it is still DEFINE, submit the job to the cluster via submit()
    if (state == JobState.DEFINE) {
      submit();
    }
    // If verbose is true, monitor the job and print its progress
    if (verbose) {
      monitorAndPrintJob();
    } else {
      // get the completion poll interval from the client.
      int completionPollIntervalMillis =
        Job.getCompletionPollInterval(cluster.getConf());
      while (!isComplete()) {
        try {
          Thread.sleep(completionPollIntervalMillis);
        } catch (InterruptedException ie) {
        }
      }
    }
    return isSuccessful();
  }

The submit() method

This method submits the job to the cluster. It first re-checks the job state; if it is not DEFINE, the job cannot be submitted. setUseNewAPI() marks the job as using the new MapReduce API, i.e. the Mapper and Reducer under the org.apache.hadoop.mapreduce package, rather than the classes under the old mapred package.
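
For readers who need a refresher on the difference between the two APIs, here is a minimal sketch (not taken from this article's job; the class names and the identity-style map logic are made up) contrasting a new-API Mapper with an old-API one:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// New API (org.apache.hadoop.mapreduce): extend Mapper and override map(key, value, context)
class NewApiMapper
    extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, LongWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.write(value, new LongWritable(1));
  }
}

// Old API (org.apache.hadoop.mapred): implement the Mapper interface and emit through OutputCollector
class OldApiMapper extends org.apache.hadoop.mapred.MapReduceBase
    implements org.apache.hadoop.mapred.Mapper<LongWritable, Text, Text, LongWritable> {
  public void map(LongWritable key, Text value,
                  org.apache.hadoop.mapred.OutputCollector<Text, LongWritable> output,
                  org.apache.hadoop.mapred.Reporter reporter) throws IOException {
    output.collect(value, new LongWritable(1));
  }
}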

submit() performs two particularly important steps. The first, connect(), initializes the Cluster-typed member of the Job class. That object encapsulates the cluster information taken from the Configuration and internally creates the actual communication protocol object that will be used for the final job submission.
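
For reference, connect() looks roughly like the following in the Hadoop 2.x code base (paraphrased from memory, so treat it as a sketch rather than the exact source): it lazily builds the Cluster object from the job's Configuration inside a doAs block.

  private synchronized void connect()
          throws IOException, InterruptedException, ClassNotFoundException {
    if (cluster == null) {
      cluster = ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
        public Cluster run() throws IOException, InterruptedException,
            ClassNotFoundException {
          // Cluster chooses a client protocol (YARN or local) based on the Configuration
          return new Cluster(getConfiguration());
        }
      });
    }
  }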

The second step, getJobSubmitter(), uses the information wrapped in cluster (here, its file system and client) to obtain a JobSubmitter object, which is responsible for actually submitting the job to the cluster and reporting its progress. Finally, submitter.submitJobInternal(Job.this, cluster) submits the current job to the cluster and stores the resulting job status in the status field; this call is the core of JobSubmitter. After a successful submission, JobState is set to RUNNING, meaning the job has entered the running phase, and the URL for tracking the job is printed to the console.


  public void submit()
        throws IOException, InterruptedException, ClassNotFoundException {
    ensureState(JobState.DEFINE);
    setUseNewAPI();
    connect();
    // Use the cluster information (here, its file system and client) to obtain the JobSubmitter,
    // which performs the actual submission and reports the job's progress
    final JobSubmitter submitter =
        getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
    status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
      public JobStatus run() throws IOException, InterruptedException,
      ClassNotFoundException {
        return submitter.submitJobInternal(Job.this, cluster);
      }
    });
    state = JobState.RUNNING;
    LOG.info("The url to track the job: " + getTrackingURL());
  }

The submitJobInternal() method

This is the method in the job submitter (JobSubmitter) that finally submits the job to the cluster.

First, checkSpecs(job) verifies that the job's output path is configured and does not already exist; the correct state is configured but not yet existing. The output path is controlled by the mapreduce.output.fileoutputformat.outputdir property.
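
If you want the driver to deal with an existing output directory before checkSpecs() rejects it, a common caller-side pattern looks like this (a sketch only; whether to delete or abort is the caller's policy, not something submitJobInternal does):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputDirGuard {
    public static void prepareOutputDir(Configuration conf, String outputPath) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(outputPath);
        if (fs.exists(out)) {
            // Either fail fast, mirroring checkSpecs() behaviour, or remove the old output.
            // Here we delete recursively; in production you may prefer to abort instead.
            fs.delete(out, true);
        }
    }
}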

It then obtains the Configuration wrapped in the job and adds MAPREDUCE_APPLICATION_FRAMEWORK_PATH (the application framework path) to the distributed cache.

The static method JobSubmissionFiles.getStagingDir() returns the staging path where the job's resources are placed during submission. The default (here, for user root) is /tmp/hadoop-yarn/staging/root/.staging.
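
Assuming the standard yarn.app.mapreduce.am.staging-dir property (whose default is /tmp/hadoop-yarn/staging), the staging root can be relocated like this:

import org.apache.hadoop.conf.Configuration;

public class StagingDirExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Move the MapReduce staging root; getStagingDir() then resolves to
        // <staging-dir>/<submitting-user>/.staging
        conf.set("yarn.app.mapreduce.am.staging-dir", "/user/hadoop/mr-staging");
        System.out.println(conf.get("yarn.app.mapreduce.am.staging-dir"));
    }
}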

The InetAddress-related code obtains the IP of the host that is submitting the job and stores the IP, host name, and related information in the Configuration object.

Next, a job ID is generated and set on the job object, and the job submission path is built from it. A series of permission-related operations is then applied to that path, which we will not go into here.
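
To make the path construction concrete, here is a tiny illustration (the cluster timestamp and sequence number in the job ID are made up):

import org.apache.hadoop.fs.Path;

public class SubmitDirExample {
    public static void main(String[] args) {
        // Hypothetical staging area and job ID, just to show how the submit path is assembled
        Path jobStagingArea = new Path("/tmp/hadoop-yarn/staging/root/.staging");
        String jobId = "job_1609459200000_0001";
        Path submitJobDir = new Path(jobStagingArea, jobId);
        // Prints /tmp/hadoop-yarn/staging/root/.staging/job_1609459200000_0001
        System.out.println(submitJobDir);
    }
}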

The writeConf() method writes the job configuration file (job.xml) to the job submission directory on HDFS; the job jar and other resources are uploaded earlier by copyAndConfigureFiles().

(Important) The writeSplits() method writes the split data file job.split and the split metadata file job.splitmetainfo into the submission directory and computes the number of map tasks.

The submitClient.submitJob() call is what actually submits the job to the cluster and returns the job status into the status variable. submitClient was constructed earlier, when the Cluster object was initialized.


  JobStatus submitJobInternal(Job job, Cluster cluster)
  throws ClassNotFoundException, InterruptedException, IOException {

    // validate the jobs output specs: check that the output path is configured and does not already exist
    checkSpecs(job);

    Configuration conf = job.getConfiguration();
    addMRFrameworkToDistributedCache(conf);

    Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
    // configure the command line options correctly on the submitting dfs
    // Obtain the IP of the submitting host and store the IP, host name, etc. in the Configuration
    InetAddress ip = InetAddress.getLocalHost();
    if (ip != null) {
      submitHostAddress = ip.getHostAddress();
      submitHostName = ip.getHostName();
      conf.set(MRJobConfig.JOB_SUBMITHOST, submitHostName);
      conf.set(MRJobConfig.JOB_SUBMITHOSTADDR, submitHostAddress);
    }
    // Generate the job ID
    JobID jobId = submitClient.getNewJobID();
    // Set the job ID on the job
    job.setJobID(jobId);
    // Build the submission path: jobStagingArea followed by /jobID
    Path submitJobDir = new Path(jobStagingArea, jobId.toString());
    JobStatus status = null;
    try {
      conf.set(MRJobConfig.USER_NAME,
          UserGroupInformation.getCurrentUser().getShortUserName());
      conf.set("hadoop.http.filter.initializers",
          "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
      conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
      LOG.debug("Configuring job " + jobId + " with " + submitJobDir
          + " as the submit dir");
      // get delegation token for the dir: a series of permission-related operations on the submit path
      TokenCache.obtainTokensForNamenodes(job.getCredentials(),
          new Path[] { submitJobDir }, conf);

      populateTokenCache(conf, job.getCredentials());

      // generate a secret to authenticate shuffle transfers
      if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
        KeyGenerator keyGen;
        try {
          keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
          keyGen.init(SHUFFLE_KEY_LENGTH);
        } catch (NoSuchAlgorithmException e) {
          throw new IOException("Error generating shuffle secret key", e);
        }
        SecretKey shuffleKey = keyGen.generateKey();
        TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
            job.getCredentials());
      }
      if (CryptoUtils.isEncryptedSpillEnabled(conf)) {
        conf.setInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, 1);
        LOG.warn("Max job attempts set to 1 since encrypted intermediate" +
                "data spill is enabled");
      }

      // Copy and configure the job's files (jar, libjars, archives, ...) into the submit dir
      copyAndConfigureFiles(job, submitJobDir);

      // Path of the job configuration file (job.xml) under the submit dir
      Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);

      // Create the splits for the job
      LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
      // writeSplits() writes the split file job.split and the split metadata file job.splitmetainfo,
      // and returns the number of map tasks
      int maps = writeSplits(job, submitJobDir);
      conf.setInt(MRJobConfig.NUM_MAPS, maps);
      LOG.info("number of splits:" + maps);

      int maxMaps = conf.getInt(MRJobConfig.JOB_MAX_MAP,
          MRJobConfig.DEFAULT_JOB_MAX_MAP);
      if (maxMaps >= 0 && maxMaps < maps) {
        throw new IllegalArgumentException("The number of map tasks " + maps +
            " exceeded limit " + maxMaps);
      }

      // write "queue admins of the queue to which job is being submitted"
      // to job file.
      String queue = conf.get(MRJobConfig.QUEUE_NAME,
          JobConf.DEFAULT_QUEUE_NAME);
      AccessControlList acl = submitClient.getQueueAdmins(queue);
      conf.set(toFullPropertyName(queue,
          QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());

      // removing jobtoken referrals before copying the jobconf to HDFS
      // as the tasks don't need this setting, actually they may break
      // because of it if present as the referral will point to a
      // different job.
      TokenCache.cleanUpTokenReferral(conf);

      if (conf.getBoolean(
          MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
          MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
        // Add HDFS tracking ids
        ArrayList<String> trackingIds = new ArrayList<String>();
        for (Token<? extends TokenIdentifier> t :
            job.getCredentials().getAllTokens()) {
          trackingIds.add(t.decodeIdentifier().getTrackingId());
        }
        conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
            trackingIds.toArray(new String[trackingIds.size()]));
      }

      // Set reservation info if it exists
      ReservationId reservationId = job.getReservationId();
      if (reservationId != null) {
        conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());
      }

      // Write job file to submit dir: write the job configuration (job.xml) to the submission directory on HDFS
      writeConf(conf, submitJobFile);

      //
      // Now, actually submit the job (using the submit name)
      //
      printTokens(jobId, job.getCredentials());
      status = submitClient.submitJob(
          jobId, submitJobDir.toString(), job.getCredentials());
      if (status != null) {
        return status;
      } else {
        throw new IOException("Could not launch job");
      }
    } finally {
      if (status == null) {
        LOG.info("Cleaning up the staging area " + submitJobDir);
        if (jtFs != null && submitJobDir != null)
          jtFs.delete(submitJobDir, true);
      }
    }
  }

The writeSplits() method

When the new API is in use, writeSplits() delegates to writeNewSplits():


  private int writeSplits(org.apache.hadoop.mapreduce.JobContext job,
      Path jobSubmitDir) throws IOException,
      InterruptedException, ClassNotFoundException {
    JobConf jConf = (JobConf)job.getConfiguration();
    int maps;
    // If the new API is used, call writeNewSplits()
    if (jConf.getUseNewMapper()) {
      maps = writeNewSplits(job, jobSubmitDir);
    } else {
      maps = writeOldSplits(jConf, jobSubmitDir);
    }
    return maps;
  }

The writeNewSplits() method

writeNewSplits() uses reflection to instantiate the InputFormat object input from the configured input format class and then calls its getSplits() method. Once the split information has been obtained, JobSplitWriter.createSplitFiles() writes it into the submitJobDir/job.split file.


  private <T extends InputSplit>
  int writeNewSplits(JobContext job, Path jobSubmitDir) throws IOException,
      InterruptedException, ClassNotFoundException {
    Configuration conf = job.getConfiguration();
    InputFormat<?, ?> input =
      ReflectionUtils.newInstance(job.getInputFormatClass(), conf);

    List<InputSplit> splits = input.getSplits(job);
    T[] array = (T[]) splits.toArray(new InputSplit[splits.size()]);

    // sort the splits into order based on size, so that the biggest
    // go first
    Arrays.sort(array, new SplitComparator());
    JobSplitWriter.createSplitFiles(jobSubmitDir, conf,
        jobSubmitDir.getFileSystem(conf), array);
    return array.length;
  }

The getSplits() method in FileInputFormat

The following piece of code deserves particular attention:


while (((double) bytesRemaining)/splitSize > SPLIT_SLOP) {
    int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);
    splits.add(makeSplit(path, length-bytesRemaining, splitSize,
                        blkLocations[blkIndex].getHosts(),
                        blkLocations[blkIndex].getCachedHosts()));
    bytesRemaining -= splitSize;
}

if (bytesRemaining != 0) {
    int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);
    splits.add(makeSplit(path, length-bytesRemaining, bytesRemaining,
                        blkLocations[blkIndex].getHosts(),
                        blkLocations[blkIndex].getCachedHosts()));
}

The condition in the while loop checks whether the ratio of the remaining bytes to the target split size exceeds the limit SPLIT_SLOP, a constant defined as 1.1 in FileInputFormat. In other words, as long as the remaining bytes exceed 110% of the split size, splitting continues; once the remainder is at or below 110% of the split size, it is emitted as a single final split.


private static final double SPLIT_SLOP = 1.1;
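
To see the effect of the 1.1 slop factor, here is a small self-contained sketch that reproduces only the counting logic of the loop above (the 128 MB split size and the file lengths are assumptions for illustration):

public class SplitSlopDemo {
    private static final double SPLIT_SLOP = 1.1;

    // Reproduces only the counting logic of the while/if pair shown above
    static int countSplits(long length, long splitSize) {
        long bytesRemaining = length;
        int splits = 0;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++; // the tail becomes one (possibly slightly oversized) split
        }
        return splits;
    }

    public static void main(String[] args) {
        long splitSize = 128L * 1024 * 1024;
        // 130 MB / 128 MB ≈ 1.016 <= 1.1  -> one single 130 MB split
        System.out.println(countSplits(130L * 1024 * 1024, splitSize));
        // 150 MB / 128 MB ≈ 1.17  > 1.1   -> a 128 MB split plus a 22 MB split
        System.out.println(countSplits(150L * 1024 * 1024, splitSize));
    }
}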

The full getSplits() method:


  public List<InputSplit> getSplits(JobContext job) throws IOException {
    StopWatch sw = new StopWatch().start();
    long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job));
    long maxSize = getMaxSplitSize(job);

    // generate splits
    List<InputSplit> splits = new ArrayList<InputSplit>();
    List<FileStatus> files = listStatus(job);
    for (FileStatus file: files) {
      Path path = file.getPath();
      long length = file.getLen();
      if (length != 0) {
        BlockLocation[] blkLocations;
        if (file instanceof LocatedFileStatus) {
          blkLocations = ((LocatedFileStatus) file).getBlockLocations();
        } else {
          FileSystem fs = path.getFileSystem(job.getConfiguration());
          blkLocations = fs.getFileBlockLocations(file, 0, length);
        }
        if (isSplitable(job, path)) {
          long blockSize = file.getBlockSize();
          long splitSize = computeSplitSize(blockSize, minSize, maxSize);

          long bytesRemaining = length;
          while (((double) bytesRemaining)/splitSize > SPLIT_SLOP) {
            int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);
            splits.add(makeSplit(path, length-bytesRemaining, splitSize,
                        blkLocations[blkIndex].getHosts(),
                        blkLocations[blkIndex].getCachedHosts()));
            bytesRemaining -= splitSize;
          }

          if (bytesRemaining != 0) {
            int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);
            splits.add(makeSplit(path, length-bytesRemaining, bytesRemaining,
                      blkLocations[blkIndex].getHosts(),
                      blkLocations[blkIndex].getCachedHosts()));
          }
        } else { // not splitable
          if (LOG.isDebugEnabled()) {
            // Log only if the file is big enough to be splitted
            if (length > Math.min(file.getBlockSize(), minSize)) {
              LOG.debug("File is not splittable so no parallelization "
                  + "is possible: " + file.getPath());
            }
          }
          splits.add(makeSplit(path, 0, length, blkLocations[0].getHosts(),
                      blkLocations[0].getCachedHosts()));
        }
      } else {
        //Create empty hosts array for zero length files
        splits.add(makeSplit(path, 0, length, new String[0]));
      }
    }
    // Save the number of input files for metrics/loadgen
    job.getConfiguration().setLong(NUM_INPUT_FILES, files.size());
    sw.stop();
    if (LOG.isDebugEnabled()) {
      LOG.debug("Total # of splits generated by getSplits: " + splits.size()
          + ", TimeTaken: " + sw.now(TimeUnit.MILLISECONDS));
    }
    return splits;
  }
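
The split size used above comes from computeSplitSize(blockSize, minSize, maxSize), which in FileInputFormat amounts to clamping the block size between the configured minimum and maximum split sizes (shown here from memory; recent versions are essentially identical):

protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
  // With default settings (minSize = 1, maxSize = Long.MAX_VALUE) this returns the block size,
  // so by default one split corresponds to one HDFS block.
  return Math.max(minSize, Math.min(maxSize, blockSize));
}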
