setNumMapTasks() has no effect in Eclipse

Scenario:

    Using TotalOrderPartitioner to do a total-order sort, but the job kept failing with java.io.IOException: Wrong number of partitions in keyset:

14/05/11 17:22:56 INFO input.FileInputFormat: Total input paths to process : 1
14/05/11 17:22:56 WARN snappy.LoadSnappy: Snappy native library is available
14/05/11 17:22:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/05/11 17:22:56 INFO snappy.LoadSnappy: Snappy native library loaded
14/05/11 17:22:56 INFO partition.InputSampler: Using 81 samples
14/05/11 17:22:56 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/05/11 17:22:56 INFO compress.CodecPool: Got brand-new compressor
14/05/11 17:35:13 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/05/11 17:35:13 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/05/11 17:35:13 INFO input.FileInputFormat: Total input paths to process : 1
14/05/11 17:35:28 INFO mapred.JobClient: Running job: job_local2039601594_0001
14/05/11 17:35:29 INFO mapred.JobClient:  map 0% reduce 0%
14/05/11 17:35:58 INFO mapred.LocalJobRunner: Waiting for map tasks
14/05/11 17:35:58 INFO mapred.LocalJobRunner: Starting task: attempt_local2039601594_0001_m_000000_0
14/05/11 17:36:13 INFO util.ProcessTree: setsid exited with exit code 0
14/05/11 17:36:13 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1b5dc81
14/05/11 17:36:13 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/input/unsorted_data:0+7151
14/05/11 17:36:13 INFO mapred.MapTask: io.sort.mb = 100
14/05/11 17:36:22 INFO mapred.MapTask: data buffer = 79691776/99614720
14/05/11 17:36:22 INFO mapred.MapTask: record buffer = 262144/327680
14/05/11 17:36:32 INFO compress.CodecPool: Got brand-new decompressor
14/05/11 17:36:32 INFO mapred.LocalJobRunner: Map task executor complete.
14/05/11 17:36:32 WARN mapred.LocalJobRunner: job_local2039601594_0001
java.lang.Exception: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.(MapTask.java:676)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Wrong number of partitions in keyset
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:90)
	... 11 more
14/05/11 17:36:33 INFO mapred.JobClient: Job complete: job_local2039601594_0001
14/05/11 17:36:33 INFO mapred.JobClient: Counters: 0



Stepping through in the debugger, I eventually traced the exception to the code below:
  // In TotalOrderPartitioner.class
  public void setConf(Configuration conf) {
    try {
      this.conf = conf;
      String parts = getPartitionFile(conf);
      final Path partFile = new Path(parts);
      final FileSystem fs = (DEFAULT_PATH.equals(parts))
        ? FileSystem.getLocal(conf)     // assume in DistributedCache
        : partFile.getFileSystem(conf);

      Job job = new Job(conf);
      Class<K> keyClass = (Class<K>)job.getMapOutputKeyClass();
      K[] splitPoints = readPartitions(fs, partFile, keyClass, conf);
      // Observed in the debugger:
      //   splitPoints.length      : 3
      //   job.getNumReduceTasks() : 1

      // The partition file must hold exactly numReduceTasks - 1 split
      // points; 3 != 1 - 1, hence the exception below.
      if (splitPoints.length != job.getNumReduceTasks() - 1) {
        throw new IOException("Wrong number of partitions in keyset");
      }
      ...
    } catch (IOException e) {
      // This wrapper is the IllegalArgumentException at the top of the stack trace.
      throw new IllegalArgumentException("Can't read partitions file", e);
    }
  }


The check requires the partition file to hold exactly numReduceTasks - 1 split points. In my main function I had explicitly set the number of reducers:
job.setNumReduceTasks(3);
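
For context, a total-order-sort driver generally looks like the sketch below (the class name, input format, sampler settings, and the partition/output paths are my own illustrative choices; only /input/unsorted_data appears in the log above). The key detail is that InputSampler.writePartitionFile() emits getNumReduceTasks() - 1 split keys, so the reducer count must be set before the partition file is written:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class TotalSortDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "total sort");
    job.setJarByClass(TotalSortDriver.class);
    // The identity Mapper/Reducer (the defaults) are enough for a pure sort.
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setNumReduceTasks(3); // 3 reducers -> the partition file needs 2 split keys

    FileInputFormat.addInputPath(job, new Path("/input/unsorted_data"));
    FileOutputFormat.setOutputPath(job, new Path("/output/sorted"));

    // Sample the input and write the partition file. This must happen after
    // setNumReduceTasks(), because writePartitionFile() writes
    // getNumReduceTasks() - 1 split keys.
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),
        new Path("/tmp/partitions"));
    InputSampler.writePartitionFile(job,
        new InputSampler.RandomSampler<Text, Text>(0.1, 1000, 10));
    job.setPartitionerClass(TotalOrderPartitioner.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}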


So job.getNumReduceTasks() should have returned 3. Clearly, the MapReduce framework itself was overwriting this value somewhere during execution!

Digging further:

// In JobClient.class (the method is JobClient.init(), not JobConf)
public void init(JobConf conf) throws IOException {
  // The JobTracker address falls back to "local" when no cluster
  // configuration is on the classpath.
  String tracker = conf.get("mapred.job.tracker", "local");
  tasklogtimeout = conf.getInt(
    TASKLOG_PULL_TIMEOUT_KEY, DEFAULT_TASKLOG_TIMEOUT);
  this.ugi = UserGroupInformation.getCurrentUser();
  // Local mode: the client overrides the map task count and submits
  // through LocalJobRunner instead of a real JobTracker.
  if ("local".equals(tracker)) {
    conf.setNumMapTasks(1);
    this.jobSubmitClient = new LocalJobRunner(conf);
  } else {
    this.rpcJobSubmitClient =
        createRPCProxy(JobTracker.getAddress(conf), conf);
    this.jobSubmitClient = createProxy(this.rpcJobSubmitClient, conf);
  }
}
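
Note that JobClient.init() only pins the map task count, which is what the title of this post refers to. The reduce count is clamped separately: the LocalJobRunner of this Hadoop generation supports at most one reduce task. Paraphrasing LocalJobRunner$Job.run() from the Hadoop 1.x source (a sketch of the behavior from memory, not an exact quote):

// Inside LocalJobRunner$Job.run(): local mode supports 0 or 1 reducers,
// so any configured value above 1 is silently reset.
int numReduceTasks = job.getNumReduceTasks();
if (numReduceTasks > 1 || numReduceTasks < 0) {
  numReduceTasks = 1;
  job.setNumReduceTasks(1);
}

That is why the debugger showed job.getNumReduceTasks() == 1 inside TotalOrderPartitioner.setConf() even though the driver had asked for 3.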

As for mapred.job.tracker: it names the JobTracker address the client submits jobs to (localhost:9001 in a typical pseudo-distributed setup). My deployment is pseudo-distributed, but the Eclipse run never picked that setting up, so the value fell back to the default "local" and the whole job went through LocalJobRunner.
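
A quick way to see which branch JobClient.init() will take is to print the value the client actually resolves (this trivial check is mine, not from the original investigation):

import org.apache.hadoop.conf.Configuration;

public class CheckTracker {
  public static void main(String[] args) {
    // Prints "local" when the cluster's mapred-site.xml is not on the
    // classpath -- exactly the case where JobClient uses LocalJobRunner.
    Configuration conf = new Configuration();
    System.out.println(conf.get("mapred.job.tracker", "local"));
  }
}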


Cause:

    The program was run inside Eclipse via the hadoop-eclipse plugin, which only runs jobs in local mode; running in any other mode requires additional configuration.

Solution:

    Package the program into a jar and submit it to the cluster (e.g. with hadoop jar) instead of running it inside Eclipse. (A possible in-Eclipse workaround is sketched below.)
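
If you do need to launch from Eclipse, a commonly used workaround is to point the client at the cluster explicitly and name the job jar yourself. The sketch below assumes a default pseudo-distributed deployment (ports 9000/9001) and a pre-built jar path; setting mapred.jar also addresses the "No job jar file set" warning in the log above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ClusterSubmitDriver {
  public static void main(String[] args) throws Exception {
    // Sketch only: hostname, ports, and the jar path are assumptions.
    Configuration conf = new Configuration();
    conf.set("mapred.job.tracker", "localhost:9001");     // submit to the real JobTracker
    conf.set("fs.default.name", "hdfs://localhost:9000"); // matches the HDFS URI in the log
    conf.set("mapred.jar", "/path/to/totalsort.jar");     // the property JobConf#setJar(String) sets
    Job job = new Job(conf, "total sort");
    // ...rest of the driver setup as before; setNumReduceTasks(3) now takes effect.
  }
}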
