Lesson 51: Hadoop MapReduce Multi-Dimensional Sorting, Explained and in Practice

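Following the failed run described further down, we modified the data file and tested again: the field separator was changed from a tab to a comma ",", and the program was changed to match, using String[] splited = data.split(",");. Running the job again, the test passed.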
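
Changing the separator fixes this particular file, but the mapper still assumes that every line splits into exactly two fields. A minimal sketch of a more tolerant parsing step is shown below. The names RecordParser, Record and parse are hypothetical and not part of the original job, and accepting a comma, a tab, or a run of spaces as the separator is an assumption about what the input may contain.

```java
import java.util.Optional;

public class RecordParser {
    /** Hypothetical holder for one parsed line; not part of the original job. */
    public static final class Record {
        public final String name;
        public final int score;

        Record(String name, int score) {
            this.name = name;
            this.score = score;
        }
    }

    /**
     * Splits a line on a comma, a tab, or any run of whitespace.
     * Returns Optional.empty() instead of throwing when the line is malformed.
     */
    public static Optional<Record> parse(String line) {
        if (line == null) {
            return Optional.empty();
        }
        String[] parts = line.trim().split("\\s*,\\s*|\\s+");
        if (parts.length != 2 || parts[0].isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(new Record(parts[0], Integer.parseInt(parts[1])));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String[] samples = {"Spark,100", "Hadoop\t60", "Kafka  95", "broken"};
        for (String line : samples) {
            System.out.println(parse(line)
                    .map(r -> r.name + " -> " + r.score)
                    .orElse("skipped: " + line));
        }
    }
}
```

In a mapper, a line for which parse returns empty could be skipped (and counted) rather than failing the whole task; whether skipping is acceptable depends on the data.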

Data file

[root@master IMFdatatest]#hadoop dfs -cat /library/dataForMutipleSorting.txt
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

16/02/27 04:01:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark,100
Hadoop,60
Kafka,95
Spark,99
Hadoop,65
Kafka,98
Spark,99
Hadoop,63
Kafka,97

[root@master IMFdatatest]#hadoop dfs -cat /library/outputdataForMutipleSorting8/part-r-00000
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

16/02/27 04:04:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Hadoop  60,63,65
Kafka   95,97,98
Spark   99,99,100
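
The result above is what sorting by name, then by score, and finally grouping by name should produce. The following sketch reproduces that logic in plain Java without Hadoop, so it can be checked locally. The class name SecondarySortSimulation is hypothetical, and the in-memory sort and map stand in for the job's sort comparator and grouping comparator; it is an illustration of the idea, not the job itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class SecondarySortSimulation {
    /** Sorts "name,score" lines by name, then by score, and joins the scores per name. */
    public static Map<String, String> run(List<String> lines) {
        List<String[]> records = new ArrayList<>();
        for (String line : lines) {
            records.add(line.split(","));
        }
        // Sort phase: first field ascending, then second field compared as a number.
        records.sort(Comparator
                .comparing((String[] r) -> r[0])
                .thenComparingInt(r -> Integer.parseInt(r[1])));
        // Grouping phase: records with the same first field end up on one output line.
        Map<String, StringJoiner> groups = new LinkedHashMap<>();
        for (String[] r : records) {
            groups.computeIfAbsent(r[0], k -> new StringJoiner(",")).add(r[1]);
        }
        Map<String, String> result = new LinkedHashMap<>();
        groups.forEach((name, scores) -> result.put(name, scores.toString()));
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Spark,100", "Hadoop,60", "Kafka,95",
                "Spark,99", "Hadoop,65", "Kafka,98",
                "Spark,99", "Hadoop,63", "Kafka,97");
        // Prints: Hadoop 60,63,65 / Kafka 95,97,98 / Spark 99,99,100
        run(input).forEach((name, scores) -> System.out.println(name + "\t" + scores));
    }
}
```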

 

Run result

[Image 1: run result]

 

 

 

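Investigating the failure

Problem located: array index out of bounds

The data is being parsed incorrectly when it is read in. To confirm this, we first substituted a random number for the parsed value and commented out the code that reads the second field. With that change the program ran, which shows that the sorting algorithm itself is not the problem.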

    int splited1 =  (int)(Math.random() * 1000);
   // intMultiplePair.setSecond(Integer.valueOf(splited[1]));   
   // intWritable.set(Integer.valueOf(splited[1]));
    intMultiplePair.setSecond(splited1); // rule out a data preprocessing problem
    intWritable.set(splited1);

Output

[root@master IMFdatatest]#hadoop dfs -cat /library/outputdataForMutipleSorting6/part-r-00000
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

16/02/26 19:41:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Hadoop  223,377
Hadoop  63      481
Kafka   147,188
Kafka   97      991
Spark   542,613
Spark   99      244
[root@master IMFdatatest]#

 

 

 

1. Data file

[root@master IMFdatatest]#hadoop dfs -cat /library/dataForMutipleSorting.txt
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

16/02/26 07:56:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark   100
Hadoop  60
Kafka   95
Spark   99
Hadoop  65
Kafka   98
Spark   99
Hadoop  63
Kafka   97

 

2. The run fails

 INFO [Thread-3] (org.apache.hadoop.mapred.LocalJobRunner:456) 2016-02-26 23:00:04,681 ---- map task executor complete.
 WARN [Thread-3] (org.apache.hadoop.mapred.LocalJobRunner:560) 2016-02-26 23:00:05,687 ---- job_local1144770356_0001
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 1
 at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
 at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
 at com.dtspark.hadoop.hellomapreduce.MutipleSorting$DataMapper.map(MutipleSorting.java:40)
 at com.dtspark.hadoop.hellomapreduce.MutipleSorting$DataMapper.map(MutipleSorting.java:1)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
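
An ArrayIndexOutOfBoundsException with index 1 inside DataMapper.map is what reading splited[1] produces when split finds no separator in the line and returns a one-element array. The small sketch below reproduces that behavior in isolation. The class name SplitFailureDemo is hypothetical, and the space-separated line is an assumed example of a malformed record, not a line taken from the actual file.

```java
public class SplitFailureDemo {
    /** Returns how many fields a line yields when split on a tab character. */
    public static int fieldCount(String line) {
        return line.split("\t").length;
    }

    public static void main(String[] args) {
        String tabSeparated = "Spark\t100";
        String spaceSeparated = "Spark 99"; // assumed malformed line, for illustration

        System.out.println(fieldCount(tabSeparated));   // 2
        System.out.println(fieldCount(spaceSeparated)); // 1

        try {
            // Same access pattern as splited[1] in the mapper.
            String second = spaceSeparated.split("\t")[1];
            System.out.println(second);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getClass().getSimpleName());
        }
    }
}
```

Checking the length of the array before indexing it, or printing the raw line when the length is not 2, would point directly at the offending record.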

Log output

 INFO [main] (org.apache.hadoop.conf.Configuration.deprecation:1049) 2016-02-26 22:59:53,289 ---- session.id is deprecated. Instead, use dfs.metrics.session-id
 INFO [main] (org.apache.hadoop.metrics.jvm.JvmMetrics:76) 2016-02-26 22:59:53,296 ---- Initializing JVM Metrics with processName=JobTracker, sessionId=
 WARN [main] (org.apache.hadoop.mapreduce.JobSubmitter:261) 2016-02-26 22:59:54,773 ---- No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
 INFO [main] (org.apache.hadoop.mapreduce.lib.input.FileInputFormat:281) 2016-02-26 22:59:54,848 ---- Total input paths to process : 1
 INFO [main] (org.apache.hadoop.mapreduce.JobSubmitter:494) 2016-02-26 22:59:55,276 ---- number of splits:1
 INFO [main] (org.apache.hadoop.mapreduce.JobSubmitter:583) 2016-02-26 22:59:55,743 ---- Submitting tokens for job: job_local1144770356_0001
 INFO [main] (org.apache.hadoop.mapreduce.Job:1300) 2016-02-26 22:59:56,147 ---- The url to track the job:http://localhost:8080/
 INFO [main] (org.apache.hadoop.mapreduce.Job:1345) 2016-02-26 22:59:56,147 ---- Running job: job_local1144770356_0001
 INFO [Thread-3] (org.apache.hadoop.mapred.LocalJobRunner:471) 2016-02-26 22:59:56,150 ---- OutputCommitter set in config null
 INFO [Thread-3] (org.apache.hadoop.mapred.LocalJobRunner:489) 2016-02-26 22:59:56,162 ---- OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
 INFO [Thread-3] (org.apache.hadoop.mapred.LocalJobRunner:448) 2016-02-26 22:59:56,362 ---- Waiting for map tasks
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.LocalJobRunner:224) 2016-02-26 22:59:56,363 ---- Starting task: attempt_local1144770356_0001_m_000000_0
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.yarn.util.ProcfsBasedProcessTree:181) 2016-02-26 22:59:56,489 ---- ProcfsBasedProcessTree currently is supported only on Linux.
 INFO [main] (org.apache.hadoop.mapreduce.Job:1366) 2016-02-26 22:59:57,150 ---- Job job_local1144770356_0001 running in uber mode : false
 INFO [main] (org.apache.hadoop.mapreduce.Job:1373) 2016-02-26 22:59:57,232 ----  map 0% reduce 0%
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.Task:587) 2016-02-26 22:59:57,697 ----  Using ResourceCalculatorProcessTree :org.apache.hadoop.yarn.util.WindowsBasedProcessTree@1fa97f4
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:753) 2016-02-26 22:59:57,731 ---- Processing split: hdfs://192.168.2.100:9000/library/dataForMutipleSorting.txt:0+90
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:1202) 2016-02-26 22:59:57,979 ---- (EQUATOR) 0 kvi 26214396(104857584)
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:995) 2016-02-26 22:59:57,979 ---- mapreduce.task.io.sort.mb: 100
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:996) 2016-02-26 22:59:57,980 ---- soft limit at 83886080
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:997) 2016-02-26 22:59:57,980 ---- bufstart = 0; bufvoid = 104857600
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:998) 2016-02-26 22:59:57,980 ---- kvstart = 26214396; length = 6553600
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:402) 2016-02-26 22:59:58,010 ---- Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
Map Methond Invoked!!!
Spark
100
100
Map Methond Invoked!!!
Hadoop
60
60
Map Methond Invoked!!!
Kafka
95
95
Map Methond Invoked!!!
Spark
99
99
Map Methond Invoked!!!
Hadoop
65
65
Map Methond Invoked!!!
Kafka
98
98
Map Methond Invoked!!!
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:1457) 2016-02-26 23:00:01,125 ---- Starting flush of map output
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:1475) 2016-02-26 23:00:01,125 ---- Spilling map output
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:1476) 2016-02-26 23:00:01,125 ---- bufstart = 0; bufend = 92; bufvoid = 104857600
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:1478) 2016-02-26 23:00:01,125 ---- kvstart = 26214396(104857584); kvend = 26214376(104857504); length = 21/6553600
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:1660) 2016-02-26 23:00:03,684 ---- Finished spill 0
 INFO [Thread-3] (org.apache.hadoop.mapred.LocalJobRunner:456) 2016-02-26 23:00:04,681 ---- map task executor complete.
 WARN [Thread-3] (org.apache.hadoop.mapred.LocalJobRunner:560) 2016-02-26 23:00:05,687 ---- job_local1144770356_0001
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 1
 at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
 at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
 at com.dtspark.hadoop.hellomapreduce.MutipleSorting$DataMapper.map(MutipleSorting.java:40)
 at com.dtspark.hadoop.hellomapreduce.MutipleSorting$DataMapper.map(MutipleSorting.java:1)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 INFO [main] (org.apache.hadoop.mapreduce.Job:1386) 2016-02-26 23:00:06,265 ---- Job job_local1144770356_0001 failed with state FAILED due to: NA
 INFO [communication thread] (org.apache.hadoop.mapred.LocalJobRunner:591) 2016-02-26 23:00:06,403 ---- map > sort
 INFO [main] (org.apache.hadoop.mapreduce.Job:1391) 2016-02-26 23:00:06,647 ---- Counters: 25
 File System Counters
  FILE: Number of bytes read=175
  FILE: Number of bytes written=254813
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=90
  HDFS: Number of bytes written=0
  HDFS: Number of read operations=4
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=2
 Map-Reduce Framework
  Map input records=7
  Map output records=6
  Map output bytes=92
  Map output materialized bytes=110
  Input split bytes=124
  Combine input records=0
  Spilled Records=6
  Failed Shuffles=0
  Merged Map outputs=0
  GC time elapsed (ms)=25
  CPU time spent (ms)=0
  Physical memory (bytes) snapshot=0
  Virtual memory (bytes) snapshot=0
  Total committed heap usage (bytes)=234754048
 File Input Format Counters
  Bytes Read=90

 

3. Source code

package com.dtspark.hadoop.hellomapreduce;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class MutipleSorting {
 
  public static class DataMapper
     extends Mapper<LongWritable, Text, IntMultiplePair, IntWritable>{
  private  IntMultiplePair intMultiplePair = new IntMultiplePair();
  private IntWritable intWritable = new IntWritable(0);


 public void map(LongWritable key, Text value, Context context
                   ) throws IOException, InterruptedException {
 
    System.out.println("Map Methond Invoked!!!");
   
    String data = value.toString();
    String[] splited = data.split("\t");
   
    intMultiplePair.setFirst(splited[0]);
    intMultiplePair.setSecond(Integer.valueOf(splited[1]));
   
    intWritable.set(Integer.valueOf(splited[1]));
   
    System.out.println(intMultiplePair.getFirst());
    System.out.println(intMultiplePair.getSecond());
    System.out.println(intWritable);
   
    context.write(intMultiplePair, intWritable);
   
  
    }


 
 
     
            
}

 


public static class DataReducer
     extends Reducer<IntMultiplePair,IntWritable,Text, Text> {
 
 public void reduce(IntMultiplePair key , Iterable<IntWritable> values,
                      Context context
                      ) throws IOException, InterruptedException {
    System.out.println("Reduce Methond Invoked!!!" );
   
   
    StringBuffer buffered = new StringBuffer();
   
   Iterator<IntWritable> iter = values.iterator();
   while(iter.hasNext()){
   buffered.append(iter.next().get() + ",");
   }
   
   int length = buffered.toString().length();
  
    String result = buffered.toString().substring(0, length -1);
   
    context.write(new Text(key.getFirst()), new Text(result));
 }
  
}
 
 public static void main(String[] args) throws Exception {
  
  
  
   Configuration conf = new Configuration();
   String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
   if (otherArgs.length < 2) {
     System.err.println("Usage: MutipleSorting <in> [<in>...] <out>");
     System.exit(2);
   }
 
  
  
   Job job = Job.getInstance(conf, "MutipleSorting");
   job.setJarByClass(MutipleSorting.class);
 
   job.setMapperClass(DataMapper.class);
   job.setReducerClass(DataReducer.class);
  
  
   job.setOutputKeyClass(Text.class);
   job.setOutputValueClass(Text.class);
   job.setMapOutputKeyClass(IntMultiplePair.class);
   job.setMapOutputValueClass(IntWritable.class);
  
  
   job.setPartitionerClass(MyMultipleSortingPartitioner.class);
   job.setSortComparatorClass(IntMultipleSortingComparator.class);
   job.setGroupingComparatorClass(GroupingMultipleComparator.class);
  
   for (int i = 0; i < otherArgs.length - 1; ++i) {
     FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
   }
   FileOutputFormat.setOutputPath(job,
     new Path(otherArgs[otherArgs.length - 1]));
   System.exit(job.waitForCompletion(true) ? 0 : 1);
 }

}


class IntMultiplePair implements WritableComparable<IntMultiplePair>{
 private String first;
 private int second;
 
 
 

 


 public String getFirst() {
  return first;
 }

 

 public void setFirst(String first) {
  this.first = first;
 }

 

 public int getSecond() {
  return second;
 }

 

 public void setSecond(int second) {
  this.second = second;
 }

 

 public IntMultiplePair(){}
 
 

 public IntMultiplePair(String first, int second) {
  
  this.first = first;
  this.second = second;
 }

 

 @Override
 public void readFields(DataInput input) throws IOException {
  this.first = input.readUTF();
  this.second = input.readInt();
  
 }

 @Override
 public void write(DataOutput output) throws IOException {
  output.writeUTF(this.first);
  output.writeInt(this.second);
  
 }

 @Override
 public int compareTo(IntMultiplePair o) {
  return 0;
 }
 
}

class IntMultipleSortingComparator extends WritableComparator{
 public IntMultipleSortingComparator(){
  super(IntMultiplePair.class, true);
 }

 @Override
 public int compare(WritableComparable a, WritableComparable b) {
  IntMultiplePair x = (IntMultiplePair)a;
  IntMultiplePair y = (IntMultiplePair)b;
  
  if(!x.getFirst().equals(y.getFirst())){
   return x.getFirst().compareTo(y.getFirst());
   
  } else {
   return x.getSecond() - y.getSecond();
  }
  
  
 }
 
 
}

class GroupingMultipleComparator extends WritableComparator{
 public GroupingMultipleComparator(){
  super(IntMultiplePair.class, true);
 }

 @Override
 public int compare(WritableComparable a, WritableComparable b) {
  IntMultiplePair x = (IntMultiplePair)a;
  IntMultiplePair y = (IntMultiplePair)b;
  
  return x.getFirst().compareTo(y.getFirst());
  
  
  
 }
 
 
}

class MyMultipleSortingPartitioner extends Partitioner<IntMultiplePair, IntWritable>{

 @Override
 public int getPartition(IntMultiplePair arg0, IntWritable arg1, int arg2) {
  return (arg0.getFirst().hashCode() & Integer.MAX_VALUE)%arg2;
 }
 
}
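
IntMultiplePair works as a map output key only because write and readFields handle the fields in the same order: a UTF string followed by an int. The sketch below exercises that layout with plain java.io streams, without Hadoop. The class name PairRoundTrip and its methods are hypothetical stand-ins for the key class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class PairRoundTrip {
    /** Serializes (first, second) in the same order IntMultiplePair.write uses. */
    public static byte[] write(String first, int second) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(first);  // 2-byte length prefix + encoded characters
            out.writeInt(second); // 4 bytes
        }
        return bytes.toByteArray();
    }

    /** Reads the fields back in the same order and returns them as "first:second". */
    public static String read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String first = in.readUTF();
            int second = in.readInt();
            return first + ":" + second;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write("Spark", 100);
        System.out.println(data.length + " bytes"); // 11 bytes: 2 + 5 + 4
        System.out.println(read(data));             // Spark:100
    }
}
```

Under this layout a Spark or Kafka key takes 11 bytes and a Hadoop key 12. Adding the 4-byte IntWritable value, the six records mapped before the failure come to 15 + 16 + 15 + 15 + 16 + 15 = 92 bytes, which is consistent with the Map output bytes=92 counter in the log above.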

Complete code

package com.dtspark.hadoop.hellomapreduce;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class MutipleSorting {
 
  public static class DataMapper
     extends Mapper<LongWritable, Text, IntMultiplePair, IntWritable>{
  private  IntMultiplePair intMultiplePair = new IntMultiplePair();
  private IntWritable intWritable = new IntWritable(0);


 public void map(LongWritable key, Text value, Context context
                   ) throws IOException, InterruptedException {
 
    System.out.println("Map Methond Invoked!!!");
   
    String data = value.toString();
   // String[] splited = data.split("\t");
    String[] splited = data.split(",");
   
    intMultiplePair.setFirst(splited[0]);
   
 
   intMultiplePair.setSecond(Integer.valueOf(splited[1]));   
     intWritable.set(Integer.valueOf(splited[1]));

    context.write(intMultiplePair, intWritable);
   
  
    }


 
 
     
            
}

 


public static class DataReducer
     extends Reducer<IntMultiplePair,IntWritable,Text, Text> {
 
 public void reduce(IntMultiplePair key , Iterable<IntWritable> values,
                      Context context
                      ) throws IOException, InterruptedException {
    System.out.println("Reduce Methond Invoked!!!" );
   
   
    StringBuffer buffered = new StringBuffer();
   
   Iterator<IntWritable> iter = values.iterator();
   while(iter.hasNext()){
   buffered.append(iter.next().get() + ",");
   }
   
   int length = buffered.toString().length();
  
    String result = buffered.toString().substring(0, length -1);
   
    context.write(new Text(key.getFirst()), new Text(result));
 }
  
}
 
 public static void main(String[] args) throws Exception {
  
  
  
   Configuration conf = new Configuration();
   String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
   if (otherArgs.length < 2) {
     System.err.println("Usage: MutipleSorting <in> [<in>...] <out>");
     System.exit(2);
   }
 
  
  
   Job job = Job.getInstance(conf, "MutipleSorting");
   job.setJarByClass(MutipleSorting.class);
 
   job.setMapperClass(DataMapper.class);
   job.setReducerClass(DataReducer.class);
  
  
   job.setOutputKeyClass(Text.class);
   job.setOutputValueClass(Text.class);
   job.setMapOutputKeyClass(IntMultiplePair.class);
   job.setMapOutputValueClass(IntWritable.class);
  
  
   job.setPartitionerClass(MyMultipleSortingPartitioner.class);
   job.setSortComparatorClass(IntMultipleSortingComparator.class);
   job.setGroupingComparatorClass(GroupingMultipleComparator.class);
  
   for (int i = 0; i < otherArgs.length - 1; ++i) {
     FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
   }
   FileOutputFormat.setOutputPath(job,
     new Path(otherArgs[otherArgs.length - 1]));
   System.exit(job.waitForCompletion(true) ? 0 : 1);
 }

}


class IntMultiplePair implements WritableComparable<IntMultiplePair>{
 private String first;
 private int second;
 
 
 

 


 public String getFirst() {
  return first;
 }

 

 public void setFirst(String first) {
  this.first = first;
 }

 

 public int getSecond() {
  return second;
 }

 

 public void setSecond(int second) {
  this.second = second;
 }

 

 public IntMultiplePair(){}
 
 

 public IntMultiplePair(String first, int second) {
  
  this.first = first;
  this.second = second;
 }

 

 @Override
 public void readFields(DataInput input) throws IOException {
  this.first = input.readUTF();
  this.second = input.readInt();
  
 }

 @Override
 public void write(DataOutput output) throws IOException {
  output.writeUTF(this.first);
  output.writeInt(this.second);
  
 }

 @Override
 public int compareTo(IntMultiplePair o) {
  return 0;
 }
 
}

class IntMultipleSortingComparator extends WritableComparator{
 public IntMultipleSortingComparator(){
  super(IntMultiplePair.class, true);
 }

 @Override
 public int compare(WritableComparable a, WritableComparable b) {
  IntMultiplePair x = (IntMultiplePair)a;
  IntMultiplePair y = (IntMultiplePair)b;
 
  
  if(!x.getFirst().equals(y.getFirst())){
  System.out.println("Sorting: comparing first:  "  +x.getFirst() +"    "+ y.getFirst()  +"    "+ x.getFirst().compareTo(y.getFirst()));
   return x.getFirst().compareTo(y.getFirst());
   
  } else {
   System.out.println("Sorting: comparing second:  "  +x.getSecond()  +"    "+ y.getSecond() +"    " +( x.getSecond() - y.getSecond()));
   
   return x.getSecond() - y.getSecond();
  }
  
  
 }
 
 
}

class GroupingMultipleComparator extends WritableComparator{
 public GroupingMultipleComparator(){
  super(IntMultiplePair.class, true);
 }

 @Override
 public int compare(WritableComparable a, WritableComparable b) {
  IntMultiplePair x = (IntMultiplePair)a;
  IntMultiplePair y = (IntMultiplePair)b;
  System.out.println("Grouping:  "  +x.getFirst()  +"    "+ y.getFirst()  +"    "+ x.getFirst().compareTo(y.getFirst()));
  return x.getFirst().compareTo(y.getFirst());
  
  
  
 }
 
 
}

class MyMultipleSortingPartitioner extends Partitioner<IntMultiplePair, IntWritable>{

 @Override
 public int getPartition(IntMultiplePair arg0, IntWritable arg1, int arg2) {
  System.out.println("getPartition inputs: hashCode="  +arg0.getFirst().hashCode() +"    mask=" + Integer.MAX_VALUE + "    numPartitions=" + arg2);
  System.out.println("getPartition result: "  + (arg0.getFirst().hashCode() & Integer.MAX_VALUE)%arg2);
  return (arg0.getFirst().hashCode() & Integer.MAX_VALUE)%arg2;
 }
 
}
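
The partitioner masks the hash code with Integer.MAX_VALUE before taking the remainder, which clears the sign bit so the result always falls in the range 0 to numPartitions - 1. The sketch below isolates that arithmetic. The class name PartitionDemo and the reducer count of 3 are assumptions made for illustration.

```java
public class PartitionDemo {
    /** Same arithmetic as MyMultipleSortingPartitioner.getPartition, applied to a raw hash. */
    public static int partitionOfHash(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    /** Partition for a key's first field. */
    public static int partition(String first, int numPartitions) {
        return partitionOfHash(first.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int reducers = 3; // assumed reducer count, for illustration only
        for (String name : new String[] {"Spark", "Hadoop", "Kafka"}) {
            System.out.println(name + " -> partition " + partition(name, reducers));
        }
        // Negative hashes stay in range after masking.
        System.out.println(partitionOfHash(-1, reducers));                // 1
        System.out.println(partitionOfHash(Integer.MIN_VALUE, reducers)); // 0
        // Math.abs is not a safe substitute: it returns a negative value here.
        System.out.println(Math.abs(Integer.MIN_VALUE) % reducers < 0);   // true
    }
}
```

Because only the first field is hashed, all records with the same name go to the same reducer, which the grouping step relies on. The log that follows contains no getPartition output, which is consistent with the job running with a single reduce task.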

 

Log output

 INFO [main] (org.apache.hadoop.conf.Configuration.deprecation:1049) 2016-02-27 17:32:57,577 ---- session.id is deprecated. Instead, use dfs.metrics.session-id
 INFO [main] (org.apache.hadoop.metrics.jvm.JvmMetrics:76) 2016-02-27 17:32:57,582 ---- Initializing JVM Metrics with processName=JobTracker, sessionId=
 WARN [main] (org.apache.hadoop.mapreduce.JobSubmitter:261) 2016-02-27 17:32:57,978 ---- No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
 INFO [main] (org.apache.hadoop.mapreduce.lib.input.FileInputFormat:281) 2016-02-27 17:32:58,014 ---- Total input paths to process : 1
 INFO [main] (org.apache.hadoop.mapreduce.JobSubmitter:494) 2016-02-27 17:32:58,092 ---- number of splits:1
 INFO [main] (org.apache.hadoop.mapreduce.JobSubmitter:583) 2016-02-27 17:32:58,167 ---- Submitting tokens for job: job_local1851923379_0001
 INFO [main] (org.apache.hadoop.mapreduce.Job:1300) 2016-02-27 17:32:58,358 ---- The url to track the job: http://localhost:8080/
 INFO [main] (org.apache.hadoop.mapreduce.Job:1345) 2016-02-27 17:32:58,359 ---- Running job: job_local1851923379_0001
 INFO [Thread-3] (org.apache.hadoop.mapred.LocalJobRunner:471) 2016-02-27 17:32:58,360 ---- OutputCommitter set in config null
 INFO [Thread-3] (org.apache.hadoop.mapred.LocalJobRunner:489) 2016-02-27 17:32:58,367 ---- OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
 INFO [Thread-3] (org.apache.hadoop.mapred.LocalJobRunner:448) 2016-02-27 17:32:58,415 ---- Waiting for map tasks
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.LocalJobRunner:224) 2016-02-27 17:32:58,415 ---- Starting task: attempt_local1851923379_0001_m_000000_0
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.yarn.util.ProcfsBasedProcessTree:181) 2016-02-27 17:32:58,447 ---- ProcfsBasedProcessTree currently is supported only on Linux.
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.Task:587) 2016-02-27 17:32:58,986 ----  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@16c89e
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:753) 2016-02-27 17:32:58,991 ---- Processing split: hdfs://192.168.2.100:9000/library/dataForMutipleSorting.txt:0+85
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:1202) 2016-02-27 17:32:59,082 ---- (EQUATOR) 0 kvi 26214396(104857584)
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:995) 2016-02-27 17:32:59,083 ---- mapreduce.task.io.sort.mb: 100
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:996) 2016-02-27 17:32:59,083 ---- soft limit at 83886080
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:997) 2016-02-27 17:32:59,083 ---- bufstart = 0; bufvoid = 104857600
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:998) 2016-02-27 17:32:59,083 ---- kvstart = 26214396; length = 6553600
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:402) 2016-02-27 17:32:59,087 ---- Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
 INFO [main] (org.apache.hadoop.mapreduce.Job:1366) 2016-02-27 17:32:59,362 ---- Job job_local1851923379_0001 running in uber mode : false
 INFO [main] (org.apache.hadoop.mapreduce.Job:1373) 2016-02-27 17:32:59,363 ----  map 0% reduce 0%
Map Methond Invoked!!!
Map Methond Invoked!!!
Map Methond Invoked!!!
Map Methond Invoked!!!
Map Methond Invoked!!!
Map Methond Invoked!!!
Map Methond Invoked!!!
Map Methond Invoked!!!
Map Methond Invoked!!!
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.LocalJobRunner:591) 2016-02-27 17:33:00,086 ----
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:1457) 2016-02-27 17:33:00,089 ---- Starting flush of map output
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:1475) 2016-02-27 17:33:00,089 ---- Spilling map output
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:1476) 2016-02-27 17:33:00,089 ---- bufstart = 0; bufend = 138; bufvoid = 104857600
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:1478) 2016-02-27 17:33:00,089 ---- kvstart = 26214396(104857584); kvend = 26214364(104857456); length = 33/6553600
Sorting: comparing first:  Kafka    Hadoop    3
Sorting: comparing first:  Kafka    Spark    -8
Sorting: comparing first:  Spark    Kafka    8
Sorting: comparing second:  97    98    -1
Sorting: comparing first:  Spark    Hadoop    11
Sorting: comparing first:  Kafka    Hadoop    3
Sorting: comparing first:  Kafka    Hadoop    3
Sorting: comparing second:  63    65    -2
Sorting: comparing second:  99    99    0
Sorting: comparing first:  Spark    Kafka    8
Sorting: comparing first:  Spark    Kafka    8
Sorting: comparing second:  98    95    3
Sorting: comparing second:  97    95    2
Sorting: comparing first:  Hadoop    Kafka    -3
Sorting: comparing first:  Spark    Hadoop    11
Sorting: comparing first:  Spark    Hadoop    11
Sorting: comparing first:  Kafka    Hadoop    3
Sorting: comparing first:  Kafka    Hadoop    3
Sorting: comparing first:  Kafka    Hadoop    3
Sorting: comparing second:  65    60    5
Sorting: comparing second:  63    60    3
Sorting: comparing second:  99    100    -1
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.MapTask:1660) 2016-02-27 17:33:00,166 ---- Finished spill 0
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.Task:1001) 2016-02-27 17:33:00,178 ---- Task:attempt_local1851923379_0001_m_000000_0 is done. And is in the process of committing
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.LocalJobRunner:591) 2016-02-27 17:33:00,195 ---- map
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.Task:1121) 2016-02-27 17:33:00,196 ---- Task 'attempt_local1851923379_0001_m_000000_0' done.
 INFO [LocalJobRunner Map Task Executor #0] (org.apache.hadoop.mapred.LocalJobRunner:249) 2016-02-27 17:33:00,196 ---- Finishing task: attempt_local1851923379_0001_m_000000_0
 INFO [Thread-3] (org.apache.hadoop.mapred.LocalJobRunner:456) 2016-02-27 17:33:00,196 ---- map task executor complete.
 INFO [Thread-3] (org.apache.hadoop.mapred.LocalJobRunner:448) 2016-02-27 17:33:00,198 ---- Waiting for reduce tasks
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.LocalJobRunner:302) 2016-02-27 17:33:00,199 ---- Starting task: attempt_local1851923379_0001_r_000000_0
 INFO [pool-6-thread-1] (org.apache.hadoop.yarn.util.ProcfsBasedProcessTree:181) 2016-02-27 17:33:00,207 ---- ProcfsBasedProcessTree currently is supported only on Linux.
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.Task:587) 2016-02-27 17:33:00,326 ----  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@675767
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.ReduceTask:362) 2016-02-27 17:33:00,330 ---- Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@902038
 INFO [pool-6-thread-1] (org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:196) 2016-02-27 17:33:00,345 ---- MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
 INFO [EventFetcher for fetching Map Completion Events] (org.apache.hadoop.mapreduce.task.reduce.EventFetcher:61) 2016-02-27 17:33:00,349 ---- attempt_local1851923379_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
 INFO [main] (org.apache.hadoop.mapreduce.Job:1373) 2016-02-27 17:33:00,365 ----  map 100% reduce 0%
 INFO [localfetcher#1] (org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:141) 2016-02-27 17:33:00,392 ---- localfetcher#1 about to shuffle output of map attempt_local1851923379_0001_m_000000_0 decomp: 158 len: 162 to MEMORY
 INFO [localfetcher#1] (org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput:100) 2016-02-27 17:33:00,397 ---- Read 158 bytes from map-output for attempt_local1851923379_0001_m_000000_0
 INFO [localfetcher#1] (org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:314) 2016-02-27 17:33:00,400 ---- closeInMemoryFile -> map-output of size: 158, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->158
 INFO [EventFetcher for fetching Map Completion Events] (org.apache.hadoop.mapreduce.task.reduce.EventFetcher:76) 2016-02-27 17:33:00,402 ---- EventFetcher is interrupted.. Returning
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.LocalJobRunner:591) 2016-02-27 17:33:00,403 ---- 1 / 1 copied.
 INFO [pool-6-thread-1] (org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:674) 2016-02-27 17:33:00,403 ---- finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.Merger:597) 2016-02-27 17:33:00,422 ---- Merging 1 sorted segments
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.Merger:696) 2016-02-27 17:33:00,423 ---- Down to the last merge-pass, with 1 segments left of total size: 144 bytes
 INFO [pool-6-thread-1] (org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:751) 2016-02-27 17:33:00,426 ---- Merged 1 segments, 158 bytes to disk to satisfy reduce memory limit
 INFO [pool-6-thread-1] (org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:781) 2016-02-27 17:33:00,427 ---- Merging 1 files, 162 bytes from disk
 INFO [pool-6-thread-1] (org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:796) 2016-02-27 17:33:00,429 ---- Merging 0 segments, 0 bytes from memory into reduce
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.Merger:597) 2016-02-27 17:33:00,429 ---- Merging 1 sorted segments
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.Merger:696) 2016-02-27 17:33:00,431 ---- Down to the last merge-pass, with 1 segments left of total size: 144 bytes
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.LocalJobRunner:591) 2016-02-27 17:33:00,431 ---- 1 / 1 copied.
 INFO [pool-6-thread-1] (org.apache.hadoop.conf.Configuration.deprecation:1049) 2016-02-27 17:33:00,453 ---- mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
Grouping:  Hadoop    Hadoop    0
Reduce Methond Invoked!!!
Grouping:  Hadoop    Hadoop    0
Grouping:  Hadoop    Kafka    -3
Grouping:  Kafka    Kafka    0
Reduce Methond Invoked!!!
Grouping:  Kafka    Kafka    0
Grouping:  Kafka    Spark    -8
Grouping:  Spark    Spark    0
Reduce Methond Invoked!!!
Grouping:  Spark    Spark    0
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.Task:1001) 2016-02-27 17:33:00,560 ---- Task:attempt_local1851923379_0001_r_000000_0 is done. And is in the process of committing
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.LocalJobRunner:591) 2016-02-27 17:33:00,563 ---- 1 / 1 copied.
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.Task:1162) 2016-02-27 17:33:00,563 ---- Task attempt_local1851923379_0001_r_000000_0 is allowed to commit now
 INFO [pool-6-thread-1] (org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter:439) 2016-02-27 17:33:00,573 ---- Saved output of task 'attempt_local1851923379_0001_r_000000_0' to hdfs://192.168.2.100:9000/library/outputdataForMutipleSorting12/_temporary/0/task_local1851923379_0001_r_000000
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.LocalJobRunner:591) 2016-02-27 17:33:00,574 ---- reduce > reduce
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.Task:1121) 2016-02-27 17:33:00,574 ---- Task 'attempt_local1851923379_0001_r_000000_0' done.
 INFO [pool-6-thread-1] (org.apache.hadoop.mapred.LocalJobRunner:325) 2016-02-27 17:33:00,574 ---- Finishing task: attempt_local1851923379_0001_r_000000_0
 INFO [Thread-3] (org.apache.hadoop.mapred.LocalJobRunner:456) 2016-02-27 17:33:00,574 ---- reduce task executor complete.
 INFO [main] (org.apache.hadoop.mapreduce.Job:1373) 2016-02-27 17:33:01,365 ----  map 100% reduce 100%
 INFO [main] (org.apache.hadoop.mapreduce.Job:1384) 2016-02-27 17:33:01,367 ---- Job job_local1851923379_0001 completed successfully
 INFO [main] (org.apache.hadoop.mapreduce.Job:1391) 2016-02-27 17:33:01,412 ---- Counters: 38
 File System Counters
  FILE: Number of bytes read=706
  FILE: Number of bytes written=509896
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=170
  HDFS: Number of bytes written=47
  HDFS: Number of read operations=13
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=4
 Map-Reduce Framework
  Map input records=9
  Map output records=9
  Map output bytes=138
  Map output materialized bytes=162
  Input split bytes=124
  Combine input records=0
  Combine output records=0
  Reduce input groups=3
  Reduce shuffle bytes=162
  Reduce input records=9
  Reduce output records=3
  Spilled Records=18
  Shuffled Maps =1
  Failed Shuffles=0
  Merged Map outputs=1
  GC time elapsed (ms)=23
  CPU time spent (ms)=0
  Physical memory (bytes) snapshot=0
  Virtual memory (bytes) snapshot=0
  Total committed heap usage (bytes)=469508096
 Shuffle Errors
  BAD_ID=0
  CONNECTION=0
  IO_ERROR=0
  WRONG_LENGTH=0
  WRONG_MAP=0
  WRONG_REDUCE=0
 File Input Format Counters
  Bytes Read=85
 File Output Format Counters
  Bytes Written=47
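
The comparator trace in the log prints the raw difference between the two second fields, for example -1 for 97 and 98. Returning that difference works for scores in this range, but subtraction can overflow when the values are far apart. The sketch below contrasts it with Integer.compare, a possible drop-in for the return statement; the class name CompareDemo is hypothetical.

```java
public class CompareDemo {
    /** Subtraction-based comparison, as used in the listing; can overflow. */
    public static int bySubtraction(int a, int b) {
        return a - b;
    }

    /** Overflow-safe alternative with the same sign contract. */
    public static int byCompare(int a, int b) {
        return Integer.compare(a, b);
    }

    public static void main(String[] args) {
        // Values from the log: both approaches agree on the sign.
        System.out.println(bySubtraction(97, 98)); // -1
        System.out.println(byCompare(97, 98));     // -1

        // Extreme values: the subtraction wraps around and reports the wrong order.
        System.out.println(bySubtraction(Integer.MIN_VALUE, 1) > 0); // true (wrong sign)
        System.out.println(byCompare(Integer.MIN_VALUE, 1));         // -1
    }
}
```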

 
