[Hadoop] 7. MapReduce Framework Internals - Shuffle Mechanism - 7.1 Partition

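What is Shuffle

The data processing that takes place after the map method and before the reduce method is called Shuffle.

1. Partition

Default partitioning

Hadoop's default partitioner is the hash partitioner: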

@Public
@Stable
public class HashPartitioner<K2, V2> implements Partitioner<K2, V2> {
    public HashPartitioner() {
    }

    public void configure(JobConf job) {
    }

    public int getPartition(K2 key, V2 value, int numReduceTasks) {
        // Mask the key's hashCode with Integer.MAX_VALUE to clear the sign bit, then take the remainder modulo the number of reduce tasks
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

By default, the partition is the key's hashCode modulo the number of ReduceTasks, so the user cannot control which key ends up in which partition.
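To make the arithmetic concrete, here is a minimal sketch that applies the same expression outside Hadoop. The class name HashPartitionSketch and the sample keys are assumptions made for illustration, and plain Java objects stand in for Writable keys, so the partition numbers it prints are not necessarily the ones a real job would compute for Text keys.

```java
import java.util.Arrays;
import java.util.List;

class HashPartitionSketch {

    // Same arithmetic as HashPartitioner.getPartition: clear the sign bit,
    // then take the remainder modulo the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 5;
        // Hypothetical sample keys; the Integer -7 has the negative hashCode -7.
        List<Object> keys = Arrays.asList("apple", "base", "case", -7);
        for (Object key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
        // Without the mask, a negative hashCode gives a negative remainder in Java.
        System.out.println("-7 % 5 = " + (-7 % 5));
    }
}
```

The key -7 lands in partition 1, whereas the unmasked expression -7 % 5 evaluates to -2, which would not be a valid partition index. That is the reason for the & Integer.MAX_VALUE step.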

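Custom partitioning

To control which output file each record ends up in, we need a custom Partitioner.
Steps:

  1. Write a class that extends Partitioner and override getPartition().
  2. Register the custom Partitioner in the Job driver.
  3. Set the number of ReduceTasks to match the logic of the custom Partitioner.

Example

Given the data below, records whose key starts with a, b, c or d each go to their own file, and all remaining records go to one more file.
[Figure 1: the input file demo.txt, one tab-separated record per line]
Let's get started.
PatitionDriver class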

package com.xing.MapReduce.PatitionDemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class PatitionDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {


        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.7.1");
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(configuration);

        Job job = Job.getInstance(configuration);
        job.setJobName("Patitioner");
        job.setJarByClass(PatitionDriver.class);

        job.setMapperClass(PatitionMapper.class);
        job.setReducerClass(PatitionReduce.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Once a custom Partitioner is set, the number of ReduceTasks must be set to match it
        job.setPartitionerClass(PatitionCustom.class);
        job.setNumReduceTasks(5);
        
        Path inPath = new Path("E:\\hdfs\\data\\patition\\input\\demo.txt");
        Path outPath = new Path("E:\\hdfs\\data\\patition\\output");

        if (fs.exists(outPath)){
            fs.delete(outPath,true );
        }

        FileInputFormat.setInputPaths(job,inPath);
        FileOutputFormat.setOutputPath(job,outPath);

        boolean b = job.waitForCompletion(true);
        System.exit(b?0:-1);

    }
}

PatitionMapper
The simplest possible logic: split the line and write it out.

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class PatitionMapper extends Mapper<LongWritable,Text,Text,Text> {

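    // Reused output key, so that no new Text object is allocated per record
    private Text k = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each input line is tab-separated; the first field becomes the output key
        String[] split = value.toString().split("\t");
        k.set(split[0]);
        // Emit the whole line as the value so the reducer can write it back unchanged
        context.write(k,value);
    }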
}

PatitionReduce
Output only

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class PatitionReduce extends Reducer<Text,Text,Text,Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Simply write every value back out unchanged
        for (Text value : values) {
            context.write(key,value);
        }
    }
}

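PatitionCustom
This is the most important part: the custom Partitioner. Its return values should start at 0 and increase one by one. Every returned value must lie between 0 and the number of ReduceTasks minus 1, so a scheme that returns values such as 1, 4, 6, 7 would need at least 8 ReduceTasks and would leave most of the output files empty.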

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

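// The type parameters of Partitioner are the map output key and value types
public class PatitionCustom extends Partitioner<Text,Text> {

    /**
     * Overrides getPartition to route records by the first letter of the key.
     * @param key map output key
     * @param value map output value
     * @param numPartitions number of partitions, equal to the number of ReduceTasks
     * @return partition index in the range 0 to 4
     */
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // Debug output: prints the key, the value and the number of partitions for every record
        System.out.println("###########我是自定义分区的key:"+key+",value:"+value+",分区数目:"+numPartitions);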
        String keyString = key.toString();
        if (keyString.startsWith("a")){
            return 0;
        }else if(keyString.startsWith("b")){
            return 1;
        }else if(keyString.startsWith("c")){
            return 2;
        }else if(keyString.startsWith("d")){
            return 3;
        }else {
            return 4;
        }
    }
}
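The routing rule can be checked without starting a job. The sketch below is an assumption-laden stand-in, not part of the original project: the class PrefixPartitionSketch and its methods are hypothetical names, and it works on plain Strings instead of Text. The keys are the ten keys that appear in the debug lines of this example's console output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PrefixPartitionSketch {

    // Mirrors the if/else chain in PatitionCustom.getPartition, on plain Strings.
    static int partitionFor(String key) {
        if (key.startsWith("a")) return 0;
        if (key.startsWith("b")) return 1;
        if (key.startsWith("c")) return 2;
        if (key.startsWith("d")) return 3;
        return 4;
    }

    // Groups keys by the partition they would be sent to, in partition order.
    static Map<Integer, List<String>> assign(List<String> keys) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition.computeIfAbsent(partitionFor(key), p -> new ArrayList<>()).add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "base", "case", "airplane", "back",
                "duck", "disk", "fish", "girlfriend", "badboy");
        assign(keys).forEach((partition, members) ->
                System.out.println("partition " + partition + " -> " + members));
    }
}
```

With these keys, partition 0 receives apple and airplane, partition 1 receives base, back and badboy, partition 2 receives case, partition 3 receives duck and disk, and partition 4 receives fish and girlfriend. Each partition becomes one output file of the real job.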

Output
[Figures 2 and 3: screenshots of the job output]

"D:\Program Files\Java\jdk1.8.0_162\bin\java.exe" ... -classpath "..." com.xing.MapReduce.PatitionDemo.PatitionDriver
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[INFO ] 2019-05-09 15:57:38,223 session.id is deprecated. Instead, use dfs.metrics.session-id
[INFO ] 2019-05-09 15:57:38,239 Initializing JVM Metrics with processName=JobTracker, sessionId=
[WARN ] 2019-05-09 15:57:39,003 Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
[WARN ] 2019-05-09 15:57:39,054 No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
[INFO ] 2019-05-09 15:57:39,067 Total input paths to process : 1
[INFO ] 2019-05-09 15:57:39,142 number of splits:1
[INFO ] 2019-05-09 15:57:39,258 Submitting tokens for job: job_local1859959178_0001
[INFO ] 2019-05-09 15:57:39,534 The url to track the job: http://localhost:8080/
[INFO ] 2019-05-09 15:57:39,534 Running job: job_local1859959178_0001
[INFO ] 2019-05-09 15:57:39,534 OutputCommitter set in config null
[INFO ] 2019-05-09 15:57:39,550 File Output Committer Algorithm version is 1
[INFO ] 2019-05-09 15:57:39,550 OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
[INFO ] 2019-05-09 15:57:39,628 Waiting for map tasks
[INFO ] 2019-05-09 15:57:39,628 Starting task: attempt_local1859959178_0001_m_000000_0
[INFO ] 2019-05-09 15:57:39,659 File Output Committer Algorithm version is 1
[INFO ] 2019-05-09 15:57:39,674 ProcfsBasedProcessTree currently is supported only on Linux.
[INFO ] 2019-05-09 15:57:39,737  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@1f41aa45
[INFO ] 2019-05-09 15:57:39,737 Processing split: file:/E:/hdfs/data/patition/input/demo.txt:0+228
[INFO ] 2019-05-09 15:57:39,799 (EQUATOR) 0 kvi 26214396(104857584)
[INFO ] 2019-05-09 15:57:39,799 mapreduce.task.io.sort.mb: 100
[INFO ] 2019-05-09 15:57:39,799 soft limit at 83886080
[INFO ] 2019-05-09 15:57:39,799 bufstart = 0; bufvoid = 104857600
[INFO ] 2019-05-09 15:57:39,799 kvstart = 26214396; length = 6553600
[INFO ] 2019-05-09 15:57:39,799 Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
###########我是自定义分区的key:apple,value:apple	没错我是苹果,分区数目:5
###########我是自定义分区的key:base,value:base	哈?你是谁?,分区数目:5
###########我是自定义分区的key:case,value:case	我也不知道,分区数目:5
###########我是自定义分区的key:airplane,value:airplane	我是飞机,分区数目:5
###########我是自定义分区的key:back,value:back	快后退,分区数目:5
###########我是自定义分区的key:duck,value:duck	啊啊啊,分区数目:5
###########我是自定义分区的key:disk,value:disk	拉拉阿拉,分区数目:5
###########我是自定义分区的key:fish,value:fish	我要吃鱼,分区数目:5
###########我是自定义分区的key:girlfriend,value:girlfriend	没错我缺个女朋友,分区数目:5
###########我是自定义分区的key:badboy,value:badboy	你是个坏男孩,分区数目:5
[INFO ] 2019-05-09 15:57:39,815 
[INFO ] 2019-05-09 15:57:39,815 Starting flush of map output
[INFO ] 2019-05-09 15:57:39,815 Spilling map output
[INFO ] 2019-05-09 15:57:39,815 bufstart = 0; bufend = 283; bufvoid = 104857600
[INFO ] 2019-05-09 15:57:39,815 kvstart = 26214396(104857584); kvend = 26214360(104857440); length = 37/6553600
[INFO ] 2019-05-09 15:57:39,846 Finished spill 0
[INFO ] 2019-05-09 15:57:39,862 Task:attempt_local1859959178_0001_m_000000_0 is done. And is in the process of committing
[INFO ] 2019-05-09 15:57:39,877 map
[INFO ] 2019-05-09 15:57:39,877 Task 'attempt_local1859959178_0001_m_000000_0' done.
[INFO ] 2019-05-09 15:57:39,877 Finishing task: attempt_local1859959178_0001_m_000000_0
[INFO ] 2019-05-09 15:57:39,877 map task executor complete.
[INFO ] 2019-05-09 15:57:39,877 Waiting for reduce tasks
[INFO ] 2019-05-09 15:57:39,877 Starting task: attempt_local1859959178_0001_r_000000_0
[INFO ] 2019-05-09 15:57:39,908 File Output Committer Algorithm version is 1
[INFO ] 2019-05-09 15:57:39,924 ProcfsBasedProcessTree currently is supported only on Linux.
[INFO ] 2019-05-09 15:57:40,018  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@628cb174
[INFO ] 2019-05-09 15:57:40,033 Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@2308a302
[INFO ] 2019-05-09 15:57:40,049 MergerManager: memoryLimit=657666880, maxSingleShuffleLimit=164416720, mergeThreshold=434060160, ioSortFactor=10, memToMemMergeOutputsThreshold=10
[INFO ] 2019-05-09 15:57:40,111 attempt_local1859959178_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
[INFO ] 2019-05-09 15:57:40,158 localfetcher#1 about to shuffle output of map attempt_local1859959178_0001_m_000000_0 decomp: 68 len: 72 to MEMORY
[INFO ] 2019-05-09 15:57:40,174 Read 68 bytes from map-output for attempt_local1859959178_0001_m_000000_0
[INFO ] 2019-05-09 15:57:40,189 closeInMemoryFile -> map-output of size: 68, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->68
[INFO ] 2019-05-09 15:57:40,189 EventFetcher is interrupted.. Returning
[INFO ] 2019-05-09 15:57:40,189 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,189 finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
[INFO ] 2019-05-09 15:57:40,220 Merging 1 sorted segments
[INFO ] 2019-05-09 15:57:40,220 Down to the last merge-pass, with 1 segments left of total size: 57 bytes
[INFO ] 2019-05-09 15:57:40,220 Merged 1 segments, 68 bytes to disk to satisfy reduce memory limit
[INFO ] 2019-05-09 15:57:40,220 Merging 1 files, 72 bytes from disk
[INFO ] 2019-05-09 15:57:40,220 Merging 0 segments, 0 bytes from memory into reduce
[INFO ] 2019-05-09 15:57:40,220 Merging 1 sorted segments
[INFO ] 2019-05-09 15:57:40,236 Down to the last merge-pass, with 1 segments left of total size: 57 bytes
[INFO ] 2019-05-09 15:57:40,236 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,252 mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
[INFO ] 2019-05-09 15:57:40,252 Task:attempt_local1859959178_0001_r_000000_0 is done. And is in the process of committing
[INFO ] 2019-05-09 15:57:40,267 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,267 Task attempt_local1859959178_0001_r_000000_0 is allowed to commit now
[INFO ] 2019-05-09 15:57:40,267 Saved output of task 'attempt_local1859959178_0001_r_000000_0' to file:/E:/hdfs/data/patition/output/_temporary/0/task_local1859959178_0001_r_000000
[INFO ] 2019-05-09 15:57:40,267 reduce > reduce
[INFO ] 2019-05-09 15:57:40,267 Task 'attempt_local1859959178_0001_r_000000_0' done.
[INFO ] 2019-05-09 15:57:40,267 Finishing task: attempt_local1859959178_0001_r_000000_0
[INFO ] 2019-05-09 15:57:40,267 Starting task: attempt_local1859959178_0001_r_000001_0
[INFO ] 2019-05-09 15:57:40,267 File Output Committer Algorithm version is 1
[INFO ] 2019-05-09 15:57:40,283 ProcfsBasedProcessTree currently is supported only on Linux.
[INFO ] 2019-05-09 15:57:40,330  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@7e106404
[INFO ] 2019-05-09 15:57:40,330 Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@570b67ab
[INFO ] 2019-05-09 15:57:40,330 MergerManager: memoryLimit=657666880, maxSingleShuffleLimit=164416720, mergeThreshold=434060160, ioSortFactor=10, memToMemMergeOutputsThreshold=10
[INFO ] 2019-05-09 15:57:40,330 attempt_local1859959178_0001_r_000001_0 Thread started: EventFetcher for fetching Map Completion Events
[INFO ] 2019-05-09 15:57:40,330 localfetcher#2 about to shuffle output of map attempt_local1859959178_0001_m_000000_0 decomp: 90 len: 94 to MEMORY
[INFO ] 2019-05-09 15:57:40,330 Read 90 bytes from map-output for attempt_local1859959178_0001_m_000000_0
[INFO ] 2019-05-09 15:57:40,330 closeInMemoryFile -> map-output of size: 90, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->90
[INFO ] 2019-05-09 15:57:40,330 EventFetcher is interrupted.. Returning
[INFO ] 2019-05-09 15:57:40,330 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,330 finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
[INFO ] 2019-05-09 15:57:40,345 Merging 1 sorted segments
[INFO ] 2019-05-09 15:57:40,345 Down to the last merge-pass, with 1 segments left of total size: 83 bytes
[INFO ] 2019-05-09 15:57:40,345 Merged 1 segments, 90 bytes to disk to satisfy reduce memory limit
[INFO ] 2019-05-09 15:57:40,361 Merging 1 files, 94 bytes from disk
[INFO ] 2019-05-09 15:57:40,361 Merging 0 segments, 0 bytes from memory into reduce
[INFO ] 2019-05-09 15:57:40,361 Merging 1 sorted segments
[INFO ] 2019-05-09 15:57:40,361 Down to the last merge-pass, with 1 segments left of total size: 83 bytes
[INFO ] 2019-05-09 15:57:40,361 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,361 Task:attempt_local1859959178_0001_r_000001_0 is done. And is in the process of committing
[INFO ] 2019-05-09 15:57:40,361 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,361 Task attempt_local1859959178_0001_r_000001_0 is allowed to commit now
[INFO ] 2019-05-09 15:57:40,376 Saved output of task 'attempt_local1859959178_0001_r_000001_0' to file:/E:/hdfs/data/patition/output/_temporary/0/task_local1859959178_0001_r_000001
[INFO ] 2019-05-09 15:57:40,376 reduce > reduce
[INFO ] 2019-05-09 15:57:40,376 Task 'attempt_local1859959178_0001_r_000001_0' done.
[INFO ] 2019-05-09 15:57:40,376 Finishing task: attempt_local1859959178_0001_r_000001_0
[INFO ] 2019-05-09 15:57:40,376 Starting task: attempt_local1859959178_0001_r_000002_0
[INFO ] 2019-05-09 15:57:40,376 File Output Committer Algorithm version is 1
[INFO ] 2019-05-09 15:57:40,376 ProcfsBasedProcessTree currently is supported only on Linux.
[INFO ] 2019-05-09 15:57:40,439  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@8b6647c
[INFO ] 2019-05-09 15:57:40,439 Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@48c7cfe
[INFO ] 2019-05-09 15:57:40,439 MergerManager: memoryLimit=657666880, maxSingleShuffleLimit=164416720, mergeThreshold=434060160, ioSortFactor=10, memToMemMergeOutputsThreshold=10
[INFO ] 2019-05-09 15:57:40,439 attempt_local1859959178_0001_r_000002_0 Thread started: EventFetcher for fetching Map Completion Events
[INFO ] 2019-05-09 15:57:40,439 localfetcher#3 about to shuffle output of map attempt_local1859959178_0001_m_000000_0 decomp: 30 len: 34 to MEMORY
[INFO ] 2019-05-09 15:57:40,439 Read 30 bytes from map-output for attempt_local1859959178_0001_m_000000_0
[INFO ] 2019-05-09 15:57:40,439 closeInMemoryFile -> map-output of size: 30, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->30
[INFO ] 2019-05-09 15:57:40,439 EventFetcher is interrupted.. Returning
[INFO ] 2019-05-09 15:57:40,439 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,439 finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
[INFO ] 2019-05-09 15:57:40,454 Merging 1 sorted segments
[INFO ] 2019-05-09 15:57:40,454 Down to the last merge-pass, with 1 segments left of total size: 23 bytes
[INFO ] 2019-05-09 15:57:40,470 Merged 1 segments, 30 bytes to disk to satisfy reduce memory limit
[INFO ] 2019-05-09 15:57:40,470 Merging 1 files, 34 bytes from disk
[INFO ] 2019-05-09 15:57:40,470 Merging 0 segments, 0 bytes from memory into reduce
[INFO ] 2019-05-09 15:57:40,470 Merging 1 sorted segments
[INFO ] 2019-05-09 15:57:40,470 Down to the last merge-pass, with 1 segments left of total size: 23 bytes
[INFO ] 2019-05-09 15:57:40,470 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,501 Task:attempt_local1859959178_0001_r_000002_0 is done. And is in the process of committing
[INFO ] 2019-05-09 15:57:40,501 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,501 Task attempt_local1859959178_0001_r_000002_0 is allowed to commit now
[INFO ] 2019-05-09 15:57:40,517 Saved output of task 'attempt_local1859959178_0001_r_000002_0' to file:/E:/hdfs/data/patition/output/_temporary/0/task_local1859959178_0001_r_000002
[INFO ] 2019-05-09 15:57:40,517 reduce > reduce
[INFO ] 2019-05-09 15:57:40,517 Task 'attempt_local1859959178_0001_r_000002_0' done.
[INFO ] 2019-05-09 15:57:40,517 Finishing task: attempt_local1859959178_0001_r_000002_0
[INFO ] 2019-05-09 15:57:40,517 Starting task: attempt_local1859959178_0001_r_000003_0
[INFO ] 2019-05-09 15:57:40,517 File Output Committer Algorithm version is 1
[INFO ] 2019-05-09 15:57:40,517 ProcfsBasedProcessTree currently is supported only on Linux.
[INFO ] 2019-05-09 15:57:40,548 Job job_local1859959178_0001 running in uber mode : false
[INFO ] 2019-05-09 15:57:40,548  map 100% reduce 100%
[INFO ] 2019-05-09 15:57:40,579  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@35a9936c
[INFO ] 2019-05-09 15:57:40,579 Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@e079421
[INFO ] 2019-05-09 15:57:40,579 MergerManager: memoryLimit=657666880, maxSingleShuffleLimit=164416720, mergeThreshold=434060160, ioSortFactor=10, memToMemMergeOutputsThreshold=10
[INFO ] 2019-05-09 15:57:40,579 attempt_local1859959178_0001_r_000003_0 Thread started: EventFetcher for fetching Map Completion Events
[INFO ] 2019-05-09 15:57:40,579 localfetcher#4 about to shuffle output of map attempt_local1859959178_0001_m_000000_0 decomp: 49 len: 53 to MEMORY
[INFO ] 2019-05-09 15:57:40,579 Read 49 bytes from map-output for attempt_local1859959178_0001_m_000000_0
[INFO ] 2019-05-09 15:57:40,579 closeInMemoryFile -> map-output of size: 49, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->49
[INFO ] 2019-05-09 15:57:40,579 EventFetcher is interrupted.. Returning
[INFO ] 2019-05-09 15:57:40,579 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,579 finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
[INFO ] 2019-05-09 15:57:40,595 Merging 1 sorted segments
[INFO ] 2019-05-09 15:57:40,595 Down to the last merge-pass, with 1 segments left of total size: 42 bytes
[INFO ] 2019-05-09 15:57:40,610 Merged 1 segments, 49 bytes to disk to satisfy reduce memory limit
[INFO ] 2019-05-09 15:57:40,610 Merging 1 files, 53 bytes from disk
[INFO ] 2019-05-09 15:57:40,610 Merging 0 segments, 0 bytes from memory into reduce
[INFO ] 2019-05-09 15:57:40,610 Merging 1 sorted segments
[INFO ] 2019-05-09 15:57:40,610 Down to the last merge-pass, with 1 segments left of total size: 42 bytes
[INFO ] 2019-05-09 15:57:40,610 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,626 Task:attempt_local1859959178_0001_r_000003_0 is done. And is in the process of committing
[INFO ] 2019-05-09 15:57:40,626 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,626 Task attempt_local1859959178_0001_r_000003_0 is allowed to commit now
[INFO ] 2019-05-09 15:57:40,642 Saved output of task 'attempt_local1859959178_0001_r_000003_0' to file:/E:/hdfs/data/patition/output/_temporary/0/task_local1859959178_0001_r_000003
[INFO ] 2019-05-09 15:57:40,642 reduce > reduce
[INFO ] 2019-05-09 15:57:40,642 Task 'attempt_local1859959178_0001_r_000003_0' done.
[INFO ] 2019-05-09 15:57:40,642 Finishing task: attempt_local1859959178_0001_r_000003_0
[INFO ] 2019-05-09 15:57:40,642 Starting task: attempt_local1859959178_0001_r_000004_0
[INFO ] 2019-05-09 15:57:40,642 File Output Committer Algorithm version is 1
[INFO ] 2019-05-09 15:57:40,642 ProcfsBasedProcessTree currently is supported only on Linux.
[INFO ] 2019-05-09 15:57:40,688  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@54fd7bab
[INFO ] 2019-05-09 15:57:40,688 Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@6902049
[INFO ] 2019-05-09 15:57:40,688 MergerManager: memoryLimit=657666880, maxSingleShuffleLimit=164416720, mergeThreshold=434060160, ioSortFactor=10, memToMemMergeOutputsThreshold=10
[INFO ] 2019-05-09 15:57:40,688 attempt_local1859959178_0001_r_000004_0 Thread started: EventFetcher for fetching Map Completion Events
[INFO ] 2019-05-09 15:57:40,704 localfetcher#5 about to shuffle output of map attempt_local1859959178_0001_m_000000_0 decomp: 76 len: 80 to MEMORY
[INFO ] 2019-05-09 15:57:40,704 Read 76 bytes from map-output for attempt_local1859959178_0001_m_000000_0
[INFO ] 2019-05-09 15:57:40,704 closeInMemoryFile -> map-output of size: 76, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->76
[INFO ] 2019-05-09 15:57:40,704 EventFetcher is interrupted.. Returning
[INFO ] 2019-05-09 15:57:40,704 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,704 finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
[INFO ] 2019-05-09 15:57:40,720 Merging 1 sorted segments
[INFO ] 2019-05-09 15:57:40,720 Down to the last merge-pass, with 1 segments left of total size: 69 bytes
[INFO ] 2019-05-09 15:57:40,720 Merged 1 segments, 76 bytes to disk to satisfy reduce memory limit
[INFO ] 2019-05-09 15:57:40,720 Merging 1 files, 80 bytes from disk
[INFO ] 2019-05-09 15:57:40,720 Merging 0 segments, 0 bytes from memory into reduce
[INFO ] 2019-05-09 15:57:40,720 Merging 1 sorted segments
[INFO ] 2019-05-09 15:57:40,720 Down to the last merge-pass, with 1 segments left of total size: 69 bytes
[INFO ] 2019-05-09 15:57:40,720 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,751 Task:attempt_local1859959178_0001_r_000004_0 is done. And is in the process of committing
[INFO ] 2019-05-09 15:57:40,751 1 / 1 copied.
[INFO ] 2019-05-09 15:57:40,751 Task attempt_local1859959178_0001_r_000004_0 is allowed to commit now
[INFO ] 2019-05-09 15:57:40,766 Saved output of task 'attempt_local1859959178_0001_r_000004_0' to file:/E:/hdfs/data/patition/output/_temporary/0/task_local1859959178_0001_r_000004
[INFO ] 2019-05-09 15:57:40,766 reduce > reduce
[INFO ] 2019-05-09 15:57:40,766 Task 'attempt_local1859959178_0001_r_000004_0' done.
[INFO ] 2019-05-09 15:57:40,766 Finishing task: attempt_local1859959178_0001_r_000004_0
[INFO ] 2019-05-09 15:57:40,766 reduce task executor complete.
[INFO ] 2019-05-09 15:57:41,549 Job job_local1859959178_0001 completed successfully
[INFO ] 2019-05-09 15:57:41,580 Counters: 30
	File System Counters
		FILE: Number of bytes read=8840
		FILE: Number of bytes written=1733562
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=10
		Map output records=10
		Map output bytes=283
		Map output materialized bytes=333
		Input split bytes=107
		Combine input records=0
		Combine output records=0
		Reduce input groups=10
		Reduce shuffle bytes=333
		Reduce input records=10
		Reduce output records=10
		Spilled Records=20
		Shuffled Maps =5
		Failed Shuffles=0
		Merged Map outputs=5
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=1321205760
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=228
	File Output Format Counters 
		Bytes Written=343

Process finished with exit code 0

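But what happens if the number of ReduceTasks does not match the number of partitions?

  1. Setting the number of ReduceTasks to 1
    The result is a single output file. With only one ReduceTask every record goes to that reducer, and Hadoop does not consult the custom Partitioner at all, so there is only one file.
    [Figure 4: screenshot of the result]
  2. Setting the number of ReduceTasks to 2
    The job fails with an exception. There are only 2 ReduceTasks but the Partitioner returns 5 different partition numbers, and the values 2, 3 and 4 do not correspond to any ReduceTask, so the map task reports an illegal partition.
    [Figure 5: screenshot of the exception]
  3. Setting the number of ReduceTasks to 6
    There are 6 output files, but the sixth is empty. The 5 partitions are assigned to the first 5 ReduceTasks in order, and the remaining ReduceTask simply has nothing to do.
    [Figure 6: screenshot of the result]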
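
The three cases above can be modelled in a few lines. This is a simplified model of the framework's behaviour under stated assumptions, not Hadoop's actual code: the class ReduceTaskCountSketch and its methods are hypothetical, the keys are plain Strings, and an unchecked exception stands in for the IOException that Hadoop raises.

```java
class ReduceTaskCountSketch {

    // Partition logic of the example: keys starting with a, b, c, d map to 0..3, the rest to 4.
    static int customPartition(String key) {
        int index = key.isEmpty() ? -1 : "abcd".indexOf(key.charAt(0));
        return index >= 0 ? index : 4;
    }

    // Simplified model: with one reduce task everything goes to partition 0 and the
    // custom logic is skipped; otherwise the result must lie in [0, numReduceTasks).
    static int route(String key, int numReduceTasks) {
        if (numReduceTasks == 1) {
            return 0;
        }
        int partition = customPartition(key);
        if (partition < 0 || partition >= numReduceTasks) {
            // Hadoop raises an IOException here; an unchecked exception keeps the sketch short.
            throw new IllegalStateException("Illegal partition for " + key + " (" + partition + ")");
        }
        return partition;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "base", "case", "duck", "fish"};
        for (int numReduceTasks : new int[] {1, 2, 5, 6}) {
            try {
                StringBuilder line = new StringBuilder(numReduceTasks + " reduce task(s):");
                for (String key : keys) {
                    line.append(' ').append(key).append("->").append(route(key, numReduceTasks));
                }
                System.out.println(line);
            } catch (IllegalStateException e) {
                System.out.println(numReduceTasks + " reduce task(s): failed, " + e.getMessage());
            }
        }
    }
}
```

With 1 reduce task every key is routed to partition 0, with 2 the run stops at the key case because partition 2 is out of range, and with 5 or 6 every key gets its own partition from 0 to 4. In the 6-task case no key is ever routed to partition 5, which is why the sixth output file stays empty.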
