MapReduce (6): OutputFormat Data Output

OutputFormat implementation classes

OutputFormat is the base class for MapReduce output: every output implementation in MapReduce extends OutputFormat.


[Figure: 3.1 OutputFormat implementation classes]

Default output format: TextOutputFormat
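
For reference, TextOutputFormat writes each record as one line: the key, a separator (a tab by default), the value, then a newline. A null or NullWritable key or value is skipped together with the separator. The sketch below imitates that layout in plain Java so it runs without Hadoop; `TextLineSketch` and `formatRecord` are hypothetical names, not Hadoop APIs.

```java
// Hadoop-free imitation of the line layout TextOutputFormat produces.
// TextLineSketch and formatRecord are illustrative names, not Hadoop classes.
class TextLineSketch {

    // Builds "key<separator>value\n". A null side stands in for NullWritable
    // and is dropped together with the separator; two nulls produce nothing.
    static String formatRecord(Object key, Object value, String separator) {
        if (key == null && value == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        if (key != null) {
            sb.append(key);
        }
        if (key != null && value != null) {
            sb.append(separator);
        }
        if (value != null) {
            sb.append(value);
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(formatRecord("hadoop", 3, "\t"));
        System.out.print(formatRecord("https://example.com", null, "\t"));
    }
}
```

With a Text key and a NullWritable value, as in the example later in this post, each output line is therefore just the key.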

Custom OutputFormat

1) Use cases

Writing output to storage systems such as MySQL, HBase, or ES.

2) Steps for a custom OutputFormat

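  • Define a class that extends FileOutputFormat.

  • Write a RecordWriter subclass and override write(), the method that performs the actual output; the custom OutputFormat returns this writer from getRecordWriter().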
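
The two steps correspond to two roles: the output format acts as a factory, and the record writer it returns does the writing, with one write() call per record followed by a single close(). The sketch below models that contract in plain Java so it runs without Hadoop; `SimpleRecordWriter`, `SimpleOutputFormat`, and `ListOutputFormat` are hypothetical stand-ins, not the Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for RecordWriter: receives records one at a time, then is closed once.
interface SimpleRecordWriter<K, V> {
    void write(K key, V value);
    void close();
}

// Stand-in for OutputFormat: a factory that hands out a writer.
interface SimpleOutputFormat<K, V> {
    SimpleRecordWriter<K, V> getRecordWriter();
}

// A toy format whose writer collects records into an in-memory list.
class ListOutputFormat implements SimpleOutputFormat<String, Integer> {
    final List<String> sink = new ArrayList<>();
    boolean closed = false;

    @Override
    public SimpleRecordWriter<String, Integer> getRecordWriter() {
        return new SimpleRecordWriter<String, Integer>() {
            @Override
            public void write(String key, Integer value) {
                sink.add(key + "=" + value);
            }

            @Override
            public void close() {
                closed = true;
            }
        };
    }

    // Plays the role of the framework: obtain a writer, feed it records, close it.
    static ListOutputFormat demo() {
        ListOutputFormat format = new ListOutputFormat();
        SimpleRecordWriter<String, Integer> writer = format.getRecordWriter();
        writer.write("a", 1);
        writer.write("b", 2);
        writer.close();
        return format;
    }
}
```

Swapping the in-memory list for a database connection or a file stream is where a real RecordWriter would differ.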

Hands-on example

LogRecordWriter.java

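import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class LogRecordWriter extends RecordWriter<Text, NullWritable> {

    private FSDataOutputStream oneOut;
    private FSDataOutputStream otherOut;

    public LogRecordWriter(TaskAttemptContext job) throws IOException {
        // Open one output stream per target file; a failure here is propagated
        // to the caller instead of being swallowed, so write() never sees a null stream
        FileSystem fileSystem = FileSystem.get(job.getConfiguration());
        oneOut = fileSystem.create(
            new Path(System.getProperty("user.dir")+"/output/outputfromat/one.txt"));
        otherOut = fileSystem.create(
            new Path(System.getProperty("user.dir")+"/output/outputfromat/other.txt"));
    }

    @Override
    public void write(Text key, NullWritable value) throws IOException, InterruptedException {
        // Route each line: lines containing "https" go to one.txt, the rest to other.txt.
        // Encode explicitly as UTF-8, because writeBytes() keeps only the low byte of each char
        String line = key.toString();
        byte[] bytes = (line + "\n").getBytes(StandardCharsets.UTF_8);
        if (line.contains("https")) {
            oneOut.write(bytes);
        } else {
            otherOut.write(bytes);
        }
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException, InterruptedException {
        // Release both streams when the task finishes
        IOUtils.closeStream(oneOut);
        IOUtils.closeStream(otherOut);
    }
}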
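
The routing rule and the byte encoding can be checked without a cluster. FSDataOutputStream extends java.io.DataOutputStream, whose writeBytes(String) keeps only the low eight bits of each char, so log lines with non-ASCII characters survive only when the string is encoded explicitly. The sketch below uses in-memory streams in place of HDFS files; `RoutingSketch` and its methods are hypothetical names used only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class RoutingSketch {
    private final ByteArrayOutputStream oneBuf = new ByteArrayOutputStream();
    private final ByteArrayOutputStream otherBuf = new ByteArrayOutputStream();
    private final DataOutputStream oneOut = new DataOutputStream(oneBuf);
    private final DataOutputStream otherOut = new DataOutputStream(otherBuf);

    // Same rule as LogRecordWriter.write(): lines containing "https" go to the
    // first stream, everything else to the second. Bytes are UTF-8 encoded.
    void write(String line) {
        byte[] bytes = (line + "\n").getBytes(StandardCharsets.UTF_8);
        try {
            if (line.contains("https")) {
                oneOut.write(bytes);
            } else {
                otherOut.write(bytes);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String one() {
        return new String(oneBuf.toByteArray(), StandardCharsets.UTF_8);
    }

    String other() {
        return new String(otherBuf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Number of bytes writeBytes() emits for s: one per char, not a UTF-8 encoding.
    static int writeBytesLength(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            new DataOutputStream(buf).writeBytes(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }
}
```

A two-character Chinese string takes six bytes in UTF-8, but writeBytes() emits only two, which is why an explicit getBytes(StandardCharsets.UTF_8) is the safer choice for log lines.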

LogOutputFormat.java

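import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogOutputFormat extends FileOutputFormat<Text, NullWritable> {

    @Override
    public RecordWriter<Text, NullWritable> getRecordWriter(TaskAttemptContext job)
        throws IOException, InterruptedException {
        // Hand the framework our custom writer
        return new LogRecordWriter(job);
    }
}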

LogMapper.java

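import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LogMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
        // Pass each input line through unchanged, as the key
        context.write(value, NullWritable.get());
    }
}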

LogReducer.java

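import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LogReducer extends Reducer<Text, NullWritable, Text, NullWritable> {

    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context)
        throws IOException, InterruptedException {
        // Write the key once per value so that duplicate lines are preserved
        for (NullWritable value : values) {
            context.write(key, NullWritable.get());
        }
    }
}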
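
The loop over values matters: identical lines reach the reducer grouped under one key, and writing once per value keeps every duplicate, whereas a single write per key would deduplicate the log. The plain-Java sketch below imitates that grouping with sorted collections; `ReduceSketch` and its methods are hypothetical and only approximate what the shuffle does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class ReduceSketch {

    // Imitates shuffle + reduce: group equal lines under one sorted key,
    // then emit the key once per grouped value, as LogReducer does.
    static List<String> reducePerValue(List<String> lines) {
        Map<String, Integer> grouped = new TreeMap<>();
        for (String line : lines) {
            grouped.merge(line, 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : grouped.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                out.add(entry.getKey());
            }
        }
        return out;
    }

    // Variant that emits each key only once: duplicate lines are lost.
    static List<String> reducePerKey(List<String> lines) {
        return new ArrayList<>(new TreeSet<>(lines));
    }
}
```

If deduplicated output is actually wanted, writing the key once outside the loop is the simpler choice.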

LogDriver.java

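import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogDriver {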

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(LogDriver.class);
        job.setMapperClass(LogMapper.class);
        job.setReducerClass(LogReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        // Register the custom OutputFormat
        job.setOutputFormatClass(LogOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(System.getProperty("user.dir")+"/input/outputfromat"));
        // Even though we defined our own OutputFormat, it extends FileOutputFormat,
        // which writes a _SUCCESS file, so an output directory must still be specified
        FileOutputFormat.setOutputPath(job, new Path(System.getProperty("user.dir")+"/output/outputfromat"));

        boolean b = job.waitForCompletion(true);
        System.exit(b?0:1);
    }

}

Source code on Gitee

Summary

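In this section we looked at OutputFormat data output and at how to write a custom output stream. In practice, the built-in FileOutputFormat implementations do not always meet requirements; in those cases the output can be customized by extending OutputFormat (here, through FileOutputFormat) and supplying a RecordWriter.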
