Splitting records with KeyValueTextInputFormat

Sample input data for KeyValueTextInputFormat:

1,Linux mont bb xx zz dd fff 
2,Linux windows linux shell xhell
3,yy vv nn mm 

Driver class code:

        Configuration conf = new Configuration();
        // INPUT_DIR_RECURSIVE is not the separator setting; with this line the default "\t" separator is still in effect.
        conf.set(KeyValueTextInputFormat.INPUT_DIR_RECURSIVE, ",");

        // Use KeyValueTextInputFormat as the input format
        job.setInputFormatClass(KeyValueTextInputFormat.class);
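
Why does this setting leave the separator unchanged? INPUT_DIR_RECURSIVE is a constant inherited from FileInputFormat and names a different property from the one the record reader consults for its separator. The following is a minimal plain-Java sketch, not Hadoop code: a HashMap stands in for Configuration, the class and method names are hypothetical, and the two property-name strings are assumptions based on Hadoop 2.x that should be checked against your version.

```java
import java.util.HashMap;
import java.util.Map;

class SeparatorLookupSketch {
    // Assumed property names (Hadoop 2.x); verify them against your Hadoop version.
    static final String INPUT_DIR_RECURSIVE =
            "mapreduce.input.fileinputformat.input.dir.recursive";
    static final String KEY_VALUE_SEPERATOR =
            "mapreduce.input.keyvaluelinerecordreader.key.value.separator";

    // Mimics a record reader looking up its separator, falling back to a tab.
    static String effectiveSeparator(Map<String, String> conf) {
        return conf.getOrDefault(KEY_VALUE_SEPERATOR, "\t");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Setting the wrong property: the separator lookup does not see it.
        conf.put(INPUT_DIR_RECURSIVE, ",");
        System.out.println(effectiveSeparator(conf).equals("\t")); // prints true

        // Setting the separator property itself changes the lookup result.
        conf.put(KEY_VALUE_SEPERATOR, ",");
        System.out.println(effectiveSeparator(conf)); // prints ,
    }
}
```

Since none of the sample lines contains a tab, falling back to "\t" means no line is actually split into a key and a value.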

 

        // Create the configuration
        Configuration conf = new Configuration();
        conf.set(KeyValueTextInputFormat.INPUT_DIR_RECURSIVE, ",");
        // Define the job
        Job job = Job.getInstance(conf);

        // Configure the job
        job.setJarByClass(WCDriver.class);

        // Register the custom mapper and its output types
        job.setMapperClass(WCMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Register the custom reducer and its types (the job's final output types)
        job.setReducerClass(WCReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Use KeyValueTextInputFormat as the input format
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        // Set the input path
        FileInputFormat.setInputPaths(job, new Path("D:\\input\\test\\plus\\aa3.txt"));
        // Set the output path
        FileOutputFormat.setOutputPath(job, new Path("D:\\input\\test\\plus\\2"));
        // Submit the job and wait for it to finish
        job.waitForCompletion(true);

 

Map class: watch out, there is a pitfall here

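public class WCMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    // IDE shortcut to generate parent-class method implementations: Alt+Insert
    // Ctrl+O: override a parent-class method directly


    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException 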

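Run it to test the result: the job fails.

The cause is a type mismatch. KeyValueTextInputFormat emits keys of type Text, so the mapper's input key can no longer be LongWritable; changing it to Text fixes the problem.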

java.lang.Exception: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable
	at com.itstar.mr.wc0908.WCMap.map(WCMap.java:21)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[main] INFO org.apache.hadoop.mapreduce.Job - Job job_local376701184_0001 running in uber mode : false
[main] INFO org.apache.hadoop.mapreduce.Job -  map 0% reduce 0%
[main] INFO org.apache.hadoop.mapreduce.Job - Job job_local376701184_0001 failed with state FAILED due to: NA
[main] INFO org.apache.hadoop.mapreduce.Job - Counters: 0
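
The trace points at WCMap.map even though the mapper body never casts anything. One likely explanation is Java's type erasure: the framework hands the key over as a plain object, and the cast to the declared key type happens in a compiler-generated bridge method of the subclass. The sketch below reproduces this in plain Java with no Hadoop dependency; MiniMapper, LongKeyMapper, and StringKeyMapper are hypothetical stand-ins, with Long and String playing the roles of LongWritable and Text.

```java
class BridgeCastSketch {
    // Hypothetical stand-in for a framework mapper base class.
    static abstract class MiniMapper<K, V> {
        abstract String map(K key, V value);

        @SuppressWarnings("unchecked")
        String run(Object key, Object value) {
            // These casts are erased at compile time; the real type check
            // happens in the subclass's generated bridge method.
            return map((K) key, (V) value);
        }
    }

    // Declares a Long key, like the first WCMap declaring LongWritable.
    static class LongKeyMapper extends MiniMapper<Long, String> {
        @Override
        String map(Long key, String value) {
            return key + "=" + value;
        }
    }

    // Declares a String key, like a WCMap that takes Text.
    static class StringKeyMapper extends MiniMapper<String, String> {
        @Override
        String map(String key, String value) {
            return key + "=" + value;
        }
    }

    public static void main(String[] args) {
        // The simulated input format always produces String keys.
        System.out.println(new StringKeyMapper().run("1", "Linux")); // prints 1=Linux
        try {
            new LongKeyMapper().run("1", "Linux");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

The mapper whose declared key type matches what the input side produces runs normally; the other one fails before its own code executes.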

Now change the Map class so that the input key and value are both Text:

    // The class declaration must change to match: Mapper<Text, Text, Text, IntWritable>
    @Override
    protected void map(Text key, Text value, Context context) throws IOException, InterruptedException 

The job now succeeds:

		Reduce shuffle bytes=27
		Reduce input records=3
		Reduce output records=1
		Spilled Records=6
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=514850816
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=80
	File Output Format Counters 
		Bytes Written=15

Process finished with exit code 0

Check the generated output:

[Image 1: screenshot of the generated output]

 

 

Next, set the key/value separator for KeyValueTextInputFormat properly:

 

Driver main class:

conf.set(KeyValueLineRecordReader.KEY_VALUE_SEPERATOR, ","); // this is the setting that actually controls the split

        // Create the configuration
        Configuration conf = new Configuration();
        conf.set(KeyValueLineRecordReader.KEY_VALUE_SEPERATOR, ",");
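
To make the effect of the separator concrete, here is a minimal plain-Java sketch of the splitting rule as I understand it: each line is cut at the first occurrence of the separator, and a line without the separator becomes a key with an empty value. The class and method names are hypothetical and the sketch does not call Hadoop; it only imitates the behavior on one of the sample lines above.

```java
class KeyValueSplitSketch {
    // Splits a line at the first occurrence of the separator.
    // Assumption: if the separator is absent, the whole line is the key
    // and the value is empty.
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String line = "2,Linux windows linux shell xhell";

        // Default tab separator: nothing to split on, so the key is the whole line.
        String[] byTab = split(line, '\t');
        System.out.println("[" + byTab[0] + "] [" + byTab[1] + "]");

        // Comma separator: key is the leading id, value is the rest of the line.
        String[] byComma = split(line, ',');
        System.out.println("[" + byComma[0] + "] [" + byComma[1] + "]");
    }
}
```

Only the first separator matters, so any further commas in a line would stay inside the value.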

 

Check the result: the split succeeds.

 

[Image 2: screenshot of the output after setting the separator]

 

 

 

 

 

 

 

 

 

 

 

 
