The problem is described below.
1. Environment
Virtual machine: VMware Workstation 10
OS: CentOS 6.4
Eclipse: version not recorded (I no longer remember it)
JDK: 1.7.06
Hadoop: 1.0.4
2. Code:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, Text> {

        private Text word = new Text("line:");

        public void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(word, value);
        }
    }

    public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            Text result = new Text();
            String add = new String();
            for (Text val : values) {
                add.concat(val.toString());
            }
            result.set(add);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
3. The error output is as follows:
12/08/27 15:49:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/08/27 15:49:40 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/08/27 15:49:41 INFO input.FileInputFormat: Total input paths to process : 4
12/08/27 15:49:41 INFO mapred.JobClient: Running job: job_local_0001
12/08/27 15:49:41 INFO util.ProcessTree: setsid exited with exit code 0
12/08/27 15:49:41 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3249256e
12/08/27 15:49:41 INFO mapred.MapTask: io.sort.mb = 100
12/08/27 15:49:41 INFO mapred.MapTask: data buffer = 79691776/99614720
12/08/27 15:49:41 INFO mapred.MapTask: record buffer = 262144/327680
12/08/27 15:49:41 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at SmallFilesToSequenceFileConverter$SequenceFileMapper.map(SmallFilesToSequenceFileConverter.java:38)
at SmallFilesToSequenceFileConverter$SequenceFileMapper.map(SmallFilesToSequenceFileConverter.java:1)
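4. Approaches to a solution
Two approaches turn up online.
(1) First check whether the output types of your map match the input types of your reduce, then check whether the type parameters of your map and reduce agree with the job settings below (quoted from another post):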
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
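I tried this, but it had no effect.
(2) http://www.360doc.com/content/11/0524/16/7000788_119067361.shtml This article analyzes the problem in depth, but its fix is too cumbersome, and I did not understand how to apply it in practice. My error is quite possibly not the same problem as the one described there, so I only cite it here.
My method:
Some people online say that, because of differences between Hadoop versions, the map and reduce methods of a MapReduce job must override the framework's methods. Following that advice, I added @Override in front of my map and reduce methods, and Eclipse then reported this error: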
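To make approach (1) concrete: the "Type mismatch in value from map" error is raised when the runtime class of a value written by the mapper differs from the class the job was configured with via job.setMapOutputValueClass. The sketch below imitates that kind of check in plain Java. Everything in it (TypeCheckSketch, Collector, tryCollect, and the empty Text and IntWritable stand-ins) is hypothetical and written only for illustration; it is not Hadoop's actual implementation.

```java
import java.io.IOException;

public class TypeCheckSketch {

    // Hypothetical stand-ins for Hadoop's Text and IntWritable; not the real classes.
    static class Text { }
    static class IntWritable { }

    // Hypothetical collector that remembers the value class the job was configured with.
    static class Collector {
        private final Class<?> expectedValueClass;

        Collector(Class<?> expectedValueClass) {
            this.expectedValueClass = expectedValueClass;
        }

        // Rejects any value whose runtime class differs from the configured one.
        void collect(Object key, Object value) throws IOException {
            if (value.getClass() != expectedValueClass) {
                throw new IOException("Type mismatch in value from map: expected "
                        + expectedValueClass.getName()
                        + ", received " + value.getClass().getName());
            }
        }
    }

    // Returns "ok" when the types agree, otherwise the exception message.
    static String tryCollect(Class<?> configured, Object value) {
        try {
            new Collector(configured).collect(new Text(), value);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Configured for IntWritable but the mapper emits Text: mismatch.
        System.out.println(tryCollect(IntWritable.class, new Text()));
        // Configured for Text and the mapper emits Text: accepted.
        System.out.println(tryCollect(Text.class, new Text()));
    }
}
```

The comparison is made on the object that actually reaches the collector, so matching job settings only help if the intended map method is the one that runs.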
The method map(Text, Text, Mapper<Object,Text,Text,Text>.Context) of type SortAndUpper.SpliMapper must override or implement a supertype method
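This shows that my map method was written incorrectly. On closer inspection, its first parameter was declared as Text. Checking the Hadoop API, I found that there is no map method whose parameters are all Text: for a class that extends Mapper<Object, Text, Text, Text>, map must take (Object, Text, Context). Because my parameter list differed, my method was a separate overload rather than an override, so the framework never called it. Changing the first parameter to Object fixed the problem.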
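The difference between overloading and overriding can be reproduced without Hadoop. The following is a minimal sketch: BaseMapper is a hypothetical stand-in for Hadoop's Mapper (its run method calls map the way a framework would), and WrongMapper and RightMapper are illustrative names, not classes from the original code.

```java
public class OverloadVsOverride {

    // Hypothetical stand-in for a framework base class such as Hadoop's Mapper.
    static class BaseMapper<KEYIN, VALUEIN> {
        // Default behaviour, used when a subclass does not override map.
        String map(KEYIN key, VALUEIN value) {
            return "inherited default map";
        }

        // The framework only ever calls map(KEYIN, VALUEIN).
        String run(KEYIN key, VALUEIN value) {
            return map(key, value);
        }
    }

    // First parameter is String instead of Object: this is a new overload.
    // Adding @Override here would be rejected by the compiler.
    static class WrongMapper extends BaseMapper<Object, String> {
        String map(String key, String value) {
            return "custom map";
        }
    }

    // Parameter types match the declared type arguments: a real override.
    static class RightMapper extends BaseMapper<Object, String> {
        @Override
        String map(Object key, String value) {
            return "custom map";
        }
    }

    public static void main(String[] args) {
        System.out.println(new WrongMapper().run("k", "v")); // inherited default map
        System.out.println(new RightMapper().run("k", "v")); // custom map
    }
}
```

Putting @Override on every method meant to replace a framework method turns this kind of silent mistake into a compile-time error, which is how the problem surfaced in Eclipse.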