Implementing WordCount with MapReduce

  • 1. Implementation
    • 1.1 Mapper class:
    • 1.2 Reducer class:
    • 1.3 Main class:
  • 2. Running the example
    • 2.1 Packaging
    • 2.2 Working with the data

1. Implementation

1.1 Mapper class:

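    // The four type parameters of Mapper: Object is the input key type, the first Text is the input value type,
    // the second Text is the output key type, and IntWritable is the output value type.

    // Imports (these go at the top of the source file):
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public static class doMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        // map() processes one input line and writes (word, 1) pairs to the context, which passes them on to reduce
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] arr = line.split("\t");   // words in the input are expected to be tab-separated
            for (String wd : arr) {
                word.set(wd);
                context.write(word, one);   // emit the word with a count of 1
            }
        }
    }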
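
To make the map step easier to picture, here is a minimal plain-Java sketch that runs without Hadoop. It assumes the same tab-separated input as the mapper above; the class `MapStepSketch` and the method `mapLine` are hypothetical names used only for illustration and are not part of the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MapStepSketch {
    // Hypothetical stand-in for map(): emits one (word, 1) pair per tab-separated token.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String wd : line.split("\t")) {
            emitted.add(new SimpleEntry<>(wd, 1));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Repeated words are emitted repeatedly; summing them is left to the reduce step.
        for (Map.Entry<String, Integer> pair : mapLine("hello\tworld\thello")) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

Note that the map step does no counting of its own: a word that appears twice on a line simply produces two pairs.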
            

1.2 Reducer class:

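import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The four type parameters of Reducer: input key, input value, output key, output value.
public class WordReducerReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable val = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();   // add up every count emitted for this key
            }
            val.set(sum);
            context.write(key, val);   // write the result to the context; each output record is the key followed by its total
        }
    }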
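
Between map and reduce, the framework groups all values that share a key and hands each group to `reduce`. The following plain-Java sketch imitates that grouping and summing on a small list of words. It runs without Hadoop, and `ReduceStepSketch`, `reduce`, and `countWords` are hypothetical names chosen for this illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceStepSketch {
    // Hypothetical stand-in for reduce(): sums all counts collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Imitates the shuffle (group the 1s by word, keys sorted) and then reduces each group.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String w : words) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("hello", "world", "hello")));
        // prints {hello=2, world=1}
    }
}
```

A `TreeMap` is used here only so that keys come out sorted, which loosely mirrors how a reducer sees its keys in sorted order.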

1.3 Main class:

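import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		if (args != null && args.length == 2) {
			String input = args[0];
			String output = args[1];
			// Create the job (on Hadoop 2 and later, Job.getInstance replaces the deprecated Job constructor)
			Job job = Job.getInstance(new Configuration(), "word count");
			// Run from the jar that contains this class
			job.setJarByClass(WordCount.class);
			// Set the Mapper and Reducer classes from 1.1 and 1.2
			// (doMapper is declared static, so it is assumed to be nested inside WordCount)
			job.setMapperClass(doMapper.class);
			job.setReducerClass(WordReducerReducer.class);
			// Set the output key/value types; they must match what the reducer writes
			job.setOutputKeyClass(Text.class);
			job.setOutputValueClass(IntWritable.class);
			// Set the input and output directories
			FileInputFormat.addInputPath(job, new Path(input));
			FileOutputFormat.setOutputPath(job, new Path(output));
			// Submit the job and wait for it to finish
			System.exit(job.waitForCompletion(true) ? 0 : 1);
		} else {
			System.err.println("Usage: wordcount <input> <output>");
		}
	}
}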

2. Running the example

2.1 Packaging

After testing locally, package the program as wordcount.jar (using the IDE's Export => JAR file option).

2.2 Working with the data

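vim file_a.txt    # create the input file; the mapper splits on tabs, so separate the words with tab characters
hadoop fs -put file_a.txt /     # upload the data file to the root directory of HDFS
hadoop jar wordcount.jar /file_a.txt /wordcount_output    # run the job with /wordcount_output as the output directory (assumes the jar's manifest names WordCount as the main class)
hadoop fs -ls /wordcount_output  # list the files in the output directory
hadoop fs -cat /wordcount_output/part-r-00000  # print the result; part-r-00000 is the usual file name when a single reducer runs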
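
If the job uses Hadoop's default text output format (an assumption here, since the driver does not set one explicitly), each line of the result file holds a word, a tab character, and its count. The sketch below parses text of that shape in plain Java; the class `OutputLineSketch`, the method `parseCounts`, and the sample data are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OutputLineSketch {
    // Parses "word<TAB>count" lines into a map, keeping the order in which lines appear.
    static Map<String, Integer> parseCounts(String contents) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : contents.split("\n")) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue;   // skip blank or malformed lines
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = "hadoop\t2\nhello\t3\n";   // made-up sample, not real job output
        System.out.println(parseCounts(sample));
    }
}
```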
