Suppose we have written the following MapReduce program:
package com.besttone.mapred;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class SingleWordCount {

    public static class SingleWordCountMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            // The word to count is passed in through the job configuration.
            String keyword = context.getConfiguration().get("word");
            while (itr.hasMoreTokens()) {
                String nextkey = itr.nextToken();
                // Emit (word, 1) only for tokens that match the keyword exactly.
                if (nextkey.trim().equals(keyword)) {
                    word.set(nextkey);
                    context.write(word, one);
                }
            }
        }
    }

    public static class SingleWordCountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the partial counts for this key.
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    /**
     * @param args input path, output path, and the word to count
     * @throws Exception if the job cannot be configured or run
     */
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Usage: singlewordcount <in> <out> <word>");
            System.exit(2);
        }
        // Make the keyword available to every mapper.
        conf.set("word", otherArgs[2]);
        Job job = new Job(conf, "single word count");
        job.setJarByClass(SingleWordCount.class);
        job.setMapperClass(SingleWordCountMapper.class);
        // Summing is associative, so the reducer can also serve as the combiner.
        job.setCombinerClass(SingleWordCountReducer.class);
        job.setReducerClass(SingleWordCountReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
This MapReduce program counts the occurrences of one specified word.
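To make the matching rule concrete, here is a minimal plain-Java sketch of the comparison the mapper performs on each line, with no Hadoop dependency. The class name KeywordCountSketch and the sample lines are made up for illustration. Note that the match is exact: it is case-sensitive, and a token with attached punctuation such as "hello!" does not count.

```java
import java.util.StringTokenizer;

// Hypothetical helper, not part of the job: it mirrors the mapper's token test.
class KeywordCountSketch {

    // Counts whitespace-separated tokens in `line` that exactly equal `keyword`.
    static int countOccurrences(String line, String keyword) {
        int count = 0;
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            if (itr.nextToken().trim().equals(keyword)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] lines = { "hello world", "say hello to hello", "Hello, hello!" };
        int total = 0;
        for (String line : lines) {
            total += countOccurrences(line, "hello");
        }
        // Prints "hello" and 3 separated by a tab: the third line contributes nothing.
        System.out.println("hello\t" + total);
    }
}
```

In the real job the per-line counts are emitted as (word, 1) pairs and summed by the combiner and reducer rather than in a local loop.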
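We can then package this class into a JAR, named for example myexample.jar, and copy it to the HADOOP_HOME directory on the remote machine. To count the occurrences of the word "hello" in the files under the input directory, run:
bin/hadoop jar myexample.jar com.besttone.mapred.SingleWordCount hdfs://master:9000/user/hadoop/input/* hdfs://master:9000/user/hadoop/output hello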
Another way to run it is to write a Driver program:
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.besttone.mapred;

import org.apache.hadoop.util.ProgramDriver;

/**
 * A description of an example program based on its class and a
 * human-readable description.
 */
public class MapRedDriver {

    public static void main(String[] argv) {
        int exitCode = -1;
        ProgramDriver pgd = new ProgramDriver();
        try {
            pgd.addClass("singlewordcount", SingleWordCount.class,
                    "A map/reduce program that counts the words in the input files.");
            pgd.driver(argv);
            // Success
            exitCode = 0;
        } catch (Throwable e) {
            e.printStackTrace();
        }
        System.exit(exitCode);
    }
}
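Then repackage it into a JAR together with the class above, open the JAR file (for example by double-clicking it), and edit the MANIFEST.MF file under META-INF as follows: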
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.1
Created-By: 20.6-b01 (Sun Microsystems Inc.)
Main-Class: com.besttone.mapred.MapRedDriver
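As an alternative to editing the file by hand, the Main-Class entry can also be produced with the standard java.util.jar API. The following is only a sketch under that assumption: the class name ManifestSketch is made up, and it builds the manifest in memory and reads the entry back rather than touching a real JAR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper: builds and parses a manifest entirely in memory.
class ManifestSketch {

    // Serializes a manifest whose Main-Class is the given fully qualified name.
    static byte[] buildManifest(String mainClass) {
        Manifest mf = new Manifest();
        Attributes attrs = mf.getMainAttributes();
        // Manifest-Version must be set, otherwise nothing is written out.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            mf.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Parses manifest bytes and returns the Main-Class value, or null if absent.
    static String readMainClass(byte[] manifestBytes) {
        try {
            Manifest mf = new Manifest(new ByteArrayInputStream(manifestBytes));
            return mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = buildManifest("com.besttone.mapred.MapRedDriver");
        System.out.print(new String(bytes, StandardCharsets.UTF_8));
        System.out.println("Main-Class read back: " + readMainClass(bytes));
    }
}
```

Build tools such as Ant or the jar command can write the same entry at packaging time, which avoids reopening the archive afterwards.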
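Set Main-Class to the fully qualified class name of the Driver, then copy the JAR to the HADOOP_HOME directory. Now there is no need to type the fully qualified name of the MapReduce class; use the alias defined in the Driver instead: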
bin/hadoop jar myexample.jar singlewordcount hdfs://master:9000/user/hadoop/input/* hdfs://master:9000/user/hadoop/output hello
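The alias mechanism boils down to a lookup table from a short name to a class whose main method is then called with the remaining arguments. The sketch below illustrates that idea in plain Java. It is a simplified illustration, not the code of Hadoop's ProgramDriver, and the names AliasDriverSketch and EchoJob are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified illustration of alias-based dispatch; not Hadoop's implementation.
class AliasDriverSketch {

    private final Map<String, Class<?>> programs = new LinkedHashMap<>();

    // Registers a class with a public static main(String[]) under a short alias.
    void addClass(String alias, Class<?> mainClass) {
        programs.put(alias, mainClass);
    }

    // Treats argv[0] as the alias and passes the remaining arguments to that
    // program's main method. Returns false if the alias is missing or unknown.
    boolean run(String[] argv) {
        if (argv.length == 0 || !programs.containsKey(argv[0])) {
            System.err.println("Valid program names are: " + programs.keySet());
            return false;
        }
        String[] rest = Arrays.copyOfRange(argv, 1, argv.length);
        try {
            Method main = programs.get(argv[0]).getMethod("main", String[].class);
            // Cast to Object so the array is passed as one argument, not as varargs.
            main.invoke(null, (Object) rest);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
        return true;
    }

    // Hypothetical stand-in for a job class such as SingleWordCount.
    public static class EchoJob {
        static String lastArgs;

        public static void main(String[] args) {
            lastArgs = String.join(" ", args);
            System.out.println("EchoJob called with: " + lastArgs);
        }
    }

    public static void main(String[] args) {
        AliasDriverSketch driver = new AliasDriverSketch();
        driver.addClass("echo", EchoJob.class);
        driver.run(new String[] { "echo", "in", "out", "hello" });
    }
}
```

This is why the command above passes singlewordcount as the first argument: the Driver consumes the alias and hands the input path, output path, and word to SingleWordCount.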
While the job runs, you may hit a file permission problem on the Hadoop MapReduce staging directory.
The error message looks like this:
job Submission failed with exception 'java.io.IOException(The ownership/permissions on the staging directory /tmp/hadoop-hadoop-user1/mapred/staging/hadoop-user1/.staging is not as expected. It is owned by hadoop-user1 and permissions are rwxrwxrwx. The directory must be owned by the submitter hadoop-user1 or by hadoop-user1 and permissions must be rwx------)
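To fix it, change the permissions on the Hadoop temporary directory (here /home/hadoop/tmp; use the directory that applies to your configuration) so that only the owner has access: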
bin/hadoop fs -chmod -R 700 /home/hadoop/tmp
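The octal mode 700 in the command corresponds to the rwx------ string that the error message asks for, while the rejected rwxrwxrwx is mode 777. The small sketch below shows the conversion; the class name ModeSketch is made up for illustration.

```java
// Hypothetical helper that converts an octal permission mode to symbolic form.
class ModeSketch {

    // Converts the low nine permission bits, e.g. 0700, to a string like "rwx------".
    static String toSymbolic(int mode) {
        String flags = "rwxrwxrwx";
        StringBuilder sb = new StringBuilder(9);
        for (int i = 0; i < 9; i++) {
            int bit = 1 << (8 - i); // highest bit is the owner's read permission
            sb.append((mode & bit) != 0 ? flags.charAt(i) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Parse the strings as base 8, the way chmod interprets them.
        System.out.println("700 -> " + toSymbolic(Integer.parseInt("700", 8))); // rwx------
        System.out.println("777 -> " + toSymbolic(Integer.parseInt("777", 8))); // rwxrwxrwx
    }
}
```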