Hadoop Cluster on Ubuntu (2): Compiling, Packaging, and Running Your Own MapReduce Program from the Command Line

Compiling, Packaging, and Running Your Own MapReduce Program from the Command Line (Hadoop 2.4.1)

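Online MapReduce WordCount tutorials almost always gloss over how to compile WordCount.java. Those that do cover it mostly describe the approach for old releases such as 0.20, namely javac -classpath /usr/local/hadoop/hadoop-1.0.1/hadoop-core-1.0.1.jar WordCount.java. The newer 2.x releases no longer ship a hadoop-core*.jar file, so compiling and packaging your own MapReduce program works differently than it did in the older versions.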

This article uses the WordCount example on Hadoop 2.4.1 to show how to compile your own MapReduce program under the 2.x releases.

Dependency jars in Hadoop 2.x

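In Hadoop 2.x the classes are no longer bundled into a single hadoop-core*.jar; they are split across several jars. Running the WordCount example, for instance, needs the following three: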

  • $HADOOP_HOME/share/hadoop/common/hadoop-common-2.4.1.jar
  • $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar
  • $HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar

Compiling and packaging a Hadoop MapReduce program

Add the jars listed above to the classpath:

export CLASSPATH="$HADOOP_HOME/share/hadoop/common/hadoop-common-2.4.1.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:$CLASSPATH"
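The long export line is easy to mistype, so here is a minimal sketch of a helper that assembles the same three jars into a classpath string and warns about any that are missing. The function name build_wordcount_classpath is hypothetical, and the default install path /usr/local/hadoop and version 2.4.1 are assumptions taken from the commands used elsewhere in this article.

```shell
# Hypothetical helper: join the three jars named above into one classpath string.
# $1 = Hadoop home (defaults to $HADOOP_HOME, then /usr/local/hadoop)
# $2 = Hadoop version (defaults to 2.4.1)
build_wordcount_classpath() {
  home="${1:-${HADOOP_HOME:-/usr/local/hadoop}}"
  version="${2:-2.4.1}"
  joined=""
  for jar in \
    "$home/share/hadoop/common/hadoop-common-$version.jar" \
    "$home/share/hadoop/mapreduce/hadoop-mapreduce-client-core-$version.jar" \
    "$home/share/hadoop/common/lib/commons-cli-1.2.jar"
  do
    # Warn on stderr instead of failing, so the result can still be inspected.
    [ -f "$jar" ] || echo "warning: missing $jar" >&2
    # Append with a ':' separator, except before the first entry.
    joined="${joined:+$joined:}$jar"
  done
  printf '%s\n' "$joined"
}
```

It could then be used as export CLASSPATH="$(build_wordcount_classpath):$CLASSPATH", which should be equivalent to the export line above when the jars are present.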

Now WordCount.java can be compiled (this is the WordCount.java from the 2.4.1 source tree; the source is listed at the end of this article):

javac WordCount.java

Note:

The steps above did not work in my environment; I recommend the following commands instead.

Set the environment variables:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar

Compile:

/usr/local/hadoop/bin/hadoop com.sun.tools.javac.Main WordCount.java

The compiler prints warnings, which can be ignored. After compiling, you will see that several .class files have been generated.
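The com.sun.tools.javac.Main approach depends on tools.jar, which ships with a JDK but not with a plain JRE. As a small sketch, the check below confirms the file exists before HADOOP_CLASSPATH is set; the function name has_tools_jar is hypothetical.

```shell
# Hypothetical helper: succeed only if the given JDK home contains lib/tools.jar.
# $1 = JDK home (defaults to $JAVA_HOME)
has_tools_jar() {
  jdk_home="${1:-${JAVA_HOME:-}}"
  [ -n "$jdk_home" ] && [ -f "$jdk_home/lib/tools.jar" ]
}

# Only export HADOOP_CLASSPATH when the jar is really there.
if has_tools_jar; then
  export HADOOP_CLASSPATH="$JAVA_HOME/lib/tools.jar"
else
  echo "tools.jar not found under JAVA_HOME; is JAVA_HOME pointing at a JDK?" >&2
fi
```

If the check fails, JAVA_HOME most likely points at a JRE or at the wrong directory.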

Figure: Compiling your own MapReduce program with javac

Next, package the .class files into a jar so that they can be run in Hadoop:

jar -cvf WordCount.jar ./WordCount*.class

Once packaging is done, try running it. First create a few input files:

mkdir input
echo "echo of the rainbow" > ./input/file0
echo "the waiting game" > ./input/file1

Figure: Creating the input for WordCount

Run the job:

/usr/local/hadoop/bin/hadoop jar WordCount.jar WordCount input output

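You may, however, run into the following error: Exception in thread "main" java.lang.NoClassDefFoundError: WordCount

Figure: Error reporting that the WordCount class cannot be found

Because the program declares a package, the class name in the command must be fully qualified with org.apache.hadoop.examples: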

/usr/local/hadoop/bin/hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
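A class in a package is looked up inside the jar by its package path, so when a class-not-found error persists it can help to check where the .class files actually sit in the jar. The sketch below assumes the JDK's jar tool is on PATH; the function names class_to_jar_entry and jar_has_class are hypothetical.

```shell
# Hypothetical helper: convert a fully qualified class name to the
# entry path that the JVM expects inside a jar.
class_to_jar_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: succeed if the jar ($1) contains the entry for class ($2).
# Assumes the JDK's `jar` tool is on PATH.
jar_has_class() {
  jar tf "$1" | grep -Fxq "$(class_to_jar_entry "$2")"
}
```

For example, jar_has_class WordCount.jar org.apache.hadoop.examples.WordCount succeeds only if the jar contains org/apache/hadoop/examples/WordCount.class. If it does not, one option is to recompile with javac -d . WordCount.java, which writes the classes into the package directory tree, and then package that tree with jar -cvf WordCount.jar org.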

Note:

I did not run into the problem above,

but I did hit the following one:

Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/hadoop/input

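This is the same problem I ran into with the standalone Hadoop setup.

For now I resolved it the same way as before.

Open the file with: vim /usr/local/hadoop/etc/hadoop/core-site.xml

Restore core-site.xml to its original state, that is, delete the content that was added to it. Judging by the hdfs://localhost:9000 prefix in the error, that content presumably set the default file system to HDFS; without it, Hadoop falls back to the local file system, so the relative path input resolves to the local input directory.

In vim, dd deletes the current line and i enters insert mode at the cursor.

After the deletion the job completed successfully, as expected.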
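An alternative to reverting core-site.xml is to leave HDFS as the default file system and copy the input there instead. The error shows that the relative path input resolves to /user/hadoop/input in HDFS, so this sketch prints the hdfs dfs commands that would stage a local directory at the matching location. It only prints them; nothing is executed. The function name print_hdfs_upload_commands is hypothetical, and the approach assumes the HDFS daemons are running.

```shell
# Hypothetical helper: print (not run) the commands that would copy a local
# directory to the HDFS home directory of the given user.
# $1 = local directory, $2 = HDFS user (defaults to the current user)
print_hdfs_upload_commands() {
  local_dir="$1"
  hdfs_user="${2:-$(id -un)}"
  # Relative HDFS paths resolve against /user/<username>.
  target="/user/$hdfs_user/$(basename "$local_dir")"
  printf 'hdfs dfs -mkdir -p %s\n' "$target"
  printf 'hdfs dfs -put %s/* %s\n' "$local_dir" "$target"
}
```

Running print_hdfs_upload_commands ./input hadoop prints an hdfs dfs -mkdir -p line followed by an hdfs dfs -put line, which can be reviewed and then run by hand. Note that with this approach the job's output directory also ends up in HDFS rather than on the local disk.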


A successful run produces the following output:

Figure: WordCount output

Note:

The second command can also be written as:

cat output/*
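To sanity-check the job's output, the same counts can be computed locally with standard Unix tools. This is only a sketch for small inputs and is not how Hadoop computes the result; the function name local_wordcount is hypothetical. It prints one word<TAB>count pair per line, sorted by word.

```shell
# Hypothetical helper: count whitespace-separated words in every file
# under a directory and print "word<TAB>count", sorted by word.
local_wordcount() {
  cat "$1"/* \
    | tr -s '[:space:]' '\n' \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For the two input files created earlier, local_wordcount input reports a count of 2 for "the" and 1 for each of "echo", "game", "of", "rainbow", and "waiting", which is what cat output/* should show as well.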


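Going further: compiling and running MapReduce programs with Eclipse

Compiling and running MapReduce programs from the command line is somewhat tedious, since every change means compiling and packaging by hand again. Doing the same in Eclipse is more convenient.

WordCount.java source

The file is located in hadoop-2.4.1-src\hadoop-mapreduce-project\hadoop-mapreduce-examples\src\main\java\org\apache\hadoop\examples: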

 
   
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Emits (word, 1) for every whitespace-separated token in an input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Sums the counts for each word; also used as the combiner.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
