Hadoop version: hadoop-0.20.2
Running a wordcount MapReduce job
1. Start Hadoop
$ cd hadoop-0.20.2/bin
bin$ hadoop namenode -format
bin$ start-all.sh
bin$ jps
bin$ hadoop dfsadmin -safemode leave
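The jps listing is easy to misread when a daemon silently failed to start. Below is a small sketch that lists the missing ones, assuming a single-node setup where start-all.sh should bring up the five 0.20.x daemons named in the loop. `missing_daemons` is a hypothetical helper name, not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: report which expected Hadoop 0.20.x daemons are
# absent from jps output. Reads "<pid> <ClassName>" lines on stdin.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare against the whole second field, so "NameNode" is not
        # satisfied by a "SecondaryNameNode" line.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Usage (on the machine running Hadoop):
#   jps | missing_daemons
```

No output means all five daemons were found.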
2. Prepare the input data
bin$ hadoop fs -mkdir input
bin$ hadoop fs -put /home/dic/input/163 input
bin$ hadoop fs -put /home/dic/input/sina input
bin$ hadoop fs -ls input
3. Run wordcount
bin$ hadoop jar ../hadoop-0.20.2/hadoop-0.20.2-examples.jar wordcount input output
4. Check the output
bin$ hadoop fs -ls output
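To sanity-check the results on a small input, the counts can be reproduced locally and compared by eye. This is a rough sketch, assuming whitespace-separated tokens and byte-order sorting, which is my approximation of what the example job does and is not taken from its source. `local_wordcount` is a made-up helper name, and the `part-*` pattern in the usage comment is an assumption about the output file names.

```shell
#!/bin/sh
# Hypothetical helper: count whitespace-separated words read from stdin
# and print "word<TAB>count" lines sorted by word.
local_wordcount() {
    tr -s '[:space:]' '\n' |      # one token per line
        grep -v '^$' |            # drop the empty line left by leading whitespace
        LC_ALL=C sort |           # byte-order sort so equal words are adjacent
        uniq -c |                 # prefix each distinct word with its count
        awk '{ print $2 "\t" $1 }'
}

# Usage:
#   cat <local input files> | local_wordcount
#   hadoop fs -cat 'output/part-*'     # job output, for comparison
```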
--------------------------------------------------------------------------------------------------------------
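Problems encountered
1: Running the wordcount program printed nothing on the console
Cause: after startup my Hadoop was in safe mode by default. In safe mode HDFS is read-only, so files cannot be created, modified, or deleted. I therefore ran
hadoop dfsadmin -safemode leave
to leave safe mode.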
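Forcing the NameNode out of safe mode on every start is not always needed. Below is a minimal sketch that checks the state first, assuming `hadoop dfsadmin -safemode get` prints a line containing "ON" while safe mode is active. `ensure_safemode_off` is a hypothetical function name.

```shell
#!/bin/sh
# Hypothetical helper: leave safe mode only if the NameNode reports it is ON.
ensure_safemode_off() {
    status=$(hadoop dfsadmin -safemode get)
    case "$status" in
        *ON*) hadoop dfsadmin -safemode leave ;;   # still in safe mode: force leave
        *)    echo "$status" ;;                    # already off: just show the status
    esac
}

# Usage:
#   ensure_safemode_off
```

A gentler alternative is `hadoop dfsadmin -safemode wait`, which blocks until the NameNode leaves safe mode on its own.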
2: The console reported an error
Exception in thread "main" java.io.IOException: Error opening job jar: hadoop-0.20.2-examples.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:131)
at java.util.jar.JarFile.<init>(JarFile.java:150)
at java.util.jar.JarFile.<init>(JarFile.java:87)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
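This error means the job jar could not be found. The jar path is resolved relative to the current directory, so it has to match where the command is run from.
Correct command: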
bin$ hadoop jar ../hadoop-0.20.2/hadoop-0.20.2-examples.jar wordcount input output
or, from the hadoop-0.20.2 directory:
$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
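To catch a wrong jar path before Hadoop throws the ZipException, the path can be checked first. This is a sketch only: `find_examples_jar` is a hypothetical helper, and the candidate paths in the usage comment are examples that depend on your own layout.

```shell
#!/bin/sh
# Hypothetical helper: print the first readable file among the candidate
# paths, or fail with a message on stderr.
find_examples_jar() {
    for candidate in "$@"; do
        if [ -r "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no readable job jar among: $*" >&2
    return 1
}

# Usage (candidate paths are examples, adjust to your layout):
#   jar=$(find_examples_jar ../hadoop-0.20.2-examples.jar hadoop-0.20.2-examples.jar) &&
#       hadoop jar "$jar" wordcount input output
```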
3: The console reported an error
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory output already exists
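This is because each MapReduce job must write to its own output directory, and that directory must not exist when the job starts. Delete the old output directory or choose a new name before running the job again.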
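One way to avoid the name clash is to give every run its own output directory. This is a minimal sketch: `fresh_output_dir` is a hypothetical helper and the timestamp naming scheme is my own choice, nothing Hadoop requires.

```shell
#!/bin/sh
# Hypothetical helper: build a per-run directory name such as
# "output-20240101-120000" from an optional prefix (default "output").
fresh_output_dir() {
    prefix=${1:-output}
    echo "$prefix-$(date +%Y%m%d-%H%M%S)"
}

# Usage, either of:
#   hadoop fs -rmr output      # remove the old results, then rerun as before
#   hadoop jar ../hadoop-0.20.2/hadoop-0.20.2-examples.jar wordcount input "$(fresh_output_dir)"
```

Two runs started within the same second would still collide, which is acceptable for manual use.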