04 - Running the wordcount example program
Next, let's look at Hadoop's bundled example program:
hadoop-0.20.2-examples.jar
Note that the wordcount.txt used here contains the line:
hello hadoop credream hello hadoop credream hello hadoop credream hello hadoop credream hello hadoop credream
repeated for 15 lines.
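As a quick sketch (not part of the original transcript), the input file can be generated from the shell before uploading; the filename and repeat count follow the text above:

```shell
# Build wordcount.txt: the sample line, written 15 times
line="hello hadoop credream hello hadoop credream hello hadoop credream hello hadoop credream hello hadoop credream"
for i in $(seq 1 15); do
  echo "$line"
done > wordcount.txt

wc -l wordcount.txt   # prints: 15 wordcount.txt
```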
Then upload it to the /test directory in HDFS:
hadoop fs -put wordcount.txt /test/
hadoop fs -ls /test
The listing shows the upload succeeded.
xiaofeng@xiaofeng-PC /opt/hadoop
$ cd /opt/hadoop    // enter the hadoop directory
xiaofeng@xiaofeng-PC /opt/hadoop
$ ls    // list the files under the hadoop directory
CHANGES.txt conf hadoop-0.20.2-core.jar librecordio
LICENSE.txt conf-local hadoop-0.20.2-examples.jar logs
NOTICE.txt conf-pseudo hadoop-0.20.2-test.jar src
README.txt conf.lnk hadoop-0.20.2-tools.jar webapps
bin contrib ivy
build.xml docs ivy.xml
c++ hadoop-0.20.2-ant.jar lib
hadoop-0.20.2-examples.jar    // this jar contains Hadoop's example programs, including wordcount
xiaofeng@xiaofeng-PC /opt/hadoop
$ hadoop jar hadoop-0.20.2-examples.jar    // running the jar with no arguments lists the available examples:
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  dbcount: An example job that count the pageview counts from a database.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using monte-carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sleep: A job that sleeps at each map and reduce task.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  // this program counts how many times each word occurs in the input files
  wordcount: A map/reduce program that counts the words in the input files.
xiaofeng@xiaofeng-PC /opt/hadoop
$ hadoop jar hadoop-0.20.2-examples.jar wordcount
Usage: wordcount <in> <out>
// running wordcount with no arguments prints its usage
xiaofeng@xiaofeng-PC /opt/hadoop
$ hadoop jar hadoop-0.20.2-examples.jar wordcount /test/wordcount.txt /test/result
// this runs the wordcount example in hadoop-0.20.2-examples.jar against /test/wordcount.txt in HDFS and writes the counts to /test/result
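For intuition, the same computation can be sketched locally with a classic shell pipeline. This only illustrates *what* the job computes, not how Hadoop executes it; the sample input here is hypothetical:

```shell
# Sample input (two lines of the tutorial's words)
printf 'hello hadoop credream\nhello hadoop credream\n' > sample.txt

# One word per line, sort so identical words are adjacent, count each run --
# a shell analogue of the map, shuffle, and reduce phases.
tr -s ' ' '\n' < sample.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
```

Each output line is `word<TAB>count`, which matches the format of the job's part-r-00000 output file.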
While the job runs, its status can be watched at http://localhost:50030/jobtracker.jsp:
Running Jobs (just after submission):

Jobid | Priority | User | Name | Map % Complete | Map Total | Maps Completed | Reduce % Complete | Reduce Total | Reduces Completed | Job Scheduling Information
job_201304211117_0001 | NORMAL | xiaofeng-pc\xiaofeng | word count | 0.00% | 1 | 0 | 0.00% | 1 | 0 | NA

Refreshing the page later shows the same job with both phases finished:

Jobid | Priority | User | Name | Map % Complete | Map Total | Maps Completed | Reduce % Complete | Reduce Total | Reduces Completed | Job Scheduling Information
job_201304211117_0001 | NORMAL | xiaofeng-pc\xiaofeng | word count | 100.00% | 1 | 1 | 100.00% | 1 | 1 | NA
You can see that the job runs its Map phase first and then Reduce; once it finishes, it no longer appears under Running Jobs and instead shows up under Completed Jobs at 100%.
To view the result, open http://127.0.0.1:50070/, browse into the HDFS filesystem, and navigate into the result directory under /test:
Go to parent directory

Name | Type | Size | Replication | Block Size | Modification Time | Permission | Owner | Group
_logs | dir | | | | 2013-04-21 11:52 | rwxr-xr-x | xiaofeng-pc\xiaofeng | supergroup
part-r-00000 | file | 0.03 KB | 3 | 64 MB | 2013-04-21 11:52 | rw-r--r-- | xiaofeng-pc\xiaofeng | supergroup
Click part-r-00000 to view the result:
credream 65
hadoop 65
hello 65
The counts have been computed.
The same result can also be viewed from the command line:
xiaofeng@xiaofeng-PC /opt/hadoop
$ hadoop fs -ls /test/result    // list the output directory
Found 2 items
drwxr-xr-x - xiaofeng-pc\xiaofeng supergroup 0 2013-04-21 11:52 /test/result/_logs
-rw-r--r-- 3 xiaofeng-pc\xiaofeng supergroup 31 2013-04-21 11:52 /test/result/part-r-00000
xiaofeng@xiaofeng-PC /opt/hadoop
$ hadoop fs -cat /test/result/part*    // note the wildcard: this matches every file whose name starts with "part"
credream 65
hadoop 65
hello 65
xiaofeng@xiaofeng-PC /opt/hadoop
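Since the output is one `word<TAB>count` pair per line, it is easy to post-process. A small sketch, using the three result lines above as sample input, sums the count column with awk:

```shell
# Sum the count column of wordcount output (word<TAB>count per line)
printf 'credream\t65\nhadoop\t65\nhello\t65\n' |
  awk -F'\t' '{ total += $2 } END { print total }'
# prints: 195
```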
The example jar hadoop-0.20.2-examples.jar can also be found locally under D:\hadoop4win\opt\hadoop.
The same steps on another machine: first upload a file to the server, then run the count:
root@linux:/home/wangjian# hadoop fs -copyFromLocal a.txt /wj/a.txt
root@linux:/home/wangjian# cd /opt/hadoop-0.20.2/
root@linux:/opt/hadoop-0.20.2# hadoop jar hadoop-0.20.2-examples.jar wordcount /wj/a.txt /wj/result2
4. View the result (by default the output file is /wj/result2/part-r-00000):
root@linux:/opt/hadoop-0.20.2# hadoop fs -cat /wj/result2/part*