Cloud Computing Study Notes --- 04 - Running Hadoop's Example Program: Word Counting with the wordcount Example

04 - Running the wordcount example program

Let's take a look at one of Hadoop's example programs:

hadoop-0.20.2-examples.jar

Note that the wordcount.txt used here contains:

hello hadoop credream hello hadoop credream hello hadoop credream hello hadoop credream hello hadoop credream

This line is repeated (the repeat count in the original note is truncated); judging from the job output below, each word ends up appearing 65 times in the file.
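Since the exact file contents are truncated above, here is one hypothetical way to build an input file with the same word totals locally before uploading it (the filename wordcount.txt matches the walkthrough; the repeat count 65 is inferred from the job output, not stated in the original):

```shell
# Generate an input in which each of the three words appears exactly 65 times.
# (65 matches the counts the wordcount job reports later in this note.)
yes "hello hadoop credream" | head -n 65 > wordcount.txt

# Sanity-check before uploading: 65 lines, one "hello" per line.
wc -l < wordcount.txt
grep -c "hello" wordcount.txt
```

Each line contributes one occurrence of each word, so the local check should agree with what the MapReduce job computes later.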

Then upload it to the /test directory in HDFS:

hadoop fs -put wordcount.txt /test/

hadoop fs -ls /test

You can see that the upload succeeded.

xiaofeng@xiaofeng-PC /opt/hadoop

$ cd /opt/hadoop  // change into the hadoop directory

 

xiaofeng@xiaofeng-PC /opt/hadoop

$ ls  // list the files in the hadoop directory

CHANGES.txt  conf         hadoop-0.20.2-core.jar      librecordio
LICENSE.txt  conf-local   hadoop-0.20.2-examples.jar  logs
NOTICE.txt   conf-pseudo  hadoop-0.20.2-test.jar      src
README.txt   conf.lnk     hadoop-0.20.2-tools.jar     webapps
bin          contrib      ivy
build.xml    docs         ivy.xml
c++          hadoop-0.20.2-ant.jar     lib

hadoop-0.20.2-examples.jar  // this is the wordcount example program that ships with Hadoop

xiaofeng@xiaofeng-PC /opt/hadoop

$ hadoop jar hadoop-0.20.2-examples.jar  // running this command prints the list of available examples

An example program must be given as the first argument.

Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  dbcount: An example job that count the pageview counts from a database.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using monte-carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sleep: A job that sleeps at each map and reduce task.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  // this is the program that counts how many times each word appears in the input files
  wordcount: A map/reduce program that counts the words in the input files.

xiaofeng@xiaofeng-PC /opt/hadoop

$ hadoop jar hadoop-0.20.2-examples.jar wordcount

Usage: wordcount <in> <out>

// the program prints its usage here

xiaofeng@xiaofeng-PC /opt/hadoop

$ hadoop jar hadoop-0.20.2-examples.jar wordcount /test/wordcount.txt /test/result

// this runs hadoop-0.20.2-examples.jar to count the words in /test/wordcount.txt in HDFS and write the counts to /test/result
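Conceptually, the wordcount job does the same thing as this plain Unix pipeline (a sketch with standard tools, not Hadoop itself): splitting lines into words plays the role of the map phase, sort is the shuffle, and counting each group is the reduce.

```shell
# Emulate wordcount on the local copy of the input file:
#   map:     emit one word per line
#   shuffle: bring identical words together
#   reduce:  count each group, then print "word<TAB>count"
tr -s ' ' '\n' < wordcount.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
```

Running this against the same input gives the same word/count pairs the job will write to its part file.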

While the job is running you can watch its status at http://localhost:50030/jobtracker.jsp

Running Jobs
  Jobid:                      job_201304211117_0001
  Priority:                   NORMAL
  User:                       xiaofeng-pc\xiaofeng
  Name:                       word count
  Map % Complete:             0.00%  (Map Total: 1, Maps Completed: 0)
  Reduce % Complete:          0.00%  (Reduce Total: 1, Reduces Completed: 0)
  Job Scheduling Information: NA

Completed Jobs
  Jobid:                      job_201304211117_0001
  Priority:                   NORMAL
  User:                       xiaofeng-pc\xiaofeng
  Name:                       word count
  Map % Complete:             100.00%  (Map Total: 1, Maps Completed: 1)
  Reduce % Complete:          100.00%  (Reduce Total: 1, Reduces Completed: 1)
  Job Scheduling Information: NA

You can see that the job runs its Map phase first and then its Reduce phase; once both finish, the job disappears from Running Jobs and shows up under Completed Jobs at 100%.

Now view the counts: open http://127.0.0.1:50070/, click through into the HDFS filesystem, then browse to the result directory under /test:

Go to parent directory

Name          Type  Size     Replication  Block Size  Modification Time  Permission  Owner                 Group
_logs         dir                                     2013-04-21 11:52   rwxr-xr-x   xiaofeng-pc\xiaofeng  supergroup
part-r-00000  file  0.03 KB  3            64 MB       2013-04-21 11:52   rw-r--r--   xiaofeng-pc\xiaofeng  supergroup

 

Click part-r-00000 to view its contents:

credream  65
hadoop    65
hello     65

You can see the counts have been computed.

You can also view them from the command line:

xiaofeng@xiaofeng-PC /opt/hadoop

$ hadoop fs -ls /test/result  // list the result files

Found 2 items
drwxr-xr-x   - xiaofeng-pc\xiaofeng supergroup          0 2013-04-21 11:52 /test/result/_logs
-rw-r--r--   3 xiaofeng-pc\xiaofeng supergroup         31 2013-04-21 11:52 /test/result/part-r-00000

 

xiaofeng@xiaofeng-PC /opt/hadoop

$ hadoop fs -cat /test/result/part*  // note the wildcard: this matches every file whose name starts with "part"

credream  65
hadoop    65
hello     65
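The part-r-00000 file is plain text with one word, a tab, and a count per line, so it is easy to post-process once you copy it out of HDFS (for example with hadoop fs -get). A small sketch that totals the counts column (the local filename part-r-00000 here is an assumption matching the HDFS name):

```shell
# part-r-00000 lines look like: word<TAB>count
# Sum the counts column to get the total number of words in the input.
awk -F '\t' '{ total += $2 } END { print total }' part-r-00000
```

For the file above (three words at 65 occurrences each) the total is 195, which should equal the number of words in wordcount.txt.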

 

Incidentally, you can also find hadoop-0.20.2-examples.jar under D:\hadoop4win\opt\hadoop.

Another MapReduce word-count run

First upload a file to the server, then run the count:

root@linux:/home/wangjian# hadoop fs -copyFromLocal a.txt /wj/a.txt

root@linux:/home/wangjian# cd /opt/hadoop-0.20.2/

root@linux:/opt/hadoop-0.20.2# hadoop jar hadoop-0.20.2-examples.jar wordcount /wj/a.txt /wj/result2

View the result. By default the result file is /wj/result2/part-r-00000:

 

root@linux:/opt/hadoop-0.20.2# hadoop fs -cat /wj/result2/part*
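As before, you can predict what the job will report for a.txt with a small local sketch. This awk version accumulates counts in an associative array, much as the wordcount reducer does (an illustration only, not Hadoop code; it assumes a local copy of a.txt):

```shell
# Count words in a local file with awk:
# each input field bumps a per-word counter; at the end,
# print every word with its count, one "word<TAB>count" per line.
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
     END { for (w in count) print w "\t" count[w] }' a.txt | sort
```

The sorted output should line up with what hadoop fs -cat shows for /wj/result2/part*.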

 

 
