Yesterday I finished setting up the Hadoop environment; today let's run one of Hadoop's bundled examples to verify it.
Create two files on the NameNode:
root@wenbo00:/home/wenbo# echo 'Hello world bye world' > file01
root@wenbo00:/home/wenbo# echo 'hello hadoop goodbye hadoop' > file02
Create an input directory in HDFS:
root@wenbo00:/home/wenbo# hadoop fs -mkdir input
Copy the two files we just created into the input directory:
root@wenbo00:/home/wenbo# hadoop fs -copyFromLocal file0* input
Run the following command to confirm the copy:
root@wenbo00:/home/hadoop-1.0.1# hadoop fs -ls input/
The output should be:
Found 2 items
-rw-r--r--   1 root supergroup         22 2012-03-13 19:44 /user/root/input/file01
-rw-r--r--   1 root supergroup         28 2012-03-13 19:44 /user/root/input/file02
Run the wordcount example that ships with Hadoop, writing its results to the output directory:
root@wenbo00:/home/wenbo# hadoop jar /home/hadoop-1.0.1/hadoop-examples-1.0.1.jar wordcount input output
You should see log output like the following:
****hdfs://wenbo00:9000/user/root/input
12/03/13 19:47:21 INFO input.FileInputFormat: Total input paths to process : 2
12/03/13 19:47:22 INFO mapred.JobClient: Running job: job_201203131940_0001
12/03/13 19:47:23 INFO mapred.JobClient:  map 0% reduce 0%
12/03/13 19:47:37 INFO mapred.JobClient:  map 50% reduce 0%
12/03/13 19:47:40 INFO mapred.JobClient:  map 100% reduce 0%
12/03/13 19:47:52 INFO mapred.JobClient:  map 100% reduce 100%
12/03/13 19:47:57 INFO mapred.JobClient: Job complete: job_201203131940_0001
12/03/13 19:47:57 INFO mapred.JobClient: Counters: 30
12/03/13 19:47:57 INFO mapred.JobClient:   Job Counters
12/03/13 19:47:57 INFO mapred.JobClient:     Launched reduce tasks=1
12/03/13 19:47:57 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=19732
12/03/13 19:47:57 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/03/13 19:47:57 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/03/13 19:47:57 INFO mapred.JobClient:     Rack-local map tasks=1
12/03/13 19:47:57 INFO mapred.JobClient:     Launched map tasks=2
12/03/13 19:47:57 INFO mapred.JobClient:     Data-local map tasks=1
12/03/13 19:47:57 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14004
12/03/13 19:47:57 INFO mapred.JobClient:   File Output Format Counters
12/03/13 19:47:57 INFO mapred.JobClient:     Bytes Written=49
12/03/13 19:47:57 INFO mapred.JobClient:   FileSystemCounters
12/03/13 19:47:57 INFO mapred.JobClient:     FILE_BYTES_READ=79
12/03/13 19:47:57 INFO mapred.JobClient:     HDFS_BYTES_READ=264
12/03/13 19:47:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=64654
12/03/13 19:47:57 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=49
12/03/13 19:47:57 INFO mapred.JobClient:   File Input Format Counters
12/03/13 19:47:57 INFO mapred.JobClient:     Bytes Read=50
12/03/13 19:47:57 INFO mapred.JobClient:   Map-Reduce Framework
12/03/13 19:47:57 INFO mapred.JobClient:     Map output materialized bytes=85
12/03/13 19:47:57 INFO mapred.JobClient:     Map input records=2
12/03/13 19:47:57 INFO mapred.JobClient:     Reduce shuffle bytes=85
12/03/13 19:47:57 INFO mapred.JobClient:     Spilled Records=12
12/03/13 19:47:57 INFO mapred.JobClient:     Map output bytes=82
12/03/13 19:47:57 INFO mapred.JobClient:     CPU time spent (ms)=3000
12/03/13 19:47:57 INFO mapred.JobClient:     Total committed heap usage (bytes)=336404480
12/03/13 19:47:57 INFO mapred.JobClient:     Combine input records=8
12/03/13 19:47:57 INFO mapred.JobClient:     SPLIT_RAW_BYTES=214
12/03/13 19:47:57 INFO mapred.JobClient:     Reduce input records=6
12/03/13 19:47:57 INFO mapred.JobClient:     Reduce input groups=6
12/03/13 19:47:57 INFO mapred.JobClient:     Combine output records=6
12/03/13 19:47:57 INFO mapred.JobClient:     Physical memory (bytes) snapshot=384741376
12/03/13 19:47:57 INFO mapred.JobClient:     Reduce output records=6
12/03/13 19:47:57 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1573851136
12/03/13 19:47:57 INFO mapred.JobClient:     Map output records=8
The counters show Map output records=8, i.e. 8 words were counted in total.
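The counter values in the log can be traced with a minimal Python sketch of the same map/combine/reduce dataflow (this is an illustration of what the counters measure, not Hadoop's actual implementation):

```python
from collections import Counter

# The two input files created on the NameNode in the steps above.
files = {
    "file01": "Hello world bye world",
    "file02": "hello hadoop goodbye hadoop",
}

# Map phase: emit one (word, 1) pair per token. WordCount's tokenizer is
# case-sensitive, so "Hello" and "hello" remain distinct keys.
map_output = []
for line in files.values():          # Map input records=2 (one line per file)
    for word in line.split():
        map_output.append((word, 1))
# len(map_output) corresponds to "Map output records=8".

# Combine phase: runs per mapper, pre-aggregating that mapper's output.
combine_output = []
for line in files.values():
    combine_output.extend(Counter(line.split()).items())
# "Combine input records=8", "Combine output records=6" (3 keys per mapper).

# Reduce phase: sum the partial counts for each word.
result = Counter()
for word, count in combine_output:   # Reduce input records=6, groups=6
    result[word] += count
# len(result) corresponds to "Reduce output records=6".

for word in sorted(result):
    print(word, result[word])
```

Sorting the keys reproduces the order seen in part-r-00000 below, since uppercase letters sort before lowercase ones.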
To inspect the result, list the files generated in the output directory:
root@wenbo00:/home/hadoop-1.0.1# hadoop fs -ls output
The listing shows:
Found 3 items
-rw-r--r--   1 root supergroup          0 2012-03-13 19:47 /user/root/output/_SUCCESS
drwxr-xr-x   - root supergroup          0 2012-03-13 19:47 /user/root/output/_logs
-rw-r--r--   1 root supergroup         49 2012-03-13 19:47 /user/root/output/part-r-00000
part-r-00000 holds the job's results:
root@wenbo00:/home/hadoop-1.0.1# hadoop fs -cat output/part-r-00000
Its contents are:
Hello   1
bye     1
goodbye 1
hadoop  2
hello   1
world   2
That is 6 distinct words and 8 occurrences in total; the example ran successfully.