Running the Hadoop WordCount Example on Linux
Ubuntu terminal shortcut: Ctrl + Alt + T
Start Hadoop with: start-all.sh
Normal output looks like this:
hadoop@HADOOP:~$ start-all.sh
Warning: $HADOOP_HOME is deprecated.
starting namenode, logging to /home/hadoop/hadoop-1.1.2/libexec/../logs/hadoop-hadoop-namenode-HADOOP.MAIN.out
HADOOP.MAIN: starting datanode, logging to /home/hadoop/hadoop-1.1.2/libexec/../logs/hadoop-hadoop-datanode-HADOOP.MAIN.out
HADOOP.MAIN: starting secondarynamenode, logging to /home/hadoop/hadoop-1.1.2/libexec/../logs/hadoop-hadoop-secondarynamenode-HADOOP.MAIN.out
starting jobtracker, logging to /home/hadoop/hadoop-1.1.2/libexec/../logs/hadoop-hadoop-jobtracker-HADOOP.MAIN.out
HADOOP.MAIN: starting tasktracker, logging to /home/hadoop/hadoop-1.1.2/libexec/../logs/hadoop-hadoop-tasktracker-HADOOP.MAIN.out
Use jps to check which Hadoop daemons are running:
hadoop@HADOOP:~$ jps
3615 Jps
2699 NameNode
3461 TaskTracker
2922 DataNode
3137 SecondaryNameNode
3231 JobTracker
Create a local directory:
hadoop@HADOOP:~$ mkdir ~/file
Create two .txt files inside the file directory:
hadoop@HADOOP:~$ cd file
hadoop@HADOOP:~/file$ echo "Hello World" > file1.txt
hadoop@HADOOP:~/file$ echo "Hello Hadoop" > file2.txt
hadoop@HADOOP:~/file$ ls
file1.txt file2.txt
hadoop@HADOOP:~/file$
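Before uploading, you can sanity-check what counts the job should produce for these two files. A quick stdlib-Python preview (not part of the Hadoop workflow, just a check):

```python
# Preview the expected word counts for the two sample files locally.
from collections import Counter

files = {
    "file1.txt": "Hello World\n",
    "file2.txt": "Hello Hadoop\n",
}

counts = Counter()
for content in files.values():
    counts.update(content.split())

for word in sorted(counts):
    print(word, counts[word])
```

This prints Hadoop 1, Hello 2, World 1, which is what the job's output file should contain at the end.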
Create an input directory on HDFS:
hadoop@HADOOP:~/file$ hadoop fs -mkdir /input
Check that the input directory was created:
hadoop@HADOOP:~$ hadoop fs -ls
Warning: $HADOOP_HOME is deprecated.
Found 5 items
-rw-r--r-- 3 Administrator supergroup 6296230 2014-09-03 10:38 cloud.txt
drwxr-xr-x - hadoop supergroup 0 2014-09-02 16:31 hadi_curbm
drwxr-xr-x - hadoop supergroup 0 2014-09-04 09:59 /input
drwxr-xr-x - hadoop supergroup 0 2014-09-02 16:31 /pegasus
hadoop@HADOOP:~$
You can see the directory was created at /input.
Upload the local files to the /input directory:
hadoop@HADOOP:~$ hadoop fs -put ~/file/*.txt /input
(To confirm the files landed in /input, run hadoop fs -ls /input.)
Locate the examples.jar package under the hadoop directory (that is, find where your packaged jar lives; skip this step if you already know its location):
hadoop@HADOOP:~$ cd hadoop-1.1.2
hadoop@HADOOP:~/hadoop-1.1.2$ ls
bin docs hadoop-test-1.1.2.jar LICENSE.txt src
build.xml hadoop-ant-1.1.2.jar hadoop-tools-1.1.2.jar logs webapps
c++ hadoop-client-1.1.2.jar ivy NOTICE.txt wordcount.jar
CHANGES.txt hadoop-core-1.1.2.jar ivy.xml README.txt
conf hadoop-examples-1.1.2.jar lib sbin
contrib hadoop-minicluster-1.1.2.jar libexec share
hadoop@HADOOP:~/hadoop-1.1.2$
Run the jar to compute the word count of the files under /input. (Note: the class name after the jar should include its package, e.g. com.mc.WordCount, where com.mc is the package and WordCount is the class containing the main method; writing the fully qualified name avoids class-not-found errors.)
hadoop@HADOOP:~$ hadoop jar /home/hadoop/hadoop-1.1.2/hadoop-examples-1.1.2.jar com.mc.WordCount /input /output
(jar path, then input directory, then output directory)
Warning: $HADOOP_HOME is deprecated.
14/09/04 10:10:44 INFO input.FileInputFormat: Total input paths to process : 0
14/09/04 10:10:45 INFO mapred.JobClient: Running job: job_201409040943_0001
14/09/04 10:10:46 INFO mapred.JobClient:  map 0% reduce 0%
14/09/04 10:10:54 INFO mapred.JobClient:  map 0% reduce 100%
14/09/04 10:10:55 INFO mapred.JobClient: Job complete: job_201409040943_0001
14/09/04 10:10:55 INFO mapred.JobClient: Counters: 18
14/09/04 10:10:55 INFO mapred.JobClient:   Job Counters
14/09/04 10:10:55 INFO mapred.JobClient:     Launched reduce tasks=1
14/09/04 10:10:55 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=4087
14/09/04 10:10:55 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/04 10:10:55 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/04 10:10:55 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=4068
14/09/04 10:10:55 INFO mapred.JobClient:   File Output Format Counters
14/09/04 10:10:55 INFO mapred.JobClient:     Bytes Written=0
14/09/04 10:10:55 INFO mapred.JobClient:   FileSystemCounters
14/09/04 10:10:55 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=55309
14/09/04 10:10:55 INFO mapred.JobClient:   Map-Reduce Framework
14/09/04 10:10:55 INFO mapred.JobClient:     Reduce input groups=0
14/09/04 10:10:55 INFO mapred.JobClient:     Combine output records=0
14/09/04 10:10:55 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/09/04 10:10:55 INFO mapred.JobClient:     Physical memory (bytes) snapshot=35037184
14/09/04 10:10:55 INFO mapred.JobClient:     Reduce output records=0
14/09/04 10:10:55 INFO mapred.JobClient:     Spilled Records=0
14/09/04 10:10:55 INFO mapred.JobClient:     CPU time spent (ms)=120
14/09/04 10:10:55 INFO mapred.JobClient:     Total committed heap usage (bytes)=15925248
14/09/04 10:10:55 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=377499648
14/09/04 10:10:55 INFO mapred.JobClient:     Combine input records=0
14/09/04 10:10:55 INFO mapred.JobClient:     Reduce input records=0
hadoop@HADOOP:~$
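The logic that the com.mc.WordCount class runs here can be illustrated outside Hadoop. The real class is written against the Hadoop Java API (a Mapper and a Reducer); the following is only a minimal pure-Python sketch of the three phases the framework executes, using the same two input lines as the example:

```python
# Pure-Python sketch of WordCount's map / shuffle / reduce phases.
# This is NOT the Hadoop API -- just an illustration of the data flow.
from collections import defaultdict

def map_phase(lines):
    # map: emit a (word, 1) pair for every word on every input line
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # shuffle: group the emitted 1s by key, as Hadoop does between phases
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return groups

def reduce_phase(groups):
    # reduce: sum the values for each key
    return {word: sum(ones) for word, ones in groups.items()}

lines = ["Hello World", "Hello Hadoop"]  # contents of file1.txt and file2.txt
result = reduce_phase(shuffle(map_phase(lines)))
for word in sorted(result):
    print(word, result[word])
```

Running it prints the same counts the job produces: Hadoop 1, Hello 2, World 1.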
List the output directory:
hadoop@HADOOP:~$ hadoop fs -ls /output
Warning: $HADOOP_HOME is deprecated.
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2014-09-04 10:10 /user/hadoop/output/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2014-09-04 10:10 /user/hadoop/output/_logs
-rw-r--r-- 1 hadoop supergroup 0 2014-09-04 10:10 /output/part-r-00000
hadoop@HADOOP:~$
View the result:
hadoop@HADOOP:~$ hadoop fs -cat /output/part-r-00000
Hadoop 1
Hello 2
World 1
(Note: do not create the /output directory yourself; the job creates it automatically. Also remember to delete it afterwards with hadoop fs -rmr /output, otherwise the next run will fail with an error saying the directory already exists.)
The overall workflow is:
① Write and compile the WordCount code in a development tool such as Eclipse.
② Export it from Eclipse as a jar and put the jar wherever you like (in the example above it sits at /home/hadoop/hadoop-1.1.2/hadoop-examples-1.1.2.jar).
③ Create a local directory (file in the example) to hold the input data; you can write the data yourself or copy existing data in, and the file format is not fixed (the example writes it with echo "Hello World" > file1.txt, i.e. plain .txt files).
④ Create an input directory on HDFS (/input in the example) and upload, i.e. copy, the local data from the file directory into it.
⑤ Run the jar so its program processes the uploaded data.
⑥ View the results; they are saved automatically to the output directory (/output in the example), and you can cat the output file to see them.
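These six steps can be simulated end-to-end in plain Python, with a temporary directory standing in for HDFS. All paths below mirror the example but are purely illustrative; no Hadoop is involved:

```python
# End-to-end simulation of the workflow: local files -> "HDFS" input
# directory -> word count -> part-r-00000 in the output directory.
import os
import tempfile
from collections import Counter

root = tempfile.mkdtemp()
local_dir = os.path.join(root, "file")      # step 3: local data directory
hdfs_input = os.path.join(root, "input")    # step 4: stand-in for /input
hdfs_output = os.path.join(root, "output")  # step 6: stand-in for /output
for d in (local_dir, hdfs_input, hdfs_output):
    os.makedirs(d)

# step 3: write the two input files
with open(os.path.join(local_dir, "file1.txt"), "w") as f:
    f.write("Hello World\n")
with open(os.path.join(local_dir, "file2.txt"), "w") as f:
    f.write("Hello Hadoop\n")

# step 4: "upload" (copy) the files into the input directory
for name in os.listdir(local_dir):
    with open(os.path.join(local_dir, name)) as src, \
         open(os.path.join(hdfs_input, name), "w") as dst:
        dst.write(src.read())

# step 5: run the "job" -- count words across all input files
counts = Counter()
for name in os.listdir(hdfs_input):
    with open(os.path.join(hdfs_input, name)) as f:
        counts.update(f.read().split())

# step 6: write part-r-00000 and print it, like hadoop fs -cat
part = os.path.join(hdfs_output, "part-r-00000")
with open(part, "w") as f:
    for word in sorted(counts):
        f.write(f"{word}\t{counts[word]}\n")
print(open(part).read())
```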
On packaging the jar: http://blog.csdn.net/little_stars/article/details/8880647 (for a single class, just right-click the class and choose Export).