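Note: these are personal notes from the 开源力量 (OSForce) Hadoop Development online training course, link: http://new.osforce.cn/course/52. They are not meant as a reference.
Remarks:
1) The experiments use hadoop 1.x; the hadoop 2.x code is only analyzed.
2) Background to learn: Hadoop fundamentals, Java basics, basic Linux operation.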
MapReduce cluster environment setup
Running MapReduce WordCount
Hadoop Eclipse plugin
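In the bin directory, run ./start-mapred.sh to start the jobtracker and tasktracker.
Problem: after startup, the jps command does not show either tracker running.
Cause: the file hadoop-michaelchen-tasktracker-mars.clustertech.com.log in the logs directory contains this entry: Can not start task tracker because java.lang.IllegalArgumentException: Does not contain a valid host:port authority: local
Solution: mapred-site.xml had not been configured, so mapred.job.tracker still had its default value, local. Configure it as follows and start again; jps then shows both the jobtracker and the tasktracker.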
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
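To see why the value local is rejected while localhost:9101 is accepted, here is a minimal plain-JDK sketch of a host:port check. The class HostPortCheck and the method isValidHostPort are hypothetical names used only for illustration; this approximates the kind of validation behind the log message and is not Hadoop's own code.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HostPortCheck {
    // Hypothetical helper: returns true only if the value carries both a
    // host and an explicit port. Not Hadoop's implementation.
    public static boolean isValidHostPort(String target) {
        try {
            // Prepend a dummy scheme so java.net.URI can split host and port.
            URI uri = new URI("dummy://" + target);
            return uri.getHost() != null && uri.getPort() >= 0;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String value : new String[] {"local", "localhost:9101"}) {
            String verdict = isValidHostPort(value) ? "valid" : "invalid";
            System.out.println(value + " -> " + verdict);
        }
    }
}
```

The string local parses as a host with no port, so the check fails; localhost:9101 supplies both parts.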
In the bin directory, create a.txt (vim a.txt) and write a few words in it to serve as the input file.
hello world
hello java
java c++
./hadoop fs -mkdir /input
./hadoop fs -put a.txt /input
./hadoop jar ../hadoop-examples-1.2.1.jar wordcount /input /output
13/12/04 12:23:07 INFO input.FileInputFormat: Total input paths to process : 1
13/12/04 12:23:07 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/12/04 12:23:07 WARN snappy.LoadSnappy: Snappy native library not loaded
13/12/04 12:23:08 INFO mapred.JobClient: Running job: job_201312041206_0001
13/12/04 12:23:09 INFO mapred.JobClient: map 0% reduce 0%
13/12/04 12:23:22 INFO mapred.JobClient: map 100% reduce 0%
13/12/04 12:23:34 INFO mapred.JobClient: map 100% reduce 100%
13/12/04 12:23:38 INFO mapred.JobClient: Job complete: job_201312041206_0001
13/12/04 12:23:38 INFO mapred.JobClient: Counters: 29
13/12/04 12:23:38 INFO mapred.JobClient:   Job Counters
13/12/04 12:23:38 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/04 12:23:38 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=14199
13/12/04 12:23:38 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/04 12:23:38 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/12/04 12:23:38 INFO mapred.JobClient:     Launched map tasks=1
13/12/04 12:23:38 INFO mapred.JobClient:     Data-local map tasks=1
13/12/04 12:23:38 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=12168
13/12/04 12:23:38 INFO mapred.JobClient:   File Output Format Counters
13/12/04 12:23:38 INFO mapred.JobClient:     Bytes Written=29
13/12/04 12:23:38 INFO mapred.JobClient:   FileSystemCounters
13/12/04 12:23:38 INFO mapred.JobClient:     FILE_BYTES_READ=51
13/12/04 12:23:38 INFO mapred.JobClient:     HDFS_BYTES_READ=134
13/12/04 12:23:38 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=117313
13/12/04 12:23:38 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=29
13/12/04 12:23:38 INFO mapred.JobClient:   File Input Format Counters
13/12/04 12:23:38 INFO mapred.JobClient:     Bytes Read=32
13/12/04 12:23:38 INFO mapred.JobClient:   Map-Reduce Framework
13/12/04 12:23:38 INFO mapred.JobClient:     Map output materialized bytes=51
13/12/04 12:23:38 INFO mapred.JobClient:     Map input records=3
13/12/04 12:23:38 INFO mapred.JobClient:     Reduce shuffle bytes=51
13/12/04 12:23:38 INFO mapred.JobClient:     Spilled Records=8
13/12/04 12:23:38 INFO mapred.JobClient:     Map output bytes=56
13/12/04 12:23:38 INFO mapred.JobClient:     Total committed heap usage (bytes)=181141504
13/12/04 12:23:38 INFO mapred.JobClient:     CPU time spent (ms)=5400
13/12/04 12:23:38 INFO mapred.JobClient:     Combine input records=6
13/12/04 12:23:38 INFO mapred.JobClient:     SPLIT_RAW_BYTES=102
13/12/04 12:23:38 INFO mapred.JobClient:     Reduce input records=4
13/12/04 12:23:38 INFO mapred.JobClient:     Reduce input groups=4
13/12/04 12:23:38 INFO mapred.JobClient:     Combine output records=4
13/12/04 12:23:38 INFO mapred.JobClient:     Physical memory (bytes) snapshot=175915008
13/12/04 12:23:38 INFO mapred.JobClient:     Reduce output records=4
13/12/04 12:23:38 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=787206144
13/12/04 12:23:38 INFO mapred.JobClient:     Map output records=6
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 12:20 /input
-rw-r--r-- 1 michaelchen supergroup 32 2013-12-04 12:20 /input/a.txt
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 12:23 /output
-rw-r--r-- 1 michaelchen supergroup 0 2013-12-04 12:23 /output/_SUCCESS
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 12:23 /output/_logs
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 12:23 /output/_logs/history
-rw-r--r-- 1 michaelchen supergroup 13815 2013-12-04 12:23 /output/_logs/history/job_201312041206_0001_1386130987951_michaelchen_word+count
-rw-r--r-- 1 michaelchen supergroup 49533 2013-12-04 12:23 /output/_logs/history/job_201312041206_0001_conf.xml
-rw-r--r-- 1 michaelchen supergroup 29 2013-12-04 12:23 /output/part-r-00000
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 10:26 /system
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 12:17 /tmp
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 12:17 /tmp/hadoop-michaelchen
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 12:23 /tmp/hadoop-michaelchen/mapred
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 12:23 /tmp/hadoop-michaelchen/mapred/staging
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 12:23 /tmp/hadoop-michaelchen/mapred/staging/michaelchen
drwx------ - michaelchen supergroup 0 2013-12-04 12:23 /tmp/hadoop-michaelchen/mapred/staging/michaelchen/.staging
drwx------ - michaelchen supergroup 0 2013-12-04 12:23 /tmp/hadoop-michaelchen/mapred/system
-rw------- 1 michaelchen supergroup 4 2013-12-04 12:17 /tmp/hadoop-michaelchen/mapred/system/jobtracker.info
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 10:33 /user
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 10:33 /user/michaelchen
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 10:33 /user/michaelchen/archiveDir
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 10:33 /user/michaelchen/archiveDir/pack.har
-rw-r--r-- 1 michaelchen supergroup 0 2013-12-04 10:33 /user/michaelchen/archiveDir/pack.har/_SUCCESS
-rw-r--r-- 5 michaelchen supergroup 72 2013-12-04 10:33 /user/michaelchen/archiveDir/pack.har/_index
-rw-r--r-- 5 michaelchen supergroup 22 2013-12-04 10:33 /user/michaelchen/archiveDir/pack.har/_masterindex
-rw-r--r-- 1 michaelchen supergroup 15147 2013-12-04 10:33 /user/michaelchen/archiveDir/pack.har/part-0
drwxr-xr-x - michaelchen supergroup 0 2013-12-04 10:31 /xwchen
-rw-r--r-- 1 michaelchen supergroup 15147 2013-12-04 10:31 /xwchen/hadoop
c++ 1
hello 2
java 2
world 1
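The result can be cross-checked against the job counters with a small plain-Java sketch of the map and sum steps. This is an illustration under assumptions, not the source of the hadoop-examples wordcount: the class WordCountSketch and its methods are hypothetical, everything runs in one JVM, and a sorted map stands in for the framework's sort by key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {
    // Map step: emit one record per whitespace-separated token.
    public static List<String> map(String[] lines) {
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                emitted.add(tokens.nextToken());
            }
        }
        return emitted;
    }

    // Combine/reduce step: add 1 per occurrence; TreeMap keeps keys sorted.
    public static Map<String, Integer> sum(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"hello world", "hello java", "java c++"};
        List<String> mapOutput = map(lines);
        Map<String, Integer> counts = sum(mapOutput);
        System.out.println("map output records = " + mapOutput.size());
        System.out.println("distinct words = " + counts.size());
        counts.forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```

For the three input lines this gives 6 map output records and 4 distinct words, which agrees with Map output records=6, Combine output records=4 and Reduce output records=4 in the log.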
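Hadoop Eclipse plugin
1) Since the 0.20.x releases, hadoop no longer ships a prebuilt hadoop-eclipse plugin; it provides the source code, which you have to compile yourself.
2) The build uses ant + ivy.
3) URL: http://wiki.apache.org/hadoop/EclipsePlugin
The whole build process is very error-prone; I recommend following this link: http://www.srccodes.com/p/article/30/build-hadoop-eclipse-plugin-jar-from-source-code-and-install-that-plugin-in-eclipse-ide
Note that the version numbers used in that link may have changed.
Even so, the hadoop plugin I compiled still did not work in eclipse, and in the end I used one downloaded from the web.
The failure showed up in the eclipse menu: under show-others, when editing the mapreduce location and clicking new, no window popped up. Very frustrating.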
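The Jar dependencies used by WordCount are as follows:
Input arguments: hdfs://192.168.56.101:9100/input /output
Note that the output directory must not already exist; otherwise the job reports an error.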
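The rule that the output directory must not exist can be illustrated with a local-filesystem analogue. This sketch uses only java.nio.file, not the Hadoop FileSystem API, and the class OutputDirCheck and the method requireAbsent are hypothetical names; it only shows the shape of a check that refuses to write into an existing directory.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OutputDirCheck {
    // Hypothetical helper: fail early if the output directory is already there.
    public static void requireAbsent(Path output) throws FileAlreadyExistsException {
        if (Files.exists(output)) {
            throw new FileAlreadyExistsException(output.toString());
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("wordcount-demo");
        Path output = base.resolve("output");
        requireAbsent(output);           // first run: nothing there yet
        Files.createDirectory(output);   // simulate the first run writing output
        try {
            requireAbsent(output);       // second run: directory already exists
        } catch (FileAlreadyExistsException e) {
            System.out.println("refused: output directory already exists");
        } finally {
            Files.delete(output);
            Files.delete(base);
        }
    }
}
```

On the cluster itself, the usual way to rerun the job is to remove the old directory first, for example with ./hadoop fs -rmr /output, or to pass a new output path.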
For everything else, refer to the course video. It compiled and ran successfully on 1.2.1.
Some notes from the quiz questions: