Hadoop 0.23 configuration (startup, running MapReduce, web UI)

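Startup has several parts:
1. HDFS
This covers the NameNode, DataNode, SecondaryNameNode, and BackupNode.
Running ./start-dfs.sh on the NameNode host starts the NameNode (nn), the DataNodes (dn), and the SecondaryNameNode (snn).
The BackupNode has to be started separately, on the BackupNode host itself: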
nohup ./hdfs namenode -backup > backupNode.out &

2. YARN
Just run ./start-yarn.sh; there is nothing tricky here.
The script starts the ResourceManager and the NodeManagers automatically.
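
One quick way to confirm that the daemons actually came up is to look at the output of the JDK's jps tool on each host. The sketch below is only an illustration: the helper name missing_daemons and the sample output are made up here, and which daemons to expect on a given host depends on your cluster layout.

```python
def missing_daemons(jps_output, expected):
    """Return the expected daemon names that do not appear in jps output.

    jps prints one JVM per line as "<pid> <main class>"; lines that carry
    only a pid are skipped.
    """
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            running.add(parts[1])
    return [name for name in expected if name not in running]


# Hypothetical jps output from a single host (not taken from a real cluster).
SAMPLE_JPS = """\
4301 NameNode
4412 DataNode
4530 SecondaryNameNode
4688 ResourceManager
4790 Jps
"""

# Assumption: this host is meant to run every daemon listed here.
EXPECTED = ["NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager"]

if __name__ == "__main__":
    print(missing_daemons(SAMPLE_JPS, EXPECTED))  # ['NodeManager']
```

In practice you would feed it the real output of jps captured on each machine rather than the sample string.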


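Now you can get ready to run your first MapReduce program.
Before that, though, you need to create the MapReduce configuration file.
For some reason, the Apache 0.23.1 release ships without a mapred-site.xml in HADOOP_HOME/etc/hadoop,
so you have to create and configure it yourself. If you leave it unconfigured, MapReduce jobs run locally instead of on YARN.
I configured the following properties: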


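<property>
  <description>Execution framework set to Hadoop YARN.</description>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<property>
  <description>Larger resource limit for maps.</description>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
</property>

<property>
  <description>Larger heap-size for child jvms of maps.</description>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024M</value>
</property>

<property>
  <description>Larger resource limit for reduces.</description>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>
</property>

<property>
  <description>Larger heap-size for child jvms of reduces.</description>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2560M</value>
</property>

<property>
  <description>Higher memory-limit while sorting data for efficiency.</description>
  <name>mapreduce.task.io.sort.mb</name>
  <value>512</value>
</property>

<property>
  <description>More streams merged at once while sorting files.</description>
  <name>mapreduce.task.io.sort.factor</name>
  <value>100</value>
</property>

<property>
  <description>Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.</description>
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>50</value>
</property>

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>rmHost:10020</value>
</property>

<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>rmHost:19888</value>
</property>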
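
Since the file has to be written by hand, it can help to generate it and sanity-check the memory settings at the same time. The following is a minimal sketch, not part of Hadoop: the function names are invented for illustration, and the check rests on the assumption that a task's JVM heap (-Xmx) should stay below its container limit (memory.mb), which is the pattern the values above follow.

```python
import re
import xml.etree.ElementTree as ET

# A subset of the property values listed above.
PROPS = {
    "mapreduce.framework.name": "yarn",
    "mapreduce.map.memory.mb": "1536",
    "mapreduce.map.java.opts": "-Xmx1024M",
    "mapreduce.reduce.memory.mb": "3072",
    "mapreduce.reduce.java.opts": "-Xmx2560M",
}


def build_site_xml(props):
    """Serialize a dict of properties into Hadoop's <configuration> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def heap_mb(java_opts):
    """Extract the -Xmx value in MB from a JVM options string, or None."""
    match = re.search(r"-Xmx(\d+)([mMgG])", java_opts)
    if match is None:
        return None
    size = int(match.group(1))
    return size * 1024 if match.group(2) in "gG" else size


def heap_fits(props, task):
    """True if the task's JVM heap is smaller than its container limit."""
    limit = int(props["mapreduce.%s.memory.mb" % task])
    heap = heap_mb(props["mapreduce.%s.java.opts" % task])
    return heap is not None and heap < limit


if __name__ == "__main__":
    print(build_site_xml(PROPS))
    print(heap_fits(PROPS, "map"), heap_fits(PROPS, "reduce"))
```

The string returned by build_site_xml could be saved as HADOOP_HOME/etc/hadoop/mapred-site.xml; the remaining properties from the list can be added to the dict in the same way.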







Now you can run MapReduce programs.
To run the bundled example:
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-0.23.1.jar wordcount input output
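
For readers new to MapReduce, here is a rough single-process sketch of what a word count job computes. It is not the code inside the examples jar; the function names are invented for illustration, and splitting on whitespace is an assumption about how words are tokenized.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def wordcount(lines):
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(wordcount(["hello hadoop", "hello yarn"]))
    # {'hello': 2, 'hadoop': 1, 'yarn': 1}
```

On the cluster the same three steps are spread across map and reduce tasks, with input read from the input directory and results written to output.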


The web UI entry points are:
Daemon                       Web Interface           Notes
NameNode                     http://nn_host:port/    Default HTTP port is 50070.
ResourceManager              http://rm_host:port/    Default HTTP port is 8088.
MapReduce JobHistory Server  http://jhs_host:port/   Default HTTP port is 19888.
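
As a small convenience, the default ports can be kept in one place and turned into URLs. This is just a sketch: the helper name web_ui_url is made up, and the host names are placeholders for your own machines.

```python
# Default HTTP ports for the Hadoop 0.23 daemons listed above.
DEFAULT_PORTS = {
    "NameNode": 50070,
    "ResourceManager": 8088,
    "MapReduce JobHistory Server": 19888,
}


def web_ui_url(daemon, host, port=None):
    """Build the web UI URL for a daemon, falling back to its default port."""
    if port is None:
        port = DEFAULT_PORTS[daemon]
    return "http://%s:%d/" % (host, port)


if __name__ == "__main__":
    # nn_host, rm_host, and jhs_host are placeholder host names.
    print(web_ui_url("NameNode", "nn_host"))
    print(web_ui_url("ResourceManager", "rm_host"))
    print(web_ui_url("MapReduce JobHistory Server", "jhs_host"))
```

Pass an explicit port when a daemon has been configured to listen somewhere other than its default.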


References:
http://hadoop.apache.org/common/docs/r0.23.1/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html
