Hadoop Initial Configuration in Detail

 

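Editing the configuration files
In version 2.5.x the configuration files are located in the $HADOOP_HOME/etc/hadoop directory.
Version 2.x differs greatly from 1.x: the first-generation architecture is replaced by the Hadoop MapReduce V2 (YARN) framework. JobTracker and TaskTracker are gone, replaced by three components: ResourceManager, ApplicationMaster, and NodeManager. The locations and contents of the configuration files have changed accordingly; for details see: http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-yarn/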

(1) Set JAVA_HOME in both hadoop/etc/hadoop/hadoop-env.sh and hadoop/etc/hadoop/yarn-env.sh.
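
As a minimal sketch, the same line could be placed in both files. The JDK path below is only an assumption for illustration; substitute the location of your own Java installation.

```shell
# Assumed JDK location (hypothetical); replace it with the real path on your machine.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
```

Setting an explicit path in the env files avoids relying on whatever JAVA_HOME happens to be in the environment when the daemons are started over ssh.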

(2) etc/hadoop/core-site.xml, configured as follows:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-2.5.2/tmp</value>  <!-- use your own path -->
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <!-- fs.default.name still works in 2.x but is deprecated in favor of fs.defaultFS -->
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>namenode</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>


(3) etc/hadoop/hdfs-site.xml, configured as follows. (Note: the name and data directories must be created manually with mkdir; you may choose a different location for them. The author suggests setting dfs.replication to the actual number of DataNode hosts in the cluster.)

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/hadoop-2.5.2/hdfs/name</value> <!-- the name and data directories must be created manually -->
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/hadoop-2.5.2/hdfs/data</value> <!-- use your own path -->
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
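
The manual directory creation mentioned above can be sketched as a small shell helper. The function name make_hdfs_dirs is hypothetical and not part of Hadoop; it only wraps mkdir -p so that the name and data directories are created under one base directory.

```shell
# Hypothetical helper (not part of Hadoop): create the NameNode and DataNode
# storage directories under the given base directory.
make_hdfs_dirs() {
  base="$1"
  # -p creates missing parent directories and does not fail if they already exist.
  mkdir -p "$base/name" "$base/data"
}
```

With the paths used in hdfs-site.xml above, the call would be: make_hdfs_dirs /home/hadoop/hadoop-2.5.2/hdfs. The directories need to be writable by the user that starts the Hadoop daemons.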

(4) etc/hadoop/mapred-site.xml, configured as follows:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value> <!-- must be lowercase -->
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>namenode:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>namenode:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/mr-history/tmp</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/mr-history/done</value>
  </property>
</configuration>

(5) etc/hadoop/yarn-site.xml, which configures YARN:

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value> <!-- Hadoop 2.2 and later require an underscore here, not a dot -->
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>namenode:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>namenode:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>namenode:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>namenode:18041</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>namenode:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/zhang/hadoop-2.5.2/mynode/my</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/home/zhang/hadoop-2.5.2/mynode/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
  </property>
</configuration>

Note: the configuration files in this article are mainly based on the post by 呆呆笨笨的鱼: http://blog.itpub.net/28929558/viewspace-1354180/
