Hadoop 2.0 Issues (Continuously Updated)

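    I set up a Hadoop 2.0 test cluster using the QJM HA scheme. I won't go through the setup and configuration process here, since plenty of material is available online. Below is a summary of the problems I ran into: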

        When configuring HA, in hdfs-site.xml:

                 <property>
                        <name>dfs.ha.automatic-failover.enabled</name>
                        <value>false</value>
                 </property>

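        We use manual failover here. To use automatic failover, the property above has to be set to true, and fencing needs to be configured: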

                 <property>
                        <name>dfs.ha.fencing.methods</name>
                         <value>sshfence</value>
                 </property>

                 <property>
                          <name>dfs.ha.fencing.ssh.private-key-files</name>
                          <value>/home/qingwu.fu/.ssh/id_rsa</value>
                </property>
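
Automatic failover also has to be pointed at a ZooKeeper quorum. A minimal sketch, assuming a three-node ZooKeeper ensemble; the hostnames zk1/zk2/zk3.example.com are placeholders, not hosts from this cluster:

```xml
<!-- hdfs-site.xml: turn automatic failover on (it is false in the setup above) -->
<property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
</property>

<!-- core-site.xml: ZooKeeper quorum used by the ZKFC daemons (placeholder hosts) -->
<property>
        <name>ha.zookeeper.quorum</name>
        <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```

With these in place, a ZKFC process runs alongside each namenode, and the HA state in ZooKeeper is initialized once with `hdfs zkfc -formatZK`.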

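      Company security policy forbids passwordless login. I edited /etc/hosts.allow myself, and passwordless ssh login worked. A few minutes later it stopped working; checking /etc/hosts.allow showed that the file had been reverted.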

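      Since this is a test cluster, I did not apply to the security team for an exception. I did test automatic failover while passwordless ssh was working, and it worked well.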
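
Where passwordless ssh is not allowed, the fencing list can still be given a method that always succeeds. The sketch below assumes the QJM setup described here, where the JournalNodes only accept one writer at a time. shell(/bin/true) performs no real fencing, so treat it as a test-cluster compromise rather than a recommendation:

```xml
<!-- hdfs-site.xml: methods are tried in order, one per line;
     sshfence first, then a no-op that always returns success -->
<property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence
shell(/bin/true)</value>
</property>
```

If sshfence fails because ssh access is denied, the failover still proceeds through the second method.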


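      After the cluster was up, I ran the wordcount program and hit a problem: the job was never executed after submission. The cause was that the containers requested from the nodemanager always stayed in the reserved state. Hadoop 2.0 does not treat this state as an error, so it took a long time to find the root of the problem. First, it helps to understand the basic container states: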

       [Figure 1: basic container states]

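        It turns out that a container stays in the reserved state because the resources it requests cannot be satisfied. It waits until the nodemanager has enough resources for the container, and only then runs.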

Knowing this, the problem can be traced to the nodemanager resource settings, that is, the memory settings:

Memory-related configuration in yarn-site.xml:

       <property>
                <name>yarn.scheduler.minimum-allocation-mb</name>
                <value>512</value>

                <description>The minimum allocation for every container request at the RM, in MBs. Memory requests lower than this won't take effect, and the specified value will get allocated at minimum.</description>
        </property>

        <property>
                <name>yarn.scheduler.maximum-allocation-mb</name>
                <value>4096</value>
        </property>

        <!-- configuration for nodemanager -->

        <property>
                <name>yarn.nodemanager.resource.memory-mb</name>
                <value>4096</value>

                <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
        </property>

Memory-related configuration in mapred-site.xml:

       <property>
                <name>mapreduce.map.memory.mb</name>
                <value>1024</value>
        </property>


        <property>
                <name>mapreduce.map.java.opts</name>
                <value>-Xmx1024M</value>
        </property>

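Note: mapreduce.map.memory.mb and yarn.scheduler.minimum-allocation-mb must not be set larger than yarn.nodemanager.resource.memory-mb. Otherwise no nodemanager can ever satisfy the request, and the container stays reserved.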
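
The constraint can be made concrete with a set of values that fit together. This is a sketch, not the configuration used on this cluster: the reduce and ApplicationMaster values and the heap sizes are assumptions, chosen so that every request is at most yarn.nodemanager.resource.memory-mb (4096 above) and each JVM heap stays below its container size.

```xml
<!-- mapred-site.xml (illustrative values, all assumptions) -->
<property>
        <name>mapreduce.map.memory.mb</name>
        <value>1024</value>
</property>
<!-- heap kept below the container size to leave room for non-heap memory -->
<property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx768m</value>
</property>
<property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>2048</value>
</property>
<property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx1536m</value>
</property>
<!-- the ApplicationMaster container must also fit on a nodemanager -->
<property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>1024</value>
</property>
```

The ApplicationMaster request counts as well: if it cannot fit on any nodemanager, the job never starts, which looks the same from the outside as a map task stuck in reserved.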


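Looking at the namenode's image directory, you will find a large number of edit files, with a new one generated every second. This is related to the following configuration: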

       <property>
                <name>dfs.namenode.name.dir</name>
                <value>/export1/hadoop2/hdfs/namenode</value>
        </property>


        <property>
                <name>dfs.namenode.edits.dir</name>
                <value>${dfs.namenode.name.dir}</value>
                <description>Determines where on the local filesystem the DFS name node
                                        should store the transaction (edits) file. If this is a comma-delimited list
                                       of directories then the transaction file is replicated in all of the
                                       directories, for redundancy. Default value is same as dfs.namenode.name.dir
                 </description>
         </property>


        <property>
                 <name>dfs.namenode.num.extra.edits.retained</name>
                 <value>1000000</value>
                 <description>The number of extra transactions which should be retained
                                          beyond what is minimally necessary for a NN restart. This can be useful for
                                          audit purposes or for an HA setup where a remote Standby Node may have
                                          been offline for some time and need to have a longer backlog of retained
                                          edits in order to start again.
                                          Typically each edit is on the order of a few hundred bytes, so the default
                                          of 1 million edits should be on the order of hundreds of MBs or low GBs.

                  NOTE: Fewer extra edits may be retained than value specified for this setting
                               if doing so would mean that more segments would be retained than the number
                               configured by dfs.namenode.max.extra.edits.segments.retained.
                 </description>
          </property>


          <property>
                    <name>dfs.ha.tail-edits.period</name>
                   <value>60</value>
                   <description>
                                How often, in seconds, the StandbyNode should check for new
                                finalized log segments in the shared edits log.
                   </description>
          </property>
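
If the number of edit files becomes a concern, the retention settings named in the descriptions above are the ones to look at. The following is a sketch with illustrative values only (assumptions, not tested on this cluster). dfs.ha.log-roll.period is not part of the configuration shown above; it is included because, as far as I understand, it governs how often the standby asks the active namenode to roll its edit log:

```xml
<!-- hdfs-site.xml (illustrative values, all assumptions) -->
<!-- seconds between edit log rolls requested by the standby -->
<property>
        <name>dfs.ha.log-roll.period</name>
        <value>300</value>
</property>
<!-- keep fewer extra transactions than the 1000000 configured above -->
<property>
        <name>dfs.namenode.num.extra.edits.retained</name>
        <value>100000</value>
</property>
<!-- upper bound on the number of retained edit segments (files) -->
<property>
        <name>dfs.namenode.max.extra.edits.segments.retained</name>
        <value>1000</value>
</property>
```

Lower retention saves disk space and reduces file counts, at the cost of a shorter backlog for a standby that has been offline for a while.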


