Hadoop 2.0 Issues (continuously updated)

    I set up a Hadoop 2.0 test cluster using the QJM HA scheme. I won't cover the setup and configuration process here, since there is plenty of material online. Below is a summary of the problems I ran into:

        When configuring HA, in hdfs-site.xml:

                <property>
                        <name>dfs.ha.automatic-failover.enabled</name>
                        <value>false</value>
                </property>

        We use manual failover here. If you want automatic failover instead, you also need to configure:

                <property>
                        <name>dfs.ha.fencing.methods</name>
                        <value>sshfence</value>
                </property>

                <property>
                        <name>dfs.ha.fencing.ssh.private-key-files</name>
                        <value>/home/qingwu.fu/.ssh/id_rsa</value>
                </property>

      Company security policy does not allow passwordless login. I modified /etc/hosts.allow myself, and passwordless SSH worked, but a few minutes later it stopped working; checking /etc/hosts.allow again, I found it had been restored to its original state.

      Since this is just a test cluster, I didn't apply to the security team for an exemption. I did, however, test automatic failover while passwordless SSH was still working, and it worked well.
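With automatic failover disabled, a failover has to be driven by hand with the hdfs haadmin tool. A minimal sketch, assuming the two NameNodes are registered as nn1 and nn2 (placeholder IDs; use the ones from your own dfs.ha.namenodes setting):

```shell
# Check which NameNode is currently active and which is standby
# (nn1/nn2 are assumed NameNode IDs, not from the original post)
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Fail over from nn1 to nn2; haadmin fences the old active
# (via dfs.ha.fencing.methods) before promoting the standby
hdfs haadmin -failover nn1 nn2
```

These commands need a running HA cluster, so run them on a NameNode host with the cluster's client configuration on the classpath.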


      After the cluster was up, I ran the wordcount example and hit a problem: the job never executed after submission. The cause was that the containers the nodemanager tried to launch were always stuck in the RESERVED state. Hadoop 2.0 does not treat this state as an error, so it took a long time to locate the problem. First, it helps to understand the basic container states:

       (Figure 1: container state transition diagram)

        It turns out a container sits in the RESERVED state when the resources it needs cannot currently be satisfied; it waits until the nodemanager has enough free resources before it runs.

Knowing this, the problem can be traced to the nodemanager's resource settings, i.e., the memory configuration:

Memory-related settings in yarn-site.xml:

      
        <property>
                <name>yarn.scheduler.minimum-allocation-mb</name>
                <value>512</value>
                <description>The minimum allocation for every container request at the RM, in MBs. Memory requests lower than this won't take effect, and the specified value will get allocated at minimum.</description>
        </property>

        <property>
                <name>yarn.scheduler.maximum-allocation-mb</name>
                <value>4096</value>
        </property>

        <property>
                <name>yarn.nodemanager.resource.memory-mb</name>
                <value>4096</value>
                <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
        </property>

Memory-related settings in mapred-site.xml:

      
        <property>
                <name>mapreduce.map.memory.mb</name>
                <value>1024</value>
        </property>

        <property>
                <name>mapreduce.map.java.opts</name>
                <value>-Xmx1024M</value>
        </property>


Note: do not set mapreduce.map.memory.mb or yarn.scheduler.minimum-allocation-mb larger than yarn.nodemanager.resource.memory-mb.
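To see why these numbers matter, here is a rough back-of-the-envelope check using the values configured above. The rounding rule mirrors how the RM normalizes each request up to a multiple of the minimum allocation (a sketch, not output of any Hadoop tool):

```shell
# Values taken from the yarn-site.xml / mapred-site.xml above
node_mem=4096        # yarn.nodemanager.resource.memory-mb
min_alloc=512        # yarn.scheduler.minimum-allocation-mb
map_mem=1024         # mapreduce.map.memory.mb

# Round the map request up to a multiple of the minimum allocation
request=$(( (map_mem + min_alloc - 1) / min_alloc * min_alloc ))

# How many such containers fit on one node
containers=$(( node_mem / request ))

echo "each map container: ${request} MB, containers per node: ${containers}"
# -> each map container: 1024 MB, containers per node: 4
```

If mapreduce.map.memory.mb were raised above node_mem, this quotient drops to 0 and every request ends up reserved forever, which matches the stuck-wordcount symptom described above.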


If you look at the namenode's metadata (image) directory, you will find a large number of edits files, with a new one generated every second. This is related to the following settings:

      
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/export1/hadoop2/hdfs/namenode</value>
        </property>

        <property>
                <name>dfs.namenode.edits.dir</name>
                <value>${dfs.namenode.name.dir}</value>
                <description>Determines where on the local filesystem the DFS name node should store the transaction (edits) file. If this is a comma-delimited list of directories then the transaction file is replicated in all of the directories, for redundancy. Default value is same as dfs.namenode.name.dir</description>
        </property>

        


       
        <property>
                <name>dfs.namenode.num.extra.edits.retained</name>
                <value>1000000</value>
                <description>The number of extra transactions which should be retained beyond what is minimally necessary for a NN restart. This can be useful for audit purposes or for an HA setup where a remote Standby Node may have been offline for some time and need to have a longer backlog of retained edits in order to start again. Typically each edit is on the order of a few hundred bytes, so the default of 1 million edits should be on the order of hundreds of MBs or low GBs. NOTE: Fewer extra edits may be retained than value specified for this setting if doing so would mean that more segments would be retained than the number configured by dfs.namenode.max.extra.edits.segments.retained.</description>
        </property>

        <property>
                <name>dfs.ha.tail-edits.period</name>
                <value>60</value>
                <description>How often, in seconds, the StandbyNode should check for new finalized log segments in the shared edits log.</description>
        </property>
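With edits segments accumulating this fast, it can be useful to check how many have piled up on disk. A minimal sketch, assuming the dfs.namenode.name.dir value used above (adjust the path to your own setting):

```shell
# Finalized edits segments live under <dfs.namenode.name.dir>/current
# and are named edits_<start-txid>-<end-txid>.
NN_DIR=/export1/hadoop2/hdfs/namenode   # assumed from the config above

# How many finalized edits segments are on disk
ls "$NN_DIR/current" | grep -c '^edits_'

# The most recently written segments
ls -t "$NN_DIR/current" | grep '^edits_' | head -5
```

If the count keeps growing into the hundreds of thousands, that is consistent with the large dfs.namenode.num.extra.edits.retained value above rather than a fault by itself.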
                  

         


