Hadoop 2.6.0 Study Notes (1): Setting Up an HA Cluster

Lu Chunli's work notes. Who says programmers can't have a literary side?


A Hadoop environment mainly comes in three flavors: standalone, pseudo-distributed, and fully distributed (cluster). These notes cover setting up the cluster environment.

1. Installation Notes

    OS: CentOS-6.5-x86_64

    Disable the firewall and SELinux:
        service iptables status
        service iptables stop
        chkconfig iptables off
    
        vi /etc/sysconfig/selinux          
        Set SELINUX=disabled
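        Since SELINUX=disabled only takes effect after a reboot, one option for the current session is:
        setenforce 0    # switch SELinux to permissive mode until the next reboot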
   

    Three virtual machines were set up with VMware.

    Configure static IP addresses:
        vi /etc/sysconfig/network-scripts/ifcfg-eth0
        DEVICE="eth0"
        NM_CONTROLLED="yes"
        NAME="System eth0"
        BOOTPROTO=none
        ONBOOT="yes"
        TYPE="Ethernet"
        UUID="d0bfa44e-951f-4b4c-b002-4b41aff8ddfc"
        IPADDR=192.168.137.117    (change to the IP address you want)
        PREFIX=24
        GATEWAY=192.168.137.1    (default gateway)

        DNS1=114.114.115.115    (a free public DNS server)
        DEFROUTE=yes
        IPV4_FAILURE_FATAL=yes
        IPV6INIT=no
        HWADDR=00:0C:29:60:17:AF
        LAST_CONNECT=1435483515  
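
        After saving the file, the new address can be applied without a reboot (on CentOS 6):
        service network restart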

 
    Set the hostnames:
        vi /etc/sysconfig/network
        The three hostnames are:
            nnode
            dnode1
            dnode2
        
    Bind IPs to hostnames:
        vi /etc/hosts
            192.168.137.117    nnode      nnode
            192.168.137.118    dnode1    dnode1
            192.168.137.119    dnode2    dnode2
            
    Raise the Linux limits on the maximum number of processes and open files:
        vim /etc/security/limits.conf
        
        # add the following lines
        * soft nofile 4100
        * hard nofile 4100
        * soft nproc 4100
        * hard nproc 4100

        * applies to all users
        nproc sets the maximum number of processes
        nofile sets the maximum number of open files
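
        After logging in again, the new limits can be checked, for example:
        ulimit -n    # max open files, should now report 4100
        ulimit -u    # max user processes, should now report 4100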


2. JDK Installation and Configuration
    jdk-7u75-linux-x64.tar.gz
    
    Extract the tarball:
        mkdir -p /usr/local/jdk1.7
        tar -xzf jdk-7u75-linux-x64.tar.gz -C /usr/local/jdk1.7 --strip-components=1
    
    Configure environment variables:
        vi /etc/profile
        Add the following:
            export JAVA_HOME=/usr/local/jdk1.7
            export PATH=$PATH:$JAVA_HOME/bin
            export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    Verify:
        source /etc/profile
        java -version


3. Passwordless SSH Login
    su - hadoop    (the hadoop user is created in section 4 below)
    
    Generate a public/private key pair on each of the three machines:
        ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
            
        Two files are generated under ~/.ssh by default:
            id_dsa      : private key
            id_dsa.pub  : public key
            
        chmod 700 ~/.ssh                     (must be 700)
        chmod 600 ~/.ssh/authorized_keys     (600 is recommended)


        Distribute the authentication file:

            On each machine, first append its own public key to its authorized_keys file: cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

            Copy the authorized_keys files from dnode1 and dnode2 to /home/hadoop on nnode, then append their contents to nnode's authorized_keys file:

            cat authorized_keys >> /home/hadoop/.ssh/authorized_keys

            Finally, distribute the resulting authorized_keys to the other two machines, dnode1 and dnode2.
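
        Putting the whole sequence together, a minimal sketch (assuming the hadoop user and the hostnames above; the temporary file names on nnode are just examples, and scp still prompts for passwords at this stage):

            # on every node: seed authorized_keys with the local public key
            cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

            # on dnode1 / dnode2: copy the file to nnode under a temporary name
            scp ~/.ssh/authorized_keys hadoop@nnode:/home/hadoop/authorized_keys.dnode1    # run on dnode1
            scp ~/.ssh/authorized_keys hadoop@nnode:/home/hadoop/authorized_keys.dnode2    # run on dnode2

            # on nnode: merge them into the master authorized_keys
            cat /home/hadoop/authorized_keys.dnode1 /home/hadoop/authorized_keys.dnode2 >> /home/hadoop/.ssh/authorized_keys

            # on nnode: push the merged file back to the other two machines
            scp /home/hadoop/.ssh/authorized_keys hadoop@dnode1:/home/hadoop/.ssh/
            scp /home/hadoop/.ssh/authorized_keys hadoop@dnode2:/home/hadoop/.ssh/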

    

    Note:
        If login still prompts for a password after the steps above, check the output in /var/log/secure to confirm whether it is a permissions problem.


4. Create the hadoop User

        For security reasons, Hadoop is run as a regular user rather than as the root system administrator.

        groupadd hadoop

        useradd -g hadoop hadoop

        passwd hadoop

        Enter the password: hadoop
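
        The installation directories used below end up owned by the hadoop user (see the directory listings later in these notes). One way to arrange that, assuming the install paths used here, is to run the following as root after the tarballs in sections 5 and 6 have been extracted:

        chown -R hadoop:hadoop /usr/local/jdk1.7 /usr/local/zookeeper3.4.6 /usr/local/hadoop2.6.0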


5. ZooKeeper Configuration

        zookeeper-3.4.6.tar.gz

        Extract the tarball:

        mkdir -p /usr/local/zookeeper3.4.6
        tar -xzf zookeeper-3.4.6.tar.gz -C /usr/local/zookeeper3.4.6 --strip-components=1


        Add the ZooKeeper environment variables:

[hadoop@nnode ~]$ vim .bash_profile 
export ZOOKEEPER_HOME=/usr/local/zookeeper3.4.6
export PATH=$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf:$PATH

        

        Edit the configuration file:

        Copy zoo_sample.cfg to zoo.cfg (cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg) and adjust it as follows:

    # The number of milliseconds of each tick
    tickTime=2000
    # The number of ticks that the initial synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between 
    # sending a request and getting an acknowledgement
    syncLimit=5

    # the directory where the snapshot is stored.
    dataDir=/usr/local/zookeeper3.4.6/data
    # the directory where transaction (edit) logs are stored.
    dataLogDir=/usr/local/zookeeper3.4.6/logs
    # the port at which the clients will connect
    clientPort=2181

    # the maximum number of client connections.
    maxClientCnxns=100
    # In cluster mode, every machine needs to know which machines make up the ensemble.
    # Configure one line per machine in the form server.id=host:port:port,
    # where id is the Server ID, a number (1-255) identifying the machine within the ensemble.
    server.1=nnode:2888:3888
    server.2=dnode1:2888:3888
    server.3=dnode2:2888:3888

        For server.1, server.2 and server.3, create a text file named myid in the dataDir directory on each machine; its content is 1, 2 or 3 respectively (matching the number after "server.").

        [hadoop@nnode data]$ pwd
        /usr/local/zookeeper3.4.6/data
        [hadoop@nnode data]$ ll
        total 12
        -rw-rw-r-- 1 hadoop hadoop    2 May 13 11:19 myid
        drwxrwxr-x 2 hadoop hadoop 4096 Jul 18 21:07 version-2
        -rw-rw-r-- 1 hadoop hadoop    5 Jul 18 17:27 zookeeper_server.pid
        [hadoop@nnode data]$ cat myid 
        1
        [hadoop@nnode data]$
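
        For example, a quick way to create the myid files (a sketch, assuming the dataDir configured above; the number must match the server.N entry for that host):

        # on nnode
        mkdir -p /usr/local/zookeeper3.4.6/data && echo 1 > /usr/local/zookeeper3.4.6/data/myid
        # on dnode1
        mkdir -p /usr/local/zookeeper3.4.6/data && echo 2 > /usr/local/zookeeper3.4.6/data/myid
        # on dnode2
        mkdir -p /usr/local/zookeeper3.4.6/data && echo 3 > /usr/local/zookeeper3.4.6/data/myid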


6. Hadoop Installation and Configuration

        hadoop-2.6.0.tar.gz
    
        Extract the tarball:
        mkdir -p /usr/local/hadoop2.6.0
        tar -xzf hadoop-2.6.0.tar.gz -C /usr/local/hadoop2.6.0 --strip-components=1


        Edit the configuration files (under /usr/local/hadoop2.6.0/etc/hadoop/):

        a.) hadoop-env.sh

        # environment variables for the Hadoop runtime

        export JAVA_HOME=/usr/local/jdk1.7

        export HADOOP_HOME=/usr/local/hadoop2.6.0

        

        b.) mapred-env.sh

        # environment variables for the MapReduce runtime

        export JAVA_HOME=/usr/local/jdk1.7


        c.) yarn-env.sh

        # environment variables for the YARN runtime

        export JAVA_HOME=/usr/local/jdk1.7


        d.) core-site.xml

        <configuration>                
                <!-- version of this configuration file -->
                <property>
                        <name>hadoop.common.configuration.version</name>
                        <value>0.23.0</value>
                </property>
                <!-- The default HDFS URI. When several HDFS clusters exist, this setting decides
                which one is used if no cluster name is given. The value matches the nameservice
                defined in hdfs-site.xml. -->
                <property>
                        <name>fs.defaultFS</name>
                        <value>hdfs://cluster</value>
                </property>
                <!-- Base directory under which the NameNode, DataNode, JournalNode, etc. keep their data;
                each of them can also be given its own directory. -->
                <!-- The default is /tmp/hadoop-${user.name}; since /tmp is easily wiped on Linux,
                a dedicated directory is recommended. -->
                <property>
                        <name>hadoop.tmp.dir</name>
                        <value>/usr/local/hadoop2.6.0/tmp</value>
                </property>
                <!-- config zookeeper for ha -->
                <!-- Addresses and ports of the ZooKeeper ensemble. The number of nodes must be odd
                and no fewer than three. -->
                <property>
                        <name>ha.zookeeper.quorum</name>
                        <value>nnode:2181,dnode1:2181,dnode2:2181</value>
                </property>
        </configuration>

         e.) hdfs-site.xml

            

    <configuration>
            <!-- version of this configuration file -->
            <property>
                    <name>hadoop.hdfs.configuration.version</name>
                    <value>1</value>
            </property>
            <!-- Number of replicas for each block; with two DataNodes it is set to 2.
            The default is 3. -->
            <property>
                    <name>dfs.replication</name>
                    <value>2</value>
            </property>
            <!-- Where on the local filesystem the NameNode stores the name table (fsimage). -->
            <!-- Ultimately the HDFS metadata still lives on the Linux host running the NameNode. -->
            <!-- If this is a comma-separated list of directories, the name table is replicated
            to all of them for redundancy. -->
            <!-- Default: file://${hadoop.tmp.dir}/dfs/name -->
            <property>
                    <name>dfs.namenode.name.dir</name>
                    <value>/usr/local/hadoop2.6.0/data/dfs/name</value>
            </property>
            <!-- Defaults to ${dfs.namenode.name.dir}; where on the local filesystem the NameNode
            stores the transaction (edits) files. -->
            <!-- If this is a comma-separated list of directories, the transaction (edits) files are
            replicated to all of them for redundancy. -->
            <property>
                    <name>dfs.namenode.edits.dir</name>
                    <value>/usr/local/hadoop2.6.0/data/dfs/edits</value>
            </property>
            <!-- Where on the local filesystem a DataNode stores its blocks. -->
            <!-- If this is a comma-separated list of directories, data is stored in all of them,
            typically spread across different devices; directories that do not exist are ignored. -->
            <!-- Default: file://${hadoop.tmp.dir}/dfs/data -->
            <property>
                     <name>dfs.datanode.data.dir</name>
                     <value>/usr/local/hadoop2.6.0/data/dfs/data</value>
            </property>
            <!-- Defaults to true; enables WebHDFS on the NameNodes and DataNodes (accessed via port 50070). -->
            <property>
                    <name>dfs.webhdfs.enabled</name>
                    <value>true</value>
            </property>
            <!-- Whether to enable HDFS permission checking; disabled here for now. -->
            <property>
                     <name>dfs.permissions.enabled</name>
                     <value>false</value>
            </property>

            <!-- The following settings are for HA only -->
            <!-- Comma-separated list of nameservices. -->
            <!-- There is only one cluster, one namespace and one nameservice here; the name is user-defined. -->
            <property>
                    <name>dfs.nameservices</name>
                    <value>cluster</value>
            </property>
            <!-- dfs.ha.namenodes.EXAMPLENAMESERVICE -->
            <!-- EXAMPLENAMESERVICE stands for the nameservice defined above (cluster). The values are
            logical names for the NameNodes in that nameservice; any names will do as long as they are unique. -->
            <property>
                    <name>dfs.ha.namenodes.cluster</name>
                    <value>nn1,nn2</value>
            </property>

            <!-- for rpc connection -->
            <!-- RPC address that handles client requests. With HA/Federation there are multiple NameNodes,
            so the key is qualified with the user-defined nameservice and NameNode IDs. -->
            <!-- Hadoop is built on RPC: the NameNode acts as the RPC server, while clients such as
            FileSystem implementations act as RPC clients. -->
            <property>
                    <name>dfs.namenode.rpc-address.cluster.nn1</name>
                    <value>nnode:8020</value>
            </property>
            <!-- The other NameNode -->
            <property>
                    <name>dfs.namenode.rpc-address.cluster.nn2</name>
                    <value>dnode1:8020</value>
            </property>

            <!-- for http connection -->
            <!-- Defaults to 0.0.0.0:50070; the address the NameNode web UI listens on. Once the NameNode
            is up, its status can be viewed at this address. -->
            <property>
                    <name>dfs.namenode.http-address.cluster.nn1</name>
                    <value>nnode:50070</value>
            </property>
            <!-- Same as above -->
            <property>
                    <name>dfs.namenode.http-address.cluster.nn2</name>
                    <value>dnode1:50070</value>
            </property>

            <!-- for connection with namenodes -->
            <!-- RPC address for HDFS service communication; if set, BackupNodes, DataNodes and other
            services should connect to this address. -->
            <!-- With HA/Federation and multiple NameNodes, use the nameservice.namenode form. -->
            <!-- If unset, dfs.namenode.rpc-address is used as the default. -->
            <property>
                    <name>dfs.namenode.servicerpc-address.cluster.nn1</name>
                    <value>nnode:53310</value>
            </property>
            <!-- Same as above -->
            <property>
                    <name>dfs.namenode.servicerpc-address.cluster.nn2</name>
                    <value>dnode1:53310</value>
            </property>

            <!-- namenode.shared.edits -->
            <!-- The JournalNode quorum through which the NameNodes share their edits in an HA setup. -->
            <!-- The active NameNode writes to it and the standby NameNode reads from it, keeping the
            namespaces in sync. -->
            <!-- This directory does not need to be listed under dfs.namenode.edits.dir; leave it empty
            on non-HA clusters. -->
            <property>
                    <name>dfs.namenode.shared.edits.dir</name>
                    <value>qjournal://nnode:8485;dnode1:8485;dnode2:8485/cluster</value>
            </property>     

            <!-- journalnode.edits.dir -->
            <!-- the path where the JournalNode daemon will store its local state -->
            <property>
                    <name>dfs.journalnode.edits.dir</name>
                    <value>/usr/local/hadoop2.6.0/data/ha/journal</value>
            </property>

            <!-- failover proxy -->
            <!-- The class responsible for performing the failover when the cluster's active NameNode fails. -->
            <property>
                    <name>dfs.client.failover.proxy.provider.cluster</name>
                    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
            </property>

            <!-- automatic-failover -->
            <!-- Whether automatic failover is enabled, i.e. whether to switch to the other NameNode
            automatically when the active one fails. -->
            <property>
                    <name>dfs.ha.automatic-failover.enabled</name>
                    <value>true</value>
            </property>

            <!-- Fencing method used when a NameNode failover is required; ssh is used here. -->
            <property>
                    <name>dfs.ha.fencing.methods</name>
                    <value>sshfence</value>
            </property>

            <!-- Location of the private key used for SSH when fencing via sshfence. -->
            <property>
                    <name>dfs.ha.fencing.ssh.private-key-files</name>
                    <value>/home/hadoop/.ssh/id_dsa</value>
            </property>
    </configuration>

        f.) mapred-site.xml

    <configuration>
            <!-- The runtime framework for executing MapReduce jobs. 
            Can be one of local, classic or yarn. -->
            <property>
                    <name>mapreduce.framework.name</name>
                    <value>yarn</value>
            </property>
            <!-- The host and port that the MapReduce job tracker runs at. -->
            <property>
                    <name>mapreduce.jobtracker.address</name>
                    <value>nnode:9001</value>
            </property>
            <!-- The job tracker http server address and port the server will listen on,
            default 0.0.0.0:50030. If the port is 0 then the server will start on a free port. -->
            <property>
                    <name>mapreduce.jobtracker.http.address</name>
                    <value>nnode:50030</value>
            </property>
            <!-- The task tracker http server address and port, default 0.0.0.0:50060.
            If the port is 0 then the server will start on a free port. -->
            <property>
                    <name>mapreduce.tasktracker.http.address</name>
                    <value>nnode:50060</value>
            </property>
            <!-- MapReduce JobHistory Server IPC host:port -->
            <property>
                    <name>mapreduce.jobhistory.address</name>
                    <value>nnode:10020</value>
            </property>
            <!-- MapReduce JobHistory Server Web UI host:port -->
            <property>
                    <name>mapreduce.jobhistory.webapp.address</name>
                    <value>nnode:19888</value>
            </property>
            <!-- The directory where MapReduce stores control files.
            Default: ${hadoop.tmp.dir}/mapred/system. -->
            <property>
                    <name>mapreduce.jobtracker.system.dir</name>
                    <value>/usr/local/hadoop2.6.0/data/mapred/system</value>
            </property>
            <!-- The root of the staging area for users' job files.
            Default: ${hadoop.tmp.dir}/mapred/staging. -->
            <property>
                    <name>mapreduce.jobtracker.staging.root.dir</name>
                    <value>/usr/local/hadoop2.6.0/data/mapred/staging</value>
            </property>
            <!-- A shared directory for temporary files.
            Default: ${hadoop.tmp.dir}/mapred/temp. -->
            <property>
                    <name>mapreduce.cluster.temp.dir</name>
                    <value>/usr/local/hadoop2.6.0/data/mapred/tmp</value>
            </property>
            <!-- The local directory where MapReduce stores intermediate data files.
            Default: ${hadoop.tmp.dir}/mapred/local. -->
            <property>
                    <name>mapreduce.cluster.local.dir</name>
                    <value>/usr/local/hadoop2.6.0/data/mapred/local</value>
            </property>
    </configuration>

        g.)  yarn-site.xml

    <configuration>
            <!-- The hostname of the RM. Still a single point of failure, which is a weakness of this setup. -->
            <property>     
                    <name>yarn.resourcemanager.hostname</name>     
                    <value>nnode</value>     
            </property>
            <!-- The hostname of the NM.
            <property>     
                    <name>yarn.nodemanager.hostname</name>     
                    <value>nnode</value>  
            </property>
            -->
            <!-- Auxiliary service run on the NodeManagers. Must be set to mapreduce_shuffle for MapReduce jobs to run. -->
            <property>
                    <name>yarn.nodemanager.aux-services</name>
                    <value>mapreduce_shuffle</value>
            </property>
            <!-- This handler is already the default. -->
            <property>
                    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
                    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
            </property>
            <!-- Note: to run MapReduce programs, every NodeManager must load the shuffle server at
            startup. The shuffle server is essentially a Jetty/Netty server from which reduce tasks
            remotely copy the intermediate results produced by map tasks on each NodeManager.
            The two properties above both configure this shuffle server.
            -->
    </configuration>

        h.) slaves

# List of all DataNode hosts, one node name per line
[hadoop@nnode ~]$ cd /usr/local/hadoop2.6.0/etc/hadoop/
[hadoop@nnode hadoop]$ cat slaves 
dnode1
dnode2
[hadoop@nnode hadoop]$

        i.) Distribute the configured ZooKeeper and Hadoop installations to the other two machines

scp -r hadoop2.6.0 hadoop@dnode1:/usr/local/
scp -r hadoop2.6.0 hadoop@dnode2:/usr/local/
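
        The ZooKeeper installation also needs to be on all three machines; the corresponding copy would be (a sketch, assuming the same layout and run from /usr/local; adjust each node's myid afterwards as described in section 5):

scp -r zookeeper3.4.6 hadoop@dnode1:/usr/local/
scp -r zookeeper3.4.6 hadoop@dnode2:/usr/local/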


7. Start the Hadoop Cluster

        Start the ZooKeeper ensemble

        Run zkServer.sh start on nnode, dnode1 and dnode2 in turn.

[hadoop@nnode ~]$ zkServer.sh start
JMX enabled by default
Using config: /usr/local/zookeeper3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@nnode ~]$
# check the ZooKeeper status
[hadoop@nnode ~]$ zkServer.sh status
JMX enabled by default
Using config: /usr/local/zookeeper3.4.6/bin/../conf/zoo.cfg
Mode: follower
[hadoop@nnode ~]$ 
# once all three nodes are up, verify with zkCli.sh
[hadoop@nnode ~]$ zkCli.sh
Connecting to localhost:2181
2015-07-19 10:39:40,897 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2015-07-19 10:39:40,903 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=nnode
2015-07-19 10:39:40,903 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.7.0_75
2015-07-19 10:39:40,906 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2015-07-19 10:39:40,906 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/local/jdk1.7/jre
# (output trimmed)
2015-07-19 10:39:40,907 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2015-07-19 10:39:40,907 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2015-07-19 10:39:40,907 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2015-07-19 10:39:40,908 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2015-07-19 10:39:40,908 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=2.6.32-431.el6.x86_64
2015-07-19 10:39:40,908 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=hadoop
2015-07-19 10:39:40,908 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/hadoop
2015-07-19 10:39:40,908 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/hadoop
2015-07-19 10:39:40,910 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@75aa57fb
Welcome to ZooKeeper!
JLine support is enabled
[zk: localhost:2181(CONNECTING) 0] ls /
# list the znodes in the ensemble
# The startup order was nnode > dnode1 > dnode2. While the ensemble is coming up, each node tries to
# connect to the other members; nodes started first cannot reach those not yet started, so the early
# part of the log may show exceptions. Once a Leader is elected the ensemble stabilizes; similar
# messages on the other nodes are normal as well.

        Format the HA state in ZooKeeper; the goal is to create the znodes HA needs in the ZooKeeper ensemble

[hadoop@nnode ~]$ hdfs zkfc -formatZK

        Start the JournalNodes

# run hadoop-daemon.sh start journalnode on each of the three nodes
[hadoop@nnode ~]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/hadoop2.6.0/logs/hadoop-hadoop-journalnode-nnode.out
[hadoop@nnode ~]$ jps
12425 Jps
12385 JournalNode
11637 QuorumPeerMain
# note: the command here is hadoop-daemon.sh, not hadoop-daemons.sh (no trailing s)

        Format the cluster's NameNode

hdfs namenode -format
# the hdfs command can be run from any directory here because Hadoop's bin directory has been added to the PATH
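
        For reference, the PATH entries might look like this in ~/.bash_profile (a sketch, alongside the ZooKeeper entries added in section 5):

export HADOOP_HOME=/usr/local/hadoop2.6.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH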

        Start the NameNode

[hadoop@nnode ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/hadoop2.6.0/logs/hadoop-hadoop-namenode-nnode.out
[hadoop@nnode ~]$ tail -n50 /usr/local/hadoop2.6.0/logs/hadoop-hadoop-namenode-nnode.log 
2015-07-19 10:58:54,762 INFO org.apache.hadoop.util.GSet: VM type       = 64-bit
2015-07-19 10:58:54,764 INFO org.apache.hadoop.util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
2015-07-19 10:58:54,764 INFO org.apache.hadoop.util.GSet: capacity      = 2^21 = 2097152 entries
2015-07-19 10:58:54,772 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: dfs.block.access.token.enable=false
2015-07-19 10:58:54,772 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: defaultReplication         = 2
2015-07-19 10:58:54,773 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxReplication             = 512
2015-07-19 10:58:54,773 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: minReplication             = 1
2015-07-19 10:58:54,773 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxReplicationStreams      = 2
2015-07-19 10:58:54,773 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
2015-07-19 10:58:54,773 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: replicationRecheckInterval = 3000
2015-07-19 10:58:54,773 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: encryptDataTransfer        = false
2015-07-19 10:58:54,773 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
2015-07-19 10:58:54,778 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
2015-07-19 10:58:54,778 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup          = supergroup
2015-07-19 10:58:54,778 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled = false
2015-07-19 10:58:54,778 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Determined nameservice ID: cluster
2015-07-19 10:58:54,778 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: HA Enabled: true
2015-07-19 10:58:54,783 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Append Enabled: true
2015-07-19 10:58:55,113 INFO org.apache.hadoop.util.GSet: Computing capacity for map INodeMap
2015-07-19 10:58:55,114 INFO org.apache.hadoop.util.GSet: VM type       = 64-bit
2015-07-19 10:58:55,114 INFO org.apache.hadoop.util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
2015-07-19 10:58:55,114 INFO org.apache.hadoop.util.GSet: capacity      = 2^20 = 1048576 entries
2015-07-19 10:58:55,123 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2015-07-19 10:58:55,139 INFO org.apache.hadoop.util.GSet: Computing capacity for map cachedBlocks
2015-07-19 10:58:55,139 INFO org.apache.hadoop.util.GSet: VM type       = 64-bit
2015-07-19 10:58:55,139 INFO org.apache.hadoop.util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
2015-07-19 10:58:55,139 INFO org.apache.hadoop.util.GSet: capacity      = 2^18 = 262144 entries
2015-07-19 10:58:55,144 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2015-07-19 10:58:55,144 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
2015-07-19 10:58:55,144 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
2015-07-19 10:58:55,145 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Retry cache on namenode is enabled
2015-07-19 10:58:55,146 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2015-07-19 10:58:55,151 INFO org.apache.hadoop.util.GSet: Computing capacity for map NameNodeRetryCache
2015-07-19 10:58:55,151 INFO org.apache.hadoop.util.GSet: VM type       = 64-bit
2015-07-19 10:58:55,152 INFO org.apache.hadoop.util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
2015-07-19 10:58:55,152 INFO org.apache.hadoop.util.GSet: capacity      = 2^15 = 32768 entries
2015-07-19 10:58:55,159 INFO org.apache.hadoop.hdfs.server.namenode.NNConf: ACLs enabled? false
2015-07-19 10:58:55,159 INFO org.apache.hadoop.hdfs.server.namenode.NNConf: XAttrs enabled? true
2015-07-19 10:58:55,163 INFO org.apache.hadoop.hdfs.server.namenode.NNConf: Maximum size of an xattr: 16384
2015-07-19 10:58:55,187 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /usr/local/hadoop2.6.0/data/dfs/name/in_use.lock acquired by nodename 12629@nnode
2015-07-19 10:58:59,551 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 176 INodes.
2015-07-19 10:58:59,626 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: Loaded FSImage in 0 seconds.
2015-07-19 10:58:59,626 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded image for txid 8261 from /usr/local/hadoop2.6.0/data/dfs/name/current/fsimage_0000000000000008261
2015-07-19 10:58:59,631 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@3f1084df expecting start txid #8262
2015-07-19 10:58:59,632 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Start loading edits file http://dnode1:8480/getJournal?jid=cluster&segmentTxId=8262&storageInfo=-60%3A339614018%3A0%3ACID-61347f43-0510-4b07-ad99-956472c0e49f, http://nnode:8480/getJournal?jid=cluster&segmentTxId=8262&storageInfo=-60%3A339614018%3A0%3ACID-61347f43-0510-4b07-ad99-956472c0e49f
2015-07-19 10:58:59,635 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding stream 'http://dnode1:8480/getJournal?jid=cluster&segmentTxId=8262&storageInfo=-60%3A339614018%3A0%3ACID-61347f43-0510-4b07-ad99-956472c0e49f, http://nnode:8480/getJournal?jid=cluster&segmentTxId=8262&storageInfo=-60%3A339614018%3A0%3ACID-61347f43-0510-4b07-ad99-956472c0e49f' to transaction ID 8262
2015-07-19 10:58:59,635 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding stream 'http://dnode1:8480/getJournal?jid=cluster&segmentTxId=8262&storageInfo=-60%3A339614018%3A0%3ACID-61347f43-0510-4b07-ad99-956472c0e49f' to transaction ID 8262

        Sync the NameNode metadata to the standby

# run the following sync command on host dnode1
hdfs namenode -bootstrapStandby
# note: the dash before bootstrapStandby must be a plain ASCII hyphen, otherwise the command keeps failing

        Start the other NameNode

[hadoop@dnode1 ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/hadoop2.6.0/logs/hadoop-hadoop-namenode-dnode1.out
[hadoop@dnode1 ~]$

        Start all the DataNodes

[hadoop@nnode ~]$ hadoop-daemons.sh start datanode
dnode1: starting datanode, logging to /usr/local/hadoop2.6.0/logs/hadoop-hadoop-datanode-dnode1.out
dnode2: starting datanode, logging to /usr/local/hadoop2.6.0/logs/hadoop-hadoop-datanode-dnode2.out
[hadoop@nnode ~]$ 
# the command above is hadoop-daemons.sh (with the trailing s)
# check the processes on dnode1
[hadoop@dnode1 ~]$ jps
12431 JournalNode
12534 NameNode
11636 QuorumPeerMain
12651 DataNode
12737 Jps
[hadoop@dnode1 ~]$ 

# check the processes on dnode2
[hadoop@dnode2 ~]$ jps
12477 Jps
12400 DataNode
11566 QuorumPeerMain
12286 JournalNode
[hadoop@dnode2 ~]$

        Start YARN

[hadoop@nnode ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop2.6.0/logs/yarn-hadoop-resourcemanager-nnode.out
dnode1: starting nodemanager, logging to /usr/local/hadoop2.6.0/logs/yarn-hadoop-nodemanager-dnode1.out
dnode2: starting nodemanager, logging to /usr/local/hadoop2.6.0/logs/yarn-hadoop-nodemanager-dnode2.out
[hadoop@nnode ~]$

        Start the ZKFailoverController

# the NameNodes are configured on nnode and dnode1, so run this on both of those nodes
[hadoop@nnode ~]$ hadoop-daemon.sh start zkfc
starting zkfc, logging to /usr/local/hadoop2.6.0/logs/hadoop-hadoop-zkfc-nnode.out
[hadoop@nnode ~]$ 
[hadoop@nnode ~]$ jps
12629 NameNode
12385 JournalNode
13571 Jps
11637 QuorumPeerMain
13218 ResourceManager
13502 DFSZKFailoverController

        Verify that HDFS works

[hadoop@nnode ~]$ hdfs dfs -ls -R /user/hadoop
-rw-r--r--   2 hadoop hadoop       2297 2015-06-29 14:44 /user/hadoop/20130913152700.txt.gz
-rw-r--r--   2 hadoop hadoop        211 2015-06-29 14:45 /user/hadoop/20130913160307.txt.gz
-rw-r--r--   2 hadoop hadoop   93046447 2015-07-18 18:01 /user/hadoop/apache-hive-1.2.0-bin.tar.gz
-rw-r--r--   2 hadoop hadoop    4139112 2015-06-28 22:54 /user/hadoop/httpInterceptor_192.168.1.101_1_20130913160307.txt
-rw-r--r--   2 hadoop hadoop        240 2015-05-30 20:54 /user/hadoop/lucl.gz
-rw-r--r--   2 hadoop hadoop         63 2015-05-27 23:55 /user/hadoop/lucl.txt
-rw-r--r--   2 hadoop hadoop    9994248 2015-06-29 14:12 /user/hadoop/scalog.txt
-rw-r--r--   2 hadoop hadoop    2664495 2015-06-28 20:54 /user/hadoop/scalog.txt.gz
-rw-r--r--   3 hadoop hadoop   28026803 2015-06-24 21:16 /user/hadoop/test.txt.gz
-rw-r--r--   2 hadoop hadoop      28551 2015-05-27 23:54 /user/hadoop/zookeeper.out
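
        A quick read/write smoke test (a sketch; the local file name is just an example):

echo "hello ha cluster" > /tmp/ha-test.txt
hdfs dfs -put /tmp/ha-test.txt /user/hadoop/
hdfs dfs -cat /user/hadoop/ha-test.txt
hdfs dfs -rm /user/hadoop/ha-test.txt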

        Verify that YARN works

# the following address can be accessed normally
http://nnode:8088/cluster
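
        A sample MapReduce job can also be submitted to confirm that jobs run on YARN (a sketch, assuming the install path above and the examples jar shipped with the Hadoop 2.6.0 binary distribution):

hadoop jar /usr/local/hadoop2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 10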

        Verify that automatic HA failover works

http://nnode:50070

http://dnode1:50070

# at this point one NameNode is active and the other standby; after killing the active NameNode
# process, the standby NameNode becomes active
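
        The NameNode states can also be checked and the failover exercised from the command line (a sketch; nn1 and nn2 are the NameNode IDs defined in hdfs-site.xml):

# check which NameNode is currently active
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# on the host running the active NameNode, kill its process and watch the standby take over
kill -9 $(jps | grep -w NameNode | awk '{print $1}')
hdfs haadmin -getServiceState nn2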

