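I. Preparation
1. Four Linux machines
2. Check network connectivity
3. Check the hosts file on each node
4. Check SSH
5. Check the JVM configuration on each node
6. Copy the configured hadoop directory to the other nodes:
scp -r itcast hadoop@skx2:/home/hadoop
7. Check each configuration file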
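The checks in items 3 to 5 can be scripted. The following is a minimal sketch that only prints one ssh command per node; it does not connect to anything by itself. The node names skx1, skx3 and skx4 are assumptions for illustration (only skx2 and the hadoop user appear in these notes), so set NODES and REMOTE_USER to match the real cluster.

```shell
#!/usr/bin/env bash
# Pre-flight sketch: print the checks to run against each node.
# NODES and REMOTE_USER are placeholders, not values from a real cluster.
NODES="${NODES:-skx1 skx2 skx3 skx4}"
REMOTE_USER="${REMOTE_USER:-hadoop}"

# Emit one ssh command per node that checks name resolution and the JVM.
# BatchMode=yes makes ssh fail instead of prompting, which exposes
# nodes where passwordless login is not set up yet.
preflight_commands() {
  local node
  for node in $NODES; do
    echo "ssh -o BatchMode=yes ${REMOTE_USER}@${node} 'getent hosts ${node} && java -version'"
  done
}

preflight_commands
```

Piping the output to sh (preflight_commands | sh) runs the checks for real; a node that prompts for a password or lacks a hosts entry shows up as a failed command.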
Federation use cases
See: http://www.infoq.com/cn/articles/hadoop-2-0-namenode-ha-federation-practice-zh/
http://blog.csdn.net/strongerbit/article/details/7013221/
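Federation HDFS compared with current HDFS
Current (non-federated) HDFS has a single namespace that uses all of the blocks, whereas Federation HDFS has multiple independent namespaces, each of which uses its own block pool.
Current HDFS has a single set of blocks, whereas Federation HDFS has multiple independent sets of blocks. A block pool is the set of blocks that belongs to one namespace.
Current HDFS consists of one NameNode and a set of DataNodes, whereas Federation HDFS consists of multiple NameNodes and a set of DataNodes, and each DataNode stores blocks for multiple block pools.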
The other configuration files are the same as in the previous section; the main difference is hdfs-site.xml, see:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.nameservices</name>
<value>hadoop-cluster1,hadoop-cluster2</value>
<description>
Comma-separated list of nameservices.
</description>
</property>
<!-- hadoop cluster1-->
<property>
<name>dfs.ha.namenodes.hadoop-cluster1</name>
<value>nn1,nn2</value>
<description>
The prefix for a given nameservice, contains a comma-separated
list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
</description>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-cluster1.nn1</name>
<value>SY-0217:8020</value>
<description>
RPC address for namenode nn1 of hadoop-cluster1
</description>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-cluster1.nn2</name>
<value>SY-0355:8020</value>
<description>
RPC address for namenode nn2 of hadoop-cluster1
</description>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-cluster1.nn1</name>
<value>SY-0217:50070</value>
<description>
The address and the base port where the web UI of nn1 (hadoop-cluster1) will listen on.
</description>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-cluster1.nn2</name>
<value>SY-0355:50070</value>
<description>
The address and the base port where the web UI of nn2 (hadoop-cluster1) will listen on.
</description>
</property>
<!-- hadoop cluster2 -->
<property>
<name>dfs.ha.namenodes.hadoop-cluster2</name>
<value>nn3,nn4</value>
<description>
The prefix for a given nameservice, contains a comma-separated
list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
</description>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-cluster2.nn3</name>
<value>SY-0226:8020</value>
<description>
RPC address for namenode nn3 of hadoop-cluster2
</description>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-cluster2.nn4</name>
<value>SY-0225:8020</value>
<description>
RPC address for namenode nn4 of hadoop-cluster2
</description>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-cluster2.nn3</name>
<value>SY-0226:50070</value>
<description>
The address and the base port where the web UI of nn3 (hadoop-cluster2) will listen on.
</description>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-cluster2.nn4</name>
<value>SY-0225:50070</value>
<description>
The address and the base port where the web UI of nn4 (hadoop-cluster2) will listen on.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/dongxicheng/hadoop/hdfs/name</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table(fsimage). If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy. </description>
</property>
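<!-- Note: the last path component of the qjournal URI is the journal ID.
Each nameservice needs its own journal ID, so this value probably has to
differ per nameservice: for example .../hadoop-cluster1 on nn1 and nn2,
and .../hadoop-cluster2 on nn3 and nn4. -->
<property>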
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://SY-0355:8485;SY-0225:8485;SY-0226:8485/hadoop-cluster</value>
<description>A directory on shared storage between the multiple namenodes
in an HA cluster. This directory will be written by the active and read
by the standby in order to keep the namespaces synchronized. This directory
does not need to be listed in dfs.namenode.edits.dir above. It should be
left empty in a non-HA cluster.
</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/dongxicheng/hadoop/hdfs/data</value>
<description>Determines where on the local filesystem a DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>false</value>
<description>
Whether automatic failover is enabled. See the HDFS High
Availability documentation for details on automatic HA
configuration.
</description>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/dongxicheng/hadoop/hdfs/journal/</value>
</property>
</configuration>
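Before starting any daemon it helps to confirm that every nameservice in dfs.nameservices has a matching dfs.ha.namenodes.<nameservice> entry. The sketch below does this with grep and sed only. It assumes each <name> line is immediately followed by its <value> line, as in the file above; the function names get_value and check_nameservices are made up for this example and are not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: check that every nameservice listed in dfs.nameservices has a
# matching dfs.ha.namenodes.<nameservice> property in hdfs-site.xml.
# Assumes each <value> line directly follows its <name> line.

# Print the <value> that follows the given <name> in the file.
get_value() {
  local file="$1" key="$2"
  grep -A1 -F "<name>${key}</name>" "$file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' | head -n 1
}

# Print "<nameservice>: <namenodes>" per nameservice; return 1 if any
# nameservice has no dfs.ha.namenodes.<nameservice> property.
check_nameservices() {
  local file="$1" ns nns status=0
  for ns in $(get_value "$file" dfs.nameservices | tr ',' ' '); do
    nns="$(get_value "$file" "dfs.ha.namenodes.${ns}")"
    if [ -n "$nns" ]; then
      echo "${ns}: ${nns}"
    else
      echo "${ns}: missing dfs.ha.namenodes.${ns}" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_nameservices etc/hadoop/hdfs-site.xml
```

This is a plain text check, not an XML parser, so a reformatted file (for example name and value on one line) needs a different approach. On a node with Hadoop installed, hdfs getconf -confKey dfs.nameservices reads the same value through Hadoop's own configuration loader.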
Startup:
Starting the Hadoop cluster:
-------------------------------------------------------------------
(1) Start nn1 and nn2
Step1:
On each JournalNode host, run the following command to start the journalnode service:
sbin/hadoop-daemon.sh start journalnode
Step2:
On [nn1], format the NameNode and start it:
bin/hdfs namenode -format -clusterId hadoop-cluster
sbin/hadoop-daemon.sh start namenode
Step3:
On [nn2], synchronize the metadata from nn1:
bin/hdfs namenode -bootstrapStandby
Step4:
Start [nn2]:
sbin/hadoop-daemon.sh start namenode
After these four steps, nn1 and nn2 are both in standby state.
Step5:
Switch [nn1] to Active:
bin/hdfs haadmin -ns hadoop-cluster1 -transitionToActive nn1
-------------------------------------------------------------------
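(2) Start nn3 and nn4
Step1:
On [nn3], format the NameNode and start it (use the same clusterId as for nn1, so that both nameservices belong to one federated cluster):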
bin/hdfs namenode -format -clusterId hadoop-cluster
sbin/hadoop-daemon.sh start namenode
Step2:
On [nn4], synchronize the metadata from nn3:
bin/hdfs namenode -bootstrapStandby
Step3:
Start [nn4]:
sbin/hadoop-daemon.sh start namenode
After these three steps, nn3 and nn4 are both in standby state.
Step4:
Switch [nn3] to Active:
bin/hdfs haadmin -ns hadoop-cluster2 -transitionToActive nn3
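Sections (1) and (2) follow the same pattern for each nameservice, so the sequence can be written once and parameterized. The sketch below only prints the plan and executes nothing. The bracketed prefix is the host on which each command is meant to run; [any] marks the haadmin call, for which these notes do not name a host. It assumes the JournalNodes are already running. The function name startup_plan is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: print the per-nameservice startup sequence from sections (1)
# and (2). It only echoes the plan; nothing runs against the cluster.
CLUSTER_ID="hadoop-cluster"   # same clusterId for every nameservice

# Args: nameservice ID, NameNode to activate, NameNode left as standby.
startup_plan() {
  local ns="$1" active="$2" standby="$3"
  cat <<EOF
[${active}] bin/hdfs namenode -format -clusterId ${CLUSTER_ID}
[${active}] sbin/hadoop-daemon.sh start namenode
[${standby}] bin/hdfs namenode -bootstrapStandby
[${standby}] sbin/hadoop-daemon.sh start namenode
[any] bin/hdfs haadmin -ns ${ns} -transitionToActive ${active}
EOF
}

startup_plan hadoop-cluster1 nn1 nn2
startup_plan hadoop-cluster2 nn3 nn4
```

Printing the plan first makes it easy to spot a wrong nameservice ID before a format command is run, since formatting a NameNode that already holds data destroys its metadata.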
-------------------------------------------------------------------
(3) Start all DataNodes
Step1:
On [nn1], start all DataNodes:
sbin/hadoop-daemons.sh start datanode
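Once everything is up, the HA state of each NameNode can be queried with haadmin -getServiceState. The sketch below prints the four query commands; the nameservice-to-NameNode mapping is copied from the hdfs-site.xml above, and the function name state_commands is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: list the commands that query the HA state of each NameNode.
# The nameservice-to-NameNode mapping mirrors hdfs-site.xml above.
state_commands() {
  local entry ns nn
  for entry in "hadoop-cluster1:nn1,nn2" "hadoop-cluster2:nn3,nn4"; do
    ns="${entry%%:*}"                       # part before the colon
    for nn in $(echo "${entry#*:}" | tr ',' ' '); do
      echo "bin/hdfs haadmin -ns ${ns} -getServiceState ${nn}"
    done
  done
}

state_commands
# To run them from the Hadoop install directory: state_commands | sh
```

If the steps above succeeded, nn1 and nn3 should report active, and nn2 and nn4 should report standby.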
-------------------------------------------------------------------
(4) Shut down the Hadoop cluster:
On [nn1], run the following command:
sbin/stop-dfs.sh