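Writing this up took effort; if you repost it, please credit the source (http://shihlei.iteye.com/blog/2084711).
Notes
This article sets up a distributed Hadoop CDH5.0.1 cluster with NameNode HA and ResourceManager HA. The Web Application Proxy and the Job History Server are not deployed.
A Word version is available as an attachment.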
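I. Overview
(I) HDFS
1) Basic architecture
(1) NameNode (Master)
- Namespace management: the namespace supports file-system-style operations on HDFS directories, files, and blocks, such as creating, modifying, deleting, and listing files and directories.
- Block storage management
(2) DataNode (Slave)
Stores and retrieves blocks as instructed by the NameNode and by clients, and periodically reports to the NameNode which blocks it holds.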
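2) HA architecture
Two nodes, an Active NameNode and a Standby NameNode, remove the single point of failure. They share state through JournalNodes, and ZKFC elects the Active node, monitors NameNode health, and performs automatic failover.
(1) Active NameNode:
Accepts and processes client RPC requests, writes both its own edit log and the edit log on shared storage, and receives block reports, block location updates, and heartbeats from the DataNodes.
(2) Standby NameNode:
Also receives block reports, block location updates, and heartbeats from the DataNodes. It reads the edit log from shared storage and replays those operations, so its metadata (namespace information + block locations map) stays in sync with the Active NameNode. This makes the Standby a hot standby: once switched to Active, it can serve requests immediately.
(3) JournalNode:
Synchronizes data between the Active and Standby NameNodes. JournalNodes run as a group with an odd number of members and use a Paxos-style protocol to stay highly available. This is the only shared-storage method supported in CDH5 (CDH4 also offered NFS-based sharing).
(4) ZKFC:
Monitors the NameNode process and performs automatic failover.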
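The odd group size matters because an edit counts as written only once a majority of the JournalNodes has acknowledged it. As a minimal sketch of that arithmetic (the function names are made up for illustration and are not part of Hadoop), the helpers below compute the majority size and the number of JournalNode failures a group can absorb:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Hadoop): majority-quorum arithmetic
# for a group of N JournalNodes.

# Smallest number of JournalNodes that forms a majority of N.
jn_majority() {
  local n=$1
  echo $(( n / 2 + 1 ))
}

# Number of JournalNodes that may fail while a majority is still reachable.
jn_tolerated_failures() {
  local n=$1
  echo $(( (n - 1) / 2 ))
}

# The three JournalNodes planned in this article:
echo "majority=$(jn_majority 3) tolerated_failures=$(jn_tolerated_failures 3)"
```

With integer division, a group of four tolerates the same single failure as a group of three, so the extra node buys nothing; that is the usual argument for an odd count.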
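(II) YARN
1) Basic architecture
(1) ResourceManager (RM)
Accepts job requests from clients, receives and monitors resource reports from the NodeManagers (NM), allocates and schedules resources, and starts and monitors ApplicationMasters (AM).
(2) NodeManager
Manages resources on its node, starts Containers to run tasks, and reports resource and Container status to the RM and task progress to the AM.
(3) ApplicationMaster
Manages and schedules the tasks of a single Application (Job), requests resources from the RM, instructs NMs to launch Containers, and receives task status from the NMs.
(4) Web Application Proxy
Protects YARN against web-based attacks. It is part of the ResourceManager and can be configured to run as a separate process. The ResourceManager web UI is accessed by trusted users; when an ApplicationMaster runs as an untrusted user, the links it hands to the ResourceManager may be untrusted, and the Web Application Proxy keeps such links from being served through the RM.
(5) Job History Server
On startup, the NodeManager initializes the LogAggregationService, which collects the logs of containers run on that host (when each container finishes) and stores them in a designated HDFS directory. The ApplicationMaster writes job history to a temporary job history directory in HDFS and moves it to the final directory when the job ends, which also makes job recovery possible. The History Server exposes web and RPC services, so users can retrieve job information through a web page or over RPC.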
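2) HA architecture
ResourceManager HA consists of an Active/Standby pair of nodes. The RMStateStore persists the RM's internal state together with the data and tokens of the main applications. The available RMStateStore implementations are the in-memory MemoryRMStateStore, the file-system-based FileSystemRMStateStore, and the ZooKeeper-based ZKRMStateStore.
The ResourceManager HA architecture is essentially the same as the NameNode HA architecture: state is shared through the RMStateStore, and the ZKFC role runs as a service inside the ResourceManager process rather than as a separate daemon.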
II. Planning
(I) Versions
Component | Version | Notes |
JRE | java version "1.7.0_60" Java(TM) SE Runtime Environment (build 1.7.0_60-b19) Java HotSpot(TM) Client VM (build 24.60-b09, mixed mode) | |
Hadoop | hadoop-2.3.0-cdh5.0.1.tar.gz | Main package |
Zookeeper | zookeeper-3.4.5-cdh5.0.1.tar.gz | Coordination service used for hot failover and for YARN state storage |
(II) Host plan
IP | Host | Deployed modules | Processes |
8.8.8.11 | Hadoop-NN-01 | NameNode, ResourceManager | NameNode, DFSZKFailoverController, ResourceManager |
8.8.8.12 | Hadoop-NN-02 | NameNode, ResourceManager | NameNode, DFSZKFailoverController, ResourceManager |
8.8.8.13 | Hadoop-DN-01, Zookeeper-01 | DataNode, NodeManager, Zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
8.8.8.14 | Hadoop-DN-02, Zookeeper-02 | DataNode, NodeManager, Zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
8.8.8.15 | Hadoop-DN-03, Zookeeper-03 | DataNode, NodeManager, Zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain |
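Process descriptions:
- NameNode
- ResourceManager
- DFSZKFC: DFS Zookeeper Failover Controller; promotes the Standby NameNode to Active
- DataNode
- NodeManager
- JournalNode: service that shares the edit log between the NameNodes (with NFS sharing, this process and all related startup configuration can be omitted).
- QuorumPeerMain: the main Zookeeper process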
(III) Directory plan
Name | Path |
$HADOOP_HOME | /home/zero/hadoop/hadoop-2.3.0-cdh5.0.1 |
Data | $HADOOP_HOME/data |
Log | $HADOOP_HOME/logs |
III. Environment preparation
1) Disable the firewall
As root:
    [root@CentOS-Cluster-01 hadoop-2.3.0-cdh5.0.1]# service iptables stop
    iptables: Flushing firewall rules: [ OK ]
    iptables: Setting chains to policy ACCEPT: filter [ OK ]
    iptables: Unloading modules: [ OK ]
Verify:
    [root@CentOS-Cluster-01 hadoop-2.3.0-cdh5.0.1]# service iptables status
    iptables: Firewall is not running.
2) Install the JRE: omitted
3) Install Zookeeper: see "Zookeeper-3.4.5-cdh5.0.1 standalone mode, replicated ..."
4) Configure passwordless SSH:
(1) Create a key pair on Hadoop-NN-01:
    [zero@CentOS-Cluster-01 ~]$ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/zero/.ssh/id_rsa):
    Created directory '/home/zero/.ssh'.
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/zero/.ssh/id_rsa.
    Your public key has been saved in /home/zero/.ssh/id_rsa.pub.
    The key fingerprint is:
    28:0a:29:1d:98:56:55:db:ec:83:93:56:8a:0f:6c:ea zero@CentOS-Cluster-01
    The key's randomart image is:
    +--[ RSA 2048]----+
    | ..... |
    | o. + |
    |o.. . + |
    |.o .. ..* |
    |+ . .=.*So |
    |.. .o.+ . . |
    | .. . |
    | . |
    | E |
    +-----------------+
(2) Distribute the key:
    [zero@CentOS-Cluster-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub zero@Hadoop-NN-01
    The authenticity of host 'hadoop-nn-01 (8.8.8.11)' can't be established.
    RSA key fingerprint is a6:11:09:49:8c:fe:b2:fb:49:d5:01:fa:13:1b:32:24.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'hadoop-nn-01,8.8.8.11' (RSA) to the list of known hosts.
    puppet@hadoop-nn-01's password:
    Permission denied, please try again.
    puppet@hadoop-nn-01's password:
    [zero@CentOS-Cluster-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub zero@Hadoop-NN-01
    zero@hadoop-nn-01's password:
    Now try logging into the machine, with "ssh 'zero@Hadoop-NN-01'", and check in:
      .ssh/authorized_keys
    to make sure we haven't added extra keys that you weren't expecting.
    [zero@CentOS-Cluster-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub zero@Hadoop-NN-02
    (... omitted ...)
    [zero@CentOS-Cluster-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub zero@Hadoop-DN-01
    (... omitted ...)
    [zero@CentOS-Cluster-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub zero@Hadoop-DN-02
    (... omitted ...)
    [zero@CentOS-Cluster-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub zero@Hadoop-DN-03
    (... omitted ...)
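Since the same command is repeated for all five hosts, it can be wrapped in a loop. The following is only a sketch: `distribute_key` and the `DRY_RUN` variable are illustrative names, not options of `ssh-copy-id`, and the host list is taken from the host plan. With `DRY_RUN` left at its default of 1 the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch: push the public key to every node in the host plan.
# DRY_RUN is an illustrative variable, not an ssh-copy-id option;
# it defaults to 1 so the commands are only printed.

HOSTS="Hadoop-NN-01 Hadoop-NN-02 Hadoop-DN-01 Hadoop-DN-02 Hadoop-DN-03"

distribute_key() {
  local user=$1 host
  for host in $HOSTS; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ssh-copy-id -i ~/.ssh/id_rsa.pub ${user}@${host}"
    else
      ssh-copy-id -i ~/.ssh/id_rsa.pub "${user}@${host}"
    fi
  done
}

distribute_key zero
```

Running it as `DRY_RUN=0 bash script.sh` would perform the copies and prompt for each host's password once.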
(3) Verify:
    [zero@CentOS-Cluster-01 ~]$ ssh Hadoop-NN-01
    Last login: Sun Jun 22 19:56:23 2014 from 8.8.8.1
    [zero@CentOS-Cluster-01 ~]$ exit
    logout
    Connection to Hadoop-NN-01 closed.
    [zero@CentOS-Cluster-01 ~]$ ssh Hadoop-NN-02
    Last login: Sun Jun 22 20:03:31 2014 from 8.8.8.1
    [zero@CentOS-Cluster-03 ~]$ exit
    logout
    Connection to Hadoop-NN-02 closed.
    [zero@CentOS-Cluster-01 ~]$ ssh Hadoop-DN-01
    Last login: Mon Jun 23 02:00:07 2014 from centos_cluster_01
    [zero@CentOS-Cluster-03 ~]$ exit
    logout
    Connection to Hadoop-DN-01 closed.
    [zero@CentOS-Cluster-01 ~]$ ssh Hadoop-DN-02
    Last login: Sun Jun 22 20:07:03 2014 from 8.8.8.1
    [zero@CentOS-Cluster-04 ~]$ exit
    logout
    Connection to Hadoop-DN-02 closed.
    [zero@CentOS-Cluster-01 ~]$ ssh Hadoop-DN-03
    Last login: Sun Jun 22 20:07:05 2014 from 8.8.8.1
    [zero@CentOS-Cluster-05 ~]$ exit
    logout
    Connection to Hadoop-DN-03 closed.
5) Configure /etc/hosts and distribute it:
    [root@CentOS-Cluster-01 zero]# vi /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    8.8.8.10 CentOS-StandAlone
    8.8.8.11 CentOS-Cluster-01 Hadoop-NN-01
    8.8.8.12 CentOS-Cluster-02 Hadoop-NN-02
    8.8.8.13 CentOS-Cluster-03 Hadoop-DN-01 Zookeeper-01
    8.8.8.14 CentOS-Cluster-04 Hadoop-DN-02 Zookeeper-02
    8.8.8.15 CentOS-Cluster-05 Hadoop-DN-03 Zookeeper-03
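Because every later configuration file refers to these aliases, it may be worth checking each node's hosts file after distributing it. The sketch below assumes a hypothetical helper named `missing_hosts`; it prints every alias from the host plan that does not appear as a host name in the given file and prints nothing when all are present.

```shell
#!/usr/bin/env bash
# Sketch: report cluster aliases that are missing from a hosts file.
# missing_hosts is an illustrative helper; the list mirrors the host plan.

REQUIRED_NAMES="Hadoop-NN-01 Hadoop-NN-02 Hadoop-DN-01 Hadoop-DN-02 Hadoop-DN-03 Zookeeper-01 Zookeeper-02 Zookeeper-03"

missing_hosts() {
  local hosts_file=$1 name
  for name in $REQUIRED_NAMES; do
    # Skip comment lines, then compare every field after the IP address
    # with the wanted name; awk exits 0 only if one matches exactly.
    awk -v n="$name" '
      !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }
    ' "$hosts_file" || echo "$name"
  done
}

missing_hosts /etc/hosts
```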
6) Set environment variables: edit ~/.bashrc, then run source ~/.bashrc
    [zero@CentOS-Cluster-01 ~]$ vi ~/.bashrc
    ……
    # hadoop cdh5
    export HADOOP_HOME=/home/zero/hadoop/hadoop-2.3.0-cdh5.0.1
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
    [zero@CentOS-Cluster-01 ~]$ source ~/.bashrc
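A quick way to confirm that the new settings took effect in the current shell is to test them directly. The helper below, `check_hadoop_env`, is an illustrative name and only a sketch: it checks that `HADOOP_HOME` is set and that its `bin` and `sbin` directories appear on `PATH`.

```shell
#!/usr/bin/env bash
# Sketch: confirm that HADOOP_HOME is set and that its bin and sbin
# directories are on PATH. check_hadoop_env is an illustrative helper.

check_hadoop_env() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set"
    return 1
  fi
  local dir status=0
  for dir in "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"; do
    # Wrapping PATH in colons lets each entry be matched as ":entry:".
    case ":$PATH:" in
      *":$dir:"*) echo "on PATH: $dir" ;;
      *) echo "missing from PATH: $dir"; status=1 ;;
    esac
  done
  return $status
}

check_hadoop_env || echo "environment is incomplete; re-check ~/.bashrc"
```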
IV. Installation
1) Extract the archive
    $ tar -xvf hadoop-2.3.0-cdh5.0.1.tar.gz
2) Edit the configuration files
Overview:
File | Type | Description |
hadoop-env.sh | Bash script | Hadoop runtime environment variables |
core-site.xml | XML | Hadoop core settings, such as I/O |
hdfs-site.xml | XML | HDFS daemon settings: NN, JN, DN |
yarn-env.sh | Bash script | YARN runtime environment variables |
yarn-site.xml | XML | YARN framework settings |
mapred-site.xml | XML | MapReduce properties |
capacity-scheduler.xml | XML | YARN scheduler properties |
container-executor.cfg | Cfg | YARN Container settings |
mapred-queues.xml | XML | MapReduce queue settings |
hadoop-metrics.properties | Java properties | Hadoop Metrics settings |
hadoop-metrics2.properties | Java properties | Hadoop Metrics settings |
slaves | Plain text | DataNode host list |
exclude | Plain text | List of DataNodes to remove |
log4j.properties | | Logging settings |
configuration.xsl | | |
(1) Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
    #--------------------Java Env------------------------------
    export JAVA_HOME="/usr/runtime/java/jdk1.7.0_60"
    #--------------------Hadoop Env------------------------------
    #export HADOOP_PID_DIR=
    export HADOOP_PREFIX="/home/zero/hadoop/hadoop-2.3.0-cdh5.0.1"
    #--------------------Hadoop Daemon Options-----------------
    #export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}"
    #export HADOOP_DATANODE_OPTS=
    #--------------------Hadoop Logs---------------------------
    #export HADOOP_LOG_DIR=
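(2) Edit $HADOOP_HOME/etc/hadoop/core-site.xml
    <configuration>
      <property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
      <property><name>dfs.permissions.superusergroup</name><value>zero</value></property>
      <property><name>fs.trash.checkpoint.interval</name><value>0</value></property>
      <property><name>fs.trash.interval</name><value>1440</value></property>
    </configuration>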
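(3) Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml
    <configuration>
      <property><name>dfs.webhdfs.enabled</name><value>true</value></property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/zero/hadoop/hadoop-2.3.0-cdh5.0.1/data/dfs/name</value>
        <description>Local directory where the NameNode stores the name table (fsimage); adjust as needed</description>
      </property>
      <property>
        <name>dfs.namenode.edits.dir</name>
        <value>${dfs.namenode.name.dir}</value>
        <description>Local directory where the NameNode stores the transaction file (edits); adjust as needed</description>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/zero/hadoop/hadoop-2.3.0-cdh5.0.1/data/dfs/data</value>
        <description>Local directory where the DataNode stores blocks; adjust as needed</description>
      </property>
      <property><name>dfs.replication</name><value>1</value></property>
      <property><name>dfs.blocksize</name><value>268435456</value></property>
      <property><name>dfs.nameservices</name><value>mycluster</value></property>
      <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
      <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>Hadoop-NN-01:8020</value></property>
      <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>Hadoop-NN-02:8020</value></property>
      <property><name>dfs.namenode.http-address.mycluster.nn1</name><value>Hadoop-NN-01:50070</value></property>
      <property><name>dfs.namenode.http-address.mycluster.nn2</name><value>Hadoop-NN-02:50070</value></property>
      <property><name>dfs.journalnode.http-address</name><value>0.0.0.0:8480</value></property>
      <property><name>dfs.journalnode.rpc-address</name><value>0.0.0.0:8485</value></property>
      <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://Hadoop-DN-01:8485;Hadoop-DN-02:8485;Hadoop-DN-03:8485/mycluster</value></property>
      <property><name>dfs.journalnode.edits.dir</name><value>/home/zero/hadoop/hadoop-2.3.0-cdh5.0.1/data/dfs/jn</value></property>
      <property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
      <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
      <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/zero/.ssh/id_rsa</value></property>
      <property><name>dfs.ha.fencing.ssh.connect-timeout</name><value>30000</value></property>
      <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
      <property><name>ha.zookeeper.quorum</name><value>Zookeeper-01:2181,Zookeeper-02:2181,Zookeeper-03:2181</value></property>
      <property><name>ha.zookeeper.session-timeout.ms</name><value>2000</value></property>
    </configuration>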
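The value of dfs.namenode.shared.edits.dir follows a fixed pattern: the JournalNode host:port pairs joined by semicolons, followed by a slash and the nameservice ID. As a small sketch of that pattern (`qjournal_uri` is a made-up helper, and 8485 is the JournalNode RPC port from the configuration above), the function below assembles the string from its parts:

```shell
#!/usr/bin/env bash
# Sketch: assemble the value of dfs.namenode.shared.edits.dir from a
# nameservice ID and a list of JournalNode hosts. qjournal_uri is an
# illustrative helper; 8485 is the JournalNode RPC port used above.

qjournal_uri() {
  local nameservice=$1 port=8485 hosts="" host
  shift
  for host in "$@"; do
    # Prefix a ';' separator only when hosts already holds an entry.
    hosts="${hosts:+$hosts;}${host}:${port}"
  done
  echo "qjournal://${hosts}/${nameservice}"
}

qjournal_uri mycluster Hadoop-DN-01 Hadoop-DN-02 Hadoop-DN-03
```

For the three JournalNodes in this article the output matches the value configured in hdfs-site.xml.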
(4) Edit $HADOOP_HOME/etc/hadoop/yarn-env.sh
    #Yarn Daemon Options
    #export YARN_RESOURCEMANAGER_OPTS
    #export YARN_NODEMANAGER_OPTS
    #export YARN_PROXYSERVER_OPTS
    #export HADOOP_JOB_HISTORYSERVER_OPTS
    #Yarn Logs
    export YARN_LOG_DIR="/home/zero/hadoop/hadoop-2.3.0-cdh5.0.1/logs"
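(5) Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml
    <configuration>
      <property><name>mapreduce.framework.name</name><value>yarn</value></property>
      <property><name>mapreduce.jobhistory.address</name><value>0.0.0.0:10020</value></property>
      <property><name>mapreduce.jobhistory.webapp.address</name><value>0.0.0.0:19888</value></property>
    </configuration>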
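(6) Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
    <configuration>
      <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
      <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
      <property>
        <description>Address where the localizer IPC is.</description>
        <name>yarn.nodemanager.localizer.address</name>
        <value>0.0.0.0:23344</value>
      </property>
      <property>
        <description>NM Webapp address.</description>
        <name>yarn.nodemanager.webapp.address</name>
        <value>0.0.0.0:23999</value>
      </property>
      <property><name>yarn.resourcemanager.connect.retry-interval.ms</name><value>2000</value></property>
      <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
      <property><name>yarn.resourcemanager.ha.automatic-failover.enabled</name><value>true</value></property>
      <property><name>yarn.resourcemanager.ha.automatic-failover.embedded</name><value>true</value></property>
      <property><name>yarn.resourcemanager.cluster-id</name><value>yarn-cluster</value></property>
      <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
      <!--
      <property><name>yarn.resourcemanager.ha.id</name><value>rm2</value></property>
      -->
      <property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value></property>
      <property><name>yarn.resourcemanager.recovery.enabled</name><value>true</value></property>
      <property><name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name><value>5000</value></property>
      <property><name>yarn.resourcemanager.store.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value></property>
      <property><name>yarn.resourcemanager.zk-address</name><value>Zookeeper-01:2181,Zookeeper-02:2181,Zookeeper-03:2181</value></property>
      <property><name>yarn.resourcemanager.zk.state-store.address</name><value>Zookeeper-01:2181,Zookeeper-02:2181,Zookeeper-03:2181</value></property>
      <property><name>yarn.resourcemanager.address.rm1</name><value>Hadoop-NN-01:23140</value></property>
      <property><name>yarn.resourcemanager.address.rm2</name><value>Hadoop-NN-02:23140</value></property>
      <property><name>yarn.resourcemanager.scheduler.address.rm1</name><value>Hadoop-NN-01:23130</value></property>
      <property><name>yarn.resourcemanager.scheduler.address.rm2</name><value>Hadoop-NN-02:23130</value></property>
      <property><name>yarn.resourcemanager.admin.address.rm1</name><value>Hadoop-NN-01:23141</value></property>
      <property><name>yarn.resourcemanager.admin.address.rm2</name><value>Hadoop-NN-02:23141</value></property>
      <property><name>yarn.resourcemanager.resource-tracker.address.rm1</name><value>Hadoop-NN-01:23125</value></property>
      <property><name>yarn.resourcemanager.resource-tracker.address.rm2</name><value>Hadoop-NN-02:23125</value></property>
      <property><name>yarn.resourcemanager.webapp.address.rm1</name><value>Hadoop-NN-01:23188</value></property>
      <property><name>yarn.resourcemanager.webapp.address.rm2</name><value>Hadoop-NN-02:23188</value></property>
      <property><name>yarn.resourcemanager.webapp.https.address.rm1</name><value>Hadoop-NN-01:23189</value></property>
      <property><name>yarn.resourcemanager.webapp.https.address.rm2</name><value>Hadoop-NN-02:23189</value></property>
    </configuration>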
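The per-ResourceManager entries repeat the same six address settings, each suffixed with the RM ID from yarn.resourcemanager.ha.rm-ids. To make that naming pattern explicit, here is a sketch that prints the entries for one RM; `rm_address_properties` is a hypothetical helper, and the ports are simply the ones chosen in this article's yarn-site.xml.

```shell
#!/usr/bin/env bash
# Sketch: print the per-ResourceManager address properties for one RM ID.
# rm_address_properties is an illustrative helper; the ports are the ones
# used in this article's yarn-site.xml.

rm_address_properties() {
  local rm_id=$1 host=$2 entry key port
  for entry in address:23140 scheduler.address:23130 admin.address:23141 \
               resource-tracker.address:23125 webapp.address:23188 \
               webapp.https.address:23189; do
    key=${entry%%:*}    # text before the colon
    port=${entry##*:}   # text after the colon
    printf '<property><name>yarn.resourcemanager.%s.%s</name><value>%s:%s</value></property>\n' \
      "$key" "$rm_id" "$host" "$port"
  done
}

rm_address_properties rm1 Hadoop-NN-01
rm_address_properties rm2 Hadoop-NN-02
```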
(7) Edit slaves
    [zero@CentOS-Cluster-01 hadoop]$ vi slaves
    Hadoop-DN-01
    Hadoop-DN-02
    Hadoop-DN-03
3) Distribute the installation
    [zero@CentOS-Cluster-01 ~]$ scp -r /home/zero/hadoop/hadoop-2.3.0-cdh5.0.1 zero@Hadoop-NN-02:/home/zero/hadoop/
    …….
    [zero@CentOS-Cluster-01 ~]$ scp -r /home/zero/hadoop/hadoop-2.3.0-cdh5.0.1 zero@Hadoop-DN-01:/home/zero/hadoop/
    …….
    [zero@CentOS-Cluster-01 ~]$ scp -r /home/zero/hadoop/hadoop-2.3.0-cdh5.0.1 zero@Hadoop-DN-02:/home/zero/hadoop/
    …….
    [zero@CentOS-Cluster-01 ~]$ scp -r /home/zero/hadoop/hadoop-2.3.0-cdh5.0.1 zero@Hadoop-DN-03:/home/zero/hadoop/
    …….
4) Start HDFS
(1) Start the JournalNodes
Before formatting, start the JournalNode daemon on each JournalNode host:
    [zero@CentOS-Cluster-03 hadoop-2.3.0-cdh5.0.1]$ hadoop-daemon.sh start journalnode
    starting journalnode, logging to /home/puppet/hadoop/cdh4.4/hadoop-2.0.0-cdh4.4.0/logs/hadoop-puppet-journalnode-BigData-03.out
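The listing shows one host; the same command has to run on all three JournalNode hosts. Assuming the passwordless SSH set up earlier and the same installation path on every node, one way to sketch this is a loop from a single machine. `start_journalnodes` and `DRY_RUN` are illustrative names; by default the script only prints what it would run.

```shell
#!/usr/bin/env bash
# Sketch: start a JournalNode on each of the three planned hosts over SSH.
# DRY_RUN is an illustrative variable (default 1: only print the commands).
# Assumes the same installation path on every host.

JN_HOSTS="Hadoop-DN-01 Hadoop-DN-02 Hadoop-DN-03"
HADOOP_HOME=${HADOOP_HOME:-/home/zero/hadoop/hadoop-2.3.0-cdh5.0.1}

start_journalnodes() {
  local host cmd
  for host in $JN_HOSTS; do
    cmd="$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ssh $host $cmd"
    else
      ssh "$host" "$cmd"
    fi
  done
}

start_journalnodes
```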