April 14, 2019, 14:04:12
Material from 传智播客 (itcast)
An introduction to the three Apache Hadoop deployment architectures (StandAlone, pseudo-distributed, and fully distributed), with installation steps for each
Hadoop documentation:
http://hadoop.apache.org/docs/
5.1 StandAlone Environment Setup
Service             Server IP
NameNode 192.168.52.100
SecondaryNameNode 192.168.52.100
DataNode 192.168.52.100
ResourceManager 192.168.52.100
NodeManager 192.168.52.100
Step 1: Download Apache Hadoop and upload it to the server
Download link:
http://archive.apache.org/dist/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz
Extract the archive:
cd /export/softwares
tar -zxvf hadoop-2.7.5.tar.gz -C ../servers/
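To confirm the archive unpacked where intended, a quick check of the layout (a minimal sketch, assuming the paths above):
ls /export/servers/hadoop-2.7.5
# should list the standard distribution layout: bin, sbin, etc, lib, libexec, share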
Step 2: Modify the configuration files
Edit core-site.xml
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim core-site.xml
<configuration>
    <property><name>fs.default.name</name><value>hdfs://192.168.52.100:8020</value></property>
    <property><name>hadoop.tmp.dir</name><value>/export/servers/hadoop-2.7.5/hadoopDatas/tempDatas</value></property>
    <property><name>io.file.buffer.size</name><value>4096</value></property>
    <property><name>fs.trash.interval</name><value>10080</value></property>
</configuration>
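fs.trash.interval is measured in minutes, so 10080 keeps deleted files recoverable for 7 days. Once the cluster is running, the behavior can be checked with a sketch like the following (paths are illustrative, and the trash location assumes you operate as root):
hdfs dfs -rm /tmp/test.txt                      # moved into the trash, not deleted outright
hdfs dfs -ls /user/root/.Trash/Current/tmp      # the file waits here until the interval expires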
Edit hdfs-site.xml
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim hdfs-site.xml
<configuration>
    <property><name>dfs.blocksize</name><value>134217728</value></property>
</configuration>
Edit hadoop-env.sh
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim hadoop-env.sh
export JAVA_HOME=/export/servers/jdk1.8.0_141
Edit mapred-site.xml
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim mapred-site.xml
<configuration>
    <property><name>mapreduce.framework.name</name><value>yarn</value></property>
    <property><name>mapreduce.job.ubertask.enable</name><value>true</value></property>
    <property><name>mapreduce.jobhistory.address</name><value>node01:10020</value></property>
    <property><name>mapreduce.jobhistory.webapp.address</name><value>node01:19888</value></property>
</configuration>
Edit yarn-site.xml
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim yarn-site.xml
<configuration>
    <property><name>yarn.resourcemanager.hostname</name><value>node01</value></property>
    <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
    <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
    <property><name>yarn.log-aggregation.retain-seconds</name><value>604800</value></property>
</configuration>
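With log aggregation enabled, container logs of finished applications are gathered into HDFS and retained for yarn.log-aggregation.retain-seconds (604800 seconds = 7 days). They can then be fetched from the command line; the application ID below is a hypothetical placeholder:
yarn logs -applicationId application_1555229052100_0001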
Edit mapred-env.sh
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim mapred-env.sh
export JAVA_HOME=/export/servers/jdk1.8.0_141
Edit slaves
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim slaves
localhost
Step 3: Start the cluster
To start a Hadoop cluster, you need to start both the HDFS and YARN modules.
Note: the first time HDFS is started, it must be formatted. Formatting is essentially cleanup and preparation work, because at that point HDFS does not yet physically exist.
hdfs namenode -format    (or: hadoop namenode -format)
Startup commands:
Create the data directories
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/tempDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/namenodeDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/namenodeDatas2
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/datanodeDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/datanodeDatas2
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/nn/edits
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/snn/name
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/dfs/snn/edits
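Equivalently, the same tree can be created in one command using shell brace expansion:
cd /export/servers/hadoop-2.7.5
mkdir -p hadoopDatas/{tempDatas,namenodeDatas,namenodeDatas2,datanodeDatas,datanodeDatas2,nn/edits,snn/name,dfs/snn/edits}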
Prepare to start
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/
bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
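Once the scripts finish, a minimal sanity check is to list the running Java processes with jps; on this single node all of the daemons from the service plan should appear:
jps
# expected (PIDs will differ): NameNode, DataNode, SecondaryNameNode,
# ResourceManager, NodeManager, JobHistoryServer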
Three web UIs:
http://node01:50070/explorer.html#/   (browse HDFS)
http://node01:8088/cluster   (YARN cluster)
http://node01:19888/jobhistory   (completed job history)
5.2 Pseudo-Distributed Environment Setup (suited to learning, testing, and development clusters)
Service plan
Server IP          192.168.52.100     192.168.52.110     192.168.52.120
Hostname           node01.hadoop.com  node02.hadoop.com  node03.hadoop.com
NameNode           Yes                No                 No
SecondaryNameNode  Yes                No                 No
DataNode           Yes                Yes                Yes
ResourceManager    Yes                No                 No
NodeManager        Yes                Yes                Yes
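The hostname-based URLs and scp commands in the rest of this guide assume every machine can resolve node01 through node03. A minimal /etc/hosts sketch matching the plan above (to be added on all three machines):
192.168.52.100 node01 node01.hadoop.com
192.168.52.110 node02 node02.hadoop.com
192.168.52.120 node03 node03.hadoop.com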
Stop the single-node cluster, delete the /export/servers/hadoop-2.7.5/hadoopDatas directory, then recreate the directories.
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5
sbin/stop-dfs.sh
sbin/stop-yarn.sh
sbin/mr-jobhistory-daemon.sh stop historyserver
Delete hadoopDatas, then recreate the directories:
rm -rf /export/servers/hadoop-2.7.5/hadoopDatas
Recreate the directories:
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/tempDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/namenodeDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/namenodeDatas2
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/datanodeDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/datanodeDatas2
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/nn/edits
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/snn/name
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/dfs/snn/edits
Edit the slaves file, then distribute the installation package to the other machines and restart the cluster.
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim slaves
node01
node02
node03
Distribute the installation package
Run the following commands on the first machine:
cd /export/servers/
scp -r hadoop-2.7.5 node02:$PWD
scp -r hadoop-2.7.5 node03:$PWD
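$PWD expands to the current directory (/export/servers here), so the package lands at the same path on node02 and node03.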
Start the cluster
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5
bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
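To confirm the layout matches the service plan, run jps on each machine; roughly the following daemons should appear:
jps   # node01: NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer
jps   # node02: DataNode, NodeManager
jps   # node03: DataNode, NodeManager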
5.3 Fully Distributed Environment Setup (suited to production environments)
Use a fully distributed deployment to make both the NameNode and the ResourceManager highly available.
Cluster service plan
             192.168.1.100    192.168.1.110    192.168.1.120
ZooKeeper    zk               zk               zk
HDFS         JournalNode      JournalNode      JournalNode
             NameNode         NameNode
             ZKFC             ZKFC
             DataNode         DataNode         DataNode
YARN                          ResourceManager  ResourceManager
             NodeManager      NodeManager      NodeManager
MapReduce                                      JobHistoryServer
Unpack the installation package
Stop all services of the previous Hadoop cluster, delete the Hadoop installation directory on every machine, then re-extract the Hadoop archive.
Extract the archive
Run the following commands on the first machine:
cd /export/softwares
tar -zxvf hadoop-2.7.5.tar.gz -C ../servers/
Modify the configuration files
Edit core-site.xml
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim core-site.xml
<configuration>
    <property><name>ha.zookeeper.quorum</name><value>node01:2181,node02:2181,node03:2181</value></property>
    <property><name>fs.defaultFS</name><value>hdfs://ns</value></property>
    <property><name>hadoop.tmp.dir</name><value>/export/servers/hadoop-2.7.5/data/tmp</value></property>
    <property><name>fs.trash.interval</name><value>10080</value></property>
</configuration>
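Note that fs.defaultFS now points at the logical nameservice ns rather than a single host; clients find the active NameNode through the failover proxy provider configured in hdfs-site.xml below.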
Edit hdfs-site.xml
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim hdfs-site.xml
<configuration>
    <property><name>dfs.nameservices</name><value>ns</value></property>
    <property><name>dfs.ha.namenodes.ns</name><value>nn1,nn2</value></property>
    <property><name>dfs.namenode.rpc-address.ns.nn1</name><value>node01:8020</value></property>
    <property><name>dfs.namenode.rpc-address.ns.nn2</name><value>node02:8020</value></property>
    <property><name>dfs.namenode.servicerpc-address.ns.nn1</name><value>node01:8022</value></property>
    <property><name>dfs.namenode.servicerpc-address.ns.nn2</name><value>node02:8022</value></property>
    <property><name>dfs.namenode.http-address.ns.nn1</name><value>node01:50070</value></property>
    <property><name>dfs.namenode.http-address.ns.nn2</name><value>node02:50070</value></property>
    <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://node01:8485;node02:8485;node03:8485/ns1</value></property>
    <property><name>dfs.client.failover.proxy.provider.ns</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
    <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
    <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/root/.ssh/id_rsa</value></property>
    <property><name>dfs.journalnode.edits.dir</name><value>/export/servers/hadoop-2.7.5/data/dfs/jn</value></property>
    <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
    <property><name>dfs.namenode.name.dir</name><value>file:///export/servers/hadoop-2.7.5/data/dfs/nn/name</value></property>
    <property><name>dfs.namenode.edits.dir</name><value>file:///export/servers/hadoop-2.7.5/data/dfs/nn/edits</value></property>
    <property><name>dfs.datanode.data.dir</name><value>file:///export/servers/hadoop-2.7.5/data/dfs/dn</value></property>
    <property><name>dfs.permissions</name><value>false</value></property>
    <property><name>dfs.blocksize</name><value>134217728</value></property>
</configuration>
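The sshfence fencing method assumes the two NameNode machines can SSH to each other without a password using the private key configured above (/root/.ssh/id_rsa); without that, automatic failover cannot fence a failed NameNode.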
Edit yarn-site.xml (note: the configuration differs between node03 and node02)
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim yarn-site.xml
<configuration>
    <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
    <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
    <property><name>yarn.resourcemanager.cluster-id</name><value>mycluster</value></property>
    <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
    <property><name>yarn.resourcemanager.hostname.rm1</name><value>node03</value></property>
    <property><name>yarn.resourcemanager.hostname.rm2</name><value>node02</value></property>
    <property><name>yarn.resourcemanager.address.rm1</name><value>node03:8032</value></property>
    <property><name>yarn.resourcemanager.scheduler.address.rm1</name><value>node03:8030</value></property>
    <property><name>yarn.resourcemanager.resource-tracker.address.rm1</name><value>node03:8031</value></property>
    <property><name>yarn.resourcemanager.admin.address.rm1</name><value>node03:8033</value></property>
    <property><name>yarn.resourcemanager.webapp.address.rm1</name><value>node03:8088</value></property>
    <property><name>yarn.resourcemanager.address.rm2</name><value>node02:8032</value></property>
    <property><name>yarn.resourcemanager.scheduler.address.rm2</name><value>node02:8030</value></property>
    <property><name>yarn.resourcemanager.resource-tracker.address.rm2</name><value>node02:8031</value></property>
    <property><name>yarn.resourcemanager.admin.address.rm2</name><value>node02:8033</value></property>
    <property><name>yarn.resourcemanager.webapp.address.rm2</name><value>node02:8088</value></property>
    <property><name>yarn.resourcemanager.recovery.enabled</name><value>true</value></property>
    <property>
        <name>yarn.resourcemanager.ha.id</name>
        <value>rm1</value>
        <description>If we want to launch more than one RM in single node, we need this configuration</description>
    </property>
    <property><name>yarn.resourcemanager.store.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value></property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>node02:2181,node03:2181,node01:2181</value>
        <description>For multiple zk services, separate them with comma</description>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
        <description>Enable automatic failover; by default, it is enabled only when HA is enabled.</description>
    </property>
    <property><name>yarn.client.failover-proxy-provider</name><value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value></property>
    <property><name>yarn.nodemanager.resource.cpu-vcores</name><value>4</value></property>
    <property><name>yarn.nodemanager.resource.memory-mb</name><value>512</value></property>
    <property><name>yarn.scheduler.minimum-allocation-mb</name><value>512</value></property>
    <property><name>yarn.scheduler.maximum-allocation-mb</name><value>512</value></property>
    <property><name>yarn.log-aggregation.retain-seconds</name><value>2592000</value></property>
    <property><name>yarn.nodemanager.log.retain-seconds</name><value>604800</value></property>
    <property><name>yarn.nodemanager.log-aggregation.compression-type</name><value>gz</value></property>
    <property><name>yarn.nodemanager.local-dirs</name><value>/export/servers/hadoop-2.7.5/yarn/local</value></property>
    <property><name>yarn.resourcemanager.max-completed-applications</name><value>1000</value></property>
    <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
    <property><name>yarn.resourcemanager.connect.retry-interval.ms</name><value>2000</value></property>
</configuration>
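With yarn.nodemanager.resource.memory-mb, yarn.scheduler.minimum-allocation-mb, and yarn.scheduler.maximum-allocation-mb all set to 512, each NodeManager can run exactly one 512 MB container at a time; these are small teaching values, not production sizing.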
Edit mapred-site.xml
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim mapred-site.xml
Edit hadoop-env.sh
Run the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim hadoop-env.sh
export JAVA_HOME=/export/servers/jdk1.8.0_141
Cluster startup procedure
Copy the installation package from the first machine to the other machines.
Run the following commands on the first machine:
cd /export/servers
scp -r hadoop-2.7.5/ node02:$PWD
scp -r hadoop-2.7.5/ node03:$PWD
Create the directories on all three machines
Run the following commands on all three machines:
mkdir -p /export/servers/hadoop-2.7.5/data/dfs/nn/name
mkdir -p /export/servers/hadoop-2.7.5/data/dfs/nn/edits
Change node02's ResourceManager ID to rm2
Run the following commands on the second machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim yarn-site.xml
<property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm2</value>
    <description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>
HDFS startup procedure
Run the following commands on node01:
cd /export/servers/hadoop-2.7.5
bin/hdfs zkfc -formatZK                           # create the HA state znode in ZooKeeper
sbin/hadoop-daemons.sh start journalnode          # start the JournalNodes on all slaves
bin/hdfs namenode -format                         # format the first NameNode
bin/hdfs namenode -initializeSharedEdits -force   # initialize the shared JournalNode edits directory
sbin/start-dfs.sh                                 # start HDFS
Run on node02:
cd /export/servers/hadoop-2.7.5
bin/hdfs namenode -bootstrapStandby     # copy metadata from the active NameNode
sbin/hadoop-daemon.sh start namenode    # start the standby NameNode
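At this point one NameNode should be active and the other standby; this can be verified with haadmin (nn1 and nn2 are the IDs defined in hdfs-site.xml):
bin/hdfs haadmin -getServiceState nn1
bin/hdfs haadmin -getServiceState nn2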
YARN startup procedure
Run on node03:
cd /export/servers/hadoop-2.7.5
sbin/start-yarn.sh
Run on node02:
cd /export/servers/hadoop-2.7.5
sbin/start-yarn.sh
Check the ResourceManager state
Run on node03:
cd /export/servers/hadoop-2.7.5
bin/yarn rmadmin -getServiceState rm1
Run on node02:
cd /export/servers/hadoop-2.7.5
bin/yarn rmadmin -getServiceState rm2
Start the JobHistory server on node03
Run the following commands on node03:
cd /export/servers/hadoop-2.7.5
sbin/mr-jobhistory-daemon.sh start historyserver
Check HDFS status
View the HDFS status of node01:
http://192.168.52.100:50070/dfshealth.html#tab-overview
View the HDFS status of node02:
http://192.168.52.110:50070/dfshealth.html#tab-overview
View the YARN cluster:
http://node03:8088/cluster
Job history UI
Access the page at:
http://192.168.52.120:19888/jobhistory