Preparation for installing Hadoop
Hadoop download address:
https://archive.apache.org/dist/hadoop/common/hadoop-2.7.5/
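If the node has Internet access, you can fetch the tarball straight from the archive (any download tool works; wget shown here as one option):
cd /export/servers
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz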
Installation
Extract the installation archive to /export/servers:
tar -zxvf hadoop-2.7.5.tar.gz -C /export/servers/
Check that the extraction succeeded:
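For example, listing the target directory should show the unpacked folder:
ls /export/servers/
# hadoop-2.7.5  ...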
Add Hadoop to the environment variables
Open the /etc/profile file:
vi /etc/profile
Append the Hadoop paths at the end of the profile file (press Shift+G to jump to the end):
# HADOOP_HOME
export HADOOP_HOME=/export/servers/hadoop-2.7.5
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Make the changes take effect:
source /etc/profile
Verify the installation:
hadoop version
Hadoop 2.7.5
...
Configuration files
Modify core-site.xml:
<property>
    <name>ha.zookeeper.quorum</name>
    <value>node01:2181,node02:2181,node03:2181</value>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/export/servers/hadoop-2.7.5/data/tmp</value>
</property>
<property>
    <name>fs.trash.interval</name>
    <value>10080</value>
</property>
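The ha.zookeeper.quorum value above assumes a ZooKeeper ensemble is already running on node01, node02 and node03. A quick sanity check (the zkServer.sh location depends on where ZooKeeper is installed on your nodes):
zkServer.sh status   # run on each node; expect one "leader" and two "follower" replies in total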
Modify hdfs-site.xml:
<property>
    <name>dfs.nameservices</name>
    <value>ns</value>
</property>
<property>
    <name>dfs.ha.namenodes.ns</name>
    <value>nn1,nn2</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.ns.nn1</name>
    <value>node01:8020</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.ns.nn2</name>
    <value>node02:8020</value>
</property>
<property>
    <name>dfs.namenode.servicerpc-address.ns.nn1</name>
    <value>node01:8022</value>
</property>
<property>
    <name>dfs.namenode.servicerpc-address.ns.nn2</name>
    <value>node02:8022</value>
</property>
<property>
    <name>dfs.namenode.http-address.ns.nn1</name>
    <value>node01:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.ns.nn2</name>
    <value>node02:50070</value>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node01:8485;node02:8485;node03:8485/ns1</value>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.ns</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/export/servers/hadoop-2.7.5/data/dfs/jn</value>
</property>
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///export/servers/hadoop-2.7.5/data/dfs/nn/name</value>
</property>
<property>
    <name>dfs.namenode.edits.dir</name>
    <value>file:///export/servers/hadoop-2.7.5/data/dfs/nn/edits</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///export/servers/hadoop-2.7.5/data/dfs/dn</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
<property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
</property>
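Since dfs.ha.fencing.methods is set to sshfence, each NameNode host must be able to SSH to the other with the private key configured above, or automatic failover cannot fence a failed node. A quick check from node01 (repeat in the other direction from node02):
ssh -i /root/.ssh/id_rsa root@node02 hostname   # should print node02 without prompting for a password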
Modify yarn-site.xml:
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>mycluster</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>node03</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>node02</value>
</property>
<property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>node03:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>node03:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>node03:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>node03:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>node03:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>node02:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>node02:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>node02:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>node02:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>node02:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
    <description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>
<property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>node02:2181,node03:2181,node01:2181</value>
    <description>For multiple zk services, separate them with comma</description>
</property>
<property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
    <description>Enable automatic failover; By default, it is enabled only when HA is enabled.</description>
</property>
<property>
    <name>yarn.client.failover-proxy-provider</name>
    <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>512</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>512</value>
</property>
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>2592000</value>
</property>
<property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>604800</value>
</property>
<property>
    <name>yarn.nodemanager.log-aggregation.compression-type</name>
    <value>gz</value>
</property>
<property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/export/servers/hadoop-2.7.5/yarn/local</value>
</property>
<property>
    <name>yarn.resourcemanager.max-completed-applications</name>
    <value>1000</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
</property>
Modify mapred-site.xml:
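In Hadoop 2.x distributions this file ships only as a template; if etc/hadoop/mapred-site.xml does not exist yet, create it from the template first:
cd /export/servers/hadoop-2.7.5/etc/hadoop
cp mapred-site.xml.template mapred-site.xml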
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>node03:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node03:19888</value>
</property>
<property>
    <name>mapreduce.jobtracker.system.dir</name>
    <value>/export/servers/hadoop-2.7.5/data/system/jobtracker</value>
</property>
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>100</value>
</property>
<property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>10</value>
</property>
<property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>25</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx1024m</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1536</value>
</property>
<property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/export/servers/hadoop-2.7.5/data/system/local</value>
</property>
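Note that these values conflict with yarn-site.xml above: the map and reduce containers request 1024 MB and the MapReduce ApplicationMaster requests 1536 MB, while yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb are both only 512 MB, so YARN would reject every MapReduce container request. One way to reconcile them (the 2048 MB figures below are illustrative, not from the original; size them to your machines) is to raise the YARN limits in yarn-site.xml:
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
</property>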
Modify slaves:
node01
node02
node03
Modify hadoop-env.sh:
export JAVA_HOME=/export/servers/jdk1.8.0_141
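You can confirm that this JAVA_HOME actually exists on the machine before continuing:
/export/servers/jdk1.8.0_141/bin/java -version   # should report version 1.8.0_141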
Cluster startup
Distribute the installation directory from the first machine to the others
Run the following on the first machine:
cd /export/servers
scp -r hadoop-2.7.5/ node02:$PWD
scp -r hadoop-2.7.5/ node03:$PWD
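If you added HADOOP_HOME to /etc/profile as above, distribute that file as well and re-run source /etc/profile on node02 and node03 (this assumes root SSH access between the nodes, as the fencing configuration already does):
scp /etc/profile node02:/etc/profile
scp /etc/profile node03:/etc/profile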
Create the data directories on all three machines
Run the following on all three machines:
mkdir -p /export/servers/hadoop-2.7.5/data/dfs/nn/name
mkdir -p /export/servers/hadoop-2.7.5/data/dfs/nn/edits
mkdir -p /export/servers/hadoop-2.7.5/data/dfs/dn
mkdir -p /export/servers/hadoop-2.7.5/data/dfs/jn
Set node02's ResourceManager ID to rm2
Run the following on the second machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim yarn-site.xml
<property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm2</value>
    <description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>
Start HDFS
Run the following on node01:
cd /export/servers/hadoop-2.7.5
bin/hdfs zkfc -formatZK                          # create the HA state znode in ZooKeeper
sbin/hadoop-daemons.sh start journalnode         # start the JournalNodes on all hosts in slaves
bin/hdfs namenode -format                        # format the first NameNode (node01)
bin/hdfs namenode -initializeSharedEdits -force  # push the initial edits to the JournalNodes
sbin/start-dfs.sh                                # start NameNodes, DataNodes and ZKFC daemons
Run the following on node02:
cd /export/servers/hadoop-2.7.5
bin/hdfs namenode -bootstrapStandby    # copy the formatted metadata over from node01
sbin/hadoop-daemon.sh start namenode   # start the standby NameNode
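At this point jps is a quick sanity check. Based on the roles configured above (QuorumPeerMain is ZooKeeper, which this guide assumes is already running), you should see roughly:
node01: NameNode, DataNode, JournalNode, DFSZKFailoverController, QuorumPeerMain
node02: NameNode, DataNode, JournalNode, DFSZKFailoverController, QuorumPeerMain
node03: DataNode, JournalNode, QuorumPeerMain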
Start YARN
Run the following on node03:
cd /export/servers/hadoop-2.7.5
sbin/start-yarn.sh
Run the following on node02:
cd /export/servers/hadoop-2.7.5
sbin/start-yarn.sh
Check the ResourceManager state
Run the following on node03:
cd /export/servers/hadoop-2.7.5
bin/yarn rmadmin -getServiceState rm1
Run the following on node02:
cd /export/servers/hadoop-2.7.5
bin/yarn rmadmin -getServiceState rm2
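The command prints either "active" or "standby"; exactly one of rm1 and rm2 should report active.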
Start the JobHistory server on node03
Run the following on node03 to start the JobHistory server:
cd /export/servers/hadoop-2.7.5
sbin/mr-jobhistory-daemon.sh start historyserver
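jps on node03 should now also show a JobHistoryServer process.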
Check HDFS status
View the HDFS status served by node01:
http://192.168.52.100:50070/dfshealth.html#tab-overview
View the HDFS status served by node02:
http://192.168.52.110:50070/dfshealth.html#tab-overview
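The NameNode HA state can also be queried from the command line, using the nn1/nn2 IDs defined in hdfs-site.xml:
cd /export/servers/hadoop-2.7.5
bin/hdfs haadmin -getServiceState nn1
bin/hdfs haadmin -getServiceState nn2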
View the YARN cluster
Page access: http://node03:8088/cluster (rm1's web UI, per yarn-site.xml; a standby ResourceManager redirects to the active one)
Job history browsing UI
Page access: http://node03:19888/jobhistory (per mapred-site.xml)