Required environment: four hosts (the author uses four VMware virtual machines instead), CentOS 6.5, the hadoop-2.7.1 package, and jdk1.8.0_91.
Preparation: create four virtual machines and give them network access in NAT mode.
1) Install Hadoop and the JDK on all four virtual machines;
2) Change the hostname of each host:
On [master]: gedit /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=master
NTPSERVERARGS=iburst
On [slave1]: gedit /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=slave1
NTPSERVERARGS=iburst
On [slave2]: gedit /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=slave2
NTPSERVERARGS=iburst
On [slave3]: gedit /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=slave3
NTPSERVERARGS=iburst
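On CentOS 6 a change to /etc/sysconfig/network only takes effect after a reboot. As an optional extra step, you can also set the name for the current session, shown here for master (repeat with the matching name on slave1, slave2, and slave3):
[zq@master ~]$ sudo hostname master
[zq@master ~]$ hostname
The second command should print the new hostname.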
3) Edit the hosts file on each host:
[zq@master ~]$ sudo gedit /etc/hosts
[sudo] password for zq:
Add the following entries to the hosts file:
192.168.44.142 master
192.168.44.140 slave1
192.168.44.143 slave2
192.168.44.141 slave3
Similarly, add the same IP entries to the hosts files on slave1, slave2, and slave3.
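As an optional sanity check, confirm that every hostname resolves from every host, for example from master:
[zq@master ~]$ ping -c 1 slave1
[zq@master ~]$ ping -c 1 slave2
[zq@master ~]$ ping -c 1 slave3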
4) Turn off the firewall on every host: sudo service iptables stop
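Note that service iptables stop only stops the firewall until the next reboot; to keep it disabled permanently on CentOS 6 you can additionally run:
[zq@master ~]$ sudo chkconfig iptables off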
Starting the configuration and installation:
1. Set up passwordless SSH login. As the architecture diagram above shows, it is enough for master to be able to log in to slave1, slave2, and slave3 over SSH without a password; this is not covered in detail here, but a minimal sketch follows.
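The sketch below assumes the zq user exists on every host; adjust user and host names to your own setup:
[zq@master ~]$ ssh-keygen -t rsa
[zq@master ~]$ ssh-copy-id zq@slave1
[zq@master ~]$ ssh-copy-id zq@slave2
[zq@master ~]$ ssh-copy-id zq@slave3
[zq@master ~]$ ssh slave1
The last command should log in without prompting for a password.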
2. Create a new file named fairscheduler.xml in the Hadoop configuration directory:
[zq@master ~]$ cd /home/zq/soft/hadoop-2.7.1/etc/hadoop/
[zq@master hadoop]$ touch fairscheduler.xml
[zq@master hadoop]$ gedit fairscheduler.xml
Fill in fairscheduler.xml as follows:
<?xml version="1.0"?>
<allocations>
  <!-- The queue names below are only placeholders; use whatever queue names suit your cluster. -->
  <queue name="queue1">
    <minResources>102400 mb, 50 vcores</minResources>
    <maxResources>153600 mb, 100 vcores</maxResources>
    <maxRunningApps>200</maxRunningApps>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <weight>1.0</weight>
    <aclSubmitApps>root,yarn,search,hdfs,zq</aclSubmitApps>
  </queue>
  <queue name="queue2">
    <minResources>102400 mb, 30 vcores</minResources>
    <maxResources>153600 mb, 50 vcores</maxResources>
  </queue>
  <queue name="queue3">
    <minResources>102400 mb, 30 vcores</minResources>
    <maxResources>153600 mb, 50 vcores</maxResources>
  </queue>
</allocations>
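To direct a job into one of these queues, Tool-based MapReduce jobs (including the bundled examples) accept the queue name on the command line. A small illustrative example, using the placeholder queue name queue1 from above:
[zq@master hadoop-2.7.1]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi -Dmapreduce.job.queuename=queue1 2 1000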
3. Configure the Hadoop file core-site.xml:
[zq@master hadoop-2.7.1]$ sudo gedit etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:8020</value>
  </property>
</configuration>
fs.default.name defines the URL and port of master; readers can replace master with the hostname or address they chose.
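Note that fs.default.name is the deprecated spelling of this setting in Hadoop 2.x; it still works, but if you prefer the current key the property would read:
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
  </property>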
4. Configure the Hadoop file hdfs-site.xml:
[zq@master hadoop-2.7.1]$ sudo gedit etc/hadoop/hdfs-site.xml
<configuration>
  <!-- Define the two nameservices, cluster1 and cluster2 -->
  <property>
    <name>dfs.nameservices</name>
    <value>cluster1,cluster2</value>
  </property>
  <!-- Assign two NameNodes, named nn1 and nn2, to cluster1 -->
  <property>
    <name>dfs.ha.namenodes.cluster1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster1.nn1</name>
    <value>master:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster1.nn2</name>
    <value>slave1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster1.nn1</name>
    <value>master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster1.nn2</name>
    <value>slave1:50070</value>
  </property>
  <!-- Assign two NameNodes, named nn3 and nn4, to cluster2 -->
  <property>
    <name>dfs.ha.namenodes.cluster2</name>
    <value>nn3,nn4</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster2.nn3</name>
    <value>slave2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster2.nn4</name>
    <value>slave3:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster2.nn3</name>
    <value>slave2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster2.nn4</name>
    <value>slave3:50070</value>
  </property>
  <!-- NameNode metadata storage path; readers should change this to their own path -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/zq/soft/hadoop-2.7.1/hdfs/name</value>
  </property>
  <!-- JournalNode shared edits directory -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://slave1:8485;slave2:8485;slave3:8485/cluster1</value>
  </property>
  <!-- DataNode data storage path -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/zq/soft/hadoop-2.7.1/hdfs/data</value>
  </property>
</configuration>
Important: according to the architecture diagram, master and slave1 share cluster1, while slave2 and slave3 share cluster2. The journal URI above therefore ends in /cluster1 on master and slave1; on slave2 and slave3 the value of dfs.namenode.shared.edits.dir must end in /cluster2 instead.
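The name and data paths above (and the YARN local directory configured in the next step) sit under the Hadoop installation directory. Hadoop normally creates them when the NameNode is formatted and the daemons start, but you can also create them ahead of time on every host if you prefer:
[zq@master hadoop-2.7.1]$ mkdir -p hdfs/name hdfs/data yarn/local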
5. Configure the Hadoop file yarn-site.xml:
[zq@master hadoop-2.7.1]$ sudo gedit etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/zq/soft/hadoop-2.7.1/yarn/local</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>30720</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>12</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
6. Configure the Hadoop file mapred-site.xml:
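The 2.7.1 tarball ships only mapred-site.xml.template, so if mapred-site.xml does not exist yet, you can create it from the template first:
[zq@master hadoop-2.7.1]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml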
[zq@master hadoop-2.7.1]$ sudo gedit etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>slave1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>slave1:19888</value>
  </property>
</configuration>
7. Configure the Hadoop file hadoop-env.sh:
[zq@master hadoop-2.7.1]$ sudo gedit etc/hadoop/hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/home/zq/soft/jdk1.8.0_91
8. Configure the Hadoop file slaves:
[zq@master hadoop-2.7.1]$ gedit etc/hadoop/slaves
slave1
slave2
slave3
9. Use scp to copy all of the configuration files above to slave1, slave2, and slave3:
[zq@master etc]$ scp hadoop/* zq@slave1:/home/zq/soft/hadoop-2.7.1/etc/hadoop
[zq@master etc]$ scp hadoop/* zq@slave2:/home/zq/soft/hadoop-2.7.1/etc/hadoop
[zq@master etc]$ scp hadoop/* zq@slave3:/home/zq/soft/hadoop-2.7.1/etc/hadoop
Remember that slave2 and slave3 then need dfs.namenode.shared.edits.dir changed to end in /cluster2, as noted in step 4.
10. At this point all of the Hadoop configuration is complete. Next, start the Hadoop services:
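The exact startup sequence depends on your environment; the following is only a rough sketch of a typical first start for this layout (JournalNodes on slave1/slave2/slave3, cluster1's NameNodes on master and slave1, cluster2's on slave2 and slave3), and the clusterId value hadoop-cluster is just an example name. Adjust as needed.
On slave1, slave2, and slave3, start the JournalNodes:
[zq@slave1 hadoop-2.7.1]$ sbin/hadoop-daemon.sh start journalnode
On master, format and start cluster1's first NameNode:
[zq@master hadoop-2.7.1]$ bin/hdfs namenode -format -clusterId hadoop-cluster
[zq@master hadoop-2.7.1]$ sbin/hadoop-daemon.sh start namenode
On slave1, copy the metadata from master and start the second NameNode:
[zq@slave1 hadoop-2.7.1]$ bin/hdfs namenode -bootstrapStandby
[zq@slave1 hadoop-2.7.1]$ sbin/hadoop-daemon.sh start namenode
On slave2 and slave3, do the same for cluster2 (format on slave2 with the same -clusterId, bootstrap on slave3).
Back on master, start the DataNodes listed in the slaves file and then YARN:
[zq@master hadoop-2.7.1]$ sbin/hadoop-daemons.sh start datanode
[zq@master hadoop-2.7.1]$ sbin/start-yarn.sh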
11. A simple pi test:
[zq@master hadoop-2.7.1]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 1000
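As an optional final check, you can list the running Java processes on each host and open the web UIs:
[zq@master hadoop-2.7.1]$ jps
master should typically show NameNode and ResourceManager; each slave should show its NameNode, DataNode, NodeManager, and JournalNode. With the configuration above, the HDFS web UI is served at master:50070 and the YARN web UI at master:8088.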