I. Software Preparation
Download Hadoop: http://apache.fayea.com/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
Download the JDK: http://download.oracle.com/otn-pub/java/jdk/8u66-b17/jdk-8u66-linux-x64.tar.gz
II. Server Preparation (three servers; four is better)
Ideally use four servers: one NameNode and three DataNodes (Hadoop's default replication factor is 3). If a spare machine is available, you can also add a SecondaryNameNode (a standby node for the NameNode; it is said that hot standby has been implemented via ZooKeeper, but that remains to be verified).
Prepare three servers:
Role       IP               Hostname
NameNode   192.168.63.227   NameNode
DataNode   192.168.63.202   node1
DataNode   192.168.63.203   node2
III. Server Environment Setup (run on every server)
1. Synchronize the clock on every machine
ntpdate time.windows.com
2. Stop iptables and disable SELinux on every machine
service iptables stop; chkconfig iptables off
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux; setenforce 0
3. Edit /etc/hosts on every server
Add the following:
192.168.63.227 NameNode
192.168.63.202 node1
192.168.63.203 node2
4. Add a user for running Hadoop on every server (optional; in a test environment Hadoop can be started as root)
useradd hduser && echo "123456" | passwd --stdin hduser
5. Configure the Java environment on every server (JDK 8u66 is used here; JDK 7 or later is required)
tar xf jdk-8u66-linux-x64.tar.gz -C /usr/local/
vim /etc/profile.d/java.sh
Enter:
JAVA_HOME=/usr/local/jdk1.8.0_66
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME PATH
Run:
source /etc/profile.d/java.sh
Run java -version to confirm that the newly installed JDK is now the active Java version.
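As a quick sanity check (a minimal sketch; the exact output format depends on the JDK build you installed):
java -version   # should now report version 1.8.0_66
which java      # should resolve under /usr/local/jdk1.8.0_66/bin once java.sh has been sourced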
6. Configure the Hadoop environment variables on every server (Hadoop 2.7.1 is used here; there is no need to install Hadoop yet)
vim /etc/profile.d/hadoop.sh
Enter:
HADOOP_HOME=/usr/local/hadoop
PATH=$HADOOP_HOME/bin:$PATH
PATH=$HADOOP_HOME/sbin:$PATH
export HADOOP_HOME PATH
Run:
source /etc/profile.d/hadoop.sh
IV. Configure passwordless SSH from the NameNode to the other nodes
If Hadoop will be started by a non-root user, configure passwordless SSH for that user instead (for example, for the hduser user added above, run su - hduser first and then set up passwordless login as hduser).
ssh-keygen -t rsa -P ''    (press Enter at every prompt; do not set a passphrase)
ssh-copy-id -i ~/.ssh/id_rsa.pub root@namenode    (note: passwordless SSH to the local machine itself must also be configured)
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
V. Configure Hadoop (perform these steps on the NameNode server)
1. Install Hadoop
tar xf hadoop-2.7.1.tar.gz -C /usr/local/
ln -sv /usr/local/hadoop-2.7.1 /usr/local/hadoop
(If Hadoop will be started by a non-root user, also run: chown -R hduser. /usr/local/hadoop-2.7.1/)
2. Configure Hadoop
Create the directories:
cd /usr/local/hadoop
mkdir tmp && mkdir -p hdfs/data && mkdir -p hdfs/name
Edit the configuration files:
cd /usr/local/hadoop
vim etc/hadoop/core-site.xml
Insert the following between the <configuration> tags:
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://NameNode:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:///usr/local/hadoop/tmp</value>
</property>
<property>
    <name>io.file.buffer.size</name>
    <value>131702</value>
</property>
vim etc/hadoop/hdfs-site.xml
Insert the following between the <configuration> tags:
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/hdfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/hdfs/data</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
vim etc/hadoop/mapred-site.xml
Insert the following between the <configuration> tags:
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <final>true</final>
</property>
<property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>NameNode:50030</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>NameNode:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>NameNode:19888</value>
</property>
<property>
    <name>mapred.job.tracker</name>
    <value>http://NameNode:9001</value>
</property>
vim etc/hadoop/yarn-site.xml
Insert the following between the <configuration> tags:
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>NameNode</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>NameNode:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>NameNode:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>NameNode:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>NameNode:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>NameNode:8088</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>1</value>
</property>
Edit etc/hadoop/slaves
Remove localhost and add every DataNode to this file; one way to write it is sketched after the list below.
Add:
node1
node2
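A minimal sketch of writing the file in one step, assuming the two DataNode hostnames above and the /usr/local/hadoop install path used in this guide:
cd /usr/local/hadoop
cat > etc/hadoop/slaves <<EOF
node1
node2
EOF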
VI. Copy Hadoop to the other nodes
Copy the entire Hadoop directory from the NameNode server to the other two nodes (run on the NameNode):
scp -r hadoop-2.7.1 root@node1:/usr/local/
scp -r hadoop-2.7.1 root@node2:/usr/local/
Then create the same symlink on the other nodes (node1 and node2; it was already created on the NameNode in step V.1):
ln -sv /usr/local/hadoop-2.7.1 /usr/local/hadoop
VII. Format the NameNode (run on the NameNode node)
cd /usr/local/hadoop
# Format the NameNode
hdfs namenode -format
VIII. Start Hadoop (run on the NameNode node)
Log in to the NameNode and run start-all.sh (the Hadoop environment variables were configured earlier, so the script can be invoked directly).
If Hadoop is started by a non-root user, run su - hduser first, then start-all.sh.
To stop everything, run stop-all.sh.
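Putting the above together (a minimal sketch; in Hadoop 2.x start-all.sh and stop-all.sh are deprecated wrappers around start-dfs.sh/start-yarn.sh and stop-dfs.sh/stop-yarn.sh, but they still work):
su - hduser     # only when starting as the non-root user
start-all.sh    # starts HDFS (NameNode, SecondaryNameNode, DataNodes) and YARN (ResourceManager, NodeManagers)
stop-all.sh     # stops all of the above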
IX. Verification
Run on all nodes:
hadoop dfsadmin -report
The output should look similar to the following:
[root@node1 ~]# hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
15/11/13 22:39:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 37139136512 (34.59 GB)
Present Capacity: 27834056704 (25.92 GB)
DFS Remaining: 27833999360 (25.92 GB)
DFS Used: 57344 (56 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.63.202:50010 (node1)
Hostname: node1
Decommission Status : Normal
Configured Capacity: 18569568256 (17.29 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4652568576 (4.33 GB)
DFS Remaining: 13916971008 (12.96 GB)
DFS Used%: 0.00%
DFS Remaining%: 74.95%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Nov 13 22:39:28 CST 2015

Name: 192.168.63.203:50010 (node2)
Hostname: node2
Decommission Status : Normal
Configured Capacity: 18569568256 (17.29 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4652511232 (4.33 GB)
DFS Remaining: 13917028352 (12.96 GB)
DFS Used%: 0.00%
DFS Remaining%: 74.95%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Nov 13 22:39:27 CST 2015
Run jps on the NameNode node:
[root@NameNode ~]# jps
7187 Jps
3493 NameNode
3991 SecondaryNameNode
4136 ResourceManager
Run jps on the DataNode nodes:
[root@node1 ~]# jps
2801 NodeManager
3970 Jps
2698 DataNode
Access the Hadoop web UIs:
http://namenode:8088 (YARN ResourceManager)
http://namenode:50070 (HDFS NameNode)
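If a browser is not at hand, a quick check from the shell that both UIs respond (assumes curl is installed; any HTTP client will do):
curl -sI http://namenode:8088    # ResourceManager web UI; expect an HTTP status line in the response
curl -sI http://namenode:50070   # NameNode web UI; expect an HTTP status line in the response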