1. Cluster information:

| Hostname | Hadoop role | jps output | Hadoop user | Hadoop install directory |
| --- | --- | --- | --- | --- |
| master (server152) | master and slave | NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager | Create the same user and group on every machine: hadoop. Install hadoop-2.7.2 as the hadoop user, and keep the Hadoop directory owned by hadoop:hadoop. | /usr/local/hadoop |
| slave1 (server153) | slave | DataNode, NodeManager | (same) | (same) |
| slave2 (server154) | slave | DataNode, NodeManager | (same) | (same) |

Note: the master is both a master and a slave.

Environment: three 64-bit CentOS 6.5 servers + Hadoop 2.7.2 + Java 7.
2. Configure the server hostnames

The NameNode host is server152.
The DataNode hosts are server153 and server154.
3. Edit /etc/hosts and the hostname on every machine (server153 as the example)
    [root@server153 ~]# vi /etc/hosts
    192.168.1.152 server152
    192.168.1.153 server153
    192.168.1.154 server154

    [root@server153 ~]# cat /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=server153
    NETWORKING_IPV6=yes
    IPV6_AUTOCONF=no
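On CentOS 6 the HOSTNAME value in /etc/sysconfig/network only takes effect after a reboot. A minimal sketch of applying it immediately and checking that the /etc/hosts entries resolve, again using server153 as the example:

    # apply the hostname for the current session; the file above makes it permanent
    hostname server153
    # verify that the other nodes resolve and are reachable
    ping -c 1 server152
    ping -c 1 server154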
4. Create the user and group

    groupadd hadoop            # add a group
    useradd hadoop -g hadoop   # add a user in that group
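These two commands must be run on all three machines. One assumed addition, not in the original steps: give the hadoop user a password, since the scp step in section 6 logs in as that user before passwordless SSH exists:

    # run on every node; using the same password everywhere is convenient
    passwd hadoop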
5. 安装hadoop
下载:http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
解压到/usr/local/hadoop/hadoop2.7.2
hadoop也要设置环境变量,使用vi /etc/profile命令编辑添加如下内容:
[root@server153 ~]# cat /etc/profile export HADOOP_HOME=/usr/local/hadoop/hadoop2.7.2 export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH #执行source /etc/profile使配置文件生效 [root@server153 ~]# source /etc/profile #修改所有者改为hadoop [root@server153 ~]#chown -R hadoop:hadoop /usr/local/hadoop/
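A sketch of the download-and-extract step described above, assuming the tarball is fetched straight into /usr/local/hadoop and the directory is renamed to match HADOOP_HOME:

    cd /usr/local/hadoop
    wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
    tar -zxf hadoop-2.7.2.tar.gz
    mv hadoop-2.7.2 hadoop2.7.2   # matches HADOOP_HOME=/usr/local/hadoop/hadoop2.7.2
    hadoop version                # after source /etc/profile; should report 2.7.2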
6. Set up passwordless SSH

a) Install SSH and let the master log in to itself without password verification

    # run the following so the master node can ssh to itself without a password
    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    export HADOOP_PREFIX=/usr/local/hadoop/hadoop2.7.2
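A quick check, run as the hadoop user on server152; the first connection stores the host key, and no password prompt should appear afterwards:

    ssh localhost date
    ssh server152 date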
b) Let the master node log in to the two slave nodes over SSH without a password

    # each slave's authorized_keys must contain the master's public key,
    # so the master can reach both slaves securely without a password.
    # run the following commands on each slave machine:
    scp hadoop@server152:~/.ssh/id_dsa.pub ~/.ssh/id_dsa.pub
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    # tighten permissions, otherwise ssh will still prompt for a password
    chmod -R 700 ~/.ssh
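Once both slaves have the key installed, a sketch of verifying passwordless access from the master; each command should print the remote hostname without prompting:

    ssh server153 hostname
    ssh server154 hostname
    ssh server152 hostname   # the master is also a slave, so it must reach itself too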
7. Configure Hadoop

Before configuring, create the following directories on server152's local filesystem (the paths must match the values used in the configuration files below):

    /usr/local/hadoop/fs/name
    /usr/local/hadoop/fs/data
    /usr/local/hadoop/fs/temp
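A sketch of creating those directories, keeping ownership on the hadoop user so the daemons can write to them:

    mkdir -p /usr/local/hadoop/fs/{name,data,temp}
    chown -R hadoop:hadoop /usr/local/hadoop/fs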
Seven configuration files are involved:

    ~/hadoop-2.7.2/etc/hadoop/hadoop-env.sh
    ~/hadoop-2.7.2/etc/hadoop/yarn-env.sh
    ~/hadoop-2.7.2/etc/hadoop/slaves
    ~/hadoop-2.7.2/etc/hadoop/core-site.xml
    ~/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
    ~/hadoop-2.7.2/etc/hadoop/mapred-site.xml
    ~/hadoop-2.7.2/etc/hadoop/yarn-site.xml
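The two *-env.sh files usually only need JAVA_HOME set explicitly; a sketch assuming a hypothetical JDK path (substitute your actual Java 7 location):

    # in both hadoop-env.sh and yarn-env.sh
    export JAVA_HOME=/usr/java/jdk1.7.0_79   # hypothetical path; point it at your JDK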
core-site.xml:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://server152:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/fs/temp</value>
        <description>A base for other temporary directories.</description>
      </property>
    </configuration>
hdfs-site.xml:

    <configuration>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>server152:9001</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/fs/name</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/fs/data</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
      </property>
    </configuration>
mapred-site.xml:

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>server152:10020</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>server152:19888</value>
      </property>
    </configuration>
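Note that the hadoop-2.7.2 distribution ships only mapred-site.xml.template; if mapred-site.xml is missing, copy the template first:

    cd /usr/local/hadoop/hadoop2.7.2/etc/hadoop
    cp mapred-site.xml.template mapred-site.xml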
yarn-site.xml:

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>server152:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>server152:8031</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>server152:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>server152:8033</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>server152:8088</value>
      </property>
    </configuration>
slaves:

    server153
    server154
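The configuration must be identical on all three nodes. A minimal sketch of pushing the configured installation from the master to the slaves, run as the hadoop user and relying on the passwordless SSH from section 6:

    scp -r /usr/local/hadoop/hadoop2.7.2 hadoop@server153:/usr/local/hadoop/
    scp -r /usr/local/hadoop/hadoop2.7.2 hadoop@server154:/usr/local/hadoop/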