Setting Up a Hadoop 2.6.0 Cluster
I. Hardware Configuration
Four servers: IBM System x3650 M4 (7915I51)
Product category: rack server
Form factor: 2U
CPU model: Xeon E5-2650
Standard CPU count: 1
Memory type: ECC DDR3
Memory capacity: 16GB
Disk interface: SATA/SAS
Standard disk capacity: 2TB
Full specifications: http://detail.zol.com.cn/331/330619/param.shtml
One server acts as the master; the other three act as slaves.
Services on the master: NameNode, SecondaryNameNode, ResourceManager
Services on the slaves: DataNode, NodeManager
master and slave1 sit in rack 1; slave2 and slave3 sit in rack 2. For rack awareness, see "Configure rack awareness" below.
II. Cluster Setup and Configuration
1. SSH and ClusterShell configuration
ClusterShell (clush) runs the same command on multiple machines. SSH must allow passwordless login from the master to the slave nodes so that scripts such as start-dfs.sh and start-yarn.sh can be run from the master.
Step 1: Install ClusterShell on the master, as the root user
1) Installation steps omitted
2) Configure /etc/clustershell/groups as follows:
master: master
slaves: slave[1-3]
hadoop: master @slaves
Notes:
master contains the hostname of the master node
slaves contains the hostnames of the slave nodes
the hadoop group contains the hostnames of every node in the cluster, i.e. both the master and the slaves
3) Edit /etc/hosts to map hostnames to IP addresses
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.120 master master.example.com
192.168.1.121 slave1 slave1.example.com
192.168.1.122 slave2 slave2.example.com
192.168.1.123 slave3 slave3.example.com
Step 2: Set up passwordless SSH for the root user across the hadoop group
1) Generate a key pair
[root@master ~]# ssh-keygen -t rsa
2) Write the expect script ~/bin/copy_id.exp, which runs ssh-copy-id interactively
#!/usr/bin/expect
set node [lindex $argv 0]
spawn ssh-copy-id root@$node
expect {
    "Are you sure you want to continue connecting (yes/no)?" { send "yes\n"; exp_continue }
    "*password:" { send "redhat\n" }
}
expect eof
exit
3) Write the shell script ~/bin/cluster_copy_id.sh, which calls copy_id.exp for every host:
#!/bin/bash
cat /root/bin/hadoop.txt | while read node
do
    echo 'starting copy id to '${node}
    expect ~/bin/copy_id.exp $node
    echo 'finishing copy id to '${node}
done
Here, hadoop.txt lists all the hostnames, one per line:
master
slave1
slave2
slave3
[root@master ~]# chmod a+x bin/cluster_copy_id.sh
[root@master ~]# bin/cluster_copy_id.sh
After step 2, the root user can run the same command across the whole cluster with clush. The -g option selects the group to run on; the groups were defined in /etc/clustershell/groups as described above.
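As a quick sanity check (a minimal sketch; any harmless command works), every node in a group should answer without a password prompt:
[root@master ~]# clush -g hadoop hostname
[root@master ~]# clush -g slaves date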
Step 3: Create the hadoop user across the hadoop group
1) [root@master ~]# clush -g hadoop useradd hadoop
2) Write the shell script ~/bin/cluster_passwd_id.sh, which calls passwd.exp for every host:
#!/bin/bash
cat /root/bin/hadoop.txt | while read node
do
    echo 'starting change passwd to '${node}
    expect ~/bin/passwd.exp $node
    echo 'finishing change passwd to '${node}
done
3) Write the expect script bin/passwd.exp, which changes the password interactively
#!/usr/bin/expect
# Note: the expect patterns below match the Chinese-locale passwd prompts
# ("New password:" / "Retype new password:"); adjust them to your locale.
set node [lindex $argv 0]
spawn ssh root@$node passwd hadoop
expect "新的 密码:"
send "hadoop\n"
expect "重新输入新的 密码:"
send "hadoop\n"
expect eof
exit
4) Run the shell script to change the hadoop user's password
[root@master ~]# chmod a+x bin/cluster_passwd_id.sh
[root@master ~]# bin/cluster_passwd_id.sh
5) Configure sudo for the hadoop user, so that commands requiring root privileges can be run as sudo COMMAND instead of switching users.
Edit /etc/sudoers and add:
hadoop ALL=(ALL) ALL
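A quick way to confirm the new privileges (the password asked for is the hadoop user's own):
[root@master ~]# su - hadoop
[hadoop@master ~]$ sudo whoami
root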
After step 3, every node has a hadoop user with the same password and sudo privileges.
Step 4: Set up passwordless SSH for the hadoop user across the hadoop group
Switch to the hadoop user and repeat the procedure from step 2; only the username and password differ, so it is not spelled out again. A sketch of the adapted commands is shown below.
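A minimal sketch, assuming the scripts from step 2 are copied into the hadoop user's ~/bin and adapted (hadoop@$node instead of root@$node, password hadoop instead of redhat, ~/bin/hadoop.txt instead of /root/bin/hadoop.txt):
[hadoop@master ~]$ ssh-keygen -t rsa
[hadoop@master ~]$ bash ~/bin/cluster_copy_id.sh
[hadoop@master ~]$ ssh slave1 hostname    # should print "slave1" without a password prompt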
2. Installing Hadoop
Step 1: Mount the NFS shared directory (192.168.1.113 is not part of the cluster; it provides the NFS service) that holds the required software packages
[hadoop@master ~] sudo clush -g hadoop mkdir /mnt/hadoop-nfs
[hadoop@master ~] sudo clush -g hadoop mount -t nfs 192.168.1.113:/home/wangsch/download /mnt/hadoop-nfs/
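To confirm the share is mounted on every node (an optional check):
[hadoop@master ~] sudo clush -g hadoop df -h /mnt/hadoop-nfs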
Step 2: Install Java
1) Install it across the hadoop group
[hadoop@master ~] sudo clush -g hadoop tar -xzf /mnt/hadoop-nfs/jdk-7u75-linux-i586.tar.gz -C /opt/
[hadoop@master ~] sudo clush -g hadoop chown -R hadoop:hadoop /opt/jdk1.7.0_75/
[hadoop@master ~] sudo clush -g slaves yum remove -y java-1.6.0-openjdk*
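To verify the JDK on every node (optional; java -version prints to stderr, which clush displays with the node name prefix):
[hadoop@master ~] clush -g hadoop /opt/jdk1.7.0_75/bin/java -version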
Step 3: Install and configure Hadoop
1) Extract the archive
[hadoop@master ~] sudo tar -xvzf /mnt/hadoop-nfs/hadoop-2.6.0.tar.gz -C /opt
[hadoop@master ~] sudo chown -R hadoop:hadoop /opt/hadoop-2.6.0
2) Copy the hosts file to the slaves group
[hadoop@master ~] sudo clush -g slaves --copy /etc/hosts
3) Configure the slaves file: edit /opt/hadoop-2.6.0/etc/hadoop/slaves
slave1
slave2
slave3
4) Configure Hadoop ($HADOOP_HOME/etc/hadoop/hadoop-env.sh)
# Set JAVA_HOME
export JAVA_HOME=/opt/jdk1.7.0_75
# Adjust the GC options
export HADOOP_OPTS="$HADOOP_OPTS -XX:+UseParallelOldGC"
# Heap size for the NameNode and SecondaryNameNode
export HADOOP_NAMENODE_OPTS="-Xmx2000M $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx2000M $HADOOP_SECONDARYNAMENODE_OPTS"
# Heap size for the DataNode
export HADOOP_DATANODE_OPTS="-Xmx3000M $HADOOP_DATANODE_OPTS"
# PID directory
export HADOOP_PID_DIR=/data/hadoop-pids
export HADOOP_SECURE_DN_PID_DIR=/data/hadoop-pids
5) Configure YARN ($HADOOP_HOME/etc/hadoop/yarn-env.sh)
# Heap size for the ResourceManager
export YARN_RESOURCEMANAGER_OPTS=-Xmx2000M
# Heap size for the NodeManager
export YARN_NODEMANAGER_OPTS=-Xmx3000M
# Adjust the GC options
YARN_OPTS="$YARN_OPTS -XX:+UseParallelOldGC"
YARN_PID_DIR=/data/hadoop-pids
6) Configure the site files
common: /opt/hadoop-2.6.0/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/tmp</value>
    </property>
    <property>
        <name>topology.script.file.name</name>
        <value>/opt/hadoop-2.6.0/rack-aware/rack_aware.py</value>
    </property>
</configuration>
Create the file /opt/hadoop-2.6.0/etc/hadoop/masters:
master
hdfs: /opt/hadoop-2.6.0/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/dfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>/data/dfs/namesecondary</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
    <!-- SecondaryNameNode configuration -->
    <property>
        <name>dfs.http.address</name>
        <value>master:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:50090</value>
    </property>
</configuration>
mapreduce: /opt/hadoop-2.6.0/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
yarn: /opt/hadoop-2.6.0/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
7) Configure rack awareness
Edit /opt/hadoop-2.6.0/etc/hadoop/core-site.xml and make sure it contains:
<property>
    <name>topology.script.file.name</name>
    <value>/opt/hadoop-2.6.0/rack-aware/rack_aware.py</value>
</property>
Then write the mapping from hostname/IP to rack ID.
[hadoop@master ~] mkdir /opt/hadoop-2.6.0/rack-aware
Create rack_aware.py in the rack-aware directory:
#!/usr/bin/python
# -*- coding:utf-8 -*-
import sys

# Map hostnames and IPs to rack IDs; unknown hosts fall back to rack0
rack = {"slave1": "rack1",
        "slave2": "rack2",
        "slave3": "rack2",
        "192.168.1.121": "rack1",
        "192.168.1.122": "rack2",
        "192.168.1.123": "rack2"}

if __name__ == "__main__":
    print "/" + rack.get(sys.argv[1], "rack0")
[hadoop@master ~] chown hadoop:hadoop /opt/hadoop-2.6.0/rack-aware/rack_aware.py
[hadoop@master ~] chmod a+x /opt/hadoop-2.6.0/rack-aware/rack_aware.py
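The script can be spot-checked from the shell before HDFS is started; with the mapping above it should print:
[hadoop@master ~] /opt/hadoop-2.6.0/rack-aware/rack_aware.py 192.168.1.121
/rack1
[hadoop@master ~] /opt/hadoop-2.6.0/rack-aware/rack_aware.py slave2
/rack2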
8) Set the environment variables consistently
Edit ~/.bash_profile:
export JAVA_HOME=/opt/jdk1.7.0_75
export HADOOP_HOME=/opt/hadoop-2.6.0
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Distribute the file:
[hadoop@master ~] sudo clush -g slaves --copy ~/.bash_profile
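After re-reading the profile, the new PATH can be checked, for example:
[hadoop@master ~] source ~/.bash_profile
[hadoop@master ~] hadoop version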
9) Install Hadoop on the other nodes
[hadoop@master ~] sudo clush -g slaves --copy /opt/hadoop-2.6.0
[hadoop@master ~] sudo clush -g slaves chown -R hadoop:hadoop /opt/hadoop-2.6.0
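An optional check that the copy and the ownership change landed on every slave:
[hadoop@master ~] sudo clush -g slaves ls -ld /opt/hadoop-2.6.0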
Step 4: Start Hadoop
1) Format the NameNode
[hadoop@master ~] cd /opt/hadoop-2.6.0
[hadoop@master hadoop-2.6.0]$ bin/hdfs namenode -format
2) Start HDFS
[hadoop@master hadoop-2.6.0]$ sbin/start-dfs.sh
3) Start YARN
[hadoop@master hadoop-2.6.0]$ sbin/start-yarn.sh
4) Disable the firewall (otherwise the web UIs cannot be reached)
[hadoop@master hadoop-2.6.0]$ sudo clush -g hadoop service iptables stop
[hadoop@master hadoop-2.6.0]$ sudo clush -g hadoop chkconfig iptables off
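With the firewall down, the web UIs should be reachable from a browser (50070 is the NameNode HTTP address configured in hdfs-site.xml; 8088 is the default ResourceManager web port):
NameNode:        http://master:50070
ResourceManager: http://master:8088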
5) Check that the daemons are running
[hadoop@master hadoop-2.6.0]$ clush -g hadoop /opt/jdk1.7.0_75/bin/jps | sort
master: 14362 NameNode
master: 14539 SecondaryNameNode
master: 15285 ResourceManager
master: 15585 Jps
slave1: 5469 DataNode
slave1: 5801 NodeManager
slave1: 5932 Jps
slave2: 5005 DataNode
slave2: 5296 NodeManager
slave2: 5427 Jps
slave3: 4889 DataNode
slave3: 5196 NodeManager
slave3: 5327 Jps
6) Inspect HDFS
[hadoop@master hadoop-2.6.0]$ hdfs dfsadmin -report
15/03/18 17:21:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 61443870720 (57.22 GB)
Present Capacity: 52050423808 (48.48 GB)
DFS Remaining: 52050350080 (48.48 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (3):
Name: 192.168.1.123:50010 (slave3)
Hostname: slave3
Rack: /rack2
Decommission Status : Normal
Configured Capacity: 20481290240 (19.07 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3131166720 (2.92 GB)
DFS Remaining: 17350098944 (16.16 GB)
DFS Used%: 0.00%
DFS Remaining%: 84.71%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Mar 18 17:21:15 CST 2015
Name: 192.168.1.122:50010 (slave2)
Hostname: slave2
Rack: /rack2
Decommission Status : Normal
Configured Capacity: 20481290240 (19.07 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3131105280 (2.92 GB)
DFS Remaining: 17350160384 (16.16 GB)
DFS Used%: 0.00%
DFS Remaining%: 84.71%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Mar 18 17:21:17 CST 2015
Name: 192.168.1.121:50010 (slave1)
Hostname: slave1
Rack: /rack1
Decommission Status : Normal
Configured Capacity: 20481290240 (19.07 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3131174912 (2.92 GB)
DFS Remaining: 17350090752 (16.16 GB)
DFS Used%: 0.00%
DFS Remaining%: 84.71%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Mar 18 17:21:17 CST 2015
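As a final smoke test (a minimal sketch; the examples jar ships with the 2.6.0 distribution), write a file into HDFS and run the bundled pi job on YARN:
[hadoop@master hadoop-2.6.0]$ hdfs dfs -mkdir -p /user/hadoop
[hadoop@master hadoop-2.6.0]$ hdfs dfs -put etc/hadoop/core-site.xml /user/hadoop/
[hadoop@master hadoop-2.6.0]$ hdfs dfs -ls /user/hadoop
[hadoop@master hadoop-2.6.0]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 3 10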
-----------------------------------------------------------------------------------------------------------------------------------------------
III. Troubleshooting Notes
1. During formatting: UnknownHostException: master.example.com
hostname returns master.example.com, while master is only the short name; the fully qualified name must be mapped as well, so update /etc/hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.120 master master.example.com
192.168.1.121 slave1 slave1.example.com
192.168.1.122 slave2 slave2.example.com
192.168.1.123 slave3 slave3.example.com
2. During formatting: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory is in an inconsistent state: storage directory does not exist or is not accessible.
Fix: create the data directories on every node and hand them over to the hadoop user:
[hadoop@master hadoop-2.6.0] sudo clush -g hadoop mkdir -p /data/dfs/namesecondary
[hadoop@master hadoop-2.6.0] sudo clush -g hadoop mkdir -p /data/dfs/name
[hadoop@master hadoop-2.6.0] sudo clush -g hadoop mkdir -p /data/dfs/data
[hadoop@master hadoop-2.6.0] sudo clush -g hadoop mkdir -p /data/tmp
[hadoop@master hadoop-2.6.0] sudo clush -g hadoop chown -R hadoop:hadoop /data
3. [root@master hadoop-2.6.0]# sbin/start-dfs.sh
15/03/17 15:58:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [Java HotSpot(TM) Client VM warning: You have loaded library /opt/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
master]
-c: Unknown cipher type 'cd'
or: ssh: Could not resolve hostname or: Name or service not known
fix: ssh: Could not resolve hostname fix: Name or service not known
have: ssh: Could not resolve hostname have: Name or service not known
with: ssh: Could not resolve hostname with: Name or service not known
with: ssh: Could not resolve hostname with: Name or service not known
VM: ssh: Could not resolve hostname VM: Name or service not known
to: ssh: Could not resolve hostname to: Name or service not known
sed: -e expression #1, char 6: unknown option to `s'
that: ssh: Could not resolve hostname that: Name or service not known
will: ssh: Could not resolve hostname will: Name or service not known
stack: ssh: Could not resolve hostname stack: Name or service not known
recommended: ssh: Could not resolve hostname recommended: Name or service not known
Java: ssh: Could not resolve hostname Java: Name or service not known
library: ssh: Could not resolve hostname library: Name or service not known
disabled: ssh: Could not resolve hostname disabled: Name or service not known
link: ssh: Could not resolve hostname link: Name or service not known
you: ssh: Could not resolve hostname you: Name or service not known
You: ssh: Could not resolve hostname You: Name or service not known
Client: ssh: Could not resolve hostname Client: Name or service not known
stack: ssh: Could not resolve hostname stack: Name or service not known
guard.: ssh: Could not resolve hostname guard.: Name or service not known
the: ssh: Could not resolve hostname the: Name or service not known
Fix: add the following to ~/.bash_profile
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
4. The SecondaryNameNode prompts for SSH authentication when it starts. Fix:
1) Create the file /opt/hadoop-2.6.0/etc/hadoop/masters:
master
2) Edit hdfs-site.xml:
<property>
    <name>dfs.http.address</name>
    <value>master:50070</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
</property>