Setting Up a Hadoop 2.6.0 Cluster


I. Hardware Configuration

Four servers: IBM System x3650 M4 (7915I51)

  • Product category: rack server

    Form factor: 2U

  • CPU model: Xeon E5-2650

    CPUs (standard): 1

  • Memory type: ECC DDR3

    Memory: 16GB

  • Disk interface: SATA/SAS

    Disk capacity (standard): 2TB


Full specifications: http://detail.zol.com.cn/331/330619/param.shtml


One machine serves as the master; the other three serve as slaves.

Services on the master: NameNode, SecondaryNameNode, ResourceManager

Services on the slaves: DataNode, NodeManager

master and slave1 sit in rack 1; slave2 and slave3 sit in rack 2. For rack awareness, see "7) Configure rack awareness" below.


II. Cluster Setup and Configuration

1. SSH and ClusterShell configuration

ClusterShell is used to run the same command on multiple machines at once. SSH must also be set up for passwordless login from the master to the slave nodes, so that commands such as start-dfs.sh and start-yarn.sh can be run from the master.

Step 1: Install ClusterShell on the master, as the root user

1) Installation steps omitted


2) Configure /etc/clustershell/groups as follows:

master: master
slaves: slave[1-3]
hadoop: master @slaves


Notes:

The master group contains the hostname of the master node.

The slaves group contains the hostnames of the slave nodes.

The hadoop group contains the hostnames of every node in the cluster, i.e. both the master and the slaves.


3) Edit /etc/hosts to map the hostnames to their IP addresses

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.120	master	master.example.com
192.168.1.121	slave1	slave1.example.com
192.168.1.122	slave2	slave2.example.com
192.168.1.123	slave3	slave3.example.com


Step 2: Set up passwordless SSH for the root user across the hadoop group

1) Generate a key pair

[root@master ~]# ssh-keygen -t rsa

2) Write an expect script, ~/bin/copy_id.exp, that drives ssh-copy-id through its interactive prompts

#!/usr/bin/expect
set node [lindex $argv 0]
spawn ssh-copy-id root@$node
expect {
    "Are you sure you want to continue connecting (yes/no)?" { send "yes\n"; exp_continue }
    "*password:" { send "redhat\n" }
}

expect eof
exit

3) Write a shell script, cluster_copy_id.sh, that runs copy_id.exp against every node

#!/bin/bash
cat /root/bin/hadoop.txt | while read node
do
    echo 'starting copy id to '${node}
    expect copy_id.exp $node
    echo 'finishing copy id to '${node}
done

Here, hadoop.txt lists all of the hostnames:

master
slave1
slave2
slave3

4) Run the shell script

[root@master ~]# chmod a+x bin/cluster_copy_id.sh

[root@master ~]# bin/cluster_copy_id.sh


After step 2, the root user can run the same command across the whole cluster with clush. Note that the -g option selects which group(s) to run the command on; the groups were defined in /etc/clustershell/groups as described above.
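For example, assuming the groups defined above, a quick connectivity check could look like this:

[root@master ~]# clush -g slaves uptime
[root@master ~]# clush -g hadoop hostname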


Step 3: Create a hadoop user on the hadoop group

1) [root@master ~]# clush -g hadoop useradd hadoop


2) Write a shell script, bin/cluster_passwd_id.sh, to set the hadoop user's password (clush cannot drive an interactive password change, so the script loops over the nodes and changes the password on each one)

#!/bin/bash
cat /root/bin/hadoop.txt | while read node
do
	echo 'starting change passwd to '${node}
	expect passwd.exp $node
	echo 'finishing change passwd to '${node}
done


3) Write an expect script, bin/passwd.exp, that performs the interactive password change

#!/usr/bin/expect
set node [lindex $argv 0]
spawn ssh root@$node passwd hadoop
expect "新的 密码:"
send "hadoop\n"
expect "重新输入新的 密码:"
send "hadoop\n"
expect eof
exit

4) Run the shell script to change the hadoop user's password

[root@master ~]# chmod a+x bin/cluster_passwd_id.sh

[root@master ~]# bin/cluster_passwd_id.sh


5) Give the hadoop user sudo privileges, so that commands requiring root can be run as sudo COMMAND instead of switching users.

Edit /etc/sudoers and add:

hadoop	ALL=(ALL)	ALL
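The same sudoers entry is needed on every node. One possible way (an assumption, not part of the original walkthrough) is to edit /etc/sudoers on the master, ideally via visudo, and then push it out with clush --copy, just as is done for /etc/hosts later:

[root@master ~]# clush -g slaves --copy /etc/sudoers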

After step 3, every node has a hadoop user with the same password and sudo privileges.


Step 4: Set up passwordless SSH for the hadoop user across the hadoop group

Switch to the hadoop user and repeat the procedure from step 2; only the username and password differ, so it is not spelled out again here (a minimal sketch of the changed lines follows).
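For reference, a minimal sketch of what changes when step 2 is repeated as the hadoop user (assuming the password set in step 3):

[hadoop@master ~]$ ssh-keygen -t rsa
# in copy_id.exp, only the target user and the password differ:
spawn ssh-copy-id hadoop@$node
    "*password:" { send "hadoop\n" }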


2. Installing Hadoop

Step 1: Mount the NFS shared directory used for downloading the required packages (192.168.1.113 is not part of the cluster; it provides the NFS service)

[hadoop@master ~] sudo clush -g hadoop mkdir /mnt/hadoop-nfs

[hadoop@master ~] sudo clush -g hadoop mount -t nfs 192.168.1.113:/home/wangsch/download /mnt/hadoop-nfs/
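Optionally, to make the mount survive reboots (an addition not in the original steps), an equivalent /etc/fstab entry on each node would be:

192.168.1.113:/home/wangsch/download  /mnt/hadoop-nfs  nfs  defaults  0  0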

Step 2: Install Java

1) Install on the hadoop group

[hadoop@master ~] sudo clush -g hadoop tar -xzf /mnt/hadoop-nfs/jdk-7u75-linux-i586.tar.gz -C /opt/

[hadoop@master ~] sudo clush -g hadoop chown -R hadoop:hadoop /opt/jdk1.7.0_75/

[hadoop@master ~] sudo clush -g slaves yum remove -y java-1.6.0-openjdk*
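A quick sanity check that the same JDK is now available on every node (the output should report 1.7.0_75):

[hadoop@master ~] clush -g hadoop /opt/jdk1.7.0_75/bin/java -version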


Step 3: Install and configure Hadoop

1) Unpack

[hadoop@master ~] sudo tar -xvzf /mnt/hadoop-nfs/hadoop-2.6.0.tar.gz -C /opt

[hadoop@master ~] sudo chown -R hadoop:hadoop /opt/hadoop-2.6.0

2) Copy the hosts file to the slaves group

[hadoop@master ~] sudo clush -g slaves --copy /etc/hosts

3) Configure the slaves file: edit /opt/hadoop-2.6.0/etc/hadoop/slaves

slave1
slave2
slave3

4) Configure Hadoop ($HADOOP_HOME/etc/hadoop/hadoop-env.sh)

# Set JAVA_HOME
export JAVA_HOME=/opt/jdk1.7.0_75

# Adjust the GC options
export HADOOP_OPTS="$HADOOP_OPTS -XX:+UseParallelOldGC"

# Set the NameNode and SecondaryNameNode heap sizes
export HADOOP_NAMENODE_OPTS="-Xmx2000M $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx2000M $HADOOP_SECONDARYNAMENODE_OPTS"

# Set the DataNode heap size
export HADOOP_DATANODE_OPTS="-Xmx3000M $HADOOP_DATANODE_OPTS"

# Change the pid directory
export HADOOP_PID_DIR=/data/hadoop-pids
export HADOOP_SECURE_DN_PID_DIR=/data/hadoop-pids
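Note that HADOOP_PID_DIR points at /data/hadoop-pids, which is not created anywhere else in this walkthrough; assuming the same /data layout used for the DFS directories later, it can be created on every node with:

[hadoop@master ~] sudo clush -g hadoop mkdir -p /data/hadoop-pids
[hadoop@master ~] sudo clush -g hadoop chown -R hadoop:hadoop /data/hadoop-pids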


5) Configure YARN ($HADOOP_HOME/etc/hadoop/yarn-env.sh)

# Set the ResourceManager heap size
export YARN_RESOURCEMANAGER_OPTS=-Xmx2000M

# Set the NodeManager heap size
export YARN_NODEMANAGER_OPTS=-Xmx3000M

# Adjust the GC options
YARN_OPTS="$YARN_OPTS -XX:+UseParallelOldGC"

# Change the pid directory
YARN_PID_DIR=/data/hadoop-pids

6) Configure the site files

common:/opt/hadoop-2.6.0/etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/tmp</value>
    </property>
    <property>
        <name>topology.script.file.name</name>
        <value>/opt/hadoop-2.6.0/rack-aware/rack_aware.py</value>
    </property>
</configuration>

Create the file /opt/hadoop-2.6.0/etc/hadoop/masters:

master

hdfs:/opt/hadoop-2.6.0/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/dfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>/data/dfs/namesecondary</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
    <!-- SecondaryNameNode configuration -->
    <property>
        <name>dfs.http.address</name>
        <value>master:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:50090</value>
    </property>
</configuration>


mapreduce:/opt/hadoop-2.6.0/etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn:/opt/hadoop-2.6.0/etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

7) Configure rack awareness

Edit /opt/hadoop-2.6.0/etc/hadoop/core-site.xml:

<property>
    <name>topology.script.file.name</name>
    <value>/opt/hadoop-2.6.0/rack-aware/rack_aware.py</value>
</property>

Write the mapping from hostnames/IPs to rack IDs

[hadoop@master ~] mkdir /opt/hadoop-2.6.0/rack-aware

Create rack_aware.py in the rack-aware directory:

#!/usr/bin/python
# -*- coding:utf-8 -*-
import sys  

rack = {"slave1":"rack1",
        "slave2":"rack2",
        "slave3":"rack2",
        "192.168.1.121":"rack1",
        "192.168.1.122":"rack2",
        "192.168.1.123":"rack2"
        }


if __name__=="__main__":
    print "/" + rack.get(sys.argv[1],"rack0")

[hadoop@master ~] chown hadoop:hadoop /opt/hadoop-2.6.0/rack-aware/rack_aware.py

[hadoop@master ~] chmod a+x /opt/hadoop-2.6.0/rack-aware/rack_aware.py
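The script can be sanity-checked by hand before HDFS is started; given the mapping above it should print:

[hadoop@master ~] /opt/hadoop-2.6.0/rack-aware/rack_aware.py slave1
/rack1
[hadoop@master ~] /opt/hadoop-2.6.0/rack-aware/rack_aware.py 192.168.1.122
/rack2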


8) Set the environment variables uniformly

Edit ~/.bash_profile:

export JAVA_HOME=/opt/jdk1.7.0_75
export HADOOP_HOME=/opt/hadoop-2.6.0
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
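Reload the profile and verify on the master (Hadoop has only been unpacked on the master at this point):

[hadoop@master ~] source ~/.bash_profile
[hadoop@master ~] hadoop version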

Distribute the file to the slaves

[hadoop@master ~] sudo clush -g slaves --copy ~/.bash_profile

9) Install Hadoop on the other nodes

[hadoop@master ~] sudo clush -g slaves --copy /opt/hadoop-2.6.0

[hadoop@master ~] sudo clush -g slaves chown -R hadoop:hadoop /opt/hadoop-2.6.0


Step 4: Start Hadoop

1) Format the NameNode

[hadoop@master ~] cd /opt/hadoop-2.6.0

[hadoop@master hadoop-2.6.0]$ bin/hdfs namenode -format

2) Start HDFS

[hadoop@master hadoop-2.6.0]$ sbin/start-dfs.sh

3) Start YARN

[hadoop@master hadoop-2.6.0]$ sbin/start-yarn.sh

4) Disable the firewall (otherwise the web UIs cannot be viewed)

[hadoop@master hadoop-2.6.0]$ sudo clush -g hadoop service iptables stop

[hadoop@master hadoop-2.6.0]$ sudo clush -g hadoop chkconfig iptables off
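With the firewall off, the web UIs should now be reachable from a browser; dfs.http.address was set to master:50070 above, and 8088 is YARN's default ResourceManager web port:

http://master:50070    # HDFS NameNode UI
http://master:8088     # YARN ResourceManager UI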


5) Check the running processes

[hadoop@master hadoop-2.6.0]$ clush -g hadoop /opt/jdk1.7.0_75/bin/jps | sort
master: 14362 NameNode
master: 14539 SecondaryNameNode
master: 15285 ResourceManager
master: 15585 Jps
slave1: 5469 DataNode
slave1: 5801 NodeManager
slave1: 5932 Jps
slave2: 5005 DataNode
slave2: 5296 NodeManager
slave2: 5427 Jps
slave3: 4889 DataNode
slave3: 5196 NodeManager
slave3: 5327 Jps


6) Check HDFS

[hadoop@master hadoop-2.6.0]$ hdfs dfsadmin -report
15/03/18 17:21:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 61443870720 (57.22 GB)
Present Capacity: 52050423808 (48.48 GB)
DFS Remaining: 52050350080 (48.48 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.1.123:50010 (slave3)
Hostname: slave3
Rack: /rack2
Decommission Status : Normal
Configured Capacity: 20481290240 (19.07 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3131166720 (2.92 GB)
DFS Remaining: 17350098944 (16.16 GB)
DFS Used%: 0.00%
DFS Remaining%: 84.71%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Mar 18 17:21:15 CST 2015


Name: 192.168.1.122:50010 (slave2)
Hostname: slave2
Rack: /rack2
Decommission Status : Normal
Configured Capacity: 20481290240 (19.07 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3131105280 (2.92 GB)
DFS Remaining: 17350160384 (16.16 GB)
DFS Used%: 0.00%
DFS Remaining%: 84.71%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Mar 18 17:21:17 CST 2015


Name: 192.168.1.121:50010 (slave1)
Hostname: slave1
Rack: /rack1
Decommission Status : Normal
Configured Capacity: 20481290240 (19.07 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3131174912 (2.92 GB)
DFS Remaining: 17350090752 (16.16 GB)
DFS Used%: 0.00%
DFS Remaining%: 84.71%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Mar 18 17:21:17 CST 2015


-----------------------------------------------------------------------------------------------------------------------------------------------


III. Troubleshooting Notes

1. During formatting: KnownHostException: master.example.com

Running hostname shows master.example.com. master is only the short name, and the fully qualified name also needs to be mapped, so update /etc/hosts:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.120	master	master.example.com
192.168.1.121	slave1	slave1.example.com
192.168.1.122	slave2	slave2.example.com
192.168.1.123	slave3	slave3.example.com

[hadoop@master hadoop-2.6.0] sudo clush -g slaves --copy /etc/hosts

2. During formatting: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory is in an inconsistent state: storage directory does not exist or is not accessible.

Fix: create the configured storage directories on every node and hand them over to the hadoop user:

[hadoop@master hadoop-2.6.0] sudo clush -g hadoop mkdir -p /data/dfs/namesecondary

[hadoop@master hadoop-2.6.0] sudo clush -g hadoop mkdir -p /data/dfs/name

[hadoop@master hadoop-2.6.0] sudo clush -g hadoop mkdir -p /data/dfs/data

[hadoop@master hadoop-2.6.0] sudo clush -g hadoop mkdir -p /data/tmp

[hadoop@master hadoop-2.6.0] sudo clush -g hadoop chown -R hadoop:hadoop /data


3. [root@master hadoop-2.6.0]# sbin/start-dfs.sh
15/03/17 15:58:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [Java HotSpot(TM) Client VM warning: You have loaded library /opt/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
master]
-c: Unknown cipher type 'cd'
or: ssh: Could not resolve hostname or: Name or service not known
fix: ssh: Could not resolve hostname fix: Name or service not known
have: ssh: Could not resolve hostname have: Name or service not known
with: ssh: Could not resolve hostname with: Name or service not known
with: ssh: Could not resolve hostname with: Name or service not known
VM: ssh: Could not resolve hostname VM: Name or service not known
to: ssh: Could not resolve hostname to: Name or service not known
sed:-e 表达式 #1,字符 6:“s”的未知选项
that: ssh: Could not resolve hostname that: Name or service not known
will: ssh: Could not resolve hostname will: Name or service not known
stack: ssh: Could not resolve hostname stack: Name or service not known
recommended: ssh: Could not resolve hostname recommended: Name or service not known
Java: ssh: Could not resolve hostname Java: Name or service not known
library: ssh: Could not resolve hostname library: Name or service not known
disabled: ssh: Could not resolve hostname disabled: Name or service not known
link: ssh: Could not resolve hostname link: Name or service not known
you: ssh: Could not resolve hostname you: Name or service not known
You: ssh: Could not resolve hostname You: Name or service not known
Client: ssh: Could not resolve hostname Client: Name or service not known
stack: ssh: Could not resolve hostname stack: Name or service not known
guard.: ssh: Could not resolve hostname guard.: Name or service not known
the: ssh: Could not resolve hostname the: Name or service not known


Fix: add the following to ~/.bash_profile:
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
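Whether the native libraries are actually being picked up can be checked afterwards with hadoop checknative (available in Hadoop 2.x); the warning itself is harmless if they are not:

[hadoop@master hadoop-2.6.0]$ bin/hadoop checknative -a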

4. The SecondaryNameNode prompts for SSH authentication at startup. Fix:

1) Create the file /opt/hadoop-2.6.0/etc/hadoop/masters:

master

2) Edit hdfs-site.xml:

<property>
    <name>dfs.http.address</name>
    <value>master:50070</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
</property>




