We download the latest release, Hadoop 2.2, directly from the official Apache website. The official site currently provides only 32-bit Linux binaries (download: http://apache.claz.org/hadoop/common/hadoop-2.2.0/). Because the operating system I chose is CentOS-6.5-x86_64-bin-DVD1.iso, the Hadoop source would have to be recompiled into a 64-bit build. I am not yet familiar with the build process, so I used a pre-built 64-bit hadoop-2.4.1 that I found online.
1. Here I set up a cluster consisting of three machines:
IP              Username  Password  Role
172.16.254.222  root      123456    master
172.16.254.228  root      123456    slave
172.16.254.229  root      123456    slave
2. Configure the hostname mappings (run on each of the three hosts):
On hadoop1 (172.16.254.222):
vi /etc/hosts
172.16.254.222 hadoop1
172.16.254.228 hadoop2
172.16.254.229 hadoop3
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop1
Restart the network service: service network restart
On hadoop2 (172.16.254.228):
vi /etc/hosts
172.16.254.222 hadoop1
172.16.254.228 hadoop2
172.16.254.229 hadoop3
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop2
Restart the network service: service network restart
On hadoop3 (172.16.254.229):
vi /etc/hosts
172.16.254.222 hadoop1
172.16.254.228 hadoop2
172.16.254.229 hadoop3
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop3
Restart the network service: service network restart
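As a quick optional check that the mappings work, you can ping each host by name from any of the three machines:
ping -c 1 hadoop1
ping -c 1 hadoop2
ping -c 1 hadoop3
Each command should resolve to the corresponding 172.16.254.x address configured above.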
3. Disable the firewall (run on each of the three hosts):
service iptables stop
chkconfig iptables off
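To confirm the firewall is really off (optional), check the service status and the boot-time setting:
service iptables status
chkconfig --list iptables
The first command should report that iptables is not running, and the second should show "off" for all runlevels.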
4. Configure passwordless SSH login (on all three hosts). First go to the user's home directory; among the entries shown by the "ls -a" command you will find a .ssh directory, which is where the keys are stored. If it does not exist yet, it will be created automatically when the keys are generated.
(1) On hadoop1, run: ssh-keygen -t rsa
Press Enter at every prompt to accept the defaults.
cd .ssh/
ls -l
cp id_rsa.pub authorized_keys
ls -l
ssh hadoop1
exit
(2) On hadoop2, run:
ssh-keygen -t rsa
Press Enter at every prompt to accept the defaults.
cd .ssh/
ls -l
cp id_rsa.pub authorized_keys
ls -l
ssh hadoop2
exit
(3) On hadoop3, run:
ssh-keygen -t rsa
Press Enter at every prompt to accept the defaults.
cd .ssh/
ls -l
cp id_rsa.pub authorized_keys
ls -l
ssh hadoop3
exit
(4) Set up passwordless login between the hosts.
On hadoop1, run: ssh-copy-id -i hadoop2
On hadoop3, run: ssh-copy-id -i hadoop2
On hadoop2, run:
scp /root/.ssh/authorized_keys hadoop1:/root/.ssh/
scp /root/.ssh/authorized_keys hadoop3:/root/.ssh/
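After distributing the keys, each host's authorized_keys contains the public keys of all three machines, so it is worth verifying (optional) that logins no longer prompt for a password. For example, from hadoop1:
ssh hadoop2 hostname
ssh hadoop3 hostname
Each command should print the remote hostname without asking for a password; repeat the check from hadoop2 and hadoop3 if you like.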
5. Install the JDK (on all three hosts)
(1) Remove the JDK that ships with the operating system.
Check whether the bundled JDK is installed: java -version
List the installed JDK packages: rpm -qa | grep java
Typically this returns something like:
java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
Remove the bundled JDK:
yum -y remove java java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
yum -y remove java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
(2) Install the JDK (the installation archive is located in /usr/local):
tar -zxvf jdk-8u40-linux-x64.tar.gz
(3) Rename the extracted directory to jdk.
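For reference, the rename can be done like this (assuming the archive extracts to a directory named jdk1.8.0_40; adjust the name if yours differs):
mv /usr/local/jdk1.8.0_40 /usr/local/jdk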
(4) Configure the environment variables:
vi /etc/profile
export JAVA_HOME=/usr/local/jdk
export PATH=.:$PATH:$JAVA_HOME/bin
source /etc/profile
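A quick optional check that the new JDK is picked up:
java -version
echo $JAVA_HOME
java -version should report the 1.8.0 JDK and echo $JAVA_HOME should print /usr/local/jdk.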
6. Upload the pre-built Hadoop files to the /usr/local/ directory on hadoop1 and install Hadoop.
(1) Rename the uploaded files to hadoop.
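For reference, if what you uploaded is the tarball rather than an already extracted directory, the extract-and-rename step looks roughly like this (the archive name hadoop-2.4.1.tar.gz is an assumption based on the build used here):
cd /usr/local
tar -zxvf hadoop-2.4.1.tar.gz
mv hadoop-2.4.1 hadoop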
(2) Go to /usr/local/hadoop/etc/hadoop/ and edit the following configuration files:
vi hadoop-env.sh
export JAVA_HOME=/usr/local/jdk
vi yarn-env.sh
export JAVA_HOME=/usr/local/jdk
vi slaves
hadoop2
hadoop3
vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>*</value>
</property>
</configuration>
vi hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop1:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop1:19888</value>
</property>
</configuration>
vi yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop1:8088</value>
</property>
</configuration>
(3) Copy the configured Hadoop directory to hadoop2 and hadoop3:
scp -r /usr/local/hadoop hadoop2:/usr/local
scp -r /usr/local/hadoop hadoop3:/usr/local
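Optionally, you can pre-create on all three hosts the local directories referenced in core-site.xml and hdfs-site.xml (Hadoop will usually create them itself if permissions allow, but creating them explicitly avoids surprises):
mkdir -p /home/hadoop/tmp /home/hduser/dfs/name /home/hduser/dfs/data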
(4) Format the Hadoop cluster (run on hadoop1):
cd /usr/local/hadoop/bin
hadoop namenode -format
(5) Start the Hadoop cluster.
On hadoop1:
cd /usr/local/hadoop/sbin
start-all.sh
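Note that in Hadoop 2.x start-all.sh still works but is marked deprecated; the equivalent is to start HDFS and YARN separately, and (optionally) the JobHistory server configured in mapred-site.xml:
./start-dfs.sh
./start-yarn.sh
./mr-jobhistory-daemon.sh start historyserver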
7. Check whether the Hadoop cluster started correctly.
On hadoop1, the jps command should show the following processes:
NameNode, SecondaryNameNode, ResourceManager
On hadoop2 and hadoop3, jps should show:
DataNode, NodeManager
View HDFS: http://172.16.254.222:50070
View the ResourceManager: http://172.16.254.222:8088
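As a final optional smoke test, you can run the bundled example job from hadoop1 (assuming the examples jar ships at this path in the 2.4.1 build):
cd /usr/local/hadoop
bin/hdfs dfs -ls /
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 2 10
The pi job should finish and print an estimate of Pi, confirming that HDFS and YARN are both working.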
At this point, installation of the entire Hadoop cluster is complete.