Ubuntu + Hadoop 2.2.0 + Java 1.7 Setup Guide (Standalone + Pseudo-Distributed + Cluster)

I. Required Software

1.Ubuntu 14.04

     Three hosts:

     192.168.71.136  cloud01

     192.168.71.135  cloud02

     192.168.71.137  cloud03

2.jdk-7u51-linux-i586.tar.gz

3.hadoop-2.2.0.tar.gz

Baidu Cloud download link: pan.baidu.com/s/1pKADKNL

II. Procedure

Standalone Setup

1. Change the hostnames: set the three machines to cloud01, cloud02, and cloud03 respectively.

sudo gedit /etc/hostname    (reboot afterwards)
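After the reboot, the change can be checked with:

hostname    # should print cloud01 (or cloud02 / cloud03 on the other machines)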

2. Add the address entries to /etc/hosts:

sudo gedit /etc/hosts

192.168.71.136 cloud01

192.168.71.135 cloud02

192.168.71.137 cloud03
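To confirm the entries resolve, a quick check can be run from any of the machines:

ping -c 1 cloud02    # should resolve to 192.168.71.135 and get a reply
ping -c 1 cloud03    # should resolve to 192.168.71.137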

3. Install Java (on each of the three machines).

Create a directory and copy the Java archive into it:

sudo mkdir /usr/java

Unpack it:

sudo tar -zxvf <archive name>
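Put together, a sketch of the whole sequence, assuming the archive from the preparation list sits in the current directory:

sudo mkdir /usr/java
sudo cp jdk-7u51-linux-i586.tar.gz /usr/java
cd /usr/java
sudo tar -zxvf jdk-7u51-linux-i586.tar.gz    # unpacks to /usr/java/jdk1.7.0_51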

Edit the system profile:

sudo gedit /etc/profile

Append the following:

export JAVA_HOME=/usr/java/jdk1.7.0_51

export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar

export PATH=$JAVA_HOME/bin:$PATH

Apply it:

source /etc/profile

Verify the installation:

java -version
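If the profile edit took effect, the new JDK should answer; the build details may differ, but the version should match the install directory:

java -version    # expect: java version "1.7.0_51"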

4. Install Hadoop.

Copy the archive to the home directory and unpack it:

sudo tar -zxvf <archive name>

After unpacking, grant permissions with chmod -R 777 <extracted directory name>. (777 is broader than needed; chmod -R 755, or chown -R to your own user, is safer and sufficient.)

This completes the standalone install. Verify it by running the following from the hadoop-2.2.0 directory:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 20

Pseudo-Distributed Setup (continuing from above)

5. Install SSH.

Run: sudo apt-get install ssh

Create a .ssh directory in the home directory: mkdir .ssh (without sudo, so the directory stays owned by your user)

Enter it: cd .ssh

ssh-keygen -t rsa    (press Enter at every prompt)

cat id_rsa.pub >> authorized_keys

sudo service ssh restart

Test it: ssh localhost (you should not be asked for a password)
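If a password prompt still appears, SSH may be rejecting the key because the .ssh permissions are too loose; tightening them usually fixes it:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys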

6. Configure the Hadoop environment.

First create a few directories in the home directory (a one-line version follows the list):

~/hddata/dfs/name

~/hddata/dfs/data

~/hddata/tmp
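All three can be created with one command:

mkdir -p ~/hddata/dfs/name ~/hddata/dfs/data ~/hddata/tmp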

Then, inside the hadoop-2.2.0 directory, edit the following configuration files:

gedit etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_51

gedit etc/hadoop/core-site.xml    (add the properties between <configuration> and </configuration>)

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/hddata/tmp</value>
</property>

gedit etc/hadoop/hdfs-site.xml    (inside <configuration>)

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/hduser/hddata/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hduser/hddata/dfs/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

gedit etc/hadoop/mapred-site.xml    (inside <configuration>)

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>
<property>
  <name>mapred.map.tasks</name>
  <value>10</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
</property>

Format the NameNode:

./bin/hdfs namenode -format
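On success, the format log should include a line of roughly this form:

INFO common.Storage: Storage directory /home/hduser/hddata/dfs/name has been successfully formatted.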

Start all the daemons:

./sbin/start-all.sh
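In Hadoop 2.x start-all.sh still works but is deprecated; the equivalent is to start HDFS and YARN separately:

./sbin/start-dfs.sh
./sbin/start-yarn.sh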

Check which daemons are running:

jps

3776 ResourceManager

3354 NameNode

3645 SecondaryNameNode

3467 DataNode

3895 NodeManager

4382 Jps

To test, open http://localhost:50070 in a browser.

At this point the pseudo-distributed setup is complete.

Cluster Setup

1. Unpack the cluster configuration files and power on all three machines in the virtual machine software.

2. Give each machine a static IP address, taking note of the gateway and DNS settings.

3. Edit the hosts file on every machine:

sudo gedit /etc/hosts

192.168.71.136 cloud01

192.168.71.135 cloud02

192.168.71.137 cloud03

Note: delete the 127.0.1.1 line (the second line at the top of the original file); this must be done on every machine.

4. Set up a public/private key pair on every machine:

sudo apt-get install ssh

mkdir .ssh

cd .ssh

ssh-keygen -t rsa

cat id_rsa.pub >> authorized_keys

sudo service ssh restart

ssh localhost

If a .ssh directory already exists, delete it first (rm -rf .ssh) before running the mkdir above.

5. Send the master's public key to the other machines and append it to each machine's authorization file:

cd .ssh

scp authorized_keys hduser@cloud02:~/.ssh/authorized_keys_from_cloud01
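The same copy must also be made for the third machine:

scp authorized_keys hduser@cloud03:~/.ssh/authorized_keys_from_cloud01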

Then log in to cloud02 and cloud03 in turn and run:

cd .ssh

cat authorized_keys_from_cloud01 >> authorized_keys
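Back on cloud01, passwordless login to both workers can now be verified:

ssh cloud02    # should log in without a password prompt
exit
ssh cloud03
exit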

6. Install the JDK on every machine (as in step 3 of the standalone setup).

7. On the master, unpack hadoop-2.2.0 (tar -zxvf hadoop-2.2.0.tar.gz).

8. Create the following three directories under the home directory of every machine (create them on the master, then copy them to the workers with the scp commands below):

~/hddata/dfs/name

~/hddata/dfs/data

~/hddata/tmp

scp -r ~/hddata hduser@cloud02:~/

scp -r ~/hddata hduser@cloud03:~/

9. On the master, edit the seven configuration files:

cd hadoop-2.2.0

(1) gedit etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_51

(2) gedit etc/hadoop/yarn-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_51

(3) gedit etc/hadoop/slaves (listing cloud01 here means the master also runs a DataNode and a NodeManager):

cloud01

cloud02

cloud03

(4) gedit etc/hadoop/core-site.xml    (inside <configuration>)

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://cloud01:9000</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/hddata/tmp</value>
</property>

(5) gedit etc/hadoop/hdfs-site.xml    (inside <configuration>)

<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>cloud01:9001</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/hduser/hddata/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hduser/hddata/dfs/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

(6) cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

gedit etc/hadoop/mapred-site.xml    (inside <configuration>)

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>cloud01:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>cloud01:19888</value>
</property>

(7) gedit etc/hadoop/yarn-site.xml    (inside <configuration>; note the ResourceManager ports below are nonstandard, the usual defaults being 8032, 8030, 8031, 8033, and 8088 for the web UI)

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>cloud01:8132</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>cloud01:8130</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>cloud01:8131</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>cloud01:8133</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>cloud01:8188</value>
</property>

10. Send the hadoop-2.2.0 directory on the master to the other two machines:

scp -r hadoop-2.2.0 hduser@cloud02:~/

scp -r hadoop-2.2.0 hduser@cloud03:~/

11. Format the NameNode (on the master only, and only once):

cd hadoop-2.2.0

./bin/hdfs namenode -format

12. Start Hadoop:

./sbin/start-all.sh

Check the block layout and datanode status (all three datanodes should appear in the report):

./bin/hdfs fsck / -files -blocks

./bin/hdfs dfsadmin -report

HDFS web UI: http://192.168.71.136:50070

YARN web UI: http://192.168.71.136:8188    (the yarn.resourcemanager.webapp.address port configured above)

./sbin/mr-jobhistory-daemon.sh start historyserver
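Afterwards jps should additionally list a JobHistoryServer process, and finished jobs become browsable at the webapp address set in mapred-site.xml:

jps | grep JobHistoryServer
# web UI: http://cloud01:19888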

13. Run the pi example:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 20
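If everything is wired up, the job is accepted by the ResourceManager on cloud01 (it can be watched at http://192.168.71.136:8188 while it runs) and should finish with output of roughly this form:

Job Finished in ... seconds
Estimated value of Pi is ...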
