Hadoop 2.5.2 Fully Distributed Cluster Setup

This post walks through setting up a fully distributed Hadoop environment. (Prerequisites: Ubuntu is already installed, and the virtual machines can ping each other and access the internet.)

What you need beforehand:

jdk-7u51-linux-x64.tar.gz

hadoop-2.5.2


Step 1:

1. Set a root password: sudo passwd root

2. Add a new user: sudo adduser hadoop

3. Switch to root:

su root

Then run the following.

Give sudoers write permission: chmod u+w /etc/sudoers

Edit the sudoers file: nano /etc/sudoers   (vi works too; I was too lazy to install anything extra, so I just used nano)
Below the line root ALL=(ALL) ALL, add: hadoop ALL=(ALL) NOPASSWD:ALL
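After that edit, the relevant part of /etc/sudoers should look roughly like this:

root    ALL=(ALL) ALL
hadoop  ALL=(ALL) NOPASSWD:ALL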

Remove write permission from the sudoers file again: chmod u-w /etc/sudoers

4. Edit the hostname: sudo nano /etc/hostname and change it to this node's hostname.

  sudo nano /etc/hosts and comment out the 127.0.1.1 line,

  then add the IP/hostname pairs of every node in the cluster.

My cluster is as follows:

ip                 hostname

192.168.218.130    master

192.168.218.131    slaver1

192.168.218.132    slaver2

(I suggest configuring an odd number of nodes; it makes setting up ZooKeeper later more convenient.)
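With the cluster above, the resulting /etc/hosts on each node would look roughly like this (127.0.1.1 commented out, cluster entries added):

127.0.0.1       localhost
# 127.0.1.1     <original hostname>
192.168.218.130 master
192.168.218.131 slaver1
192.168.218.132 slaver2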

Restart networking: sudo /etc/init.d/networking restart

Perform the same steps on every node!


Log out, then log back in as the hadoop user.

Step 2

 

Install the JDK and Hadoop

Install the JDK:

sudo tar -xzvf /home/hadoop/jdk-7u51-linux-x64.tar.gz -C /usr/lib/jvm/
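If /usr/lib/jvm does not exist yet, create it before extracting (assuming a fresh system where the directory is missing):

sudo mkdir -p /usr/lib/jvm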



Extract Hadoop:

tar -xzvf /home/hadoop/hadoop-2.5.2.tar.gz


Install SSH (the most painful part):

sudo apt-get install openssh-server

If the installation fails:

sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak    # make a backup first

sudo gedit /etc/apt/sources.list

 

Replace the contents with the following:

 

deb http://ubuntu.uestc.edu.cn/ubuntu/ precise main restricted universe multiverse
deb http://ubuntu.uestc.edu.cn/ubuntu/ precise-backports main restricted universe multiverse
deb http://ubuntu.uestc.edu.cn/ubuntu/ precise-proposed main restricted universe multiverse
deb http://ubuntu.uestc.edu.cn/ubuntu/ precise-security main restricted universe multiverse
deb http://ubuntu.uestc.edu.cn/ubuntu/ precise-updates main restricted universe multiverse
deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise main restricted universe multiverse
deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise-backports main restricted universe multiverse
deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise-proposed main restricted universe multiverse
deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise-security main restricted universe multiverse
deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise-updates main restricted universe multiverse

 

After replacing the sources, run: sudo apt-get update


(A problem I ran into: the following error appeared on my Ubuntu machine.

Reading package lists... Error!
E: Encountered a section with no Package: header
E: Problem with MergeList /var/lib/apt/lists/ftp.sjtu.edu.cn_ubuntu_dists_precise-security_restricted_binary-i386_Packages
E: The package lists or status file could not be parsed or opened.

I don't know exactly what caused it, but the fix suggested by a Google search solved it, so I'm recording it here:
sudo rm /var/lib/apt/lists/* -vf
sudo apt-get update

Then continue: sudo apt-get install ssh

 

Next:

ssh-keygen -t rsa

(this generates the key pair under /home/hadoop/.ssh)

 

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

 

 

Side note:

1) Fix the permissions on "authorized_keys" (do this as the hadoop user):
chmod 600 ~/.ssh/authorized_keys

2) Adjust the SSH configuration.
Log in as root and edit the following entries in the SSH config file "/etc/ssh/sshd_config":

RSAAuthentication yes                       # enable RSA authentication
PubkeyAuthentication yes                    # enable public/private key authentication
AuthorizedKeysFile .ssh/authorized_keys     # path to the public key file (the file generated above)


After changing these, remember to restart the SSH service for the settings to take effect:
service ssh restart    (on some distributions the service is named sshd: service sshd restart)

 

On master, go into the .ssh directory (as hadoop@master):

scp authorized_keys hadoop@slaver1:~/.ssh/authorized_keys_from_master

scp authorized_keys hadoop@slaver2:~/.ssh/authorized_keys_from_master


On slaver1, go into the .ssh directory (as hadoop@slaver1):

scp authorized_keys hadoop@master:~/.ssh/authorized_keys_from_slaver1

scp authorized_keys hadoop@slaver2:~/.ssh/authorized_keys_from_slaver1


On slaver2, go into the .ssh directory (as hadoop@slaver2):

scp authorized_keys hadoop@master:~/.ssh/authorized_keys_from_slaver2

scp authorized_keys hadoop@slaver1:~/.ssh/authorized_keys_from_slaver2


Then, back on master, in the directory /home/hadoop/.ssh run:

cat authorized_keys_from_slaver1  >>  authorized_keys

cat authorized_keys_from_slaver2  >>  authorized_keys

On slaver1, in the directory /home/hadoop/.ssh run:

cat authorized_keys_from_master  >>  authorized_keys

cat authorized_keys_from_slaver2  >>  authorized_keys


On slaver2, in the directory /home/hadoop/.ssh run:

cat authorized_keys_from_master  >>  authorized_keys

cat authorized_keys_from_slaver1 >>  authorized_keys

To put it plainly, this whole step (step 3) just copies each virtual machine's public key into ~/.ssh/authorized_keys on every node.

Only when every node has this in place will ssh between the virtual machines work without a password.
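As a quick check (using the hostnames configured above), passwordless login should now work between any two nodes; for example, from master:

ssh hadoop@slaver1 hostname
ssh hadoop@slaver2 hostname

Each command should print the remote hostname without asking for a password.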

 

Step 4:

Set the Java environment variables (adjust the paths to match your own install directory):

hadoop@master:~$ sudo nano /etc/profile

Then add:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51
export JRE_HOME=/usr/lib/jvm/jdk1.7.0_51/jre
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin


hadoop@master:~$ sudo nano /etc/environment

Then add:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51
export JRE_HOME=/usr/lib/jvm/jdk1.7.0_51/jre
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
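To make the new variables take effect in the current shell without logging out, you can source /etc/profile:

source /etc/profile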

 


Copy the JDK to the slaves (note the -r flag, since the JDK is a directory; the hadoop user needs write permission on /usr/lib/jvm on the target, or copy it to the home directory first and move it with sudo):

scp -r /usr/lib/jvm/jdk1.7.0_51 hadoop@slaver1:/usr/lib/jvm/

scp -r /usr/lib/jvm/jdk1.7.0_51 hadoop@slaver2:/usr/lib/jvm/


Then set the corresponding environment variables on the slaves as well.

 

Check with java -version or javac.

Disable the firewall with sudo ufw disable; you will use this command a lot when starting the cluster later.

 

Part 5:

 

Configure Hadoop:

hadoop@master:~/hadoop-2.5.2$ sudo mkdir hdfs
hadoop@master:~/hadoop-2.5.2$ sudo mkdir hdfs/name
hadoop@master:~/hadoop-2.5.2$ sudo mkdir hdfs/data
hadoop@master:~/hadoop-2.5.2$ sudo mkdir tmp
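Equivalently, all of these directories can be created with a single command:

hadoop@master:~/hadoop-2.5.2$ sudo mkdir -p hdfs/name hdfs/data tmp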

 

Change the ownership so that everything under these directories is operated on as the hadoop user:

hadoop@master:~/hadoop-2.5.2$ sudo chown -R hadoop:hadoop hdfs
hadoop@master:~/hadoop-2.5.2$ sudo chown -R hadoop:hadoop tmp

 

 

hadoop@master:~/hadoop-2.5.2/etc/hadoop$ nano hadoop-env.sh

Set JAVA_HOME in this file.

hadoop@master:~/hadoop-2.5.2/etc/hadoop$ nano yarn-env.sh

Set JAVA_HOME in this file as well.
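With the JDK path used above, the JAVA_HOME line in both hadoop-env.sh and yarn-env.sh would look like this:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51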

 

hadoop@master:~/hadoop-2.5.2/etc/hadoop$ nano slaves

(this file lists all of the slave nodes)
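With the hostnames used in this cluster, the slaves file would simply contain:

slaver1
slaver2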

 

hadoop@master:~/hadoop-2.5.2$ nano etc/hadoop/core-site.xml

The contents are as follows:

       

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/hadoop-2.5.2/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
</configuration>

 

Edit mapred-site.xml (you first need to copy mapred-site.xml.template and rename it to mapred-site.xml):
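For example, from the hadoop-2.5.2 directory:

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml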

hadoop@master:~/hadoop-2.5.2$ nano etc/hadoop/mapred-site.xml

 

          

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>

 

 

hadoop@master:~/hadoop-2.5.2/etc/hadoop$ nano yarn-site.xml

 

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>

 

 

hadoop@master:~/hadoop-2.5.2/etc/hadoop$ nano hdfs-site.xml

 

The contents are as follows:

 

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/hadoop-2.5.2/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/hadoop-2.5.2/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

 

 

 

Copy the configured Hadoop directory to the slave nodes:

scp -r /home/hadoop/hadoop-2.5.2 hadoop@slaver1:/home/hadoop
scp -r /home/hadoop/hadoop-2.5.2 hadoop@slaver2:/home/hadoop


 

 

hadoop@master:~/hadoop-2.5.2$ bin/hdfs namenode -format    (or: bin/hadoop namenode -format)

hadoop@master:~/hadoop-2.5.2$ sbin/start-all.sh

You can browse the files under /home/hadoop/hadoop-2.5.2/bin and /home/hadoop/hadoop-2.5.2/sbin to see the various commands available.
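As a quick sanity check after start-all.sh, you can run jps (it ships with the JDK) on each node; on master you would typically expect to see NameNode, SecondaryNameNode, and ResourceManager, and on the slaves DataNode and NodeManager:

hadoop@master:~/hadoop-2.5.2$ jps
hadoop@slaver1:~$ jps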

PS: I did this install a while ago and suddenly felt like writing it up, so I dashed this off in a couple of hours. Please point out anything wrong or missing! This is essentially the configuration I used, and the cluster starts up normally.

 
