Hadoop Cluster Configuration


Setting up a Hadoop cluster with virtual machines
Procedure:
I. Set up one virtual machine first, then clone it to get the other two.
1. Install CentOS 7.
2. Switch the yum repositories to the Tsinghua mirror (a sample sketch is shown below).
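A minimal sketch of pointing CentOS 7 at the Tsinghua (TUNA) mirror; the exact repo file layout can vary between point releases, so treat these commands as an illustration rather than the exact ones used here:
# cp /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bak
# sed -i 's|^mirrorlist=|#mirrorlist=|g; s|^#baseurl=http://mirror.centos.org|baseurl=https://mirrors.tuna.tsinghua.edu.cn|g' /etc/yum.repos.d/CentOS-Base.repo
# yum clean all && yum makecache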
3. Update the system:
# yum update -y
Install the jps tool:
# yum install -y java-1.8.0-openjdk-devel
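To double-check that the JDK and jps are available:
# java -version
# jps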
4. All operations below are done as the root user.
# vim ~/.bashrc
# Append the following at the end of the file
#HADOOP START
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP END
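Reload the environment so the new variables take effect in the current shell, and optionally verify:
# source ~/.bashrc
# echo $HADOOP_INSTALL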
5. Use systemctl to add the sshd service to the list of services started at boot, so sshd starts automatically:
# systemctl enable sshd


6. Set the default boot target to runlevel 3 (multi-user):
# systemctl set-default multi-user.target
7. Shut down and clone the virtual machine twice, so there are three virtual machines in total.


II. Configure Hadoop on one machine first, then copy it to the other two with scp.
1. Note down the IP addresses of the other two machines.
2. Edit the hosts file:
# vim /etc/hosts
127.0.0.1 localhost
192.168.182.134 master
192.168.182.135 slaves1
192.168.182.136 slaves2
PS: master refers to this machine.


3. Check that the machines can reach each other:
# ping -w 3 slaves1
# ping -w 3 slaves2


4. Set up passwordless SSH login
(1)Generate a key pair (run on every virtual machine):
# ssh-keygen -t rsa -P ""
ssh-keygen is the key-generation tool; -t sets the key type to rsa, and -P sets the passphrase (empty here).
(2)Add the trusted keys (run on master; the two ssh commands below also append each slave's public key):
# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys


# ssh slaves1 cat /root/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# ssh slaves2 cat /root/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
(3)Copy authorized_keys to the other two machines:
# scp ~/.ssh/authorized_keys slaves1:~/.ssh/
# scp ~/.ssh/authorized_keys slaves2:~/.ssh/
(4)Verify passwordless SSH login:
# ssh slaves1
# ssh slaves2


5. Download hadoop-2.6.0.tar.gz and unpack it:
# wget labfile.oss.aliyuncs.com/hadoop-2.6.0.tar.gz


# tar xf hadoop-2.6.0.tar.gz
# mv hadoop-2.6.0/ /usr/local/hadoop
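A quick sanity check that the unpacked tree works (the banner should report version 2.6.0):
# /usr/local/hadoop/bin/hadoop version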


6. Edit the Hadoop configuration files
(1)The Hadoop configuration directory is /usr/local/hadoop/etc/hadoop/
# cd /usr/local/hadoop/etc/hadoop/


(2)hadoop-env.sh
# vim hadoop-env.sh +25
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jre

(3) core-site.xml
# vim core-site.xml
<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://master:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/hadoop/datanode</value>
        </property>
</configuration>




(4)hdfs-site.xml
# vim hdfs-site.xml
<configuration>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>master:50090</value>
        </property>
        <property>
                <name>dfs.http.address</name>
                <value>master:50070</value>
                <description>
                The address and the base port where the dfs namenode web ui will listen on.
                If the port is 0 then the server will start on a free port.
                </description>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/hadoop/namenode</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/hadoop/datanode</value>
        </property>
</configuration>




(5)mapred-site.xml
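The stock 2.6.0 tarball usually ships only a template for this file, so it is typically created from the template first:
# cp mapred-site.xml.template mapred-site.xml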
# vim mapred-site.xml
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>




(6)yarn-site.xml
# vim yarn-site.xml
<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>master</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>



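One more file worth checking (not covered above): start-dfs.sh and start-yarn.sh read etc/hadoop/slaves to decide where to launch DataNodes and NodeManagers. Assuming the two clones are meant to run the worker daemons, a minimal version might simply list them (add master as well if it should also run a DataNode):
# vim /usr/local/hadoop/etc/hadoop/slaves
slaves1
slaves2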

7. Create the /hadoop directory:
# mkdir /hadoop
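The HDFS directories configured above (/hadoop/namenode, /hadoop/datanode) live under /hadoop, so presumably the same parent directory is needed on the two slaves as well; passwordless ssh makes this easy:
# ssh slaves1 mkdir -p /hadoop
# ssh slaves2 mkdir -p /hadoop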


8. Copy Hadoop to the other two machines with scp:
# scp -r /usr/local/hadoop slaves1:/usr/local/
# scp -r /usr/local/hadoop slaves2:/usr/local/


III. Format and start
1. Format the NameNode:
# hdfs namenode -format


1.5. Flush the firewall rules (see the note on firewalld below):
# iptables -F
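Note that iptables -F does not persist across reboots, and CentOS 7 uses firewalld by default. If the nodes cannot reach each other's ports, a common (if blunt) alternative is to disable firewalld on all three machines:
# systemctl stop firewalld
# systemctl disable firewalld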


2. Start HDFS:
# start-dfs.sh
Start YARN:
# start-yarn.sh


3. Use jps to check whether the daemons are running
It is worth checking carefully here; some daemons often fail to start, and simply starting them again a few times usually fixes it:
# hadoop-daemon.sh start namenode
# hadoop-daemon.sh start datanode


# yarn-daemon.sh start resourcemanager
# yarn-daemon.sh start nodemanager
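To check, run on each machine:
# jps
Roughly speaking, the master should show NameNode, SecondaryNameNode and ResourceManager (plus DataNode and NodeManager if it is also configured as a worker), while each slave should show DataNode and NodeManager; the exact list depends on the configuration above.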


At this point, the distributed Hadoop cluster has been deployed successfully.


IV. Test
1. Create directories:
# hdfs dfs -mkdir /user
# hdfs dfs -mkdir /user/root
# hdfs dfs -mkdir /user/root/input


2. Copy a file onto HDFS:
# hdfs dfs -copyFromLocal /etc/protocols /user/root/input
# hdfs dfs -ls -R /user/root/input


# Note: the put command did not seem to work here; it kept failing no matter how many times I tried, and I thought something had gone wrong.


3. Run the WordCount test:
# cd /usr/local/hadoop


# bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.6.0-sources.jar org.apache.hadoop.examples.WordCount input output
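If the sources-jar invocation above fails with a class-loading error, the precompiled examples jar that ships in the same release can be used instead; this is an alternative invocation, not necessarily what was run here:
# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount input output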


4. View the results:
# hdfs dfs -cat /user/root/output/*
