Hadoop Cluster Setup

I. Environment Preparation and Basic Configuration

1. Prepare the environment: three CentOS 7 virtual machines at 192.168.2.150, 192.168.2.151, and 192.168.2.152.

2. Create the hadoop user

useradd -d /home/hadoop -m hadoop

3. Set a password for the hadoop user

passwd hadoop

4. Set the hostnames (master, slave1, and slave2 on the three machines respectively)

hostnamectl set-hostname master 

5. Configure the hosts file: add the following entries to /etc/hosts on every node

192.168.2.150 master
192.168.2.151 slave1
192.168.2.152 slave2
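
The same entries can be staged with a heredoc and then appended to /etc/hosts on each node. Writing to a local staging file first is an assumption of this sketch, so it is safe to review before applying as root:

```shell
# Stage the cluster hostname mappings in a local file, then append to
# /etc/hosts on every node (the append itself needs root).
HOSTS_FILE="${HOSTS_FILE:-hosts.cluster}"
cat > "$HOSTS_FILE" <<'EOF'
192.168.2.150 master
192.168.2.151 slave1
192.168.2.152 slave2
EOF
# On each node, as root:  cat hosts.cluster >> /etc/hosts
```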

6. Configure passwordless SSH login so that every pair of nodes can reach each other

ssh-keygen -t rsa 

ssh-copy-id hadoop@<hostname>
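
With three nodes, the key copy has to be repeated from every node to every other node, so a small loop helps. The NODES list below is an assumption matching the hosts file above, and DRY_RUN=1 (the default here) only prints the commands instead of running them:

```shell
# Print (or, with DRY_RUN=0, actually run) one ssh-copy-id per node.
# Run this on every node so ssh is passwordless in both directions.
NODES="master slave1 slave2"
DRY_RUN="${DRY_RUN:-1}"
for host in $NODES; do
    if [ "$DRY_RUN" = "1" ]; then
        echo "ssh-copy-id hadoop@$host"
    else
        ssh-copy-id "hadoop@$host"
    fi
done
```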

7. Install the JDK and configure its environment variables
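
The original gives no commands for this step. One possible route (an assumption, not the author's exact procedure) is the CentOS 7 OpenJDK package, or a manually extracted tarball placed at /usr/local/jdk to match the path used later in hadoop-env.sh:

```shell
# JDK install sketch -- both routes below are assumptions:
#
#   sudo yum install -y java-1.8.0-openjdk-devel      # package route
#
#   # tarball route (version placeholder; adjust to your download):
#   tar -zxvf jdk-8u*-linux-x64.tar.gz -C /usr/local
#   mv /usr/local/jdk1.8.0_* /usr/local/jdk
#
# Either way, make JAVA_HOME available to the hadoop user,
# e.g. in ~/.bashrc:
export JAVA_HOME=/usr/local/jdk
export PATH="$PATH:$JAVA_HOME/bin"
```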

8. Download the Hadoop tarball and extract it into the working directory

cd /home/hadoop/work
tar -zxvf hadoop-2.8.3.tar.gz
mv hadoop-2.8.3 hadoop

9. Create the hdfs directories under the working directory

cd /home/hadoop/work
mkdir hdfs
cd hdfs
mkdir data name tmp
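
The four commands above can be collapsed into one. Run it as the hadoop user, so that $HOME is /home/hadoop:

```shell
# -p creates missing parents (work/hdfs) and is idempotent,
# so this is safe to re-run.
mkdir -p "$HOME/work/hdfs/name" "$HOME/work/hdfs/data" "$HOME/work/hdfs/tmp"
```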

10. Add the Hadoop environment variables to the system (e.g. append the lines below to the hadoop user's ~/.bashrc, then run source ~/.bashrc)

export HADOOP_HOME=/home/hadoop/work/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
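
These exports only take effect in shells that read them, so appending them to the hadoop user's ~/.bashrc (an assumption; any login script works) and re-sourcing it is the usual move. A quick self-check:

```shell
# Re-create the two exports that matter for command resolution and
# verify that the Hadoop bin directory really is on PATH.
export HADOOP_HOME=/home/hadoop/work/hadoop
export PATH="$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin"
case ":$PATH:" in
    *":$HADOOP_HOME/bin:"*) echo "hadoop bin dir on PATH" ;;
    *)                      echo "PATH is missing hadoop bin" ;;
esac
```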

11. Add JAVA_HOME to Hadoop

Append the following line to the end of /home/hadoop/work/hadoop/etc/hadoop/hadoop-env.sh:

export JAVA_HOME=/usr/local/jdk

II. Configuring the Cluster

The cluster configuration involves five files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and slaves, one per component. They all live under /home/hadoop/work/hadoop/etc/hadoop/:

File              Component
core-site.xml     Hadoop Common
hdfs-site.xml     HDFS
mapred-site.xml   MapReduce
yarn-site.xml     YARN
slaves            list of worker (slave) nodes

1. core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/work/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
2. hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/work/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/work/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
3. mapred-site.xml

The MapReduce configuration file ships as a template, mapred-site.xml.template; copy it first:

cp mapred-site.xml.template mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
4. yarn-site.xml

<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:18141</value>
  </property>
  <!-- required so NodeManagers run the MapReduce shuffle service -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
5. Edit the slaves file and add the worker node entries

Remove the default localhost line and replace it with the entries below. The slaves file ties all the nodes into one cluster: when the cluster is started, every node listed here starts together.

slave1
slave2
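
Writing the file can also be scripted. SLAVES_FILE defaults to a local file here so the snippet is safe to dry-run; the real path on master is shown in the comment, and both worker names come from the hosts file configured earlier:

```shell
# Overwrite the slaves file with the worker hostnames, one per line.
# Real path: /home/hadoop/work/hadoop/etc/hadoop/slaves
SLAVES_FILE="${SLAVES_FILE:-slaves}"
printf '%s\n' slave1 slave2 > "$SLAVES_FILE"
```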

6. Copy the configured hadoop and hdfs directories to the other nodes

cd /home/hadoop/work
scp -r hadoop slave1:/home/hadoop/work/
scp -r hadoop slave2:/home/hadoop/work/
scp -r hdfs slave1:/home/hadoop/work/
scp -r hdfs slave2:/home/hadoop/work/

7. Format the NameNode (on master only)

hadoop namenode -format

8. Start the Hadoop cluster (from master)

start-all.sh

The log output looks like this:

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
18/09/09 12:26:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/work/hadoop/logs/hadoop-hadoop-namenode-master.out
slave2: starting datanode, logging to /home/hadoop/work/hadoop/logs/hadoop-hadoop-datanode-slave2.out
slave1: starting datanode, logging to /home/hadoop/work/hadoop/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/hadoop/work/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
18/09/09 12:27:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/work/hadoop/logs/yarn-hadoop-resourcemanager-master.out
slave2: starting nodemanager, logging to /home/hadoop/work/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
slave1: starting nodemanager, logging to /home/hadoop/work/hadoop/logs/yarn-hadoop-nodemanager-slave1.out

9. Processes on each node

On the master node (jps output; QuorumPeerMain belongs to a ZooKeeper instance running on these machines, not to anything started in this guide):

1664 NameNode
4657 Jps
2021 ResourceManager
1865 SecondaryNameNode
1515 QuorumPeerMain

On the slave nodes:

2145 Jps
1156 DataNode 
1100 QuorumPeerMain
1244 NodeManager

10. Access the web UIs

Open the master's HDFS web UI; the two slave DataNodes should show as live:
http://master:50070/

The YARN NodeManagers are listed at: http://master:18088/cluster/nodes
