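Tags: Hadoop, distributed cluster setup, MapReduce, YARN, HDFS
Setting Up a Distributed Environment: Environment Overview
An earlier article covered setting up a pseudo-distributed Hadoop environment on a single machine. Real deployments, however, are multi-machine, multi-node clusters, so this article briefly walks through setting up a distributed Hadoop environment across several machines.
I have prepared three machines with the following IP addresses: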
- 192.168.77.128
- 192.168.77.130
- 192.168.77.134
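First, edit /etc/hosts on all three machines so that every hostname in the cluster resolves to the right IP address. If a machine's own hostname has not changed after the reboot, set it explicitly as well (for example with hostnamectl set-hostname hadoop000 on systemd-based distributions):
[root@localhost ~]# vim /etc/hosts # required on all three machines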
192.168.77.128 hadoop000
192.168.77.130 hadoop001
192.168.77.134 hadoop002
[root@localhost ~]# reboot
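Because the same three entries have to be added on every machine, the edit can also be scripted. The following is a minimal sketch, not part of the original procedure: add_host_entry is a hypothetical helper name, and the demo writes to a temporary file instead of /etc/hosts so it can be tried safely.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP hostname" to a hosts file only if that
# hostname is not already mapped. The third argument defaults to /etc/hosts.
add_host_entry() {
    local ip="$1" name="$2" hosts_file="${3:-/etc/hosts}"
    [ -f "$hosts_file" ] || touch "$hosts_file"
    # Look for the hostname as a whole field on any non-comment line.
    if awk -v n="$name" '!/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$hosts_file"; then
        return 0    # already present, nothing to do
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Demo against a scratch file; the repeated call adds nothing.
demo_hosts="$(mktemp)"
add_host_entry 192.168.77.128 hadoop000 "$demo_hosts"
add_host_entry 192.168.77.130 hadoop001 "$demo_hosts"
add_host_entry 192.168.77.134 hadoop002 "$demo_hosts"
add_host_entry 192.168.77.128 hadoop000 "$demo_hosts"
cat "$demo_hosts"
rm -f "$demo_hosts"
```

Dropping the third argument makes the helper write to /etc/hosts, which requires root.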
The roles the three machines play in the cluster:
- hadoop000 runs the NameNode, a DataNode, the ResourceManager, and a NodeManager
- hadoop001 runs a DataNode and a NodeManager
- hadoop002 also runs a DataNode and a NodeManager
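Configuring Passwordless SSH Login
The machines in the cluster need to communicate with each other (the Hadoop start scripts log in to every node over SSH), so passwordless login has to be configured first. Run the following command on each of the three machines to generate a key pair:
[root@hadoop000 ~]# ssh-keygen -t rsa # run this on all three machines to generate a key pair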
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
0d:00:bd:a3:69:b7:03:d5:89:dc:a8:a2:ca:28:d6:06 root@hadoop000
The key's randomart image is:
+--[ RSA 2048]----+
| .o. |
| .. |
| . *.. |
| B +o |
| = .S . |
| E. * . |
| .oo o . |
|=. o o |
|*.. . |
+-----------------+
[root@hadoop000 ~]# ls .ssh/
authorized_keys id_rsa id_rsa.pub known_hosts
[root@hadoop000 ~]#
Starting with hadoop000, run the following commands to copy its public key to every machine, including itself:
[root@hadoop000 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop000
[root@hadoop000 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop001
[root@hadoop000 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop002
Note: the same three commands also need to be run on the other two machines.
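Since the same ssh-copy-id command is repeated for every host, it can be written as a loop. The sketch below assumes the three hostnames from /etc/hosts; copy_key_to_cluster and the DRY_RUN switch are hypothetical names introduced for illustration. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Hostnames taken from the /etc/hosts entries configured earlier.
CLUSTER_HOSTS="hadoop000 hadoop001 hadoop002"

# Hypothetical helper: copy the local public key to every cluster host.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
copy_key_to_cluster() {
    local key="${1:-$HOME/.ssh/id_rsa.pub}" host
    for host in $CLUSTER_HOSTS; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh-copy-id -i $key $host"
        else
            ssh-copy-id -i "$key" "$host"
        fi
    done
}

copy_key_to_cluster
```

Running it with DRY_RUN=0 performs the copy and prompts for each host's password once, just like the manual commands.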
Once the keys are copied, test that passwordless login works:
[root@hadoop000 ~]# ssh hadoop000
Last login: Mon Apr 2 17:20:02 2018 from localhost
[root@hadoop000 ~]# ssh hadoop001
Last login: Tue Apr 3 00:49:59 2018 from 192.168.77.1
[root@hadoop001 ~]# logout
Connection to hadoop001 closed.
[root@hadoop000 ~]# ssh hadoop002
Last login: Tue Apr 3 00:50:03 2018 from 192.168.77.1
[root@hadoop002 ~]# logout
Connection to hadoop002 closed.
[root@hadoop000 ~]# logout
Connection to hadoop000 closed.
[root@hadoop000 ~]#
As shown above, hadoop000 can now log in to the other two machines without a password, so the configuration works.
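The manual login test can also be run non-interactively. This is a rough sketch that relies on OpenSSH's BatchMode option, which makes ssh fail instead of asking for a password. The function names and the SSH_BIN override are hypothetical and exist only so the check can be tried without a cluster.

```shell
#!/usr/bin/env bash
# Return 0 if key-based login to the host succeeds without any prompt.
# SSH_BIN is a hypothetical override used for testing; it defaults to ssh.
can_login_without_password() {
    local host="$1"
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}

# Print one "host: ok" or "host: FAILED" line per argument.
report_passwordless_login() {
    local host
    for host in "$@"; do
        if can_login_without_password "$host" >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}
```

A typical call would be report_passwordless_login hadoop000 hadoop001 hadoop002. A host whose key is not yet in known_hosts may also be reported as FAILED, because BatchMode suppresses the confirmation prompt as well.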
Installing the JDK
Get the JDK download link from the Oracle website. I am using JDK 1.8; the download page is:
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Use wget to download the JDK into /usr/local/src/. I have already downloaded it:
[root@hadoop000 ~]# cd /usr/local/src/
[root@hadoop000 /usr/local/src]# ls
jdk-8u151-linux-x64.tar.gz
[root@hadoop000 /usr/local/src]#
Extract the archive and move the extracted directory under /usr/local/:
[root@hadoop000 /usr/local/src]# tar -zxvf jdk-8u151-linux-x64.tar.gz
[root@hadoop000 /usr/local/src]# mv ./jdk1.8.0_151 /usr/local/jdk1.8
Edit /etc/profile to configure the environment variables:
[root@hadoop000 ~]# vim /etc/profile # append the following
JAVA_HOME=/usr/local/jdk1.8/
JAVA_BIN=/usr/local/jdk1.8/bin
JRE_HOME=/usr/local/jdk1.8/jre
PATH=$PATH:/usr/local/jdk1.8/bin:/usr/local/jdk1.8/jre/bin
CLASSPATH=/usr/local/jdk1.8/jre/lib:/usr/local/jdk1.8/lib:/usr/local/jdk1.8/jre/lib/charsets.jar
export JAVA_HOME JAVA_BIN JRE_HOME PATH CLASSPATH
Load the file with the source command so the changes take effect, then run java -version to see the JDK version:
[root@hadoop000 ~]# source /etc/profile
[root@hadoop000 ~]# java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
[root@hadoop000 ~]#
With the JDK installed on hadoop000, use rsync to copy the JDK and the profile to the other machines:
[root@hadoop000 ~]# rsync -av /usr/local/jdk1.8 hadoop001:/usr/local
[root@hadoop000 ~]# rsync -av /usr/local/jdk1.8 hadoop002:/usr/local
[root@hadoop000 ~]# rsync -av /etc/profile hadoop001:/etc/profile
[root@hadoop000 ~]# rsync -av /etc/profile hadoop002:/etc/profile
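When the sync finishes, source the profile on each of the two machines so the environment variables take effect, then run java -version to confirm that the JDK is installed correctly.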
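The JDK check on each node can be reduced to a single function call. The sketch below is an illustration only: java_version_at is a hypothetical helper, and it assumes the first line of the java -version output has the form shown in the transcript above (java version "1.8.0_151").

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version reported by the java binary under
# the given JAVA_HOME, or return non-zero if no executable java is found there.
java_version_at() {
    local java_home="${1%/}"          # drop a trailing slash, e.g. /usr/local/jdk1.8/
    local java_bin="$java_home/bin/java"
    [ -x "$java_bin" ] || { echo "no executable at $java_bin" >&2; return 1; }
    # java -version writes to stderr; take the quoted part of the first line.
    "$java_bin" -version 2>&1 | head -n 1 | awk -F'"' '{ print $2 }'
}
```

After sourcing /etc/profile on a node, java_version_at "$JAVA_HOME" should print the same version on all three machines.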
Configuring and Distributing Hadoop
Download the Hadoop 2.6.0-cdh5.7.0 tar.gz package and extract it:
[root@hadoop000 ~]# cd /usr/local/src/
[root@hadoop000 /usr/local/src]# wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz
[root@hadoop000 /usr/local/src]# tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C /usr/local/
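Note: if the download is slow on Linux, you can fetch the same link with a download manager on Windows (I used Xunlei) and then upload the file to the Linux machine, which is usually faster.
After extraction, change into the new directory to see how Hadoop is laid out: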
[root@hadoop000 /usr/local/src]# cd /usr/local/hadoop-2.6.0-cdh5.7.0/
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0]# ls
bin cloudera examples include libexec NOTICE.txt sbin src
bin-mapreduce1 etc examples-mapreduce1 lib LICENSE.txt README.txt share
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0]#
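A brief description of some of these directories:
- bin holds the executables
- etc holds the configuration files
- sbin holds the scripts that start and stop the services
- share holds the jar files and documentation
That is all it takes to install Hadoop. The next step is to edit the configuration files, starting with JAVA_HOME: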
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0]# cd etc/
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc]# cd hadoop
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8/ # adjust to match your environment
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]#
Then add the Hadoop installation directory to the environment variables so its commands are easy to run later:
[root@hadoop000 ~]# vim ~/.bash_profile # append the following
export HADOOP_HOME=/usr/local/hadoop-2.6.0-cdh5.7.0/
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
[root@hadoop000 ~]# source !$
source ~/.bash_profile
[root@hadoop000 ~]#
Next, edit the core-site.xml and hdfs-site.xml configuration files:
[root@hadoop000 ~]# cd $HADOOP_HOME
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0]# cd etc/hadoop
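[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# vim core-site.xml # add the following inside <configuration>
<property>
    <name>fs.default.name</name>  <!-- older name of fs.defaultFS -->
    <value>hdfs://hadoop000:8020</value>  <!-- default file system address and port -->
</property>
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# vim hdfs-site.xml # add the following inside <configuration>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/app/tmp/dfs/name</value>  <!-- where the NameNode stores its metadata -->
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/app/tmp/dfs/data</value>  <!-- where each DataNode stores its blocks -->
</property>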
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# mkdir -p /data/hadoop/app/tmp/dfs/name
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# mkdir -p /data/hadoop/app/tmp/dfs/data
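Every Hadoop *-site.xml file follows the same pattern: a configuration element containing one property element per setting, each with a name and a value. The following sketch generates that structure from name=value pairs. The helper names hadoop_property and hadoop_site_xml are hypothetical, and values are written as-is without XML escaping, which is adequate for the plain paths and hostnames used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit one <property> element in the layout used by
# Hadoop's *-site.xml files. Values are not XML-escaped.
hadoop_property() {
    printf '    <property>\n        <name>%s</name>\n        <value>%s</value>\n    </property>\n' "$1" "$2"
}

# Hypothetical helper: emit a complete configuration document from
# name=value arguments (split at the first "=").
hadoop_site_xml() {
    local pair
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    for pair in "$@"; do
        hadoop_property "${pair%%=*}" "${pair#*=}"
    done
    echo '</configuration>'
}

# Print the hdfs-site.xml settings from this article to stdout.
hadoop_site_xml \
    "dfs.namenode.name.dir=/data/hadoop/app/tmp/dfs/name" \
    "dfs.datanode.data.dir=/data/hadoop/app/tmp/dfs/data"
```

Redirecting the output to a file gives a starting point that can be compared against the hand-edited configuration.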
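Next, edit the yarn-site.xml configuration file:
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# vim yarn-site.xml # add the following inside <configuration>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop000</value>
</property>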
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]#
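Copy the MapReduce configuration template and edit it:
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# vim !$ # add the following inside <configuration>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>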
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]#
Finally, list the hostnames of the slave nodes in the slaves file. If hostnames have not been configured, use IP addresses instead:
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]# vim slaves
hadoop000
hadoop001
hadoop002
[root@hadoop000 /usr/local/hadoop-2.6.0-cdh5.7.0/etc/hadoop]#
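At this point Hadoop is fully configured on hadoop000, the master node, but the other two machines, which act as slave nodes, do not have a Hadoop environment yet. Distribute the Hadoop installation directory and the environment variable file from hadoop000 to both of them by running the following commands: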
[root@hadoop000 ~]# rsync -av /usr/local/hadoop-2.6.0-cdh5.7.0/ hadoop001:/usr/local/hadoop-2.6.0-cdh5.7.0/
[root@hadoop000 ~]# rsync -av /usr/local/hadoop-2.6.0-cdh5.7.0/ hadoop002:/usr/local/hadoop-2.6.0-cdh5.7.0/
[root@hadoop000 ~]# rsync -av ~/.bash_profile hadoop001:~/.bash_profile
[root@hadoop000 ~]# rsync -av ~/.bash_profile hadoop002:~/.bash_profile
After distribution, run source on each of the two machines and create the storage directories configured in hdfs-site.xml:
[root@hadoop001 ~]# source .bash_profile
[root@hadoop001 ~]# mkdir -p /data/hadoop/app/tmp/dfs/name
[root@hadoop001 ~]# mkdir -p /data/hadoop/app/tmp/dfs/data
[root@hadoop002 ~]# source .bash_profile
[root@hadoop002 ~]# mkdir -p /data/hadoop/app/tmp/dfs/name
[root@hadoop002 ~]# mkdir -p /data/hadoop/app/tmp/dfs/data
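With more than a couple of nodes, the rsync commands are easier to derive from the slaves file than to type by hand. The sketch below only prints the commands. remote_workers and print_distribution_commands are hypothetical helper names, and the slaves file is assumed to contain one hostname per line, as configured above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hosts listed in a slaves file, skipping blank
# lines, comment lines, and the host given as the second argument (the
# machine the files are copied from).
remote_workers() {
    local slaves_file="$1" self="$2" host
    while read -r host _ || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*|"$self") continue ;;
        esac
        echo "$host"
    done < "$slaves_file"
}

# Hypothetical helper: print (not run) the rsync command for each remote worker.
print_distribution_commands() {
    local hadoop_dir="${1%/}" slaves_file="$2" self="$3" host
    for host in $(remote_workers "$slaves_file" "$self"); do
        echo "rsync -av $hadoop_dir/ $host:$hadoop_dir/"
    done
}
```

On hadoop000, print_distribution_commands "$HADOOP_HOME" "$HADOOP_HOME/etc/hadoop/slaves" hadoop000 would print the two rsync commands for the Hadoop directory shown above.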
Formatting, Starting, and Stopping Hadoop
Format the NameNode. This only needs to be done on hadoop000:
[root@hadoop000 ~]# hdfs namenode -format
Once formatting is complete, the Hadoop cluster can be started:
[root@hadoop000 ~]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
18/04/02 20:10:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop000]
hadoop000: starting namenode, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/hadoop-root-namenode-hadoop000.out
hadoop000: starting datanode, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/hadoop-root-datanode-hadoop000.out
hadoop001: starting datanode, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/hadoop-root-datanode-hadoop001.out
hadoop002: starting datanode, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/hadoop-root-datanode-hadoop002.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is 4d:5a:9d:31:65:75:30:47:a3:9c:f5:56:63:c4:0f:6a.
Are you sure you want to continue connecting (yes/no)? yes # type yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/hadoop-root-secondarynamenode-hadoop000.out
18/04/02 20:11:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/yarn-root-resourcemanager-hadoop000.out
hadoop001: starting nodemanager, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/yarn-root-nodemanager-hadoop001.out
hadoop002: starting nodemanager, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/yarn-root-nodemanager-hadoop002.out
hadoop000: starting nodemanager, logging to /usr/local/hadoop-2.6.0-cdh5.7.0/logs/yarn-root-nodemanager-hadoop000.out
[root@hadoop000 ~]# jps # check that the following processes are present
6256 Jps
5538 DataNode
5843 ResourceManager
5413 NameNode
5702 SecondaryNameNode
5945 NodeManager
[root@hadoop000 ~]#
Check the processes on the other two machines:
hadoop001:
[root@hadoop001 ~]# jps
3425 DataNode
3538 NodeManager
3833 Jps
[root@hadoop001 ~]#
hadoop002:
[root@hadoop002 ~]# jps
3171 DataNode
3273 NodeManager
3405 Jps
[root@hadoop002 ~]#
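Reading the jps listings by eye works for three nodes, but the comparison against the expected roles can also be automated. The sketch below assumes the "<pid> <class name>" layout shown in the listings above; missing_daemons is a hypothetical helper name. It compares names exactly, so SecondaryNameNode is not mistaken for NameNode.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read jps output on stdin and print every expected
# daemon (given as arguments) that is not listed. Returns 0 if none is missing.
missing_daemons() {
    local jps_output expected missing=0
    jps_output="$(cat)"
    for expected in "$@"; do
        # jps prints "<pid> <main class>"; compare the second field exactly.
        if ! printf '%s\n' "$jps_output" | awk -v d="$expected" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$expected"
            missing=1
        fi
    done
    return "$missing"
}
```

On the master this could be run as jps | missing_daemons NameNode DataNode ResourceManager NodeManager SecondaryNameNode, and on the other two machines with DataNode and NodeManager only. Empty output means every expected process was found.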
Once the processes on every machine have been checked and everything looks right, open port 50070 of the master node in a browser, for example 192.168.77.128:50070.