Setting Up a Fully Distributed Hadoop Cluster on Three Cloud Servers

Environment: Alibaba Cloud, Ubuntu 18.04, 2 cores / 4 GB RAM, 40 GB disk, ×3 instances, 1 Mbps bandwidth

SSH client: Xshell or MobaXterm; PuTTY is not recommended

Preliminary Setup

0. Rename the three hosts to master, slave1, and slave2:

hostnamectl set-hostname <hostname>
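
For example, run the matching command on each node:

hostnamectl set-hostname master    # on the first server
hostnamectl set-hostname slave1    # on the second server
hostnamectl set-hostname slave2    # on the third server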

1. Since these are cloud servers, there is no need to change the apt mirrors. Run the following in order:

apt update
apt -y upgrade 

apt -y install iprint 
# used when vi or vim is invoked from a shell script

====================

Java Environment

2. First obtain the JDK. You can install it directly on the cloud servers with apt,
or download it from the official site and upload it with WinSCP:

apt -y install openjdk-8-jdk

https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
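
If you installed the JDK with apt and are not sure where it ended up (you will need the path for JAVA_HOME in the next step), a quick way to check:

ls /usr/lib/jvm/
update-alternatives --list java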

3. Configure the Java environment by editing the /etc/profile file:

vim /etc/profile

Press Shift+G to jump to the end of the file and add the Java environment variables:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

After saving, reload the profile and check the Java version:

source /etc/profile
java -version

====================

Hadoop Environment

4. Download the Hadoop package, either directly with wget or from the official download page:

wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz

https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz

5. After downloading, extract it into /usr/local and rename the directory to hadoop for convenience:

tar -zxvf hadoop-3.2.1.tar.gz -C /usr/local/
cd /usr/local/
mv hadoop-3.2.1 hadoop

6. Edit the profile file again to add the Hadoop environment variables:

vim /etc/profile

Add the following:

#hadoop
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

Copy the environment variables to the worker nodes:

scp -r /etc/profile root@slave1:/etc/
scp -r /etc/profile root@slave2:/etc/

7. Make the profile take effect:

source /etc/profile

Check that it worked:

hadoop version

====================

Configuring Hadoop

8. Plan the HDFS deployment layout

Role assignment:

master: NameNode, SecondaryNameNode, DataNode

slave1: DataNode

slave2: DataNode

vim /etc/hosts

Add the following entries on each of the three hosts:

master:

<master private IP>     master
<slave1 public IP>      slave1
<slave2 public IP>      slave2

slave1:

<master public IP>      master
<slave1 private IP>     slave1
<slave2 public IP>      slave2

slave2:

<master public IP>      master
<slave1 public IP>      slave1
<slave2 private IP>     slave2
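
For illustration, master's /etc/hosts entries might look like the following (the addresses are placeholders; substitute your instances' actual private and public IPs):

172.16.0.10     master    # master's own private IP
203.0.113.11    slave1    # slave1's public IP
203.0.113.12    slave2    # slave2's public IP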

====================

9. Edit the Hadoop configuration files

Hadoop has to be configured before it can be used. All of these configuration files live under $HADOOP_HOME/etc/hadoop, i.e. the full path /usr/local/hadoop/etc/hadoop.

cd /usr/local/hadoop/etc/hadoop

In this directory we mainly edit the following files:

vim hadoop-env.sh

Append the following at the end of the file:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

vim core-site.xml

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop3/hadoop/tmp</value>
    </property>
</configuration>
vim hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop3/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop3/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>master:50070</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>

vim mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            /usr/local/hadoop/etc/hadoop,
            /usr/local/hadoop/share/hadoop/common/*,
            /usr/local/hadoop/share/hadoop/common/lib/*,
            /usr/local/hadoop/share/hadoop/hdfs/*,
            /usr/local/hadoop/share/hadoop/hdfs/lib/*,
            /usr/local/hadoop/share/hadoop/mapreduce/*,
            /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
            /usr/local/hadoop/share/hadoop/yarn/*,
            /usr/local/hadoop/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>

vim yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

vim workers

master
slave1
slave2

Reference documentation for core-site.xml:

https://hadoop.apache.org/docs/r2.9.2/hadoop-project-dist/hadoop-common/core-default.xml

Reference documentation for hdfs-site.xml:

https://hadoop.apache.org/docs/r2.9.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Reference documentation for mapred-site.xml:

https://hadoop.apache.org/docs/r2.9.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

Reference documentation for yarn-site.xml:
https://hadoop.apache.org/docs/r2.9.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

10. Apply the configuration on every host in the cluster

① Copy the environment variables:

scp -r /etc/profile root@slave1:/etc/
scp -r /etc/profile root@slave2:/etc/

Then, on each node that received the new profile, make it take effect:

source /etc/profile

② Copy the Hadoop configuration files:

scp -r <hadoop install path>/etc/hadoop/ <user>@<hostname or IP>:<hadoop install path>/etc/

scp -r /usr/local/hadoop/etc/hadoop root@slave1:/usr/local/hadoop/etc/

scp -r /usr/local/hadoop/etc/hadoop root@slave2:/usr/local/hadoop/etc/
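
Optionally, confirm the files arrived on the workers (this will still ask for a password until the key-based login in the next section is set up):

ssh root@slave1 "ls /usr/local/hadoop/etc/hadoop"
ssh root@slave2 "ls /usr/local/hadoop/etc/hadoop"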

====================

Configuring Passwordless SSH Login

Generate a key pair in the home directory of each of the three servers and append the public key to authorized_keys:

ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service ssh restart

Then send the public key to the other hosts:

ssh-copy-id -i ~/.ssh/id_rsa.pub  master

ssh-copy-id -i ~/.ssh/id_rsa.pub  slave1

ssh-copy-id -i ~/.ssh/id_rsa.pub  slave2

Once this is done, try ssh-ing between the hosts; you should no longer be prompted for a password.
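
For example, from master, each of these should print the worker's hostname without asking for a password:

ssh slave1 hostname
ssh slave2 hostname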

====================

Starting Hadoop

Before starting, format HDFS (run this on the master only):

hdfs namenode -format
start-all.sh

After starting, use the jps command to check which daemons are running:

jps
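
With the role layout above, jps on master should show roughly the following daemons (besides Jps itself), while slave1 and slave2 should show only DataNode and NodeManager:

NameNode
SecondaryNameNode
DataNode
ResourceManager
NodeManager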

Testing the environment:

Web UI check:

http://master:50070

(When browsing from a machine outside the cluster, use the master's public IP instead of the hostname, and make sure the security group opens port 50070.)

Verification by running a job:

cd

mkdir test

cd test

touch words

vim words
hello hadoop
hello word
hello mapreduce
hello ergouzi
lili
haha
haha gege
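
The example job in the next step reads its input from HDFS, so the local words file has to be uploaded first; a minimal sketch, using the /test path assumed below:

hdfs dfs -mkdir -p /test
hdfs dfs -put ~/test/words /test/
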
cd /usr/local/hadoop/share/hadoop/mapreduce

Use the bundled example jar to run a word count on the file (the output directory must not already exist, so the results are written to /test/output here):

hadoop jar hadoop-mapreduce-examples-3.2.1.jar wordcount /test/words /test/output
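
When the job finishes, the counts can be read back from the output directory (wordcount writes part-r-* files there):

hdfs dfs -cat /test/output/part-r-*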
