OS / instances: Alibaba Cloud, Ubuntu 18.04, 2 vCPUs, 4 GB RAM, 40 GB disk, x3 instances, 1 Mbps bandwidth
SSH client: Xshell or MobaXterm; PuTTY is not recommended
0. Rename the three hosts to master, slave1 and slave2:
hostnamectl set-hostname <hostname>
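A minimal sketch of this step, run on the corresponding instance (the hostnames are the ones used throughout this guide):
hostnamectl set-hostname master   # on the first instance
hostnamectl set-hostname slave1   # on the second instance
hostnamectl set-hostname slave2   # on the third instance
hostnamectl status                # verify on each host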
1. Since these are cloud instances there is no need to change the apt sources. Run the following in order:
apt update
apt -y upgrade
apt -y install iprint
# needed when invoking vi or vim from shell scripts
====================
2. Install the Java package. On a cloud server you can install it directly with apt, or download it from the official site and upload it with WinSCP:
apt -y install openjdk-8-jdk
https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
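If you install via apt, the commands below (a small sketch) show where the JDK ends up, which is the path JAVA_HOME must point to in the next step; on Ubuntu 18.04 amd64 the openjdk-8-jdk package installs under /usr/lib/jvm/java-8-openjdk-amd64:
ls /usr/lib/jvm/
readlink -f "$(which java)"   # resolves to .../java-8-openjdk-amd64/jre/bin/java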
3. Configure the Java environment by editing /etc/profile
vim /etc/profile
Press Shift+G to jump to the end of the file and add the Java environment variables:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
java -version
====================
4. Download the Hadoop package, either directly with wget or from the official website:
wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
5. After downloading, extract it to /usr/local and rename the directory to hadoop for convenience:
tar -zxvf hadoop-3.2.1.tar.gz -C /usr/local/
cd /usr/local/
mv hadoop-3.2.1 hadoop
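Optionally verify the tarball you downloaded (ideally before extracting it); a sketch, assuming the checksum file is taken from the standard Apache archive layout (adjust the URL if you used a different mirror):
# fetch the published SHA-512 checksum and compare it with the local file
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz.sha512
sha512sum hadoop-3.2.1.tar.gz
cat hadoop-3.2.1.tar.gz.sha512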
6. Edit /etc/profile again to add the Hadoop environment variables
vim /etc/profile
Add the following:
#hadoop
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_NAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Copy the environment variables to the slave nodes:
scp -r /etc/profile root@slave1:/etc/
scp -r /etc/profile root@slave2:/etc/
7. Apply the updated profile:
source /etc/profile
Check that it took effect:
hadoop version
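As an extra sanity check (nothing specific to this guide), confirm that the variables resolve and the Hadoop binaries are on the PATH:
echo "$JAVA_HOME"
echo "$HADOOP_HOME"
which hadoop hdfs yarn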
====================
8. Plan the HDFS deployment and configure /etc/hosts
Planned roles:
master: NameNode, SecondaryNameNode, DataNode
slave1: DataNode
slave2: DataNode
vim /etc/hosts
Add the following entries on each of the three hosts:
On master:
<master private IP>   master
<slave1 public IP>    slave1
<slave2 public IP>    slave2
On slave1:
<master public IP>    master
<slave1 private IP>   slave1
<slave2 public IP>    slave2
On slave2:
<master public IP>    master
<slave1 public IP>    slave1
<slave2 private IP>   slave2
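For illustration only, the /etc/hosts fragment on master might look like this (the IP addresses below are placeholders, substitute your own private and public addresses):
# /etc/hosts on master (example addresses only)
172.16.0.10   master    # this machine's private IP
47.100.1.2    slave1    # slave1's public IP
47.100.1.3    slave2    # slave2's public IP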
====================
9. Edit the Hadoop configuration files
Hadoop must be configured before it can be used. All of these files live under $HADOOP_HOME/etc/hadoop, i.e. /usr/local/hadoop/etc/hadoop.
cd /usr/local/hadoop/etc/hadoop
In this directory we mainly edit the following files:
vim hadoop-env.sh
Append at the end of the file:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
vim core-site.xml
<configuration>
  <property><name>fs.default.name</name><value>hdfs://master:9000</value></property>
  <property><name>hadoop.tmp.dir</name><value>/home/hadoop3/hadoop/tmp</value></property>
</configuration>
vim hdfs-site.xml
<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.namenode.name.dir</name><value>/home/hadoop3/hadoop/hdfs/name</value></property>
  <property><name>dfs.datanode.data.dir</name><value>/home/hadoop3/hadoop/hdfs/data</value></property>
  <property><name>dfs.http.address</name><value>master:50070</value></property>
  <property><name>dfs.permissions.enabled</name><value>false</value></property>
</configuration>
vim mapred-site.xml
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>
      /usr/local/hadoop/etc/hadoop,
      /usr/local/hadoop/share/hadoop/common/*,
      /usr/local/hadoop/share/hadoop/common/lib/*,
      /usr/local/hadoop/share/hadoop/hdfs/*,
      /usr/local/hadoop/share/hadoop/hdfs/lib/*,
      /usr/local/hadoop/share/hadoop/mapreduce/*,
      /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
      /usr/local/hadoop/share/hadoop/yarn/*,
      /usr/local/hadoop/share/hadoop/yarn/lib/*
    </value>
  </property>
</configuration>
vim yarn-site.xml
<configuration>
  <property><name>yarn.resourcemanager.hostname</name><value>master</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>
vim workers
master
slave1
slave2
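Note that in Hadoop 3.x the worker list lives in the workers file (it was called slaves in 2.x); this is the file start-dfs.sh and start-yarn.sh read to decide where to launch DataNodes and NodeManagers. A quick check:
cat /usr/local/hadoop/etc/hadoop/workers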
core-site.xml reference:
https://hadoop.apache.org/docs/r2.9.2/hadoop-project-dist/hadoop-common/core-default.xml
hdfs-site.xml reference:
https://hadoop.apache.org/docs/r2.9.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
mapred-site.xml reference:
https://hadoop.apache.org/docs/r2.9.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
yarn-site.xml reference:
https://hadoop.apache.org/docs/r2.9.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
10. Propagate the configuration to every host in the cluster
① Copy the environment variables:
scp -r /etc/profile root@slave1:/etc/
scp -r /etc/profile root@slave2:/etc/
Then apply the profile on master, slave1 and slave2:
source /etc/profile
② Copy the Hadoop configuration files (general form, then the concrete commands):
scp -r /<hadoop install path>/etc/hadoop/ <user>@<hostname or IP>:/<hadoop install path>/etc/
scp -r /usr/local/hadoop/etc/hadoop root@slave1:/usr/local/hadoop/etc/
scp -r /usr/local/hadoop/etc/hadoop root@slave2:/usr/local/hadoop/etc/
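If Hadoop was only extracted on master, the slaves will not yet have /usr/local/hadoop at all and the scp above would fail. A sketch of one way to handle both cases, using the hostnames from this guide:
# copy the whole installation to slaves that do not have it yet
for host in slave1 slave2; do
  scp -r /usr/local/hadoop root@${host}:/usr/local/
done
# after later configuration changes, only etc/hadoop needs to be re-synced
for host in slave1 slave2; do
  scp -r /usr/local/hadoop/etc/hadoop root@${host}:/usr/local/hadoop/etc/
done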
====================
11. On each of the three servers, generate an SSH key pair in the home directory and append it to authorized_keys:
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service ssh restart
Then distribute the generated public key to the other hosts:
ssh-copy-id -i ~/.ssh/id_rsa.pub master
ssh-copy-id -i ~/.ssh/id_rsa.pub slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub slave2
Once this is done, try ssh-ing between the hosts; no password should be required any more.
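A quick check (just a sketch, using the hostnames from this guide) that passwordless login works from the current node:
# BatchMode makes ssh fail instead of prompting, so any password prompt shows up as an error
for host in master slave1 slave2; do
  ssh -o BatchMode=yes "$host" hostname
done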
====================
12. Before the first start, format HDFS on master, then start the cluster:
hdfs namenode -format
start-all.sh
After starting, use jps to check which daemons are running:
jps
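With the configuration above, master should be running NameNode, SecondaryNameNode, DataNode, ResourceManager and NodeManager, while slave1 and slave2 should each run DataNode and NodeManager. A sketch of a cluster-wide check from master (relies on the passwordless SSH set up earlier):
# list Java daemons on every node
for host in master slave1 slave2; do
  echo "== $host =="
  ssh "$host" jps
done
# HDFS view: all three DataNodes should be reported as live
hdfs dfsadmin -report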
Test the environment:
Web UI check:
http://master:50070  (from your own machine, use master's public IP instead of the hostname)
MapReduce job check:
cd
mkdir test
cd test
touch words
vim words
hello hadoop
hello word
hello mapreduce
hello ergouzi
lili
haha
haha gege
cd /usr/local/hadoop/share/hadoop/mapreduce
Upload the input file to HDFS, then run the bundled example job to verify the cluster:
hdfs dfs -mkdir -p /test
hdfs dfs -put ~/test/words /test/
hadoop jar hadoop-mapreduce-examples-3.2.1.jar wordcount /test/words /test/output
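To inspect the result (the /test/output path matches the wordcount command above):
hdfs dfs -ls /test/output
hdfs dfs -cat /test/output/part-r-*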