Environment:
CentOS 7, 64-bit
JDK 1.8
hadoop-2.7.7.tar.gz (2.7.7 is chosen here so it matches hbase-2.0.0 later on; see the HBase website for the version compatibility matrix)
Server plan:
192.168.222.3 hadoop-namenode # runs only the NameNode service
192.168.222.4 hadoop-yarn # runs the ResourceManager service
192.168.222.5 hadoop-datanode1 # data node
192.168.222.6 hadoop-datanode2 # data node
192.168.222.7 hadoop-datanode3 # data node
The walkthrough below uses hadoop-namenode as the example (installation and configuration are identical on every node; only the daemons you start on each node differ).
To keep testing simple, disable the firewall on all servers:
systemctl stop firewalld.service # stop firewalld
systemctl disable firewalld.service # keep firewalld from starting at boot
firewall-cmd --state # check the firewall state ("not running" when stopped, "running" when active)
Disable SELinux on all servers:
vim /etc/selinux/config
SELINUX=disabled
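The change in /etc/selinux/config only takes effect after a reboot. A minimal sketch for also switching SELinux off for the current boot (run as root):

```bash
setenforce 0   # put SELinux into permissive mode immediately
getenforce     # should now print "Permissive"
```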
Set the hostname:
vim /etc/hostname
hadoop-namenode
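On CentOS 7 the same thing can be done without editing the file by hand; a sketch, to be run with the matching name on each node:

```bash
hostnamectl set-hostname hadoop-namenode   # applies immediately and persists across reboots
```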
Configure /etc/hosts by adding the following entries on every server:
192.168.222.3 hadoop-namenode
192.168.222.4 hadoop-yarn
192.168.222.5 hadoop-datanode1
192.168.222.6 hadoop-datanode2
192.168.222.7 hadoop-datanode3
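As a sketch, the entries can be appended in one step (run as root on each server; skip any lines your /etc/hosts already contains):

```bash
cat >> /etc/hosts <<'EOF'
192.168.222.3 hadoop-namenode
192.168.222.4 hadoop-yarn
192.168.222.5 hadoop-datanode1
192.168.222.6 hadoop-datanode2
192.168.222.7 hadoop-datanode3
EOF
```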
Set up passwordless SSH login (do the same with the other nodes' public keys, so that every node holds the public key of every other machine):
ssh-keygen -t rsa # press Enter at every prompt; this generates ~/.ssh/id_rsa.pub, which is then appended to authorized_keys
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
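One way to distribute the keys is ssh-copy-id, sketched below (run on every node after ssh-keygen; you are asked for each node's password once, and the hostnames are the ones defined in /etc/hosts above):

```bash
for host in hadoop-namenode hadoop-yarn hadoop-datanode1 hadoop-datanode2 hadoop-datanode3; do
    ssh-copy-id "$host"   # appends the local public key to $host's ~/.ssh/authorized_keys
done
```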
Install JDK 8 (note that Hadoop 3.x requires JDK 1.8; other versions produce version-incompatibility errors).
[Reference blog post](https://blog.csdn.net/hwm_life/article/details/81699882)
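A minimal sketch of a tarball-based install; the archive name is a placeholder for whichever JDK 8 build you downloaded, and the target directory matches the JAVA_HOME used later in hadoop-env.sh:

```bash
tar -zxvf jdk-8u181-linux-x64.tar.gz -C /usr/local   # placeholder archive name; use your own download
mv /usr/local/jdk1.8.0_181 /usr/local/jdk8           # align with JAVA_HOME=/usr/local/jdk8 used below

# append to ~/.bash_profile, then source it
export JAVA_HOME=/usr/local/jdk8
export PATH=$PATH:$JAVA_HOME/bin
```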
Extract Hadoop into the target directory:
tar -zxvf hadoop-2.7.7.tar.gz -C /usr/local # -C specifies the extraction directory
Configure the Hadoop environment variables:
vim ~/.bash_profile
export HADOOP_HOME=/usr/local/hadoop-2.7.7
export PATH=$PATH:$HADOOP_HOME/bin
source ~/.bash_profile # apply the change to the current shell; otherwise it only takes effect at the next login
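To confirm the variables are picked up:

```bash
echo $HADOOP_HOME   # should print /usr/local/hadoop-2.7.7
hadoop version      # should report Hadoop 2.7.7
```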
Edit hadoop-env.sh, mapred-env.sh and yarn-env.sh (all under $HADOOP_HOME/etc/hadoop) and add the JAVA_HOME path to each of the three files, as follows:
export JAVA_HOME=/usr/local/jdk8
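A small sketch that appends the line to all three files in one go (paths assume the layout above):

```bash
cd /usr/local/hadoop-2.7.7/etc/hadoop
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
    echo 'export JAVA_HOME=/usr/local/jdk8' >> "$f"   # append JAVA_HOME to each env script
done
```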
Edit core-site.xml (the NameNode and DataNode data directories are set in hdfs-site.xml below; here only the working directory and defaults are configured):

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-namenode:9000</value>
        <!-- address of the NameNode -->
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-2.7.7/dfs/tmp</value>
        <!-- base directory for Hadoop's local working data -->
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
</configuration>
```
Edit hdfs-site.xml:

```xml
<configuration>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop-namenode:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-namenode:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <!-- number of block replicas; normally more than one, but 1 is enough for testing -->
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///usr/local/hadoop-2.7.7/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///usr/local/hadoop-2.7.7/dfs/data</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>16m</value>
    </property>
</configuration>
```
Edit mapred-site.xml (the 2.7.7 distribution ships only mapred-site.xml.template, so copy it to mapred-site.xml first):

```xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop-yarn:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop-yarn:19888</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            /usr/local/hadoop-2.7.7/etc/hadoop,
            /usr/local/hadoop-2.7.7/share/hadoop/common/*,
            /usr/local/hadoop-2.7.7/share/hadoop/common/lib/*,
            /usr/local/hadoop-2.7.7/share/hadoop/hdfs/*,
            /usr/local/hadoop-2.7.7/share/hadoop/hdfs/lib/*,
            /usr/local/hadoop-2.7.7/share/hadoop/mapreduce/*,
            /usr/local/hadoop-2.7.7/share/hadoop/mapreduce/lib/*,
            /usr/local/hadoop-2.7.7/share/hadoop/yarn/*,
            /usr/local/hadoop-2.7.7/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>
```
Edit yarn-site.xml:

```xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-yarn</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop-yarn:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop-yarn:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop-yarn:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop-yarn:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop-yarn:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>6</value>
        <!-- ratio of virtual memory to physical memory allowed for each container -->
    </property>
</configuration>
```
Copy the modified Hadoop directory to the other servers with scp, taking hadoop-yarn as the example (run from /usr/local):
scp -r hadoop-2.7.7 hadoop-yarn:/usr/local
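A sketch for pushing the configured installation to all of the other nodes in one loop (relies on the passwordless SSH set up earlier and the hostnames from /etc/hosts):

```bash
cd /usr/local
for host in hadoop-yarn hadoop-datanode1 hadoop-datanode2 hadoop-datanode3; do
    scp -r hadoop-2.7.7 "$host":/usr/local   # copy the whole hadoop-2.7.7 directory to each node
done
```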
Format the NameNode on hadoop-namenode:
cd /usr/local/hadoop-2.7.7
./bin/hdfs namenode -format
Start the NameNode on hadoop-namenode (Hadoop 2.7.x starts individual daemons with the scripts under sbin; the `hdfs --daemon` / `yarn --daemon` form only exists from Hadoop 3.x):
./sbin/hadoop-daemon.sh start namenode
Start the ResourceManager and NodeManager on hadoop-yarn:
./sbin/yarn-daemon.sh start resourcemanager
./sbin/yarn-daemon.sh start nodemanager
Start the DataNode and NodeManager on hadoop-datanode1, hadoop-datanode2 and hadoop-datanode3:
./sbin/hadoop-daemon.sh start datanode
./sbin/yarn-daemon.sh start nodemanager
Use the jps command to check which daemons are running on each node.
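A quick way to check the whole cluster from one machine, sketched over SSH (jps should show NameNode on hadoop-namenode, ResourceManager and NodeManager on hadoop-yarn, and DataNode plus NodeManager on each data node):

```bash
for host in hadoop-namenode hadoop-yarn hadoop-datanode1 hadoop-datanode2 hadoop-datanode3; do
    echo "== $host =="
    ssh "$host" jps   # assumes the JDK bin directory is on the PATH of the remote shell
done
```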
Verify the cluster installation with one of the bundled examples:
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 1 2
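Besides the pi job, a simple HDFS read/write round trip is a useful sanity check; a sketch (the directory and file names are arbitrary):

```bash
./bin/hdfs dfs -mkdir -p /test                        # create a directory in HDFS
./bin/hdfs dfs -put etc/hadoop/core-site.xml /test    # upload a local file
./bin/hdfs dfs -ls /test                              # list it
./bin/hdfs dfs -cat /test/core-site.xml               # read it back
```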
Web UIs for checking cluster state:
http://hadoop-namenode:50070 # HDFS NameNode web UI
http://hadoop-yarn:8088 # YARN ResourceManager web UI