Based on Hadoop 3.1.4
1. Pre-built hadoop-3.1.4 package
Link: https://pan.baidu.com/s/1tKLDTRcwSnAptjhKZiwAKg  Extraction code: ekvc
2. JDK (required)
Link: https://pan.baidu.com/s/18JtAWbVcamd2J_oIeSVzKw  Extraction code: bmny
3. VMware installer
Link: https://pan.baidu.com/s/1YxDntBWSCEnN9mTYlH0FUA  Extraction code: uhsj
4. VMware license
Link: https://pan.baidu.com/s/10CsLc-nJXnH5V9IMP-KZeg  Extraction code: r5y5
5. Linux download
ISO image download address
1. Install the virtual machine
Search for a guide yourself!!!
2. Configure a static IP
cd /etc/sysconfig/network-scripts/
### static IP configuration (a complete example file is sketched after this block)
IPADDR=192.168.109.103 ## the IP we want to assign
NETMASK=255.255.255.0
GATEWAY=192.168.109.2
DNS1=8.8.8.8
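A minimal sketch of the complete file, assuming the interface is named ens33 (so the file would be ifcfg-ens33; confirm the actual name with ip addr). BOOTPROTO=static and ONBOOT=yes are what make the address static and persistent:

TYPE=Ethernet
BOOTPROTO=static
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.109.103
NETMASK=255.255.255.0
GATEWAY=192.168.109.2
DNS1=8.8.8.8

Then restart the network service to apply it:
systemctl restart network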
3. Install the JDK on Linux
Uninstall the OpenJDK that ships with the system:
rpm -qa | grep jdk
Do not remove the packages whose names end in .noarch
rpm -e --nodeps XXX
tar -zxvf xxx.tar.gz -C /usr/local ## extract the JDK tarball into /usr/local so it matches JAVA_HOME below
vim /etc/profile
export JAVA_HOME=/usr/local/jdk1.8.0_361
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
source /etc/profile
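To confirm the JDK is now on the PATH:
java -version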
4、关闭防火墙
systemctl stop firewalld.service
systemctl disable firewalld.service
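To confirm the firewall is stopped and will stay off after a reboot:
systemctl status firewalld.service
systemctl is-enabled firewalld.service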
Cluster plan:
hadoop0 namenode datanode resourcemanager nodemanager
hadoop1 secondarynamenode datanode nodemanager
hadoop2 datanode nodemanager
5. Configure hostnames and the hosts file
hostnamectl set-hostname hadoop0 ## run on hadoop0; use hadoop1 / hadoop2 on the other two machines
vim /etc/hosts
192.168.109.101 hadoop0
192.168.109.102 hadoop1
192.168.109.103 hadoop2
All three machines must have the complete hosts file!!! Otherwise starting the secondarynamenode later will fail.
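A quick check that name resolution works from every node:
ping -c 1 hadoop0
ping -c 1 hadoop1
ping -c 1 hadoop2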
6. Passwordless SSH login
cd /root/.ssh
If .ssh does not exist, run mkdir -p /root/.ssh
Generate the key pair
ssh-keygen -t dsa
cd /root/.ssh
cat id_dsa.pub >> authorized_keys
To explain: we first generate a key pair and append the public key to this machine's authorized_keys. Repeat the steps above on the other two machines, then copy hadoop1's and hadoop2's id_dsa.pub into hadoop0's authorized_keys, and finally distribute the combined authorized_keys from hadoop0 back to hadoop1 and hadoop2. At that point hadoop0 can log in to hadoop0, hadoop1, and hadoop2 without a password (an ssh-copy-id shortcut is sketched after the test command below).
ssh hadoop1
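If ssh-copy-id is available, collecting the keys takes one command per node. A sketch (run on hadoop1 and hadoop2; it appends the local public key to hadoop0's authorized_keys, prompting for the password once; note that newer OpenSSH versions reject DSA keys, in which case ssh-keygen -t rsa works the same way):

ssh-copy-id -i /root/.ssh/id_dsa.pub root@hadoop0

Then push the combined file back out from hadoop0:

scp /root/.ssh/authorized_keys root@hadoop1:/root/.ssh/
scp /root/.ssh/authorized_keys root@hadoop2:/root/.ssh/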
7. Create a unified directory layout
mkdir -p /export/server/   ## software installs (Hadoop lives here, see HADOOP_HOME below)
mkdir -p /export/data/     ## data directories (hadoop.tmp.dir points here)
mkdir -p /export/software/ ## installer packages
8. Hadoop environment variables
vim /etc/profile
export HADOOP_HOME=/export/server/hadoop-3.1.4
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
source /etc/profile
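To confirm the variables took effect:
hadoop version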
9. Hadoop configuration files (do everything on hadoop0; the whole directory is copied to the other machines afterwards, see the scp sketch at the end of this step). The paths below are relative to $HADOOP_HOME, i.e. /export/server/hadoop-3.1.4.
vim etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_361
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
vim etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop0:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/export/data/hadoop-3.1.4</value>
  </property>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
  </property>
</configuration>
vim etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop1:9868</value>
  </property>
</configuration>
vim etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
</configuration>
vim etc/hadoop/workers
hadoop0
hadoop1
hadoop2
vim etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop0</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
  </property>
</configuration>
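The configs above only exist on hadoop0 so far. A minimal sketch of the copy step with scp, assuming the same /export layout on every node:

scp -r /export/server/hadoop-3.1.4 root@hadoop1:/export/server/
scp -r /export/server/hadoop-3.1.4 root@hadoop2:/export/server/
scp /etc/profile root@hadoop1:/etc/profile
scp /etc/profile root@hadoop2:/etc/profile

Remember to run source /etc/profile on hadoop1 and hadoop2 afterwards.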
10. Format HDFS (be careful: never format more than once)
hdfs namenode -format
2023-03-26 00:12:47,011 INFO common.Storage: Storage directory /export/data/hadoop-3.1.4/dfs/name has been successfully formatted.
ls -l /export/data/hadoop-3.1.4/dfs/name/current
total 16
-rw-r--r-- 1 root root 391 Mar 26 00:12 fsimage_0000000000000000000
-rw-r--r-- 1 root root 62 Mar 26 00:12 fsimage_0000000000000000000.md5
-rw-r--r-- 1 root root 2 Mar 26 00:12 seen_txid
-rw-r--r-- 1 root root 220 Mar 26 00:12 VERSION
Seeing the files above means the format succeeded!
11. Starting the cluster
Start the matching daemons on each machine according to our plan; the generic commands are listed below, followed by a concrete per-host sequence.
hadoop0 namenode datanode resourcemanager nodemanager
hadoop1 secondarynamenode datanode nodemanager
hadoop2 datanode nodemanager
hdfs --daemon start namenode|datanode|secondarynamenode
hdfs --daemon stop namenode|datanode|secondarynamenode
yarn --daemon start resourcemanager|nodemanager
yarn --daemon stop resourcemanager|nodemanager
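Following the plan, the concrete start sequence would be (run each group on the named host):

## on hadoop0
hdfs --daemon start namenode
hdfs --daemon start datanode
yarn --daemon start resourcemanager
yarn --daemon start nodemanager

## on hadoop1
hdfs --daemon start secondarynamenode
hdfs --daemon start datanode
yarn --daemon start nodemanager

## on hadoop2
hdfs --daemon start datanode
yarn --daemon start nodemanager

Verify with jps on each node. The NameNode web UI is at http://hadoop0:9870 and the YARN ResourceManager UI at http://hadoop0:8088. Since workers lists all three hosts and passwordless SSH is configured, running start-dfs.sh and start-yarn.sh on hadoop0 should also bring up the whole cluster in one shot.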