Hadoop Pseudo-Distributed Cluster Setup

Based on Hadoop 3.1.4

I. Prepare the required files

1. The compiled hadoop-3.1.4 package
Link: https://pan.baidu.com/s/1tKLDTRcwSnAptjhKZiwAKg  extraction code: ekvc
2. A JDK
Link: https://pan.baidu.com/s/18JtAWbVcamd2J_oIeSVzKw  extraction code: bmny
3. The VMware installer
Link: https://pan.baidu.com/s/1YxDntBWSCEnN9mTYlH0FUA  extraction code: uhsj
4. A VMware license
Link: https://pan.baidu.com/s/10CsLc-nJXnH5V9IMP-KZeg  extraction code: r5y5
5. A Linux distribution
ISO mirror download address

II. Preparation

1. Install the virtual machines
Search online for a guide yourself!!!
2. Configure a static IP

cd /etc/sysconfig/network-scripts/

[Figure 1]

Check your own machine for its IP, gateway, and subnet mask.

# static IP configuration
IPADDR=192.168.109.103   # the IP we want to assign
NETMASK=255.255.255.0
GATEWAY=192.168.109.2
DNS1=8.8.8.8
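A complete ifcfg file also needs BOOTPROTO and ONBOOT set, and the network service restarted afterwards. A minimal sketch, assuming the interface is named ens33 (check yours with ip addr) and a CentOS 7 host:

vim /etc/sysconfig/network-scripts/ifcfg-ens33

TYPE=Ethernet
BOOTPROTO=static        # use the static addresses below instead of DHCP
ONBOOT=yes              # bring the interface up at boot
NAME=ens33
DEVICE=ens33
IPADDR=192.168.109.103
NETMASK=255.255.255.0
GATEWAY=192.168.109.2
DNS1=8.8.8.8

systemctl restart network   # apply the change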

3. Install the JDK on Linux
First remove the OpenJDK that ships with the distribution:

rpm -qa | grep jdk

Do not remove the packages whose names end in .noarch.
rpm -e --nodeps XXX
tar -zxvf xxx.tar.gz -C /usr/local/
 
vim /etc/profile

export JAVA_HOME=/usr/local/jdk1.8.0_361
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

source /etc/profile	  
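To confirm the JDK is on the PATH, check the version; it should report 1.8.0_361:

java -version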

4. Disable the firewall

systemctl stop firewalld.service
systemctl disable firewalld.service
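To confirm the firewall is really off:

systemctl status firewalld   # should show "inactive (dead)"
firewall-cmd --state         # should print "not running"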

Cluster plan:
hadoop0  namenode  datanode  resourcemanager  nodemanager
hadoop1  secondarynamenode  datanode  nodemanager
hadoop2  datanode  nodemanager

5. Configure the hostname and the hosts file
Set each machine's own name (hadoop0, hadoop1, hadoop2):

hostnamectl set-hostname hadoop0
vim /etc/hosts
192.168.109.101 hadoop0
192.168.109.102 hadoop1
192.168.109.103 hadoop2

All three machines need the full set of hosts entries!!! Otherwise the secondarynamenode will fail to start later.
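A quick sanity check that every node resolves the others:

ping -c 1 hadoop1
ping -c 1 hadoop2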

6. Passwordless SSH login

cd /root/.ssh
If the .ssh directory does not exist, run mkdir -p /root/.ssh first

Generate a key pair:
ssh-keygen -t dsa
cd /root/.ssh
cat id_dsa.pub >> authorized_keys

To explain: we first generate a key pair and append the public key to this machine's own authorized keys. Repeat the steps above on the other two machines, then copy hadoop1's and hadoop2's id_dsa.pub into hadoop0's authorized_keys, and finally push the aggregated authorized_keys back out to hadoop1 and hadoop2 (see the sketch below). At that point hadoop0 can log in to hadoop0, hadoop1, and hadoop2 without a password.

ssh hadoop1
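A concrete sketch of the key distribution described above, assuming root on all three machines and the default key paths:

# on hadoop1, and again on hadoop2: append that machine's public key
# to hadoop0's authorized_keys
cat /root/.ssh/id_dsa.pub | ssh root@hadoop0 'cat >> /root/.ssh/authorized_keys'

# back on hadoop0: push the aggregated file to the other two machines
scp /root/.ssh/authorized_keys root@hadoop1:/root/.ssh/
scp /root/.ssh/authorized_keys root@hadoop2:/root/.ssh/

# verify from hadoop0
ssh hadoop1 hostname
ssh hadoop2 hostname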

7. Create the unified working directories (on all three machines)

mkdir -p /export/server/
mkdir -p /export/data/
mkdir -p /export/software/

8. Hadoop environment variables

vim /etc/profile

export HADOOP_HOME=/export/server/hadoop-3.1.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source /etc/profile
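This guide assumes the compiled hadoop-3.1.4 package from Part I has already been uploaded and extracted to /export/server; if not, extract it first (the tarball name below is an assumption):

tar -zxvf /export/software/hadoop-3.1.4.tar.gz -C /export/server/

hadoop version   # should print Hadoop 3.1.4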

9. Hadoop configuration files (edit everything on hadoop0; the whole installation is copied to the other machines later, see the scp sketch after this section)
The paths below are relative to the Hadoop installation directory, so cd /export/server/hadoop-3.1.4 first.

vim  etc/hadoop/hadoop-env.sh


export JAVA_HOME=/usr/local/jdk1.8.0_361

export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

vim etc/hadoop/core-site.xml

<configuration>
  <!-- default filesystem: the NameNode RPC address -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop0:8020</value>
  </property>
  <!-- base directory for HDFS data and metadata -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/export/data/hadoop-3.1.4</value>
  </property>
  <!-- user shown as the owner in the web UI file browser -->
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
  </property>
</configuration>

vim etc/hadoop/hdfs-site.xml

<configuration>
  <!-- place the SecondaryNameNode on hadoop1, per the cluster plan -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop1:9868</value>
  </property>
</configuration>

vim etc/hadoop/mapred-site.xml

<configuration>
  <!-- run MapReduce jobs on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- point the MR ApplicationMaster and tasks at the Hadoop installation -->
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
</configuration>

vim etc/hadoop/workers

hadoop0
hadoop1
hadoop2

vim etc/hadoop/yarn-site.xml

<configuration>
  <!-- host running the ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop0</value>
  </property>
  <!-- shuffle service required by MapReduce -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- container memory limits -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>
  <!-- virtual-to-physical memory ratio allowed for containers -->
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
  </property>
</configuration>
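All the edits above were made on hadoop0; as mentioned in step 9, copy the whole installation (and the profile) to the other two machines before formatting, for example:

scp -r /export/server/hadoop-3.1.4 root@hadoop1:/export/server/
scp -r /export/server/hadoop-3.1.4 root@hadoop2:/export/server/
scp /etc/profile root@hadoop1:/etc/profile
scp /etc/profile root@hadoop2:/etc/profile

# then on hadoop1 and hadoop2:
source /etc/profile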

10. Format HDFS (be careful never to format more than once)

hdfs namenode -format
2023-03-26 00:12:47,011 INFO common.Storage: Storage directory /export/data/hadoop-3.1.4/dfs/name has been successfully formatted.
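The fresh metadata files live under the hadoop.tmp.dir configured earlier:

ls -l /export/data/hadoop-3.1.4/dfs/name/current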

total 16
-rw-r--r-- 1 root root 391 Mar 26 00:12 fsimage_0000000000000000000
-rw-r--r-- 1 root root  62 Mar 26 00:12 fsimage_0000000000000000000.md5
-rw-r--r-- 1 root root   2 Mar 26 00:12 seen_txid
-rw-r--r-- 1 root root 220 Mar 26 00:12 VERSION

If the files above are present, the format succeeded.

11. Start the cluster
Start the matching daemons on each machine according to our plan:

hadoop0  namenode datanode resourcemanager nodemanager
hadoop1  secondarynamenode datanode nodemanager
hadoop2  datanode nodemanager


hdfs --daemon start  namenode|datanode|secondarynamenode
hdfs --daemon stop   namenode|datanode|secondarynamenode

yarn --daemon start  resourcemanager|nodemanager
yarn --daemon stop   resourcemanager|nodemanager
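On hadoop0, for example, the plan above translates to:

hdfs --daemon start namenode
hdfs --daemon start datanode
yarn --daemon start resourcemanager
yarn --daemon start nodemanager

Once every node is up, jps should list the planned daemons on each machine, and the web UIs become reachable at the Hadoop 3.x default ports:

jps

# HDFS NameNode UI:        http://hadoop0:9870
# YARN ResourceManager UI: http://hadoop0:8088

Alternatively, start-dfs.sh and start-yarn.sh on hadoop0 start the whole cluster in one go; the HDFS_*_USER and YARN_*_USER exports added to hadoop-env.sh in step 9 are what allow those scripts to run as root.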
