In the previous posts we prepared the Hadoop machines and built the ZooKeeper cluster; this post sets up the HDFS environment.
Hadoop environment preparation
Create the hadoop user on all three machines:
useradd hadoop -d /home/hadoop
echo "1q1w1e1r" | passwd --stdin hadoop
Switch to the hadoop user: su - hadoop
Generate an SSH key; run the following command and just press Enter at every prompt:
ssh-keygen -t rsa -C '[email protected]'
Set up passwordless login; each node must also be able to SSH to itself:
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop01
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop02
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop03
Verify: if you can ssh to the other hosts without a password, it works.
[hadoop@hadoop01 ~]$ ssh hadoop02
Last login: Thu Nov 16 10:10:29 2023
[hadoop@hadoop02 ~]$
We use Hadoop 3.2.4. Copy the hadoop-3.2.4.tar.gz package to all three machines:
scp hadoop-3.2.4.tar.gz hadoop@hadoop01:/home/hadoop
scp hadoop-3.2.4.tar.gz hadoop@hadoop02:/home/hadoop
scp hadoop-3.2.4.tar.gz hadoop@hadoop03:/home/hadoop
Extract it:
tar -zxvf hadoop-3.2.4.tar.gz
Configure the environment variables in ~/.bash_profile:
vi ~/.bash_profile
export HADOOP_HOME=/home/hadoop/hadoop-3.2.4
PATH=$PATH:$HADOOP_HOME/bin
export PATH
Run source ~/.bash_profile to make the environment variables take effect.
Check the Hadoop version:
[hadoop@hadoop01 ~]$ hadoop version
Hadoop 3.2.4
Source code repository Unknown -r 7e5d9983b388e372fe640f21f048f2f2ae6e9eba
Compiled by ubuntu on 2022-07-12T11:58Z
Compiled with protoc 2.5.0
From source with checksum ee031c16fe785bbb35252c749418712
This command was run using /home/hadoop/hadoop-3.2.4/share/hadoop/common/hadoop-common-3.2.4.jar
Output like the above means the environment variables are configured correctly. Repeat the same steps on the other two machines; a small convenience loop is sketched below.
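Purely as a convenience (a minimal sketch, assuming the tarball was already copied to /home/hadoop on each host as above), the remaining per-host steps can be driven from hadoop01:
# Hypothetical loop: extract the tarball and copy the profile to the other two nodes
for h in hadoop02 hadoop03; do
  ssh hadoop@$h 'tar -zxf ~/hadoop-3.2.4.tar.gz -C ~'
  scp ~/.bash_profile hadoop@$h:~/.bash_profile
done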
We are building an HDFS HA cluster, so we need to plan where the NameNodes and DataNodes go. With only three machines, all three obviously run a DataNode; the NameNode gets one active and one standby instance, each paired with a zkfc (ZKFailoverController), and a journalnode runs on every machine.
The cluster layout:
hadoop01 | hadoop02 | hadoop03 |
---|---|---|
NameNode | | NameNode |
DataNode | DataNode | DataNode |
journalnode | journalnode | journalnode |
zkfc | | zkfc |
Since we will be editing configuration files later and every host must have identical configuration, it helps to write a sync script. The script below pushes whatever we change on hadoop01 to hadoop02 and hadoop03.
syncFile.sh
#!/bin/bash
# 1. Check the number of arguments
if [ $# -lt 1 ]
then
  echo "Not enough arguments!"
  exit
fi
# 2. Loop over all cluster hosts
for host in hadoop02 hadoop03
do
  echo ==================== $host ====================
  # 3. Loop over every file/directory given and send it
  for file in $@
  do
    # 4. Check that the file exists
    if [ -e $file ]
    then
      # 5. Get the parent directory
      pdir=$(cd -P $(dirname $file); pwd)
      # 6. Get the file name
      fname=$(basename $file)
      ssh $host "mkdir -p $pdir"
      rsync -av $pdir/$fname $host:$pdir
    else
      echo "$file does not exist!"
    fi
  done
done
Example run:
[hadoop@hadoop01 hadoop-3.2.4]$ /home/hadoop/hadoop-3.2.4/syncFile.sh /home/hadoop/hadoop-3.2.4/etc/
==================== hadoop02 ====================
sending incremental file list
sent 983 bytes received 19 bytes 668.00 bytes/sec
total size is 109,238 speedup is 109.02
==================== hadoop03 ====================
sending incremental file list
sent 983 bytes received 19 bytes 2,004.00 bytes/sec
total size is 109,238 speedup is 109.02
core-site.xml defines Hadoop's global configuration:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://shura/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hadoop-3.2.4/tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>
    <property>
        <name>hadoop.proxyuser.bigdata.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.bigdata.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
    </property>
</configuration>
hdfs-site.xml defines the HDFS-specific configuration: the HA nameservice (shura), the NameNode/DataNode directories, the JournalNode quorum, automatic failover and fencing:
<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>shura</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/hadoop-3.2.4/dfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/hadoop-3.2.4/dfs/datanode/1,/home/hadoop/hadoop-3.2.4/dfs/datanode/2</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.shura</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.shura.nn1</name>
        <value>hadoop01:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.shura.nn1</name>
        <value>hadoop01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.shura.nn2</name>
        <value>hadoop03:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.shura.nn2</name>
        <value>hadoop03:50070</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/hadoop-3.2.4/journaldata</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/shura</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.shura</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
etc/hadoop/workers lists the hosts that run a DataNode:
hadoop01
hadoop02
hadoop03
The configuration files above are edited on hadoop01 and then synchronized with syncFile.sh:
[hadoop@hadoop01 hadoop]$ /home/hadoop/hadoop-3.2.4/syncFile.sh /home/hadoop/hadoop-3.2.4/etc/
==================== hadoop02 ====================
sending incremental file list
etc/hadoop/
etc/hadoop/core-site.xml
etc/hadoop/hdfs-site.xml
etc/hadoop/workers
sent 5,601 bytes received 113 bytes 11,428.00 bytes/sec
total size is 112,138 speedup is 19.63
==================== hadoop03 ====================
sending incremental file list
etc/hadoop/
etc/hadoop/core-site.xml
etc/hadoop/hdfs-site.xml
etc/hadoop/workers
sent 5,601 bytes received 113 bytes 11,428.00 bytes/sec
total size is 112,138 speedup is 19.63
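After the sync you can confirm that each node sees the same effective configuration (a minimal check; run it on any of the three hosts):
# Print the configured default file system and the NameNode hosts resolved from the config
hdfs getconf -confKey fs.defaultFS
hdfs getconf -namenodes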
JournalNode was covered in the previous post (HDFS overview and high-availability principles). Start a journalnode on each of the three machines:
hdfs --daemon start journalnode
WARNING: /home/hadoop/hadoop-3.2.4/logs does not exist. Creating.
## check the Java processes
[hadoop@hadoop01 ~]$ jps
30272 JournalNode
31554 Jps
Format the NameNode. Run the following command on hadoop01:
hdfs namenode -format
This command creates a brand-new HDFS file system and initializes its directory structure and metadata. It clears the NameNode's data directory, including the fsimage and edits files, and creates fresh ones, so it is a destructive operation that is normally only run when the environment is first set up.
fsimage: a snapshot of the file system's metadata
edits: the history of metadata changes
Directory layout after it finishes:
[hadoop@hadoop01 hadoop-3.2.4]$ ls -lrt dfs/ journaldata/
journaldata/:
total 4
drwxrwxr-x 4 hadoop hadoop 4096 Nov 16 16:01 shura
dfs/:
total 8
drwxrwxr-x 3 hadoop hadoop 4096 Nov 16 16:00 namenode
drwxrwxr-x 4 hadoop hadoop 4096 Nov 16 16:02 datanode
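As a quick sanity check (just a sketch, using the dfs.namenode.name.dir path configured above), you can also look at the VERSION file that the format wrote; it records identifiers such as namespaceID, clusterID and blockpoolID:
# Print the metadata VERSION file created by `hdfs namenode -format`
cat /home/hadoop/hadoop-3.2.4/dfs/namenode/current/VERSION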
We have two NameNodes, one active and one standby; the standby keeps itself in sync with the active. When first building the environment we trigger this sync manually. The active NameNode must be running first, so start it on hadoop01:
hdfs --daemon start namenode
[hadoop@hadoop01 ~]$ jps
30272 JournalNode
32617 NameNode
32686 Jps
On hadoop03, run the sync command:
hdfs namenode -bootstrapStandby
Then initialize the HA state in ZooKeeper; this creates the znode that the ZKFailoverControllers use for automatic failover (run it once, on one of the NameNode hosts):
hdfs zkfc -formatZK
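To confirm it worked (a small sketch, assuming zkCli.sh from the ZooKeeper installation is on the PATH), look for the znode that formatZK creates for our nameservice shura:
# The HA state lives under /hadoop-ha/<nameservice> in ZooKeeper
zkCli.sh -server hadoop01:2181 ls /hadoop-ha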
Start HDFS; you only need to run start-dfs.sh on one machine:
/home/hadoop/hadoop-3.2.4/sbin/start-dfs.sh
## hadoop01
[hadoop@hadoop01 sbin]$ jps
30272 JournalNode
1153 DataNode
1683 Jps
32617 NameNode
1545 DFSZKFailoverController
## hadoop02
[hadoop@hadoop02 ~]$ jps
22608 DataNode
20283 JournalNode
1581 Jps
## hadoop03
[hadoop@hadoop03 ~]$ jps
19265 JournalNode
1227 Jps
21725 DataNode
21598 NameNode
21934 DFSZKFailoverController
Check the status:
[hadoop@hadoop01 ~]$ hdfs haadmin -getAllServiceState
hadoop01:9000 active
hadoop03:9000 standby
Open the NameNode web UI; we configured the HTTP addresses on port 50070, so visit http://hadoop01:50070 and http://hadoop03:50070. One page reports active and the other standby, which matches our deployment plan.
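If you prefer the command line (a minimal sketch; the /jmx servlet is served on the same HTTP port), the NameNodeStatus bean reports each node's HA state:
# "State" should come back as active for hadoop01 and standby for hadoop03
curl -s 'http://hadoop01:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'
curl -s 'http://hadoop03:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'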
That wraps up the HDFS deployment; in the next post we'll start using HDFS.
Follow the series and keep learning!