Pseudo-distributed mode: all roles of a cluster run on a single node, each as its own daemon.
Take care to distinguish this from a related concept: standalone (local) mode, in which Hadoop runs as a single Java process with no daemons at all.
Starting with 2.x, Hadoop split into three major application modules: HDFS, YARN, and MapReduce. Their respective roles are:
1. HDFS: a distributed file system, which solves the storage of massive files
2. MapReduce: a general-purpose programming-model API for computing over massive files
3. YARN: a resource scheduling/management system
Note the relationship among the three: they are mutually independent, yet interdependent. You write a distributed computing application against the MapReduce API, it reads the massive files stored on HDFS, and YARN supplies the compute resources. HDFS and YARN can each also run on their own, which shows in two ways:
1. Applications written with the MapReduce API can also run on other resource-scheduling systems
2. Applications written with other programming models, such as Storm, Spark, or Flink, can also run on a YARN cluster
This is why Hadoop is described as a mature distributed solution.
Installing Hadoop therefore really means installing two clusters, HDFS and YARN, each of which follows a one-master/many-workers layout.
HDFS cluster:
one NameNode (master/management node)
multiple DataNodes (worker nodes)
YARN cluster:
one ResourceManager (master/management node)
multiple NodeManagers (worker nodes)
Environment: CentOS 7.6 + Hadoop 2.7.7
Node | HDFS | YARN |
---|---|---|
master | NameNode + DataNode + SecondaryNameNode | ResourceManager + NodeManager |
Installation:
#install lsb_release to confirm the OS version
[root@master ~]# yum -y install redhat-lsb-core-4.1-27.el7.centos.1.x86_64
[root@master ~]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.6.1810 (Core)
Release: 7.6.1810
Codename: Core
#set the hostname to master
[root@master ~]# hostnamectl set-hostname master
[root@master ~]# hostnamectl
Static hostname: master
Icon name: computer-vm
Chassis: vm
Machine ID: a21c09986dee4158905b391d0d5e0d3f
Boot ID: 86a0d4893c5a411a92d11d4e1cfc70bc
Virtualization: vmware
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-957.el7.x86_64
Architecture: x86-64
This deployment runs as a dedicated hadoop user, which needs to be created first.
#create the hadoop group
[root@master ~]# groupadd hadoop
#create a hadoop user in the hadoop group, using /bin/bash as its shell
[root@master ~]# useradd -m hadoop -g hadoop -s /bin/bash
#useradd options
# -m: create the user's home directory automatically
# -g: specify the user's primary (login) group; supplementary groups would use -G
# -s: specify the shell the user gets after logging in
#set a password for the hadoop user
[root@master ~]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
You can grant the hadoop user administrator (sudo) privileges:
[root@master ~]# visudo
#add the hadoop line below the existing root entry
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
hadoop ALL=(ALL) ALL
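To confirm the grant took effect, a quick check (a sketch, assuming the password set above) is to switch to the hadoop user and run a harmless command through sudo:
#switch to the hadoop user and verify sudo works
[root@master ~]# su - hadoop
#should print root after the password prompt
[hadoop@master ~]$ sudo whoami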
Both cluster and pseudo-distributed modes rely on SSH logins. CentOS usually has the SSH client and SSH server installed by default; open a terminal and run the following command to check whether both are present.
[root@master ~]# rpm -qa |grep ssh
openssh-server-7.4p1-16.el7.x86_64
openssh-clients-7.4p1-16.el7.x86_64
libssh2-1.4.3-12.el7.x86_64
openssh-7.4p1-16.el7.x86_64
If either is missing, it can be installed with yum:
[root@master ~]# yum install openssh-clients
[root@master ~]# yum install openssh-server
Test whether ssh works:
#enter the password when prompted and you are logged in to the local machine
[root@master ~]# ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:wnXk424rDwk9tKvMjS2pWcqXnJ6B+cSq3MZVxPI69mE.
ECDSA key fingerprint is MD5:62:51:10:f6:78:10:76:b9:8c:91:fc:9b:b2:e5:8d:18.
Are you sure you want to continue connecting (yes/no)?
Generate an SSH key pair
#if the .ssh directory does not exist yet, run ssh master (or ssh localhost, as above) once first
[hadoop@master ~]$ cd ~/.ssh/
-bash: cd: /home/hadoop/.ssh/: No such file or directory
# pwd shows the current directory, which should be /home/hadoop
[hadoop@master ~]$ pwd
/home/hadoop
#the .ssh directory now exists (created by the ssh login above)
[hadoop@master ~]$ cd .ssh
[hadoop@master .ssh]$ pwd
/home/hadoop/.ssh
#edit /etc/hosts (root privileges required) and verify with cat
[hadoop@master ~]$ cat /etc/hosts
Add the following entry:
192.168.1.100 master
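Before relying on the name, it is worth confirming that it resolves; this assumes 192.168.1.100 really is this node's address (check with ip addr and adjust if yours differs):
#verify the hostname resolves to the expected address
[hadoop@master ~]$ getent hosts master
#and that the node answers
[hadoop@master ~]$ ping -c 1 master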
#generate the key pair
[hadoop@master .ssh]$ ssh-keygen -t rsa #press Enter three times: default path, empty passphrase, confirm
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:JVm7S4NqjrJyeZY9wPAY58WAA94EBqhPTnLMT0isuQA hadoop@master
The key's randomart image is:
+---[RSA 2048]----+
|==.o . |
|E B . o . |
|oB + o o o |
|* X o o + . |
|.O @ . S + |
|. + * . . o |
| . +o . |
|. + ++o |
| o.=. .. |
+----[SHA256]-----+
#append the public key to authorized_keys to allow logins with this key
[hadoop@master .ssh]$ cat id_rsa.pub >> authorized_keys
#sshd ignores the file unless its permissions are restricted
[hadoop@master .ssh]$ chmod 600 authorized_keys
#ssh master now logs in directly, without asking for a password
[hadoop@master .ssh]$ ssh master
Last login: Wed Apr 14 08:34:34 2021 from localhost
[hadoop@master ~]$ exit
logout
Connection to master closed.
[hadoop@master .ssh]$
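As an aside, the append-and-chmod steps above can be collapsed into one command with ssh-copy-id, which ships with openssh-clients; a sketch, assuming the key pair already exists:
#copies id_rsa.pub into authorized_keys and sets the permissions in one step
[hadoop@master .ssh]$ ssh-copy-id hadoop@master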
#stop the firewall
[root@master local]# systemctl stop firewalld.service
#check the firewall status
[root@master local]# systemctl status firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:firewalld(1)
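Note that systemctl stop only stops firewalld until the next boot (the Loaded: ... disabled line above shows it is already disabled on this machine). To keep it off across reboots, disable the unit as well:
#prevent firewalld from starting at boot
[root@master local]# systemctl disable firewalld.service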
#check the SELinux status
[root@master local]# getenforce
Disabled
[root@master local]# /usr/sbin/sestatus -v
SELinux status: disabled
#disable temporarily (effective until reboot)
[root@master local]# setenforce 0
#disable permanently
[root@master local]# vim /etc/selinux/config
#change SELINUX=enforcing to SELINUX=disabled
#a reboot is required for the change to take effect
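If you prefer not to open an editor, the same permanent change can be made with sed; a sketch, assuming the file still contains the stock SELINUX=enforcing line:
#replace SELINUX=enforcing with SELINUX=disabled in place
[root@master local]# sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
#confirm the change
[root@master local]# grep '^SELINUX=' /etc/selinux/config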
(Downloading the JDK and unpacking it to /usr/local/jdk1.8.0_211 are omitted here.)
#configure the Java environment variables system-wide
[root@master local]# vim /etc/profile
export JAVA_HOME=/usr/local/jdk1.8.0_211
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
[root@master local]# source /etc/profile
[root@master local]# java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
(Downloading the Hadoop 2.7.7 tarball to /home/hadoop/apps is omitted here.)
[hadoop@master apps]$ tar -zxvf hadoop-2.7.7-centos7.tar.gz -C /home/hadoop/apps
#point hadoop-env.sh at the JDK installation directory
[hadoop@master hadoop-2.7.7]$ cd /home/hadoop/apps/hadoop-2.7.7/etc/hadoop/
[hadoop@master hadoop]$ vim hadoop-env.sh
#change export JAVA_HOME=${JAVA_HOME} to:
export JAVA_HOME=/usr/local/jdk1.8.0_211
[hadoop@master hadoop]$ vim core-site.xml
#add the following configuration:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/hadoopdata</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
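Here fs.defaultFS is the NameNode's RPC address that every client will contact, and hadoop.tmp.dir is the base directory from which other storage defaults are derived. Once the hadoop commands are on the PATH (a later step), the effective value can be read back with getconf:
#print the effective default filesystem
[hadoop@master ~]$ hdfs getconf -confKey fs.defaultFS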
The slaves file lists the worker nodes of the Hadoop cluster. Since this is a pseudo-distributed setup, there is only one entry:
[hadoop@master hadoop]$ vim slaves
#add the following configuration:
master
[hadoop@master hadoop]$ vim hdfs-site.xml
#add the following configuration:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/data/hadoopdata/name</value>
<description>To keep the NameNode metadata safe, several directories on different disks are normally configured in production</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data/hadoopdata/data</value>
<description>data storage directory for the DataNode</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>number of replica copies for each HDFS block; the default is 3</description>
</property>
</configuration>
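A caveat on dfs.replication: with a single DataNode only one copy of each block can actually be placed, so a value of 2 leaves every block under-replicated; 1 is the usual choice for a pure single-node setup. The effective value can be checked the same way as above:
#print the effective replication factor
[hadoop@master ~]$ hdfs getconf -confKey dfs.replication
#once the cluster is running, fsck reports per-block replication
[hadoop@master ~]$ hdfs fsck / -files -blocks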
[hadoop@master hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@master hadoop]$ vim mapred-site.xml
#add the following configuration:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
[hadoop@master hadoop]$ vim yarn-site.xml
#add the following configuration:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>the shuffle service that the YARN cluster provides to MapReduce programs</description>
</property>
</configuration>
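The mapreduce_shuffle auxiliary service is what serves map outputs to reducers from inside each NodeManager; without it, MapReduce jobs on YARN fail at the shuffle phase. After start-yarn.sh (a later step), the registered NodeManagers can be listed to confirm the setting took effect:
#list the NodeManagers registered with the ResourceManager (run after start-yarn.sh)
[hadoop@master ~]$ yarn node -list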
Configure the Hadoop environment variables:
[hadoop@master ~]$ vim .bash_profile
#add the following configuration:
export HADOOP_HOME=/home/hadoop/apps/hadoop-2.7.7
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH
#make the configuration take effect immediately
[hadoop@master ~]$ source .bash_profile
[hadoop@master ~]$ echo $HADOOP_HOME
/home/hadoop/apps/hadoop-2.7.7
#create the NameNode and DataNode data directories configured above
[hadoop@master ~]$ mkdir -p /home/hadoop/data/hadoopdata/name
[hadoop@master ~]$ mkdir -p /home/hadoop/data/hadoopdata/data
Run the format command on the node that hosts the NameNode (in 2.x, hdfs namenode -format is the preferred equivalent). This only needs to be done once; reformatting generates a new clusterID that no longer matches existing DataNodes.
[hadoop@master ~]$ hadoop namenode -format
21/04/15 08:31:13 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
21/04/15 08:31:13 INFO namenode.NameNode: createNameNode [-format]
21/04/15 08:31:15 WARN common.Util: Path /home/hadoop/data/hadoopdata/name should be specified as a URI in configuration files. Please update hdfs configuration.
21/04/15 08:31:15 WARN common.Util: Path /home/hadoop/data/hadoopdata/name should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-dbd003a2-5380-45b0-a965-5e80285f26fc
21/04/15 08:31:15 INFO namenode.FSNamesystem: No KeyProvider found.
21/04/15 08:31:15 INFO namenode.FSNamesystem: fsLock is fair: true
21/04/15 08:31:15 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
21/04/15 08:31:15 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
21/04/15 08:31:15 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
21/04/15 08:31:15 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
21/04/15 08:31:15 INFO blockmanagement.BlockManager: The block deletion will start around 2021 Apr 15 08:31:15
21/04/15 08:31:15 INFO util.GSet: Computing capacity for map BlocksMap
21/04/15 08:31:15 INFO util.GSet: VM type = 64-bit
21/04/15 08:31:15 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
21/04/15 08:31:15 INFO util.GSet: capacity = 2^21 = 2097152 entries
21/04/15 08:31:15 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
21/04/15 08:31:15 INFO blockmanagement.BlockManager: defaultReplication = 2
21/04/15 08:31:15 INFO blockmanagement.BlockManager: maxReplication = 512
21/04/15 08:31:15 INFO blockmanagement.BlockManager: minReplication = 1
21/04/15 08:31:15 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
21/04/15 08:31:15 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
21/04/15 08:31:15 INFO blockmanagement.BlockManager: encryptDataTransfer = false
21/04/15 08:31:15 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
21/04/15 08:31:15 INFO namenode.FSNamesystem: fsOwner = hadoop (auth:SIMPLE)
21/04/15 08:31:15 INFO namenode.FSNamesystem: supergroup = supergroup
21/04/15 08:31:15 INFO namenode.FSNamesystem: isPermissionEnabled = true
21/04/15 08:31:15 INFO namenode.FSNamesystem: HA Enabled: false
21/04/15 08:31:15 INFO namenode.FSNamesystem: Append Enabled: true
21/04/15 08:31:16 INFO util.GSet: Computing capacity for map INodeMap
21/04/15 08:31:16 INFO util.GSet: VM type = 64-bit
21/04/15 08:31:16 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
21/04/15 08:31:16 INFO util.GSet: capacity = 2^20 = 1048576 entries
21/04/15 08:31:16 INFO namenode.FSDirectory: ACLs enabled? false
21/04/15 08:31:16 INFO namenode.FSDirectory: XAttrs enabled? true
21/04/15 08:31:16 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
21/04/15 08:31:16 INFO namenode.NameNode: Caching file names occuring more than 10 times
21/04/15 08:31:16 INFO util.GSet: Computing capacity for map cachedBlocks
21/04/15 08:31:16 INFO util.GSet: VM type = 64-bit
21/04/15 08:31:16 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
21/04/15 08:31:16 INFO util.GSet: capacity = 2^18 = 262144 entries
21/04/15 08:31:16 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
21/04/15 08:31:16 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
21/04/15 08:31:16 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
21/04/15 08:31:16 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
21/04/15 08:31:16 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
21/04/15 08:31:16 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
21/04/15 08:31:16 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
21/04/15 08:31:16 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
21/04/15 08:31:16 INFO util.GSet: Computing capacity for map NameNodeRetryCache
21/04/15 08:31:16 INFO util.GSet: VM type = 64-bit
21/04/15 08:31:16 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
21/04/15 08:31:16 INFO util.GSet: capacity = 2^15 = 32768 entries
21/04/15 08:31:16 INFO namenode.FSImage: Allocated new BlockPoolId: BP-211112453-127.0.0.1-1618446676539
21/04/15 08:31:16 INFO common.Storage: Storage directory /home/hadoop/data/hadoopdata/name has been successfully formatted.
21/04/15 08:31:16 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/data/hadoopdata/name/current/fsimage.ckpt_0000000000000000000 using no compression
21/04/15 08:31:17 INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/data/hadoopdata/name/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds.
21/04/15 08:31:17 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
21/04/15 08:31:17 INFO util.ExitUtil: Exiting with status 0
21/04/15 08:31:17 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/
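A successful format can also be confirmed on disk: the current/ subdirectory of the name directory should now contain an initial fsimage, a VERSION file holding the new clusterID, and seen_txid:
#inspect the freshly formatted metadata directory
[hadoop@master ~]$ ls /home/hadoop/data/hadoopdata/name/current
fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION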
[hadoop@master ~]$ start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/apps/hadoop-2.7.7/logs/hadoop-hadoop-namenode-master.out
master: starting datanode, logging to /home/hadoop/apps/hadoop-2.7.7/logs/hadoop-hadoop-datanode-master.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:wnXk424rDwk9tKvMjS2pWcqXnJ6B+cSq3MZVxPI69mE.
ECDSA key fingerprint is MD5:62:51:10:f6:78:10:76:b9:8c:91:fc:9b:b2:e5:8d:18.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/apps/hadoop-2.7.7/logs/hadoop-hadoop-secondarynamenode-master.out
[hadoop@master ~]$ jps
107667 Jps
107239 NameNode
107544 SecondaryNameNode
107340 DataNode
[hadoop@master ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/apps/hadoop-2.7.7/logs/yarn-hadoop-resourcemanager-master.out
master: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.7.7/logs/yarn-hadoop-nodemanager-master.out
[hadoop@master ~]$ jps
108179 Jps
107842 NodeManager
107239 NameNode
107544 SecondaryNameNode
107340 DataNode
107742 ResourceManager
[hadoop@master ~]$ hadoop version
Hadoop 2.7.7
Subversion Unknown -r Unknown
Compiled by root on 2020-04-02T23:53Z
Compiled with protoc 2.5.0
From source with checksum 792e15d20b12c74bd6f19a1fb886490
This command was run using /home/hadoop/apps/hadoop-2.7.7/share/hadoop/common/hadoop-common-2.7.7.jar
There are two ways to verify that the cluster is up:
1. Use the jps tool to check that each daemon process started successfully
[hadoop@master ~]$ jps
107842 NodeManager
107239 NameNode
108455 Jps
107544 SecondaryNameNode
107340 DataNode
107742 ResourceManager
2. Use the web UIs:
HDFS:http://192.168.1.100:50070
YARN: http://192.168.1.100:8088
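As a final smoke test, here is a sketch that exercises both clusters end to end, assuming the example jar bundled with this tarball is intact:
#create a home directory in HDFS and upload a file
[hadoop@master ~]$ hdfs dfs -mkdir -p /user/hadoop
[hadoop@master ~]$ hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /user/hadoop
[hadoop@master ~]$ hdfs dfs -ls /user/hadoop
#run the bundled pi estimator on YARN (2 map tasks, 10 samples each)
[hadoop@master ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 2 10
#shut the cluster down in reverse order when finished
[hadoop@master ~]$ stop-yarn.sh
[hadoop@master ~]$ stop-dfs.sh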