Hadoop 2.6.0 Single-Node and Cluster Setup on CentOS 7

Preparation Before Setting Up Hadoop

The /etc/hosts file records the hostname-to-IP mappings of the hosts connected to the LAN.
Within the LAN, hosts reach one another over private IP addresses (e.g. 192.168.1.22, 192.168.1.23) to establish connections.
Besides connecting by IP, a host can also be reached by its hostname: every machine is given a name when it is installed, and that name is its hostname.
	Suppose HostA's hostname is centos1 and HostB's is centos2. How can we connect not only by IP, but also by the hostname that corresponds to that IP? The answer is /etc/hosts: write each LAN host's IP address and hostname into this file, one pair per line, and the problem is solved.

Configure the hostname/IP mapping; the last line below is the one being added.

[root@CentOS ~]# vi /etc/hosts 
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 
# IP is this machine's address; the entry effectively gives the machine an alias
IP CentOS
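The lookup that /etc/hosts provides can be sketched in a few lines. This is only an illustration of hostname-to-IP matching, using the example addresses from above; real resolution is done by the system resolver, not this code.

```python
# Minimal sketch of the hostname-to-IP lookup that /etc/hosts provides.
# The sample entries mirror the LAN example above.
HOSTS_TEXT = """\
127.0.0.1 localhost
192.168.1.22 centos1
192.168.1.23 centos2
"""

def resolve(hostname, hosts_text=HOSTS_TEXT):
    """Return the first IP whose name list contains hostname, else None."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        if hostname in names:
            return ip
    return None

print(resolve("centos1"))  # → 192.168.1.22
```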

In a distributed system many services identify nodes by hostname, hence the need to configure the IP/hostname mapping. You can also inspect the following file:

The /etc/sysconfig/network file defines the hostname and whether networking is enabled; it holds system-wide network settings that are not tied to any particular network device.

Format: SETTING=value
Available settings:
	NETWORKING    whether networking is enabled
	GATEWAY       the default gateway
	IPGATEWAYDEV  the interface of the default gateway
	HOSTNAME      the hostname
	DOMAIN        the domain name
[root@CentOS ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=CentOS

Stop and Disable the Firewall

Distributed services call one another across hosts, so to keep that communication unobstructed the firewall is usually turned off.

[root@CentOS ~]# systemctl stop firewalld.service
[root@CentOS ~]# systemctl status firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Wed 2020-03-18 22:26:51 EDT; 2min 52s ago
     Docs: man:firewalld(1)
  Process: 11429 ExecStart=/usr/sbin/firewalld --nofork --nopid $FIREWALLD_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 11429 (code=exited, status=0/SUCCESS)

Mar 18 22:21:16 CentOSC systemd[1]: Starting firewalld - dynamic firewal....
Mar 18 22:21:41 CentOSC systemd[1]: Started firewalld - dynamic firewall....
Mar 18 22:26:50 CentOSC systemd[1]: Stopping firewalld - dynamic firewal....
Mar 18 22:26:51 CentOSC systemd[1]: Stopped firewalld - dynamic firewall....
Hint: Some lines were ellipsized, use -l to show in full.
[root@CentOS ~]# systemctl disable firewalld.service
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

Configure Passwordless SSH Authentication (Keys)

[root@CentOS ~]# ssh-keygen -t rsa
[root@CentOS ~]# ssh-copy-id IP # copy the public key to the target machine (IP is a placeholder)

Hadoop Single-Node Setup

[root@CentOS ~]# tar -zxf hadoop-2.6.0_x64.tar.gz -C /usr/
[root@CentOS ~]# vi /root/.bashrc # add the following lines
HADOOP_HOME=/usr/hadoop-2.6.0
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_HOME
[root@CentOS ~]# source /root/.bashrc
[root@CentOS ~]# echo $HADOOP_HOME 
/usr/hadoop-2.6.0

Configuring Hadoop

Edit /usr/hadoop-2.6.0/etc/hadoop/core-site.xml and add the following:


<!-- NameNode access address -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://IP:9000</value>
</property>
<!-- Working directory for Hadoop runtime data -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop-2.6.0/hadoop-${user.name}</value>
</property>
<!-- Keep deleted files in the trash for 1 minute -->
<property>
    <name>fs.trash.interval</name>
    <value>1</value>
</property>
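As a quick sanity check that the fragment is well-formed once the closing tags are in place, it can be parsed with Python's standard library. The property values below are copied from the configuration above; `IP` is still a placeholder.

```python
# Sketch: parse the core-site.xml fragment and read back a property.
import xml.etree.ElementTree as ET

FRAGMENT = """\
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://IP:9000</value></property>
  <property><name>hadoop.tmp.dir</name><value>/usr/hadoop-2.6.0/hadoop-${user.name}</value></property>
  <property><name>fs.trash.interval</name><value>1</value></property>
</configuration>
"""

root = ET.fromstring(FRAGMENT)  # raises ParseError if the XML is malformed
props = {p.findtext("name"): p.findtext("value") for p in root.findall("property")}
print(props["fs.defaultFS"])  # → hdfs://IP:9000
```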

Add the following to /usr/hadoop-2.6.0/etc/hadoop/hdfs-site.xml:

<!-- Number of block replicas; 1 is enough on a single node -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

Change /usr/hadoop-2.6.0/etc/hadoop/slaves to:

the current machine's IP

Start HDFS

# Format HDFS
[root@CentOS ~]# hdfs namenode -format
# the following log line indicates success
"""
19/01/02 20:19:37 INFO common.Storage: Storage directory /usr/hadoop-2.6.0/hadoop-root/dfs/name has been successfully formatted. 
"""
[root@CentOS hadoop]# tree /usr/hadoop-2.6.0/hadoop-root/
/usr/hadoop-2.6.0/hadoop-root/
└── dfs
    └── name
        └── current
            ├── fsimage_0000000000000000000
            ├── fsimage_0000000000000000000.md5
            ├── seen_txid
            └── VERSION

3 directories, 4 files
# Start HDFS
[root@CentOS hadoop]# start-dfs.sh
# output like the following indicates success
"""
Starting namenodes on [CentOS]
CentOS: starting namenode, logging to /usr/hadoop-2.6.0/logs/hadoop-root-namenode-CentOS.out
CentOS: starting datanode, logging to /usr/hadoop-2.6.0/logs/hadoop-root-datanode-CentOS.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:ptOfP+xxYMRrBJLeNsNwUZIJ94bTeGiRqTbLjCfwMyo.
ECDSA key fingerprint is MD5:28:6e:4e:68:dd:c1:95:38:bc:76:cf:30:ef:30:0f:d2.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/hadoop-2.6.0/logs/hadoop-root-secondarynamenode-CentOS.out

"""
# Stop HDFS
[root@CentOS ~]# stop-dfs.sh

Open http://IP:50070 in a browser.

# put a file to test
[root@CentOS ~]# hdfs dfs -put /root/jdk-8u171-linux-x64.rpm /
[root@CentOS ~]# hdfs dfs -ls /
Found 1 items
-rw-r--r--   1 root supergroup  175262413 2020-03-19 03:54 /jdk-8u171-linux-x64.rpm
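The `-ls` listing is whitespace-separated, so its columns (permissions, replica count, owner, group, size in bytes, date, time, path) can be pulled apart directly. A small sketch using the line above:

```python
# Sketch: split one `hdfs dfs -ls` output line into its fields
# (the sample line is copied from the listing above).
line = "-rw-r--r--   1 root supergroup  175262413 2020-03-19 03:54 /jdk-8u171-linux-x64.rpm"
perms, replicas, owner, group, size, date, time_, path = line.split()
size_mb = int(size) / 1024 / 1024  # bytes → MiB
print(path, "replicas:", replicas, "size:", round(size_mb, 1), "MiB")
```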

Setting Up the YARN Environment

Edit /usr/hadoop-2.6.0/etc/hadoop/yarn-site.xml and add the following:

<!-- Shuffle service required by MapReduce -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager host -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>IP</value>
</property>

Add the following to /usr/hadoop-2.6.0/etc/hadoop/mapred-site.xml:


<!-- Run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

Start YARN

[root@CentOS ~]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/hadoop-2.6.0/logs/yarn-root-resourcemanager-CentOS.out
CentOS: starting nodemanager, logging to /usr/hadoop-2.6.0/logs/yarn-root-nodemanager-CentOS.out
[root@CentOS ~]# jps
32160 NodeManager
27906 NameNode
32072 ResourceManager
27995 DataNode
28188 SecondaryNameNode
32477 Jps

Open: http://IP:8088/

HDFS & YARN High-Availability Cluster

CentOSA        CentOSB        CentOSC
zookeeper      zookeeper      zookeeper
zkfc           zkfc           -
nn1            nn2            -
journalnode    journalnode    journalnode
datanode       datanode       datanode
-              rm1            rm2
nodemanager    nodemanager    nodemanager
# synchronize the clocks on all three machines (set manually here)
[root@CentOSX ~]# date -s '2020-03-19 16:38:15'
Thu Mar 19 16:38:15 EDT 2020
[root@CentOSX ~]# clock -w

Note: throughout, a CentOSA prompt means the command runs on host A only, CentOSB on B, CentOSC on C, and CentOSX on all three machines.

Install ZooKeeper

[root@CentOSX ~]#  tar -zxf zookeeper-3.4.6.tar.gz -C /usr/
[root@CentOSX ~]# vi /usr/zookeeper-3.4.6/conf/zoo.cfg
tickTime=2000 
dataDir=/root/zkdata 
clientPort=2181
initLimit=5 
syncLimit=2 
server.1=CentOSA:2887:3887 
server.2=CentOSB:2887:3887 
server.3=CentOSC:2887:3887
[root@CentOSX ~]# mkdir /root/zkdata
[root@CentOSA ~]# echo 1 >> zkdata/myid
[root@CentOSB ~]# echo 2 >> zkdata/myid
[root@CentOSC ~]# echo 3 >> zkdata/myid
[root@CentOSX ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh start zoo.cfg
[root@CentOSX ~]# /usr/zookeeper-3.4.6/bin/zkServer.sh status zoo.cfg
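The relationship between the `server.N` lines and each machine's `myid` file can be shown with a short sketch (entries copied from the zoo.cfg above): `server.1` is the host whose `myid` file contains `1`, and so on.

```python
# Sketch: derive each server's myid from the server.N lines of zoo.cfg.
ZOO_CFG = """\
tickTime=2000
dataDir=/root/zkdata
clientPort=2181
server.1=CentOSA:2887:3887
server.2=CentOSB:2887:3887
server.3=CentOSC:2887:3887
"""

myids = {}
for line in ZOO_CFG.splitlines():
    if line.startswith("server."):
        key, rest = line.split("=", 1)          # "server.1", "CentOSA:2887:3887"
        myids[rest.split(":")[0]] = int(key.split(".")[1])

print(myids)  # → {'CentOSA': 1, 'CentOSB': 2, 'CentOSC': 3}
```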

Configuring Hadoop

[root@CentOSC ~]# tar -zxf hadoop-2.6.0_x64.tar.gz -C /usr/
[root@CentOSC ~]# vi /root/.bashrc
HADOOP_HOME=/usr/hadoop-2.6.0
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_HOME
[root@CentOSC ~]# source /root/.bashrc
[root@CentOSC ~]# echo $HADOOP_HOME
/usr/hadoop-2.6.0

Edit /usr/hadoop-2.6.0/etc/hadoop/core-site.xml and add the following:


<!-- Logical name of the HA nameservice -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
</property>
<!-- Working directory for Hadoop runtime data -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop-2.6.0/hadoop-${user.name}</value>
</property>
<!-- Keep deleted files in the trash for 1 minute -->
<property>
    <name>fs.trash.interval</name>
    <value>1</value>
</property>
<!-- Script used for rack awareness -->
<property>
    <name>net.topology.script.file.name</name>
    <value>/usr/hadoop-2.6.0/etc/hadoop/rack.sh</value>
</property>

Create the /usr/hadoop-2.6.0/etc/hadoop/rack.sh script:

#!/usr/bin/env bash
# Map each argument (an IP or hostname) to a rack using topology.data
while [ $# -gt 0 ] ; do
    nodeArg=$1
    exec < /usr/hadoop-2.6.0/etc/hadoop/topology.data
    result=""
    while read line ; do
        ar=( $line )
        if [ "${ar[0]}" = "$nodeArg" ] ; then
            result="${ar[1]}"
        fi
    done
    shift
    if [ -z "$result" ] ; then
        echo -n "/default-rack"
    else
        echo -n "$result "
    fi
done
# make the script executable
[root@CentOSA ~]# chmod u+x /usr/hadoop-2.6.0/etc/hadoop/rack.sh
[root@CentOSA ~]# ll /usr/hadoop-2.6.0/etc/hadoop/rack.sh
-rwxr--r--. 1 root root 358 Mar 19 16:59 /usr/hadoop-2.6.0/etc/hadoop/rack.sh

Create the rack mapping file /usr/hadoop-2.6.0/etc/hadoop/topology.data:

192.168.49.146 /rack1
192.168.49.147 /rack1 
192.168.49.148 /rack2
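The lookup rack.sh performs amounts to a dictionary match against topology.data with a /default-rack fallback. A sketch of the same logic, with the mapping copied from the file above:

```python
# Sketch of the rack lookup that rack.sh performs against topology.data.
TOPOLOGY = {
    "192.168.49.146": "/rack1",
    "192.168.49.147": "/rack1",
    "192.168.49.148": "/rack2",
}

def rack_of(node):
    """Return the rack for a node, or /default-rack if it is unknown."""
    return TOPOLOGY.get(node, "/default-rack")

print(rack_of("192.168.49.148"))  # → /rack2
print(rack_of("10.0.0.1"))        # → /default-rack
```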

Add the following to /usr/hadoop-2.6.0/etc/hadoop/hdfs-site.xml:

<!-- Number of block replicas -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- Enable automatic failover -->
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
<!-- ZooKeeper quorum used by the failover controllers -->
<property>
    <name>ha.zookeeper.quorum</name>
    <value>CentOSA:2181,CentOSB:2181,CentOSC:2181</value>
</property>
<!-- The logical nameservice and its two NameNodes -->
<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
</property>
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>CentOSA:9000</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>CentOSB:9000</value>
</property>
<!-- JournalNodes that store the shared edit log -->
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://CentOSA:8485;CentOSB:8485;CentOSC:8485/mycluster</value>
</property>
<!-- Proxy class clients use to find the Active NameNode -->
<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fence the old Active NameNode over SSH during failover -->
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>
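To make the failover configuration concrete: a client asking for hdfs://mycluster resolves the nameservice to the nn1/nn2 RPC addresses and tries them until it finds the Active NameNode. A tiny sketch of that address list, with values copied from the config above:

```python
# Sketch: the candidate NameNode addresses behind the "mycluster" nameservice.
nameservice = "mycluster"
namenodes = {"nn1": "CentOSA:9000", "nn2": "CentOSB:9000"}

# ConfiguredFailoverProxyProvider tries these until one answers as Active.
candidates = [f"hdfs://{addr}" for addr in namenodes.values()]
print(nameservice, candidates)
```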

Change /usr/hadoop-2.6.0/etc/hadoop/slaves to:

CentOSA
CentOSB
CentOSC

Start HDFS

[root@CentOSX ~]# hadoop-daemon.sh start journalnode # wait about 10 seconds before the next step
[root@CentOSA ~]# hdfs namenode -format
[root@CentOSA ~]# hadoop-daemon.sh start namenode
[root@CentOSB ~]# hdfs namenode -bootstrapStandby # copy the active NameNode's metadata
[root@CentOSB ~]# hadoop-daemon.sh start namenode
[root@CentOSA|B ~]# hdfs zkfc -formatZK # register NameNode info in ZooKeeper; run on either CentOSA or CentOSB
[root@CentOSA ~]# hadoop-daemon.sh start zkfc # ZKFailoverController (the "sentinel")
[root@CentOSB ~]# hadoop-daemon.sh start zkfc # ZKFailoverController (the "sentinel")
[root@CentOSX ~]# hadoop-daemon.sh start datanode

Setting Up the YARN Environment

Edit /usr/hadoop-2.6.0/etc/hadoop/yarn-site.xml and add the following:

<!-- Shuffle service required by MapReduce -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- Enable ResourceManager HA -->
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>CentOSA:2181,CentOSB:2181,CentOSC:2181</value>
</property>
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>rmcluster01</value>
</property>
<!-- The two ResourceManagers and their hosts -->
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>CentOSB</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>CentOSC</value>
</property>

Add the following to /usr/hadoop-2.6.0/etc/hadoop/mapred-site.xml:


<!-- Run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

Start YARN

[root@CentOSB ~]# yarn-daemon.sh start resourcemanager
[root@CentOSC ~]# yarn-daemon.sh start resourcemanager 
[root@CentOSX ~]# yarn-daemon.sh start nodemanager
