(09) Big Data: Hadoop Distributed, Pseudo-Distributed, and Standalone Environment Setup

2019-04-14 14:04:12
Source material from 传智播客 (itcast)
An introduction to the three Apache Hadoop deployment architectures (StandAlone, pseudo-distributed, and fully distributed), with installation steps for each.
Hadoop documentation:
http://hadoop.apache.org/docs/
5.1 StandAlone Environment Setup
Service             Server IP
NameNode            192.168.52.100
SecondaryNameNode   192.168.52.100
DataNode            192.168.52.100
ResourceManager     192.168.52.100
NodeManager         192.168.52.100

Step 1: Download Apache Hadoop and upload it to the server
Download link:
http://archive.apache.org/dist/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz
Extract the archive:
cd /export/softwares
tar -zxvf hadoop-2.7.5.tar.gz -C ../servers/

Step 2: Modify the configuration files
Edit core-site.xml
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim core-site.xml


<configuration>
	<property>
		<name>fs.default.name</name>
		<value>hdfs://192.168.52.100:8020</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/export/servers/hadoop-2.7.5/hadoopDatas/tempDatas</value>
	</property>
	<!-- Buffer size for reading/writing files, in bytes -->
	<property>
		<name>io.file.buffer.size</name>
		<value>4096</value>
	</property>
	<!-- Trash retention in minutes: 10080 minutes = 7 days -->
	<property>
		<name>fs.trash.interval</name>
		<value>10080</value>
	</property>
</configuration>
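Once core-site.xml is saved, hdfs getconf can confirm the values Hadoop actually resolves. This is only a sanity-check sketch, not part of the original steps; getconf reads the local configuration files, so nothing needs to be running yet, though it does assume JAVA_HOME is already set in your shell:

cd /export/servers/hadoop-2.7.5
bin/hdfs getconf -confKey fs.default.name     # should print hdfs://192.168.52.100:8020
bin/hdfs getconf -confKey fs.trash.interval   # should print 10080
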
Edit hdfs-site.xml
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim hdfs-site.xml

<configuration>
	<!-- HDFS block size in bytes: 134217728 = 128 MB -->
	<property>
		<name>dfs.blocksize</name>
		<value>134217728</value>
	</property>
</configuration>

Edit hadoop-env.sh
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim hadoop-env.sh

export JAVA_HOME=/export/servers/jdk1.8.0_141

Edit mapred-site.xml
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim mapred-site.xml

<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<!-- Run small jobs "uber" (inside the AM JVM) instead of allocating containers -->
	<property>
		<name>mapreduce.job.ubertask.enable</name>
		<value>true</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>node01:10020</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>node01:19888</value>
	</property>
</configuration>
Edit yarn-site.xml
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim yarn-site.xml

<configuration>
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>node01</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.log-aggregation-enable</name>
		<value>true</value>
	</property>
	<!-- Aggregated log retention in seconds: 604800 = 7 days -->
	<property>
		<name>yarn.log-aggregation.retain-seconds</name>
		<value>604800</value>
	</property>
</configuration>
Edit mapred-env.sh
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim mapred-env.sh
export JAVA_HOME=/export/servers/jdk1.8.0_141
Edit slaves
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim slaves
localhost
Step 3: Start the cluster
Starting a Hadoop cluster means starting both the HDFS and YARN modules.
Note: the first time HDFS is started, it must be formatted. Formatting is essentially cleanup and preparation work, because at this point HDFS does not yet physically exist.
hdfs namenode -format    (or: hadoop namenode -format)
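One way to confirm the format step (run below) succeeded is to look for the VERSION file it writes under the NameNode metadata directory. The path here is an assumption based on defaults: with the core-site.xml above, dfs.namenode.name.dir falls back to ${hadoop.tmp.dir}/dfs/name.

cat /export/servers/hadoop-2.7.5/hadoopDatas/tempDatas/dfs/name/current/VERSION
# After a successful format, expect lines such as clusterID=... and namespaceID=...
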
Startup commands:
Create the data directories
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/tempDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/namenodeDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/namenodeDatas2
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/datanodeDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/datanodeDatas2
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/nn/edits
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/snn/name
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/dfs/snn/edits

Format and start
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5/
bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
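If everything came up, jps (shipped with the JDK) should show one Java process per service. A minimal check on this single node:

jps
# Expected processes, roughly: NameNode, SecondaryNameNode, DataNode,
# ResourceManager, NodeManager, JobHistoryServer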

Three web UIs:
http://node01:50070/explorer.html#/   HDFS browser
http://node01:8088/cluster            YARN cluster
http://node01:19888/jobhistory        Completed job history

5.2 Pseudo-Distributed Environment Setup (suitable for learning, testing, and development clusters)
Service plan
Server IP           192.168.52.100      192.168.52.110      192.168.52.120
Hostname            node01.hadoop.com   node02.hadoop.com   node03.hadoop.com
NameNode            yes                 no                  no
SecondaryNameNode   yes                 no                  no
DataNode            yes                 yes                 yes
ResourceManager     yes                 no                  no
NodeManager         yes                 yes                 yes

Stop the single-node cluster, delete the /export/servers/hadoop-2.7.5/hadoopDatas directory, then recreate the directories.
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5
sbin/stop-dfs.sh
sbin/stop-yarn.sh
sbin/mr-jobhistory-daemon.sh stop historyserver

Delete hadoopDatas:
rm -rf /export/servers/hadoop-2.7.5/hadoopDatas

Recreate the directories:
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/tempDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/namenodeDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/namenodeDatas2
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/datanodeDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/datanodeDatas2
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/nn/edits
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/snn/name
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/dfs/snn/edits

Edit the slaves file, then distribute the installation directory to the other machines and restart the cluster.
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim slaves
node01
node02
node03

Distribute the installation directory
Run the following on the first machine:
cd /export/servers/
scp -r hadoop-2.7.5 node02:$PWD
scp -r hadoop-2.7.5 node03:$PWD
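Equivalently, a small loop covers both workers; this is just a convenience sketch over the same scp calls, and it relies on $PWD (/export/servers) existing at the same path on the remote side:

for host in node02 node03; do
  scp -r hadoop-2.7.5 "${host}:${PWD}"
done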

Start the cluster
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5
bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
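Once HDFS is up, hdfs dfsadmin -report should list all three DataNodes; a quick check from node01 (the grep is just a convenience, not part of the original steps):

cd /export/servers/hadoop-2.7.5
bin/hdfs dfsadmin -report | grep "Live datanodes"
# Expect: Live datanodes (3)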

5.3 Fully Distributed Environment Setup (suitable for production environments)
A fully distributed deployment, with high availability for both the NameNode and the ResourceManager.
Cluster service plan

            192.168.1.100     192.168.1.110     192.168.1.120
ZooKeeper   zk                zk                zk
HDFS        JournalNode       JournalNode       JournalNode
            NameNode          NameNode
            ZKFC              ZKFC
            DataNode          DataNode          DataNode
YARN                          ResourceManager   ResourceManager
            NodeManager       NodeManager       NodeManager
MapReduce                                       JobHistoryServer

(The placement of the NameNodes/ZKFCs on the first two hosts and the ResourceManagers on the last two follows the hdfs-site.xml and yarn-site.xml configurations below.)

Unpack the installation archive
Stop all services of the previous Hadoop cluster, delete the Hadoop installation directory on every machine, then re-extract the Hadoop archive.
Extract the archive
Run the following on the first machine:
cd /export/softwares
tar -zxvf hadoop-2.7.5.tar.gz -C ../servers/

Modify the configuration files
Edit core-site.xml
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim core-site.xml


<configuration>
	<!-- ZooKeeper quorum used by the ZKFC for automatic failover -->
	<property>
		<name>ha.zookeeper.quorum</name>
		<value>node01:2181,node02:2181,node03:2181</value>
	</property>
	<!-- Logical nameservice URI; clients address "ns" instead of a single NameNode -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://ns</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/export/servers/hadoop-2.7.5/data/tmp</value>
	</property>
	<!-- Trash retention in minutes: 10080 minutes = 7 days -->
	<property>
		<name>fs.trash.interval</name>
		<value>10080</value>
	</property>
</configuration>
Edit hdfs-site.xml
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim hdfs-site.xml


<configuration>
	<!-- Logical name for this HDFS nameservice -->
	<property>
		<name>dfs.nameservices</name>
		<value>ns</value>
	</property>
	<!-- The two NameNodes that back the "ns" nameservice -->
	<property>
		<name>dfs.ha.namenodes.ns</name>
		<value>nn1,nn2</value>
	</property>
	<!-- Client RPC addresses -->
	<property>
		<name>dfs.namenode.rpc-address.ns.nn1</name>
		<value>node01:8020</value>
	</property>
	<property>
		<name>dfs.namenode.rpc-address.ns.nn2</name>
		<value>node02:8020</value>
	</property>
	<!-- Service RPC addresses (used by DataNodes and the ZKFC) -->
	<property>
		<name>dfs.namenode.servicerpc-address.ns.nn1</name>
		<value>node01:8022</value>
	</property>
	<property>
		<name>dfs.namenode.servicerpc-address.ns.nn2</name>
		<value>node02:8022</value>
	</property>
	<!-- Web UI addresses -->
	<property>
		<name>dfs.namenode.http-address.ns.nn1</name>
		<value>node01:50070</value>
	</property>
	<property>
		<name>dfs.namenode.http-address.ns.nn2</name>
		<value>node02:50070</value>
	</property>
	<!-- JournalNode quorum that stores the shared edit log -->
	<property>
		<name>dfs.namenode.shared.edits.dir</name>
		<value>qjournal://node01:8485;node02:8485;node03:8485/ns1</value>
	</property>
	<!-- Proxy class clients use to find the active NameNode -->
	<property>
		<name>dfs.client.failover.proxy.provider.ns</name>
		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
	</property>
	<!-- Fence the old active NameNode over SSH during failover -->
	<property>
		<name>dfs.ha.fencing.methods</name>
		<value>sshfence</value>
	</property>
	<property>
		<name>dfs.ha.fencing.ssh.private-key-files</name>
		<value>/root/.ssh/id_rsa</value>
	</property>
	<!-- Local storage for JournalNode edits -->
	<property>
		<name>dfs.journalnode.edits.dir</name>
		<value>/export/servers/hadoop-2.7.5/data/dfs/jn</value>
	</property>
	<property>
		<name>dfs.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:///export/servers/hadoop-2.7.5/data/dfs/nn/name</value>
	</property>
	<property>
		<name>dfs.namenode.edits.dir</name>
		<value>file:///export/servers/hadoop-2.7.5/data/dfs/nn/edits</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:///export/servers/hadoop-2.7.5/data/dfs/dn</value>
	</property>
	<!-- Disable HDFS permission checks (convenient for test clusters) -->
	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>
	<!-- Block size in bytes: 134217728 = 128 MB -->
	<property>
		<name>dfs.blocksize</name>
		<value>134217728</value>
	</property>
</configuration>
Edit yarn-site.xml (note: node03's configuration differs from node02's in the yarn.resourcemanager.ha.id value; see below)
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim yarn-site.xml


<configuration>
	<property>
		<name>yarn.log-aggregation-enable</name>
		<value>true</value>
	</property>
	<!-- ResourceManager HA -->
	<property>
		<name>yarn.resourcemanager.ha.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.resourcemanager.cluster-id</name>
		<value>mycluster</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.rm-ids</name>
		<value>rm1,rm2</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname.rm1</name>
		<value>node03</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname.rm2</name>
		<value>node02</value>
	</property>
	<!-- rm1 (node03) service addresses -->
	<property>
		<name>yarn.resourcemanager.address.rm1</name>
		<value>node03:8032</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address.rm1</name>
		<value>node03:8030</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
		<value>node03:8031</value>
	</property>
	<property>
		<name>yarn.resourcemanager.admin.address.rm1</name>
		<value>node03:8033</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address.rm1</name>
		<value>node03:8088</value>
	</property>
	<!-- rm2 (node02) service addresses -->
	<property>
		<name>yarn.resourcemanager.address.rm2</name>
		<value>node02:8032</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address.rm2</name>
		<value>node02:8030</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
		<value>node02:8031</value>
	</property>
	<property>
		<name>yarn.resourcemanager.admin.address.rm2</name>
		<value>node02:8033</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address.rm2</name>
		<value>node02:8088</value>
	</property>
	<property>
		<name>yarn.resourcemanager.recovery.enabled</name>
		<value>true</value>
	</property>
	<!-- This value is rm1 on node03 and rm2 on node02 (changed below) -->
	<property>
		<name>yarn.resourcemanager.ha.id</name>
		<value>rm1</value>
		<description>If we want to launch more than one RM in single node, we need this configuration</description>
	</property>
	<property>
		<name>yarn.resourcemanager.store.class</name>
		<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
	</property>
	<property>
		<name>yarn.resourcemanager.zk-address</name>
		<value>node02:2181,node03:2181,node01:2181</value>
		<description>For multiple zk services, separate them with comma</description>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
		<value>true</value>
		<description>Enable automatic failover; By default, it is enabled only when HA is enabled.</description>
	</property>
	<property>
		<name>yarn.client.failover-proxy-provider</name>
		<value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
	</property>
	<!-- NodeManager resources and scheduler allocation limits -->
	<property>
		<name>yarn.nodemanager.resource.cpu-vcores</name>
		<value>4</value>
	</property>
	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>512</value>
	</property>
	<property>
		<name>yarn.scheduler.minimum-allocation-mb</name>
		<value>512</value>
	</property>
	<property>
		<name>yarn.scheduler.maximum-allocation-mb</name>
		<value>512</value>
	</property>
	<!-- Log retention: 2592000 s = 30 days aggregated, 604800 s = 7 days local -->
	<property>
		<name>yarn.log-aggregation.retain-seconds</name>
		<value>2592000</value>
	</property>
	<property>
		<name>yarn.nodemanager.log.retain-seconds</name>
		<value>604800</value>
	</property>
	<property>
		<name>yarn.nodemanager.log-aggregation.compression-type</name>
		<value>gz</value>
	</property>
	<property>
		<name>yarn.nodemanager.local-dirs</name>
		<value>/export/servers/hadoop-2.7.5/yarn/local</value>
	</property>
	<property>
		<name>yarn.resourcemanager.max-completed-applications</name>
		<value>1000</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.resourcemanager.connect.retry-interval.ms</name>
		<value>2000</value>
	</property>
</configuration>

Edit mapred-site.xml
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim mapred-site.xml

<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>node03:10020</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>node03:19888</value>
	</property>
	<property>
		<name>mapreduce.jobtracker.system.dir</name>
		<value>/export/servers/hadoop-2.7.5/data/system/jobtracker</value>
	</property>
	<!-- Container sizes for map and reduce tasks, in MB -->
	<property>
		<name>mapreduce.map.memory.mb</name>
		<value>1024</value>
	</property>
	<property>
		<name>mapreduce.reduce.memory.mb</name>
		<value>1024</value>
	</property>
	<!-- Sort buffer (MB) and merge factor for the map-side sort -->
	<property>
		<name>mapreduce.task.io.sort.mb</name>
		<value>100</value>
	</property>
	<property>
		<name>mapreduce.task.io.sort.factor</name>
		<value>10</value>
	</property>
	<property>
		<name>mapreduce.reduce.shuffle.parallelcopies</name>
		<value>25</value>
	</property>
	<property>
		<name>yarn.app.mapreduce.am.command-opts</name>
		<value>-Xmx1024m</value>
	</property>
	<property>
		<name>yarn.app.mapreduce.am.resource.mb</name>
		<value>1536</value>
	</property>
	<property>
		<name>mapreduce.cluster.local.dir</name>
		<value>/export/servers/hadoop-2.7.5/data/system/local</value>
	</property>
</configuration>

Edit slaves
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim slaves
node01
node02
node03

Edit hadoop-env.sh
Run the following on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim hadoop-env.sh
export JAVA_HOME=/export/servers/jdk1.8.0_141

Cluster startup
Distribute the installation directory from the first machine to the others
Run the following on the first machine:
cd /export/servers
scp -r hadoop-2.7.5/ node02:$PWD
scp -r hadoop-2.7.5/ node03:$PWD
Create the data directories on all three machines
Run the following on all three machines:
mkdir -p /export/servers/hadoop-2.7.5/data/dfs/nn/name
mkdir -p /export/servers/hadoop-2.7.5/data/dfs/nn/edits

Change node02's ResourceManager ID to rm2
Run the following on the second machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim yarn-site.xml

       
<property>
	<name>yarn.resourcemanager.ha.id</name>
	<value>rm2</value>
	<description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>

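As a sanity check before starting YARN, grep can confirm each node carries the right ID (this check is an addition, not part of the original steps):

# node02 should show rm2, node03 should show rm1
grep -A1 "yarn.resourcemanager.ha.id" /export/servers/hadoop-2.7.5/etc/hadoop/yarn-site.xml
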
Starting HDFS
Run the following on node01:
cd /export/servers/hadoop-2.7.5
bin/hdfs zkfc -formatZK
sbin/hadoop-daemons.sh start journalnode
bin/hdfs namenode -format
bin/hdfs namenode -initializeSharedEdits -force
sbin/start-dfs.sh

Run on node02:
cd /export/servers/hadoop-2.7.5
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode
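At this point one NameNode should be active and the other standby. A minimal check with hdfs haadmin, plus a listing through the logical nameservice (clients always address hdfs://ns, never an individual NameNode):

cd /export/servers/hadoop-2.7.5
bin/hdfs haadmin -getServiceState nn1   # typically active
bin/hdfs haadmin -getServiceState nn2   # typically standby
bin/hdfs dfs -ls hdfs://ns/
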
Starting YARN
Run on node03:
cd /export/servers/hadoop-2.7.5
sbin/start-yarn.sh
Run on node02:
cd /export/servers/hadoop-2.7.5
sbin/start-yarn.sh
Check the ResourceManager state
Run on node03:
cd /export/servers/hadoop-2.7.5
bin/yarn rmadmin -getServiceState rm1

Run on node02:
cd /export/servers/hadoop-2.7.5
bin/yarn rmadmin -getServiceState rm2
Start the JobHistoryServer on node03
Run the following on node03:
cd /export/servers/hadoop-2.7.5
sbin/mr-jobhistory-daemon.sh start historyserver
Check HDFS status
node01 HDFS status page:
http://192.168.52.100:50070/dfshealth.html#tab-overview
node02 HDFS status page:
http://192.168.52.110:50070/dfshealth.html#tab-overview

YARN cluster web UI:
http://node03:8088/cluster
Job history web UI:
http://192.168.52.120:19888/jobhistory
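As a final end-to-end check, the bundled examples jar can run a small job through the whole stack; the jar path below is the standard location inside the Hadoop 2.7.5 distribution. A successful pi run exercises HDFS, the HA ResourceManager, and the history server in one go:

cd /export/servers/hadoop-2.7.5
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar pi 3 5
# The job should appear on http://node03:8088/cluster while running,
# and on http://192.168.52.120:19888/jobhistory once it finishes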
