Hadoop HA Configuration Notes

1. ZooKeeper cluster

1.1. Start the ZooKeeper cluster

Run jps on every node; each one should show a QuorumPeerMain process.
Check the cluster role with ./bin/zkServer.sh status.
If one node reports leader and the others report follower, the ensemble is healthy.
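
A quick way to check all three nodes from one shell is an ssh loop. This is only a sketch, and the ZooKeeper install path used here is an assumption; adjust it to wherever ZooKeeper actually lives:

for host in master slave1 slave2; do
    echo "== $host =="
    # assumed install path /home/hadoop/env/zookeeper; change as needed
    ssh hadoop@$host '/home/hadoop/env/zookeeper/bin/zkServer.sh status'
done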

1.2. Enter the ZooKeeper shell

Open the shell with ./bin/zkCli.sh.
Run ls / to inspect the root znode.

[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]

A brand-new ZooKeeper, not used yet.

2. Hadoop HDFS HA

2.1. Cluster plan

master              slave1              slave2
172.16.30.100       172.16.30.101       172.16.30.102
NameNode
DataNode            DataNode            DataNode
NodeManager         NodeManager         NodeManager
HistoryServer       ResourceManager     SecondaryNameNode

(Note: once HA is configured below, a second NameNode actually runs on slave1 (nn1/nn2) and a second ResourceManager runs on slave2 (rm1/rm2); no SecondaryNameNode is started, since the standby NameNode takes over its checkpointing role.)

2.2. Configure the JDK and Hadoop environment

vi ~/.bash_profile
Append the following at the end:

PATH=$PATH:$HOME/bin
export JAVA_HOME=/home/hadoop/env/jdk1.7.0_80
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/opt/modules/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$PATH
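
Reload the profile and sanity-check the environment ($HBASE_HOME appears in PATH but is not set in this snippet; if HBase is not installed it simply expands to nothing, which is harmless):

source ~/.bash_profile
java -version        # should report 1.7.0_80
hadoop version       # confirms the hadoop binary on PATH works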

2.3. Configure hdfs-site.xml

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>2</value>
	</property>
	<property>
		<name>dfs.permissions.enabled</name>
		<value>false</value>
	</property>
	<property>
		<name>dfs.nameservices</name>
		<value>cluster</value>
	</property>
	<property>
		<name>dfs.ha.namenodes.cluster</name>
		<value>nn1,nn2</value>
	</property>
	<property>
		<name>dfs.namenode.rpc-address.cluster.nn1</name>
		<value>master:9000</value>
	</property>
	<property>
		<name>dfs.namenode.rpc-address.cluster.nn2</name>
		<value>slave1:9000</value>
	</property>
	<property>
		<name>dfs.namenode.http-address.cluster.nn1</name>
		<value>master:50070</value>
	</property>
	<property>
		<name>dfs.namenode.http-address.cluster.nn2</name>
		<value>slave1:50070</value>
	</property>
	<property>
		<name>dfs.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>dfs.namenode.shared.edits.dir</name>
		<value>qjournal://master:8485;slave1:8485;slave2:8485/cluster</value>
	</property>
	<property>
		<name>dfs.journalnode.edits.dir</name>
		<value>/opt/data/hadoop/ha_hdfs/jn</value>
	</property>
	<property>
		<name>dfs.client.failover.proxy.provider.cluster</name>
		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
	</property>
	<property>
		<name>dfs.ha.fencing.methods</name>
		<value>sshfence</value>
	</property>
	<property>
		<name>dfs.ha.fencing.ssh.private-key-files</name>
		<value>/home/hadoop/.ssh/id_rsa</value>
	</property>
	<property>
		<name>dfs.datanode.socket.write.timeout</name>
		<value>1200000</value>
	</property>
	<property>
		<name>dfs.client.socket-timeout</name>
		<value>300000</value>
	</property>
	<property>
		<name>dfs.datanode.max.xcievers</name>
		<value>8192</value>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/opt/data/hadoop/ha_hdfs/name</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/opt/data/hadoop/ha_hdfs/data</value>
	</property>
	<property>
		<name>dfs.namenode.handler.count</name>
		<value>12</value>
	</property>
	<property>
		<name>dfs.blocksize</name>
		<value>134217728</value>
	</property>
	<property>
		<name>dfs.datanode.balance.bandwidthPerSec</name>
		<value>10485760</value>
	</property>
	<property>
		<name>dfs.ha.zkfc.port</name>
		<value>8019</value>
	</property>
</configuration>

Note: everything under /opt/data/hadoop/ must be owned by the same user that runs the Hadoop daemons. If the directories belong to a different user, the DataNode will fail to start even if they are given 777 permissions (this is exactly the problem I ran into).
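
A minimal sketch for creating these directories with consistent ownership, assuming the daemons all run as the hadoop user (as in the shell prompts later in these notes) and that sudo is available; run it on every node:

sudo mkdir -p /opt/data/hadoop/ha_hdfs/{jn,name,data} \
              /opt/data/hadoop/ha_tmp \
              /opt/data/hadoop/ha_pid
sudo chown -R hadoop:hadoop /opt/data/hadoop
# ha_hdfs/jn, ha_hdfs/name and ha_hdfs/data match hdfs-site.xml above;
# ha_tmp and ha_pid match core-site.xml and hadoop-env.sh below.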

2.4. Configure core-site.xml

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://cluster</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/opt/data/hadoop/ha_tmp</value>
	</property>
	<property>
		<name>io.file.buffer.size</name>
		<value>4096</value>
	</property>
	<property>
		<name>ha.zookeeper.quorum</name>
		<value>master:2181,slave1:2181,slave2:2181</value>
	</property>
</configuration>

2.5. Configure slaves

master
slave1
slave2

2.6. Configure hadoop-env.sh

Set the PID directory:
export HADOOP_PID_DIR=/opt/data/hadoop/ha_pid
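
The configuration files edited above have to be identical on all three machines. A sketch of pushing them out from master (assumes passwordless ssh for the hadoop user and the $HADOOP_HOME=/opt/modules/hadoop layout exported earlier; repeat after editing mapred-site.xml and yarn-site.xml in section 2.8):

for host in slave1 slave2; do
    scp /opt/modules/hadoop/etc/hadoop/{core-site.xml,hdfs-site.xml,slaves,hadoop-env.sh} \
        hadoop@$host:/opt/modules/hadoop/etc/hadoop/
done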

2.7. Start HDFS

2.7.1. Configure manual failover

Start a JournalNode on each of the three machines:
./sbin/hadoop-daemon.sh start journalnode
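
This can be done from master in one go over ssh (a sketch; the hostnames are the ones from the cluster plan, and the path is the $HADOOP_HOME exported earlier):

for host in master slave1 slave2; do
    ssh hadoop@$host '/opt/modules/hadoop/sbin/hadoop-daemon.sh start journalnode'
done
# each node should now show a JournalNode process in jps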

2.7.2. Start the NameNodes

Format the NameNode on the first machine (master):
./bin/hdfs namenode -format
Start the NameNode on the first machine:
./sbin/hadoop-daemon.sh start namenode
Synchronize the metadata to the second NameNode (slave1); the first NameNode must already be running:
./bin/hdfs namenode -bootstrapStandby
Then start the NameNode on the second machine:
./sbin/hadoop-daemon.sh start namenode

Check the HDFS web UIs (port 50070): both NameNodes are in standby state.
Force the first one to become active:
./bin/hdfs haadmin -transitionToActive nn1 -forcemanual
Refresh the page: the first NameNode is now active.
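
The same information is available on the command line, using the nn1/nn2 IDs defined in hdfs-site.xml:

./bin/hdfs haadmin -getServiceState nn1     # expect: active
./bin/hdfs haadmin -getServiceState nn2     # expect: standby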

Manual failover configuration is done.

2.7.3. Configure automatic failover

Shut down the whole Hadoop cluster first, but leave ZooKeeper running.
On the first machine, format the ZKFC znode in ZooKeeper:
./bin/hdfs zkfc -formatZK

Check the ZooKeeper tree again: a new hadoop-ha znode has appeared.

[zk: localhost:2181(CONNECTED) 4] ls /
[hadoop-ha, zookeeper]

2.7.4. Start HDFS

./sbin/start-dfs.sh

Processes on the master node:

[hadoop@master hadoop]$ jps
36329 JournalNode
36151 DataNode
36052 NameNode
36775 Jps
35440 ZooKeeperMain
36479 DFSZKFailoverController
34688 QuorumPeerMain

The ZooKeeperMain process is the zkCli.sh client session still connected to ZooKeeper.

Processes on the slave1 node:

[hadoop@slave1 zookeeper]$ jps
56760 DFSZKFailoverController
56990 Jps
56572 DataNode
56502 NameNode
56667 JournalNode
17361 QuorumPeerMain

Processes on the slave2 node:

[hadoop@slave2 ~]$ jps
105523 QuorumPeerMain
106869 Jps
106772 JournalNode
106595 DataNode

2.7.5. Test HA

First determine which NameNode is currently active, then kill that NameNode process. The other NameNode should switch from standby to active. Next, restart the NameNode that was killed (./sbin/hadoop-daemon.sh start namenode) and repeat the procedure; the freshly restarted NameNode should go from standby back to active once the other one is killed. A command-line sketch of this test follows.
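
A sketch of the test, assuming nn1 (master) is currently the active NameNode; <namenode-pid> is a placeholder for the pid reported by jps:

# on master
jps | grep NameNode                          # note the NameNode pid
kill -9 <namenode-pid>                       # simulate a crash
./bin/hdfs haadmin -getServiceState nn2      # expect: active (failover happened)

# bring the killed NameNode back; it rejoins as standby
./sbin/hadoop-daemon.sh start namenode
./bin/hdfs haadmin -getServiceState nn1      # expect: standby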

HDFS startup is complete.

2.8. Start YARN

2.8.1. Configure mapred-site.xml

<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>

Because the virtual machines have very little memory, most parameters are left at their defaults here; otherwise MapReduce jobs may run into problems.

2.8.2. Configure yarn-site.xml

<configuration>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.resourcemanager.cluster-id</name>
		<value>yarn-cluster</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.rm-ids</name>
		<value>rm1,rm2</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname.rm1</name>
		<value>slave1</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname.rm2</name>
		<value>slave2</value>
	</property>
	<property>
		<name>yarn.resourcemanager.zk-address</name>
		<value>master:2181,slave1:2181,slave2:2181</value>
	</property>
	<property>
		<name>yarn.resourcemanager.recovery.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.resourcemanager.store.class</name>
		<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
	</property>
</configuration>

2.8.3. Startup

On the master node, start YARN:
./sbin/start-yarn.sh
Then, on each of the two ResourceManager nodes (slave1 and slave2), start the ResourceManager:
./sbin/yarn-daemon.sh start resourcemanager
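
Which ResourceManager is currently active can be checked with rmadmin, using the rm1/rm2 IDs from yarn-site.xml:

./bin/yarn rmadmin -getServiceState rm1      # one of these reports active,
./bin/yarn rmadmin -getServiceState rm2      # the other reports standby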

Now check ZooKeeper again:

[zk: localhost:2181(CONNECTED) 0] ls /
[yarn-leader-election, hadoop-ha, zookeeper]

A new yarn-leader-election znode has appeared.

2.9. Verification

At this point the Hadoop HA cluster is fully set up.

Processes on master:

[hadoop@master hadoop]$ jps
38296 NodeManager
37230 NameNode
36329 JournalNode
38469 ZooKeeperMain
38743 Jps
36151 DataNode
36479 DFSZKFailoverController
34688 QuorumPeerMain

Processes on slave1:

[hadoop@slave1 hadoop]$ jps                                                                                  
58676 Jps
56760 DFSZKFailoverController
56572 DataNode
56667 JournalNode
57922 ResourceManager
57769 NodeManager
17361 QuorumPeerMain
57211 NameNode

Processes on slave2:

[hadoop@slave2 hadoop]$ jps
105523 QuorumPeerMain
107074 NodeManager
107482 Jps
106772 JournalNode
106595 DataNode
107229 ResourceManager

Run a MapReduce job to verify the cluster.
First create an input directory:
hdfs dfs -mkdir /input
Then prepare a test file:
vi /opt/data/wc.input

hadoop hive
hbase spark storm
sqoop hadoop hive
spark hadoop

hdfs dfs -put /opt/data/wc.input /input/wc.input
Run the example WordCount job:
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/wc.input /output2

[hadoop@master hadoop]$ yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/wc.input /output2                                                                                            
19/02/13 06:43:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/02/13 06:43:46 INFO input.FileInputFormat: Total input paths to process : 1
19/02/13 06:43:47 INFO mapreduce.JobSubmitter: number of splits:1
19/02/13 06:43:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1550010526021_0002
19/02/13 06:43:47 INFO impl.YarnClientImpl: Submitted application application_1550010526021_0002
19/02/13 06:43:47 INFO mapreduce.Job: The url to track the job: http://slave1:8088/proxy/application_1550010526021_0002/
19/02/13 06:43:47 INFO mapreduce.Job: Running job: job_1550010526021_0002
19/02/13 06:44:01 INFO mapreduce.Job: Job job_1550010526021_0002 running in uber mode : false
19/02/13 06:44:01 INFO mapreduce.Job:  map 0% reduce 0%
19/02/13 06:44:08 INFO mapreduce.Job:  map 100% reduce 0%
19/02/13 06:44:14 INFO mapreduce.Job:  map 100% reduce 100%
19/02/13 06:44:15 INFO mapreduce.Job: Job job_1550010526021_0002 completed successfully
19/02/13 06:44:15 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=78
                FILE: Number of bytes written=199957
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=155
                HDFS: Number of bytes written=48
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=4333
                Total time spent by all reduces in occupied slots (ms)=4422
                Total time spent by all map tasks (ms)=4333
                Total time spent by all reduce tasks (ms)=4422
                Total vcore-seconds taken by all map tasks=4333
                Total vcore-seconds taken by all reduce tasks=4422
                Total megabyte-seconds taken by all map tasks=4436992
                Total megabyte-seconds taken by all reduce tasks=4528128
        Map-Reduce Framework
                Map input records=4
                Map output records=10
                Map output bytes=101
                Map output materialized bytes=78
                Input split bytes=94
                Combine input records=10
                Combine output records=6
                Reduce input groups=6
                Reduce shuffle bytes=78
                Reduce input records=6
                Reduce output records=6
                Spilled Records=12
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=152
                CPU time spent (ms)=1540
                Physical memory (bytes) snapshot=298127360
                Virtual memory (bytes) snapshot=4154863616
                Total committed heap usage (bytes)=138915840
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=61
        File Output Format Counters 
                Bytes Written=48

Check the results:

[hadoop@master hadoop]$ hdfs dfs -ls /output2                                                               
19/02/14 13:47:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2019-02-13 06:36 /output2/_SUCCESS
-rw-r--r--   2 hadoop supergroup         48 2019-02-13 06:36 /output2/part-r-00000

[hadoop@master hadoop]$ hdfs dfs -cat /output2/part-r-00000                                                  
19/02/14 13:47:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop  3
hbase   1
hive    2
spark   2
sqoop   1
storm   1

And that's it.
