Distributed Resource Scheduling Framework: YARN

1 Background of YARN

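  • Problems with MapReduce 1.x: a single point of failure, and heavy load on a single node that makes the cluster hard to scale;
  • In Hadoop 1.x, MapReduce uses a Master/Slave architecture: one JobTracker manages multiple TaskTrackers
  • JobTracker (JT): responsible for both resource management and job scheduling
  • TaskTracker (TT): periodically reports its node's health, resource usage, and job execution status to the JT; accepts commands from the JT, such as starting a task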
  • YARN: different compute frameworks can share the data on the same HDFS cluster and benefit from unified resource scheduling

2 YARN Architecture

http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0/hadoop-yarn/hadoop-yarn-site/YARN.html

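  • ResourceManager (RM): only one RM serves the whole cluster at any given time. It manages and schedules cluster resources centrally, handles client requests (submitting a job, killing a job), and monitors the NMs; if an NM goes down, the AM must be told about the tasks that were running on that NM;
  • NodeManager (NM): there are many per cluster. Each one manages and uses the resources of its own node, periodically reports the node's resource usage to the RM, handles commands from the RM (such as starting a Container), and handles commands from the AM;
  • ApplicationMaster (AM): manages an application, with one AM per application (for example MR or Spark). It requests resources (cores, memory) from the RM on behalf of the application and assigns them to its internal tasks; it also talks to the NMs to start and stop tasks. Tasks run inside containers, and so does the AM itself;
  • Container: encapsulates resources such as CPU and memory; it is an abstraction of the environment a task runs in
  • Client: submits jobs and checks their progress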
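The division of roles above can be illustrated with a toy model. This is a minimal sketch, not YARN's actual API: the class names mirror the components, but the first-fit placement, the 1 vcore / 1024 MB request size, and the node names `node1` and `node2` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A bundle of CPU and memory granted on one node."""
    node: str
    vcores: int
    memory_mb: int


@dataclass
class NodeManager:
    """Tracks the free resources of a single node."""
    name: str
    free_vcores: int
    free_memory_mb: int

    def launch(self, vcores: int, memory_mb: int) -> Container:
        # Reserve resources on this node and hand back a container.
        self.free_vcores -= vcores
        self.free_memory_mb -= memory_mb
        return Container(self.name, vcores, memory_mb)


@dataclass
class ResourceManager:
    """The single cluster-wide scheduler that knows every NodeManager."""
    nodes: list = field(default_factory=list)

    def allocate(self, vcores: int, memory_mb: int):
        # First-fit placement (an assumption); real schedulers are more elaborate.
        for nm in self.nodes:
            if nm.free_vcores >= vcores and nm.free_memory_mb >= memory_mb:
                return nm.launch(vcores, memory_mb)
        return None  # no node can satisfy the request right now


def run_application(rm: ResourceManager, num_tasks: int):
    """Mimic one application: the AM gets a container first, then each task."""
    am_container = rm.allocate(vcores=1, memory_mb=1024)
    if am_container is None:
        return None, []  # the AM itself could not be placed
    task_containers = []
    for _ in range(num_tasks):
        container = rm.allocate(vcores=1, memory_mb=1024)
        if container is not None:
            task_containers.append(container)
    return am_container, task_containers


rm = ResourceManager([NodeManager("node1", 2, 2048), NodeManager("node2", 2, 2048)])
am, tasks = run_application(rm, num_tasks=3)
print(am.node, [t.node for t in tasks])  # node1 ['node1', 'node2', 'node2']
```

Once the four containers are placed, both toy nodes are full, so a further `rm.allocate(1, 1024)` returns `None`. The model leaves out the NM heartbeats and failure handling that the real RM performs.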

3 Setting Up the YARN Environment

3.1 mapred-site.xml

<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
</property>

3.2 yarn-site.xml

<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>
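Before starting the daemons, it can help to confirm that both files are well-formed XML and carry the intended values. The sketch below is one way to do that, under some assumptions: the two strings are hypothetical in-memory copies of the files, each property is wrapped in the `<configuration>` root element that Hadoop site files use, and `read_properties` is a helper name invented here.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory copies of the two files; on a real cluster they
# would be read from the Hadoop configuration directory instead.
MAPRED_SITE = """<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>"""

YARN_SITE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Parse a Hadoop-style site file and return {name: value}."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


print(read_properties(MAPRED_SITE))  # {'mapreduce.framework.name': 'yarn'}
print(read_properties(YARN_SITE))    # {'yarn.nodemanager.aux-services': 'mapreduce_shuffle'}
```

A missing or unbalanced closing tag makes `ET.fromstring` raise `xml.etree.ElementTree.ParseError`, so the problem shows up before YARN tries to read the file.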

3.3 Starting YARN

[hadoop@node1 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-resourcemanager-node1.out
node1: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-node1.out

Open http://node1:8088 in a browser
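The same check can be scripted against the ResourceManager's REST endpoint for cluster information, `/ws/v1/cluster/info`, which is served on the web UI port. The following is a rough sketch rather than a tested client: the helper names are made up for this example, and the sample JSON is an illustrative payload, not output captured from the cluster in this article.

```python
import json
from urllib.request import urlopen

RM_WEB = "http://node1:8088"  # the web UI address used in this article


def cluster_info_url(base):
    """Build the URL of the ResourceManager's cluster-info REST endpoint."""
    return base.rstrip("/") + "/ws/v1/cluster/info"


def cluster_state(json_text):
    """Extract the RM state (for example 'STARTED') from a cluster-info response."""
    return json.loads(json_text)["clusterInfo"]["state"]


def fetch_cluster_state(base=RM_WEB, timeout=5):
    """Query a live ResourceManager; requires network access to the cluster."""
    with urlopen(cluster_info_url(base), timeout=timeout) as resp:
        return cluster_state(resp.read().decode("utf-8"))


# Illustrative payload only (an assumption), trimmed to the one field used here.
sample = '{"clusterInfo": {"state": "STARTED"}}'
print(cluster_info_url(RM_WEB))  # http://node1:8088/ws/v1/cluster/info
print(cluster_state(sample))     # STARTED
```

With YARN running, `fetch_cluster_state()` should return the live state; it is not called here because that needs a reachable cluster.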

4 Submitting a MapReduce Job to YARN

The bundled examples are in /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce2

hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar
[hadoop@node1 mapreduce2]$ hadoop jar 
RunJar jarFile [mainClass] args...

[hadoop@node1 mapreduce2]$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar 
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[hadoop@node1 mapreduce2]$ 
[hadoop@node1 mapreduce2]$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi
Usage: org.apache.hadoop.examples.QuasiMonteCarlo <nMaps> <nSamples>
Generic options supported are
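-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.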

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

[hadoop@node1 mapreduce2]$ 

[hadoop@node1 mapreduce2]$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 2 3
Number of Maps  = 2
Samples per Map = 3
18/10/29 22:19:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
18/10/29 22:19:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/10/29 22:19:03 INFO input.FileInputFormat: Total input paths to process : 2
18/10/29 22:19:04 INFO mapreduce.JobSubmitter: number of splits:2
18/10/29 22:19:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1540822729980_0001
18/10/29 22:19:04 INFO impl.YarnClientImpl: Submitted application application_1540822729980_0001
18/10/29 22:19:04 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1540822729980_0001/
18/10/29 22:19:04 INFO mapreduce.Job: Running job: job_1540822729980_0001
18/10/29 22:19:16 INFO mapreduce.Job: Job job_1540822729980_0001 running in uber mode : false
18/10/29 22:19:16 INFO mapreduce.Job:  map 0% reduce 0%
18/10/29 22:19:26 INFO mapreduce.Job:  map 50% reduce 0%
18/10/29 22:19:27 INFO mapreduce.Job:  map 100% reduce 0%
18/10/29 22:19:32 INFO mapreduce.Job:  map 100% reduce 100%
18/10/29 22:19:33 INFO mapreduce.Job: Job job_1540822729980_0001 completed successfully
18/10/29 22:19:33 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=50
		FILE: Number of bytes written=335472
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=522
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters 
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=15859
		Total time spent by all reduces in occupied slots (ms)=4321
		Total time spent by all map tasks (ms)=15859
		Total time spent by all reduce tasks (ms)=4321
		Total vcore-seconds taken by all map tasks=15859
		Total vcore-seconds taken by all reduce tasks=4321
		Total megabyte-seconds taken by all map tasks=16239616
		Total megabyte-seconds taken by all reduce tasks=4424704
	Map-Reduce Framework
		Map input records=2
		Map output records=4
		Map output bytes=36
		Map output materialized bytes=56
		Input split bytes=286
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=56
		Reduce input records=4
		Reduce output records=0
		Spilled Records=8
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=245
		CPU time spent (ms)=1260
		Physical memory (bytes) snapshot=458809344
		Virtual memory (bytes) snapshot=8175378432
		Total committed heap usage (bytes)=262033408
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=236
	File Output Format Counters 
		Bytes Written=97
Job Finished in 30.938 seconds
Estimated value of Pi is 4.00000000000000000000
[hadoop@node1 mapreduce2]$ 
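
The estimate of 4.0 looks wrong, but it follows from using only 2 x 3 = 6 samples. The sketch below re-creates the idea in plain Python. It assumes the example draws points from a 2-D Halton sequence (bases 2 and 3) starting at index 1, and counts those that fall inside the circle of radius 0.5 centered in the unit square. It is a simplified re-creation for illustration, not the Hadoop source.

```python
def halton(index, base):
    """Return element `index` (1-based) of the van der Corput sequence in `base`."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def estimate_pi(num_maps, samples_per_map):
    """Estimate Pi as 4 * (points inside the circle) / (total points)."""
    total = num_maps * samples_per_map
    inside = 0
    for i in range(1, total + 1):
        # Shift the point so the circle is centered at the origin.
        x = halton(i, 2) - 0.5
        y = halton(i, 3) - 0.5
        if x * x + y * y <= 0.25:
            inside += 1
    return 4.0 * inside / total


print(estimate_pi(2, 3))  # 4.0, because all six points land inside the circle
print(estimate_pi(10, 1000))
```

Under these assumptions all six points fall inside the circle, so the ratio is 6/6 and the estimate is exactly 4.0. Passing larger values to the real job, for example `pi 10 1000`, should bring the estimate much closer to 3.14.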
