YARN Study Notes, Part 10: YARN Timeline Server V.2

YARN-Timeline-Server-V.2

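V.2 is a major improvement over V.1 and V.1.5.

1.Scalability: V.2 separates writes from reads, uses a distributed architecture, and stores data in HBase.

2.Usability improvements.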

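Architecture

YARN Timeline Service v.2 uses a set of collectors (writers) to write data to the backend storage. The collectors are distributed, and the AM sends application-related data to the timeline collector.

For a given application, the AM writes application data to the co-located timeline collector, which runs as an NM auxiliary service. The nodes running the application's containers also send their data to the timeline collector.

The resource manager also maintains its own timeline collector and writes to the backend storage.

Timeline readers are daemons separate from the timeline collectors. Their main job is to serve queries through the REST API.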

[Figure 1: YARN Timeline Service v.2 architecture]

Image source: http://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html

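Current Status and Future Plans

When this feature is enabled, YARN-generic events are published to the backend storage, including YARN system metrics. In addition, some applications, including Distributed Shell and MapReduce, can write their framework-specific data to YARN Timeline Service v.2.

Command-line access is not supported at present; data is available only through the REST API.

Collectors are currently embedded in the node managers as an auxiliary service.

As of alpha2, Timeline Service v.2 supports simple authorization through a configurable whitelist.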

Deployment

Basic configuration

yarn.timeline-service.enabled

yarn.timeline-service.version: defaults to 1.0f

yarn.timeline-service.writer.class: defaults to the HBase storage writer

yarn.timeline-service.reader.class: defaults to the HBase storage reader

yarn.system-metrics-publisher.enabled: defaults to false

yarn.timeline-service.hbase-schema.prefix: defaults to "prod."

Advanced configuration

yarn.timeline-service.hostname

yarn.timeline-service.reader.webapp.address

yarn.timeline-service.reader.webapp.https.address

yarn.timeline-service.reader.bind-host

yarn.timeline-service.hbase.configuration.file: defaults to null

yarn.timeline-service.writer.flush-interval-seconds: defaults to 60

yarn.timeline-service.app-collector.linger-period.ms: defaults to 60000 (60 seconds)

yarn.timeline-service.timeline-client.number-of-async-entities-to-merge: defaults to 10

yarn.timeline-service.hbase.coprocessor.app-final-value-retention-milliseconds: defaults to 259200000 (3 days)

yarn.rm.system-metrics-publisher.emit-container-events: defaults to false

Security configuration

Authentication is enabled by setting yarn.timeline-service.http-authentication.type to kerberos. The following properties can then be configured:

yarn.timeline-service.http-authentication.type

yarn.timeline-service.http-authentication.simple.anonymous.allowed

yarn.timeline-service.http-authentication.kerberos.principal

yarn.timeline-service.http-authentication.kerberos.keytab

yarn.timeline-service.principal

yarn.timeline-service.keytab

yarn.timeline-service.delegation.key.update-interval

yarn.timeline-service.delegation.token.renew-interval

yarn.timeline-service.delegation.token.max-lifetime

yarn.timeline-service.read.authentication.enabled

yarn.timeline-service.read.allowed.users: defaults to none

Enabling CORS support

In yarn-site.xml, set yarn.timeline-service.http-cross-origin.enabled=true

In core-site.xml, set hadoop.http.filter.initializers=org.apache.hadoop.security.HttpCrossOriginFilterInitializer

When yarn.timeline-service.http-cross-origin.enabled is set to true, it overrides hadoop.http.cross-origin.enabled.

Enabling Timeline Service v.2

Preparing the backend storage

Step 1) Set up the HBase cluster

The supported HBase version is 1.2.6; the 1.0.x versions do not work. Edit hbase-site.xml to set hbase.rootdir:


  
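  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>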
  

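Step 2) Enable the coprocessor

In this version, the coprocessor is loaded dynamically.

Copy the timeline service jar to HDFS so that HBase can load it; it is needed to create the flowrun table. The default HDFS location is /hbase/coprocessor.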

hadoop fs -mkdir /hbase/coprocessor

hadoop fs -put hadoop-yarn-server-timelineservice-hbase-3.0.0-alpha1-SNAPSHOT.jar /hbase/coprocessor/hadoop-yarn-server-timelineservice.jar

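To place the jar in a different location, configure the following property:

<property>
  <name>yarn.timeline-service.hbase.coprocessor.jar.hdfs.location</name>
  <value>/custom/hdfs/path/jarName</value>
</property>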

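Step 3) Create the schema

bin/hadoop org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator -create

Optional arguments are available, such as -skipExistingTable (-s for short), which skips tables that already exist. By default, table names carry the prefix "prod.".

Enabling the Timeline Service v.2 configuration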


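<property>
  <name>yarn.timeline-service.version</name>
  <value>2.0f</value>
</property>

<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,timeline_collector</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services.timeline_collector.class</name>
  <value>org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService</value>
</property>

<property>
  <description>The setting that controls whether yarn system metrics is
  published on the Timeline service or not by RM And NM.</description>
  <name>yarn.system-metrics-publisher.enabled</name>
  <value>true</value>
</property>

<property>
  <description>The setting that controls whether yarn container events are
  published to the timeline service or not by RM. This configuration setting
  is for ATS V2.</description>
  <name>yarn.rm.system-metrics-publisher.emit-container-events</name>
  <value>true</value>
</property>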
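The properties above can also be generated rather than typed by hand. The following is a minimal sketch, assuming the target file uses the usual Hadoop configuration/property/name/value layout; the helper name render_configuration is hypothetical, and the output is meant to be merged into yarn-site.xml by hand, not to replace it.

```python
import xml.etree.ElementTree as ET

# Property names and values are the ones listed in this section.
TIMELINE_V2_PROPERTIES = {
    "yarn.timeline-service.version": "2.0f",
    "yarn.timeline-service.enabled": "true",
    "yarn.nodemanager.aux-services": "mapreduce_shuffle,timeline_collector",
    "yarn.nodemanager.aux-services.timeline_collector.class":
        "org.apache.hadoop.yarn.server.timelineservice.collector."
        "PerNodeTimelineCollectorsAuxService",
    "yarn.system-metrics-publisher.enabled": "true",
    "yarn.rm.system-metrics-publisher.emit-container-events": "true",
}


def render_configuration(properties):
    """Render a dict of settings as a Hadoop-style <configuration> string.

    Hypothetical helper for illustration; it is not part of Hadoop.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ET.indent only exists on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")
```

Calling print(render_configuration(TIMELINE_V2_PROPERTIES)) prints one property element per setting, which makes it easy to diff against an existing yarn-site.xml.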

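A name can be set for the YARN cluster, which is useful when data from multiple YARN clusters is stored in the same HBase.

<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>my_research_test_cluster</value>
</property>

Also add the hbase-site.xml configuration file to the client Hadoop cluster configuration.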


<property>
  <description>Optional URL to an hbase-site.xml configuration file to be
  used to connect to the timeline-service hbase cluster. If empty or not
  specified, then the HBase configuration will be loaded from the classpath.
  When specified the values in the specified configuration file will override
  those from the ones that are present on the classpath.
  </description>
  <name>yarn.timeline-service.hbase.configuration.file</name>
  <value>file:/etc/hbase/hbase-ats-dc1/hbase-site.xml</value>
</property>

Running Timeline Service v.2

$ yarn-daemon.sh start timelinereader

Enabling MapReduce to write to Timeline Service v.2

<property>
  <name>mapreduce.job.emit-timeline-data</name>
  <value>true</value>
</property>

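Upgrading from alpha1 to alpha2

Clear the existing data by truncating the tables.

The coprocessor is loaded dynamically. After dropping the table, replace the coprocessor jar, restart the Region Servers, and re-create the flowrun table.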

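Publishing data to Timeline Service v.2

A TimelineEntity has the following fields:

events, configs, metrics, info, isrelatedtoEntities, relatestoEntities.

Timeline Service v.2 supports two application-level aggregation operations, given by TimelineMetricOperation: MAX and SUM.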
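To make the entity fields and the two aggregation operations more concrete, here is a small sketch in Python. It is an illustration only: the Entity and Metric classes and the aggregate_metrics function are hypothetical stand-ins, not the Hadoop Java classes, and the aggregation rule (combine same-named metrics across entities with the operation each metric declares) is a simplified reading of the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

SUPPORTED_OPERATIONS = ("SUM", "MAX")  # the two operations named in the text


@dataclass
class Metric:
    metric_id: str
    value: float
    operation: str = "SUM"  # mirrors TimelineMetricOperation: "SUM" or "MAX"


@dataclass
class Entity:
    # Hypothetical Python stand-in for a TimelineEntity.
    entity_type: str
    entity_id: str
    events: List[dict] = field(default_factory=list)
    configs: Dict[str, str] = field(default_factory=dict)
    metrics: List[Metric] = field(default_factory=list)
    info: Dict[str, object] = field(default_factory=dict)
    is_related_to: Dict[str, List[str]] = field(default_factory=dict)
    relates_to: Dict[str, List[str]] = field(default_factory=dict)


def aggregate_metrics(entities: Iterable[Entity]) -> Dict[str, float]:
    """Combine same-id metrics across entities using each metric's operation."""
    totals: Dict[str, float] = {}
    for entity in entities:
        for metric in entity.metrics:
            if metric.operation not in SUPPORTED_OPERATIONS:
                raise ValueError(f"unsupported operation: {metric.operation}")
            if metric.metric_id not in totals:
                totals[metric.metric_id] = metric.value
            elif metric.operation == "MAX":
                totals[metric.metric_id] = max(totals[metric.metric_id], metric.value)
            else:  # "SUM"
                totals[metric.metric_id] += metric.value
    return totals
```

For example, two container entities reporting a MAX metric "MEMORY" of 512 and 768 and a SUM metric "CPU" of 10 and 15 aggregate to {"MEMORY": 768, "CPU": 25} at the application level.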

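Timeline Service v.2 REST API

All endpoints are served under the root path /ws/v2/timeline/

GET /ws/v2/timeline/

GET /ws/v2/timeline/clusters/{cluster name}/flows/ or GET /ws/v2/timeline/flows

The following query parameters are supported:

1.limit: the number of flows to return. If it is not specified or is less than 0, it is treated as 100.

2.daterange: given in the form "[startdate]-[enddate]".

3.fromid: if specified, the flows starting from fromid are returned, including the one identified by fromid itself.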
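The flow query above can be assembled programmatically. The sketch below builds the request URL only and does not contact any server. The host name and port are placeholders (in practice they come from yarn.timeline-service.reader.webapp.address), the function name flows_url is hypothetical, and the yyyyMMdd date format for daterange is taken from the Hadoop documentation rather than from the text above.

```python
from datetime import date
from urllib.parse import quote, urlencode

# Assumed reader address; replace with the real reader webapp address.
READER_BASE = "http://timeline-reader.example.org:8188"


def _yyyymmdd(day):
    """Format a date as yyyyMMdd, or return '' for an open-ended bound."""
    return day.strftime("%Y%m%d") if day is not None else ""


def flows_url(cluster=None, limit=None, start=None, end=None, fromid=None):
    """Build the URL for the flows query; every argument is optional."""
    path = "/ws/v2/timeline"
    if cluster is not None:
        path += "/clusters/" + quote(cluster, safe="")
    path += "/flows/"

    params = {}
    if limit is not None:
        params["limit"] = limit
    if start is not None or end is not None:
        # "[startdate]-[enddate]"; either side may be left empty.
        params["daterange"] = f"{_yyyymmdd(start)}-{_yyyymmdd(end)}"
    if fromid is not None:
        params["fromid"] = fromid

    query = urlencode(params)
    return READER_BASE + path + (f"?{query}" if query else "")
```

For instance, flows_url(cluster="test-cluster", limit=10, start=date(2015, 7, 11), end=date(2015, 7, 14)) yields a URL ending in /clusters/test-cluster/flows/?limit=10&daterange=20150711-20150714.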

GET /ws/v2/timeline/clusters/{cluster name}/users/{user name}/flows/{flow name}/runs/ 

or

GET /ws/v2/timeline/users/{user name}/flows/{flow name}/runs/

The following query parameters are supported:

1.limit

2.createdtimestart

3.createdtimeend

4.metricstoretrieve

5.fields

6.fromid
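User and flow names may contain characters that are not safe in a URL path, so a client has to percent-encode each path segment. The sketch below shows one way to build the flow-run path with the optional cluster segment. The function names are hypothetical, and the assumption that createdtimestart and createdtimeend take epoch milliseconds comes from the Hadoop documentation, not from the list above.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def flow_runs_path(user, flow, cluster=None, **query):
    """Build the path and query string for listing the runs of one flow.

    Keyword arguments (limit, createdtimestart, createdtimeend,
    metricstoretrieve, fields, fromid) are passed through as query
    parameters; arguments set to None are dropped.
    """
    segments = ["ws", "v2", "timeline"]
    if cluster is not None:
        segments += ["clusters", cluster]
    segments += ["users", user, "flows", flow, "runs"]
    # Encode every segment so spaces and slashes in names cannot break the path.
    path = "/" + "/".join(quote(segment, safe="") for segment in segments) + "/"

    params = {key: value for key, value in query.items() if value is not None}
    return path + ("?" + urlencode(params) if params else "")
```

A call such as flow_runs_path("alice", "word count", limit=5, createdtimestart=to_epoch_millis(datetime(2016, 6, 6, tzinfo=timezone.utc))) encodes the flow name as word%20count and appends both query parameters.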

GET /ws/v2/timeline/clusters/{cluster name}/users/{user name}/flows/{flow name}/runs/{run id}

or

GET /ws/v2/timeline/users/{user name}/flows/{flow name}/runs/{run id}

The following query parameters are supported:

1.metricstoretrieve

GET /ws/v2/timeline/clusters/{cluster name}/users/{user name}/flows/{flow name}/apps

or

GET /ws/v2/timeline/users/{user name}/flows/{flow name}/apps

The following query parameters are supported:

1.limit

2.createdtimestart

3.createdtimeend

4.relatesto

5.isrelatedto

6.infofilters

7.conffilters

8.metricfilters

9.eventfilters

10.metricstoretrieve

11.confstoretrieve

12.fields

13.metricslimit

14.metricstimestart

15.metricstimeend

16.fromid

GET /ws/v2/timeline/clusters/{cluster name}/users/{user name}/flows/{flow name}/runs/{run id}/apps

or

GET /ws/v2/timeline/users/{user name}/flows/{flow name}/runs/{run id}/apps

The following query parameters are supported:

1.limit

2.createdtimestart

3.createdtimeend

4.relatesto

5.isrelatedto

6.infofilters

7.conffilters

8.metricfilters

9.eventfilters

10.metricstoretrieve

11.confstoretrieve

12.fields

13.metricslimit

14.metricstimestart

15.metricstimeend

16.fromid

GET /ws/v2/timeline/clusters/{cluster name}/apps/{app id}

or

GET /ws/v2/timeline/apps/{app id}

The following query parameters are supported:

1.userid

2.flowname

3.flowrunid

4.metricstoretrieve

5.confstoretrieve

6.fields

7.metricslimit

8.metricstimestart

9.metricstimeend
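A single-application lookup can be wrapped in a small client function. The following is a sketch under several assumptions: the base URL is a placeholder, the function names are hypothetical, and the response is assumed to be a JSON object (the Hadoop documentation shows keys such as "type", "id", and "createdtime"). The HTTP call is injected as a parameter so the URL logic can be exercised without a running reader.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def http_get_json(url):
    """Fetch a URL and decode the body as JSON (plain HTTP, no Kerberos)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def get_app(base_url, app_id, fetch=http_get_json, **query):
    """Query /ws/v2/timeline/apps/{app id} and return the decoded entity.

    Keyword arguments such as userid, flowname, flowrunid, fields, or
    metricslimit are sent as query parameters; None values are dropped.
    """
    params = {key: value for key, value in query.items() if value is not None}
    url = f"{base_url}/ws/v2/timeline/apps/{quote(app_id, safe='')}"
    if params:
        url += "?" + urlencode(params)
    return fetch(url)
```

Against a real cluster this would be called as get_app("http://<reader host>:<port>", "<app id>", fields="METRICS"); on a secured cluster the plain urlopen call would need to be replaced with a client that handles Kerberos authentication.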

GET /ws/v2/timeline/clusters/{cluster name}/apps/{app id}/entities/{entity type}

or

GET /ws/v2/timeline/apps/{app id}/entities/{entity type}

The following query parameters are supported:

1.userid

2.flowname

3.flowrunid

4.limit

5.createdtimestart

6.createdtimeend

7.relatesto

8.isrelatedto

9.infofilters

10.conffilters

11.metricfilters

12.eventfilters

13.metricstoretrieve

14.confstoretrieve

15.fields

16.metricslimit

17.metricstimestart

18.metricstimeend

19.fromid
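The limit and fromid parameters together allow paging through a large result set. Because fromid is inclusive, each follow-up page repeats the last entity of the previous page, and a client has to skip it. The sketch below shows that loop. It assumes that each returned entity carries its paging key under info["FROM_ID"], as described in the Hadoop documentation; fetch_page is a hypothetical callable that performs the actual request.

```python
def iter_entities(fetch_page, page_size=100):
    """Yield all entities by following fromid from page to page.

    fetch_page(limit, fromid) must return a list of entity dicts, where
    fromid is None for the first page.
    """
    if page_size < 2:
        # With a page size of 1 every follow-up page would contain only the
        # repeated entity, so the loop could never advance.
        raise ValueError("page_size must be at least 2")

    fromid = None
    while True:
        page = fetch_page(limit=page_size, fromid=fromid)
        # fromid is inclusive: drop the first item on follow-up pages.
        fresh = page[1:] if fromid is not None else page
        for entity in fresh:
            yield entity
        # A short page means the server has nothing more to return.
        if len(page) < page_size or not fresh:
            return
        fromid = page[-1]["info"]["FROM_ID"]
```

In practice fetch_page would build the entities URL with the limit and fromid query parameters and decode the JSON response; keeping it as a parameter makes the paging logic easy to test in isolation.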

GET /ws/v2/timeline/clusters/{cluster name}/users/{userid}/entities/{entitytype}

or

GET /ws/v2/timeline/users/{user id}/entities/{entitytype}

The following query parameters are supported:

1.limit

2.createdtimestart

3.createdtimeend

4.relatesto

5.isrelatedto

6.infofilters

7.conffilters

8.metricfilters

9.eventfilters

10.metricstoretrieve

11.confstoretrieve

12.fields

13.metricslimit

14.metricstimestart

15.metricstimeend

16.fromid

GET /ws/v2/timeline/clusters/{cluster name}/apps/{app id}/entities/{entity type}/{entity id}

or

GET /ws/v2/timeline/apps/{app id}/entities/{entity type}/{entity id}

The following query parameters are supported:

1.userid

2.flowname

3.flowrunid

4.metricstoretrieve

5.confstoretrieve

6.fields

7.metricslimit

8.metricstimestart

9.metricstimeend

10.entityidprefix

GET /ws/v2/timeline/clusters/{cluster name}/users/{user id}/entities/{entitytype}/{entityid}

or

GET /ws/v2/timeline/users/{user id}/entities/{entitytype}/{entityid}

The following query parameters are supported:

1.metricstoretrieve

2.confstoretrieve

3.fields

4.metricslimit

5.metricstimestart

6.metricstimeend

7.fromid

GET /ws/v2/timeline/apps/{app id}/entity-types

or

GET /ws/v2/timeline/clusters/{cluster name}/apps/{app id}/entity-types

The following query parameters are supported:

1.userid

2.flowname

3.flowrunid

 
