Big Data Platform Monitoring Metrics: A Summary

hadoop metrics2

What is monitored (metrics contexts):
1. yarn
2. jvm
3. rpc
4. rpcdetailed
5. metricssystem
6. mapred
7. dfs
8. ugi

Provided out of the box:

Source : org.apache.hadoop.metrics2.source.JvmMetrics, org.apache.hadoop.metrics2.source.JvmMetricsInfo
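
The jvm context covers process-level values such as heap usage and thread counts. As a rough illustration of where such numbers come from, here is a minimal sketch that reads comparable values from the standard java.lang.management beans. The class JvmSnapshot and the key names are hypothetical and are not the names JvmMetrics publishes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: collects a few JVM values by hand.
class JvmSnapshot {
    static final double MB = 1024.0 * 1024.0;

    // Returns a small map of JVM readings taken at call time.
    static Map<String, Double> collect() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        MemoryUsage nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
        Map<String, Double> m = new LinkedHashMap<>();
        m.put("heapUsedMB", heap.getUsed() / MB);
        m.put("heapCommittedMB", heap.getCommitted() / MB);
        m.put("nonHeapUsedMB", nonHeap.getUsed() / MB);
        m.put("threadCount", (double) ManagementFactory.getThreadMXBean().getThreadCount());
        return m;
    }

    public static void main(String[] args) {
        collect().forEach((k, v) -> System.out.printf("%s=%.2f%n", k, v));
    }
}
```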

Other related sources

FSOpDurations : org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSOpDurations

fairscheduler-op-durations context :

  @Metric("Duration for a continuous scheduling run")  MutableRate continuousSchedulingRun;
  @Metric("Duration to handle a node update")  MutableRate nodeUpdateCall;
  @Metric("Duration for a update thread run")  MutableRate updateThreadRun;
  @Metric("Duration for an update call")  MutableRate updateCall;
  @Metric("Duration for a preempt call") MutableRate preemptCall;
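
Each MutableRate field above tracks how often an operation ran and how long it took on average; in metrics2 output this typically appears as a pair such as NodeUpdateCallNumOps and NodeUpdateCallAvgTime. The following is a simplified, dependency-free model of that idea, not Hadoop's implementation. RateSketch and its methods are hypothetical names, and it averages over all samples rather than per reporting interval.

```java
// Hypothetical model of a rate metric: a sample count plus an average duration.
class RateSketch {
    private long numOps;       // number of recorded operations
    private double totalTime;  // sum of recorded durations in milliseconds

    // Record one completed operation and how long it took.
    synchronized void add(long elapsedMillis) {
        numOps++;
        totalTime += elapsedMillis;
    }

    synchronized long numOps() {
        return numOps;
    }

    // Average duration over all samples; 0.0 when nothing was recorded.
    synchronized double avgTime() {
        return numOps == 0 ? 0.0 : totalTime / numOps;
    }

    public static void main(String[] args) {
        RateSketch nodeUpdateCall = new RateSketch();
        nodeUpdateCall.add(10);
        nodeUpdateCall.add(30);
        System.out.println("NumOps=" + nodeUpdateCall.numOps()
                + " AvgTime=" + nodeUpdateCall.avgTime());
    }
}
```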

QueueMetrics : org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics

yarn context :

  @Metric("# of apps submitted") MutableCounterInt appsSubmitted;
  @Metric("# of running apps") MutableGaugeInt appsRunning;
  @Metric("# of pending apps") MutableGaugeInt appsPending;
  @Metric("# of apps completed") MutableCounterInt appsCompleted;
  @Metric("# of apps killed") MutableCounterInt appsKilled;
  @Metric("# of apps failed") MutableCounterInt appsFailed;
  @Metric("Allocated memory in MB") MutableGaugeInt allocatedMB;
  @Metric("Allocated CPU in virtual cores") MutableGaugeInt allocatedVCores;
  @Metric("# of allocated containers") MutableGaugeInt allocatedContainers;
  @Metric("Aggregate # of allocated containers") MutableCounterLong aggregateContainersAllocated;
  @Metric("Aggregate # of released containers") MutableCounterLong aggregateContainersReleased;
  @Metric("Available memory in MB") MutableGaugeInt availableMB;
  @Metric("Available CPU in virtual cores") MutableGaugeInt availableVCores;
  @Metric("Pending memory allocation in MB") MutableGaugeInt pendingMB;
  @Metric("Pending CPU allocation in virtual cores") MutableGaugeInt pendingVCores;
  @Metric("# of pending containers") MutableGaugeInt pendingContainers;
  @Metric("# of reserved memory in MB") MutableGaugeInt reservedMB;
  @Metric("Reserved CPU in virtual cores") MutableGaugeInt reservedVCores;
  @Metric("# of reserved containers") MutableGaugeInt reservedContainers;
  @Metric("# of active users") MutableGaugeInt activeUsers;
  @Metric("# of active applications") MutableGaugeInt activeApplications;
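
Gauges like allocatedMB and availableMB are often combined into a ratio for dashboards or alerts. The sketch below shows one such calculation. It assumes that allocated plus available memory approximates the total capacity seen by the queue, which is an assumption of this example and not something the metric definitions state. QueueUtilization is a hypothetical helper.

```java
// Hypothetical helper: derives a utilization ratio from two QueueMetrics gauges.
class QueueUtilization {
    // Fraction of memory in use, assuming total = allocated + available.
    static double memoryUtilization(long allocatedMB, long availableMB) {
        long total = allocatedMB + availableMB;
        return total == 0 ? 0.0 : (double) allocatedMB / total;
    }

    public static void main(String[] args) {
        // Example values only; real values would come from a sink or JMX.
        System.out.println(memoryUtilization(6144, 2048));
    }
}
```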

FSQueueMetrics : org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics

yarn context :

  @Metric("Fair share of memory in MB") MutableGaugeInt fairShareMB;
  @Metric("Fair share of CPU in vcores") MutableGaugeInt fairShareVCores;
  @Metric("Steady fair share of memory in MB") MutableGaugeInt steadyFairShareMB;
  @Metric("Steady fair share of CPU in vcores") MutableGaugeInt steadyFairShareVCores;
  @Metric("Minimum share of memory in MB") MutableGaugeInt minShareMB;
  @Metric("Minimum share of CPU in vcores") MutableGaugeInt minShareVCores;
  @Metric("Maximum share of memory in MB") MutableGaugeInt maxShareMB;
  @Metric("Maximum share of CPU in vcores") MutableGaugeInt maxShareVCores;

MetricsSystemImpl : org.apache.hadoop.metrics2.impl.MetricsSystemImpl

metricssystem context :

  @Metric({"Snapshot", "Snapshot stats"}) MutableStat snapshotStat;
  @Metric({"Publish", "Publishing stats"}) MutableStat publishStat;
  @Metric("Dropped updates by all sinks") MutableCounterLong droppedPubAll;

Sink : org.apache.hadoop.metrics2.sink.GraphiteSink, org.apache.hadoop.metrics2.sink.FileSink, and org.apache.hadoop.metrics2.sink.ganglia.AbstractGangliaSink

metricsSystem : org.apache.hadoop.metrics2.lib.DefaultMetricsSystem

To implement your own, implement or extend:

  1. Source : org.apache.hadoop.metrics2.MetricsSource
  2. Sink : org.apache.hadoop.metrics2.MetricsSink
  3. MetricsSystem : org.apache.hadoop.metrics2.MetricsSystem
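
The three types above divide the work: sources produce values, the metrics system periodically snapshots them, and sinks receive the resulting records. The sketch below models only that flow with plain Java so it runs without Hadoop on the classpath. The nested Source and Sink interfaces are hypothetical stand-ins; the real MetricsSource.getMetrics takes a MetricsCollector and a boolean, and the real MetricsSink.putMetrics takes a MetricsRecord.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the source -> system -> sink flow.
class MetricsPipelineSketch {
    // Stand-in for a source: fills a record with its current values.
    interface Source { void getMetrics(Map<String, Number> record); }

    // Stand-in for a sink: receives one finished record per source.
    interface Sink { void putMetrics(String sourceName, Map<String, Number> record); }

    private final Map<String, Source> sources = new LinkedHashMap<>();
    private final List<Sink> sinks = new ArrayList<>();

    void register(String name, Source source) { sources.put(name, source); }
    void register(Sink sink) { sinks.add(sink); }

    // One publish cycle: snapshot every source, hand each record to every sink.
    void publishOnce() {
        for (Map.Entry<String, Source> e : sources.entrySet()) {
            Map<String, Number> record = new LinkedHashMap<>();
            e.getValue().getMetrics(record);
            for (Sink sink : sinks) {
                sink.putMetrics(e.getKey(), record);
            }
        }
    }

    public static void main(String[] args) {
        MetricsPipelineSketch system = new MetricsPipelineSketch();
        system.register("demo", record -> record.put("requests", 42));
        system.register((name, record) -> System.out.println(name + " " + record));
        system.publishOnce();
    }
}
```

In real code the periodic scheduling, filtering, and buffering are handled by the metrics system itself, so a custom source or sink only needs to fill or consume records.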

Usage:

Configure sources and sinks in $HADOOP_HOME/etc/hadoop/hadoop-metrics2.properties.
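
As an illustration, a minimal fragment of that file which sends metrics to FileSink might look like the following. The output file names and the 10-second period are example values, and the prefixes (namenode, resourcemanager) must match the daemons actually running.

```properties
# Syntax: [prefix].[source|sink].[instance].[option]
# Register FileSink under the instance name "file" for all prefixes
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
# Sampling period in seconds
*.period=10
# Per-daemon output files (example names)
namenode.sink.file.filename=namenode-metrics.out
resourcemanager.sink.file.filename=resourcemanager-metrics.out
```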

spark metrics

Metrics are mainly available for the following instances:
1. master
2. applications
3. worker
4. executor
5. driver

Built-in sinks:

ConsoleSink : org.apache.spark.metrics.sink.ConsoleSink

CSVSink : org.apache.spark.metrics.sink.CSVSink

JmxSink : org.apache.spark.metrics.sink.JmxSink

MetricsServlet : org.apache.spark.metrics.sink.MetricsServlet

GraphiteSink : org.apache.spark.metrics.sink.GraphiteSink

Slf4jSink : org.apache.spark.metrics.sink.Slf4jSink
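
Sinks are enabled through Spark's metrics configuration file, usually $SPARK_HOME/conf/metrics.properties. A minimal sketch that turns on ConsoleSink for every instance and JvmSource for the master only is shown here; the period of 10 seconds is an example value.

```properties
# Syntax: [instance].sink|source.[name].[options]=[value]
# Print metrics of all instances to the console every 10 seconds
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds
# Enable the JVM source for the master instance only
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
```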

Built-in sources:

JvmSource : org.apache.spark.metrics.source.JvmSource

ApplicationSource : org.apache.spark.deploy.master.ApplicationSource

  1. status
  2. runtime_ms
  3. cores

BlockManagerSource : org.apache.spark.storage.BlockManagerSource

  1. maxMem_MB
  2. remainingMem_MB
  3. memUsed_MB
  4. diskSpaceUsed_MB

DAGSchedulerSource : org.apache.spark.scheduler.DAGSchedulerSource

  1. failedStages
  2. runningStages
  3. waitingStages
  4. allJobs
  5. activeJobs

ExecutorAllocationManagerSource : org.apache.spark.ExecutorAllocationManagerSource

  1. numberExecutorsToAdd
  2. numberExecutorsPendingToRemove
  3. numberAllExecutors
  4. numberTargetExecutors
  5. numberMaxNeededExecutors

ExecutorSource : org.apache.spark.executor.ExecutorSource

  1. activeTasks
  2. completeTasks
  3. currentPool_size
  4. maxPool_size
  5. read_bytes
  6. write_bytes
  7. read_ops
  8. largeRead_ops
  9. write_ops

MasterSource : org.apache.spark.deploy.master.MasterSource

  1. workers
  2. aliveWorkers
  3. apps
  4. waitingApps

MesosClusterSchedulerSource : org.apache.spark.scheduler.cluster.mesos.MesosClusterSchedulerSource

  1. waitingDrivers
  2. launchedDrivers
  3. retryDrivers

WorkerSource : org.apache.spark.deploy.worker.WorkerSource

  1. executors
  2. coresUsed
  3. memUsed_MB
  4. coresFree
  5. memFree_MB
