Using the Uber JVM Profiler

Background

Uber JVM Profiler collects JVM metrics such as CPU, memory, I/O, and GC information so that JVMs in a distributed environment can be monitored.

Installation

With Maven and JDK 8 or later installed, build the project with mvn clean package.

Java application

  • Overview

    Attach the profiler as a Java agent; no other deployment step is needed.

  • Usage

    java -javaagent:jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.KafkaOutputReporter,brokerList='kafka1:9092',topicPrefix=demo_,tag=tag-demo,metricInterval=5000,sampleInterval=0 -cp target/jvm-profiler-1.0.0.jar 
  • Options

    Option          Description
    reporter        Reporter class; com.uber.profiling.reporters.KafkaOutputReporter is a reasonable choice here.
    brokerList      Used when reporter is KafkaOutputReporter: the Kafka broker list, comma-separated.
    topicPrefix     Used when reporter is KafkaOutputReporter: the prefix of the Kafka topic names.
    tag             Value reported under the key tag in every metric sent to the reporter.
    metricInterval  How often metrics are reported, in milliseconds; set it according to your needs.
    sampleInterval  How often JVM stack trace metrics are reported, in milliseconds; set it according to your needs.
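
To make the option string after -javaagent:...= more concrete, here is a minimal sketch of how a key=value,key=value string could be split into options. The class name AgentArgsSketch and the naive comma split are assumptions made for illustration; the profiler's own Arguments class may behave differently, and this sketch does not handle values that themselves contain commas.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AgentArgsSketch {
    /** Splits "k1=v1,k2=v2" into an ordered map; entries without '=' are skipped. */
    static Map<String, String> parse(String agentArgs) {
        Map<String, String> options = new LinkedHashMap<>();
        if (agentArgs == null || agentArgs.isEmpty()) {
            return options;
        }
        for (String pair : agentArgs.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // not a key=value pair
            }
            options.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = parse(
            "reporter=com.uber.profiling.reporters.KafkaOutputReporter,"
                + "brokerList=kafka1:9092,topicPrefix=demo_,tag=tag-demo,"
                + "metricInterval=5000,sampleInterval=0");
        options.forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

Numeric options such as metricInterval arrive as strings, so a caller would still convert them, for example with Long.parseLong.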
  • Sample output

    {
      "nonHeapMemoryTotalUsed": 11890584.0,
      "bufferPools": [
          {
              "totalCapacity": 0,
              "name": "direct",
              "count": 0,
              "memoryUsed": 0
          },
          {
              "totalCapacity": 0,
              "name": "mapped",
              "count": 0,
              "memoryUsed": 0
          }
      ],
      "heapMemoryTotalUsed": 24330736.0,
      "epochMillis": 1515627003374,
      "nonHeapMemoryCommitted": 13565952.0,
      "heapMemoryCommitted": 257425408.0,
      "memoryPools": [
          {
              "peakUsageMax": 251658240,
              "usageMax": 251658240,
              "peakUsageUsed": 1194496,
              "name": "Code Cache",
              "peakUsageCommitted": 2555904,
              "usageUsed": 1173504,
              "type": "Non-heap memory",
              "usageCommitted": 2555904
          },
          {
              "peakUsageMax": -1,
              "usageMax": -1,
              "peakUsageUsed": 9622920,
              "name": "Metaspace",
              "peakUsageCommitted": 9830400,
              "usageUsed": 9622920,
              "type": "Non-heap memory",
              "usageCommitted": 9830400
          },
          {
              "peakUsageMax": 1073741824,
              "usageMax": 1073741824,
              "peakUsageUsed": 1094160,
              "name": "Compressed Class Space",
              "peakUsageCommitted": 1179648,
              "usageUsed": 1094160,
              "type": "Non-heap memory",
              "usageCommitted": 1179648
          },
          {
              "peakUsageMax": 1409286144,
              "usageMax": 1409286144,
              "peakUsageUsed": 24330736,
              "name": "PS Eden Space",
              "peakUsageCommitted": 67108864,
              "usageUsed": 24330736,
              "type": "Heap memory",
              "usageCommitted": 67108864
          },
          {
              "peakUsageMax": 11010048,
              "usageMax": 11010048,
              "peakUsageUsed": 0,
              "name": "PS Survivor Space",
              "peakUsageCommitted": 11010048,
              "usageUsed": 0,
              "type": "Heap memory",
              "usageCommitted": 11010048
          },
          {
              "peakUsageMax": 2863661056,
              "usageMax": 2863661056,
              "peakUsageUsed": 0,
              "name": "PS Old Gen",
              "peakUsageCommitted": 179306496,
              "usageUsed": 0,
              "type": "Heap memory",
              "usageCommitted": 179306496
          }
      ],
      "processCpuLoad": 0.0008024004394748531,
      "systemCpuLoad": 0.23138430784607697,
      "processCpuTime": 496918000,
      "appId": null,
      "name": "24103@machine01",
      "host": "machine01",
      "processUuid": "3c2ec835-749d-45ea-a7ec-e4b9fe17c23a",
      "tag": "mytag",
      "gc": [
          {
              "collectionTime": 0,
              "name": "PS Scavenge",
              "collectionCount": 0
          },
          {
              "collectionTime": 0,
              "name": "PS MarkSweep",
              "collectionCount": 0
          }
      ]
    }

Spark application

  • Overview

    Unlike a plain Java application, a Spark job needs jvm-profiler.jar distributed to every node.

  • Usage

         --jars hdfs:///public/libs/jvm-profiler-1.0.0.jar   
         --conf spark.driver.extraJavaOptions=-javaagent:jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.KafkaOutputReporter,brokerList='kafka1:9092',topicPrefix=demo_,tag=tag-demo,metricInterval=5000,sampleInterval=0 
         --conf spark.executor.extraJavaOptions=-javaagent:jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.KafkaOutputReporter,brokerList='kafka1:9092',topicPrefix=demo_,tag=tag-demo,metricInterval=5000,sampleInterval=0
  • Options

    Option          Description
    reporter        Reporter class; com.uber.profiling.reporters.KafkaOutputReporter is a reasonable choice here.
    brokerList      Used when reporter is KafkaOutputReporter: the Kafka broker list, comma-separated.
    topicPrefix     Used when reporter is KafkaOutputReporter: the prefix of the Kafka topic names.
    tag             Value reported under the key tag in every metric sent to the reporter.
    metricInterval  How often metrics are reported, in milliseconds; set it according to your needs.
    sampleInterval  How often JVM stack trace metrics are reported, in milliseconds; set it according to your needs.
  • Sample output

      {
        "nonHeapMemoryTotalUsed": 11890584.0,
        "bufferPools": [
            {
                "totalCapacity": 0,
                "name": "direct",
                "count": 0,
                "memoryUsed": 0
            },
            {
                "totalCapacity": 0,
                "name": "mapped",
                "count": 0,
                "memoryUsed": 0
            }
        ],
        "heapMemoryTotalUsed": 24330736.0,
        "epochMillis": 1515627003374,
        "nonHeapMemoryCommitted": 13565952.0,
        "heapMemoryCommitted": 257425408.0,
        "memoryPools": [
            {
                "peakUsageMax": 251658240,
                "usageMax": 251658240,
                "peakUsageUsed": 1194496,
                "name": "Code Cache",
                "peakUsageCommitted": 2555904,
                "usageUsed": 1173504,
                "type": "Non-heap memory",
                "usageCommitted": 2555904
            },
            {
                "peakUsageMax": -1,
                "usageMax": -1,
                "peakUsageUsed": 9622920,
                "name": "Metaspace",
                "peakUsageCommitted": 9830400,
                "usageUsed": 9622920,
                "type": "Non-heap memory",
                "usageCommitted": 9830400
            },
            {
                "peakUsageMax": 1073741824,
                "usageMax": 1073741824,
                "peakUsageUsed": 1094160,
                "name": "Compressed Class Space",
                "peakUsageCommitted": 1179648,
                "usageUsed": 1094160,
                "type": "Non-heap memory",
                "usageCommitted": 1179648
            },
            {
                "peakUsageMax": 1409286144,
                "usageMax": 1409286144,
                "peakUsageUsed": 24330736,
                "name": "PS Eden Space",
                "peakUsageCommitted": 67108864,
                "usageUsed": 24330736,
                "type": "Heap memory",
                "usageCommitted": 67108864
            },
            {
                "peakUsageMax": 11010048,
                "usageMax": 11010048,
                "peakUsageUsed": 0,
                "name": "PS Survivor Space",
                "peakUsageCommitted": 11010048,
                "usageUsed": 0,
                "type": "Heap memory",
                "usageCommitted": 11010048
            },
            {
                "peakUsageMax": 2863661056,
                "usageMax": 2863661056,
                "peakUsageUsed": 0,
                "name": "PS Old Gen",
                "peakUsageCommitted": 179306496,
                "usageUsed": 0,
                "type": "Heap memory",
                "usageCommitted": 179306496
            }
        ],
        "processCpuLoad": 0.0008024004394748531,
        "systemCpuLoad": 0.23138430784607697,
        "processCpuTime": 496918000,
        "appId": null,
        "name": "24103@machine01",
        "host": "machine01",
        "processUuid": "3c2ec835-749d-45ea-a7ec-e4b9fe17c23a",
        "tag": "mytag",
        "gc": [
            {
                "collectionTime": 0,
                "name": "PS Scavenge",
                "collectionCount": 0
            },
            {
                "collectionTime": 0,
                "name": "PS MarkSweep",
                "collectionCount": 0
            }
        ]
      }

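Analysis

  • Available reporters

    Reporter                Description
    ConsoleOutputReporter   The default reporter; mostly used for debugging.
    FileOutputReporter      File-based reporter; not suitable for distributed environments. Requires outputDir.
    KafkaOutputReporter     Kafka-based reporter; the most common choice in production. Requires brokerList and topicPrefix.
    GraphiteOutputReporter  Graphite-based reporter; requires settings such as graphite.host.
    RedisOutputReporter     Redis-based reporter; build with mvn -P redis clean package.
    InfluxDBOutputReporter  InfluxDB-based reporter; build with mvn -P influxdb clean package. Requires settings such as influxdb.host.

    KafkaOutputReporter is recommended for production: it is flexible to operate, and the metrics can be displayed with ClickHouse and Grafana.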
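
As a rough mental model of what any of these reporters does, the sketch below shows a reporter that receives a profiler name plus a map of metrics and stores one line per report. SketchReporter and InMemoryReporter are hypothetical names invented for this illustration, and the method shapes are assumptions; consult the project's own Reporter interface before writing a real one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReporterSketch {
    /** Hypothetical reporter shape: accept metrics, release resources on close. */
    interface SketchReporter extends AutoCloseable {
        void report(String profilerName, Map<String, Object> metrics);

        @Override
        void close();
    }

    /** Keeps reported lines in memory; a real reporter would write to Kafka, a file, etc. */
    static final class InMemoryReporter implements SketchReporter {
        final List<String> lines = new ArrayList<>();
        private boolean closed;

        @Override
        public void report(String profilerName, Map<String, Object> metrics) {
            if (closed) {
                throw new IllegalStateException("reporter is closed");
            }
            lines.add(profilerName + " " + metrics);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> metrics = new LinkedHashMap<>();
        metrics.put("tag", "tag-demo");
        metrics.put("heapMemoryTotalUsed", 24330736L);
        try (InMemoryReporter reporter = new InMemoryReporter()) {
            reporter.report("CpuAndMemory", metrics);
            reporter.lines.forEach(System.out::println);
        }
    }
}
```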
  • Source code walkthrough

    jvm-profiler is built as a Java agent. The project's pom file sets both Premain-Class and Agent-Class in MANIFEST.MF to com.uber.profiling.Agent.
    The actual work is done by the AgentImpl class.
    Its run method is examined here:

    public void run(Arguments arguments, Instrumentation instrumentation, Collection<AutoCloseable> objectsToCloseOnShutdown) {
        // noop mode: the agent is loaded but does nothing
        if (arguments.isNoop()) {
            logger.info("Agent noop is true, do not run anything");
            return;
        }

        Reporter reporter = arguments.getReporter();

        // Identifies this JVM process in every reported metric
        String processUuid = UUID.randomUUID().toString();

        // Resolve appId from the configured environment variable first
        String appId = null;

        String appIdVariable = arguments.getAppIdVariable();
        if (appIdVariable != null && !appIdVariable.isEmpty()) {
            appId = System.getenv(appIdVariable);
        }

        // Otherwise try to probe the Spark application id
        if (appId == null || appId.isEmpty()) {
            appId = SparkUtils.probeAppId(arguments.getAppIdRegex());
        }

        // Register the bytecode transformer only when method profiling is requested
        if (!arguments.getDurationProfiling().isEmpty()
                || !arguments.getArgumentProfiling().isEmpty()) {
            instrumentation.addTransformer(new JavaAgentFileTransformer(arguments.getDurationProfiling(), arguments.getArgumentProfiling()));
        }

        List<Profiler> profilers = createProfilers(reporter, arguments, processUuid, appId);

        ProfilerGroup profilerGroup = startProfilers(profilers);

        // On JVM shutdown, handle the periodic profilers, the reporter, and the extra closeables
        Thread shutdownHook = new Thread(new ShutdownHookRunner(profilerGroup.getPeriodicProfilers(), Arrays.asList(reporter), objectsToCloseOnShutdown));
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }
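
For readers new to Java agents, the following minimal sketch shows the two entry points that the Premain-Class and Agent-Class manifest attributes refer to. PremainSketch and its lastArgs field are hypothetical names used only for illustration; this is not the profiler's Agent class, just the general shape that the JVM expects from any agent.

```java
import java.lang.instrument.Instrumentation;

final class PremainSketch {
    // Recorded only so that the effect of the call can be observed in this example
    static volatile String lastArgs;

    /** Invoked by the JVM before main() when started with -javaagent:<jar>=<args>. */
    public static void premain(String agentArgs, Instrumentation instrumentation) {
        start(agentArgs, instrumentation);
    }

    /** Invoked when the agent is attached to an already running JVM. */
    public static void agentmain(String agentArgs, Instrumentation instrumentation) {
        start(agentArgs, instrumentation);
    }

    private static void start(String agentArgs, Instrumentation instrumentation) {
        // A real agent would parse agentArgs and hand the Instrumentation to its implementation
        lastArgs = agentArgs;
        System.out.println("agent started with args: " + agentArgs);
    }
}
```

Everything after the first = in -javaagent:jvm-profiler-1.0.0.jar=... reaches the agent as the single agentArgs string.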

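1. Among the profilers created by createProfilers, CpuAndMemoryProfiler, ThreadInfoProfiler, and ProcessInfoProfiler read their data from JMX; ProcessInfoProfiler also reads data from /pro.
2. If durationProfiling, argumentProfiling, sampleInterval, or ioProfiling is set, the corresponding profiler is added: MethodDurationProfiler (reports the time spent in method calls), MethodArgumentProfiler (reports method argument values), StacktraceReporterProfiler, and IOProfiler.
3. MethodArgumentProfiler and MethodDurationProfiler rely on Javassist, a third-party bytecode manipulation library, to rewrite the target classes; see JavaAgentFileTransformer for the implementation.
4. StacktraceReporterProfiler reads its data from JMX.
5. IOProfiler reads its data from the directory under /pro on the local machine.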
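
Reading metrics from JMX, as the profilers in items 1 and 4 do, can be sketched with the standard java.lang.management API. JmxSnapshotSketch is a hypothetical class written for this post; the field names mirror the sample output shown earlier, but the code is an illustration and not the profiler's implementation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class JmxSnapshotSketch {
    /** Collects a few heap, memory pool, and GC figures from the platform MXBeans. */
    static Map<String, Object> snapshot() {
        Map<String, Object> metrics = new LinkedHashMap<>();
        metrics.put("epochMillis", System.currentTimeMillis());

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        metrics.put("heapMemoryTotalUsed", heap.getUsed());
        metrics.put("heapMemoryCommitted", heap.getCommitted());

        List<Map<String, Object>> pools = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("name", pool.getName());
            entry.put("type", pool.getType().toString());
            entry.put("usageUsed", usage.getUsed());
            entry.put("usageCommitted", usage.getCommitted());
            pools.add(entry);
        }
        metrics.put("memoryPools", pools);

        List<Map<String, Object>> collectors = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("name", gc.getName());
            entry.put("collectionCount", gc.getCollectionCount());
            entry.put("collectionTime", gc.getCollectionTime());
            collectors.add(entry);
        }
        metrics.put("gc", collectors);
        return metrics;
    }

    public static void main(String[] args) {
        snapshot().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The pool and collector names in the output depend on the JVM and the garbage collector in use, so they will not necessarily match the PS Eden Space or PS Scavenge entries in the sample.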

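Profilers are further divided into oneTimeProfilers and periodicProfilers. ProcessInfoProfiler is a one-time profiler: process information does not change while the process is running, so it does not need to be reported periodically.
That completes the whole flow.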
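
The split between one-time and periodic profilers can be sketched with a ScheduledExecutorService. SketchProfiler, the of factory, and the rule that an interval of zero or less means "run once" are all assumptions made for this illustration; the project's own ProfilerGroup and startProfilers may be organized differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class ProfilerSchedulingSketch {
    /** Hypothetical profiler shape: an interval <= 0 means "run once". */
    interface SketchProfiler {
        long intervalMillis();

        void profile();
    }

    static SketchProfiler of(long intervalMillis, Runnable action) {
        return new SketchProfiler() {
            @Override
            public long intervalMillis() {
                return intervalMillis;
            }

            @Override
            public void profile() {
                action.run();
            }
        };
    }

    /** Runs one-time profilers immediately and schedules the rest at a fixed rate. */
    static ScheduledExecutorService start(List<SketchProfiler> profilers) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "profiler-sketch");
            thread.setDaemon(true); // do not keep the JVM alive just for profiling
            return thread;
        });
        for (SketchProfiler profiler : profilers) {
            if (profiler.intervalMillis() <= 0) {
                profiler.profile();
            } else {
                scheduler.scheduleAtFixedRate(profiler::profile, 0, profiler.intervalMillis(), TimeUnit.MILLISECONDS);
            }
        }
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = start(Arrays.asList(
            of(0, () -> System.out.println("process info (once)")),
            of(100, () -> System.out.println("cpu and memory (periodic)"))));
        Thread.sleep(350);
        scheduler.shutdownNow();
    }
}
```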
