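Background
Uber JVM Profiler collects JVM metrics, such as CPU, memory, I/O, and GC information, for monitoring in distributed environments.
Installation
With Maven and JDK 8 or later installed, run mvn clean package.
Java application
- Description
Deploy it as a Java agent and it is ready to use.
Usage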
java -javaagent:jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.KafkaOutputReporter,brokerList='kafka1:9092',topicPrefix=demo_,tag=tag-demo,metricInterval=5000,sampleInterval=0 -cp target/jvm-profiler-1.0.0.jar
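The text after the first = in the -javaagent option is handed to the agent as one string of comma-separated key=value pairs. Below is a minimal sketch of how such a string can be split into options. AgentArgs is a hypothetical helper, not the profiler's own parser, and it does not handle values that themselves contain commas.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of jvm-profiler.
class AgentArgs {
    // Splits "k1=v1,k2=v2" into an insertion-ordered map. Only the first '='
    // in each pair separates key from value, so values may contain '='.
    static Map<String, String> parse(String args) {
        Map<String, String> options = new LinkedHashMap<>();
        if (args == null || args.isEmpty()) {
            return options;
        }
        for (String pair : args.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip malformed entries that have no key
            }
            options.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return options;
    }

    public static void main(String[] argv) {
        Map<String, String> options = parse(
            "reporter=com.uber.profiling.reporters.KafkaOutputReporter,"
            + "brokerList=kafka1:9092,topicPrefix=demo_,tag=tag-demo,"
            + "metricInterval=5000,sampleInterval=0");
        options.forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

The single quotes around kafka1:9092 in the command are consumed by the shell, so the agent sees the bare value.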
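Options
Parameter | Description |
---|---|
reporter | The reporter class; com.uber.profiling.reporters.KafkaOutputReporter can be used here directly |
brokerList | When the reporter is com.uber.profiling.reporters.KafkaOutputReporter, the list of Kafka brokers, separated by commas |
topicPrefix | When the reporter is com.uber.profiling.reporters.KafkaOutputReporter, the prefix of the Kafka topic names |
tag | A metric whose key is tag; it is written to the reporter output |
metricInterval | How often metrics are reported, in ms; set it according to your needs |
sampleInterval | How often JVM stack trace metrics are reported, in ms; set it according to your needs |

Sample output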
{ "nonHeapMemoryTotalUsed": 11890584.0, "bufferPools": [ { "totalCapacity": 0, "name": "direct", "count": 0, "memoryUsed": 0 }, { "totalCapacity": 0, "name": "mapped", "count": 0, "memoryUsed": 0 } ], "heapMemoryTotalUsed": 24330736.0, "epochMillis": 1515627003374, "nonHeapMemoryCommitted": 13565952.0, "heapMemoryCommitted": 257425408.0, "memoryPools": [ { "peakUsageMax": 251658240, "usageMax": 251658240, "peakUsageUsed": 1194496, "name": "Code Cache", "peakUsageCommitted": 2555904, "usageUsed": 1173504, "type": "Non-heap memory", "usageCommitted": 2555904 }, { "peakUsageMax": -1, "usageMax": -1, "peakUsageUsed": 9622920, "name": "Metaspace", "peakUsageCommitted": 9830400, "usageUsed": 9622920, "type": "Non-heap memory", "usageCommitted": 9830400 }, { "peakUsageMax": 1073741824, "usageMax": 1073741824, "peakUsageUsed": 1094160, "name": "Compressed Class Space", "peakUsageCommitted": 1179648, "usageUsed": 1094160, "type": "Non-heap memory", "usageCommitted": 1179648 }, { "peakUsageMax": 1409286144, "usageMax": 1409286144, "peakUsageUsed": 24330736, "name": "PS Eden Space", "peakUsageCommitted": 67108864, "usageUsed": 24330736, "type": "Heap memory", "usageCommitted": 67108864 }, { "peakUsageMax": 11010048, "usageMax": 11010048, "peakUsageUsed": 0, "name": "PS Survivor Space", "peakUsageCommitted": 11010048, "usageUsed": 0, "type": "Heap memory", "usageCommitted": 11010048 }, { "peakUsageMax": 2863661056, "usageMax": 2863661056, "peakUsageUsed": 0, "name": "PS Old Gen", "peakUsageCommitted": 179306496, "usageUsed": 0, "type": "Heap memory", "usageCommitted": 179306496 } ], "processCpuLoad": 0.0008024004394748531, "systemCpuLoad": 0.23138430784607697, "processCpuTime": 496918000, "appId": null, "name": "24103@machine01", "host": "machine01", "processUuid": "3c2ec835-749d-45ea-a7ec-e4b9fe17c23a", "tag": "mytag", "gc": [ { "collectionTime": 0, "name": "PS Scavenge", "collectionCount": 0 }, { "collectionTime": 0, "name": "PS MarkSweep", "collectionCount": 0 } ] }
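One way to sanity-check a record like this is to confirm that the per-pool figures add up to the totals. In the sample, the usageUsed and usageCommitted values of the heap pools sum to heapMemoryTotalUsed and heapMemoryCommitted, and the same holds for the non-heap pools. The sketch below performs that check with the numbers copied from the sample. Pool and PoolTotals are hypothetical names standing in for a parsed memoryPools entry and are not part of jvm-profiler.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for one "memoryPools" entry; illustration only.
class Pool {
    final String name;
    final String type;
    final long usageUsed;
    final long usageCommitted;

    Pool(String name, String type, long usageUsed, long usageCommitted) {
        this.name = name;
        this.type = type;
        this.usageUsed = usageUsed;
        this.usageCommitted = usageCommitted;
    }
}

class PoolTotals {
    static final String HEAP = "Heap memory";
    static final String NON_HEAP = "Non-heap memory";

    // The six pools from the sample record.
    static List<Pool> samplePools() {
        return Arrays.asList(
            new Pool("Code Cache", NON_HEAP, 1173504L, 2555904L),
            new Pool("Metaspace", NON_HEAP, 9622920L, 9830400L),
            new Pool("Compressed Class Space", NON_HEAP, 1094160L, 1179648L),
            new Pool("PS Eden Space", HEAP, 24330736L, 67108864L),
            new Pool("PS Survivor Space", HEAP, 0L, 11010048L),
            new Pool("PS Old Gen", HEAP, 0L, 179306496L));
    }

    // Sums usageUsed over all pools of the given type.
    static long usedOfType(List<Pool> pools, String type) {
        long total = 0;
        for (Pool pool : pools) {
            if (pool.type.equals(type)) {
                total += pool.usageUsed;
            }
        }
        return total;
    }

    // Sums usageCommitted over all pools of the given type.
    static long committedOfType(List<Pool> pools, String type) {
        long total = 0;
        for (Pool pool : pools) {
            if (pool.type.equals(type)) {
                total += pool.usageCommitted;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Pool> pools = samplePools();
        System.out.println("heap used      = " + usedOfType(pools, HEAP));
        System.out.println("heap committed = " + committedOfType(pools, HEAP));
        System.out.println("non-heap used      = " + usedOfType(pools, NON_HEAP));
        System.out.println("non-heap committed = " + committedOfType(pools, NON_HEAP));
    }
}
```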
Spark application
- Description
Unlike a plain Java application, jvm-profiler.jar has to be distributed to every node.
Usage
--jars hdfs:///public/libs/jvm-profiler-1.0.0.jar --conf spark.driver.extraJavaOptions=-javaagent:jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.KafkaOutputReporter,brokerList='kafka1:9092',topicPrefix=demo_,tag=tag-demo,metricInterval=5000,sampleInterval=0 --conf spark.executor.extraJavaOptions=-javaagent:jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.KafkaOutputReporter,brokerList='kafka1:9092',topicPrefix=demo_,tag=tag-demo,metricInterval=5000,sampleInterval=0
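Because the driver and executor options carry the same agent string, it can help to build that string once and reuse it. Below is a minimal sketch under that assumption. AgentOption is a hypothetical helper, and the jar name and option values are the ones from the command above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of jvm-profiler or Spark.
class AgentOption {
    // Builds "-javaagent:<jar>=k1=v1,k2=v2" from an ordered option map.
    static String build(String jar, Map<String, String> options) {
        StringJoiner joined = new StringJoiner(",");
        for (Map.Entry<String, String> entry : options.entrySet()) {
            joined.add(entry.getKey() + "=" + entry.getValue());
        }
        String suffix = joined.length() == 0 ? "" : "=" + joined;
        return "-javaagent:" + jar + suffix;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("reporter", "com.uber.profiling.reporters.KafkaOutputReporter");
        options.put("brokerList", "kafka1:9092");
        options.put("topicPrefix", "demo_");
        options.put("tag", "tag-demo");
        options.put("metricInterval", "5000");
        options.put("sampleInterval", "0");

        String agent = build("jvm-profiler-1.0.0.jar", options);
        // The same agent string is used for the driver and for the executors.
        System.out.println("--conf spark.driver.extraJavaOptions=" + agent);
        System.out.println("--conf spark.executor.extraJavaOptions=" + agent);
    }
}
```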
Options and sample output
The options and the output format are the same as in the Java application section above.
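Analysis
- Available reporters

Reporter | Description |
---|---|
ConsoleOutputReporter | The default reporter, generally used for debugging |
FileOutputReporter | File-based reporter; not suitable for distributed environments; requires outputDir |
KafkaOutputReporter | Kafka-based reporter, widely used in production; requires brokerList and topicPrefix |
GraphiteOutputReporter | Graphite-based reporter; requires graphite.host and related settings |
RedisOutputReporter | Redis-based reporter; build with mvn -P redis clean package |
InfluxDBOutputReporter | InfluxDB-based reporter; build with mvn -P influxdb clean package; requires influxdb.host and related settings |

KafkaOutputReporter is recommended for production: it is flexible to operate and can be combined with ClickHouse and Grafana to display the metrics.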
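Source code analysis
jvm-profiler is built on the Java agent mechanism. The project's pom file sets both the Premain-Class and Agent-Class entries of MANIFEST.MF to com.uber.profiling.Agent.
The actual implementation class is AgentImpl.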
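For readers new to Java agents, the two manifest entries map to two static entry points: the JVM calls premain on the Premain-Class when the process is started with -javaagent, and agentmain on the Agent-Class when the agent is attached to a running JVM. Here is a minimal sketch of such a class. MinimalAgent is a hypothetical name, and this is not the code of com.uber.profiling.Agent.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical minimal agent for illustration only.
// Declare the class public when packaging it as a real agent jar.
class MinimalAgent {
    // Remembers the last option string received, to make the sketch observable.
    static volatile String lastArgs;

    // Called by the JVM before the application's main() when started with
    // -javaagent:<jar>=<args> (the Premain-Class manifest entry).
    public static void premain(String args, Instrumentation instrumentation) {
        start(args, instrumentation);
    }

    // Called when the agent is attached to an already running JVM
    // (the Agent-Class manifest entry).
    public static void agentmain(String args, Instrumentation instrumentation) {
        start(args, instrumentation);
    }

    private static void start(String args, Instrumentation instrumentation) {
        lastArgs = args;
        // A real agent would parse args and start its work here, for example
        // by registering class transformers on the Instrumentation instance.
        System.out.println("agent started with args: " + args);
    }
}
```

Pointing both entries at the same class, as the pom does, lets a single implementation serve both startup paths.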
The run method of AgentImpl is analyzed below:

    public void run(Arguments arguments, Instrumentation instrumentation, Collection<AutoCloseable> objectsToCloseOnShutdown) {
        if (arguments.isNoop()) {
            logger.info("Agent noop is true, do not run anything");
            return;
        }

        Reporter reporter = arguments.getReporter();

        String processUuid = UUID.randomUUID().toString();

        String appId = null;
        String appIdVariable = arguments.getAppIdVariable();
        if (appIdVariable != null && !appIdVariable.isEmpty()) {
            appId = System.getenv(appIdVariable);
        }

        if (appId == null || appId.isEmpty()) {
            appId = SparkUtils.probeAppId(arguments.getAppIdRegex());
        }

        if (!arguments.getDurationProfiling().isEmpty()
                || !arguments.getArgumentProfiling().isEmpty()) {
            instrumentation.addTransformer(new JavaAgentFileTransformer(arguments.getDurationProfiling(), arguments.getArgumentProfiling()));
        }

        List<Profiler> profilers = createProfilers(reporter, arguments, processUuid, appId);

        ProfilerGroup profilerGroup = startProfilers(profilers);

        Thread shutdownHook = new Thread(new ShutdownHookRunner(profilerGroup.getPeriodicProfilers(), Arrays.asList(reporter), objectsToCloseOnShutdown));
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }

- arguments.getReporter() obtains the reporter. If no reporter is specified, the default one is used (ConsoleOutputReporter, per the table above); otherwise the specified reporter is created through reporterConstructor.
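- String appId sets the appId. As the code shows, it is first read from the environment variable named by the appIdVariable argument; if that yields nothing, SparkUtils.probeAppId is used, which for a Spark application takes the value of spark.app.id.
- List<Profiler> profilers = createProfilers(reporter, arguments, processUuid, appId) creates the profilers. By default these are CpuAndMemoryProfiler, ThreadInfoProfiler, and ProcessInfoProfiler:
1. CpuAndMemoryProfiler, ThreadInfoProfiler, and ProcessInfoProfiler read their data from JMX; ProcessInfoProfiler also reads data from /pro.
2. If durationProfiling, argumentProfiling, sampleInterval, or ioProfiling is set, the corresponding MethodDurationProfiler (reports the time spent in method calls), MethodArgumentProfiler (reports the values of method arguments), StacktraceReporterProfiler, and IOProfiler are added.
3. MethodArgumentProfiler and MethodDurationProfiler rely on javassist, a third-party bytecode manipulation library, to rewrite the target classes; see JavaAgentFileTransformer for the implementation.
4. StacktraceReporterProfiler reads its data from JMX.
5. IOProfiler reads data from the directories under /pro on the local machine.
- ProfilerGroup profilerGroup = startProfilers(profilers) starts the scheduled reporting of the profilers.
It also distinguishes oneTimeProfilers from periodicProfilers. ProcessInfoProfiler is one of the oneTimeProfilers: process information does not change while the process runs, so it does not need to be reported periodically.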
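The split between one-time and periodic profilers, and the idea of reading metrics from JMX, can be sketched with the standard library alone. The example below is an illustration under stated assumptions, not the profiler's startProfilers implementation. ProfilerSchedulingSketch and its methods are hypothetical names, and the 200 ms interval is arbitrary.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch for illustration only; not jvm-profiler code.
class ProfilerSchedulingSketch {
    // Reads the current heap usage through the JMX memory bean.
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Runs the one-time task once on the calling thread, then schedules the
    // periodic task at a fixed rate on a daemon thread.
    static ScheduledExecutorService start(Runnable oneTime, Runnable periodic, long intervalMillis) {
        oneTime.run();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "profiler-sketch");
            thread.setDaemon(true); // do not keep the JVM alive for profiling
            return thread;
        });
        scheduler.scheduleAtFixedRate(periodic, 0, intervalMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger reports = new AtomicInteger();
        ScheduledExecutorService scheduler = start(
            // Process information is stable, so it is reported once.
            () -> System.out.println("process: " + ManagementFactory.getRuntimeMXBean().getName()),
            // Heap usage changes over time, so it is reported periodically.
            () -> System.out.println("report " + reports.incrementAndGet() + " heapUsed=" + heapUsedBytes()),
            200);
        Thread.sleep(700);
        scheduler.shutdownNow();
    }
}
```

The periodic task plays the role of a profiler such as CpuAndMemoryProfiler, while the one-time task corresponds to ProcessInfoProfiler.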
This completes the whole flow.