In SparkEnv, the metrics system is an essential subcomponent; the code that creates it is as follows:
//org.apache.spark.SparkEnv
val metricsSystem = if (isDriver) {
MetricsSystem.createMetricsSystem("driver", conf, securityManager)
} else {
conf.set("spark.executor.id", executorId)
val ms = MetricsSystem.createMetricsSystem("executor", conf, securityManager)
ms.start()
ms
}
As this code shows, the metrics system is created differently depending on whether the current instance is the Driver or an Executor: the Driver creates a MetricsSystem with the instance name "driver" but does not start it immediately (it is started later, once the application ID is known), whereas an Executor first sets spark.executor.id in its SparkConf, creates a MetricsSystem with the instance name "executor", and starts it right away.
The member fields of MetricsSystem are as follows:
//org.apache.spark.metrics.MetricsSystem
private[this] val metricsConfig = new MetricsConfig(conf)
private val sinks = new mutable.ArrayBuffer[Sink]
private val sources = new mutable.ArrayBuffer[Source]
private val registry = new MetricRegistry()
private var running: Boolean = false
private var metricsServlet: Option[MetricsServlet] = None
MetricsConfig holds the configuration of the MetricsSystem and has the following two member fields:
//org.apache.spark.metrics.MetricsConfig
//the metrics properties
private[metrics] val properties = new Properties()
//sub-properties for each instance
private[metrics] var propertyCategories: mutable.HashMap[String, Properties] = null
1.1 Setting Default Properties
The setDefaultProperties method adds default metrics properties to MetricsConfig's properties:
//org.apache.spark.metrics.MetricsConfig
private def setDefaultProperties(prop: Properties) {
prop.setProperty("*.sink.servlet.class", "org.apache.spark.metrics.sink.MetricsServlet")
prop.setProperty("*.sink.servlet.path", "/metrics/json")
prop.setProperty("master.sink.servlet.path", "/metrics/master/json")
prop.setProperty("applications.sink.servlet.path", "/metrics/applications/json")
}
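These defaults give every instance a MetricsServlet sink at /metrics/json through the wildcard instance *, while master and applications override only the servlet path. The fallback behavior can be sketched in plain Scala (resolve is a hypothetical helper for illustration, not part of Spark):

```scala
import java.util.Properties

// Minimal sketch (not Spark code) of how wildcard defaults interact with
// instance-specific overrides. Keys have the layout "<instance>.<property>".
val prop = new Properties()
prop.setProperty("*.sink.servlet.class", "org.apache.spark.metrics.sink.MetricsServlet")
prop.setProperty("*.sink.servlet.path", "/metrics/json")
prop.setProperty("master.sink.servlet.path", "/metrics/master/json")

// Hypothetical helper: resolve a property for an instance, falling back to "*".
def resolve(instance: String, key: String): Option[String] =
  Option(prop.getProperty(s"$instance.$key"))
    .orElse(Option(prop.getProperty(s"*.$key")))
```

The driver thus picks up the wildcard path, while master sees its own override.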
1.2 Loading Metrics Properties from a File
The loadPropertiesFromFile method loads metrics properties from a specified file into MetricsConfig's properties:
//org.apache.spark.metrics.MetricsConfig
private[this] def loadPropertiesFromFile(path: Option[String]): Unit = {
var is: InputStream = null
try {
is = path match {
case Some(f) => new FileInputStream(f)
case None => Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_METRICS_CONF_FILENAME)
}
if (is != null) {
properties.load(is)
}
} catch {
case e: Exception =>
val file = path.getOrElse(DEFAULT_METRICS_CONF_FILENAME)
logError(s"Error loading configuration file $file", e)
} finally {
if (is != null) {
is.close()
}
}
}
As the code shows, if no metrics properties file is specified, or the specified file does not exist, the metrics properties are loaded from the file metrics.properties (the constant DEFAULT_METRICS_CONF_FILENAME) on the classpath.
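The load-with-fallback pattern can be sketched standalone; loadProps below is a simplified stand-in for loadPropertiesFromFile, and the in-memory stream plus the org.example.CsvSink value are illustrative only:

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.util.Properties

// Simplified stand-in for loadPropertiesFromFile: prefer an explicitly
// given stream, otherwise fall back to a default one, and always close it.
def loadProps(explicit: Option[InputStream], fallback: () => InputStream): Properties = {
  val props = new Properties()
  val is = explicit.getOrElse(fallback())
  if (is != null) {
    try props.load(is) finally is.close()
  }
  props
}

// Illustrative configuration content, standing in for metrics.properties.
val confText = "driver.sink.csv.class=org.example.CsvSink\n"
val loadedProps = loadProps(None, () => new ByteArrayInputStream(confText.getBytes("UTF-8")))
```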
1.3 Extracting Per-Instance Properties
For each property kv in prop, the subProperties method matches kv's key against a regular expression to extract a prefix and a suffix. The prefix names an instance; the suffix becomes the key of a new property kv2, whose value is kv's value. kv2 is then added to the Properties object kept for that instance.
//org.apache.spark.metrics.MetricsConfig
def subProperties(prop: Properties, regex: Regex): mutable.HashMap[String, Properties] = {
val subProperties = new mutable.HashMap[String, Properties]
prop.asScala.foreach { kv =>
if (regex.findPrefixOf(kv._1.toString).isDefined) {
val regex(prefix, suffix) = kv._1.toString
subProperties.getOrElseUpdate(prefix, new Properties).setProperty(suffix, kv._2.toString)
}
}
subProperties
}
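The method can be exercised outside Spark by pairing the same logic with the instance pattern; the regex below mirrors the INSTANCE_REGEX that MetricsConfig uses (^(\*|[a-zA-Z]+)\.(.+)):

```scala
import java.util.Properties
import scala.collection.JavaConverters._
import scala.collection.mutable
import scala.util.matching.Regex

// Mirrors MetricsConfig's INSTANCE_REGEX: "<instance>.<rest-of-key>".
val instanceRegex: Regex = "^(\\*|[a-zA-Z]+)\\.(.+)".r

// Standalone copy of the subProperties logic shown above.
def subProperties(prop: Properties, regex: Regex): mutable.HashMap[String, Properties] = {
  val subProps = new mutable.HashMap[String, Properties]
  prop.asScala.foreach { kv =>
    if (regex.findPrefixOf(kv._1.toString).isDefined) {
      val regex(prefix, suffix) = kv._1.toString
      subProps.getOrElseUpdate(prefix, new Properties).setProperty(suffix, kv._2.toString)
    }
  }
  subProps
}

val metricsProps = new Properties()
metricsProps.setProperty("*.sink.servlet.class", "org.apache.spark.metrics.sink.MetricsServlet")
metricsProps.setProperty("*.sink.servlet.path", "/metrics/json")
metricsProps.setProperty("master.sink.servlet.path", "/metrics/master/json")

val byInstance = subProperties(metricsProps, instanceRegex)
```

Each key is split on its first dot: "master.sink.servlet.path" becomes instance "master" with sub-key "sink.servlet.path".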
1.4 Initialization
The following code is executed while constructing MetricsSystem:
//org.apache.spark.metrics.MetricsSystem
metricsConfig.initialize()
The implementation of MetricsConfig's initialize method is as follows:
//org.apache.spark.metrics.MetricsConfig
def initialize() {
setDefaultProperties(properties)
loadPropertiesFromFile(conf.getOption("spark.metrics.conf"))
val prefix = "spark.metrics.conf."
conf.getAll.foreach {
case (k, v) if k.startsWith(prefix) =>
properties.setProperty(k.substring(prefix.length()), v)
case _ =>
}
propertyCategories = subProperties(properties, INSTANCE_REGEX)
if (propertyCategories.contains(DEFAULT_PREFIX)) {
val defaultProperty = propertyCategories(DEFAULT_PREFIX).asScala
for((inst, prop) <- propertyCategories if (inst != DEFAULT_PREFIX);
(k, v) <- defaultProperty if (prop.get(k) == null)) {
prop.put(k, v)
}
}
}
As the code shows, initialize proceeds in the following steps:
1) Add the default metrics properties via setDefaultProperties.
2) Load metrics properties via loadPropertiesFromFile, from the file named by spark.metrics.conf if it is set.
3) Copy every SparkConf entry whose key starts with spark.metrics.conf. into properties, with that prefix stripped.
4) Split properties by instance using subProperties and INSTANCE_REGEX, storing the result in propertyCategories.
5) If properties exist for the default instance *, copy each of them into every other instance that does not already define the same key.
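The default-propagation loop at the end of initialize can be sketched standalone (the org.example.CsvSink value is illustrative only):

```scala
import java.util.Properties
import scala.collection.JavaConverters._
import scala.collection.mutable

// Sketch of the final step of initialize(): every key present under "*"
// is copied to each concrete instance unless that instance already sets it.
val categories = mutable.HashMap(
  "*"      -> new Properties(),
  "driver" -> new Properties()
)
categories("*").setProperty("sink.servlet.path", "/metrics/json")
categories("driver").setProperty("sink.csv.class", "org.example.CsvSink") // illustrative sink

val defaults = categories("*").asScala
for ((inst, prop) <- categories if inst != "*";
     (k, v) <- defaults if prop.get(k) == null) {
  prop.put(k, v)
}
```

After the loop, driver inherits the servlet path while keeping its own csv entry.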
2.1 Building a Metrics Source's Registry Name
The buildRegistryName method generates the name under which a Source is registered with the MetricRegistry:
//org.apache.spark.metrics.MetricsSystem
private[spark] def buildRegistryName(source: Source): String = {
val appId = conf.getOption("spark.app.id")
val executorId = conf.getOption("spark.executor.id")
val defaultName = MetricRegistry.name(source.sourceName)
if (instance == "driver" || instance == "executor") {
if (appId.isDefined && executorId.isDefined) {
MetricRegistry.name(appId.get, executorId.get, source.sourceName)
} else {
val warningMsg = s"Using default name $defaultName for source because %s is not set."
if (appId.isEmpty) { logWarning(warningMsg.format("spark.app.id")) }
if (executorId.isEmpty) { logWarning(warningMsg.format("spark.executor.id")) }
defaultName
}
} else { defaultName }
}
buildRegistryName involves these variables: appId (the value of spark.app.id, if set), executorId (the value of spark.executor.id, if set), and defaultName (a name built from the source's sourceName alone). Its logic is as follows: when the instance is driver or executor and both spark.app.id and spark.executor.id are set, the registry name is appId.executorId.sourceName; if either property is missing, a warning is logged for each absent one and defaultName is used. For any other instance (such as master or applications), defaultName is always used.
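The naming rule can be checked with a plain-Scala sketch: metricName below mimics MetricRegistry.name, which joins non-empty name parts with dots (this is a stand-in, not the Spark implementation):

```scala
// Stand-in for MetricRegistry.name: join non-empty parts with dots.
def metricName(parts: String*): String = parts.filter(_.nonEmpty).mkString(".")

// Sketch of buildRegistryName's decision logic.
def buildRegistryName(instance: String, appId: Option[String],
                      executorId: Option[String], sourceName: String): String = {
  if ((instance == "driver" || instance == "executor") &&
      appId.isDefined && executorId.isDefined) {
    metricName(appId.get, executorId.get, sourceName)
  } else {
    metricName(sourceName) // fall back to the bare source name
  }
}
```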
2.2 Registering Metrics Sources
The registerSource method registers a metrics source with the MetricsSystem:
//org.apache.spark.metrics.MetricsSystem
def registerSource(source: Source) {
sources += source
try {
val regName = buildRegistryName(source)
registry.register(regName, source.metricRegistry)
} catch {
case e: IllegalArgumentException => logInfo("Metrics already registered", e)
}
}
2.3 Obtaining a ServletContextHandler
To integrate the metrics system with the Spark UI, that is, to make the Spark UI's web presentation one of the metrics system's sinks, we need a ServletContextHandler that can receive requests within the Spark UI and output metrics to the Spark UI's pages. MetricsServlet is one implementation of the Sink trait, but it is not itself a ServletContextHandler, so a conversion is required:
//org.apache.spark.metrics.MetricsSystem
def getServletHandlers: Array[ServletContextHandler] = {
require(running, "Can only call getServletHandlers on a running MetricsSystem")
metricsServlet.map(_.getHandlers(conf)).getOrElse(Array())
}
The code above calls MetricsServlet's getHandlers method to perform the conversion; getHandlers is implemented as follows:
//org.apache.spark.metrics.sink.MetricsServlet
def getHandlers(conf: SparkConf): Array[ServletContextHandler] = {
Array[ServletContextHandler](
createServletHandler(servletPath,
new ServletParams(request => getMetricsSnapshot(request), "text/json"), securityMgr, conf)
)
}
getHandlers in turn calls JettyUtils' createServletHandler method to create the ServletContextHandler.
SparkEnv calls MetricsSystem's start method to start the metrics system; start is implemented as follows:
//org.apache.spark.metrics.MetricsSystem
def start() {
require(!running, "Attempting to start a MetricsSystem that is already running")
running = true
StaticSources.allSources.foreach(registerSource)
registerSources()
registerSinks()
sinks.foreach(_.start)
}
As the code shows, starting the MetricsSystem involves these steps: 1) require that it is not already running and mark it as running; 2) register the static sources held by StaticSources; 3) register the sources declared in the metrics configuration via registerSources; 4) register the sinks declared in the metrics configuration via registerSinks; 5) start every registered sink. StaticSources currently contains only CodegenMetrics:
//org.apache.spark.metrics.source.StaticSources
private[spark] object StaticSources {
val allSources = Seq(CodegenMetrics)
}
The registerSources method registers the sources declared in the metrics configuration; its implementation is as follows:
//org.apache.spark.metrics.MetricsSystem
private def registerSources() {
val instConfig = metricsConfig.getInstance(instance)
val sourceConfigs = metricsConfig.subProperties(instConfig, MetricsSystem.SOURCE_REGEX)
sourceConfigs.foreach { kv =>
val classPath = kv._2.getProperty("class")
try {
val source = Utils.classForName(classPath).newInstance()
registerSource(source.asInstanceOf[Source])
} catch {
case e: Exception => logError("Source class " + classPath + " cannot be instantiated", e)
}
}
}
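The reflection step can be illustrated with any class that has a no-argument constructor; here java.util.Properties stands in for a configured Source class:

```scala
// Sketch of the reflection pattern registerSources uses: a class name taken
// from the "class" property is loaded and instantiated via its no-arg
// constructor. java.util.Properties stands in for a real Source here.
val classPath = "java.util.Properties" // would come from the "class" property
val sourceInstance = Class.forName(classPath).newInstance()
```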
Based on the code above, registering the sources declared in the metrics configuration involves the following steps:
① Obtain the metrics properties of the current instance. By the logic of MetricsConfig's getInstance method, when no properties exist for the given instance, the default instance's properties are returned. Taking driver as an example: no driver instance properties exist by default, so the properties of * are returned, i.e.:
*->{
sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet
sink.servlet.path=/metrics/json
}
② Match the regular expression ^source\\.(.+)\\.(.+) against these keys to obtain each source's finer-grained instance name and properties.
③ Use each entry's class property to create the source instance via Java reflection, and call registerSource to register it with the MetricRegistry.
The registerSinks method then obtains the metrics sink properties of the current instance from the initialized MetricsConfig and registers each sink into sinks:
//org.apache.spark.metrics.MetricsSystem
private def registerSinks() {
val instConfig = metricsConfig.getInstance(instance)
val sinkConfigs = metricsConfig.subProperties(instConfig, MetricsSystem.SINK_REGEX)
sinkConfigs.foreach { kv =>
val classPath = kv._2.getProperty("class")
if (null != classPath) {
try {
val sink = Utils.classForName(classPath)
.getConstructor(classOf[Properties], classOf[MetricRegistry], classOf[SecurityManager])
.newInstance(kv._2, registry, securityMgr)
if (kv._1 == "servlet") {
metricsServlet = Some(sink.asInstanceOf[MetricsServlet])
} else {
sinks += sink.asInstanceOf[Sink]
}
} catch {
case e: Exception =>
logError("Sink class " + classPath + " cannot be instantiated")
throw e
}
}
}
}
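The parameterised-constructor lookup can be illustrated with a JDK class; java.lang.StringBuilder stands in for a Sink class here, with a single String parameter in place of the (Properties, MetricRegistry, SecurityManager) signature:

```scala
// Sketch of the constructor reflection registerSinks performs: look up a
// constructor with a specific parameter list, then invoke it with arguments.
val sinkLike = Class.forName("java.lang.StringBuilder")
  .getConstructor(classOf[String])
  .newInstance("hello")
```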
As the code shows, registering the configured metrics sinks involves the following steps. First the current instance's properties are obtained; for driver these are again the * defaults:
*->{
sink.servlet.class=org.apache.spark.metrics.sink.MetricsServlet
sink.servlet.path=/metrics/json
}
Matching SINK_REGEX (^sink\\.(.+)\\.(.+)) against these keys then yields:
Map(servlet -> {class=org.apache.spark.metrics.sink.MetricsServlet, path=/metrics/json})
Finally, each entry's class is instantiated by reflection through its (Properties, MetricRegistry, SecurityManager) constructor; the servlet entry is stored as metricsServlet, and all other sinks are appended to sinks.