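The RPC Configuration Class TransportConf
TransportConf supplies configuration to Spark's RPC framework. It has two member fields: the configuration provider conf and the module name module. They are defined as follows: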
// configuration provider
private final ConfigProvider conf;
// module name
private final String module;
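A single lookup source can be shared by several modules because the module name namespaces every key. The following is a minimal Java sketch under simplifying assumptions. The hypothetical class MiniTransportConf reads from a plain Map<String, String> instead of a real ConfigProvider, and it builds each key as "spark." + module + "." + suffix, the same pattern as the spark.$module.io.* keys set by SparkTransportConf later in this post. The methods serverThreads/clientThreads and their default of 0 are illustrative placeholders, not the real TransportConf API.

```java
import java.util.Map;

// Runnable demo; the first top-level class holds main so the file can be launched directly.
class MiniTransportConfDemo {
    public static void main(String[] args) {
        Map<String, String> settings = Map.of(
            "spark.shuffle.io.serverThreads", "4",
            "spark.rpc.io.serverThreads", "2");
        MiniTransportConf shuffle = new MiniTransportConf("shuffle", settings);
        MiniTransportConf rpc = new MiniTransportConf("rpc", settings);
        System.out.println(shuffle.serverThreads()); // 4
        System.out.println(rpc.serverThreads());     // 2
        System.out.println(rpc.clientThreads());     // 0 (not set, placeholder default)
    }
}

// Hypothetical, simplified stand-in for TransportConf: a lookup source plus a module name.
class MiniTransportConf {
    private final Map<String, String> conf; // stands in for ConfigProvider
    private final String module;            // e.g. "shuffle" or "rpc"

    MiniTransportConf(String module, Map<String, String> conf) {
        this.module = module;
        this.conf = conf;
    }

    // Namespaces a setting by module: "io.serverThreads" -> "spark.shuffle.io.serverThreads".
    String confKey(String suffix) {
        return "spark." + module + "." + suffix;
    }

    int serverThreads() {
        return Integer.parseInt(conf.getOrDefault(confKey("io.serverThreads"), "0"));
    }

    int clientThreads() {
        return Integer.parseInt(conf.getOrDefault(confKey("io.clientThreads"), "0"));
    }
}
```

With this layout the shuffle and RPC settings live side by side in one source without colliding.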
ConfigProvider is an abstract class:
import java.util.NoSuchElementException;

/**
* Provides a mechanism for constructing a {@link TransportConf} using some sort of configuration.
*/
public abstract class ConfigProvider {
/** Obtains the value of the given config, throws NoSuchElementException if it doesn't exist. */
public abstract String get(String name);
public String get(String name, String defaultValue) {
try {
return get(name);
} catch (NoSuchElementException e) {
return defaultValue;
}
}
public int getInt(String name, int defaultValue) {
return Integer.parseInt(get(name, Integer.toString(defaultValue)));
}
public long getLong(String name, long defaultValue) {
return Long.parseLong(get(name, Long.toString(defaultValue)));
}
public double getDouble(String name, double defaultValue) {
return Double.parseDouble(get(name, Double.toString(defaultValue)));
}
public boolean getBoolean(String name, boolean defaultValue) {
return Boolean.parseBoolean(get(name, Boolean.toString(defaultValue)));
}
}
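ConfigProvider offers get, getInt, getLong, getDouble, getBoolean and similar methods. Each of them fetches the raw string through the abstract get method and then performs a single type conversion; when the key is absent, the string form of the default value is parsed instead. The abstract get method itself is left to subclasses to implement.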
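To see what a subclass has to supply, here is a minimal sketch, assuming a hypothetical MapBackedConfigProvider that stores settings in a HashMap. The ConfigProvider below is the class above trimmed to the getters used; the keys (example.threads and so on) are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

class ConfigProviderDemo {
    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("example.threads", "2");
        values.put("example.enabled", "false");
        ConfigProvider provider = new MapBackedConfigProvider(values);

        System.out.println(provider.getInt("example.threads", 1));     // 2
        System.out.println(provider.getBoolean("example.enabled", true)); // false
        System.out.println(provider.getLong("example.missing", 42L));   // 42
    }
}

// Same shape as the ConfigProvider above, trimmed to the getters used here.
abstract class ConfigProvider {
    public abstract String get(String name);

    public String get(String name, String defaultValue) {
        try {
            return get(name);
        } catch (NoSuchElementException e) {
            return defaultValue;
        }
    }

    public int getInt(String name, int defaultValue) {
        return Integer.parseInt(get(name, Integer.toString(defaultValue)));
    }

    public long getLong(String name, long defaultValue) {
        return Long.parseLong(get(name, Long.toString(defaultValue)));
    }

    public boolean getBoolean(String name, boolean defaultValue) {
        return Boolean.parseBoolean(get(name, Boolean.toString(defaultValue)));
    }
}

// Hypothetical subclass: only the abstract get(String) needs implementing.
class MapBackedConfigProvider extends ConfigProvider {
    private final Map<String, String> values;

    MapBackedConfigProvider(Map<String, String> values) {
        this.values = values;
    }

    @Override
    public String get(String name) {
        String value = values.get(name);
        if (value == null) {
            // Signals "missing" so that get(name, default) falls back to the default.
            throw new NoSuchElementException(name);
        }
        return value;
    }
}
```

Note that the default only covers a missing key. A present but malformed value (say "abc" for getInt) is still parsed, so it surfaces as a NumberFormatException, whereas Boolean.parseBoolean quietly returns false for anything other than "true".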
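In the actual code, get is implemented by delegating to SparkConf's get method, which throws NoSuchElementException when the key is not set: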
/** Get a parameter; throws a NoSuchElementException if it's not set */
def get(key: String): String = {
getOption(key).getOrElse(throw new NoSuchElementException(key))
}
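The fallback chain (a missing key makes SparkConf.get throw, and ConfigProvider.get(name, default) catches the exception and returns the default) can be sketched in Java as follows. FakeSparkConf is a hypothetical stand-in for SparkConf's getOption/get pair, using Optional in place of Scala's Option; the anonymous subclass mirrors the new ConfigProvider { override def get(...) } block inside SparkTransportConf.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

class DelegationDemo {
    public static void main(String[] args) {
        FakeSparkConf sparkConf = new FakeSparkConf();
        sparkConf.set("spark.rpc.io.serverThreads", "4");

        // Anonymous subclass that simply forwards to the SparkConf-like object.
        ConfigProvider provider = new ConfigProvider() {
            @Override
            public String get(String name) {
                return sparkConf.get(name);
            }
        };

        System.out.println(provider.get("spark.rpc.io.serverThreads", "0")); // 4
        System.out.println(provider.get("spark.rpc.io.clientThreads", "0")); // 0
    }
}

// Hypothetical stand-in for SparkConf's getOption/get pair.
class FakeSparkConf {
    private final Map<String, String> settings = new HashMap<>();

    void set(String key, String value) {
        settings.put(key, value);
    }

    Optional<String> getOption(String key) {
        return Optional.ofNullable(settings.get(key));
    }

    // Like getOption(key).getOrElse(throw new NoSuchElementException(key)).
    String get(String key) {
        return getOption(key).orElseThrow(() -> new NoSuchElementException(key));
    }
}

// Trimmed copy of ConfigProvider: only the two get methods matter here.
abstract class ConfigProvider {
    public abstract String get(String name);

    public String get(String name, String defaultValue) {
        try {
            return get(name);
        } catch (NoSuchElementException e) {
            return defaultValue;
        }
    }
}
```

The design depends on the delegate throwing NoSuchElementException specifically: a delegate that returned null for a missing key would bypass the default entirely.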
TransportConf objects are created by the Scala singleton object SparkTransportConf:
package org.apache.spark.network.netty
import org.apache.spark.SparkConf
import org.apache.spark.network.util.{ConfigProvider, TransportConf}
/**
* Provides a utility for transforming from a SparkConf inside a Spark JVM (e.g., Executor,
* Driver, or a standalone shuffle service) into a TransportConf with details on our environment
* like the number of cores that are allocated to this JVM.
*/
object SparkTransportConf {
/**
* Specifies an upper bound on the number of Netty threads that Spark requires by default.
* In practice, only 2-4 cores should be required to transfer roughly 10 Gb/s, and each core
* that we use will have an initial overhead of roughly 32 MB of off-heap memory, which comes
* at a premium.
*
* Thus, this value should still retain maximum throughput and reduce wasted off-heap memory
* allocation. It can be overridden by setting the number of serverThreads and clientThreads
* manually in Spark's configuration.
*/
private val MAX_DEFAULT_NETTY_THREADS = 8
/**
* Utility for creating a [[TransportConf]] from a [[SparkConf]].
* @param _conf the [[SparkConf]]
* @param module the module name
* @param numUsableCores if nonzero, this will restrict the server and client threads to only
* use the given number of cores, rather than all of the machine's cores.
* This restriction will only occur if these properties are not already set.
*/
def fromSparkConf(_conf: SparkConf, module: String, numUsableCores: Int = 0): TransportConf = {
val conf = _conf.clone
// Specify thread configuration based on our JVM's allocation of cores (rather than necessarily
// assuming we have all the machine's cores).
// NB: Only set if serverThreads/clientThreads not already set.
val numThreads = defaultNumThreads(numUsableCores)
conf.setIfMissing(s"spark.$module.io.serverThreads", numThreads.toString)
conf.setIfMissing(s"spark.$module.io.clientThreads", numThreads.toString)
new TransportConf(module, new ConfigProvider {
override def get(name: String): String = conf.get(name)
})
}
/**
* Returns the default number of threads for both the Netty client and server thread pools.
* If numUsableCores is 0, we will use Runtime to get an approximate number of available cores.
*/
private def defaultNumThreads(numUsableCores: Int): Int = {
val availableCores =
if (numUsableCores > 0) numUsableCores else Runtime.getRuntime.availableProcessors()
math.min(availableCores, MAX_DEFAULT_NETTY_THREADS)
}
}
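As the code shows, a TransportConf is built with SparkTransportConf's fromSparkConf method. It takes three parameters: the SparkConf, the module name module, and the number of usable cores numUsableCores (default 0). If numUsableCores is greater than 0, it is taken as the number of available cores; otherwise the number of processors reported by Runtime.getRuntime.availableProcessors() is used. In both cases the result is capped at MAX_DEFAULT_NETTY_THREADS, i.e. 8. Not every core can be devoted to network transfer: according to the comment on that constant, 2-4 cores are enough to move roughly 10 Gb/s, while each core used carries an initial overhead of about 32 MB of off-heap memory.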
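The thread-count rule is small enough to port directly. The following Java sketch mirrors defaultNumThreads for illustration only; the class name NettyThreads is hypothetical, and the real logic lives in the Scala object above.

```java
class NettyThreadsDemo {
    public static void main(String[] args) {
        System.out.println(NettyThreads.defaultNumThreads(4));  // 4
        System.out.println(NettyThreads.defaultNumThreads(16)); // 8
        System.out.println(NettyThreads.defaultNumThreads(0));  // min(availableProcessors, 8)
    }
}

// Hypothetical Java port of SparkTransportConf.defaultNumThreads.
class NettyThreads {
    static final int MAX_DEFAULT_NETTY_THREADS = 8;

    static int defaultNumThreads(int numUsableCores) {
        // A positive numUsableCores wins; 0 or a negative value falls back to the JVM's processor count.
        int availableCores = numUsableCores > 0
            ? numUsableCores
            : Runtime.getRuntime().availableProcessors();
        // The cap applies in both branches, not only when falling back to the machine's cores.
        return Math.min(availableCores, MAX_DEFAULT_NETTY_THREADS);
    }
}
```

So an executor granted 4 cores gets 4 Netty threads, while one granted 16 still gets only 8.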
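The resulting thread count is used as the value of spark.$module.io.clientThreads (the number of client transport threads) and spark.$module.io.serverThreads (the number of server transport threads), where $module is replaced by the module name, e.g. spark.shuffle.io.serverThreads. Because both are written with setIfMissing on a clone of the SparkConf, any value the user has already configured takes precedence, and the caller's SparkConf is left unmodified.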
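The clone-then-setIfMissing pattern can be imitated with a copied HashMap and Map.putIfAbsent. This is a minimal sketch, assuming a hypothetical helper ThreadDefaults.withThreadDefaults that plays the role of fromSparkConf's first few lines; it is not Spark API.

```java
import java.util.HashMap;
import java.util.Map;

class SetIfMissingDemo {
    public static void main(String[] args) {
        Map<String, String> userConf = new HashMap<>();
        userConf.put("spark.shuffle.io.serverThreads", "2"); // explicitly set by the user

        Map<String, String> effective = ThreadDefaults.withThreadDefaults(userConf, "shuffle", 8);
        System.out.println(effective.get("spark.shuffle.io.serverThreads"));   // 2 (user value kept)
        System.out.println(effective.get("spark.shuffle.io.clientThreads"));   // 8 (default filled in)
        System.out.println(userConf.containsKey("spark.shuffle.io.clientThreads")); // false
    }
}

// Hypothetical helper mimicking `_conf.clone` followed by two `setIfMissing` calls.
class ThreadDefaults {
    static Map<String, String> withThreadDefaults(Map<String, String> original, String module, int numThreads) {
        Map<String, String> conf = new HashMap<>(original); // copy, so the caller's map is untouched
        String value = Integer.toString(numThreads);
        conf.putIfAbsent("spark." + module + ".io.serverThreads", value); // only if not already set
        conf.putIfAbsent("spark." + module + ".io.clientThreads", value);
        return conf;
    }
}
```

Copying first means the computed defaults only affect the configuration handed to TransportConf, never the SparkConf the rest of the application keeps using.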
This post is based on the book 《Spark内核设计的艺术:架构设计与实现》 (The Art of Spark Kernel Design: Architecture Design and Implementation).