Starting Flink mainly means starting the JobManager process and the TaskManager processes. In this chapter we summarize the JobManager startup flow:
The entry class is org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint:
public static void main(String[] args) {
// startup checks and logging
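// EnvironmentInformation is a utility class for the JVM execution environment, e.g. the executing user (getHadoopUser()), startup options, or the JVM version; logEnvironmentInfo writes this information to the log.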
EnvironmentInformation.logEnvironmentInfo(LOG, StandaloneSessionClusterEntrypoint.class.getSimpleName(), args);
SignalHandler.register(LOG);
JvmShutdownSafeguard.installAsShutdownHook(LOG);
EntrypointClusterConfiguration entrypointClusterConfiguration = null;
final CommandLineParser<EntrypointClusterConfiguration> commandLineParser = new CommandLineParser<>(new EntrypointClusterConfigurationParserFactory());
try {
entrypointClusterConfiguration = commandLineParser.parse(args);
} catch (FlinkParseException e) {
LOG.error("Could not parse command line arguments {}.", args, e);
commandLineParser.printHelp(StandaloneSessionClusterEntrypoint.class.getSimpleName());
System.exit(1);
}
// build the Configuration from the parsed command-line arguments
Configuration configuration = loadConfiguration(entrypointClusterConfiguration);
// create the StandaloneSessionClusterEntrypoint, the entry point of a standalone session cluster
StandaloneSessionClusterEntrypoint entrypoint = new StandaloneSessionClusterEntrypoint(configuration);
// start the session cluster
ClusterEntrypoint.runClusterEntrypoint(entrypoint);
}
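To make the parsing step concrete, here is a hypothetical, much-simplified stand-in for CommandLineParser (EntrypointArgsSketch is not a Flink class). It assumes only two kinds of arguments, --configDir <dir> and -Dkey=value dynamic properties; the real EntrypointClusterConfigurationParserFactory accepts more options. loadConfiguration() then turns the parsed config directory and dynamic properties into a Configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for Flink's CommandLineParser.
 * It only understands "--configDir <dir>" and "-Dkey=value" dynamic properties.
 */
final class EntrypointArgsSketch {
    final String configDir;
    final Map<String, String> dynamicProperties;

    private EntrypointArgsSketch(String configDir, Map<String, String> dynamicProperties) {
        this.configDir = configDir;
        this.dynamicProperties = dynamicProperties;
    }

    static EntrypointArgsSketch parse(String[] args) {
        String configDir = null;
        Map<String, String> dynamic = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("--configDir")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("Missing value for " + arg);
                }
                configDir = args[++i];
            } else if (arg.startsWith("-D") && arg.indexOf('=') > 2) {
                // "-Dkey=value": everything between "-D" and the first '=' is the key
                int eq = arg.indexOf('=');
                dynamic.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                throw new IllegalArgumentException("Unrecognized argument: " + arg);
            }
        }
        if (configDir == null) {
            // In this sketch a missing config directory is a parse error.
            throw new IllegalArgumentException("--configDir is required");
        }
        return new EntrypointArgsSketch(configDir, dynamic);
    }
}
```

For example, parsing --configDir /opt/flink/conf -Drest.port=8081 yields the config directory /opt/flink/conf and the dynamic property rest.port=8081. A missing --configDir is rejected, much like a FlinkParseException makes main() print the help text and exit with status 1.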
ClusterEntrypoint.runClusterEntrypoint(entrypoint) then calls startCluster() on the entrypoint:
public void startCluster() throws ClusterEntrypointException {
LOG.info("Starting {}.", getClass().getSimpleName());
try {
// The PluginManager manages cluster plugins; they are loaded with separate class loaders
// so that their dependencies do not interfere with Flink's dependencies.
PluginManager pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);
// initialize the file systems from the configuration
configureFileSystems(configuration, pluginManager);
// install the security context
SecurityContext securityContext = installSecurityContext(configuration);
// run the cluster (start the JobManager services) inside the security context
securityContext.runSecured((Callable<Void>) () -> {
runCluster(configuration, pluginManager);
return null;
});
} catch (Throwable t) {
// ... error handling omitted in this excerpt ...
}
}
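startCluster() reaches runCluster() through SecurityContext.runSecured(Callable). A minimal sketch of that pattern, assuming a context that applies no security at all and therefore just calls the callable on the current thread. The Sketch classes are illustrative, and the catch block only wraps the failure instead of reproducing the real (elided) error handling.

```java
import java.util.concurrent.Callable;

/** Simplified shape of a security context: run a piece of work under some credentials. */
interface SecurityContextSketch {
    <T> T runSecured(Callable<T> securedCallable) throws Exception;
}

/** With no security configured, the work simply runs on the calling thread. */
final class NoOpSecurityContextSketch implements SecurityContextSketch {
    @Override
    public <T> T runSecured(Callable<T> securedCallable) throws Exception {
        return securedCallable.call();
    }
}

final class StartClusterSketch {
    /** Hypothetical stand-in for startCluster(): run the cluster inside the context. */
    static void startCluster(SecurityContextSketch ctx, Runnable runCluster) {
        try {
            ctx.runSecured((Callable<Void>) () -> {
                runCluster.run();
                return null;
            });
        } catch (Throwable t) {
            // This sketch only wraps the failure; it does not clean up any state.
            throw new IllegalStateException("Failed to initialize the cluster entrypoint", t);
        }
    }
}
```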
runCluster() initializes the services and creates the cluster components:
private void runCluster(Configuration configuration, PluginManager pluginManager) throws Exception {
synchronized (lock) {
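// initialize the services
// commonRpcService: the Akka-based RpcService implementation; it starts Akka actors that receive the RPCs invoked through an RpcGateway
// haServices: provides access to all services needed for high availability: registries, distributed counters, and leader election
// blobServer: listens for incoming requests and spawns threads to handle them; it also creates the directory structure for storing BLOBs or caching them temporarily
// heartbeatServices: provides all services needed for heartbeating, including the creation of heartbeat receivers and heartbeat senders
// metricRegistry: keeps track of all registered metrics and connects MetricGroups with MetricReporters
// archivedExecutionGraphStore: stores the serializable form of ExecutionGraphs (ArchivedExecutionGraph)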
initializeServices(configuration, pluginManager);
// write host information into configuration: the JobManager's RPC address and port
configuration.setString(JobManagerOptions.ADDRESS, commonRpcService.getAddress());
configuration.setInteger(JobManagerOptions.PORT, commonRpcService.getPort());
final DispatcherResourceManagerComponentFactory dispatcherResourceManagerComponentFactory = createDispatcherResourceManagerComponentFactory(configuration);
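// create the dispatcherResourceManagerComponent, which contains three main components:
// Dispatcher: receives job submissions, persists them, spawns JobManagers to execute the jobs, and recovers them in case of a master failure. It also knows about the state of the Flink session cluster.
// ResourceManager: responsible for resource allocation and bookkeeping. registerJobManager(JobMasterId, ResourceID, String, JobID, Time) registers a JobMaster; requestSlot(JobMasterId, SlotRequest, Time) requests a slot from the resource manager.
// WebMonitorEndpoint: the REST endpoint that serves the REST calls of the web frontend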
clusterComponent = dispatcherResourceManagerComponentFactory.create(
configuration,
ioExecutor,
commonRpcService,
haServices,
blobServer,
heartbeatServices,
metricRegistry,
archivedExecutionGraphStore,
new RpcMetricQueryServiceRetriever(metricRegistry.getMetricQueryServiceRpcService()),
this);
clusterComponent.getShutDownFuture().whenComplete(
(ApplicationStatus applicationStatus, Throwable throwable) -> {
if (throwable != null) {
shutDownAsync(
ApplicationStatus.UNKNOWN,
ExceptionUtils.stringifyException(throwable),
false);
} else {
// This is the general shutdown path. If a separate more specific shutdown was
// already triggered, this will do nothing
shutDownAsync(
applicationStatus,
null,
true);
}
});
}
}
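The wiring at the end of runCluster() maps the outcome of the component's shutdown future onto a shutdown request. The sketch below reproduces only that mapping with a plain CompletableFuture. AppStatusSketch and ShutdownWiringSketch are hypothetical, String.valueOf stands in for ExceptionUtils.stringifyException, the boolean third argument of shutDownAsync is omitted, and "a later request does nothing" is modelled with compareAndSet.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the ApplicationStatus values used above. */
enum AppStatusSketch { SUCCEEDED, FAILED, CANCELED, UNKNOWN }

final class ShutdownWiringSketch {
    /** Records what shutDownAsync(...) was called with. */
    final AtomicReference<AppStatusSketch> shutdownStatus = new AtomicReference<>();
    final AtomicReference<String> diagnostics = new AtomicReference<>();

    void wire(CompletableFuture<AppStatusSketch> shutDownFuture) {
        shutDownFuture.whenComplete((status, throwable) -> {
            if (throwable != null) {
                // The component failed: report UNKNOWN with the error as diagnostics.
                shutDownAsync(AppStatusSketch.UNKNOWN, String.valueOf(throwable));
            } else {
                // General path: propagate the final application status.
                shutDownAsync(status, null);
            }
        });
    }

    private void shutDownAsync(AppStatusSketch status, String diag) {
        // Only the first shutdown request takes effect; later ones do nothing.
        if (shutdownStatus.compareAndSet(null, status)) {
            diagnostics.set(diag);
        }
    }
}
```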
commonRpcService: the Akka-based RpcService implementation. The RPC service starts Akka actors that receive the RPCs invoked through an RpcGateway.
commonRpcService is essentially an Akka ActorSystem, i.e. a TCP-based RPC service, listening on port 6123. Its main configuration is:
Config(SimpleConfigObject(
{"akka":{
"actor":{
"default-dispatcher":{
"executor":"fork-join-executor",
"fork-join-executor":{"parallelism-factor":2,
"parallelism-max":64,
"parallelism-min":8
},
"throughput":15
},
"guardian-supervisor-strategy":"org.apache.flink.runtime.akka.EscalatingSupervisorStrategy",
"provider":"akka.remote.RemoteActorRefProvider",
"supervisor-dispatcher":{
"executor":"thread-pool-executor",
"thread-pool-executor":{
"core-pool-size-max":1,
"core-pool-size-min":1
},"type":"Dispatcher"
},
"warn-about-java-serializer-usage":"off"},
"daemonic":"off",
"jvm-exit-on-fatal-error":"on",
"log-config-on-start":"off",
"log-dead-letters":"off",
"log-dead-letters-during-shutdown":"off",
"loggers":["akka.event.slf4j.Slf4jLogger"],
"logging-filter":"akka.event.slf4j.Slf4jLoggingFilter",
"loglevel":"ERROR",
"remote":{
"log-remote-lifecycle-events":"off",
"netty":{
"tcp":{
"bind-hostname":"0.0.0.0",
"bind-port":6123,
"client-socket-worker-pool":{
"pool-size-factor":1,
"pool-size-max":2,
"pool-size-min":1
},
"connection-timeout":"20000ms",
"hostname":"localhost",
"maximum-frame-size":"10485760b",
"port":6123,
"server-socket-worker-pool":{
"pool-size-factor":1,
"pool-size-max":2,
"pool-size-min":1
},
"tcp-nodelay":"on",
"transport-class":"akka.remote.transport.netty.NettyTransport"
}
},
"retry-gate-closed-for":"50 ms",
"startup-timeout":"100000ms",
"transport-failure-detector":{
"acceptable-heartbeat-pause":"6000000ms",
"heartbeat-interval":"1000000ms","threshold":300
}
},
"serialize-messages":"off",
"stdout-loglevel":"OFF"
}}))
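The fork-join-executor block does not fix a thread count; it gives a factor and bounds. As far as we know, Akka sizes the pool as ceil(availableProcessors * parallelism-factor), clamped to [parallelism-min, parallelism-max]. A small sketch of that arithmetic (the class name is ours, not Akka's):

```java
/**
 * Sketch of how the default-dispatcher's fork-join pool size follows from
 * parallelism-factor, parallelism-min and parallelism-max: ceil(cores * factor), clamped.
 */
final class DispatcherParallelismSketch {
    static int scaledPoolSize(int floor, double factor, int ceiling, int availableProcessors) {
        int scaled = (int) Math.ceil(availableProcessors * factor);
        return Math.min(Math.max(scaled, floor), ceiling);
    }
}
```

With the values above (factor 2, min 8, max 64), a 4-core machine gets max(8, 8) = 8 threads, a 16-core machine gets 32, and a 64-core machine would get 128, capped to 64.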
haServices: provides access to all services needed for high availability: registries, distributed counters, and leader election.
haServices is created by the following code:
switch (highAvailabilityMode) {
case NONE:
final Tuple2<String, Integer> hostnamePort = getJobManagerAddress(configuration);
final String resourceManagerRpcUrl = AkkaRpcServiceUtils.getRpcUrl(
hostnamePort.f0,
hostnamePort.f1,
AkkaRpcServiceUtils.createWildcardName(ResourceManager.RESOURCE_MANAGER_NAME),
addressResolution,
configuration);
final String dispatcherRpcUrl = AkkaRpcServiceUtils.getRpcUrl(
hostnamePort.f0,
hostnamePort.f1,
AkkaRpcServiceUtils.createWildcardName(Dispatcher.DISPATCHER_NAME),
addressResolution,
configuration);
final String webMonitorAddress = getWebMonitorAddress(
configuration,
addressResolution);
return new StandaloneHaServices(
resourceManagerRpcUrl,
dispatcherRpcUrl,
webMonitorAddress);
case ZOOKEEPER:
BlobStoreService blobStoreService = BlobUtils.createBlobStoreFromConfig(configuration);
return new ZooKeeperHaServices(
ZooKeeperUtils.startCuratorFramework(configuration),
executor,
configuration,
blobStoreService);
case FACTORY_CLASS:
return createCustomHAServices(configuration, executor);
}
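In the NONE branch the ResourceManager and Dispatcher addresses are fixed Akka URLs derived from the JobManager host and port. A hypothetical sketch of how such a URL is assembled; the format akka.tcp://flink@<host>:<port>/user/<name> and the constant values are assumptions based on Flink 1.x and may differ between versions (for example in the actor path or when SSL is enabled):

```java
/**
 * Hypothetical, simplified version of the RPC URLs built in the NONE branch.
 * Assumed format: akka.tcp://flink@<host>:<port>/user/<endpointName>.
 */
final class RpcUrlSketch {
    static final String RESOURCE_MANAGER_NAME = "resourcemanager"; // assumed value
    static final String DISPATCHER_NAME = "dispatcher";            // assumed value

    /** Appends the "_*" wildcard suffix to an endpoint name. */
    static String createWildcardName(String prefix) {
        return prefix + "_*";
    }

    static String getRpcUrl(String host, int port, String endpointName, boolean sslEnabled) {
        String protocol = sslEnabled ? "akka.ssl.tcp" : "akka.tcp";
        return protocol + "://flink@" + host + ":" + port + "/user/" + endpointName;
    }
}
```

For localhost:6123 this gives akka.tcp://flink@localhost:6123/user/resourcemanager_*, where the trailing _* is the wildcard produced by createWildcardName.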
As you can see, there are three ways to create the HA services:
1. NONE: no HA; the implementation is StandaloneHaServices
public class StandaloneHaServices extends AbstractNonHaServices {
/** The fixed address of the ResourceManager. */
private final String resourceManagerAddress;
/** The fixed address of the Dispatcher. */
private final String dispatcherAddress;
/** The fixed address of the cluster's REST endpoint. */
private final String clusterRestEndpointAddress;
/**
* Creates a new services class for the fixed, pre-defined leaders.
*
* @param resourceManagerAddress The fixed address of the ResourceManager
* @param dispatcherAddress The fixed address of the Dispatcher
* @param clusterRestEndpointAddress The fixed address of the cluster's REST endpoint
*/
public StandaloneHaServices(
String resourceManagerAddress,
String dispatcherAddress,
String clusterRestEndpointAddress) {
this.resourceManagerAddress = checkNotNull(resourceManagerAddress, "resourceManagerAddress");
this.dispatcherAddress = checkNotNull(dispatcherAddress, "dispatcherAddress");
this.clusterRestEndpointAddress = checkNotNull(clusterRestEndpointAddress, "clusterRestEndpointAddress");
}
// ... remaining methods omitted ...
}
2. ZOOKEEPER: HA built on ZooKeeper; the implementation is ZooKeeperHaServices
public ZooKeeperHaServices(
CuratorFramework client,
Executor executor,
Configuration configuration,
BlobStoreService blobStoreService) {
this.client = checkNotNull(client);
this.executor = checkNotNull(executor);
this.configuration = checkNotNull(configuration);
this.runningJobsRegistry = new ZooKeeperRunningJobsRegistry(client, configuration);
this.blobStoreService = checkNotNull(blobStoreService);
}
3. FACTORY_CLASS: a custom HA implementation, created by a HighAvailabilityServicesFactory whose fully qualified class name is given in the configuration
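Which of the three branches runs depends on the high-availability option. A simplified, hypothetical sketch of that mapping (not Flink's actual HighAvailabilityMode code), assuming case-insensitive matching and treating any unrecognized value as a factory class name:

```java
/** Hypothetical sketch of mapping the "high-availability" setting onto the three branches. */
enum HaModeSketch {
    NONE, ZOOKEEPER, FACTORY_CLASS;

    static HaModeSketch fromConfigValue(String value) {
        if (value == null || value.isEmpty() || value.equalsIgnoreCase("none")) {
            return NONE;
        }
        if (value.equalsIgnoreCase("zookeeper")) {
            return ZOOKEEPER;
        }
        // Anything else is treated as the class name of a custom HA services factory.
        return FACTORY_CLASS;
    }
}
```

For instance, zookeeper selects ZOOKEEPER, an unset value selects NONE, and a value such as com.example.MyHaServicesFactory (a made-up class name) selects FACTORY_CLASS.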
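blobServer: listens for incoming requests and spawns threads to handle them. It is also responsible for creating the directory structure to store BLOBs or cache them temporarily.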
The BlobServer constructor:
public BlobServer(Configuration config, BlobStore blobStore) throws IOException {
this.blobServiceConfiguration = checkNotNull(config);
this.blobStore = checkNotNull(blobStore);
this.readWriteLock = new ReentrantReadWriteLock();
// configure and create the storage directory
this.storageDir = BlobUtils.initLocalStorageDirectory(config);
LOG.info("Created BLOB server storage directory {}", storageDir);
// configure the maximum number of concurrent connections
final int maxConnections = config.getInteger(BlobServerOptions.FETCH_CONCURRENT);
if (maxConnections >= 1) {
this.maxConnections = maxConnections;
}
else {
LOG.warn("Invalid value for maximum connections in BLOB server: {}. Using default value of {}",
maxConnections, BlobServerOptions.FETCH_CONCURRENT.defaultValue());
this.maxConnections = BlobServerOptions.FETCH_CONCURRENT.defaultValue();
}
// configure the backlog of connections
int backlog = config.getInteger(BlobServerOptions.FETCH_BACKLOG);
if (backlog < 1) {
LOG.warn("Invalid value for BLOB connection backlog: {}. Using default value of {}",
backlog, BlobServerOptions.FETCH_BACKLOG.defaultValue());
backlog = BlobServerOptions.FETCH_BACKLOG.defaultValue();
}
// Initializing the clean up task:
// a timer that periodically removes transient BLOBs whose TTL has expired
this.cleanupTimer = new Timer(true);
this.cleanupInterval = config.getLong(BlobServerOptions.CLEANUP_INTERVAL) * 1000;
this.cleanupTimer
.schedule(new TransientBlobCleanupTask(blobExpiryTimes, readWriteLock.writeLock(),
storageDir, LOG), cleanupInterval, cleanupInterval);
this.shutdownHook = ShutdownHookUtil.addShutdownHook(this, getClass().getSimpleName(), LOG);
// ----------------------- start the server -------------------
// create the server socket
final String serverPortRange = config.getString(BlobServerOptions.PORT);
final Iterator<Integer> ports = NetUtils.getPortRangeFromString(serverPortRange);
final ServerSocketFactory socketFactory;
if (SSLUtils.isInternalSSLEnabled(config) && config.getBoolean(BlobServerOptions.SSL_ENABLED)) {
try {
socketFactory = SSLUtils.createSSLServerSocketFactory(config);
}
catch (Exception e) {
throw new IOException("Failed to initialize SSL for the blob server", e);
}
}
else {
socketFactory = ServerSocketFactory.getDefault();
}
final int finalBacklog = backlog;
final String bindHost = config.getOptional(JobManagerOptions.BIND_HOST).orElseGet(NetUtils::getWildcardIPAddress);
this.serverSocket = NetUtils.createSocketFromPorts(ports,
(port) -> socketFactory.createServerSocket(port, finalBacklog, InetAddress.getByName(bindHost)));
if (serverSocket == null) {
throw new IOException("Unable to open BLOB Server in specified port range: " + serverPortRange);
}
// start the server thread
setName("BLOB Server listener at " + getPort());
setDaemon(true);
if (LOG.isInfoEnabled()) {
LOG.info("Started BLOB server at {}:{} - max concurrent requests: {} - max backlog: {}",
serverSocket.getInetAddress().getHostAddress(), getPort(), maxConnections, backlog);
}
}
The constructor mainly does two things:
1. It schedules a timer task that cleans up BLOBs whose TTL has expired.
2. It creates the ServerSocket.
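The second step relies on NetUtils to turn BlobServerOptions.PORT into candidate ports and bind to the first free one. A hypothetical, list-based sketch of that idea with plain java.net.ServerSocket; the real code iterates lazily, supports SSL and a bind host, and signals failure by returning null, which the constructor turns into an IOException.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of "port range -> first port that binds". */
final class PortRangeSketch {
    /** Parses "0", "50100", "50100-50200" or comma-separated combinations of these. */
    static List<Integer> parsePortRange(String spec) {
        List<Integer> ports = new ArrayList<>();
        for (String part : spec.split(",")) {
            String p = part.trim();
            int dash = p.indexOf('-');
            if (dash < 0) {
                ports.add(Integer.parseInt(p));
            } else {
                int start = Integer.parseInt(p.substring(0, dash).trim());
                int end = Integer.parseInt(p.substring(dash + 1).trim());
                for (int port = start; port <= end; port++) {
                    ports.add(port);
                }
            }
        }
        return ports;
    }

    /** Tries each port in turn; returns the first socket that binds, or null if none does. */
    static ServerSocket bindFirstFree(List<Integer> ports, int backlog) {
        for (int port : ports) {
            try {
                return new ServerSocket(port, backlog);
            } catch (IOException e) {
                // port in use or not permitted: try the next one
            }
        }
        return null;
    }
}
```

parsePortRange("50100-50102, 50200") returns [50100, 50101, 50102, 50200]; port 0 asks the operating system for any free port.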
Starting the BlobServer:
blobServer = new BlobServer(configuration, haServices.createBlobStore());
blobServer.start();
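BlobServer extends Thread (hence the setName() and setDaemon() calls in the constructor), so start() runs the following run() method, which accepts connections in a loop and allows at most maxConnections concurrent connections:
@Override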
public void run() {
try {
while (!this.shutdownRequested.get()) {
BlobServerConnection conn = new BlobServerConnection(serverSocket.accept(), this);
try {
synchronized (activeConnections) {
while (activeConnections.size() >= maxConnections) {
activeConnections.wait(2000);
}
activeConnections.add(conn);
}
conn.start();
conn = null;
}
finally {
if (conn != null) {
conn.close();
synchronized (activeConnections) {
activeConnections.remove(conn);
}
}
}
}
}
catch (Throwable t) {
if (!this.shutdownRequested.get()) {
LOG.error("BLOB server stopped working. Shutting down", t);
try {
close();
} catch (Throwable closeThrowable) {
LOG.error("Could not properly close the BlobServer.", closeThrowable);
}
}
}
}
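run() throttles the accept loop: while activeConnections is full, the listener waits (re-checking every 2 seconds) before registering the next connection. A hypothetical sketch of the same wait/notify idea in isolation; the notifyAll() in unregister() is our assumption of how a finished connection wakes the listener, and the 2-second re-check still guarantees progress without it.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper mirroring the activeConnections wait/notify pattern in run(). */
final class BoundedConnectionRegistry<T> {
    private final Set<T> active = new HashSet<>();
    private final int maxConnections;

    BoundedConnectionRegistry(int maxConnections) {
        if (maxConnections < 1) {
            throw new IllegalArgumentException("maxConnections must be >= 1");
        }
        this.maxConnections = maxConnections;
    }

    /** Blocks (re-checking every 2 s) until a slot is free, then registers conn. */
    void register(T conn) throws InterruptedException {
        synchronized (active) {
            while (active.size() >= maxConnections) {
                active.wait(2000);
            }
            active.add(conn);
        }
    }

    /** Non-blocking variant: registers conn only if a slot is free right now. */
    boolean tryRegister(T conn) {
        synchronized (active) {
            if (active.size() >= maxConnections) {
                return false;
            }
            return active.add(conn);
        }
    }

    /** Called when a connection finishes; wakes up a waiting listener. */
    void unregister(T conn) {
        synchronized (active) {
            active.remove(conn);
            active.notifyAll();
        }
    }

    int size() {
        synchronized (active) {
            return active.size();
        }
    }
}
```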
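heartbeatServices: provides all services needed for heartbeating, including the creation of heartbeat receivers and heartbeat senders.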
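At the heart of heartbeat senders and receivers is a timeout check: if no heartbeat arrives from a target within the timeout, the target is considered lost. A hypothetical sketch with an explicit clock so the logic is easy to test. Flink takes its own timings from heartbeat.interval and heartbeat.timeout (10 s and 50 s by default); the class below is not Flink's HeartbeatManager.

```java
/** Hypothetical heartbeat monitor: a target is lost if no heartbeat arrived within the timeout. */
final class HeartbeatMonitorSketch {
    private final long heartbeatTimeoutMillis;
    private long lastHeartbeatMillis;

    HeartbeatMonitorSketch(long heartbeatTimeoutMillis, long nowMillis) {
        if (heartbeatTimeoutMillis <= 0) {
            throw new IllegalArgumentException("timeout must be > 0");
        }
        this.heartbeatTimeoutMillis = heartbeatTimeoutMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    /** Receiver side: record that a heartbeat arrived at the given time. */
    void reportHeartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    /** True if more than the timeout has passed since the last heartbeat. */
    boolean isTimedOut(long nowMillis) {
        return nowMillis - lastHeartbeatMillis > heartbeatTimeoutMillis;
    }
}
```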
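metricRegistry: keeps track of all registered metrics and acts as the connection between MetricGroups and MetricReporters. Its metric query service runs on a separate RPC service, started as follows: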
public static RpcService startRemoteMetricsRpcService(Configuration configuration, String hostname) throws Exception {
final String portRange = configuration.getString(MetricOptions.QUERY_SERVICE_PORT);
return startMetricRpcService(configuration, AkkaRpcServiceUtils.remoteServiceBuilder(configuration, hostname, portRange));
}
metricQueryServiceRpcService is also an ActorSystem, started on the port range configured by MetricOptions.QUERY_SERVICE_PORT.
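How a registry connects metric groups and reporters can be illustrated with a toy fan-out: groups register metrics under a scope, and the registry forwards each new metric to every reporter. All types below are hypothetical and much simpler than Flink's MetricRegistry, MetricGroup and MetricReporter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical reporter callback: told about every newly registered metric. */
interface ReporterSketch {
    void notifyOfAddedMetric(String scopedName, Supplier<?> metric);
}

/** Hypothetical registry: groups register metrics here, the registry fans them out to reporters. */
final class MetricRegistrySketch {
    private final List<ReporterSketch> reporters = new ArrayList<>();

    void addReporter(ReporterSketch reporter) {
        reporters.add(reporter);
    }

    /** Called by a (hypothetical) metric group with its scope, e.g. "jobmanager". */
    void register(String scope, String name, Supplier<?> metric) {
        String scopedName = scope + "." + name;
        for (ReporterSketch reporter : reporters) {
            reporter.notifyOfAddedMetric(scopedName, metric);
        }
    }
}
```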
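archivedExecutionGraphStore: stores the serializable form of ExecutionGraphs (ArchivedExecutionGraph). The session cluster entrypoint creates a FileArchivedExecutionGraphStore: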
@Override
protected ArchivedExecutionGraphStore createSerializableExecutionGraphStore(
Configuration configuration,
ScheduledExecutor scheduledExecutor) throws IOException {
final File tmpDir = new File(ConfigurationUtils.parseTempDirectories(configuration)[0]);
final Time expirationTime = Time.seconds(configuration.getLong(JobManagerOptions.JOB_STORE_EXPIRATION_TIME));
final int maximumCapacity = configuration.getInteger(JobManagerOptions.JOB_STORE_MAX_CAPACITY);
final long maximumCacheSizeBytes = configuration.getLong(JobManagerOptions.JOB_STORE_CACHE_SIZE);
return new FileArchivedExecutionGraphStore(
tmpDir,
expirationTime,
maximumCapacity,
maximumCacheSizeBytes,
scheduledExecutor,
Ticker.systemTicker());
}
Its constructor sets up the storage directory and two caches:
public FileArchivedExecutionGraphStore(
File rootDir,
Time expirationTime,
int maximumCapacity,
long maximumCacheSizeBytes,
ScheduledExecutor scheduledExecutor,
Ticker ticker) throws IOException {
final File storageDirectory = initExecutionGraphStorageDirectory(rootDir);
LOG.info(
"Initializing {}: Storage directory {}, expiration time {}, maximum cache size {} bytes.",
FileArchivedExecutionGraphStore.class.getSimpleName(),
storageDirectory,
expirationTime.toMilliseconds(),
maximumCacheSizeBytes);
// storage directory
this.storageDir = Preconditions.checkNotNull(storageDirectory);
Preconditions.checkArgument(
storageDirectory.exists() && storageDirectory.isDirectory(),
"The storage directory must exist and be a directory.");
// cache of job details; on removal, the archived graph file is deleted
this.jobDetailsCache = CacheBuilder.newBuilder()
.expireAfterWrite(expirationTime.toMilliseconds(), TimeUnit.MILLISECONDS)
.maximumSize(maximumCapacity)
.removalListener(
(RemovalListener<JobID, JobDetails>) notification -> deleteExecutionGraphFile(notification.getKey()))
.ticker(ticker)
.build();
// LoadingCache of ArchivedExecutionGraphs, bounded by weight (size); a miss loads the graph from file
this.archivedExecutionGraphCache = CacheBuilder.newBuilder()
.maximumWeight(maximumCacheSizeBytes)
.weigher(this::calculateSize)
.build(new CacheLoader<JobID, ArchivedExecutionGraph>() {
@Override
public ArchivedExecutionGraph load(JobID jobId) throws Exception {
return loadExecutionGraph(jobId);
}});
this.cleanupFuture = scheduledExecutor.scheduleWithFixedDelay(
jobDetailsCache::cleanUp,
expirationTime.toMilliseconds(),
expirationTime.toMilliseconds(),
TimeUnit.MILLISECONDS);
this.shutdownHook = ShutdownHookUtil.addShutdownHook(this, getClass().getSimpleName(), LOG);
this.numFinishedJobs = 0;
this.numFailedJobs = 0;
this.numCanceledJobs = 0;
}
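jobDetailsCache combines a size bound with a removal listener that deletes the archived graph file. The sketch swaps Guava's CacheBuilder for a LinkedHashMap in access order, so it only approximates that behaviour: eviction here is strictly least-recently-used (Guava's maximumSize is only approximately LRU), and there is no expireAfterWrite and no weight-based cache like archivedExecutionGraphCache. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical stand-in for jobDetailsCache: a capacity-bounded map that calls a
 * removal listener on eviction (where the real store deletes the graph file).
 */
final class BoundedJobCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maximumCapacity;
    private final Consumer<K> onEvict;

    BoundedJobCacheSketch(int maximumCapacity, Consumer<K> onEvict) {
        super(16, 0.75f, true); // access order: get() refreshes an entry
        this.maximumCapacity = maximumCapacity;
        this.onEvict = onEvict;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maximumCapacity) {
            onEvict.accept(eldest.getKey()); // e.g. delete the file for this JobID
            return true;
        }
        return false;
    }
}
```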
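Finally, the create() method of the DispatcherResourceManagerComponentFactory creates and starts the components:
public DispatcherResourceManagerComponent create(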
Configuration configuration,
Executor ioExecutor,
RpcService rpcService,
HighAvailabilityServices highAvailabilityServices,
BlobServer blobServer,
HeartbeatServices heartbeatServices,
MetricRegistry metricRegistry,
ArchivedExecutionGraphStore archivedExecutionGraphStore,
MetricQueryServiceRetriever metricQueryServiceRetriever,
FatalErrorHandler fatalErrorHandler) throws Exception {
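// service that retrieves the current Dispatcher leader and notifies a listener
LeaderRetrievalService dispatcherLeaderRetrievalService = null;
// service that retrieves the current ResourceManager leader and notifies a listener
LeaderRetrievalService resourceManagerRetrievalService = null;
// REST endpoint that serves the REST calls of the web frontend
WebMonitorEndpoint<?> webMonitorEndpoint = null;
// ResourceManager implementation; responsible for resource allocation and bookkeeping
ResourceManager<?> resourceManager = null;
// encapsulates how the Dispatcher is run
DispatcherRunner dispatcherRunner = null;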
try {
dispatcherLeaderRetrievalService = highAvailabilityServices.getDispatcherLeaderRetriever();
resourceManagerRetrievalService = highAvailabilityServices.getResourceManagerLeaderRetriever();
// LeaderGatewayRetriever that retrieves and stores the leading RpcGateway (here the DispatcherGateway)
final LeaderGatewayRetriever<DispatcherGateway> dispatcherGatewayRetriever = new RpcGatewayRetriever<>(
rpcService,
DispatcherGateway.class,
DispatcherId::fromUuid,
10,
Time.milliseconds(50L));
// LeaderGatewayRetriever that retrieves and stores the leading RpcGateway (here the ResourceManagerGateway)
final LeaderGatewayRetriever<ResourceManagerGateway> resourceManagerGatewayRetriever = new RpcGatewayRetriever<>(
rpcService,
ResourceManagerGateway.class,
ResourceManagerId::fromUuid,
10,
Time.milliseconds(50L));
final ScheduledExecutorService executor = WebMonitorEndpoint.createExecutorService(
configuration.getInteger(RestOptions.SERVER_NUM_THREADS),
configuration.getInteger(RestOptions.SERVER_THREAD_PRIORITY),
"DispatcherRestEndpoint");
final long updateInterval = configuration.getLong(MetricOptions.METRIC_FETCHER_UPDATE_INTERVAL);
// MetricFetcher fetches metrics from the JobManager and all registered TaskManagers; an update interval of 0 disables it (VoidMetricFetcher)
final MetricFetcher metricFetcher = updateInterval == 0
? VoidMetricFetcher.INSTANCE
: MetricFetcherImpl.fromConfiguration(
configuration,
metricQueryServiceRetriever,
dispatcherGatewayRetriever,
executor);
webMonitorEndpoint = restEndpointFactory.createRestEndpoint(
configuration,
dispatcherGatewayRetriever,
resourceManagerGatewayRetriever,
blobServer,
executor,
metricFetcher,
highAvailabilityServices.getClusterRestEndpointLeaderElectionService(),
fatalErrorHandler);
log.debug("Starting Dispatcher REST endpoint.");
webMonitorEndpoint.start();
final String hostname = RpcUtils.getHostname(rpcService);
resourceManager = resourceManagerFactory.createResourceManager(
configuration,
ResourceID.generate(),
rpcService,
highAvailabilityServices,
heartbeatServices,
fatalErrorHandler,
new ClusterInformation(hostname, blobServer.getPort()),
webMonitorEndpoint.getRestBaseUrl(),
metricRegistry,
hostname);
final HistoryServerArchivist historyServerArchivist = HistoryServerArchivist.createHistoryServerArchivist(configuration, webMonitorEndpoint, ioExecutor);
final PartialDispatcherServices partialDispatcherServices = new PartialDispatcherServices(
configuration,
highAvailabilityServices,
resourceManagerGatewayRetriever,
blobServer,
heartbeatServices,
() -> MetricUtils.instantiateJobManagerMetricGroup(metricRegistry, hostname),
archivedExecutionGraphStore,
fatalErrorHandler,
historyServerArchivist,
metricRegistry.getMetricQueryServiceGatewayRpcAddress());
log.debug("Starting Dispatcher.");
dispatcherRunner = dispatcherRunnerFactory.createDispatcherRunner(
highAvailabilityServices.getDispatcherLeaderElectionService(),
fatalErrorHandler,
new HaServicesJobGraphStoreFactory(highAvailabilityServices),
ioExecutor,
rpcService,
partialDispatcherServices);
log.debug("Starting ResourceManager.");
resourceManager.start();
resourceManagerRetrievalService.start(resourceManagerGatewayRetriever);
dispatcherLeaderRetrievalService.start(dispatcherGatewayRetriever);
return new DispatcherResourceManagerComponent(
dispatcherRunner,
resourceManager,
dispatcherLeaderRetrievalService,
resourceManagerRetrievalService,
webMonitorEndpoint);
} catch (Exception exception) {
// clean up all started components
if (dispatcherLeaderRetrievalService != null) {
try {
dispatcherLeaderRetrievalService.stop();
} catch (Exception e) {
exception = ExceptionUtils.firstOrSuppressed(e, exception);
}
}
if (resourceManagerRetrievalService != null) {
try {
resourceManagerRetrievalService.stop();
} catch (Exception e) {
exception = ExceptionUtils.firstOrSuppressed(e, exception);
}
}
final Collection<CompletableFuture<Void>> terminationFutures = new ArrayList<>(3);
if (webMonitorEndpoint != null) {
terminationFutures.add(webMonitorEndpoint.closeAsync());
}
if (resourceManager != null) {
terminationFutures.add(resourceManager.closeAsync());
}
if (dispatcherRunner != null) {
terminationFutures.add(dispatcherRunner.closeAsync());
}
final FutureUtils.ConjunctFuture<Void> terminationFuture = FutureUtils.completeAll(terminationFutures);
try {
terminationFuture.get();
} catch (Exception e) {
exception = ExceptionUtils.firstOrSuppressed(e, exception);
}
throw new FlinkException("Could not create the DispatcherResourceManagerComponent.", exception);
}
}
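The catch block of create() follows a common pattern: stop whatever was already started and attach secondary failures as suppressed exceptions via ExceptionUtils.firstOrSuppressed. A hypothetical synchronous sketch (ComponentSketch and StartOrCleanUpSketch are ours). Unlike this sketch, the real code closes the REST endpoint, ResourceManager and dispatcher runner asynchronously and waits for all termination futures.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical component with start/stop, standing in for the services created in create(). */
interface ComponentSketch {
    void start() throws Exception;
    void stop() throws Exception;
}

final class StartOrCleanUpSketch {
    /** Keep the first exception; attach later ones to it as suppressed. */
    static <T extends Throwable> T firstOrSuppressed(T newException, T previous) {
        if (previous == null) {
            return newException;
        }
        previous.addSuppressed(newException);
        return previous;
    }

    /** Starts components in order; if one fails, stops the already started ones and rethrows. */
    static void startAll(List<ComponentSketch> components) throws Exception {
        List<ComponentSketch> started = new ArrayList<>();
        try {
            for (ComponentSketch c : components) {
                c.start();
                started.add(c);
            }
        } catch (Exception exception) {
            for (ComponentSketch c : started) {
                try {
                    c.stop();
                } catch (Exception e) {
                    exception = firstOrSuppressed(e, exception);
                }
            }
            throw new Exception("Could not start all components.", exception);
        }
    }
}
```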
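Putting it together, the DispatcherResourceManagerComponent is built from six services:
1. Dispatcher: receives job submissions, persists them, spawns JobManagers to execute the jobs, and recovers them in case of a master failure. It also knows about the state of the Flink session cluster.
2. ResourceManager: responsible for resource allocation and bookkeeping. registerJobManager(JobMasterId, ResourceID, String, JobID, Time) registers a JobMaster; requestSlot(JobMasterId, SlotRequest, Time) requests a slot from the resource manager.
3. WebMonitorEndpoint: the REST endpoint that serves the REST calls of the web frontend.
4. dispatcherLeaderRetrievalService: retrieves the current Dispatcher leader and notifies its listener, dispatcherGatewayRetriever.
5. resourceManagerRetrievalService: retrieves the current ResourceManager leader and notifies its listener, resourceManagerGatewayRetriever.
6. partialDispatcherServices: the services handed to the Dispatcher (configuration, HA services, the ResourceManager gateway retriever, BLOB server, heartbeat services, the JobManager metric group factory, the execution graph store, the fatal error handler, the history server archivist, and the metric query service address).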
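Items 4 and 5 follow a listener pattern: a retrieval service tells a listener (here the gateway retrievers) who the current leader is. Without HA the leader never changes, so a retrieval service can simply report a fixed address once. A hypothetical sketch of that idea; the interface is loosely modelled on LeaderRetrievalListener.notifyLeaderAddress(String, UUID) and omits error handling.

```java
import java.util.UUID;

/** Loosely modelled on a leader retrieval listener callback (simplified, hypothetical). */
interface LeaderListenerSketch {
    void notifyLeaderAddress(String leaderAddress, UUID leaderSessionId);
}

/** Hypothetical fixed-leader retrieval service: notifies the listener once, on start(). */
final class FixedLeaderRetrievalSketch {
    private final String leaderAddress;
    private final UUID leaderSessionId;
    private LeaderListenerSketch listener;

    FixedLeaderRetrievalSketch(String leaderAddress, UUID leaderSessionId) {
        this.leaderAddress = leaderAddress;
        this.leaderSessionId = leaderSessionId;
    }

    void start(LeaderListenerSketch listener) {
        if (this.listener != null) {
            throw new IllegalStateException("already started");
        }
        this.listener = listener;
        listener.notifyLeaderAddress(leaderAddress, leaderSessionId);
    }
}

/** Hypothetical gateway retriever: remembers the latest leader it was told about. */
final class LatestLeaderSketch implements LeaderListenerSketch {
    volatile String address;
    volatile UUID sessionId;

    @Override
    public void notifyLeaderAddress(String leaderAddress, UUID leaderSessionId) {
        this.address = leaderAddress;
        this.sessionId = leaderSessionId;
    }
}
```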
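Dispatcher: receives job submissions, persists them, spawns JobManagers to execute the jobs, and recovers them in case of a master failure. It also knows about the state of the Flink session cluster.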
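ResourceManager: responsible for resource allocation and bookkeeping. registerJobManager(JobMasterId, ResourceID, String, JobID, Time) registers a JobMaster; requestSlot(JobMasterId, SlotRequest, Time) requests a slot from the resource manager.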
WebMonitorEndpoint: the REST endpoint that serves the REST calls of the web frontend.