Flink Source Code Analysis Series
See the series index: Flink 源码分析系列文档目录
Background
Flink's distributed cache can be used to distribute files to every TaskManager of a job. A typical use case is distributing a trained model across the cluster for a streaming inference job. Flink performs the file distribution automatically, with no user intervention, which makes it very convenient to use.
For usage instructions, see the distributed cache section of Flink 使用之配置与调优 (Flink configuration and tuning).
You can also refer to the usage example in the official documentation:
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/dataset/overview/#distributed-cache
Register files with the distributed cache:
val env = ExecutionEnvironment.getExecutionEnvironment
// register a file from HDFS
env.registerCachedFile("hdfs:///path/to/your/file", "hdfsFile")
// register a local executable file (script, executable, ...)
env.registerCachedFile("file:///path/to/exec/file", "localExecFile", true)
// define your program and execute
...
val input: DataSet[String] = ...
val result: DataSet[Integer] = input.map(new MyMapper())
...
env.execute()
Read a file from the distributed cache inside an operator running on the TaskManager:
// extend a RichFunction to have access to the RuntimeContext
class MyMapper extends RichMapFunction[String, Int] {
override def open(config: Configuration): Unit = {
// access cached file via RuntimeContext and DistributedCache
val myFile: File = getRuntimeContext.getDistributedCache.getFile("hdfsFile")
// read the file (or navigate the directory)
...
}
override def map(value: String): Int = {
// use content of cached file
...
}
}
Using the distributed cache is fairly simple, so the introduction stops here. The following sections analyze the source code of the whole distributed cache processing flow.
Registering cached files
We start the analysis of the distributed cache execution flow from the registerCachedFile method of ExecutionEnvironment.
public void registerCachedFile(String filePath, String name) {
registerCachedFile(filePath, name, false);
}
public void registerCachedFile(String filePath, String name, boolean executable) {
this.cacheFile.add(new Tuple2<>(name, new DistributedCacheEntry(filePath, executable)));
}
This method wraps the file to be cached in a DistributedCacheEntry and adds it to the cacheFile collection: the first tuple field is the cache file's name (the identifier users later pass in to fetch the file), and the second is the wrapped DistributedCacheEntry object.
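For orientation, here is a simplified sketch of the fields a DistributedCacheEntry carries, reconstructed from how the entries are used later in this article; it is not a verbatim copy of the class in org.apache.flink.api.common.cache.DistributedCache:
// Simplified sketch of DistributedCache.DistributedCacheEntry (fields only)
public static class DistributedCacheEntry {
    public String filePath;      // local (file:///) or remote (hdfs://, ...) path of the file
    public Boolean isExecutable; // whether the file should be marked executable after copying
    public Boolean isZipped;     // directories are zipped before being uploaded
    public byte[] blobKey;       // serialized PermanentBlobKey, filled in after upload to the BlobServer

    public DistributedCacheEntry(String filePath, Boolean isExecutable) {
        this(filePath, isExecutable, null, false);
    }

    public DistributedCacheEntry(
            String filePath, Boolean isExecutable, byte[] blobKey, Boolean isZipped) {
        this.filePath = filePath;
        this.isExecutable = isExecutable;
        this.blobKey = blobKey;
        this.isZipped = isZipped;
    }
}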
Note that files can also be added to the distributed cache through the pipeline.cached-files configuration option. The official documentation describes it as follows:
Files to be registered at the distributed cache under the given name. The files will be accessible from any user-defined function in the (distributed) runtime under a local path. Files may be local files (which will be distributed via BlobServer), or files in a distributed file system. The runtime will copy the files temporarily to a local cache, if needed.
Example:
name:file1,path:`file:///tmp/file1`;name:file2,path:`hdfs://
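As a minimal sketch (not from the article's source), the same registration can be expressed programmatically through PipelineOptions.CACHED_FILES; the name and path below simply reuse the file1 entry from the example above:
import java.util.Arrays;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.PipelineOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CachedFilesConfigExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // equivalent to flink-conf.yaml:  pipeline.cached-files: name:file1,path:file:///tmp/file1
        conf.set(
                PipelineOptions.CACHED_FILES,
                Arrays.asList("name:file1,path:file:///tmp/file1"));

        // the environment reads PipelineOptions.CACHED_FILES (see the snippet below)
        // and fills its cacheFile collection from it
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // ... define the job and call env.execute()
    }
}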
This configuration value is read in StreamExecutionEnvironment (and likewise in ExecutionEnvironment):
configuration
.getOptional(PipelineOptions.CACHED_FILES)
.ifPresent(
f -> {
this.cacheFile.clear();
this.cacheFile.addAll(DistributedCache.parseCachedFilesFromString(f));
});
Once all files to be cached are stored in the cacheFile variable, the next step is StreamGraph generation. StreamExecutionEnvironment hands cacheFile over to the StreamGraphGenerator.
private StreamGraphGenerator getStreamGraphGenerator(List<Transformation<?>> transformations) {
if (transformations.size() <= 0) {
throw new IllegalStateException(
"No operators defined in streaming topology. Cannot execute.");
}
// We copy the transformation so that newly added transformations cannot intervene with the
// stream graph generation.
return new StreamGraphGenerator(
new ArrayList<>(transformations), config, checkpointCfg, configuration)
.setStateBackend(defaultStateBackend)
.setChangelogStateBackendEnabled(changelogStateBackendEnabled)
.setSavepointDir(defaultSavepointDirectory)
.setChaining(isChainingEnabled)
.setUserArtifacts(cacheFile)
.setTimeCharacteristic(timeCharacteristic)
.setDefaultBufferTimeout(bufferTimeout)
.setSlotSharingGroupResource(slotSharingGroupResources);
}
The cacheFile list is then held by StreamGraphGenerator, where it is stored in a member variable named userArtifacts.
StreamGraphGenerator's generate method calls configureStreamGraph(streamGraph), which passes userArtifacts on to the StreamGraph.
private void configureStreamGraph(final StreamGraph graph) {
// ...
graph.setUserArtifacts(userArtifacts);
// ...
}
The flow then reaches the stage where the JobGraph is generated from the StreamGraph. In the createJobGraph method of StreamingJobGraphGenerator, the userArtifacts carried by the StreamGraph are handed over to the JobGraph.
private JobGraph createJobGraph() {
// ...
final Map<String, DistributedCache.DistributedCacheEntry> distributedCacheEntries =
JobGraphUtils.prepareUserArtifactEntries(
streamGraph.getUserArtifacts().stream()
.collect(Collectors.toMap(e -> e.f0, e -> e.f1)),
jobGraph.getJobID());
for (Map.Entry<String, DistributedCache.DistributedCacheEntry> entry :
distributedCacheEntries.entrySet()) {
jobGraph.addUserArtifact(entry.getKey(), entry.getValue());
}
// ...
}
Uploading the cached file content
Tracing the call sites of JobGraph::getUserArtifacts, two of them are worth a closer look:
- ClientUtils.extractAndUploadJobGraphFiles
- YarnClusterDescriptor.startAppMaster
The two cases are analyzed separately below.
ClientUtils
As its name suggests, ClientUtils' extractAndUploadJobGraphFiles method extracts the files referenced by the JobGraph and uploads them. It runs when the client submits the generated JobGraph.
public static void extractAndUploadJobGraphFiles(
JobGraph jobGraph, SupplierWithException<BlobClient, IOException> clientSupplier)
throws FlinkException {
List<Path> userJars = jobGraph.getUserJars();
Collection<Tuple2<String, Path>> userArtifacts =
jobGraph.getUserArtifacts().entrySet().stream()
.map(
entry ->
Tuple2.of(
entry.getKey(),
new Path(entry.getValue().filePath)))
.collect(Collectors.toList());
uploadJobGraphFiles(jobGraph, userJars, userArtifacts, clientSupplier);
}
This method calls uploadJobGraphFiles to upload both userJars and userArtifacts.
public static void uploadJobGraphFiles(
JobGraph jobGraph,
Collection<Path> userJars,
Collection<Tuple2<String, Path>> userArtifacts,
SupplierWithException<BlobClient, IOException> clientSupplier)
throws FlinkException {
if (!userJars.isEmpty() || !userArtifacts.isEmpty()) {
try (BlobClient client = clientSupplier.get()) {
uploadAndSetUserJars(jobGraph, userJars, client);
uploadAndSetUserArtifacts(jobGraph, userArtifacts, client);
} catch (IOException ioe) {
throw new FlinkException("Could not upload job files.", ioe);
}
}
jobGraph.writeUserArtifactEntriesToConfiguration();
}
The logic that uploads userArtifacts lives in the uploadAndSetUserArtifacts method, so we keep following the call chain:
private static void uploadAndSetUserArtifacts(
JobGraph jobGraph,
Collection<Tuple2<String, Path>> artifactPaths,
BlobClient blobClient)
throws IOException {
Collection<Tuple2<String, PermanentBlobKey>> blobKeys =
uploadUserArtifacts(jobGraph.getJobID(), artifactPaths, blobClient);
setUserArtifactBlobKeys(jobGraph, blobKeys);
}
The uploadUserArtifacts method uploads the files to the BlobServer; setUserArtifactBlobKeys then records the blobKey of each uploaded file in the JobGraph.
private static Collection<Tuple2<String, PermanentBlobKey>> uploadUserArtifacts(
JobID jobID, Collection<Tuple2<String, Path>> userArtifacts, BlobClient blobClient)
throws IOException {
Collection<Tuple2<String, PermanentBlobKey>> blobKeys =
new ArrayList<>(userArtifacts.size());
for (Tuple2<String, Path> userArtifact : userArtifacts) {
// only upload local files
if (!userArtifact.f1.getFileSystem().isDistributedFS()) {
final PermanentBlobKey blobKey = blobClient.uploadFile(jobID, userArtifact.f1);
blobKeys.add(Tuple2.of(userArtifact.f0, blobKey));
}
}
return blobKeys;
}
private static void setUserArtifactBlobKeys(
JobGraph jobGraph, Collection<Tuple2<String, PermanentBlobKey>> blobKeys)
throws IOException {
for (Tuple2<String, PermanentBlobKey> blobKey : blobKeys) {
jobGraph.setUserArtifactBlobKey(blobKey.f0, blobKey.f1);
}
}
Finally we reach JobGraph itself; let's look at its setUserArtifactBlobKey method:
public void setUserArtifactBlobKey(String entryName, PermanentBlobKey blobKey)
throws IOException {
byte[] serializedBlobKey;
serializedBlobKey = InstantiationUtil.serializeObject(blobKey);
userArtifacts.computeIfPresent(
entryName,
(key, originalEntry) ->
new DistributedCache.DistributedCacheEntry(
originalEntry.filePath,
originalEntry.isExecutable,
serializedBlobKey,
originalEntry.isZipped));
}
This method attaches the BlobKey of the uploaded file to the corresponding entry in the userArtifacts collection.
YarnClusterDescriptor
The other place where user cache files are uploaded is YarnClusterDescriptor::startAppMaster, which runs when a Flink on YARN cluster is started.
The snippet below is the part of startAppMaster that deals with file upload. When a job is submitted in application or yarn-session mode, jobGraph is null: in application mode the user's main method executes inside the Flink YARN cluster, so the JobGraph has not been generated yet, and yarn-session mode merely starts a Flink YARN cluster without running any job. Only per-job submission carries a non-null jobGraph, so the code below uploads cache files for per-job mode only.
For the startup process of a Flink YARN cluster, see Flink 源码之 yarn-session 启动流程.
// only for per job mode
if (jobGraph != null) {
for (Map.Entry<String, DistributedCache.DistributedCacheEntry> entry :
jobGraph.getUserArtifacts().entrySet()) {
// only upload local files
// the scheme of the file path (file:/// vs. hdfs:// etc.) tells whether the file
// lives on shared remote storage or on the local file system;
// only files on local storage are uploaded to the YARN cluster
if (!Utils.isRemotePath(entry.getValue().filePath)) {
Path localPath = new Path(entry.getValue().filePath);
// upload the local file to the YARN cluster (application directory, application-level visibility)
Tuple2<Path, Long> remoteFileInfo =
fileUploader.uploadLocalFileToRemote(localPath, entry.getKey());
// replace the cached file path stored in the jobGraph with its path on the YARN cluster
jobGraph.setUserArtifactRemotePath(
entry.getKey(), remoteFileInfo.f0.toString());
}
}
// write the cached file info back into the configuration
jobGraph.writeUserArtifactEntriesToConfiguration();
}
JobGraph's setUserArtifactRemotePath method replaces the local path stored in userArtifacts with the file's path on the YARN cluster after the upload. The code is as follows:
public void setUserArtifactRemotePath(String entryName, String remotePath) {
userArtifacts.computeIfPresent(
entryName,
(key, originalEntry) ->
new DistributedCache.DistributedCacheEntry(
remotePath,
originalEntry.isExecutable,
null,
originalEntry.isZipped));
}
Finally, JobGraph::writeUserArtifactEntriesToConfiguration writes the names, paths and related settings of the cached files into the job configuration.
public void writeUserArtifactEntriesToConfiguration() {
for (Map.Entry<String, DistributedCache.DistributedCacheEntry> userArtifact :
userArtifacts.entrySet()) {
DistributedCache.writeFileInfoToConfig(
userArtifact.getKey(), userArtifact.getValue(), jobConfiguration);
}
}
Retrieving the cached files
Cached files are retrieved on the TaskManager side. In the doRun method of the Task class, a series of background copy tasks is kicked off to copy the cached files into a local directory on the TaskManager. The relevant snippet is shown below:
// ...
// next, kick off the background copying of files for the distributed cache
try {
// read all cached file entries registered with the DistributedCache from the configuration
for (Map.Entry<String, DistributedCache.DistributedCacheEntry> entry :
DistributedCache.readFileInfoFromConfig(jobConfiguration)) {
LOG.info("Obtaining local cache file for '{}'.", entry.getKey());
// download the cached file into the local cache directory;
// the copy runs in a dedicated thread pool, and it is decided automatically,
// based on whether the entry carries a BlobKey, whether to download from the
// BlobServer or from the remote file system
Future<Path> cp =
fileCache.createTmpFile(
entry.getKey(), entry.getValue(), jobId, executionId);
// store the Future of the copy task in the distributedCacheEntries map
distributedCacheEntries.put(entry.getKey(), cp);
}
} catch (Exception e) {
throw new Exception(
String.format(
"Exception while adding files to distributed cache of task %s (%s).",
taskNameWithSubtask, executionId),
e);
}
// ...
The Task then passes distributedCacheEntries into the RuntimeEnvironment:
// distributedCacheEntries is handed to the RuntimeEnvironment here
Environment env =
new RuntimeEnvironment(
jobId,
vertexId,
executionId,
executionConfig,
taskInfo,
jobConfiguration,
taskConfiguration,
userCodeClassLoader,
memoryManager,
ioManager,
broadcastVariableManager,
taskStateManager,
aggregateManager,
accumulatorRegistry,
kvStateRegistry,
inputSplitProvider,
distributedCacheEntries,
consumableNotifyingPartitionWriters,
inputGates,
taskEventDispatcher,
checkpointResponder,
operatorCoordinatorEventGateway,
taskManagerConfig,
metrics,
this,
externalResourceInfoProvider);
The Environment is then passed from the Task into AbstractStreamOperator, and from there to StreamingRuntimeContext.
Let's look at AbstractStreamOperator's setup method:
final Environment environment = containingTask.getEnvironment();
// ...
this.runtimeContext =
new StreamingRuntimeContext(
environment,
environment.getAccumulatorRegistry().getUserMap(),
getMetricGroup(),
getOperatorID(),
getProcessingTimeService(),
null,
environment.getExternalResourceInfoProvider());
It creates a new StreamingRuntimeContext and hands the environment to it.
The constructor of StreamingRuntimeContext is shown below:
public StreamingRuntimeContext(
Environment env,
Map<String, Accumulator<?, ?>> accumulators,
OperatorMetricGroup operatorMetricGroup,
OperatorID operatorID,
ProcessingTimeService processingTimeService,
@Nullable KeyedStateStore keyedStateStore,
ExternalResourceInfoProvider externalResourceInfoProvider) {
super(
checkNotNull(env).getTaskInfo(),
env.getUserCodeClassLoader(),
env.getExecutionConfig(),
accumulators,
env.getDistributedCacheEntries(),
operatorMetricGroup);
this.taskEnvironment = env;
this.streamConfig = new StreamConfig(env.getTaskConfiguration());
this.operatorUniqueID = checkNotNull(operatorID).toString();
this.processingTimeService = processingTimeService;
this.keyedStateStore = keyedStateStore;
this.externalResourceInfoProvider = externalResourceInfoProvider;
}
Its parent class is AbstractRuntimeUDFContext; following its constructor, the code is:
public AbstractRuntimeUDFContext(
TaskInfo taskInfo,
UserCodeClassLoader userCodeClassLoader,
ExecutionConfig executionConfig,
Map<String, Accumulator<?, ?>> accumulators,
Map<String, Future<Path>> cpTasks,
OperatorMetricGroup metrics) {
this.taskInfo = checkNotNull(taskInfo);
this.userCodeClassLoader = userCodeClassLoader;
this.executionConfig = executionConfig;
this.distributedCache = new DistributedCache(checkNotNull(cpTasks));
this.accumulators = checkNotNull(accumulators);
this.metrics = metrics;
}
The copy tasks created by the Task finally reach AbstractRuntimeUDFContext, where they are wrapped into the distributedCache field.
The DistributedCache constructor is shown below:
public DistributedCache(Map<String, Future<Path>> cacheCopyTasks) {
this.cacheCopyTasks = cacheCopyTasks;
}
In the end, DistributedCache keeps the copy tasks in its cacheCopyTasks map.
Reading cached files from the DistributedCache in user code
A user operator needs to extend one of the RichXXXFunction classes. A RichXXXFunction can obtain the DistributedCache via the getDistributedCache method of its RuntimeContext and then read whatever content it needs. Example:
val demoFile = getRuntimeContext.getDistributedCache.getFile("demo")
The getRuntimeContext call here returns exactly the AbstractRuntimeUDFContext described above, and its getDistributedCache method returns the distributedCache object:
@Override
public DistributedCache getDistributedCache() {
return this.distributedCache;
}
The logic for reading a file from the cache lives in the getFile method. It waits until the file has been copied from the BlobServer or the remote file system to the TaskManager's local storage, then returns the file's local path. The code is as follows:
public File getFile(String name) {
if (name == null) {
throw new NullPointerException("name must not be null");
}
Future<Path> future = cacheCopyTasks.get(name);
if (future == null) {
throw new IllegalArgumentException(
"File with name '"
+ name
+ "' is not available."
+ " Did you forget to register the file?");
}
try {
// block until the background copy task finishes, then get the path of the copied file
final Path path = future.get();
// build a fully qualified URI (scheme, authority and path) and return it as a File
URI tmp = path.makeQualified(path.getFileSystem()).toUri();
return new File(tmp);
} catch (ExecutionException e) {
throw new RuntimeException("An error occurred while copying the file.", e.getCause());
} catch (Exception e) {
throw new RuntimeException(
"Error while getting the file registered under '"
+ name
+ "' from the distributed cache",
e);
}
}
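Putting the pieces together, below is a minimal Java sketch of an operator that reads a cached file in open(); the cache name hdfsFile reuses the registration example from the beginning of the article, and the lookup-table logic is only illustrative:
import java.io.File;
import java.nio.file.Files;
import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class CachedFileMapper extends RichMapFunction<String, Integer> {

    private transient List<String> cachedLines;

    @Override
    public void open(Configuration parameters) throws Exception {
        // getFile() blocks until the background copy task has finished,
        // then returns the local path of the cached file on this TaskManager
        File cached = getRuntimeContext().getDistributedCache().getFile("hdfsFile");
        cachedLines = Files.readAllLines(cached.toPath());
    }

    @Override
    public Integer map(String value) {
        // use the cached file content, here as a simple lookup table
        return cachedLines.contains(value) ? 1 : 0;
    }
}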
This post is the author's original work. Discussion and corrections are welcome. Please credit the source when reposting.