Every introduction to Flink job execution presents the following pipeline; in this post we will walk through, hands-on, how these conversions are actually carried out.
StreamGraph: Class representing the streaming topology. It contains all the information necessary to build the JobGraph for execution.
JobGraph: The JobGraph represents a Flink dataflow program at the low level accepted by the JobManager; all programs coming from the higher-level APIs are transformed into JobGraphs. Up to this point everything still runs inside the client, and the execution plan can already be inspected there (e.g. via StreamExecutionEnvironment#getExecutionPlan).
ExecutionGraph: The parallelized version of the JobGraph and the core data structure of the scheduling layer (Scheduler).
Operator: an operator; think of it as the function definition.
Transformation: a transformation, consisting of an input, an operator, and an output; think of it as one complete step of the flow, i.e. the function at runtime.
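Since everything up to the JobGraph happens on the client, the plan can be inspected before anything is submitted. A minimal, runnable sketch (the class name and the toy pipeline are made up for illustration):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExecutionPlanDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(1, 2, 3)
            .map(new MapFunction<Integer, Integer>() {
                @Override
                public Integer map(Integer value) {
                    return value * 2;
                }
            })
            .print();

        // getExecutionPlan() builds the StreamGraph from the registered transformations
        // and returns it as a JSON string (it can be pasted into the Flink plan visualizer)
        System.out.println(env.getExecutionPlan());
    }
}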
The example is org.apache.flink.streaming.examples.wordcount.WordCount in the flink-examples-streaming module:
public static void main(String[] args) throws Exception {
// Checking input parameters
final MultipleParameterTool params = MultipleParameterTool.fromArgs(args);
// set up the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// make parameters available in the web interface
env.getConfig().setGlobalJobParameters(params);
// get input data
DataStream<String> text = null;
if (params.has("input")) {
// union all the inputs from text files
for (String input : params.getMultiParameterRequired("input")) {
if (text == null) {
text = env.readTextFile(input);
} else {
text = text.union(env.readTextFile(input));
}
}
Preconditions.checkNotNull(text, "Input DataStream should not be null.");
} else {
System.out.println("Executing WordCount example with default input data set.");
System.out.println("Use --input to specify file input.");
// get default test text data
text = env.fromElements(WordCountData.WORDS);
}
DataStream<Tuple2<String, Integer>> counts =
// split up the lines in pairs (2-tuples) containing: (word,1)
text.flatMap(new Tokenizer())
// group by the tuple field "0" and sum up tuple field "1"
.keyBy(0).sum(1);
// emit result
if (params.has("output")) {
counts.writeAsText(params.get("output"));
} else {
System.out.println("Printing result to stdout. Use --output to specify output path.");
counts.print();
}
// execute program
env.execute("Streaming WordCount");
}
text = env.fromElements(WordCountData.WORDS);
DataStream<Tuple2<String, Integer>> counts =
// split up the lines in pairs (2-tuples) containing: (word,1)
text.flatMap(new Tokenizer())
// group by the tuple field "0" and sum up tuple field "1"
.keyBy(0).sum(1);
Everything so far is just the definition of the program: the initialization phase mainly collects the metadata that describes the whole dataflow.
DataStream#doTransform
(2) Create the Transformation (containing the input instance, the operator, and the output)
(3) Create the result stream
(4) Add the operator to the current execution environment context
public void addOperator(Transformation<?> transformation) {
Preconditions.checkNotNull(transformation, "transformation must not be null.");
this.transformations.add(transformation);
}
(5) Return the result stream
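Putting the steps together, doTransform looks roughly like the following (condensed from the Flink 1.10-era DataStream source; details may differ slightly between versions):

protected <R> SingleOutputStreamOperator<R> doTransform(
        String operatorName,
        TypeInformation<R> outTypeInfo,
        StreamOperatorFactory<R> operatorFactory) {

    // read the output type of the input transform to coax out errors about MissingTypeInfo
    transformation.getOutputType();

    // create the Transformation: input, operator factory, output type
    OneInputTransformation<T, R> resultTransform = new OneInputTransformation<>(
            this.transformation, operatorName, operatorFactory, outTypeInfo, environment.getParallelism());

    // create the result stream wrapping that transformation
    @SuppressWarnings({"unchecked", "rawtypes"})
    SingleOutputStreamOperator<R> returnStream = new SingleOutputStreamOperator(environment, resultTransform);

    // register the transformation with the execution environment
    getExecutionEnvironment().addOperator(resultTransform);

    // return the result stream
    return returnStream;
}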
sum is still applied on the current data stream (the KeyedStream returned by keyBy).
Note that some transform operations do not generate a StreamNode; a PartitionTransformation, for example, is represented as a virtual node instead.
Calling transform works just like flatMap: the resulting transformation registers itself in the environment's transformations list.
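keyBy itself does not go through transform at all: it wraps the input in a KeyedStream whose underlying transformation is a PartitionTransformation, which later becomes one of those virtual nodes. Condensed from the KeyedStream constructor (Flink 1.10-era source, simplified):

public KeyedStream(DataStream<T> dataStream, KeySelector<T, KEY> keySelector, TypeInformation<KEY> keyType) {
    this(
        dataStream,
        // the "virtual" partitioning step: no operator of its own, only a partitioner on the edge
        new PartitionTransformation<>(
            dataStream.getTransformation(),
            new KeyGroupStreamPartitioner<>(keySelector, StreamGraphGenerator.DEFAULT_LOWER_BOUND_MAX_PARALLELISM)),
        keySelector,
        keyType);
}

sum then takes the same transform()/doTransform() path as flatMap, stacking a keyed-reduce OneInputTransformation (a StreamGroupedReduce operator in 1.10-era versions) on top of this partitioning.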
env.execute("Streaming WordCount");
===>
public JobExecutionResult execute(String jobName) throws Exception {
Preconditions.checkNotNull(jobName, "Streaming Job name should not be null.");
return execute(getStreamGraph(jobName));
}
public StreamGraph getStreamGraph(String jobName, boolean clearTransformations) {
StreamGraph streamGraph = getStreamGraphGenerator().setJobName(jobName).generate();
if (clearTransformations) {
this.transformations.clear();
}
return streamGraph;
}
Generating the full graph
The graph-generation logic loops over every registered transformation, as sketched below.
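A condensed view of StreamGraphGenerator#generate (Flink 1.10-era source, abridged):

public StreamGraph generate() {
    streamGraph = new StreamGraph(executionConfig, checkpointConfig, savepointRestoreSettings);
    // ... state backend, chaining flag, time characteristic, etc. are set here (omitted)
    alreadyTransformed = new HashMap<>();

    // walk every transformation that was registered via addOperator(...)
    for (Transformation<?> transformation : transformations) {
        transform(transformation);
    }

    final StreamGraph builtStreamGraph = streamGraph;

    alreadyTransformed.clear();
    alreadyTransformed = null;
    streamGraph = null;

    return builtStreamGraph;
}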
Generate Transformation
Here the type of the transformation is checked and the corresponding handling logic is invoked. In short, the core of the processing is to recursively add the node and its upstream nodes to the graph.
private Collection<Integer> transform(Transformation<?> transform) {
if (alreadyTransformed.containsKey(transform)) {
return alreadyTransformed.get(transform);
}
LOG.debug("Transforming " + transform);
if (transform.getMaxParallelism() <= 0) {
// if the max parallelism hasn't been set, then first use the job wide max parallelism
// from the ExecutionConfig.
int globalMaxParallelismFromConfig = executionConfig.getMaxParallelism();
if (globalMaxParallelismFromConfig > 0) {
transform.setMaxParallelism(globalMaxParallelismFromConfig);
}
}
// call at least once to trigger exceptions about MissingTypeInfo
transform.getOutputType();
Collection<Integer> transformedIds;
if (transform instanceof OneInputTransformation<?, ?>) {
transformedIds = transformOneInputTransform((OneInputTransformation<?, ?>) transform);
} else if (transform instanceof TwoInputTransformation<?, ?, ?>) {
transformedIds = transformTwoInputTransform((TwoInputTransformation<?, ?, ?>) transform);
} else if (transform instanceof SourceTransformation<?>) {
        // .......... remaining branches omitted
    }
// need this check because the iterate transformation adds itself before
// transforming the feedback edges
if (!alreadyTransformed.containsKey(transform)) {
alreadyTransformed.put(transform, transformedIds);
}
if (transform.getBufferTimeout() >= 0) {
streamGraph.setBufferTimeout(transform.getId(), transform.getBufferTimeout());
} else {
streamGraph.setBufferTimeout(transform.getId(), defaultBufferTimeout);
}
if (transform.getUid() != null) {
streamGraph.setTransformationUID(transform.getId(), transform.getUid());
}
if (transform.getUserProvidedNodeHash() != null) {
streamGraph.setTransformationUserHash(transform.getId(), transform.getUserProvidedNodeHash());
}
if (!streamGraph.getExecutionConfig().hasAutoGeneratedUIDsEnabled()) {
if (transform instanceof PhysicalTransformation &&
transform.getUserProvidedNodeHash() == null &&
transform.getUid() == null) {
throw new IllegalStateException("Auto generated UIDs have been disabled " +
"but no UID or hash has been assigned to operator " + transform.getName());
}
}
if (transform.getMinResources() != null && transform.getPreferredResources() != null) {
streamGraph.setResources(transform.getId(), transform.getMinResources(), transform.getPreferredResources());
}
streamGraph.setManagedMemoryWeight(transform.getId(), transform.getManagedMemoryWeight());
return transformedIds;
}
Creating the node and adding it to the graph
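Taking OneInputTransformation as the example, the node is created and wired to its (recursively transformed) inputs roughly like this (condensed from StreamGraphGenerator#transformOneInputTransform, Flink 1.10-era; parallelism and state-key settings omitted):

private <IN, OUT> Collection<Integer> transformOneInputTransform(OneInputTransformation<IN, OUT> transform) {

    // first transform the upstream node(s): this is the recursion
    Collection<Integer> inputIds = transform(transform.getInput());

    // the recursive call above may already have transformed this node
    if (alreadyTransformed.containsKey(transform)) {
        return alreadyTransformed.get(transform);
    }

    String slotSharingGroup = determineSlotSharingGroup(transform.getSlotSharingGroup(), inputIds);

    // create the StreamNode for this operator
    streamGraph.addOperator(
            transform.getId(),
            slotSharingGroup,
            transform.getCoLocationGroupKey(),
            transform.getOperatorFactory(),
            transform.getInputType(),
            transform.getOutputType(),
            transform.getName());

    // connect it to every upstream node
    for (Integer inputId : inputIds) {
        streamGraph.addEdge(inputId, transform.getId(), 0);
    }

    return Collections.singleton(transform.getId());
}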
Execution
PipelineExecutor#execute
Generating the JobGraph
JobGraph generation logic
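StreamGraph#getJobGraph delegates to StreamingJobGraphGenerator; a rough outline of createJobGraph (Flink 1.10-era, many steps elided):

private JobGraph createJobGraph() {
    // streaming jobs are scheduled eagerly
    jobGraph.setScheduleMode(streamGraph.getScheduleMode());

    // generate deterministic hashes for the nodes so that state can be mapped back on recovery
    Map<Integer, byte[]> hashes = defaultStreamGraphHasher.traverseStreamGraphAndGenerateHashes(streamGraph);

    // the core step: walk the StreamGraph and merge chainable operators into JobVertices
    setChaining(hashes, legacyHashes, chainedOperatorHashes);

    setPhysicalEdges();
    setSlotSharingAndCoLocation();
    configureCheckpointing();
    // ... user artifacts, execution config serialization, etc. omitted

    return jobGraph;
}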
As can be seen above, the keyed aggregation and the sink have been merged into a single chain.
The structure of the resulting JobGraph object is as shown above: taskVertices contains only three task vertices, because the Sink operator has been chained into the keyed-aggregation operator.
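Whether two adjacent operators end up in the same chain is decided per edge; roughly, all of the following conditions must hold (condensed from StreamingJobGraphGenerator#isChainable, Flink 1.10-era; the exact checks vary across versions):

public static boolean isChainable(StreamEdge edge, StreamGraph streamGraph) {
    StreamNode upStreamVertex = streamGraph.getSourceVertex(edge);
    StreamNode downStreamVertex = streamGraph.getTargetVertex(edge);

    return downStreamVertex.getInEdges().size() == 1                        // downstream has a single input
            && upStreamVertex.isSameSlotSharingGroup(downStreamVertex)      // same slot sharing group
            && downStreamVertex.getOperatorFactory().getChainingStrategy() == ChainingStrategy.ALWAYS
            && (upStreamVertex.getOperatorFactory().getChainingStrategy() == ChainingStrategy.HEAD
                || upStreamVertex.getOperatorFactory().getChainingStrategy() == ChainingStrategy.ALWAYS)
            && (edge.getPartitioner() instanceof ForwardPartitioner)        // forward edge, no redistribution
            && upStreamVertex.getParallelism() == downStreamVertex.getParallelism()
            && streamGraph.isChainingEnabled();
}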
This step effectively submits the locally built job to the cluster; it boils down to uploading the job's metadata. The underlying RestClient is built on Netty.
ClientUtils#submitJob
public static JobExecutionResult submitJob(
        ClusterClient<?> client,
JobGraph jobGraph) throws ProgramInvocationException {
checkNotNull(client);
checkNotNull(jobGraph);
try {
return client
.submitJob(jobGraph)
.thenApply(DetachedJobExecutionResult::new)
.get();
} catch (InterruptedException | ExecutionException e) {
ExceptionUtils.checkInterrupted(e);
throw new ProgramInvocationException("Could not run job in detached mode.", jobGraph.getJobID(), e);
}
}
The job is submitted to the cluster through the RestClusterClient:
public CompletableFuture<JobID> submitJob(@Nonnull JobGraph jobGraph) {
    // serialize the JobGraph to a temporary binary file
    CompletableFuture<java.nio.file.Path> jobGraphFileFuture = CompletableFuture.supplyAsync(() -> {
try {
final java.nio.file.Path jobGraphFile = Files.createTempFile("flink-jobgraph", ".bin");
try (ObjectOutputStream objectOut = new ObjectOutputStream(Files.newOutputStream(jobGraphFile))) {
objectOut.writeObject(jobGraph);
}
return jobGraphFile;
} catch (IOException e) {
throw new CompletionException(new FlinkException("Failed to serialize JobGraph.", e));
}
}, executorService);
    // collect all the files that need to be uploaded
    CompletableFuture<Tuple2<JobSubmitRequestBody, Collection<FileUpload>>> requestFuture = jobGraphFileFuture.thenApply(jobGraphFile -> {
        List<String> jarFileNames = new ArrayList<>(8);
        List<JobSubmitRequestBody.DistributedCacheFile> artifactFileNames = new ArrayList<>(8);
        Collection<FileUpload> filesToUpload = new ArrayList<>(8);
filesToUpload.add(new FileUpload(jobGraphFile, RestConstants.CONTENT_TYPE_BINARY));
for (Path jar : jobGraph.getUserJars()) {
jarFileNames.add(jar.getName());
filesToUpload.add(new FileUpload(Paths.get(jar.toUri()), RestConstants.CONTENT_TYPE_JAR));
}
        for (Map.Entry<String, DistributedCache.DistributedCacheEntry> artifacts : jobGraph.getUserArtifacts().entrySet()) {
final Path artifactFilePath = new Path(artifacts.getValue().filePath);
try {
// Only local artifacts need to be uploaded.
if (!artifactFilePath.getFileSystem().isDistributedFS()) {
artifactFileNames.add(new JobSubmitRequestBody.DistributedCacheFile(artifacts.getKey(), artifactFilePath.getName()));
filesToUpload.add(new FileUpload(Paths.get(artifacts.getValue().filePath), RestConstants.CONTENT_TYPE_BINARY));
}
} catch (IOException e) {
throw new CompletionException(
new FlinkException("Failed to get the FileSystem of artifact " + artifactFilePath + ".", e));
}
}
final JobSubmitRequestBody requestBody = new JobSubmitRequestBody(
jobGraphFile.getFileName().toString(),
jarFileNames,
artifactFileNames);
return Tuple2.of(requestBody, Collections.unmodifiableCollection(filesToUpload));
});
    // upload the jar files and submit the job
    final CompletableFuture<JobSubmitResponseBody> submissionFuture = requestFuture.thenCompose(
requestAndFileUploads -> sendRetriableRequest(
JobSubmitHeaders.getInstance(),
EmptyMessageParameters.getInstance(),
requestAndFileUploads.f0,
requestAndFileUploads.f1,
isConnectionProblemOrServiceUnavailable())
);
    // delete the generated JobGraph file afterwards
submissionFuture
.thenCombine(jobGraphFileFuture, (ignored, jobGraphFile) -> jobGraphFile)
.thenAccept(jobGraphFile -> {
try {
Files.delete(jobGraphFile);
} catch (IOException e) {
LOG.warn("Could not delete temporary file {}.", jobGraphFile, e);
}
});
    // return the JobID
return submissionFuture
.thenApply(ignore -> jobGraph.getJobID())
.exceptionally(
(Throwable throwable) -> {
throw new CompletionException(new JobSubmissionException(jobGraph.getJobID(), "Failed to submit JobGraph.", ExceptionUtils.stripCompletionException(throwable)));
});
}
// send the submission request to the cluster (with retries)
private <M extends MessageHeaders<R, P, U>, U extends MessageParameters, R extends RequestBody, P extends ResponseBody> CompletableFuture<P>
        sendRetriableRequest(M messageHeaders, U messageParameters, R request, Collection<FileUpload> filesToUpload, Predicate<Throwable> retryPredicate) {
return retry(() -> getWebMonitorBaseUrl().thenCompose(webMonitorBaseUrl -> {
try {
return restClient.sendRequest(webMonitorBaseUrl.getHost(), webMonitorBaseUrl.getPort(), messageHeaders, messageParameters, request, filesToUpload);
} catch (IOException e) {
throw new CompletionException(e);
}
}), retryPredicate);
}
Properties
ExecutionGraphBuilder#buildGraph logic, step by step (a condensed sketch follows the list):
create a new execution graph, if none exists so far
set the basic properties
initialize the vertices that have a master initialization hook
file output formats create directories here, input formats create splits
topologically sort the job vertices and attach the graph to the existing one
configure the state checkpointing
create all the metrics for the Execution Graph
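These steps correspond roughly to the following skeleton (heavily abridged sketch of ExecutionGraphBuilder#buildGraph, Flink 1.10-era; error handling and most parameters omitted):

// create a new execution graph, if none exists so far
ExecutionGraph executionGraph = (prior != null) ? prior : new ExecutionGraph(/* ... */);

// set the basic properties
executionGraph.setJsonPlan(JsonPlanGenerator.generatePlan(jobGraph));

// initialize the vertices that have a master initialization hook
// (file output formats create directories here, input formats create splits)
for (JobVertex vertex : jobGraph.getVertices()) {
    vertex.initializeOnMaster(classLoader);
}

// topologically sort the job vertices and attach the graph to the existing one
List<JobVertex> sortedTopology = jobGraph.getVerticesSortedTopologicallyFromSources();
executionGraph.attachJobGraph(sortedTopology);

// configure the state checkpointing and create all the metrics for the ExecutionGraph (omitted)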
JobMaster
Scheduler execution
resetAndStartScheduler =>
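resetAndStartScheduler (re)creates the SchedulerNG if necessary, binds it to the JobMaster's main thread, and then triggers startScheduling. The actual hand-off to the scheduler is just this (condensed from the Flink 1.10-era JobMaster):

private void startScheduling() {
    checkState(jobStatusListener == null);
    // forward job status changes back to the JobMaster
    jobStatusListener = new JobManagerJobStatusListener();
    schedulerNG.registerJobStatusListener(jobStatusListener);

    // kicks off scheduling on the ExecutionGraph: slot allocation and task deployment
    schedulerNG.startScheduling();
}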
Reference: Flink 作业执行深度解析 (an in-depth analysis of Flink job execution), https://ververica.cn/developers/advanced-tutorial-2-flink-job-execution-depth-analysis/