A First Look at Apache Hudi (4) (Integration with Flink) -- How Hudi's createDynamicTableSource/createDynamicTableSink/createCatalog Are Called in Flink SQL

Background

This article uses Hudi's HoodieTableFactory and HoodieCatalogFactory to explain when Flink calls createDynamicTableSource, createDynamicTableSink, and createCatalog.

Discussion

Let's start with a diagram:
[Figure 1]

How createDynamicTableSink is called

The core logic lives in the PlannerBase.translate method:

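  override def translate(
      modifyOperations: util.List[ModifyOperation]): util.List[Transformation[_]] = {
    validateAndOverrideConfiguration()
    if (modifyOperations.isEmpty) {
      return List.empty[Transformation[_]]
    }

    // 1. ModifyOperation (e.g. INSERT INTO) -> Calcite RelNode; the sink table
    //    is resolved here, which is where createDynamicTableSink gets invoked
    val relNodes = modifyOperations.map(translateToRel)
    // 2. optimize the RelNode trees
    val optimizedRelNodes = optimize(relNodes)
    // 3. optimized RelNodes -> ExecNode graph
    val execGraph = translateToExecNodeGraph(optimizedRelNodes)
    // 4. ExecNode graph -> Flink Transformations
    val transformations = translateToPlan(execGraph)
    cleanupInternalConfigurations()
    transformations
  }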

The logic above is the flow that converts SQL into Flink Transformations. The corresponding diagram is:

[Figure 2]

Mapped to our case, the call chain is:

PlannerBase.translateToRel (applied to each ModifyOperation)
      ||
      \/
PlannerBase.getTableSink
      ||
      \/
FactoryUtil.createTableSink
      ||
      \/
HoodieTableFactory.createDynamicTableSink

In other words, this method is called during the logical plan generation phase (translateToRel). The call logic for createDynamicTableSource is the same.
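To make the chain above concrete, here is a minimal, self-contained Java sketch that mimics it. Every type and method in it (SinkFactory, FakeHoodieTableFactory, the static translateToRel/getTableSink/createTableSink helpers, CALL_LOG) is a hypothetical stand-in, not a real Flink or Hudi API. The real FactoryUtil discovers factories through Java SPI (ServiceLoader) and matches the table's 'connector' option against each factory's identifier; this sketch replaces discovery with a fixed list. It only illustrates that the sink factory is reached from translateToRel when an INSERT is planned, not when the table is declared.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the sink lookup chain.
// None of these types are the real Flink or Hudi classes.
public class SinkLookupSketch {

    // Stand-in for DynamicTableSinkFactory: selected by the 'connector' option.
    interface SinkFactory {
        String factoryIdentifier();
        String createDynamicTableSink(Map<String, String> options);
    }

    // Stand-in for HoodieTableFactory, registered under the identifier "hudi".
    static class FakeHoodieTableFactory implements SinkFactory {
        public String factoryIdentifier() { return "hudi"; }
        public String createDynamicTableSink(Map<String, String> options) {
            CALL_LOG.add("HoodieTableFactory.createDynamicTableSink");
            return "HoodieTableSink(path=" + options.get("path") + ")";
        }
    }

    // Records the order in which the stand-in methods are invoked.
    static final List<String> CALL_LOG = new ArrayList<>();

    // Stand-in for factory discovery (Flink uses ServiceLoader, not a fixed list).
    static final List<SinkFactory> DISCOVERED = List.of(new FakeHoodieTableFactory());

    // Stand-in for FactoryUtil.createTableSink: pick the factory matching 'connector'.
    static String createTableSink(Map<String, String> options) {
        CALL_LOG.add("FactoryUtil.createTableSink");
        String connector = options.get("connector");
        for (SinkFactory f : DISCOVERED) {
            if (f.factoryIdentifier().equals(connector)) {
                return f.createDynamicTableSink(options);
            }
        }
        throw new IllegalStateException("No factory for connector '" + connector + "'");
    }

    // Stand-in for PlannerBase.getTableSink.
    static String getTableSink(Map<String, String> tableOptions) {
        CALL_LOG.add("PlannerBase.getTableSink");
        return createTableSink(tableOptions);
    }

    // Stand-in for PlannerBase.translateToRel on an INSERT (ModifyOperation).
    static String translateToRel(Map<String, String> tableOptions) {
        CALL_LOG.add("PlannerBase.translateToRel");
        return "LogicalSink(" + getTableSink(tableOptions) + ")";
    }

    public static void main(String[] args) {
        // Options as they would come from CREATE TABLE ... WITH ('connector'='hudi', ...).
        Map<String, String> options = new HashMap<>();
        options.put("connector", "hudi");
        options.put("path", "/tmp/hudi_t1");
        // Declaring the table calls no factory; only planning the INSERT does.
        System.out.println(translateToRel(options));
        System.out.println(CALL_LOG);
    }
}
```

Running main prints the LogicalSink wrapper followed by the recorded call order, which mirrors the call chain listed above.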

How createCatalog (hudi) is called

In Flink, a custom catalog can be created and used in either of the following ways:

// In Java:
  tableEnv.registerCatalog("myhive", catalog);
-- In SQL:
CREATE CATALOG hoodie_catalog
WITH (
  'type'='hudi',
  'catalog.path' = '${catalog root path}', -- only valid if the table options has no explicit declaration of table path
  'hive.conf.dir' = '${dir path where hive-site.xml is located}',
  'mode'='hms' -- also support 'dfs' mode so that all the table metadata are stored with the filesystem
);

For the SQL path, the call chain is:

TableEnvironmentImpl.executeInternal
      ||
      \/
TableEnvironmentImpl.createCatalog
      ||
      \/
FactoryUtil.createCatalog
      ||
      \/
HoodieCatalogFactory.createCatalog

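Finally, catalogManager.registerCatalog is called, so the new catalog is managed by the CatalogManager. Whenever the catalog is needed later, it is retrieved from the CatalogManager by name via its getCatalog method.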
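The register-then-look-up pattern can be sketched in plain Java as follows. This is a simplified model, not Flink's or Hudi's code: SimpleCatalog, CatalogFactory, FakeHoodieCatalogFactory, the CatalogManager class here, and executeCreateCatalog are all hypothetical stand-ins. It assumes, as described above, that the factory is chosen by the 'type' option ('hudi'), that the created catalog is registered under its name, and that later users fetch it by that name. Rejecting a duplicate name is a choice of this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of the CREATE CATALOG path.
// None of these types are the real Flink or Hudi classes.
public class CatalogRegistrationSketch {

    // Stand-in for a Catalog instance.
    static class SimpleCatalog {
        final String name;
        final String mode;
        SimpleCatalog(String name, String mode) { this.name = name; this.mode = mode; }
    }

    // Stand-in for CatalogFactory: selected by the 'type' option.
    interface CatalogFactory {
        String factoryIdentifier();
        SimpleCatalog createCatalog(String name, Map<String, String> options);
    }

    // Stand-in for HoodieCatalogFactory, registered under the identifier "hudi".
    static class FakeHoodieCatalogFactory implements CatalogFactory {
        public String factoryIdentifier() { return "hudi"; }
        public SimpleCatalog createCatalog(String name, Map<String, String> options) {
            return new SimpleCatalog(name, options.get("mode"));
        }
    }

    // Stand-in for CatalogManager: a name -> catalog registry.
    static class CatalogManager {
        private final Map<String, SimpleCatalog> catalogs = new HashMap<>();

        void registerCatalog(String name, SimpleCatalog catalog) {
            if (catalogs.putIfAbsent(name, catalog) != null) {
                throw new IllegalStateException("Catalog " + name + " already exists");
            }
        }

        Optional<SimpleCatalog> getCatalog(String name) {
            return Optional.ofNullable(catalogs.get(name));
        }
    }

    static final List<CatalogFactory> DISCOVERED = List.of(new FakeHoodieCatalogFactory());

    // Stand-in for FactoryUtil.createCatalog: pick the factory matching 'type'.
    static SimpleCatalog createCatalog(String name, Map<String, String> options) {
        String type = options.get("type");
        for (CatalogFactory f : DISCOVERED) {
            if (f.factoryIdentifier().equals(type)) {
                return f.createCatalog(name, options);
            }
        }
        throw new IllegalStateException("No catalog factory for type '" + type + "'");
    }

    // Stand-in for TableEnvironmentImpl handling a CREATE CATALOG statement.
    static void executeCreateCatalog(CatalogManager manager, String name, Map<String, String> options) {
        manager.registerCatalog(name, createCatalog(name, options));
    }

    public static void main(String[] args) {
        CatalogManager manager = new CatalogManager();
        Map<String, String> options = new HashMap<>();
        options.put("type", "hudi");
        options.put("mode", "hms");
        executeCreateCatalog(manager, "hoodie_catalog", options);
        // Later, the catalog is fetched from the manager by name.
        SimpleCatalog catalog = manager.getCatalog("hoodie_catalog").orElseThrow();
        System.out.println(catalog.name + " mode=" + catalog.mode);
    }
}
```

Running main prints `hoodie_catalog mode=hms`. The factory runs once, at CREATE CATALOG time; every later use goes through the manager's lookup.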

Other

  • For custom sources and sinks, see the official Flink documentation
  • For custom catalogs and how to use them, see the official Flink documentation
