OpenMetadata: A Detailed Look at Retrieving MySQL Table Lineage

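Overview

OpenMetadata is an open-source metadata management platform that supports end-to-end lineage tracking. For MySQL databases, OpenMetadata builds table-level lineage by parsing foreign key constraints, view definitions, and (optionally) query logs. This article walks through the implementation with reference to the source code.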


Environment Configuration and Data Ingestion

1. Sample configuration file (YAML)

source:
  type: mysql
  serviceName: mysql_dev
  serviceConnection:
    config:
      type: Mysql
      username: admin
      password: pass
      hostPort: localhost:3306
      databaseSchema: sales_db
  sourceConfig:
    config:
      includeViews: true
      includeTables: true
      markDeletedTables: true
      lineageQuery: "SELECT * FROM information_schema.views WHERE view_definition LIKE '%{table}%';"
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: "http://localhost:8585/api"
    authProvider: openmetadata
    securityConfig:
      jwtToken: "token"

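2. Key configuration options

  • lineageQuery: custom SQL used for lineage analysis (optional)
  • includeViews: whether to parse view lineage
  • markDeletedTables: whether to mark tables that no longer exist in the source as deleted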
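The lineageQuery value in the sample configuration contains a {table} placeholder. The source does not show how that placeholder is filled, so the following is only a minimal sketch that assumes plain string substitution. The helper name render_lineage_query is hypothetical and is not part of OpenMetadata.

```python
# Hypothetical helper: one way a "{table}" placeholder in lineageQuery
# could be filled in. This is an assumption, not OpenMetadata's own code.
LINEAGE_QUERY = (
    "SELECT * FROM information_schema.views "
    "WHERE view_definition LIKE '%{table}%';"
)


def render_lineage_query(template: str, table: str) -> str:
    """Substitute a table name into the query template.

    The name is spliced into SQL text rather than bound as a parameter,
    so only plain identifiers (letters, digits, underscores) are accepted.
    """
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unsafe table name: {table!r}")
    return template.format(table=table)


if __name__ == "__main__":
    print(render_lineage_query(LINEAGE_QUERY, "orders"))
```

Note that a LIKE '%orders%' match is a substring match, so it would also pick up views that mention a table such as orders_archive; results from a query like this are candidates, not confirmed lineage.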

Source Code Walkthrough and Core Flow

1. Entry class: MysqlSource

Path: openmetadata-ingestion/src/metadata/ingestion/source/database/mysql/connection.py

class MysqlSource(RDBMSSource):
    def __init__(self, config: WorkflowSource, metadata_config: OpenMetadataConnection):
        super().__init__(config, metadata_config)
        self.connection = MysqlConnection(config.serviceConnection.__root__.config)
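The snippet above hands the serviceConnection settings to a connection object but does not show what happens to them. As a rough illustration, the sketch below assumes the settings are turned into a SQLAlchemy-style URL using the mysql+pymysql dialect and driver. The function build_mysql_url is hypothetical and only demonstrates the idea.

```python
from urllib.parse import quote_plus


def build_mysql_url(
    username: str,
    password: str,
    host_port: str,
    database: str,
    scheme: str = "mysql+pymysql",  # assumed dialect+driver
) -> str:
    """Build a SQLAlchemy-style URL from the YAML connection fields.

    Credentials are percent-encoded so that characters such as "@" or "/"
    in a password do not break URL parsing.
    """
    return (
        f"{scheme}://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host_port}/{database}"
    )


if __name__ == "__main__":
    # Values taken from the sample configuration.
    print(build_mysql_url("admin", "pass", "localhost:3306", "sales_db"))
```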

2. Core lineage extraction method

Path: openmetadata-ingestion/src/metadata/ingestion/source/database/common_db_source.py

class CommonDbSourceService(ABC):
    def process_table_lineage(self, table: Table) -> None:
        # Derive direct lineage from foreign keys
        for column in table.columns:
            if column.foreignKeys:
                self._build_foreign_key_lineage(table, column)
        
        # Derive lineage from the view definition
        if self.config.sourceConfig.config.includeViews:
            view_def = self._get_view_definition(table.name)
            self._parse_view_lineage(view_def, table)
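To make the two branches of the method above concrete, here is a self-contained sketch that collects lineage edges from foreign keys and from view sources. The Column and Table dataclasses, the collect_lineage function, and the edge direction (referenced table as upstream, referencing table as downstream) are all assumptions made for illustration; they are not OpenMetadata's own types.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Edge = Tuple[str, str]  # (upstream table, downstream table)


@dataclass
class Column:
    name: str
    # Table referenced by this column's foreign key, if any.
    references: Optional[str] = None


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)
    # For views: upstream tables already extracted from the definition.
    view_sources: List[str] = field(default_factory=list)


def collect_lineage(table: Table, include_views: bool = True) -> Set[Edge]:
    """Collect edges from foreign keys and, optionally, view sources."""
    edges: Set[Edge] = set()
    # Branch 1: one edge per foreign key column.
    for column in table.columns:
        if column.references:
            edges.add((column.references, table.name))
    # Branch 2: one edge per table the view reads from.
    if include_views:
        for source in table.view_sources:
            edges.add((source, table.name))
    return edges


if __name__ == "__main__":
    orders = Table(
        "orders",
        columns=[Column("id"), Column("customer_id", references="customers")],
    )
    print(collect_lineage(orders))
```

Using a set means that a table referenced by several columns, or by both a foreign key and a view definition, still yields a single edge.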

3. SQL parser

Path: openmetadata-ingestion/src/metadata/ingestion/source/database/lineage/parser.py

class LineageParser:
    @staticmethod
    def parse(sql: str) -> List[LineageEdge]:
        # Parse the SQL with ANTLR to build a syntax tree
        parser = SqlLineageParser(sql)
        return parser.get_lineage_edges()
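The parser above is only shown as a wrapper, so the following sketch illustrates what extracting lineage from a view definition means in practice. It deliberately swaps the grammar-based parser for a naive regular expression over FROM and JOIN clauses. It will miss comma joins and will treat CTE names as tables, so it is a teaching aid under those assumptions rather than a replacement for a real SQL parser. All names in it are hypothetical.

```python
import re
from typing import List, Tuple

# Identifier after FROM or JOIN, optionally schema-qualified and
# optionally backtick-quoted (MySQL style).
_SOURCE_RE = re.compile(
    r"\b(?:FROM|JOIN)\s+((?:`[^`]+`|\w+)(?:\.(?:`[^`]+`|\w+))?)",
    re.IGNORECASE,
)


def extract_source_tables(sql: str) -> List[str]:
    """Return referenced table names in first-seen order, no duplicates."""
    seen: List[str] = []
    for match in _SOURCE_RE.finditer(sql):
        name = match.group(1).replace("`", "")
        if name not in seen:
            seen.append(name)
    return seen


def view_lineage_edges(view_name: str, view_sql: str) -> List[Tuple[str, str]]:
    """Build (upstream, downstream) edges for one view definition."""
    return [(source, view_name) for source in extract_source_tables(view_sql)]


if __name__ == "__main__":
    sql = (
        "SELECT o.id, c.name FROM sales_db.orders o "
        "JOIN `customers` c ON o.customer_id = c.id"
    )
    print(view_lineage_edges("v_order_customers", sql))
```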

4. Flowchart

[Figure: lineage extraction flowchart]
