Converting MySQL Table Structures to Hive Table Structures (Automatic DDL Generation)

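I have recently been loading MySQL data into a Hive data warehouse. Because there are a large number of business tables, writing the Hive DDL for each one by hand was too time-consuming, so I settled on the following method.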

Prepare a dimension table, dim_ddl_convert, with the following CREATE TABLE statement:

CREATE TABLE
    dim_ddl_convert
    (
        source VARCHAR(100) NOT NULL,
        data_type1 VARCHAR(100) NOT NULL,
        target VARCHAR(100) NOT NULL,
        data_type2 VARCHAR(100),
        update_time VARCHAR(26),
        PRIMARY KEY (source, data_type1, target)
    )
    ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Database table structure conversion';

Load the mapping between MySQL data types and Hive data types. The rows below also cover a few other targets, such as ODPS:

INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'bigint', 'hive', 'BIGINT', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'bigint', 'odps', 'BIGINT', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'binary', 'hive', 'BINARY', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'binary', 'odps', 'BINARY', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'char', 'hive', 'STRING', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'char', 'odps', 'STRING', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'datetime', 'hive', 'STRING', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'datetime', 'odps', 'DATETIME', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'decimal', 'hive', 'DOUBLE', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'decimal', 'odps', 'DOUBLE', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'double', 'hive', 'DOUBLE', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'double', 'odps', 'DOUBLE', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'float', 'hive', 'DOUBLE', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'float', 'odps', 'DOUBLE', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'int', 'hive', 'INT', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'int', 'odps', 'BIGINT', '2019-05-06 00:00:00');
-- A bare MAP is not a valid column type (it needs type parameters, e.g. MAP<STRING,STRING>), so JSON is mapped to STRING here.
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'json', 'hive', 'STRING', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'json', 'odps', 'STRING', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'mediumtext', 'hive', 'STRING', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'mediumtext', 'odps', 'STRING', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'smallint', 'hive', 'INT', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'smallint', 'odps', 'INT', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'text', 'hive', 'STRING', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'text', 'odps', 'STRING', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'time', 'hive', 'STRING', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'time', 'odps', 'STRING', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'timestamp', 'hive', 'STRING', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'timestamp', 'odps', 'DATETIME', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'tinyint', 'hive', 'INT', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'tinyint', 'odps', 'INT', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'varbinary', 'hive', 'BINARY', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'varbinary', 'odps', 'BINARY', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'varchar', 'db2', 'varchar', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'varchar', 'hive', 'STRING', '2019-07-31 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'varchar', 'odps', 'STRING', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'varchar', 'oracle', 'varchar', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'varchar', 'sqlserver', 'varchar', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'varchar', 'sybase', 'varchar', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'enum', 'hive', 'STRING', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'longtext', 'hive', 'STRING', '2019-05-06 00:00:00');
INSERT INTO dim_ddl_convert (source, data_type1, target, data_type2, update_time) VALUES ('mysql', 'date', 'hive', 'STRING', '2019-05-06 00:00:00');
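
The conversion query in the next step builds each column definition with CONCAT, which returns NULL when any argument is NULL, and GROUP_CONCAT skips NULL values. A source type with no row in this mapping would therefore make its column silently disappear from the generated DDL. A check along the following lines can list the types that still lack a Hive mapping. It is only a sketch and not part of the original procedure, and 'your_database_name' is a placeholder:

```sql
-- List column types in the source schema that have no MySQL-to-Hive mapping yet.
-- 'your_database_name' is a placeholder for the schema being migrated.
SELECT
    a.DATA_TYPE,
    COUNT(*) AS column_count
FROM
    information_schema.COLUMNS AS a
LEFT JOIN
    dim_ddl_convert AS c
ON
    c.source = 'mysql'
AND c.target = 'hive'
AND c.data_type1 = a.DATA_TYPE
WHERE
    a.TABLE_SCHEMA = 'your_database_name'
AND c.data_type1 IS NULL
GROUP BY
    a.DATA_TYPE;
```

If this returns any rows, add a mapping row for each listed type before generating the DDL.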

Open a SQL query tool and run the following conversion query:

SET SESSION group_concat_max_len = 102400;
SELECT
    a.TABLE_NAME ,
    b.TABLE_COMMENT ,
    concat('DROP TABLE IF EXISTS ',a.TABLE_NAME,';CREATE EXTERNAL TABLE IF NOT EXISTS ',a.TABLE_NAME ,' (',group_concat(concat(a.COLUMN_NAME,' ',
    c.data_type2," COMMENT '",COLUMN_COMMENT,"'") order by a.TABLE_NAME,a.ORDINAL_POSITION) ,
    ") COMMENT '",b.TABLE_COMMENT ,"' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' STORED AS orc;") AS col_name
FROM
    (
        SELECT
            TABLE_SCHEMA,
            TABLE_NAME,
            COLUMN_NAME,
            ORDINAL_POSITION,
            DATA_TYPE,
            COLUMN_COMMENT
        FROM
            information_schema.COLUMNS
        WHERE
            TABLE_SCHEMA='your_database_name'
        ) AS a
LEFT JOIN
    information_schema.TABLES AS b
ON
    a.TABLE_NAME=b.TABLE_NAME
AND a.TABLE_SCHEMA=b.TABLE_SCHEMA
# Select MySQL as the source and Hive as the target
LEFT JOIN
    (
	    select
	    *
	    from dim_ddl_convert
	    where source='mysql' and target='hive'
    ) AS c
ON
    a.DATA_TYPE=c.data_type1
where b.TABLE_TYPE='BASE TABLE'
  and a.TABLE_NAME not like 'dim\_%'  -- escape "_", which is otherwise a single-character wildcard in LIKE
GROUP BY
    a.TABLE_NAME,
    b.TABLE_COMMENT
;

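At this point you have the generated Hive DDL statements. Other heterogeneous data sources can be handled the same way by changing the source and target filter on dim_ddl_convert.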
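
As a worked illustration, assume a hypothetical source table named orders. The table, its columns, and its comments are invented for this example and do not come from the original setup:

```sql
-- Hypothetical MySQL source table, used only for illustration.
CREATE TABLE orders (
    id BIGINT NOT NULL COMMENT 'Order ID',
    amount DECIMAL(10,2) COMMENT 'Order amount',
    created_at DATETIME COMMENT 'Creation time',
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Orders';

-- With the bigint, decimal, and datetime mapping rows loaded earlier,
-- the col_name value generated for this table should be this single line:
-- DROP TABLE IF EXISTS orders;CREATE EXTERNAL TABLE IF NOT EXISTS orders (id BIGINT COMMENT 'Order ID',amount DOUBLE COMMENT 'Order amount',created_at STRING COMMENT 'Creation time') COMMENT 'Orders' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS orc;
```

Note that DECIMAL becomes DOUBLE under this mapping, so exact decimal precision is not preserved. Adjust the corresponding mapping row if that matters for your data.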

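Note: group_concat_max_len must be set; otherwise the concatenated DDL statements are truncated. The default value is 1024, so choose a value large enough for your widest table (the query above uses 102400).
SET SESSION group_concat_max_len = 10240; sets the GROUP_CONCAT length limit for the current session only; other sessions are not affected.
SET GLOBAL group_concat_max_len = 10240; sets the GROUP_CONCAT length limit globally; it applies to connections opened afterwards.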
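
To confirm the limit actually in effect, and to see whether a result was cut short, something like the following can be run in the same session. MySQL reports a truncated GROUP_CONCAT result as a warning rather than an error, so it is easy to miss. This is a minimal sketch, not part of the original procedure:

```sql
-- 1. Before running the conversion query: inspect the limits (in bytes).
SELECT @@SESSION.group_concat_max_len AS session_limit,
       @@GLOBAL.group_concat_max_len  AS global_limit;

-- 2. Immediately after running the conversion query: list any warnings.
--    A truncated GROUP_CONCAT result shows up here as a warning, not an error.
SHOW WARNINGS;
```

SHOW WARNINGS only reflects the most recent statement, so run it directly after the conversion query, with no other statement in between.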
