Exporting Hive Data to MySQL with DataX

    • Data types DataX supports internally when reading Hive data
    • Fixing dirty-data type errors

Data types DataX supports internally when reading Hive data

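Hive 1.2.x already supports a rich set of data types, but DataX does not yet support all of them. If you overlook this, the job can fail with puzzling errors, such as dirty-data (脏数据) errors.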

DataX internal type    Hive column types
Long                   TINYINT, SMALLINT, INT, BIGINT
Double                 FLOAT, DOUBLE
String                 STRING, CHAR, VARCHAR, STRUCT, MAP, ARRAY, UNION, BINARY
Boolean                BOOLEAN
Date                   DATE, TIMESTAMP

The official DataX documentation also lists this mapping.
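As a quick way to apply the mapping above when writing the reader's column list, here is a minimal Python sketch. The dictionary is copied from the table; the names HIVE_TO_DATAX and datax_type_for are hypothetical helpers, not part of DataX, and the lower-case type strings assume the spelling used in the job JSON.

```python
# Hive column type -> DataX internal type, copied from the table above.
# The lower-case values assume the spelling used in the job JSON ("string", "date", ...).
HIVE_TO_DATAX = {
    "TINYINT": "long", "SMALLINT": "long", "INT": "long", "BIGINT": "long",
    "FLOAT": "double", "DOUBLE": "double",
    "STRING": "string", "CHAR": "string", "VARCHAR": "string",
    "STRUCT": "string", "MAP": "string", "ARRAY": "string",
    "UNION": "string", "BINARY": "string",
    "BOOLEAN": "boolean",
    "DATE": "date", "TIMESTAMP": "date",
}


def datax_type_for(hive_type: str) -> str:
    """Return the DataX internal type for a Hive column type (hypothetical helper)."""
    base = hive_type.strip().upper()
    # Drop type parameters: VARCHAR(20) -> VARCHAR, MAP<STRING,INT> -> MAP.
    for sep in ("(", "<"):
        base = base.split(sep, 1)[0]
    base = base.strip()
    if base not in HIVE_TO_DATAX:
        # Types missing from the table are reported rather than guessed.
        raise ValueError(f"Hive type {hive_type!r} is not in the mapping table")
    return HIVE_TO_DATAX[base]


if __name__ == "__main__":
    for t in ("timestamp", "varchar(20)", "map<string,int>", "bigint"):
        print(t, "->", datax_type_for(t))
```

Types that do not appear in the table raise an error instead of falling back to a default, so an unmapped column is noticed before the job runs.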

Fixing dirty-data type errors

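In a data warehouse, to keep computed results accurate, we often choose higher-precision data types such as timestamp.
However, when datax-web generates the field mapping automatically, it usually does not convert timestamp to the date type that DataX supports internally. In that case you have to change the corresponding field in the job JSON to date by hand. Otherwise the reader reports every record as dirty data: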

2020-05-08 18:18:10.284 [0-0-0-reader] ERROR StdoutPluginCollector - 脏数据: 
{"message":"No enum constant com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.Type.TIMESTAMP","record":[],"type":"reader"}
2020-05-08 18:18:10.285 [0-0-0-reader] ERROR StdoutPluginCollector - 脏数据: 
{"message":"No enum constant com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.Type.TIMESTAMP","record":[],"type":"reader"}
2020-05-08 18:18:10.286 [0-0-0-reader] ERROR StdoutPluginCollector - 脏数据: 
{"message":"No enum constant com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.Type.TIMESTAMP","record":[],"type":"reader"}
2020-05-08 18:18:10.286 [0-0-0-reader] ERROR StdoutPluginCollector - 脏数据: 
{"message":"No enum constant com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.Type.TIMESTAMP","record":[],"type":"reader"}
2020-05-08 18:18:10.287 [0-0-0-reader] ERROR StdoutPluginCollector - 脏数据: 
{"message":"No enum constant com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.Type.TIMESTAMP","record":[],"type":"reader"}
2020-05-08 18:18:10.287 [0-0-0-reader] ERROR StdoutPluginCollector - 脏数据: 
{"message":"No enum constant com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.Type.TIMESTAMP","record":[],"type":"reader"}
2020-05-08 18:18:10.288 [0-0-0-reader] ERROR StdoutPluginCollector - 脏数据: 
{"message":"No enum constant com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.Type.TIMESTAMP","record":[],"type":"reader"}
2020-05-08 18:18:10.289 [0-0-0-reader] ERROR StdoutPluginCollector - 脏数据: 
{"message":"No enum constant com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.Type.TIMESTAMP","record":[],"type":"reader"}
2020-05-08 18:18:10.289 [0-0-0-reader] ERROR StdoutPluginCollector - 脏数据: 
{"message":"No enum constant com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.Type.TIMESTAMP","record":[],"type":"reader"}
2020-05-08 18:18:10.293 [0-0-0-reader] INFO  Reader$Task - end read source files...

To fix this, change the timestamp type in the JSON to date. If the column needs to keep its full precision, change the type to string instead.
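When the log is long, it can help to pull out just the type names that the reader rejected. The sketch below scans log text for the JSON payload lines shown above and collects the enum name at the end of each "No enum constant ... DFSUtil.Type.X" message. The function name unsupported_types is hypothetical, and the code assumes each payload sits on its own line as in the excerpt.

```python
import json
import re

# Matches the tail of messages like
# "No enum constant com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.Type.TIMESTAMP"
ENUM_ERROR = re.compile(r"No enum constant \S*DFSUtil\.Type\.(\w+)")


def unsupported_types(log_text: str) -> set:
    """Collect type names rejected by the reader (hypothetical helper)."""
    found = set()
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip timestamped log headers and other non-JSON lines
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a complete JSON payload; ignore it
        match = ENUM_ERROR.search(str(payload.get("message", "")))
        if match:
            found.add(match.group(1))
    return found


if __name__ == "__main__":
    sample = (
        "2020-05-08 18:18:10.284 [0-0-0-reader] ERROR StdoutPluginCollector - 脏数据: \n"
        '{"message":"No enum constant com.alibaba.datax.plugin.reader.hdfsreader.'
        'DFSUtil.Type.TIMESTAMP","record":[],"type":"reader"}\n'
    )
    print(unsupported_types(sample))
```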

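If you hit other dirty-data type errors, refer to the list of data types DataX supports internally (above) and change the offending field to one of the supported types.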
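The manual edit described above can also be scripted. The following sketch walks the reader columns of a job that has already been loaded as a Python dict, rewrites timestamp to date (or to string when precision matters), and reports any other type outside the supported list. The function name fix_reader_column_types and the keep_precision flag are hypothetical, and the lower-case type names are an assumption based on the spelling used in the job JSON.

```python
import copy

# DataX internal types from the table above, in the lower-case spelling
# assumed for the "type" field of reader columns.
SUPPORTED = {"long", "double", "string", "boolean", "date"}


def fix_reader_column_types(job: dict, keep_precision: bool = False) -> dict:
    """Return a copy of the job with timestamp reader columns rewritten.

    keep_precision=True maps timestamp to string instead of date.
    Hypothetical helper; not part of DataX or datax-web.
    """
    fixed = copy.deepcopy(job)  # leave the caller's dict untouched
    replacement = "string" if keep_precision else "date"
    for content in fixed["job"]["content"]:
        for col in content["reader"]["parameter"]["column"]:
            if not isinstance(col, dict):
                continue  # skip entries that carry no "type" field
            col_type = str(col.get("type", "")).lower()
            if col_type == "timestamp":
                col["type"] = replacement
            elif col_type not in SUPPORTED:
                raise ValueError(f"unsupported reader column type: {col!r}")
    return fixed


if __name__ == "__main__":
    job = {"job": {"content": [{"reader": {"parameter": {"column": [
        {"index": 0, "type": "string"},
        {"index": 1, "type": "timestamp"},
    ]}}}]}}
    print(fix_reader_column_types(job)["job"]["content"][0]["reader"]["parameter"]["column"])
```

Raising on unknown types, rather than silently mapping them to string, keeps the choice of target type with whoever owns the table.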

The sanitized JSON job file is attached below:

{
    "job": {
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "hadoopConfig": {
                            "dfs.nameservices": "nameservice1",
                            "dfs.ha.namenodes.nameservice1": "namenode177,namenode238",
                            "dfs.namenode.rpc-address.nameservice1.namenode177": "dataware-1:8020",
                            "dfs.namenode.rpc-address.nameservice1.namenode238": "dataware-2:8020",
                            "dfs.client.failover.proxy.provider.nameservice1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
                        },
                        "defaultFS": "hdfs://nameservice1",
                        "fileType": "orc",
                        "path": "/data1/user/hive/warehouse/result_temp",
                        "writeMode": "append",
                        "fieldDelimiter": ",",
                        "column": [
                            {
                                "index": "0",
                                "type": "string"
                            },
                            {
                                "index": "1",
                                "type": "string"
                            },
                            {
                                "index": "2",
                                "type": "string"
                            },
                            {
                                "index": "3",
                                "type": "string"
                            },
                            {
                                "index": "4",
                                "type": "string"
                            },
                            {
                                "index": "5",
                                "type": "string"
                            },
                            {
                                "index": "6",
                                "type": "string"
                            },
                            {
                                "index": "7",
                                "type": "string"
                            },
                            {
                                "index": "8",
                                "type": "string"
                            },
                            {
                                "index": "9",
                                "type": "string"
                            },
                            {
                                "index": "10",
                                "type": "string"
                            },
                            {
                                "index": "11",
                                "type": "string"
                            },
                            {
                                "index": "12",
                                "type": "string"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "username": "xxxx",
                        "password": "xx",
                        "column": [
                            "column1",
                            "column2",
                            "column3",
                            "column4",
                            "column5",
                            "column6",
                            "column7",
                            "column8",
                            "column9",
                            "column10",
                            "column11",
                            "column12",
                            "column13"
                        ],
                        "preSql": [
                            ""
                        ],
                        "connection": [
                            {
                                "table": [
                                    "rpt_stat_result"
                                ],
                                "jdbcUrl": "jdbc:mysql://xx.xx.xx.xx:3306/xxx?useUnicode=true&characterEncoding=utf8"
                            }
                        ]
                    }
                }
            }
        ]
    }
}
