Sqoop imports of MySQL tinyint columns contain only 0 and 1

Problem description

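When you use Sqoop to import MySQL data into Hive, you may find that values from a MySQL tinyint column (specifically one declared as TINYINT(1)) arrive in Hive as only 0 and 1, with every other value replaced, which looks bizarre at first. The official Sqoop documentation explains the cause and gives a fix: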
27.2.5. MySQL: Import of TINYINT(1) from MySQL behaves strangely
Problem: Sqoop is treating TINYINT(1) columns as booleans, which is for example causing issues with HIVE import. This is because by default the MySQL JDBC connector maps the TINYINT(1) to java.sql.Types.BIT, which Sqoop by default maps to Boolean.

Solution: A more clean solution is to force MySQL JDBC Connector to stop converting TINYINT(1) to java.sql.Types.BIT by adding tinyInt1isBit=false into your JDBC path (to create something like jdbc:mysql://localhost/test?tinyInt1isBit=false). Another solution would be to explicitly override the column mapping for the datatype TINYINT(1) column. For example, if the column name is foo, then pass the following option to Sqoop during import: --map-column-hive foo=tinyint. In the case of non-Hive imports to HDFS, use --map-column-java foo=integer.

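Problem:
When importing into Hive or HDFS, Sqoop automatically treats columns of type TINYINT(1) as boolean, which is why the imported data contains only 0 and 1. By default, the MySQL JDBC connector maps TINYINT(1) to java.sql.Types.BIT, and Sqoop in turn maps that type to Boolean.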

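Solution:
A fairly simple solution is to add tinyInt1isBit=false to the JDBC connection URL, for example: jdbc:mysql://localhost/test?tinyInt1isBit=false.
Another solution is to explicitly override the type mapping for the TINYINT(1) column. For example, if the column is named foo, pass the following option to Sqoop during import: --map-column-hive foo=tinyint. For non-Hive imports to HDFS, use --map-column-java foo=Integer.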
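
The two fixes can be sketched as full Sqoop invocations. This is a minimal sketch rather than a tested job: the user some_user, the table some_table and the directory /tmp/some_table are placeholders that do not come from the Sqoop documentation, and the script only prints the commands unless SQOOP is set to the real binary.

```shell
#!/bin/sh
# Sketch only: database, table and user names below are placeholders.
# SQOOP defaults to "echo sqoop" so the script prints the commands (dry run);
# run with SQOOP=sqoop to execute them for real.
SQOOP="${SQOOP:-echo sqoop}"

BASE_URL='jdbc:mysql://localhost/test'
# Keep the URL quoted wherever it is used: '?' is a shell glob character.
FIXED_URL="${BASE_URL}?tinyInt1isBit=false"

# Option 1: tell the connector not to report TINYINT(1) as BIT.
import_via_url() {
  $SQOOP import \
    --connect "$FIXED_URL" \
    --username some_user -P \
    --table some_table \
    --hive-import --hive-table some_table
}

# Option 2: leave the URL alone and override the Hive type of column foo.
import_via_mapping() {
  $SQOOP import \
    --connect "$BASE_URL" \
    --username some_user -P \
    --table some_table \
    --hive-import --hive-table some_table \
    --map-column-hive foo=tinyint
}

# Option 2 for a plain HDFS import: override the Java type instead.
import_to_hdfs() {
  $SQOOP import \
    --connect "$BASE_URL" \
    --username some_user -P \
    --table some_table \
    --target-dir /tmp/some_table \
    --map-column-java foo=Integer
}

import_via_url
import_via_mapping
import_to_hdfs
```

The URL parameter covers every TINYINT(1) column in the job at once, while the --map-column-* options have to be listed per column, so the first option is usually less work when a table has several such columns.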
